The Example you have provided is having the different output than you mentioned in statements. For eg.) The customer id "abc" in "MX" country is actually having hotel bookings on 1st March and he also have ticket booking on 4th March as per Airline data. Which is actually falling in (+/-) 30 days range.

By understanding your statements, I have made below example.

Airline Data

airline_data = pd.DataFrame({"Customer_Id":["xyz", "abc", "xyz"], "Country":["US", "MX", "CA"],"Date_Of_Travel":["10-20-2018", "03-04-2018", "04-05-2019"]})

print(airline_data)

Customer_Id Country Date_Of_Travel
xyz         US      10-20-2018
abc         MX      03-04-2018
xyz         CA      04-05-2019

Hotel Data

Hotel_data = pd.DataFrame({"Customer_Id":["xyz","xyz", "xyz","abc", "abc"], "Country":["CA","US","US","PH","MX"],"Date_of_Booking":["03-30-2019", "05-01-2018", "10-15-2018", "10-20-2018", "03-01-2018"]})

print(Hotel_data)

Customer_Id Country Date_of_Booking
xyz         CA      03-30-2019
xyz         US      05-01-2018
xyz         US      10-15-2018
abc         PH      10-20-2018
abc         MX      03-01-2018

Convert the Date column from string to Datetime object

airline_data["Date_Of_Travel"] = pd.to_datetime(airline_data["Date_Of_Travel"])
Hotel_data["Date_of_Booking"] = pd.to_datetime(Hotel_data["Date_of_Booking"])

Create two columns for Minimum date and Maximum date which is (+/-) 30 days.

airline_data["Min_date"] = (airline_data["Date_Of_Travel"]) - pd.Timedelta(days=30)
airline_data["Max_date"] = (airline_data["Date_Of_Travel"]) + pd.Timedelta(days=30)

Join two dataframes based on the Customer_id, Country to get the required data

df_1 = pd.merge(airline_data, Hotel_data, on=["Customer_Id", "Country"],how="left")

print(df_1)


Customer_Id Country Date_Of_Travel  Min_date    Max_date    Date_of_Booking
xyz         US      2018-10-20      2018-09-20  2018-11-19  2018-05-01
xyz         US      2018-10-20      2018-09-20  2018-11-19  2018-10-15
abc         MX      2018-03-04      2018-02-02  2018-04-03  2018-03-01
xyz         CA      2019-04-05      2019-03-06  2019-05-05  2019-03-30

Create a column which is satisfying the condition of Booking is +/- 30 days from Travel date.

df_1["Bool"] = (df_1.Date_of_Booking >= df_1.Min_date) & (df_1.Date_of_Booking <= df_1.Max_date)

df_1["Bool"] = df_1["Bool"].apply(int)

print(df_1)


Customer_Id Country Date_Of_Travel  Min_date    Max_date    Date_of_Booking Bool
xyz         US      2018-10-20      2018-09-20  2018-11-19  2018-05-01      0
xyz         US      2018-10-20      2018-09-20  2018-11-19  2018-10-15      1
abc         MX      2018-03-04      2018-02-02  2018-04-03  2018-03-01      1
xyz         CA      2019-04-05      2019-03-06  2019-05-05  2019-03-30      1

Now, apply groupby on the df_1 and get the sum of the Bool to get the total bookings of Each customer in particular country.

Req_Results = df_1.groupby(["Customer_Id","Country","Date_Of_Travel","Date_of_Booking"]).sum().reset_index()

Req_Results = Req_Results[Req_Results.Bool!=0]

Req_Results.rename(columns={"Bool":"Total_Hotel_Bookings"},inplace=True)

print(Req_Results)


Customer_Id Country Date_Of_Travel  Date_of_Booking   Total_Hotel_Bookings
abc        MX       2018-03-04      2018-03-01           1
xyz        CA       2019-04-05      2019-03-30           1
xyz        US       2018-10-20      2018-10-15           1

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top