Understanding logical query processing when a BETWEEN condition is used in a SQL self join

Once the Cartesian product has been formed, every value of o1.order_date is compared with the range of o2.order_date defined by the condition you provided (o1.order_date BETWEEN o2.order_date AND o2.order_date + 2). Remember that this date range is redefined for every value of o2.order_date.
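A minimal Python sketch of this logical step follows. It assumes the orders table holds exactly the three dates and revenues that appear in the table of selected rows further down; that sample data and the helper name `self_join_between` are illustrative assumptions. The sketch builds the Cartesian product with nested loops and keeps a pair only when the inclusive BETWEEN test passes. A real database optimizer rarely executes the join this way; this is only the logical model.

```python
from datetime import date, timedelta

# Assumed sample data: one row per order_date with its revenue
# (values taken from the o2.revenue column of the selected-rows table).
orders = [
    (date(2020, 10, 1), 10),
    (date(2020, 10, 2), 5),
    (date(2020, 10, 3), 10),
]

def self_join_between(rows, days=2):
    """Hypothetical helper: Cartesian product of rows with itself, keeping pairs where
    o1.order_date BETWEEN o2.order_date AND o2.order_date + days."""
    selected = []
    for o1_date, _ in rows:                  # every row of o1 ...
        for o2_date, o2_revenue in rows:     # ... paired with every row of o2
            low, high = o2_date, o2_date + timedelta(days=days)
            if low <= o1_date <= high:       # BETWEEN is inclusive at both ends
                selected.append((o1_date, o2_date, o2_revenue))
    return selected

for o1_date, o2_date, revenue in self_join_between(orders):
    print(o1_date, o2_date, revenue)
```

With this data the loop keeps six of the nine possible pairs, in the same order as the table of selected rows.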

Example: when o1.order_date = ‘2020-10-01’:

  • When o2.order_date = ‘2020-10-01’, the range runs from ‘2020-10-01’ to ‘2020-10-03’. Since o1.order_date lies within this range, the condition evaluates to True and this row of the Cartesian product is selected.
  • For the next row, o2.order_date = ‘2020-10-02’, so the range becomes ‘2020-10-02’ to ‘2020-10-04’. Now o1.order_date = ‘2020-10-01’ does not lie within it, so the condition evaluates to False. The same holds for every later o2.order_date, whose range starts later still. Therefore, only one row (the one from the previous step) of the Cartesian product is selected for o1.order_date = ‘2020-10-01’.
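The per-row evaluation for o1.order_date = ‘2020-10-01’ can be traced with a short Python sketch. It assumes the only o2.order_date values are the three dates from the example, and `trace_between` is a hypothetical helper name. For each o2 row it rebuilds the range and records whether the BETWEEN test passes.

```python
from datetime import date, timedelta

# Assumed o2.order_date values (from the worked example); revenue is not needed here.
o2_dates = [date(2020, 10, 1), date(2020, 10, 2), date(2020, 10, 3)]
o1_date = date(2020, 10, 1)

def trace_between(o1_date, o2_dates, days=2):
    """Hypothetical helper: for one o1.order_date, evaluate the BETWEEN condition
    against the range defined by each o2.order_date."""
    results = []
    for o2_date in o2_dates:
        high = o2_date + timedelta(days=days)   # range is rebuilt for every o2 row
        results.append((o2_date, high, o2_date <= o1_date <= high))
    return results

for low, high, keep in trace_between(o1_date, o2_dates):
    print(f"o2 range {low} to {high}: {keep}")
```

Only the first range (2020-10-01 to 2020-10-03) yields True, which is why a single row survives for this o1.order_date.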

These steps are repeated until every row of the Cartesian product has been evaluated. Only the rows that satisfy the date range condition are passed to the GROUP BY clause, where revenue is aggregated.

Based on these steps, the following rows are passed to the GROUP BY clause:

o1.order_date | o2.order_date | o2.revenue
-------------------------------------------
2020-10-01    | 2020-10-01    | 10  
2020-10-02    | 2020-10-01    | 10  
2020-10-02    | 2020-10-02    | 5
2020-10-03    | 2020-10-01    | 10  
2020-10-03    | 2020-10-02    | 5
2020-10-03    | 2020-10-03    | 10
...
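To check the aggregation end to end, here is a sketch that runs the self join in an in-memory SQLite database through Python's `sqlite3` module. The full query is not shown in this section, so the table name `orders`, the `SUM(o2.revenue) ... GROUP BY o1.order_date` shape and the alias `revenue_3d` are assumptions. SQLite has no date-plus-integer arithmetic, so `date(o2.order_date, '+2 days')` stands in for `o2.order_date + 2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: orders(order_date TEXT, revenue INTEGER), one row per date.
conn.execute("CREATE TABLE orders (order_date TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2020-10-01", 10), ("2020-10-02", 5), ("2020-10-03", 10)],
)

# ISO-8601 text dates compare correctly as strings, so BETWEEN works on them;
# date(..., '+2 days') replaces o2.order_date + 2 because SQLite lacks date arithmetic.
query = """
SELECT o1.order_date, SUM(o2.revenue) AS revenue_3d
FROM orders AS o1
JOIN orders AS o2
  ON o1.order_date BETWEEN o2.order_date AND date(o2.order_date, '+2 days')
GROUP BY o1.order_date
ORDER BY o1.order_date
"""
for order_date, revenue_3d in conn.execute(query):
    print(order_date, revenue_3d)
```

With this data the sums are 10, 15 and 25 for the three dates, i.e. the o2.revenue values of the rows above added up per o1.order_date. This assumes one row per order_date; if a date had several rows, each o1 row would join to the same o2 rows and the sums for that date would be multiplied.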
