how can i avoid using limit queries in sql in redshift?

So 2 removals of LIMIT requested. The first, in the CTE, can be replaced by adding a WHERE clause in the outer select – “WHERE dates.week_info > 8 weeks ago” (I’ll leave it to you to define 8 weeks ago. Also there are more efficient ways to make 8 dates than using a window function and scanning an unneeded table but that is your choice. Changing this will remove the LIMIT / WHERE need all together. Your CTE then looks something like:

select date_trunc('week', getdate() + INTERVAL '1 day')::DATE -  (t.num * 7) - 1 as week_info
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) as t (num)

The second LIMIT is coming about because of the inequality in the JOIN clause which is causing a lot of row replication – I hope this is really what you need. There will only be 8 dates coming from the CTE AND having a GROUP BY on this date means that there will only be 8 rows of output. If there are only 8 possible rows there is no reason to have a LIMIT.

EDIT – merged code (untested):

WITH dates
AS (
    select date_trunc('week', getdate() + INTERVAL '1 day')::DATE -  (t.num * 7) - 1 as week_info
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) as t (num)
    )
SELECT dates.week_info
    ,'W' || ceiling(date_part('week', dates.week_info + INTERVAL '1 day')) AS week_number
    ,COUNT(DISTINCT zeus.client_id) AS PROC
FROM data.active_values zeus
JOIN dates ON zeus.updated_timestamp <= dates.week_info
WHERE zeus.kites_version = (
        SELECT MAX(kites_version)
        FROM data.active_values f2
        WHERE zeus.client_id = f2.client_id
            AND zeus.type = f2.type
            AND f2.updated_timestamp <= dates.week_info
        )
    AND zeus.type = 'hello-world'
    AND zeus.STATUS = 'CURRENT'
GROUP BY dates.week_info
ORDER BY dates.week_info DESC

EDIT 2 – attempt to address correlated subquery issue:

If I understand correctly the where clause in question is just trying to ensure that only client_ids with values that match on kite_version are counted. A more direct (and less error prone) way to get this is to calculate the subgroup max directly. The below code attempts to do this but I don’t have your data nor your business intent so this is an example of a better way to attack this type of requirement.

WITH dates
AS (
    select date_trunc('week', getdate() + INTERVAL '1 day')::DATE -  (t.num * 7) - 1 as week_info
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7) as t (num)
    ),
active_values_plus AS (
SELECT client_id, updated_timestamp, type, status, kites_version, MAX(kites_version) OVER (PARTITION BY client_id, type) AS max_kites_version
        FROM data.active_values
)
SELECT dates.week_info
    ,'W' || ceiling(date_part('week', dates.week_info + INTERVAL '1 day')) AS week_number
    ,COUNT(DISTINCT zeus.client_id) AS PROC
FROM active_values_plus zeus
JOIN dates ON zeus.updated_timestamp <= dates.week_info
WHERE zeus.kites_version = zeus.max_kites_version
    AND zeus.type = 'hello-world'
    AND zeus.STATUS = 'CURRENT'
GROUP BY dates.week_info
ORDER BY dates.week_info DESC

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top