Drop lines in a long dataset by group, based on some condition

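One way to get the result is with a few dplyr verbs from the tidyverse. Group by Country, sort by Date in descending order, and number the rows within each country with row_number(), so that rn = 1 is the most recent date for that country. Finally, filter on the condition you described, dropping rows where Value_A is 0 and the row is one of the two most recent dates for its country: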

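library(tidyverse)

df %>%
    group_by(Country) %>%
    # newest dates first; arrange() sorts the whole frame, which also orders rows within each country
    arrange(desc(Date)) %>%
    # rn = 1 is the most recent date within each country
    mutate(rn = row_number()) %>%
    # drop rows that are zero AND among the two most recent dates
    filter(!(Value_A == 0 & rn <= 2))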

#   Date       Country Value_A    rn
# 1 2020-10-03 Mexico       34     2
# 2 2020-10-03 Japan        27     2
# 3 2020-10-02 USA          40     3
# 4 2020-10-02 Mexico       29     3
# 5 2020-10-02 Japan        25     3
# 6 2020-10-01 USA           0     4
# 7 2020-10-01 Mexico       25     4
# 8 2020-10-01 Japan        20     4
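For readers who want to check the logic without loading tidyverse, here is a rough base R equivalent of the row_number() approach. It is a sketch, not the original answer's method. The data frame is hypothetical, built so that it reproduces the eight rows shown above; it assumes the dropped rows fell on 2020-10-04 (and, for USA, also on 2020-10-03), which the output above does not show. The helper name drop_recent_zeros and its n argument are made up for illustration.

```r
# Hypothetical sample data (assumed, not the original poster's data):
# four daily rows for each of three countries.
df <- data.frame(
  Date = as.Date(rep(c("2020-10-04", "2020-10-03", "2020-10-02", "2020-10-01"), times = 3)),
  Country = rep(c("USA", "Mexico", "Japan"), each = 4),
  Value_A = c(0, 0, 40, 0,
              0, 34, 29, 25,
              0, 27, 25, 20)
)

# Hypothetical helper: drop rows with Value_A == 0 that fall within
# the n most recent dates of their country.
drop_recent_zeros <- function(df, n = 2) {
  # Sort by country, then newest date first (like group_by + arrange(desc(Date)))
  df <- df[order(df$Country, -as.numeric(df$Date)), ]
  # Position of each row within its country, 1 = most recent (like row_number())
  df$rn <- ave(seq_len(nrow(df)), df$Country, FUN = seq_along)
  # Keep every row except the zero rows among the n most recent dates
  df[!(df$Value_A == 0 & df$rn <= n), ]
}

result <- drop_recent_zeros(df)
print(result)
```

This returns the same eight rows as the dplyr pipeline, but ordered by country (Japan, Mexico, USA) rather than by date.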

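Another method is to rank the dates within each country with rank(desc(Date)). When dates are unique within a country, this gives the same numbering as row_number() above without sorting first, so the rows keep their original order: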

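library(tidyverse)

df %>%
    group_by(Country) %>%
    # rank_date = 1 is the most recent date within each country
    mutate(rank_date = rank(desc(Date))) %>%
    # drop rows that are zero AND among the two most recent dates
    filter(!(rank_date <= 2 & Value_A == 0))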

#   Date       Country Value_A rank_date
# 1 2020-10-01 USA           0         4
# 2 2020-10-02 USA          40         3
# 3 2020-10-01 Mexico       25         4
# 4 2020-10-02 Mexico       29         3
# 5 2020-10-03 Mexico       34         2
# 6 2020-10-01 Japan        20         4
# 7 2020-10-02 Japan        25         3
# 8 2020-10-03 Japan        27         2
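The two methods can differ when a country has more than one row on the same date. rank() uses ties.method = "average" by default, so tied dates share a fractional rank, whereas row_number() always numbers rows 1, 2, 3, ... For example, if three rows tie on the most recent date, each gets an average rank of 2, so rank_date <= 2 covers all three, while rn <= 2 covers only two. The base R sketch below uses hypothetical dates to compare the two settings; ties.method = "first" reproduces the row_number() numbering on already sorted data.

```r
# Hypothetical dates for a single country, already sorted newest first,
# with two rows sharing the most recent date.
dates <- as.Date(c("2020-10-03", "2020-10-03", "2020-10-02", "2020-10-01"))

# Default ties.method = "average": the two tied rows share rank 1.5
avg_rank <- rank(-as.numeric(dates))

# ties.method = "first": ties broken by position, like row_number() after sorting
first_rank <- rank(-as.numeric(dates), ties.method = "first")

print(avg_rank)    # 1.5 1.5 3.0 4.0
print(first_rank)  # 1 2 3 4
```

Inside the dplyr pipeline, the equivalent would be rank(desc(Date), ties.method = "first") if you want the rank method to behave exactly like row_number().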
