Select rows based on multiple conditions from two independent data frames

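You can try this approach: join the two data frames on ID and, for each row of data1, keep the first event in data2 whose date is on or after the data1 date.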

library(dplyr)

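left_join(data1, data2, by = 'ID') %>%
  # one group per row of data1: its ID and its reference date
  group_by(ID, Eventdate.x) %>%
  summarise(
    # first event date in data2 on or after the data1 date (NA if none)
    Eventdate = Eventdate.y[Eventdate.y >= Eventdate.x][1],
    Eventcode = {
      inds <- Eventdate.y >= Eventdate.x
      val <- sum(inds, na.rm = TRUE)
      # separate val == 1 branch: sample(x, 1) on a single number x
      # would draw from 1:x instead of returning x
      if (val == 1) Eventcode[inds]
      else if (val > 1) sample(Eventcode[inds], 1)
      else NA_real_
    },
    .groups = "drop")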

#    ID Eventdate.x Eventdate  Eventcode
#  <dbl> <chr>       <chr>          <dbl>
#1     1 2019-01-01  2019-01-01       201
#2     2 2019-02-01  2019-02-11       201
#3     3 2019-03-01  2019-03-01       205
#4     4 2019-04-01  NA                NA
#5     5 2019-05-01  NA                NA
#6     6 2019-06-01  NA                NA
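In the version above, Eventdate is always the first qualifying date, while Eventcode is drawn at random from all qualifying codes, so for an ID with several matching events the returned date and code can come from different rows. If you want them to stay paired, one option is to sample a row index once and take both columns from that row. The helper below is a hypothetical sketch (pick_random_event is not part of dplyr or the original answer), shown on made-up vectors for a single ID:

```r
# Hypothetical helper, not from any package.
# Given one ID's event dates and codes and its reference date, return ONE
# qualifying event (date >= ref) chosen at random, so that the returned
# date and code always come from the same row.
pick_random_event <- function(dates, codes, ref) {
  ok <- which(dates >= ref)  # which() drops the NAs produced by left_join
  if (length(ok) == 0) {
    # no qualifying event: one row of typed NAs
    return(data.frame(Eventdate = dates[NA_integer_],
                      Eventcode = codes[NA_integer_],
                      stringsAsFactors = FALSE))
  }
  # index with sample.int() so a single match is returned as-is,
  # avoiding the sample(x, 1) quirk for length-1 numeric x
  i <- ok[sample.int(length(ok), 1L)]
  data.frame(Eventdate = dates[i], Eventcode = codes[i],
             stringsAsFactors = FALSE)
}

set.seed(42)
# Made-up events for one ID (ISO dates stored as text, as in the output above)
dates <- c("2019-01-15", "2019-02-11", "2019-02-20")
codes <- c(199, 201, 203)
picked <- pick_random_event(dates, codes, ref = "2019-02-01")
print(picked)
```

With dplyr 1.0 or later, an unnamed data frame returned inside summarise() is expanded into columns, so something like summarise(pick_random_event(Eventdate.y, Eventcode, Eventdate.x), .groups = "drop") should be able to replace the two separate expressions.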

The longer expression for Eventcode is there to pick a code at random when several events qualify. If you are fine with taking the first qualifying value, as is done for Eventdate, you can simplify it to:

left_join(data1, data2, by = 'ID') %>%
  group_by(ID, Eventdate.x) %>%
  summarise(Eventdate = Eventdate.y[Eventdate.y >= Eventdate.x][1], 
            Eventcode = Eventcode[Eventdate.y >= Eventdate.x][1],
            .groups = "drop")
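data1 and data2 are not shown here, so below is a self-contained sketch of the simplified version on made-up inputs (the column names come from the output above; all values are hypothetical). It adds two safeguards that are assumptions, not part of the original code: the character dates are converted with as.Date() before comparing (text comparison only works for ISO yyyy-mm-dd strings), and the joined rows are sorted by date so that [1] means the earliest qualifying event rather than whichever matching row happens to come first in data2. It needs dplyr 1.0.0 or later for .groups.

```r
library(dplyr)

# Hypothetical inputs, made up for illustration only
data1 <- data.frame(
  ID        = c(1, 2, 3, 4, 5),
  Eventdate = c("2019-01-01", "2019-02-01", "2019-03-01",
                "2019-04-01", "2019-05-01")
)
data2 <- data.frame(
  ID        = c(1, 2, 2, 3, 4),  # ID 5 has no events at all
  Eventdate = c("2019-01-01", "2019-02-20", "2019-02-11",
                "2019-03-01", "2019-03-15"),
  Eventcode = c(201, 203, 201, 205, 207)
)

result <- data1 %>%
  mutate(Eventdate = as.Date(Eventdate)) %>%
  left_join(data2 %>% mutate(Eventdate = as.Date(Eventdate)), by = "ID") %>%
  # sort so that [1] below picks the earliest qualifying date per group
  arrange(ID, Eventdate.y) %>%
  group_by(ID, Eventdate.x) %>%
  summarise(
    Eventdate = Eventdate.y[Eventdate.y >= Eventdate.x][1],
    Eventcode = Eventcode[Eventdate.y >= Eventdate.x][1],
    .groups = "drop"
  )

print(result)
```

With these inputs, ID 2 gets 2019-02-11 and code 201; without the arrange() step it would get 2019-02-20 and code 203, because that row comes first in data2. ID 4 (only an earlier event) and ID 5 (no events) get NA.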
