How can I generate patterns from patterns in dataframes?

One approach is to use purrr together with stringr:

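  1. Collapse each row of the dataframe into a single string
  2. For each row, check whether any of the values in df_genes$Genes occurs as a whole word
  3. Reduce the result to one TRUE/FALSE per row and use it to subset df

library(stringr)
library(purrr)

# Step 1: paste all columns of a row into one string
# Step 2: test that string against one word-bounded regex per gene
# Step 3: keep the row if at least one gene matched
rows <- pmap(df, str_c, sep = " ") %>%
  map(str_detect, paste0('\\b', df_genes$Genes, '\\b')) %>%
  map_lgl(any)
df[rows,]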
#>   Study_ID              Title       Drug
#> 1        1     Study of Gene1 Gene1-drug
#> 3        3 Study of something Gene4-drug
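
As a variation on the same idea, the per-gene patterns can be merged into one alternation regex, so each row is tested only once. This is a sketch of an alternative, not the approach used above; the names row_text, pattern and hits are illustrative, and it assumes the gene names contain no regex metacharacters (otherwise they would need escaping first). The input data is repeated so the block runs on its own.

```r
library(stringr)

df_genes <- data.frame(Genes = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
                       stringsAsFactors = FALSE)
df <- data.frame(Study_ID = 1:3,
                 Title = c("Study of Gene1", "Study of Gene10", "Study of something"),
                 Drug = c("Gene1-drug", "Gene10-drug", "Gene4-drug"),
                 stringsAsFactors = FALSE)

# One string per row: paste() is applied across the columns, separated by spaces
row_text <- do.call(paste, df)

# A single pattern of the form \b(Gene1|Gene2|...)\b
pattern <- str_c("\\b(", str_c(df_genes$Genes, collapse = "|"), ")\\b")

# TRUE for each row whose text contains any gene as a whole word
hits <- str_detect(row_text, pattern)
df[hits, ]
```

With the sample data this selects the same rows, 1 and 3, as the purrr pipeline.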

The paste0 + \\b idea comes from this great answer. \\b is a regex word boundary, so the pattern built for Gene1 does not match inside Gene10.
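
To see what the word boundary changes, here is a small check on strings taken from the sample data. It is only an illustration; the variable names plain, bounded and hyphen are made up for this example.

```r
library(stringr)

title <- "Study of Gene10"

# Without word boundaries, "Gene1" matches as a prefix of "Gene10"
plain <- str_detect(title, "Gene1")

# With \\b on both sides, "Gene1" has to appear as a whole word
bounded <- str_detect(title, "\\bGene1\\b")

# A hyphen is a non-word character, so "Gene4" is still found in "Gene4-drug"
hyphen <- str_detect("Gene4-drug", "\\bGene4\\b")

print(c(plain = plain, bounded = bounded, hyphen = hyphen))
#>   plain bounded  hyphen
#>    TRUE   FALSE    TRUE
```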


INPUT DATA:

df_genes <- data.frame(Genes = c("Gene1",
                                 "Gene2",
                                 "Gene3",
                                 "Gene4",
                                 "Gene5"))

df <- data.frame(Study_ID = 1:3,
                 Title = c("Study of Gene1",
                           "Study of Gene10",
                           "Study of something"),
                 Drug = c("Gene1-drug",
                          "Gene10-drug",
                          "Gene4-drug"))

Check which genes were found in each row with this:

pmap(df, str_c, sep = " ") %>% 
  map(str_detect, paste0('\\b', df_genes$Genes, '\\b')) %>% 
  map(~keep(df_genes$Genes, .))
#> [[1]]
#> [1] "Gene1"
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "Gene4"
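
If the matches are needed next to the data rather than as a separate list, one option is to collapse each row's hits into a string column. This is a sketch under assumptions: the column name Genes_found and the variable found are illustrative and not part of the original answer, and the input data is repeated so the block runs on its own.

```r
library(stringr)
library(purrr)

df_genes <- data.frame(Genes = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5"),
                       stringsAsFactors = FALSE)
df <- data.frame(Study_ID = 1:3,
                 Title = c("Study of Gene1", "Study of Gene10", "Study of something"),
                 Drug = c("Gene1-drug", "Gene10-drug", "Gene4-drug"),
                 stringsAsFactors = FALSE)

# Same pipeline as above: a character vector of matched genes for each row
found <- pmap(df, str_c, sep = " ") %>%
  map(str_detect, paste0("\\b", df_genes$Genes, "\\b")) %>%
  map(~ keep(df_genes$Genes, .x))

# Join each row's matches with ", "; rows without a match get an empty string
df$Genes_found <- map_chr(found, paste, collapse = ", ")
df
```

Rows with no match end up with "" in the new column, which can then be filtered with df[df$Genes_found != "", ].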
