You could use non equi join in
library(data.table) df1 <- setDT(df1) df2 <- setDT(df2) df1[,group := 1:.N] df1[df2,on = .(chrom, low < position, high > position)] chrom low high group Gene 1: 1 1200 1200 1 Gene1 2: 1 10000 10000 NA Gene2 3: 5 500 500 3 Gene3 4: 5 560 560 3 Gene4 5: 1 20100 20100 2 Gene5
Here I first set a group for each line of
df1. After the merge, the line is associated to a group if the condition is met.
Non equi merge are not super intuitive, but super powerfull, and explicit: the merging condition
.(chrom, low < position, high > position) is letterally what you explicited (you want same chromosome, and position between low and high).
data.table, when you do
df1[df2,on = something]
df1 with the lines of
df2 meeting the condition expressed by
something is just a common variable of
df2, then it is equivalent to
merge(df1,df2,all.y = T,by = "someting")
something can be a list of variable and conditions between the variables of your two data.tables. Here,
.() indicates a list, and
.(chrom,low < position, high > position) indicate you merge on the variable
chrom (identical between the two data.tables), and
low < position, and
high > position. When you express inequality, you must start with the variable from the main data.table (
df1 here), then the variables of the subsetting data.table (
The output of this non equi merge using inequalities replace the variable expressed in inequalities of the main data.table (i.e.
df1) by the variables of the subsetting data.table (i.e.
df2 here), and so
position. If you want to keep the
high values, you should copy them in an other variable, or merge on a copy of these variables.
You can actually do the opposite merge, wew you subset
df1 entries, with the same condition:
df2[df1,on = .(chrom,position >low , position<high)] Gene chrom position position.1 group 1: Gene1 1 500 1700 1 2: Gene5 1 19500 20600 2 3: Gene3 5 400 1500 3 4: Gene4 5 400 1500 3
Here you subset
df1 with the entries of
df2 meeting the conditions expressed in
on = .(), and obtain the list of
Gene that actually belong to a group (
Gene2 is not here because it does not match the subset).
Similarly to what has been explained above, here
I just saw @DavidArenburg ‘s comment, and it is a more condensed and better version of what I proposed and explained:
df2[, grp := df1[.SD, which = TRUE, on = .(chrom, low <= position, high >= position)]]
directly associate the result of the non equi merge
df1[df2,on = .(chrom, low < position, high > position)] to the group variable, using
which = TRUE, which gives you the line of
df2 which meet the merge condition of
df1[df2 , on =....].
CLICK HERE to find out more related problems solutions.