# how can i combine rows in a range and consider a 3rd column?

You could use non equi join in `data.table`:

``````library(data.table)
df1 <- setDT(df1)
df2 <- setDT(df2)

df1[,group := 1:.N]
df1[df2,on = .(chrom, low < position, high > position)]

chrom   low  high group  Gene
1:     1  1200  1200     1 Gene1
2:     1 10000 10000    NA Gene2
3:     5   500   500     3 Gene3
4:     5   560   560     3 Gene4
5:     1 20100 20100     2 Gene5
``````

Here I first set a group for each line of `df1`. After the merge, the line is associated to a group if the condition is met.

Non equi merge are not super intuitive, but super powerfull, and explicit: the merging condition `.(chrom, low < position, high > position)` is letterally what you explicited (you want same chromosome, and position between low and high).

In `data.table`, when you do

``````df1[df2,on = something]
``````

you subset `df1` with the lines of `df2` meeting the condition expressed by `on`. If `something` is just a common variable of `df1` and `df2`, then it is equivalent to

``````merge(df1,df2,all.y = T,by = "someting")
``````

But `something` can be a list of variable and conditions between the variables of your two data.tables. Here, `.()` indicates a list, and `.(chrom,low < position, high > position)` indicate you merge on the variable `chrom` (identical between the two data.tables), and `low < position`, and `high > position`. When you express inequality, you must start with the variable from the main data.table (`df1` here), then the variables of the subsetting data.table (`df2`).

The output of this non equi merge using inequalities replace the variable expressed in inequalities of the main data.table (i.e. `df1`) by the variables of the subsetting data.table (i.e. `df2` here), and so `low` and `high` become `position`. If you want to keep the `low` and `high` values, you should copy them in an other variable, or merge on a copy of these variables.

You can actually do the opposite merge, wew you subset `df2` by `df1` entries, with the same condition:

``````df2[df1,on = .(chrom,position >low , position<high)]

Gene chrom position position.1 group
1: Gene1     1      500       1700     1
2: Gene5     1    19500      20600     2
3: Gene3     5      400       1500     3
4: Gene4     5      400       1500     3
``````

Here you subset `df1` with the entries of `df2` meeting the conditions expressed in `on = .()`, and obtain the list of `Gene` that actually belong to a group (`Gene2` is not here because it does not match the subset).

Similarly to what has been explained above, here `position` become `low` and `high`

## Edit

I just saw @DavidArenburg ‘s comment, and it is a more condensed and better version of what I proposed and explained:

``````df2[, grp := df1[.SD, which = TRUE, on = .(chrom, low <= position, high >= position)]]
``````

directly associate the result of the non equi merge `df1[df2,on = .(chrom, low < position, high > position)]` to the group variable, using `which = TRUE`, which gives you the line of `df2` which meet the merge condition of `df1[df2 , on =....]`.

CLICK HERE to find out more related problems solutions.

Scroll to Top