Updating a data.table in a for loop and with a grouping variable

It may be better to use .I to get the indices of the rows that meet the condition, and .SDcols to restrict .SD to the column of interest.

dt[dt[, .I[.SD[[1]] > 0], .SDcols = varName],
   (newVarName) := .SD[[1]], .SDcols = varName]
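
To make the two steps easier to follow, here is a minimal sketch on a toy table. The data values and the intermediate name idx are made up for illustration; varName and newVarName are used as in the benchmark code.

```r
library(data.table)

# Toy data, made up for illustration
dt <- data.table(education = c(0L, 3L, 0L, 7L, 12L))
varName <- "education"
newVarName <- paste0(varName, "NewVersion")

# Step 1: .I holds the row numbers; subsetting it by the condition
# returns the indices of the rows where the column is positive
idx <- dt[, .I[.SD[[1]] > 0], .SDcols = varName]
print(idx)

# Step 2: use those indices in i, so that .SD (limited to varName by
# .SDcols) contains only the selected rows when the new column is assigned
dt[idx, (newVarName) := .SD[[1]], .SDcols = varName]
print(dt)
```

Rows that do not meet the condition are left as NA in the new column.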

In the third expression, the error occurred because the right-hand side takes the column from the whole dataset, so its length differs from the number of rows selected in i. Instead, we could use .SD, which holds only the selected rows:

dt[dt[[varName]] > 0, (newVarName) := .SD[[varName]]]
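
The length mismatch can be sketched on a toy table. This assumes the failing expression took the column from the full table on the right-hand side, which is a guess at the original code; the data values and the names bad and res are made up for illustration.

```r
library(data.table)

# Toy data, made up for illustration: 5 rows, 3 of them positive
dt <- data.table(education = c(0L, 3L, 0L, 7L, 12L))
varName <- "education"
newVarName <- paste0(varName, "NewVersion")

# Assumed failing form: the right-hand side has 5 values, but i selects
# only 3 rows. Recent data.table versions are expected to reject this,
# so the attempt runs on a copy and any error message is captured.
bad <- copy(dt)
res <- tryCatch(
  bad[bad[[varName]] > 0, (newVarName) := bad[[varName]]],
  error = function(e) conditionMessage(e)
)
print(res)

# Working form: .SD contains only the rows selected in i, so the
# lengths on both sides of := agree
dt[dt[[varName]] > 0, (newVarName) := .SD[[varName]]]
print(dt)
```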

Benchmarks

library(data.table)

set.seed(24)
dt <- data.table(education = sample(0:50, 682446, replace = TRUE))
dt1 <- copy(dt)

varName <- 'education'
newVarName <- paste0(varName, 'NewVersion')

system.time(dt[dt[[varName]] > 0, (newVarName) := .SD[[varName]]])
#   user  system elapsed 
#  0.022   0.003   0.026 

system.time(dt1[dt1[, .I[.SD[[1]] > 0], .SDcols = varName],
                (newVarName) := .SD[[1]], .SDcols = varName])
#   user  system elapsed 
#  0.023   0.003   0.024 
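
Before comparing timings, it can be worth confirming that both approaches produce the same column. The following is a minimal sketch on a smaller sample (1000 rows, chosen here only to keep the check quick; the name same is also an assumption for illustration).

```r
library(data.table)

set.seed(24)
# Smaller sample than in the benchmark, only for a quick equivalence check
dt <- data.table(education = sample(0:50, 1000, replace = TRUE))
dt1 <- copy(dt)

varName <- "education"
newVarName <- paste0(varName, "NewVersion")

# Approach 1: logical condition in i, column taken from .SD
dt[dt[[varName]] > 0, (newVarName) := .SD[[varName]]]

# Approach 2: row indices from .I, column limited with .SDcols
dt1[dt1[, .I[.SD[[1]] > 0], .SDcols = varName],
    (newVarName) := .SD[[1]], .SDcols = varName]

# TRUE if both new columns match element by element, including NAs
same <- identical(dt[[newVarName]], dt1[[newVarName]])
print(same)
```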
