Looping through data frames to remove NAs (for a beginner)

You’re better off reading all of your data frames into a list to begin with. If they already exist as separate objects in your workspace, you can collect them into a list like this:

# Character vector of all objects in the current environment 
# (including all the data frames)
dfs = ls()

# Keep only the names that start with "df_"
# (adjust the pattern to match your own naming scheme)
dfs = dfs[grepl("^df_", dfs)]

# Add names (so that the elements of the list we create below will
# be named with the name of the source data frame)
names(dfs) = dfs

# Return a list where each element is a data frame.
# In each data frame, all rows with at least one NA will be removed.
df.na.remove = lapply(dfs, function(x) na.omit(get(x)))

# Alternative that keeps the same rows, using complete.cases()
df.na.remove = lapply(dfs, function(x) {
  d = get(x)
  d[complete.cases(d), ]
})

You now have a list containing all of your data frames, but with rows removed if they had any NA values.
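
If you want to try the idea on something small first, here is a minimal base R sketch. The data frames df_a and df_b and their values are made up for illustration, and mget() is used here as a shortcut for calling get() on each name. It also counts the NA values in each data frame before removing any rows, so you can see what is being dropped.

```r
# Toy data frames (hypothetical names and values, for illustration only)
df_a = data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
df_b = data.frame(x = c(4, 5), y = c("d", "e"))

# Collect every object whose name starts with "df_" into a named list
frames = mget(ls(pattern = "^df_"))

# Count the NA values in each data frame before cleaning
na_counts = sapply(frames, function(d) sum(is.na(d)))

# Drop rows with at least one NA from each data frame
frames_clean = lapply(frames, na.omit)

# Number of rows left in each cleaned data frame
rows_left = sapply(frames_clean, nrow)
```

Printing na_counts and rows_left side by side is a quick way to check that the row removal did what you expected.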

If you want to remove the original data frames from the global environment, you can do:

rm(list=dfs)

If you want to read all of the data into a list to begin with, here’s some code for that. The examples below switch to tidyverse functions.

library(tidyverse)

# Save two data frames, just to have something to work with
write_csv(mtcars[1:5, ], "df_1.csv")
write_csv(mtcars[6:10, ], "df_2.csv")

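# Create a character vector with the names of our data files.
# set_names() names each element after its file, so the list
# created below is named by source file.
f = list.files(pattern="^df_.*\\.csv$") %>% set_names()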

# Read each data frame into a single list
d1 = map(f, read_csv)

# Remove every row that has at least one NA value
d1 = map(d1, na.omit)
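
If you would rather not load the tidyverse, roughly the same steps can be done in base R. This is only a sketch: the temporary directory, the file contents, and the object names (tmp, paths, d_base) are assumptions made for the example, not part of the code above.

```r
# Work in a temporary directory so the example leaves no files behind
tmp = tempfile("na_demo_")
dir.create(tmp)

# Two small hypothetical files; the second one has a missing value
write.csv(data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)),
          file.path(tmp, "df_1.csv"), row.names = FALSE)
write.csv(data.frame(x = c(4, NA), y = c(1.5, 2.5)),
          file.path(tmp, "df_2.csv"), row.names = FALSE)

# Full paths to the data files, named by file name
paths = list.files(tmp, pattern = "^df_.*\\.csv$", full.names = TRUE)
names(paths) = basename(paths)

# Read each file and drop the rows with at least one NA
d_base = lapply(paths, function(p) na.omit(read.csv(p)))
```

Naming the paths vector before calling lapply() plays the same role as set_names(): the resulting list is named by source file.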

As another option, you can read in all of the data files, remove any row with at least one NA value, and stack all the data frames into a single data frame, all in one operation:

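# .id="source" adds a column holding the name of the file each row came from.
# (In purrr 1.0.0 and later, map_df() is superseded by map() followed by list_rbind().)
d = map_df(f, ~ {
  x = read_csv(.x)
  na.omit(x)
}, .id="source")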
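
For comparison, here is one way the stacking step could look in base R. The list called cleaned and its contents are hypothetical stand-ins for a list of already cleaned data frames, and the "source" column is built by hand rather than through an .id argument.

```r
# Hypothetical list of cleaned data frames, named by source file
cleaned = list(
  df_1.csv = data.frame(x = 1:2, y = c("a", "b")),
  df_2.csv = data.frame(x = 3L, y = "c")
)

# Add a "source" column to each data frame, then stack them row-wise
with_source = Map(function(d, nm) cbind(source = nm, d),
                  cleaned, names(cleaned))
stacked = do.call(rbind, with_source)

# rbind() builds row names from the list names; reset them to 1, 2, 3, ...
rownames(stacked) = NULL
```

This assumes every data frame has the same columns, which rbind() requires.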
