Loop to plot boxplot with ggplot

Instead of multiple plots, I suggest facets. To use them, though, the data has to go from “wide” format to “long” format, because faceting needs a single column that says which variable each value belongs to. The canonical way to do this in the tidyverse is tidyr::pivot_longer.

> basePlot
# A tibble: 53,940 x 8
   carat cut       depth table price     x     y     z
   <dbl> <ord>     <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium    59.8    61   326  3.89  3.84  2.31
 3 0.23  Good       56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium    62.4    58   334  4.2   4.23  2.63
 5 0.31  Good       63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good  62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good  62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good  61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair       65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good  59.4    61   338  4     4.05  2.39
# ... with 53,930 more rows
> pivot_longer(basePlot, -cut, names_to="var", values_to="val")
# A tibble: 377,580 x 3
   cut     var      val
   <ord>   <chr>  <dbl>
 1 Ideal   carat   0.23
 2 Ideal   depth  61.5 
 3 Ideal   table  55   
 4 Ideal   price 326   
 5 Ideal   x       3.95
 6 Ideal   y       3.98
 7 Ideal   z       2.43
 8 Premium carat   0.21
 9 Premium depth  59.8 
10 Premium table  61   
# ... with 377,570 more rows
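The post never shows how basePlot was created. Its 53,940 rows and its column names match ggplot2's diamonds dataset with the color and clarity columns dropped, so here is a minimal sketch that rebuilds it under that assumption and reproduces the pivot. The name long is just an illustrative variable name.

```r
library(ggplot2)  # provides the diamonds dataset
library(tidyr)    # pivot_longer()

# Assumption: basePlot is diamonds without the color and clarity columns
basePlot <- diamonds[, c("carat", "cut", "depth", "table", "price", "x", "y", "z")]

# Stack every column except cut into two columns:
# var holds the former column name, val holds its value
long <- pivot_longer(basePlot, -cut, names_to = "var", values_to = "val")

dim(basePlot)  # 53940 rows, 8 columns
dim(long)      # 53940 * 7 = 377580 rows, 3 columns
```

Each original row becomes seven rows (one per non-cut column), which is why the long tibble has 377,580 rows. The integer price column is combined with the double columns, so val ends up as a double.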

With the data in long form, ggplot2 only needs val for the values (y-axis), cut for the x-axis, and var to decide which facet each value goes into.

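library(ggplot2)
library(tidyr) # pivot_longer

# Stack every column except cut into var/val, then draw one boxplot panel per var
ggplot(pivot_longer(basePlot, -cut, names_to="var", values_to="val"),
       aes(cut, val, color=cut)) +
  geom_boxplot(outlier.colour="black", outlier.shape=16, outlier.size=1, notch=FALSE) +
  xlab("Diamond Cut") +
  # one panel per variable on two rows; "free" gives each panel its own x and y scales
  facet_wrap(~var, nrow=2, scales="free") +
  # alternate the cut labels over two lines so they don't overlap
  scale_x_discrete(guide=guide_axis(n.dodge=2))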

[Figure: faceted boxplots made with ggplot2]

cut appears both on the x-axis and in the legend because mapping color=cut adds a legend. Since that legend is redundant, you can either remove the color aesthetic (which removes the legend along with the colours) or keep the colours and suppress the legend by adding + scale_color_discrete(guide="none"). Older answers write guide=FALSE, which recent ggplot2 versions deprecate in favour of guide="none".
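Both options can be sketched like this, again assuming basePlot is ggplot2's diamonds without the color and clarity columns. The object names p_plain and p_coloured are hypothetical, chosen only for illustration.

```r
library(ggplot2)
library(tidyr)

# Assumption: basePlot is diamonds without the color and clarity columns
basePlot <- diamonds[, c("carat", "cut", "depth", "table", "price", "x", "y", "z")]
long <- pivot_longer(basePlot, -cut, names_to = "var", values_to = "val")

# Option 1: drop the colour aesthetic, so no legend is created at all
p_plain <- ggplot(long, aes(cut, val)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 1) +
  facet_wrap(~var, nrow = 2, scales = "free")

# Option 2: keep one colour per cut but hide the legend
# (guide = "none" replaces the deprecated guide = FALSE)
p_coloured <- ggplot(long, aes(cut, val, color = cut)) +
  geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 1) +
  facet_wrap(~var, nrow = 2, scales = "free") +
  scale_color_discrete(guide = "none")
```

Option 1 is simpler when the colours carry no extra information; option 2 keeps the visual grouping while avoiding the duplicated key.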

There are two ways of faceting: facet_wrap and facet_grid. facet_grid is designed for two facet variables, one defining the rows and one defining the columns of the grid. You can use facet_grid with just one variable, which gives a layout similar to facet_wrap(nrow=1) or facet_wrap(ncol=1), but there are styling differences: facet_grid puts row labels on the right and frees scales per row or per column, while facet_wrap labels each panel on top and can give every panel its own scales.
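To compare the two layouts on this data, here is a sketch (again assuming basePlot is diamonds minus color and clarity) that draws the same boxplots with facet_wrap and with a one-variable facet_grid. The names p_wrap and p_grid are illustrative only.

```r
library(ggplot2)
library(tidyr)

# Assumption: basePlot is diamonds without the color and clarity columns
basePlot <- diamonds[, c("carat", "cut", "depth", "table", "price", "x", "y", "z")]
long <- pivot_longer(basePlot, -cut, names_to = "var", values_to = "val")

# facet_wrap: one panel per var, wrapped onto two rows;
# scales = "free" lets every panel have its own x and y range
p_wrap <- ggplot(long, aes(cut, val)) +
  geom_boxplot() +
  facet_wrap(~var, nrow = 2, scales = "free")

# facet_grid with one variable: one row of panels per var, strip labels on the right;
# scales = "free_y" gives each row its own y range, and the x axis (cut) is shared
p_grid <- ggplot(long, aes(cut, val)) +
  geom_boxplot() +
  facet_grid(rows = vars(var), scales = "free_y")
```

Both plots contain seven panels. Because facet_grid frees scales by row or column rather than by individual panel, a single column of rows works well here, whereas spreading seven variables over a true two-way grid would need a second facet variable.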
