Quick ggplot2 Tip: Creating Duplicate Legends

January 21, 2020

Mastering the R package ggplot2

The ggplot2 package provides powerful methods to display data as graphics. The beauty of the package lies in its simplicity: understanding the core methods (mapping variables to aesthetics and applying transformations) covers ~95% of the static visualizations a data visualization developer might be interested in generating. Most of the final 5% can be achieved by understanding the infrastructure of the package. One such example is how plot components are “written” to the graphics device.

I’ll walk through how legends are generated and how we can create a second, duplicate legend to bookend the top and bottom of a long bar plot. I contributed a method to solve this problem on Stack Overflow and wanted to go into further detail in this post.

Let’s walk through creating a second legend in ggplot2.

Creating Dual Legends in ggplot2 - Libraries and Data Reshaping

The midwest data set from the ggplot2 package contains demographic information of midwest counties and should work as a representative dummy data set for this post. Here is a quick overview of the midwest data set using glimpse:

R Libraries, Midwest data set overview

# load libraries and fonts
library(ggplot2)
library(scales)
library(dplyr)
library(grid)
library(extrafont)
loadfonts(quiet = TRUE)
glimpse(midwest)
## Observations: 437
## Variables: 28
## $ PID                  <int> 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 5...
## $ county               <chr> "ADAMS", "ALEXANDER", "BOND", "BOONE", "BROWN", "BUREAU", "CALHOUN", "CARROLL", "CASS", "CHAMPAIGN", "CHRISTIAN", "C...
## $ state                <chr> "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "IL", "I...
## $ area                 <dbl> 0.052, 0.014, 0.022, 0.017, 0.018, 0.050, 0.017, 0.027, 0.024, 0.058, 0.042, 0.030, 0.028, 0.029, 0.030, 0.058, 0.02...
## $ poptotal             <int> 66090, 10626, 14991, 30806, 5836, 35688, 5322, 16805, 13437, 173025, 34418, 15921, 14460, 33944, 51644, 5105067, 194...
## $ popdensity           <dbl> 1270.9615, 759.0000, 681.4091, 1812.1176, 324.2222, 713.7600, 313.0588, 622.4074, 559.8750, 2983.1897, 819.4762, 530...
## $ popwhite             <int> 63917, 7054, 14477, 29344, 5264, 35157, 5298, 16519, 13384, 146506, 34176, 15842, 14403, 32688, 50177, 3204947, 1930...
## $ popblack             <int> 1702, 3496, 429, 127, 547, 50, 1, 111, 16, 16559, 82, 10, 4, 1021, 925, 1317147, 63, 5, 2069, 25, 16, 15462, 68, 6, ...
## $ popamerindian        <int> 98, 19, 35, 46, 14, 65, 8, 30, 8, 331, 51, 26, 17, 48, 92, 10289, 34, 6, 123, 37, 19, 962, 24, 8, 45, 40, 14, 106, 8...
## $ popasian             <int> 249, 48, 16, 150, 5, 195, 15, 61, 23, 8033, 89, 36, 29, 104, 341, 188565, 48, 26, 1751, 43, 41, 39634, 24, 19, 95, 3...
## $ popother             <int> 124, 9, 34, 1139, 6, 221, 0, 84, 6, 1596, 20, 7, 7, 83, 109, 384119, 19, 6, 1021, 24, 108, 10703, 10, 6, 29, 71, 21,...
## $ percwhite            <dbl> 96.71206, 66.38434, 96.57128, 95.25417, 90.19877, 98.51210, 99.54904, 98.29813, 99.60557, 84.67331, 99.29688, 99.503...
## $ percblack            <dbl> 2.57527614, 32.90043290, 2.86171703, 0.41225735, 9.37285812, 0.14010312, 0.01878993, 0.66051770, 0.11907420, 9.57029...
## $ percamerindan        <dbl> 0.14828264, 0.17880670, 0.23347342, 0.14932156, 0.23989034, 0.18213405, 0.15031943, 0.17851830, 0.05953710, 0.191301...
## $ percasian            <dbl> 0.37675897, 0.45172219, 0.10673071, 0.48691813, 0.08567512, 0.54640215, 0.28184893, 0.36298721, 0.17116916, 4.642681...
## $ percother            <dbl> 0.18762294, 0.08469791, 0.22680275, 3.69733169, 0.10281014, 0.61925577, 0.00000000, 0.49985123, 0.04465282, 0.922410...
## $ popadults            <int> 43298, 6724, 9669, 19272, 3979, 23444, 3583, 11323, 8825, 95971, 22945, 10734, 9647, 21563, 29136, 3291995, 13317, 6...
## $ perchsd              <dbl> 75.10740, 59.72635, 69.33499, 75.47219, 68.86152, 76.62941, 62.82445, 75.95160, 72.27195, 87.49935, 73.07474, 71.334...
## $ percollege           <dbl> 19.63139, 11.24331, 17.03382, 17.27895, 14.47600, 18.90462, 11.91739, 16.19712, 14.10765, 41.29581, 13.56723, 15.110...
## $ percprof             <dbl> 4.355859, 2.870315, 4.488572, 4.197800, 3.367680, 3.275891, 3.209601, 3.055727, 3.206799, 17.757448, 3.089998, 2.776...
## $ poppovertyknown      <int> 63628, 10529, 14235, 30337, 4815, 35107, 5241, 16455, 13081, 154934, 33788, 15615, 14248, 32190, 45693, 5023523, 191...
## $ percpovertyknown     <dbl> 96.27478, 99.08714, 94.95697, 98.47757, 82.50514, 98.37200, 98.47802, 97.91729, 97.35060, 89.54429, 98.16956, 98.078...
## $ percbelowpoverty     <dbl> 13.151443, 32.244278, 12.068844, 7.209019, 13.520249, 10.399635, 15.149781, 11.710726, 13.875086, 15.572437, 11.7082...
## $ percchildbelowpovert <dbl> 18.011717, 45.826514, 14.036061, 11.179536, 13.022889, 14.158819, 13.787761, 17.225462, 17.994784, 14.132234, 16.320...
## $ percadultpoverty     <dbl> 11.009776, 27.385647, 10.852090, 5.536013, 11.143211, 8.179287, 12.932331, 10.027037, 11.914343, 17.562728, 9.569700...
## $ percelderlypoverty   <dbl> 12.443812, 25.228976, 12.697410, 6.217047, 19.200000, 11.008586, 21.085271, 9.525052, 13.660180, 8.105017, 11.490641...
## $ inmetro              <int> 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,...
## $ category             <chr> "AAR", "LHR", "AAR", "ALU", "AAR", "AAR", "LAR", "AAR", "AAR", "HAU", "AAR", "AAR", "LAR", "LAU", "AAR", "AAU", "AAR...

I’ll subset the data a little to accommodate the plot and order the counties by their total population for display. The theme_white block of code adds some basic aesthetics and fonts.
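As a minimal sketch of the reordering idea, here is a made-up three-row data frame (the names df, county, and pop are illustrative only and are not part of the midwest data). ggplot2 draws a discrete axis in factor-level order, so sorting the rows and then setting the levels to that sorted order fixes the display order.

```r
# toy data, purely illustrative
df <- data.frame(county = c("B", "A", "C"), pop = c(30, 10, 20),
                 stringsAsFactors = FALSE)

# sort rows by population, then freeze that order into the factor levels
df <- df[order(df$pop), ]
df$county <- factor(df$county, levels = df$county)

levels(df$county)  # "A" "C" "B"
```

The same pattern is applied to midwestAAU in the code that follows, with poptotal as the sorting variable.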

R Libraries, Data Manipulation, and Plot Generation

# load libraries and fonts
library(ggplot2)
library(scales)
library(dplyr)   # provides %>%, filter() and arrange() used below
library(grid)
library(extrafont)
loadfonts()


# data subset and refactoring
midwest <- midwest[!duplicated(midwest$county), ]
midwestAAU <- midwest %>% filter(category == "AAU") %>% arrange(poptotal)
midwestAAU$county <- factor(midwestAAU$county,levels = midwestAAU$county,labels = toupper(midwestAAU$county))

gg <- ggplot(midwestAAU) +
    geom_bar(aes(y = poptotal, x = county, fill = state), stat = "identity") +
    # note: ggplot2 3.3.0 and later deprecate expand_scale() in favor of expansion()
    scale_y_continuous(expand = expand_scale(mult = c(0, .1)), labels = comma) +
    coord_flip() +
    theme_minimal() +
    labs(title = "Random Midwest Counties Arranged by Total Population",
         y = "Total Population", x = "County", fill = "State") +
    scale_fill_brewer(palette = "Set3")

# plot aesthetics
theme_white <- theme(text = element_text(family="Open Sans"),
                     panel.grid.major.y=element_blank(),
                     panel.grid.major.x=element_blank(),
                     panel.grid.minor.x=element_blank(),
                     panel.grid.minor.y=element_blank(),
                     plot.title=element_text(size=24,family = "Open Sans",lineheight=.75),
                     axis.title.x=element_text(size=20, family = "Open Sans Semibold"),
                     axis.title.y=element_text(size=20,family = "Open Sans Semibold"),
                     axis.text.x=element_text(size=12),
                     axis.text.y=element_text(size=12),
                     axis.ticks = element_blank(),
                     legend.position = "bottom",
                     legend.margin = margin(b = 0)
)

# apply theme and export plot
gg <- gg + theme_white
ggsave(gg, filename = "midwestPlot.png",height = 12, width = 12, dpi = 300, units = "in", device='png')

Original Plot - Random Midwest Counties Arranged by Total Population


Now that we’ve generated our plot we can focus on creating the second legend.

Creating a Second Legend in ggplot2

So far we’ve covered ggplot2 functionality that should produce the ~95% of plots I discussed earlier. To go further, let’s get into some ggplot2 internals. The function ggplotGrob converts our saved gg plot object into a table of graphical objects (grobs). This object can be manipulated to override default ggplot2 conventions or to hack our plot in ways the package isn’t intentionally designed for (i.e. where there isn’t a built-in function).
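To make the structure returned by ggplotGrob a little more concrete, here is a small sketch that inspects it. It uses a stand-in plot built from mtcars (the name p is my own, not the post's gg object). The legend entry may be named guide-box or guide-box-bottom depending on the installed ggplot2 version, so the lookup matches on the prefix.

```r
library(ggplot2)

# a small stand-in plot with the legend at the bottom
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  theme(legend.position = "bottom")

g <- ggplotGrob(p)

class(g)         # includes "gtable": a table of grobs
names(g$layout)  # one row per grob; t, l, b, r give its cell position

# layout rows for the legend container(s)
g$layout[grepl("^guide-box", g$layout$name), ]
```

Each row of g$layout corresponds to the grob at the same index in g$grobs, which is what makes it possible to copy a component and place it somewhere else.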

The createTopLegend function below easily duplicates a bottom legend at the top of the plot by:

1. Grabbing the ggplot graphical object
2. Retrieving the legendGrob within the ggplot object
3. Duplicating the legendGrob layout
4. Specifying the location of the new legendGrob
5. Appending the new legendGrob to the ggplot object

createTopLegend Function

createTopLegend <- function(ggplot, heightFromTop = 1) {
  # grab the saved ggplot2 object
  g <- ggplotGrob(ggplot)
  
  # count the number of grobs in this plot (which we'll use to append another)
  nGrobs <- (length(g$grobs))
  
  # find the guide-box object which provides the plot information for the legend
  legendGrob <- which(g$layout$name == "guide-box")

  # duplicate the legend's grob and layout
  g$grobs[[nGrobs+ 1]] <- g$grobs[[legendGrob]]
  g$layout[nGrobs+ 1,] <- g$layout[legendGrob,]

  # g$layout$t <- ifelse( g$layout$t > heightFromTop, g$layout$t + 1, g$layout$t)
    
  # retrieve the horizontal extent of the legend (layout columns 2 and 4 are l and r)
  rightLeft <- unname(unlist(g$layout[legendGrob, c(2,4)]))
  
  # specify the location of the new legendGrob (t, l, b, r)
  # use the heightFromTop argument to adjust the vertical positioning
  g$layout[nGrobs+ 1, 1:4] <- c(heightFromTop, rightLeft[1], heightFromTop, rightLeft[2])
  g
}
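
For comparison, the same append step can be sketched with gtable_add_grob from the gtable package instead of editing g$grobs and g$layout by hand. This is an alternative I'm offering as an illustration, not the method used above; the stand-in plot p, the row number 3, and the name guide-box-copy are all arbitrary assumptions.

```r
library(ggplot2)
library(gtable)

# stand-in plot with a bottom legend (not the post's gg object)
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  theme(legend.position = "bottom")
g <- ggplotGrob(p)

# the legend entry is "guide-box" in some ggplot2 versions and
# "guide-box-bottom" in others; take whichever is present
legend_name <- intersect(c("guide-box-bottom", "guide-box"), g$layout$name)[1]
idx <- which(g$layout$name == legend_name)

# append a copy of the legend grob in row 3 (an arbitrary choice here),
# spanning the same columns as the original legend
g2 <- gtable_add_grob(
  g, g$grobs[[idx]],
  t = 3, l = g$layout$l[idx], r = g$layout$r[idx],
  name = "guide-box-copy"
)

length(g2$grobs) - length(g$grobs)  # one grob was added
```

The result can be viewed with grid::grid.draw(g2). Which row looks right depends on the plot's layout, which is the role the heightFromTop argument plays in createTopLegend.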

We can then apply the createTopLegend function to our saved ggplot2 object gg. The result is a gtable rather than a ggplot object, so it is drawn with grid.draw; ggsave does this for us when the gtable is passed as its plot argument:

Plot with Duplicate Legend…Overlapping

gg2 <- createTopLegend(gg, 3)
ggsave(plot = gg2, filename = "midwestPlot2.png", height = 12, width = 12, dpi = 300, units = "in", device = 'png')

Overlapping Title/Top Legend - Random Midwest Counties Arranged by Total Population


You’ll notice that our top legend now overlaps the title. To remedy this we can adjust some margins within theme_white: a bottom margin on the title to add spacing, a bottom margin on the legends, and a negative margin at the bottom of the plot. These margins work in tandem: the negative plot margin offsets the extra spacing we’re adding for the top legend so that the plot stays appropriately spaced.

Fiddling with Title, Legend, and Plot Margins to Accommodate for the Top Legend

# plot aesthetics
theme_white <- theme_white + theme(
                     plot.title=element_text(size=24,family = "Open Sans",lineheight=.75, margin = margin(b = 40)),
                     legend.margin = margin(b = 40),
                     plot.margin = margin(t = 10, r = 10, b = -30, l = 10)
                     
                     )
gg <- gg + theme_white

gg2 <- createTopLegend(gg, heightFromTop = 4)
ggsave(plot = gg2, filename = "midwestPlotLegend.png", height = 12, width = 12, dpi = 300, units = "in", device = 'png')


Final Product - Random Midwest Counties Arranged by Total Population



Hope you’ve found this useful! Feel free to reach out to me on Twitter with any questions or feedback: https://twitter.com/mikeleeco