I'm hoping to have a legend that includes references to all colours, not just the vertical lines, and does not include a title.

I've tried scale_colour_manual and scale_fill_manual and they all either overlap or only show the vertical lines. I would appreciate any suggestions.

Reprex is below, including the custom colour palette.

library(ggplot2)
library(data.table)  # for setDT()

var1 <- c(head(randu$x,n=12))
var2 <- as.Date(c("2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
var3 <- c(tail(randu[which(randu$x + randu$y < 1),]$x,n=12))
var4 <- c(tail(randu[which(randu$x + randu$y < 1),]$y,n=12))

dat <- data.frame(var1,var2,var3,var4)
setDT(dat)
dat$var5 <- dat[,(var3+var4)]

new_dates <- as.Date(c("2010-09-01","2010-05-01"))

cbp2 <- c("#000000", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

ggplot()+
  geom_bar(data=dat,colour=cbp2[1],fill = cbp2[1],aes(x=var2,y=var5,colour="var4"),stat="identity")+
  geom_bar(data=dat,colour=cbp2[2],fill = cbp2[2],aes(x=var2,y=var3,colour="var3"),stat="identity")+
  geom_line(data=dat,colour=cbp2[1],aes(x=var2,y=var1))+
  geom_vline(data=data.frame(xintercept = new_dates),
             aes(xintercept = new_dates,linetype = "Changes", colour="red"),
             linetype="dashed",key_glyph = "path")+
  scale_color_manual(name = "",
                     values = c("red",cbp2[2],cbp2[1]), 
                     breaks = c("red",cbp2[2],cbp2[1]),
                     labels = c("Changes","Var3","Var4"))+
  scale_fill_manual(name = "",
                    values = c(cbp2[2],cbp2[1]), 
                    breaks = c(cbp2[2],cbp2[1]),
                    labels = c("var3","var4"))+
  ylab("")+
  xlab("")+
  scale_x_date(expand=c(0,0),date_breaks = "3 month", date_labels =  "%b %y") + 
  scale_y_continuous(labels = function(var5) paste0(var5*100, "%"), 
                     limits=c(0,1),
                     breaks=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1)) +
  theme(panel.background = element_blank(),
        axis.line = element_line(colour = "#000000"),
        axis.text.x = element_text(angle=60, hjust=1),  
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.title.x= (element_text(margin = unit(c(3, 0, 0, 0), "mm"))),
        legend.position = "top")


  • So what exactly should be in this legend? How many groups/colors?
    – MrFlick
    Commented Sep 1, 2020 at 4:41
  • Four groups - two lines of black and red and two colour blocks with black and blue. They each will need to be named and the names are not all the same as the variable names. Commented Sep 1, 2020 at 5:00
  • It should be very similar to the above but with the additional variables. Commented Sep 1, 2020 at 5:02

1 Answer

There's quite a lot to unpack here with this one, but I gave it my best shot.

First of all, consider what you are trying to plot here. Normally, it's not a problem to call things var1, var2, var3,...; however, in this context it's really quite confusing. Consequently, for this solution, I will be re-posting your entire code reworked instead of just the plotting portion for reasons I hope to outline in this answer.

The Data and the Question

With all that being said, here is my understanding about the nature of the dataset and your desire for the final plot:

  • var2 in the dataset contains Date class information, and this is the common x axis for the entire plot.

  • var1 contains values that are to be used for the y values of the geom_line plot layer

  • var3 and var4 contain values that are to be used for creation of the stacked barplot which should make up the background of the plot

  • var5 is a sum of var3 + var4, and was a device to create the plot. Herein, it will not be useful, given the data analysis we are to do on the dataset and the application of Tidy Data principles.

  • The xintercept values for the geom_vline plot layer are supplied as the two dates in new_dates

The OP's question indicates a need for the Legend to be displayed correctly. In this case, we want to indicate:

  • fill color of the bars as var3 and var4
  • the nature of the vertical lines as dashed red lines, labelled "Changes"
  • A label for the geom_line plot layer. Assume the label will be var1.

Hope all that was correct!

Synthesizing the Dataset

I encourage the OP to read up on Tidy Data principles, which will make reshaping data such as this much more straightforward in the future. Herein, I will apply these principles to the dataset dat.

First of all, let's handle the bar layer data. Applying Tidy Data principles, we want to gather var3 and var4 together and create two columns out of them: (1) one for the name of the variable ("var3" or "var4"), and (2) one for the value. We will be telling ggplot2 to "stack" the bars, so var5 is not needed here: ggplot2 will do that calculation automatically. To gather the columns together, my preference is to use gather() from tidyr, together with the pipe from dplyr:

library(dplyr)
library(tidyr)
library(ggplot2)
library(data.table)

var1 <- c(head(randu$x,n=12))
var2 <- as.Date(c("2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01"))
var3 <- c(tail(randu[which(randu$x + randu$y < 1),]$x,n=12))
var4 <- c(tail(randu[which(randu$x + randu$y < 1),]$y,n=12))

dat <- data.frame(var1,var2,var3,var4)
setDT(dat)
# dat$var5 <- dat[,(var3+var4)]   no longer needed
new_dates <- as.Date(c("2010-09-01","2010-05-01"))
cbp2 <- c("#000000", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

newdat <- dat %>% 
  gather(key='var_name', value='value', -var2) # gather all columns except for var2

names(newdat) <- c('Dates', 'var_name', 'value')
newdat$var_name <- factor(newdat$var_name, levels=c('var4', 'var3','var1'))

In addition to gathering the columns, you will note that I'm renaming the columns to make them a bit easier to follow when it comes to plotting. I'm also setting the order of the levels for newdat$var_name, because the order we specify determines the order used to draw the plot. I want var3 to appear as a bar "under" var4, so we need to specify that var4 comes first.
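As an aside, gather() is superseded in current versions of tidyr. A roughly equivalent sketch with pivot_longer(), assuming tidyr 1.0 or later, might look like the following. The row order of the result differs from gather(), which should not matter for the plot.

```r
library(tidyr)

# Rebuild the example data as a plain data frame (randu ships with base R)
keep <- randu$x + randu$y < 1
dat <- data.frame(
  var1 = head(randu$x, 12),
  var2 = seq(as.Date("2010-01-01"), by = "month", length.out = 12),
  var3 = tail(randu$x[keep], 12),
  var4 = tail(randu$y[keep], 12)
)

# Reshape every column except var2 into name/value pairs
newdat <- pivot_longer(dat, cols = -var2,
                       names_to = "var_name", values_to = "value")

# Rename the date column and fix the level order, as in the gather() version
names(newdat)[names(newdat) == "var2"] <- "Dates"
newdat$var_name <- factor(newdat$var_name, levels = c("var4", "var3", "var1"))
```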

You could also create a separate dataset containing var2 and var1 to use for plotting the geom_line layer... but this also works fine.

The Plot

For the plot, I've tried to organize the code into separate sections. What OP was trying to do was to plot column-by-column, rather than using aes(fill= and aes(color= to set and create legends. In addition, the OP's original code had numerous examples of the following:

geom_*(aes(color=...), color=...)

The result of this in ggplot2 is that if you set an aesthetic value (like color=) outside of aes() while also mapping the same aesthetic inside aes(), the value on the outside overrides the mapping, so that layer never contributes to a legend. This was the biggest cause of trouble in the OP's example, and it is why certain items were the "right" color but did not appear in any legend.
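A minimal sketch of that behaviour, using a made-up three-point data frame (df, p_fixed and p_mapped are illustrative names, not part of the OP's code):

```r
library(ggplot2)

df <- data.frame(x = 1:3, y = c(2, 4, 3))

# colour is both mapped inside aes() and set outside it:
# the outside value wins and no colour legend is drawn
p_fixed <- ggplot(df, aes(x, y)) +
  geom_line(aes(colour = "var1"), colour = "black")

# colour is only mapped; the actual colour comes from the scale,
# and a legend entry labelled "var1" is drawn
p_mapped <- ggplot(df, aes(x, y)) +
  geom_line(aes(colour = "var1")) +
  scale_colour_manual(name = NULL, values = c(var1 = "blue"))
```

Printing p_fixed and p_mapped side by side should show the same line with and without a legend.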

Mapping an aesthetic inside aes() only tells ggplot2 that a legend should be created and on what basis to vary color, fill, or linetype; it does not specify the actual colors. Those are set with the scale_*_*() functions. In this case, three legends are created (fill, color, and linetype). The OP can organize them however they wish; I kept the example illustrative so it is easy to adapt, since it is still not entirely clear exactly how the legend should look.

Note that values= is used to apply the color, linetype, or fill aesthetic, and is done by feeding that argument a named vector. You can also use a non-named vector, in which case the attributes will be applied according to the ordering of the levels for that factor.
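To illustrate the difference between the two forms, here is a small sketch on a made-up two-row data frame (df, p_named and p_unnamed are illustrative names). Both plots should end up with the same fills, because the unnamed vector is matched to the factor levels in order (var4 first, then var3).

```r
library(ggplot2)

df <- data.frame(
  grp   = factor(c("var4", "var3"), levels = c("var4", "var3")),
  value = c(0.3, 0.5)
)

base <- ggplot(df, aes(x = 1, y = value, fill = grp)) + geom_col()

# Named vector: each colour is matched to its level by name
p_named <- base +
  scale_fill_manual(values = c(var3 = "#56B4E9", var4 = "#000000"))

# Unnamed vector: colours are assigned in level order (var4, then var3)
p_unnamed <- base +
  scale_fill_manual(values = c("#000000", "#56B4E9"))
```

The named form is usually the safer choice, since it keeps working if the level order changes later.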

Note that I changed the line color of the geom_line to blue... just so that it stands out a bit. It would be a bit confusing otherwise, since there is a fill color that is also black.

ggplot(newdat, aes(x=Dates, y=value)) +
  
  # plot layers
  geom_col(
    data=subset(newdat, var_name != 'var1'),
    aes(fill=var_name), position='stack') +
  geom_line(
    data=subset(newdat, var_name == 'var1'),
    aes(color=var_name)
  ) +
  geom_vline(data=data.frame(xintercept = new_dates),
             aes(xintercept = xintercept, linetype = "Changes"), colour="red",
             key_glyph = "path")+
 
  # color and legend settings 
  scale_fill_manual(
    name="Fill",
    values=c('var3'=cbp2[2], 'var4'=cbp2[1])) +
  
  scale_color_manual(
    name='Color',
    values = 'blue') +
  
  scale_linetype_manual(
    name='Linetype',
    values=2) +

  # scale adjustment and theme stuff
  scale_y_continuous(labels = function(var5) paste0(var5*100, "%"),
                     limits=c(0,1),
                     breaks=c(0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1)) +
  
  theme(panel.background = element_blank(),
                axis.line = element_line(colour = "#000000"),
                axis.text.x = element_text(angle=60, hjust=1),
                panel.grid.major = element_blank(),
                panel.grid.minor = element_blank(),
                axis.title.x= (element_text(margin = unit(c(3, 0, 0, 0), "mm"))),
                legend.position = "top")


  • Your understanding was entirely correct. The legend was meant to be how you've set it up but with legend names set to blank (which can be easily done by using name= " "). Thank you so much! Commented Sep 2, 2020 at 0:48
  • Of course. Also, my recommendation is to use name=NULL rather than name="". With "", space for the title is still reserved, whereas NULL removes the title entirely. It probably makes no real difference here in a horizontal legend; however, the difference becomes obvious when the legends are stacked on top of one another on the right or left. Commented Sep 2, 2020 at 17:36
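
A small sketch of the name=NULL suggestion, combined with custom legend labels. The data frame bars and the labels "Series A" and "Series B" are made up for illustration; only the scale arguments matter here.

```r
library(ggplot2)

# Hypothetical stand-in for the bar portion of newdat
bars <- data.frame(
  Dates    = as.Date(c("2010-01-01", "2010-01-01", "2010-02-01", "2010-02-01")),
  var_name = factor(c("var4", "var3", "var4", "var3"), levels = c("var4", "var3")),
  value    = c(0.2, 0.3, 0.1, 0.4)
)

p <- ggplot(bars, aes(x = Dates, y = value, fill = var_name)) +
  geom_col(position = "stack") +
  scale_fill_manual(
    name   = NULL,                        # no legend title, no reserved space
    breaks = c("var3", "var4"),           # order of the legend entries
    labels = c("Series A", "Series B"),   # display names (placeholders)
    values = c(var3 = "#56B4E9", var4 = "#000000")
  ) +
  theme(legend.position = "top")
```

The same name, breaks and labels arguments should work in scale_color_manual() and scale_linetype_manual() for the other two legends.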
