0

I can't get the plot legend to show. I've checked other posts, and most of them say that color should be defined within the aes() of the ggplot call.

So I have the following dataset:

  date     `N Rx AFTN` `N Tx AFTN`  SumAFTN `N Rx AMHS` `N Tx AMHS`  SumAMHS   SUM_Tx SUM_Total   SUM_Rx
  <chr>          <int>       <int>    <dbl>       <int>       <int>    <dbl>    <dbl>     <dbl>    <dbl>
1 2018_M12    22844731    36682853 59527584     5932852     5942479 11875331 42625332  71402915 28777583
2 2019_M12    31858229    52049649 83907878    23412883    25242891 48655774 77292540 132563652 55271112
3 2020_M12    12788348    27191283 39979631    14769682    18274489 33044171 45465772  73023802 27558030
4 2021_M12    11718972    36313167 48032139    22200241    24076705 46276946 60389872  94309085 33919213
5 2022_M12    13362373    46255935 59618308    32268850    31050963 63319813 77306898 122938121 45631223
6 2023_M12    14479914    50957248 65437162    35808014    34036231 69844245 84993479 135281407 50287928

and I'm plotting using:

    print(ggplot(all_df))+
      geom_point(aes(x=date,y=SUM_Total/1000000))+
      geom_line(aes(x=date,y=SUM_Total/1000000,color=SUM_Total),color="blue",group=1)+
      geom_line(aes(x=date,y=SUM_Rx/1000000, color=SUM_Rx),color="green",group=1) +
      geom_line(aes(x=date,y=SUM_Tx/1000000,color=SUM_Tx),color="red",group=1)

The resulting plot shows no legend.

What am I missing?

The output of dput(all_df), with the non-reproducible .internal.selfref pointer attribute omitted, is:

structure(list(date = c("2018_M12", "2019_M12", "2020_M12", "2021_M12", "2022_M12", "2023_M12"), `N Rx AFTN` = c(22844731L, 31858229L, 12788348L, 11718972L, 13362373L, 14479914L), `N Tx AFTN` = c(36682853L, 52049649L, 27191283L, 36313167L, 46255935L, 50957248L), SumAFTN = c(59527584, 83907878, 39979631, 48032139, 59618308, 65437162), `N Rx AMHS` = c(5932852L, 23412883L, 14769682L, 22200241L, 32268850L, 35808014L), `N Tx AMHS` = c(5942479L, 25242891L, 18274489L, 24076705L, 31050963L, 34036231L), SumAMHS = c(11875331, 48655774, 33044171, 46276946, 63319813, 69844245), SUM_Rx = c(28777583, 55271112, 27558030, 33919213, 45631223, 50287928), SUM_Tx = c(42625332, 77292540, 45465772, 60389872, 77306898, 84993479), SUM_Total = c(71402915, 132563652, 73023802, 94309085, 122938121, 135281407)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"), sorted = "date")

2
  • You have two color arguments in each call to geom_line. That's not good. Also, your data frame isn't tidy (because you have information in the column names that you need for your plot), which is making life harder than it need be. You would help us to help you if you added the output of dput(all_df) to your question.
    – Limey
    Commented May 9 at 12:04
  • When posting a data frame kindly post it in a reproducible manner. Commented May 9 at 12:07

3 Answers

1

When you set a color with the color argument directly in geom_line(), outside aes(), ggplot treats it as a fixed color rather than a mapping from data to an aesthetic, and only mappings generate legends. A constant set this way also overrides any color mapping given inside aes() for the same layer, which is why your plot has no legend. To get a legend, specify color inside aes() only. Here I map each line to a label and assign the actual colors with scale_color_manual().

ggplot(all_df) +
  geom_point(aes(x = date, y = SUM_Total / 1000000)) +
  geom_line(aes(x = date, y = SUM_Total / 1000000, color = "Total"), group = 1) +
  geom_line(aes(x = date, y = SUM_Rx / 1000000, color = "Rx"), group = 1) +
  geom_line(aes(x = date, y = SUM_Tx / 1000000, color = "Tx"), group = 1) +
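  # breaks fixes the legend order so that labels line up with the right series;
  # without it the breaks are sorted alphabetically (Rx, Total, Tx)
  scale_color_manual(values = c("Total" = "blue", "Rx" = "green", "Tx" = "red"),
                     breaks = c("Total", "Rx", "Tx"),
                     name = "Message Types", labels = c("Total", "Received", "Transmitted")) +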
  labs(title = "Plot Title", x = "Date/Time", y = "Messages (in millions)") +
  theme(axis.text = element_text(size = 8), 
        axis.title = element_text(size = 8),
        plot.title = element_text(size = 10),
        axis.text.x = element_text(angle = 45))
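
To isolate the difference between setting and mapping a color, here is a minimal sketch on a made-up toy data frame (the names toy, p_set and p_mapped are illustrative and not from the question). Both plots draw a blue line, but only the second one maps color inside aes(), so only the second one gets a legend.

```r
library(ggplot2)

# Toy data, made up for illustration only
toy <- data.frame(x = 1:3, y = c(2, 4, 3))

# Color set as a constant outside aes(): the line is blue, no legend is drawn
p_set <- ggplot(toy) +
  geom_line(aes(x = x, y = y), color = "blue")

# Color mapped inside aes(): the label "Total" becomes a legend entry,
# and scale_color_manual() decides which color that label is drawn in
p_mapped <- ggplot(toy) +
  geom_line(aes(x = x, y = y, color = "Total")) +
  scale_color_manual(values = c("Total" = "blue"), name = "Series")
```

Printing p_set and p_mapped side by side should show the same line, with a legend titled "Series" only on the second plot.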

0

And here's a tidy solution. I leave you to sort out the finer details of the formatting for yourself.

library(tidyverse)  # provides %>%, pivot_longer(), mutate(), str_sub() and ggplot()

all_df %>% 
  pivot_longer(starts_with("SUM_")) %>% 
  mutate(plotDate = as.numeric(str_sub(date, 1, 4))) %>% 
  ggplot() +
  geom_line(aes(x = plotDate,y = value/1000000, color = name))

pivot_longer stacks the three SUM_ columns into two new columns: name (the original column name) and value (the number it held). The result has one row per date per SUM_ column, so 18 rows here; the columns that were not pivoted (date and the AFTN/AMHS columns) are carried along unchanged. This makes the data tidy.
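
As a rough illustration of what the reshape does, here is a sketch on a small made-up data frame (toy and its other column are hypothetical stand-ins, not the real all_df). Note that columns which are not pivoted are carried along into the long result.

```r
library(tidyr)

# Toy stand-in for all_df; the values are made up for illustration
toy <- data.frame(
  date      = c("2018_M12", "2019_M12"),
  other     = c(1, 2),          # a column that is not pivoted
  SUM_Rx    = c(10, 20),
  SUM_Tx    = c(30, 40),
  SUM_Total = c(40, 60)
)

# Stack the three SUM_ columns into name/value pairs
long <- pivot_longer(toy, starts_with("SUM_"))

dim(long)    # 6 rows (2 dates x 3 SUM_ columns) and 4 columns
names(long)  # "date" "other" "name" "value"
```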

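A numeric value is used on the x axis so that geom_line connects the points within each colour. With the character date column, ggplot would put every date in its own group and draw no lines unless a group aesthetic is also supplied: hence the call to mutate.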
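
If you would rather keep the original date labels on the x axis, one possible alternative to the numeric conversion is to set the group aesthetic explicitly, so that each series is connected across the discrete dates. The following is only a sketch on made-up toy data (toy, long and p are illustrative names), not the code used for the plot above.

```r
library(ggplot2)
library(tidyr)

# Toy stand-in for all_df; the values are made up for illustration
toy <- data.frame(
  date      = c("2018_M12", "2019_M12", "2020_M12"),
  SUM_Rx    = c(10e6, 20e6, 15e6),
  SUM_Tx    = c(30e6, 40e6, 35e6),
  SUM_Total = c(40e6, 60e6, 50e6)
)

long <- pivot_longer(toy, starts_with("SUM_"))

# group = name tells geom_line to draw one line per series,
# even though date is a character (discrete) variable
p <- ggplot(long, aes(x = date, y = value / 1e6, color = name, group = name)) +
  geom_line()
```

This keeps labels such as 2018_M12 on the axis, at the cost of treating the dates as evenly spaced categories rather than as numbers.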

0

Thank you all for your support.

It seems to me that, at the end of the day, there was an issue with the RStudio I was using (.deb for Ubuntu Linux).

Of course, there were mistakes in the code I posted above regarding the color outside the aesthetics (I had tried that as well), but after restarting RStudio the legend appeared.

Thanks again
