I'm a frequent R user, but I have never understood why these two graphs look so different, or what I can do to make geom_line match the display produced by stat_summary (which is much better). Bonus question: what is the justification for geom_line() working like that?
library(tidyverse)
df = structure(list(
year_completed_cat = structure(
c(5L, 4L, 5L, 4L, 4L, 6L, 6L, 4L, 6L, 4L, 6L, 5L, 4L, 4L,
4L, 5L, 6L, 5L, 6L, 5L, 6L, 6L, 6L, 5L, 4L, 6L, 6L, 6L,
6L, 5L, 4L, 6L, 6L, 5L, 5L, 6L, 6L, 4L, 4L, 6L, 6L, 6L,
6L, 5L, 4L, 6L, 5L, 6L, 6L, 5L),
levels = c("18", "19", "20", "21", "22", "23", "24"),
class = "factor"),
asqse_quest = structure(
c(6L, 7L, 7L, 7L, 7L, 6L, 6L, 5L, 5L, 5L, 5L, 6L, 6L, 5L,
7L, 6L, 5L, 6L, 7L, 5L, 6L, 7L, 6L, 7L, 7L, 7L, 5L, 7L,
5L, 5L, 6L, 7L, 5L, 5L, 7L, 5L, 7L, 6L, 6L, 5L, 6L, 5L,
6L, 6L, 6L, 5L, 6L, 6L, 5L, 5L),
levels = c("2", "6", "12", "18", "24", "30", "36", "48", "60"),
class = "factor"),
asqse_total =
c(205, 40, 80, 60, 40, 60, 120, 0, 20, 20, 70, 70, 35, 35,
225, 140, 80, 215, 230, 110, 180, 155, 25, 165, 75, 60, 20,
85, 20, 75, 30, 35, 25, 55, 160, 70, 140, 35, 140, 30, 40,
40, 25, 40, 75, 5, 35, 205, 5, 40)),
row.names = c(NA, -50L), class = "data.frame")
ggplot(df, aes(x = year_completed_cat, y = asqse_total,
group = asqse_quest, color = asqse_quest)) +
geom_line() + geom_point()
ggplot(df, aes(x = year_completed_cat, y = asqse_total,
group = asqse_quest, color = asqse_quest)) +
stat_summary(geom = "line", fun = mean)
Created on 2024-07-07 with reprex v2.1.0
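geom_line draws lines connecting the data: every observation in a group is joined in order of x, so several observations at the same x value show up as vertical segments. stat_summary summarises the data first with fun (here mean) and then draws lines connecting the summarised values of the groups.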
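To get the stat_summary look from geom_line, one option is to summarise the data yourself before plotting. The sketch below is only an illustration: toy, toy_means and their column names are hypothetical stand-ins for df and its columns, and it assumes dplyr and ggplot2 are installed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data shaped like df: several rows per (year, quest) pair
toy <- data.frame(
  year  = factor(c("21", "21", "22", "22", "21", "21", "22", "22")),
  quest = factor(c("24", "24", "24", "24", "30", "30", "30", "30")),
  total = c(10, 30, 40, 60, 0, 20, 100, 140)
)

# Collapse to one mean per (year, quest) pair before plotting
toy_means <- toy %>%
  group_by(year, quest) %>%
  summarise(total = mean(total), .groups = "drop")

# With a single row per x value in each group, geom_line() has nothing
# to zigzag between and simply connects the group means
p <- ggplot(toy_means, aes(x = year, y = total, group = quest, color = quest)) +
  geom_line() +
  geom_point()

# Alternative without pre-summarising: let the line layer use the summary stat
p2 <- ggplot(toy, aes(x = year, y = total, group = quest, color = quest)) +
  geom_line(stat = "summary", fun = mean)
```

Printing p or p2 should give one line per quest level running through the per-year means. The pre-summarised version has the side benefit that the means are available as a data frame for labels or tables.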