I am attempting to produce a table of weighted mean survey scores by categorical variables in gtsummary using the tbl_svysummary function.

How do I do this with survey weights in tbl_svysummary? I've gone through the two similar questions on here with solutions provided by Daniel Sjoberg, but I don't really understand what is happening in them. I can get the numbers by swapping the by and include arguments in tbl_svysummary, but then the table is the wrong way round.

Here's some sample data:

dat <- structure(list(uuid = c("p41112019021430", "p41222013024584", 
"p41212017017560", "p41212017011700", "p41212019022003", "p41212019133026", 
"p41212017014434", "p41112019023063", "p41212019077561", "p41212017050030"
), age_cat = structure(c(3L, 1L, 4L, 2L, 2L, 3L, 3L, 2L, 1L, 
4L), levels = c("18-24", "25-44", "45-64", "65-74", "75+"), class = "factor"), 
    cvh_score = c(6, 4, 1, 3, 0, 2, 3, 2, 6, 1), weights = c(p41112019021430 = 0.360602284454939, 
    p41222013024584 = 5.00004172246093, p41212017017560 = 0.276025143197602, 
    p41212017011700 = 1.55086389757734, p41212019022003 = 2.20669366738008, 
    p41212019133026 = 0.878664071962474, p41212017014434 = 1.15252329666968, 
    p41112019023063 = 1.51638372307208, p41212019077561 = 2.1408232841115, 
    p41212017050030 = 0.282529671403006)), row.names = c(NA, 
-10L), class = "data.frame")

dat_svy <- dat |> 
  srvyr::as_survey_design(
    ids = uuid, 
    weights = weights
  )

I can achieve what I am looking for without weights using the tbl_continuous function as follows:

tbl_01 <- gtsummary::tbl_continuous(
  data = dat,
  variable = cvh_score, 
  include = c(age_cat), 
  statistic = list(
    everything() ~ "{mean} ({sd})"
  )
)

This gives me the unweighted mean (SD) of cvh_score for each level of age_cat, which is the layout I want.

How do I do this with weights?

Here's the solution I came up with:

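library(dplyr)
library(srvyr)
library(stringr)
library(gtsummary)

# Weighted mean and SD of cvh_score per age category, shaped to match
# the rows of the gtsummary table body (one "label" row, then "level" rows)
weighted_mean <- dat_svy |>
  filter(!is.na(age_cat)) |>
  group_by(age_cat) |>
  summarize(mean = survey_mean(cvh_score), sd = survey_sd(cvh_score)) |>
  mutate(
    row_type = "level",
    label = age_cat,
    stat_3 = str_glue("{round(mean, 2)} ({round(sd, 2)})")
  ) |>
  add_row(
    row_type = "label",
    label = "Age",
    stat_3 = NA,
    .before = 1L
  )

# Build a weighted table of age_cat, then join the weighted statistics
# onto its body and show them in place of the default column
table <- tbl_svysummary(
  data = dat_svy,
  include = c("age_cat"), 
  label = list(
    age_cat ~ "Age" 
  ),
  statistic = list(
    all_categorical() ~ "{n}"
  ),
  percent = "row", 
  missing = "no"
  ) |> 
  modify_table_body(
    ~ .x %>% 
      left_join(weighted_mean, by = c("row_type", "label"))
  ) |> 
  modify_column_unhide(columns = c("stat_3")) |> 
  modify_column_hide(columns = c("stat_0")) |> 
  modify_header(
    label = "**Characteristic**", 
    stat_3 = "**CVH Score**, Mean (SD)"
  )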

You could do the calculations in the survey package, which gives you the SE, and then format the output using gt:

library(survey)
library(gt)

svyby(~cvh_score, ~age_cat, dat_svy, svymean) |>
  gt::gt() |>
  tab_footnote(footnote="cvh_score: Mean (SE)") |>
  tab_style(
    style = list(
      cell_text(weight = "bold")),
    locations = cells_column_labels()
  )

I suggest using surveytable for this. The code is very simple:

library(survey)
library(surveytable)

dat_svy <- svydesign(ids = ~uuid, weights = ~weights, data = dat)
set_survey(dat_svy, opts = "general")
tab_subset("cvh_score", "age_cat")


cvh_score (for different levels of age_cat) {dat_svy}
│ Level │ % known │ Mean │   SEM │   SD │
│ 18-24 │     100 │ 4.6  │ 0.626 │ 1.3  │
│ 25-44 │     100 │ 1.46 │ 0.818 │ 1.58 │
│ 45-64 │     100 │ 3.08 │ 0.627 │ 1.6  │
│ 65-74 │     100 │ 1    │ 0     │ 0    │

tab_subset() returns the table, so you can pipe it through to any other function that you want. You can also optionally send the table to CSV, HTML, or LaTeX.
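
As a sanity check on the weighted estimates, the Mean and SD columns above can be reproduced in base R without any survey package. This is only a minimal sketch: it rebuilds the sample data from the question with the weights rounded to four decimals, weighted_stats is a hypothetical helper of mine, and the n / (n - 1) variance correction is an assumption that happens to match the SD column above. It does not reproduce the SEM, which depends on the survey design.

```r
# Sample data from the question (weights rounded to four decimals).
dat <- data.frame(
  age_cat = factor(
    c("45-64", "18-24", "65-74", "25-44", "25-44",
      "45-64", "45-64", "25-44", "18-24", "65-74"),
    levels = c("18-24", "25-44", "45-64", "65-74", "75+")
  ),
  cvh_score = c(6, 4, 1, 3, 0, 2, 3, 2, 6, 1),
  weights = c(0.3606, 5.0000, 0.2760, 1.5509, 2.2067,
              0.8787, 1.1525, 1.5164, 2.1408, 0.2825)
)

# Hypothetical helper: weighted mean and SD of x with weights w.
# The n / (n - 1) factor is an assumed small-sample correction.
weighted_stats <- function(x, w) {
  m <- sum(w * x) / sum(w)
  n <- length(x)
  v <- sum(w * (x - m)^2) / sum(w) * n / (n - 1)
  c(mean = m, sd = sqrt(v))
}

# One row per observed age category (the empty "75+" level is dropped).
groups <- split(dat, droplevels(dat$age_cat))
res <- t(sapply(groups, function(g) weighted_stats(g$cvh_score, g$weights)))
print(round(res, 2))
```

If the numbers from a gtsummary or survey-based table disagree with this check, the usual suspects are a design object built without the weights or a categorical variable summarised in place of the continuous one.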

