I have a list with multiple gene sets, let's say:

genes <- paste("gene",1:1000,sep="")
x <- list(A = sample(genes,300), 
          B = sample(genes,525), 
          C = sample(genes,440),
          D = sample(genes,350))

I managed to get a pairwise Venn/Euler diagram for the first two sets (A vs B):

library(eulerr)
plot(euler(x[1:2]), quantities = TRUE)

[Figure: Euler diagram of A vs B]

Any idea how I can automatically generate pairwise Venn/Euler diagrams for all of these gene sets?

i.e., A vs B, A vs C, A vs D, B vs C, B vs D, C vs D.

In reality I have 70+ gene sets that I want to compare with each other and then arrange in a sort of matrix, which would look like this:

[Figure: matrix of pairwise Euler diagrams]

Thanks!

  • Even without the lower half or diagonal, you're still describing a matrix of >2400 diagrams (for 70 gene sets). Is there an alternative way you could present the number of overlaps between all pairs of gene sets in your dataset?
    – Seth
    Commented Jan 16 at 21:09
  • At the end, I will filter out some of them of course :)
    – B_slash_
    Commented Jan 16 at 22:05
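
One possible alternative along the lines of the comment above is to summarise every pair as a number instead of a diagram. The following is only a sketch in base R: the `set.seed()` call, the object names `overlap`, `sizes` and `jaccard`, and the choice of the Jaccard index as the similarity measure are assumptions added for illustration, not part of the original question.

```r
# Toy data from the question (the seed is added here only for reproducibility)
set.seed(1)
genes <- paste0("gene", 1:1000)
x <- list(A = sample(genes, 300),
          B = sample(genes, 525),
          C = sample(genes, 440),
          D = sample(genes, 350))

# Number of unordered pairs among 70 sets, i.e. the number of diagrams needed
n_pairs <- choose(70, 2)  # 2415

# Shared genes for every pair of sets; the diagonal holds the set sizes
overlap <- sapply(x, function(a) sapply(x, function(b) length(intersect(a, b))))

# Jaccard index: shared genes divided by genes present in either set
sizes <- lengths(x)
jaccard <- overlap / (outer(sizes, sizes, "+") - overlap)

# Base-R heatmap of the similarity matrix, without clustering or rescaling
heatmap(jaccard, Rowv = NA, Colv = NA, scale = "none")
```

A matrix like this could also be used to decide which pairs are worth drawing as Euler diagrams, for example by keeping only pairs whose Jaccard index exceeds some threshold.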

1 Answer

It's fairly involved, but you can get Venn diagrams for all pairs using combn, then strip the data out of them to create a faceted ggplot:

library(eulerr)
library(tidyverse)

# Fit and draw an Euler diagram for every pair of sets, remembering which
# two sets each diagram compares
venns <- combn(length(x), 2, FUN = function(i) {
  res <- plot(euler(x[i]), quantities = TRUE)
  res$col_vals <- names(x)[i]
  res
}, simplify = FALSE)

# Pull out the polygon coordinates of each region, tagged with the facet
# row and column (set names) the diagram belongs to
plot_df <- do.call("rbind", lapply(venns, function(e) {
  do.call("rbind", Map(\(a, b, c, d) {
    data.frame(x = a$x, y = a$y, piece = b, row = c, col = d)
  }, a = e$data$fills, b = seq_along(e$data$fills), c = e$col_vals[1],
  d = e$col_vals[2]))
})) |> mutate(row = factor(row, names(.env[["x"]])),
              col = factor(col, names(.env[["x"]])))

# Pull out the label positions, set names and counts for the text layers
text_df <- do.call("rbind", lapply(venns, function(a) {
  res <- cbind(a$data$centers, row = factor(a$col_vals[1], names(x)),
               col = factor(a$col_vals[2], names(x)))
  res$labels[is.na(res$labels)] <- ""
  res
}))

# One facet per pair of sets: polygons for the regions, then labels and counts
ggplot(plot_df, aes(x, y)) +
  geom_polygon(aes(fill = factor(piece)), color = "black") +
  geom_text(aes(label = labels), data = text_df, nudge_y = 1, fontface = 2) +
  geom_text(aes(label = quantities), data = text_df, nudge_y = -2) +
  facet_grid(row ~ col, drop = FALSE, switch = "y") +
  scale_fill_manual(values = c("white", "#ececec", "#d9d9d9")) +
  coord_equal() +
  theme_void(base_size = 20) +
  theme(legend.position = "none", 
        panel.background = element_rect(),
        panel.spacing = unit(0, "mm"),
        strip.text = element_text(margin = margin(10, 10, 10, 10)))

  • Thanks a lot for providing this solution! It works fine indeed. I would have hoped for a simpler, more easily reproducible solution (without having to name each column). I am working with a larger list with numerous gene sets. For example, I would like to compare gene sets 1 to 10 with 11 to 20... Thanks anyway!
    – B_slash_
    Commented Jan 17 at 16:50
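
For the block-against-block comparison mentioned in this comment, one option might be to swap `combn` for `expand.grid`, so that only pairs made of one "row" set and one "column" set are fitted. This is a sketch under assumptions: `pair_venns` is a hypothetical helper (not part of eulerr), the seed is added only for reproducibility, and the row and column index vectors are assumed not to overlap.

```r
library(eulerr)

# Toy data from the question (the seed is added here only for reproducibility)
set.seed(1)
genes <- paste0("gene", 1:1000)
x <- list(A = sample(genes, 300),
          B = sample(genes, 525),
          C = sample(genes, 440),
          D = sample(genes, 350))

# Hypothetical helper: one Euler plot per (row set, column set) pair.
# `rows` and `cols` are index vectors into `sets` and are assumed disjoint.
pair_venns <- function(sets, rows, cols) {
  pairs <- expand.grid(row = rows, col = cols)
  Map(function(i, j) {
    res <- plot(euler(sets[c(i, j)]), quantities = TRUE)
    res$col_vals <- names(sets)[c(i, j)]  # same tagging as in the answer
    res
  }, pairs$row, pairs$col)
}

# Compare sets 1-2 against sets 3-4; for the real data this might be
# rows = 1:10 and cols = 11:20
venns <- pair_venns(x, rows = 1:2, cols = 3:4)
```

The resulting `venns` list has the same shape as the one built with `combn` in the answer, so the `plot_df`, `text_df` and `ggplot` steps should carry over. The factor levels for `row` and `col` would probably need to be limited to the selected set names, otherwise `drop = FALSE` leaves empty facets for the unused sets.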