
I am trying to visualize PCA results. I want a quadrant plot (fviz_pca_var) that shows the groups visually, followed by a bar plot (fviz_contrib) that shows the actual contribution values.

Ideally, I would like the fviz_contrib bar graph to match the colors I used to identify clusters in the fviz_pca_var plot. When I try to do this via "fill" or "color" parameters, I get the error:

    Caused by error in `if (fill %in% names(data)) ...`:
    ! the condition has length > 1

This is the code I used to get the cluster groups:

var <- get_pca_var(data_no_na.std.pc)
set.seed(123)
res.km <- kmeans(var$coord, centers = 3, nstart = 25)
grp <- as.factor(res.km$cluster)

Here, the grp object holds my groupings: it is a factor with three levels, in which each of my variables is assigned to 1, 2, or 3.

The code to create the plot itself is:

fviz_contrib(data_no_na.std.pc, choice="var", axes = 1, top=50, fill=grp, 
             color="black") +
  scale_fill_manual(value=c("#481567FF", "#E7B800", "#20A387FF"),
                         breaks=grp) 

The expected result is the regular fviz_contrib bar plot, but with each bar colored according to the group I've assigned that variable to (1, 2, or 3). I tried the solution here, but filling based on the grp object doesn't work. I've also tried appending grp as its own element of the data_no_na.std.pc object and using that, but it does not work either (I get an "unknown color name" error):

data_no_na.std.pc$grp <- grp
fviz_contrib(data_no_na.std.pc, choice="var", axes = 1, top=50, fill="grp", 
             color="black")

Any ideas? This seems like it should be straightforward...

    Could you please share some reproducible data using dput?
    – Quinten
    Commented Mar 15 at 16:51

1 Answer

I figured out how to do this, for anyone who is interested. I broke the fviz_contrib function down into its individual steps (you can wrap them back up into a function if you want, but I only had to run this once, so I kept it as separate lines of code), and I built the plot directly with ggplot2 instead of ggpubr:

library(factoextra)
library(ggplot2)
library(dplyr)  # for left_join()

# Calculate the contributions and build a new data frame
dd <- facto_summarize(data_no_na.std.pc, element = "var", result = "contrib",
                      axes = 1:2)
contrib <- dd$contrib
theo_contrib <- 100/length(contrib)
names(contrib) <- rownames(dd)
df <- data.frame(name = factor(names(contrib), levels = names(contrib)), 
                 contrib = contrib, stringsAsFactors = TRUE)
# Attach the cluster assignment (grp, from the question) to each variable by name
grp1 <- data.frame(grp)
grp1 <- tibble::rownames_to_column(grp1, "name")
df1 <- left_join(df, grp1, by="name")

# Calculate the dotted reference line (expected average contribution)
axes <- 1:2
eig <- get_eigenvalue(data_no_na.std.pc)[axes, 1]
theo_contrib <- sum(theo_contrib * eig)/sum(eig)

# Assign colors to the correct groups and make plot
grp_cols <- c("1"="#481567FF", "2"="#E7B800", "3"="#20A387FF")

ggplot(df1, aes(x=reorder(name, -contrib), y=contrib, fill=grp)) +
  geom_bar(stat="identity") +
  geom_hline(yintercept = theo_contrib, linetype = 2, color = "black") +
  theme(axis.text.x=element_text(angle=45,hjust=0.5,vjust=0.5,
                                 size=11)) +
  # I had subscripts in the labels; you only need the next line if you do too
  scale_x_discrete(labels = ggplot2:::parse_safe) +
  scale_fill_manual(values=grp_cols) +
  theme(axis.title.x=element_blank(),
        panel.background = element_rect(fill = "white", colour = "grey50")) +
  labs(y = "Contribution (%)",
       title="Contribution of variables to PC1 & PC2") +
  guides(fill=guide_legend(title="Cluster"))
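
Since the comment on the question asked for reproducible data, here is a minimal, self-contained sketch of the same idea. It rests on assumptions that are not part of my original code: it uses the built-in USArrests data, prcomp instead of my own PCA object, only two clusters (there are just four variables), and it computes the PC1 contributions by hand as 100 * coord^2 / sum(coord^2) rather than calling facto_summarize. Treat it as an illustration of colouring bars by cluster with a named palette, not as a drop-in replacement.

```r
library(ggplot2)

# Assumption: toy PCA on the built-in USArrests data (not the original data)
pca <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates: loadings scaled by each component's standard deviation
coord <- sweep(pca$rotation, 2, pca$sdev, FUN = "*")

# Cluster the variables on their coordinates, as in the question
# (assumption: 2 clusters, because this toy data has only 4 variables)
set.seed(123)
km <- kmeans(coord, centers = 2, nstart = 25)

# Contribution (%) of each variable to PC1, computed by hand
contrib <- 100 * coord[, 1]^2 / sum(coord[, 1]^2)

# One row per variable, with the cluster matched by variable name
df <- data.frame(
  name    = names(contrib),
  contrib = unname(contrib),
  grp     = factor(km$cluster[names(contrib)])
)

# Named palette: the names must match the factor levels of grp
grp_cols <- c("1" = "#481567FF", "2" = "#E7B800")

p <- ggplot(df, aes(x = reorder(name, -contrib), y = contrib, fill = grp)) +
  geom_col(colour = "black") +
  # Dashed line: expected contribution if all variables contributed equally
  geom_hline(yintercept = 100 / nrow(df), linetype = 2) +
  scale_fill_manual(values = grp_cols, name = "Cluster") +
  labs(x = NULL, y = "Contribution (%)",
       title = "Contribution of variables to PC1")

print(p)
```

Matching the cluster to each variable by name (rather than by position) is what keeps the colours consistent with the fviz_pca_var plot even after the bars are reordered.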
