How would one solve the following toy problem using dplyr?

Take a data frame where each row contains at least two iris species separated by spaces:

mySpecies <- data.frame(
  Species=c("lazica uniflora setosa", 
        "virginica setosa uniflora loczyi",
        "versicolor virginica"))

I'd like to add two columns to 'mySpecies' in which each row contains the mean Sepal.Length and the mean Sepal.Width, computed over only those species that are available in a separate lookup table, the iris dataset (see unique(iris$Species)).

The output of this example should be the mySpecies data frame with additional 'Sepal.Length.mean' and 'Sepal.Width.mean' columns containing the means of those variables across the species in each row that also appear in iris$Species.

So the first row would contain just the mean Sepal.Length and mean Sepal.Width for 'setosa', because the other species names don't appear in iris. The second row, however, would contain the means of Sepal.Length and Sepal.Width across 'virginica' and 'setosa', because they both appear in the lookup table (i.e. iris).

Note that this is a toy example but my actual dataframes are quite large.

  • So what's the desired output for your example?
    – lukeA
    Commented Mar 12, 2016 at 17:42
  • It is not clear how you want the output
    – akrun
    Commented Mar 12, 2016 at 17:43
  • Did you mean iris %>% group_by(Species) %>% summarise_each(funs(mean), Sepal.Length:Sepal.Width) %>% bind_cols(., mySpecies)
    – akrun
    Commented Mar 12, 2016 at 17:46
  • I've elaborated on the desired output
    – BDA
    Commented Mar 12, 2016 at 18:53

2 Answers

Here you go. Group by the full string so that each row is handled on its own; then, for each group, split the string into individual species names, keep the iris rows that match, and compute the means.

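library(dplyr)

mySpecies %>%
    group_by(Species) %>%
    do({
        # Split this row's string into individual species names
        spec <- strsplit(as.character(.$Species), " ", fixed=TRUE)[[1]]
        # Keep only the iris rows for those species, then average both columns
        filter(iris, Species %in% spec) %>%
            summarise_each(funs(mean), Sepal.Length, Sepal.Width)
    })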
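
Since the real data frames are said to be large, a join-based variant may scale better than running do() once per row. The following is only a sketch under some assumptions: dplyr 1.0 or later is installed, the species strings are stored as character rather than factor, and no species name is repeated within a single row. The helper names lookup, parts, long, means and result are made up for illustration. It summarises iris once per species, reshapes mySpecies to one row per individual name, and joins the two, so the output column names match those requested in the question.

```r
library(dplyr)

mySpecies <- data.frame(
  Species = c("lazica uniflora setosa",
              "virginica setosa uniflora loczyi",
              "versicolor virginica"),
  stringsAsFactors = FALSE
)

# Per-species row counts and sums, computed once from the lookup table.
lookup <- iris %>%
  group_by(name = as.character(Species)) %>%
  summarise(n = n(),
            length_sum = sum(Sepal.Length),
            width_sum = sum(Sepal.Width))

# Long form: one row per (original string, single species name).
parts <- strsplit(mySpecies$Species, " ", fixed = TRUE)
long <- data.frame(
  Species = rep(mySpecies$Species, lengths(parts)),
  name = unlist(parts),
  stringsAsFactors = FALSE
)

# inner_join drops names missing from iris; the pooled mean is sum / count.
means <- long %>%
  inner_join(lookup, by = "name") %>%
  group_by(Species) %>%
  summarise(Sepal.Length.mean = sum(length_sum) / sum(n),
            Sepal.Width.mean = sum(width_sum) / sum(n))

# left_join keeps every original row in order; rows with no match get NA.
result <- left_join(mySpecies, means, by = "Species")
```

Dividing summed values by summed counts gives the same pooled mean as filtering the iris rows and averaging them, but the lookup table is scanned only once. Note also that summarise_each() and funs() are deprecated in newer dplyr releases, where across() is the usual replacement.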
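
library(dplyr)

# Species names to look up; "loczyi" is not in iris and is simply ignored
mySpecies <- c("setosa", "loczyi", "virginica")

# Keep the matching iris rows, then compute the per-species means
filter(iris, Species %in% mySpecies) %>%
    group_by(Species) %>%
    summarise(mean_width = mean(Sepal.Width),
              mean_length = mean(Sepal.Length))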