R dplyr: summarise_each from an external lookup table?

Question

How would one solve the following toy problem using dplyr:

Take a data frame where each row contains at least two iris species separated by spaces:

mySpecies <- data.frame(
  Species = c("lazica uniflora setosa",
              "virginica setosa uniflora loczyi",
              "versicolor virginica"))

I'd like to add two columns to 'mySpecies'. In each row they should hold the mean of Sepal.Length and the mean of Sepal.Width, taken over only those of the row's species that also appear in a separate lookup table. That lookup table is the iris dataset, i.e. the species in unique(iris$Species).
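Before any summarising, each row has to be split into individual names and reduced to the names that are present in the lookup. Below is a minimal base R sketch of that step. The names `lookup`, `tokens` and `matches` are illustrative and do not come from the question.

```r
# Example data from the question. stringsAsFactors = FALSE keeps Species
# as character (R >= 4.0.0 already does this by default).
mySpecies <- data.frame(
  Species = c("lazica uniflora setosa",
              "virginica setosa uniflora loczyi",
              "versicolor virginica"),
  stringsAsFactors = FALSE)

# The lookup table: species names that occur in the built-in iris data
lookup <- as.character(unique(iris$Species))

# Split each row on single spaces, then keep only the names in the lookup
# (the order of names within a row is preserved)
tokens  <- strsplit(mySpecies$Species, " ", fixed = TRUE)
matches <- lapply(tokens, function(x) x[x %in% lookup])

# matches is list("setosa", c("virginica", "setosa"), c("versicolor", "virginica"))
matches
```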

The output of this example should be the mySpecies data frame with two additional columns, 'Sepal.Length.mean' and 'Sepal.Width.mean'. These hold the mean of each variable across those of the row's species that appear in iris$Species.

So the first row would contain only the mean Sepal.Length and Sepal.Width of 'setosa', because the other species names don't appear in iris. The second row, however, would contain the means of Sepal.Length and Sepal.Width across 'virginica' and 'setosa', because both appear in the lookup table (i.e. iris).
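The expected values can be checked with a small base R sketch. It uses base R rather than dplyr so that it runs without extra packages, and the names `vars`, `lookup`, `row_means` and `result` are illustrative. For each row of mySpecies, the sketch averages over all iris rows whose species appears in that row.

```r
mySpecies <- data.frame(
  Species = c("lazica uniflora setosa",
              "virginica setosa uniflora loczyi",
              "versicolor virginica"),
  stringsAsFactors = FALSE)

lookup <- as.character(unique(iris$Species))
vars   <- c("Sepal.Length", "Sepal.Width")

# For each row: keep the names found in iris, then average the two columns
# over every iris row of those species (a row with no match gives NaN)
row_means <- t(sapply(strsplit(mySpecies$Species, " ", fixed = TRUE), function(x) {
  keep <- x[x %in% lookup]
  colMeans(iris[iris$Species %in% keep, vars, drop = FALSE])
}))
colnames(row_means) <- paste0(vars, ".mean")

result <- cbind(mySpecies, row_means)
result
# Sepal.Length.mean: 5.006 5.797 6.262
# Sepal.Width.mean:  3.428 3.201 2.872
```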

Note that this is a toy example; my actual data frames are quite large.
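For large inputs, one option is to summarise the lookup table once, storing per-species column sums and row counts. Each row of mySpecies then only touches that small per-species table instead of re-filtering all of iris. This is an assumption about what scales better and has not been benchmarked here. Dividing the summed sums by the summed counts gives exactly the same pooled mean as filtering the raw rows. Below is a base R sketch; `sums`, `counts` and `pooled` are illustrative names.

```r
mySpecies <- data.frame(
  Species = c("lazica uniflora setosa",
              "virginica setosa uniflora loczyi",
              "versicolor virginica"),
  stringsAsFactors = FALSE)
vars <- c("Sepal.Length", "Sepal.Width")

# One pass over the lookup data: per-species column sums and row counts
grp    <- as.character(iris$Species)
sums   <- rowsum(as.matrix(iris[, vars]), grp)   # one row per species
counts <- rowsum(rep(1, length(grp)), grp)[, 1]  # named vector of counts

# Per row: combine the precomputed totals of the matching species
tokens <- strsplit(mySpecies$Species, " ", fixed = TRUE)
pooled <- t(vapply(tokens, function(x) {
  keep <- unique(x[x %in% rownames(sums)])       # ignore repeated names
  colSums(sums[keep, , drop = FALSE]) / sum(counts[keep])
}, numeric(length(vars))))
colnames(pooled) <- paste0(vars, ".mean")

cbind(mySpecies, pooled)
```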

Did you mean iris %>% group_by(Species) %>% summarise_each(funs(mean), Sepal.Length:Sepal.Width) %>% bind_cols(., mySpecies) — akrun, Commented Mar 12, 2016 at 17:46

Hong Ooi · Accepted Answer · 2016-03-12 19:05:28Z

1

Here you go. First, split each string into individual species names; then, for each group, filter the iris rows that match and compute the means.

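library(dplyr)

mySpecies %>%
    group_by(Species) %>%
    do({
        # Split this group's string into individual species names
        spec <- strsplit(as.character(.$Species), " ", fixed=TRUE)[[1]]
        # Keep the iris rows for those species and average the two columns
        # (dplyr >= 1.0.0 equivalent: summarise(across(c(Sepal.Length, Sepal.Width), mean)))
        filter(iris, Species %in% spec) %>%
            summarise_each(funs(mean), Sepal.Length, Sepal.Width)
    })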
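This answer's filter-then-summarise step averages over all matching iris rows, which gives a pooled mean. Another reading of "mean across species" is the average of the per-species means. The two readings can differ whenever species contribute different numbers of rows. Below is a small base R check of both readings for row 2; `keep`, `pooled`, `per_species` and `unweighted` are illustrative names.

```r
vars <- c("Sepal.Length", "Sepal.Width")
keep <- c("virginica", "setosa")  # the lookup matches for row 2 of mySpecies

# Reading 1 (what filter() + mean computes): mean over all matching rows
pooled <- colMeans(iris[iris$Species %in% keep, vars])

# Reading 2: give each species equal weight by averaging per-species means
per_species <- sapply(keep, function(s) colMeans(iris[iris$Species == s, vars]))
unweighted  <- rowMeans(per_species)

# The two agree here only because every iris species has exactly 50 rows;
# with unequal group sizes they would generally differ
all.equal(pooled, unweighted)
```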


saladi · Accepted Answer · 2016-03-12 18:13:46Z

0

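library(dplyr)

mySpecies <- c("setosa", "loczyi", "virginica")

# Keep only the iris rows whose species is in the vector, then compute
# per-species means (group_by() takes the piped data as its first argument)
filter(iris, Species %in% mySpecies) %>%
    group_by(Species) %>%
    summarise(mean_width = mean(Sepal.Width),
              mean_length = mean(Sepal.Length))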

