
I want to create a data frame column based on whether a specific dictionary of terms appears within text data.

I currently have a data frame where the column text contains different texts. I would like to create a new variable in that data frame that is coded 1 if any word in my dictionary appears in a given text, and 0 if none of them do.

I already created an animals_dict dictionary with three terms: "cat", "dog", and "fish". The code I have so far creates a new column in my data frame, but all values appear as 0.

data_zoo <- mutate(data_zoo, animals = if_else(text=="animals_dict", 1, 0)) 

I believe the issue is that this code checks whether each row of my text column equals the literal string "animals_dict", rather than whether any of the words in my animals dictionary appear in that text.

1 Answer

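You can use str_detect() from stringr to return a logical vector, after collapsing the terms into a single regular expression with paste(collapse = "|"), like this: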

library(stringr)
library(dplyr)


animals_to_keep = c("cat", "fish", "dog")

examp_data = data.frame(creatures = c("cat", "fish",
 "dog", "elephant", "shark", "whale"))

examp_data |> 
  mutate(animals = if_else(str_detect(creatures, paste(animals_to_keep, collapse = "|")), 1,0))
#>   creatures animals
#> 1       cat       1
#> 2      fish       1
#> 3       dog       1
#> 4  elephant       0
#> 5     shark       0
#> 6     whale       0

Created on 2024-07-03 with reprex v2.1.0
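
To apply the same idea to free text and keep the result, the output of mutate() has to be assigned back to the data frame. Here is a minimal sketch, assuming the question's data_zoo has a character column named text; the sample rows are made up for illustration. The \\b word boundaries and ignore_case = TRUE are optional additions that are not part of the answer above. They are there so that "cat" does not match inside a longer word such as "concatenate", and so that capitalised terms still match.

```r
library(dplyr)
library(stringr)

animals_dict <- c("cat", "dog", "fish")

# Hypothetical stand-in for the asker's data; only the column name `text`
# comes from the question, the rows are invented for this example.
data_zoo <- data.frame(
  text = c("My cat sleeps all day",
           "We saw a whale",
           "Concatenate the files",
           "Dog and FISH")
)

# Build one pattern of the form \b(cat|dog|fish)\b, matched case-insensitively.
pattern <- regex(
  paste0("\\b(", paste(animals_dict, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)

# Assign the result back to data_zoo so the new column is actually stored.
data_zoo <- data_zoo |>
  mutate(animals = if_else(str_detect(text, pattern), 1, 0))
```

Without the `data_zoo <-` on the left, the modified data frame is only printed to the console and data_zoo itself stays unchanged.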

  • thanks for this! I tried running this and it does look like it detects those words, but I don't actually see a new column/variable created within my dataframe (it only comes up in my console). any reason why this could be and how to get this variable to appear in my dataframe?
    – user17896
    Commented Jul 4 at 17:49
  • That’s just due to the fact that you haven’t assigned it
    – Josh Allen
    Commented Jul 4 at 21:17
  • what’s the best way to do that? Sorry I’m new to R!
    – user17896
    Commented Jul 4 at 21:43
  • No worries! You can use '<-' or '='. When you did 'data_zoo <- stuff', that's just called assignment. Same with 'animals_to_keep = c("stuff")'.
    – Josh Allen
    Commented Jul 4 at 22:08
  • gotcha, so something like: data_zoo$animals <- animals? I'm still getting an error here. specifically, here's the error i'm getting: "Error in $<-.data.frame(*tmp*, animals, value = c(0, 0, 0, 0, 0, 0, 0, : replacement has 2713 rows, data has 326"
    – user17896
    Commented Jul 5 at 13:55
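
The error in the last comment is a length mismatch: the vector being assigned has 2713 elements while data_zoo has 326 rows, so it was presumably computed from some other object (the thread does not say which). Here is a sketch, again with made-up sample data, that computes the column directly from data_zoo's own text column so that the lengths agree by construction:

```r
library(stringr)

animals_to_keep <- c("cat", "fish", "dog")

# Hypothetical sample data standing in for the asker's data_zoo.
data_zoo <- data.frame(text = c("a cat", "an elephant", "two fish"))

# Compute the indicator from data_zoo$text itself; the result has exactly
# nrow(data_zoo) elements, so the assignment cannot hit a length mismatch.
data_zoo$animals <- as.numeric(
  str_detect(data_zoo$text, paste(animals_to_keep, collapse = "|"))
)

# Sanity check: the new column has one value per row.
stopifnot(length(data_zoo$animals) == nrow(data_zoo))
```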
