
I am trying to join two datasets. It should be a basic left_join, but every time I try, the new column (the one I am trying to merge in) comes back completely filled with NA. This is not a column class issue: I have changed the classes so they all match. There is no white space; I have trimmed all of it. All the column names match. I cannot figure out for the life of me what the error is. As an example, the two datasets look roughly like this:

date hour site PNC_stat
2021-03-03 0 Chelsea 19203.2
2021-03-03 1 Chelsea 72837.2
2021-03-03 2 Chelsea 23683.1
2021-03-03 0 Winthrop 27728.2
2021-03-03 1 Winthrop 8374728

and the dataset to merge:

date hour site PNC_mob
2021-03-03 0 Chelsea 1837238.5
2021-03-03 1 Chelsea 2314.2
2021-03-03 2 Chelsea 283147.2
2021-03-03 0 Winthrop 9385.3
2021-03-03 1 Winthrop 83934.2

This basic code should do the trick:

all1 <- right_join(all_stat, mob, by=c("NEAR_SITE", "date", "hour"))

And yet either the entire PNC_mob column comes back as NA, or sometimes the PNC_mob column has values for only one site group (i.e. all the Chelsea rows have this column filled in, but the others say NA).

Please tell me what I am doing wrong here, I have used this function in the past with no issue.

For context, the two data frames do not have the same number of rows, so I need each row of the first to be repeated for every matching row in the second. Again, this has always worked with a basic left_join.
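The "values for Chelsea only" symptom is what you would see if some key values differ in a way that does not show when printed. As one hypothetical cause (not confirmed for the asker's data), the sketch below builds toy versions of the two tables in which mob's Winthrop rows end in a non-breaking space (U+00A0), which the default trimws() does not remove. It uses left_join() as described in the text (the posted code uses right_join()), and dplyr's anti_join() to list the rows of all_stat that have no partner in mob.

```r
library(dplyr)

# Hypothetical toy data mirroring the tables in the question. The only
# difference is a trailing non-breaking space (U+00A0) on the Winthrop rows
# of mob, which prints just like an ordinary "Winthrop".
all_stat <- data.frame(
  date     = as.Date("2021-03-03"),
  hour     = c(0L, 1L, 2L, 0L, 1L),
  site     = c("Chelsea", "Chelsea", "Chelsea", "Winthrop", "Winthrop"),
  PNC_stat = c(19203.2, 72837.2, 23683.1, 27728.2, 8374728)
)
mob <- data.frame(
  date    = as.Date("2021-03-03"),
  hour    = c(0L, 1L, 2L, 0L, 1L),
  site    = c("Chelsea", "Chelsea", "Chelsea", "Winthrop\u00a0", "Winthrop\u00a0"),
  PNC_mob = c(1837238.5, 2314.2, 283147.2, 9385.3, 83934.2)
)
keys <- c("site", "date", "hour")

# Default trimws() only strips [ \t\r\n], so the U+00A0 survives "trimming".
mob$site <- trimws(mob$site)

# Same symptom as in the question: PNC_mob is filled for Chelsea only.
joined <- left_join(all_stat, mob, by = keys)
print(joined)

# Rows of all_stat whose key has no partner in mob; these are exactly the
# rows that come back with NA in PNC_mob.
unmatched <- anti_join(all_stat, mob, by = keys)
print(unmatched)

# Site values present in mob but not in all_stat, shown as code points:
# the trailing 160 is the non-breaking space.
hidden <- setdiff(unique(mob$site), unique(all_stat$site))
print(utf8ToInt(hidden))
```

Running anti_join() in both directions on the real data is a quick way to see which key combinations fail to match before worrying about the join call itself.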

  • The column in your data frame is called site but your code tries to join by "NEAR_SITE"
    – SamR
    Commented Jul 28, 2023 at 17:43
  • That's just a typo on my part; in the data frames in R they are the same
    – bre123
    Commented Jul 28, 2023 at 21:51
    I suspected so, as you would get an error if you tried to join on columns that didn't exist. Nevertheless, unless you include the exact code and data you're using, no one is going to be able to help you. I'm voting to close this for now but happy to withdraw the vote if you edit the question.
    – SamR
    Commented Jul 28, 2023 at 22:19
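One way to share "the exact code and data" without attaching files is base R's dput(), which prints R code that rebuilds an object with its column classes and values intact. A minimal sketch, using a small hypothetical stand-in for all_stat:

```r
# Hypothetical stand-in for one of the asker's real data frames.
all_stat <- data.frame(
  date     = as.Date("2021-03-03"),
  hour     = c(0L, 1L),
  site     = c("Chelsea", "Winthrop"),
  PNC_stat = c(19203.2, 27728.2)
)

# Prints R code that recreates the object, column classes included.
# For a large data frame, dput(head(all_stat, 20)) keeps the output short.
dput(all_stat)

# Round trip: evaluating the dput() text rebuilds an equivalent data frame.
txt <- paste(capture.output(dput(all_stat)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
print(all.equal(rebuilt, all_stat))
```

Pasting the printed dput() output for both data frames, plus the join call, lets others reproduce the join exactly instead of retyping a printed table.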

1 Answer


I copied and pasted your tables into Excel and saved them as CSV files, then used this code:

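library(dplyr)
# Read the two tables (pasted into Excel and saved as CSV)
all_stat <- read.csv("/Users/johndoe/Coding/table1.csv")
mob <- read.csv("/Users/johndoe/Coding/table2.csv")

# Keep every row of mob and attach the matching PNC_stat values
all1 <- right_join(all_stat, mob, by = c("site", "date", "hour"))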

Which gives me the following output:

r$> all1
        date hour     site  PNC_stat   PNC_mob
1 03/03/2021    0  Chelsea   19203.2 1837238.5
2 03/03/2021    1  Chelsea   72837.2    2314.2
3 03/03/2021    2  Chelsea   23683.1  283147.2
4 03/03/2021    0 Winthrop   27728.2    9385.3
5 03/03/2021    1 Winthrop 8374728.0   83934.2

r$> all_stat
        date hour     site  PNC_stat
1 03/03/2021    0  Chelsea   19203.2
2 03/03/2021    1  Chelsea   72837.2
3 03/03/2021    2  Chelsea   23683.1
4 03/03/2021    0 Winthrop   27728.2
5 03/03/2021    1 Winthrop 8374728.0

r$> mob
        date hour     site   PNC_mob
1 03/03/2021    0  Chelsea 1837238.5
2 03/03/2021    1  Chelsea    2314.2
3 03/03/2021    2  Chelsea  283147.2
4 03/03/2021    0 Winthrop    9385.3
5 03/03/2021    1 Winthrop   83934.2

So the join itself works on this sample data. Maybe something is going wrong when your data are loaded?
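If loading is the culprit, one option is to coerce the join keys to a single canonical form right after reading the files. The sketch below uses a hypothetical helper, normalize_keys(), which is not part of dplyr. It assumes dates arrive either as ISO text (2021-03-03) or as month/day/year text like the 03/03/2021 that read.csv() returned above, and it passes the PCRE class "[\\h\\v]" to trimws()'s whitespace argument (available since R 3.6.0) so that Unicode spaces such as the non-breaking space are stripped too.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): put the join keys into one
# canonical form so that values which print the same also compare equal.
normalize_keys <- function(df) {
  df %>%
    mutate(
      # PCRE "[\\h\\v]" also matches Unicode spaces such as U+00A0,
      # which trimws()'s default "[ \t\r\n]" class leaves in place.
      site = trimws(as.character(site), whitespace = "[\\h\\v]"),
      # Assumption: dates are ISO text or month/day/year text. as.Date()
      # picks the first format that parses the first non-NA value.
      date = as.Date(as.character(date), tryFormats = c("%Y-%m-%d", "%m/%d/%Y")),
      hour = as.integer(hour)
    )
}

# Toy inputs with the kinds of mismatch that make a join return NA:
# different date text, hour stored as text, stray trailing spaces.
all_stat <- normalize_keys(data.frame(
  date = "03/03/2021", hour = c(0, 1),
  site = c("Chelsea", "Winthrop\u00a0"), PNC_stat = c(19203.2, 27728.2)
))
mob <- normalize_keys(data.frame(
  date = "2021-03-03", hour = c("0", "1"),
  site = c("Chelsea ", "Winthrop"), PNC_mob = c(1837238.5, 9385.3)
))

all1 <- left_join(all_stat, mob, by = c("site", "date", "hour"))
print(all1)
```

Note that "%m/%d/%Y" is itself an assumption: a date such as 03/04/2021 could equally be day/month/year, so the format string should match how the source files were actually written.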
