All dplyr join functions (left_join, etc) fill the entire merged column with NA

Question

I am trying to join two datasets. It should be a basic left_join, but every time I try, the column I want to merge in comes back completely filled with NA. This is not a column class issue: I have changed the classes so they all match. There is no white space; I have trimmed it all. All the column names match. I cannot figure out for the life of me what the error is here. As an example, the two datasets look vaguely like this:

date	hour	site	PNC_stat
2021-03-03	0	Chelsea	19203.2
2021-03-03	1	Chelsea	72837.2
2021-03-03	2	Chelsea	23683.1
2021-03-03	0	Winthrop	27728.2
2021-03-03	1	Winthrop	8374728

and the dataset to merge:

date	hour	site	PNC_mob
2021-03-03	0	Chelsea	1837238.5
2021-03-03	1	Chelsea	2314.2
2021-03-03	2	Chelsea	283147.2
2021-03-03	0	Winthrop	9385.3
2021-03-03	1	Winthrop	83934.2

This basic code should do the trick:

all1 <- right_join(all_stat, mob, by=c("NEAR_SITE", "date", "hour"))

And yet either the entire PNC_mob column will append as NA, or sometimes the PNC_mob column will have values for only one site group (i.e. all the Chelsea rows will have this column filled in, but the others will say NA).

Please tell me what I am doing wrong here, I have used this function in the past with no issue.

For context, there are not the same number of rows in each df, so I will need the first df to repeat for all of the matches in the second df, but again this has always worked in a basic left_join.

The column in your data frame is called site but your code tries to join by "NEAR_SITE" — SamR, Commented Jul 28, 2023 at 17:43
That's just a typo on my part, in the dataframes on R they are the same — bre123, Commented Jul 28, 2023 at 21:51
I suspected so as you would get an error if you tried to join on columns that didn't exist. Nevertheless unless you include the exact code and data you're using no one is going to be able to help you. I'm voting to close this for now but happy to withdraw the vote if you edit the question. — SamR, Commented Jul 28, 2023 at 22:19

Joan · Accepted Answer · 2023-07-29 01:09:38Z

I copied and pasted your tables into Excel and saved them as CSV files, then used this code:

library(dplyr)
all_stat <- read.csv("/Users/johndoe/Coding/table1.csv")
mob <- read.csv("/Users/johndoe/Coding/table2.csv")

all1 <- right_join(all_stat, mob, by = c("site", "date", "hour"))

Which gives me the following output:

r$> all1
        date hour     site  PNC_stat   PNC_mob
1 03/03/2021    0  Chelsea   19203.2 1837238.5
2 03/03/2021    1  Chelsea   72837.2    2314.2
3 03/03/2021    2  Chelsea   23683.1  283147.2
4 03/03/2021    0 Winthrop   27728.2    9385.3
5 03/03/2021    1 Winthrop 8374728.0   83934.2

r$> all_stat
        date hour     site  PNC_stat
1 03/03/2021    0  Chelsea   19203.2
2 03/03/2021    1  Chelsea   72837.2
3 03/03/2021    2  Chelsea   23683.1
4 03/03/2021    0 Winthrop   27728.2
5 03/03/2021    1 Winthrop 8374728.0

r$> mob
        date hour     site   PNC_mob
1 03/03/2021    0  Chelsea 1837238.5
2 03/03/2021    1  Chelsea    2314.2
3 03/03/2021    2  Chelsea  283147.2
4 03/03/2021    0 Winthrop    9385.3
5 03/03/2021    1 Winthrop   83934.2

So maybe something is going wrong with loading the data?
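
One way to check the loading step is to look at the join keys directly. Note that in the output above the dates print as 03/03/2021, while the question lists them as 2021-03-03; a difference like that between the two tables is enough to make every key fail to match. The sketch below is only an illustration under assumptions: the two small data frames are made-up stand-ins for the real data, and the date formats are guesses. It compares the key classes, uses anti_join to list rows that have no partner, and then retries the join after parsing both date columns the same way.

```r
library(dplyr)

keys <- c("site", "date", "hour")

# Hypothetical stand-ins for the real tables: same dates, but stored as
# character strings in two different formats (an assumption for illustration).
all_stat <- data.frame(
  date = c("2021-03-03", "2021-03-03"),
  hour = c(0L, 1L),
  site = c("Chelsea", "Winthrop"),
  PNC_stat = c(19203.2, 27728.2)
)
mob <- data.frame(
  date = c("03/03/2021", "03/03/2021"),
  hour = c(0L, 1L),
  site = c("Chelsea", "Winthrop"),
  PNC_mob = c(1837238.5, 9385.3)
)

# 1. Compare the class of each key column in both tables.
#    Matching classes do not guarantee matching values.
key_classes <- data.frame(
  key = keys,
  all_stat = sapply(all_stat[keys], function(x) class(x)[1]),
  mob = sapply(mob[keys], function(x) class(x)[1])
)
print(key_classes)

# 2. List the rows of all_stat that find no partner in mob.
#    If this returns every row, the key values themselves differ.
unmatched <- anti_join(all_stat, mob, by = keys)
print(unmatched)

# 3. Parse both date columns into Date objects, then retry the join.
#    The format strings are assumptions about how each file stores dates.
all_stat_fixed <- mutate(all_stat, date = as.Date(date, format = "%Y-%m-%d"))
mob_fixed <- mutate(mob, date = as.Date(date, format = "%m/%d/%Y"))
joined <- left_join(all_stat_fixed, mob_fixed, by = keys)
print(joined)
```

If anti_join returns rows for only some sites, comparing unique(all_stat$site) with unique(mob$site) is a reasonable next step, since differences in case or hidden characters survive trimming of ordinary white space.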
