I am trying to create a metadata file with average pollution concentrations over the course of a year per site ID. I can easily calculate the mean, max, min, etc., but I cannot work out how to extract the dates on which the minimum and maximum concentrations occurred from the parent dataset and add them as columns in the new dataset. Example:
Parent dataset from which I am calculating the mean, min, max, etc.:
ID | Conc | Date |
---|---|---|
1 | 3000 | 01-04-2022 |
1 | 3256 | 01-05-2022 |
1 | 6352 | 02-09-2022 |
1 | 7362 | 03-04-2022 |
2 | 5364 | 01-04-2022 |
2 | 6453 | 01-05-2022 |
2 | 3490 | 02-09-2022 |
and so on...
The desired output would look something like this:
ID | Min | Max | min_date | max_date |
---|---|---|---|---|
1 | 3000 | 7362 | 01-04-2022 | 03-04-2022 |
2 | 3490 | 6453 | 02-09-2022 | 01-05-2022 |
3 | 900 | 37267 | 01-05-2022 | 08-09-2022 |
4 | 3490 | 5666 | 02-09-2022 | 07-01-2022 |
I cannot seem to grab the min and max dates from the dataset. This is the code I have right now to calculate all of the other variables I need:
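    annual_table <- all %>%
      group_by(NEAR_FID) %>%
      dplyr::summarize(
        avg = mean(Conc, na.rm = TRUE),
        n_data_points = n(),                            # number of rows per site
        median = median(Conc, na.rm = TRUE),
        quant_95 = quantile(Conc, 0.95, na.rm = TRUE),  # 95th percentile
        quant_5 = quantile(Conc, 0.05, na.rm = TRUE),   # 5th percentile
        max = max(Conc, na.rm = TRUE),
        min = min(Conc, na.rm = TRUE)
      )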
I've tried various indexing approaches, but none of them work properly. Ideally I'd like a dplyr solution that I can drop straight into this code, rather than a long workaround with filtering and joining. Any ideas?
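
For reference, here is a minimal reproducible sketch of the kind of thing I am after. It assumes that `NEAR_FID` in my real data plays the role of `ID` in the example table, that the date column is called `Date`, and that the dates are stored as character strings. The toy data only covers IDs 1 and 2 from the example. It indexes `Date` with `which.min()`/`which.max()` inside `summarize()`, which may or may not be the best approach:

```r
library(dplyr)

# Toy data copied from the example table above (IDs 1 and 2 only).
# Assumption: NEAR_FID stands in for the ID column, Date is a character column.
all <- tibble(
  NEAR_FID = c(1, 1, 1, 1, 2, 2, 2),
  Conc     = c(3000, 3256, 6352, 7362, 5364, 6453, 3490),
  Date     = c("01-04-2022", "01-05-2022", "02-09-2022", "03-04-2022",
               "01-04-2022", "01-05-2022", "02-09-2022")
)

annual_table <- all %>%
  group_by(NEAR_FID) %>%
  summarize(
    min = min(Conc),
    max = max(Conc),
    # which.min()/which.max() give the position of the first minimum/maximum
    # within each group (NA values are skipped); that position indexes Date
    min_date = Date[which.min(Conc)],
    max_date = Date[which.max(Conc)]
  )

print(annual_table)
```

If a site has ties for its minimum or maximum, this picks the date of the first occurrence only, and a site whose `Conc` values are all `NA` would probably need separate handling.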