How to extract the date on which the minimum and maximum values occur?

Question

I am trying to create a metadata file with average pollution concentrations over the course of a year per site ID. I can easily calculate the mean, max, min, etc., but I cannot extract the dates on which the minimum and maximum concentration values occurred from the parent dataset and make them columns in the new dataset. Example:

Parent dataset from which I am calculating the mean, min, max, etc:

ID	Conc	Date
1	3000	01-04-2022
1	3256	01-05-2022
1	6352	02-09-2022
1	7362	03-04-2022
2	5364	01-04-2022
2	6453	01-05-2022
2	3490	02-09-2022

and so on.

The desired output would look something like this:

ID	Min	Max	min_date	max_date
1	3000	7362	01-04-2022	03-04-2022
2	3490	6453	02-09-2022	01-05-2022
3	900	37267	01-05-2022	08-09-2022
4	3490	5666	02-09-2022	07-01-2022

I cannot seem to grab the min and max dates from the dataset. This is the code I have right now to calculate all of the other variables I need:

    annual_table <- all %>%
       group_by(NEAR_FID) %>%
       dplyr::summarize(
          avg = mean(Conc, na.rm = TRUE),
          n_data_points = length(NEAR_FID),
          median = median(Conc),
          quant_95 = quantile(Conc, 0.95),
          quant_5 = quantile(Conc, 0.05),
          max = max(Conc),
          min = min(Conc))

I've tried various indexing but it won't work properly and quite frankly I'd like a dplyr solution that I can just throw into this code rather than a long workaround with filtering and joining. Any ideas?

Melissa Key · Accepted Answer · 2023-06-16 17:03:09Z

Try indexing Date with which.min() and which.max() inside summarize():

    all %>%
       group_by(NEAR_FID) %>%
       dplyr::summarize(
          avg = mean(Conc, na.rm = TRUE),
          n_data_points = length(NEAR_FID),
          median = median(Conc),
          quant_95 = quantile(Conc, 0.95),
          quant_5 = quantile(Conc, 0.05),
          max = max(Conc),
          min = min(Conc),
          # which.min()/which.max() give the position of the first min/max in the group
          date_min = Date[which.min(Conc)],
          date_max = Date[which.max(Conc)]
       )
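As a self-contained check of the which.min()/which.max() indexing, here is a minimal sketch built from the sample rows in the question. It assumes the column names from the example table (ID, Conc, Date) rather than NEAR_FID, and the names df and result are only illustrative.

```r
library(dplyr)

# Sample rows copied from the question's example table (assumed column names).
df <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2, 2),
  Conc = c(3000, 3256, 6352, 7362, 5364, 6453, 3490),
  Date = c("01-04-2022", "01-05-2022", "02-09-2022", "03-04-2022",
           "01-04-2022", "01-05-2022", "02-09-2022")
)

result <- df %>%
  group_by(ID) %>%
  summarise(
    Min = min(Conc),
    Max = max(Conc),
    # Index Date by the row position of the smallest / largest Conc per ID
    min_date = Date[which.min(Conc)],
    max_date = Date[which.max(Conc)],
    .groups = "drop"
  )

print(result)
```

For these rows the dates come out as 01-04-2022 and 03-04-2022 for ID 1, and 02-09-2022 and 01-05-2022 for ID 2, matching the desired output in the question. Note that which.min() and which.max() return only the first matching position, so if a minimum or maximum is tied, only the first such row's date is reported.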

Adrian Fletcher · Accepted Answer · 2023-06-16 17:06:52Z

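You need to coerce the date column to a Date. lubridate provides parsing functions for this; pick the one that matches the order of the fields. ymd() expects year-month-day, so for values such as 01-04-2022 use dmy() or mdy() instead.

    library(dplyr)
    library(lubridate)

    annual_table <- all %>%
       group_by(NEAR_FID) %>%
       # dmy() assumes day-month-year; use mdy() if the month comes first
       mutate(Date = dmy(Date)) %>%
       dplyr::summarize(
          avg = mean(Conc, na.rm = TRUE),
          n_data_points = length(NEAR_FID),
          median = median(Conc),
          quant_95 = quantile(Conc, 0.95),
          quant_5 = quantile(Conc, 0.05),
          max = max(Conc),
          min = min(Conc))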
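To illustrate why the choice of parser matters, here is a small sketch that converts two of the question's sample strings with base R's as.Date() (swapped in here instead of lubridate so it runs without extra packages). Whether the real data is day-month-year or month-day-year is not stated in the question, so both readings are shown; the variable names are illustrative.

```r
# Two sample strings from the question's Date column
dates <- c("01-04-2022", "03-04-2022")

# Reading the fields as day-month-year
as_dmy <- as.Date(dates, format = "%d-%m-%Y")

# Reading the same strings as month-day-year
as_mdy <- as.Date(dates, format = "%m-%d-%Y")

print(format(as_dmy))  # "2022-04-01" "2022-04-03"
print(format(as_mdy))  # "2022-01-04" "2022-03-04"
```

The same strings map to different calendar days under the two readings, so it is worth confirming the format against a date whose day is greater than 12 before summarising.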

TarJae · Accepted Answer · 2023-06-16 18:19:20Z

We could use summarise():

    library(dplyr)
    library(lubridate)

    df %>%
      group_by(group = year(dmy(Date)), ID) %>%
      summarise(
        Min = min(Conc),
        Max = max(Conc),
        min_date = Date[which.min(Conc)],
        max_date = Date[which.max(Conc)], .groups = "drop") %>%
      select(-group)

         ID   Min   Max min_date   max_date
      <int> <int> <int> <chr>      <chr>
    1     1  3000  7362 01-04-2022 03-04-2022
    2     2  3490  6453 02-09-2022 01-05-2022

Onyambu · Accepted Answer · 2023-06-16 18:31:29Z

In base R:

    subset(df, ave(Conc, ID, FUN = \(x) x %in% range(x)) > 0) |>
       transform(time = c("max", "min")[order(ID, Conc) %% 2 + 1]) |>
       reshape(idvar = "ID", dir = "wide", sep = "_")

      ID Conc_min   Date_min Conc_max   Date_max
    1  1     3000 01-04-2022     7362 03-04-2022
    6  2     3490 02-09-2022     6453 01-05-2022

Or:

    stack(with(df, tapply(Conc, ID, range))) |>
       setNames(c("Conc", "ID")) |>
       transform(time = c("min", "max")) |>
       merge(df, y = _) |>
       reshape(idvar = "ID", dir = "wide", sep = "_")

      ID Conc_min   Date_min Conc_max   Date_max
    1  1     3000 01-04-2022     7362 03-04-2022
    3  2     3490 02-09-2022     6453 01-05-2022

In the tidyverse:

    df %>%
       filter(Conc %in% range(Conc), .by = ID) %>%
       # sort so each ID's minimum row precedes its maximum row; the recycled
       # labels below assume exactly two rows per ID (no ties)
       arrange(ID, Conc) %>%
       cbind(name = c("min", "max")) %>%
       pivot_wider(id_cols = ID, names_from = name,
                   values_from = c(Conc, Date))

    # A tibble: 2 × 5
         ID Conc_min Conc_max Date_min   Date_max
      <int>    <int>    <int> <chr>      <chr>
    1     1     3000     7362 01-04-2022 03-04-2022
    2     2     3490     6453 02-09-2022 01-05-2022
