
I downloaded discharge data for a river from a government website. The date and time values are formatted like 2024-03-05T00:00:01.000+10:00 (see the data sample below).

This is my code:

library(ggplot2)
ggplot(CHEM_RESULTS, aes(x = `Date and Time`, y = `Discharge (cumec)`, group = 1)) +
  geom_line(color = "powderblue", linewidth = 1, alpha = 0.9, linetype = 1)

This produces a graph with a solid bar along the x axis instead of readable time stamps.

DATA SAMPLE:

head(CHEM_RESULTS)

Date and Time                   Discharge (cumec)
<chr>                           <dbl>
2024-03-05T00:00:01.000+10:00   3.202
2024-03-05T00:35:01.000+10:00   3.124
2024-03-05T01:00:01.000+10:00   3.040
2024-03-05T01:30:01.000+10:00   2.956
2024-03-05T02:00:01.000+10:00   2.919
2024-03-05T03:00:01.000+10:00   2.867

I think that, because the date and time strings are so long and there are so many entries (1896), they form a bar on the x axis instead of readable labels. I do not need every time stamp shown, but some date/time points are needed to provide context. I expect reformatting the government's date/time values to be challenging, again given how many entries there are.
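A minimal sketch to check this, using two rows from the sample above (`layer_scales()` is only used here to inspect the plot): because `Date and Time` is stored as text, ggplot2 puts it on a discrete axis with one tick label per distinct string, so with many rows the labels overlap into a bar.

```r
library(ggplot2)

# Two rows from the question's sample, with the time stamps still stored as text
df <- data.frame(
  `Date and Time` = c("2024-03-05T00:00:01.000+10:00", "2024-03-05T00:35:01.000+10:00"),
  `Discharge (cumec)` = c(3.202, 3.124),
  check.names = FALSE  # keep the column names with spaces, as in the question
)

class(df$`Date and Time`)  # "character"

p <- ggplot(df, aes(x = `Date and Time`, y = `Discharge (cumec)`, group = 1)) +
  geom_line()

# A character x variable is mapped to a discrete position scale:
# one tick label per distinct string, regardless of how many there are.
x_scale <- layer_scales(p)$x
class(x_scale)
```

Converting the column to a date-time class (POSIXct) switches ggplot2 to a continuous datetime axis, which chooses a limited number of breaks on its own.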

I also need to overlay other water quality data onto the graph, e.g. pH at 4 sites over 4 different time periods. Once I add these points, will the graph highlight their dates and times? That would be useful, because only those date and time labels are really needed.

Any help on how to approach this is greatly appreciated.

Thank you!

I tried to make a line graph of river discharge, but I get a bar on the x axis instead of the time stamps.

  • Perhaps try converting the `Date and Time` character vector into an R date-time format, for example with: lubridate::ymd_hms(sub("\\.000\\+10:00", "", `Date and Time`))
    – missuse
    Commented May 18 at 4:05
  • You need to convert the "Date and Time" column from a character string to a date-time object. See the help for as.POSIXct() and strptime().
    – Dave2e
    Commented May 18 at 4:07
  • @Dave2e Thank you so much! I tried this: strptime(CHEM_RESULTS$`Date and Time`, "%Y/%m/%e/%y/%Z", tz = ""). The code ran, but I am not sure I formatted it properly, as I still get the square underneath my graph. Thanks! Commented May 18 at 23:27
  • Your format is incorrect: your column uses - and not /, and you also need to account for the "T". There might be other errors. You also need to assign the result back to the `Date and Time` column. If you edit the question and add a sample of your data with dput(head(CHEM_RESULTS)), it would be easier to help.
    – Dave2e
    Commented May 18 at 23:49
  • Thank you @Dave2e! I changed to "-", but I am not sure how to account for the "T". Do I enter it into the format? I added the head of my data; discharge is the measurement on the right, it just displays strangely on Stack Overflow. Thanks so much for all your help! Commented May 19 at 3:43
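On the "T" question in the comments above: in a strptime()-style format, literal characters such as "-", "T" and ":" are written into the format as themselves. A minimal base R sketch, assuming every time stamp carries the same +10:00 offset, so the zone is supplied through tz rather than parsed:

```r
# Three of the time stamps from the question
x <- c("2024-03-05T00:00:01.000+10:00",
       "2024-03-05T00:35:01.000+10:00",
       "2024-03-05T01:00:01.000+10:00")

# "-", "T" and ":" are matched literally; %OS reads seconds including the
# fractional part ("01.000"). The unparsed trailing "+10:00" is ignored,
# so the zone comes from tz ("Etc/GMT-10" is a fixed UTC+10 offset).
parsed <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "Etc/GMT-10")

print(parsed)
```

If the offset could vary between rows, lubridate::ymd_hms() on the full strings reads the offset instead of ignoring it.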

2 Answers


I've generated a similar data structure using discharge data for the Montjean station between 01-01-2022 and 12-31-2023 (source: GRDC).

First lines (head(data)):

# A tibble: 6 × 2
  `Date and Time`               `Discharge (cumec)`
  <chr>                                       <dbl>
1 2022-01-02T09:00:00.000+10:00               1928.
2 2022-01-03T09:00:00.000+10:00               2042.
3 2022-01-04T09:00:00.000+10:00               2161.
4 2022-01-05T09:00:00.000+10:00               2274.
5 2022-01-06T09:00:00.000+10:00               2227.
6 2022-01-07T09:00:00.000+10:00               2052.

To reproduce your problem:

### Packages
library(dplyr)
library(lubridate)
library(ggplot2)

### Plot the graph without specifying breaks for the abscissa axis
ggplot(data, aes(x= `Date and Time`, y=`Discharge (cumec)`, group = 1)) +
  geom_line(color="powderblue", linewidth=1, alpha=0.9, linetype=1)

Output: the x axis turns into an unreadable bar of overlapping labels.

To fix this:

### Transform the first column to POSIXct (date-time):
data <- data %>%
  mutate(`Date and Time` = ymd_hms(`Date and Time`, tz = "Etc/GMT-10"))

### Plot the graph with ggplot2, using `scale_x_datetime()` and `date_breaks`
ggplot(data, aes(x = `Date and Time`, y = `Discharge (cumec)`, group = 1)) +
  scale_x_datetime(date_breaks = "3 months", date_labels = "%b %Y",
                   limits = c(min(data$`Date and Time`), max(data$`Date and Time`)),
                   expand = c(0, 0)) +
  geom_line(color = "powderblue", linewidth = 1, alpha = 0.9, linetype = 1)

Output: the x axis now shows a month-and-year label every three months instead of the bar.
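The "3 months" breaks suit two years of daily data; a shorter record like the one in the question likely needs a smaller interval. A sketch using the six rows from the question, where the "1 hour" interval and the label format are assumptions to adjust to the span of the full data set (for example "1 day" or "1 week"):

```r
library(ggplot2)

# The six rows from the question, parsed to POSIXct in a fixed UTC+10 zone
df <- data.frame(
  time = as.POSIXct(
    c("2024-03-05T00:00:01.000+10:00", "2024-03-05T00:35:01.000+10:00",
      "2024-03-05T01:00:01.000+10:00", "2024-03-05T01:30:01.000+10:00",
      "2024-03-05T02:00:01.000+10:00", "2024-03-05T03:00:01.000+10:00"),
    format = "%Y-%m-%dT%H:%M:%OS", tz = "Etc/GMT-10"
  ),
  discharge = c(3.202, 3.124, 3.04, 2.956, 2.919, 2.867)
)

# Assumed break interval: choose one that yields a handful of labels
# over the span of your data; "%d %b %H:%M" prints e.g. day, month, time
p <- ggplot(df, aes(x = time, y = discharge)) +
  geom_line(color = "powderblue", linewidth = 1) +
  scale_x_datetime(date_breaks = "1 hour", date_labels = "%d %b %H:%M") +
  labs(x = "Date and time", y = "Discharge (cumec)")
```

Print `p` to draw it. Only the break interval and label format need changing as the data span grows.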


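A base R alternative with as.POSIXct(). Update tz to the appropriate time zone if needed; "Etc/GMT-10" is a fixed UTC+10 offset, matching the +10:00 in the data.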

CHEM_RESULTS <- structure(list(`Date and Time` = c("2024-03-05T00:00:01.000+10:00", 
                             "2024-03-05T00:35:01.000+10:00", "2024-03-05T01:00:01.000+10:00", 
                             "2024-03-05T01:30:01.000+10:00", "2024-03-05T02:00:01.000+10:00", 
                             "2024-03-05T03:00:01.000+10:00"), 
     `Discharge (cumec)` = c(3.202, 3.124, 3.04, 2.956, 2.919, 2.867)), 
      class = "data.frame", row.names = c(NA, -6L))


library(ggplot2)

# "T" is matched literally; the trailing ".000+10:00" is ignored by the parser
CHEM_RESULTS$`Date and Time` <- as.POSIXct(CHEM_RESULTS$`Date and Time`,
                                           format = "%Y-%m-%dT%H:%M:%S",
                                           tz = "Etc/GMT-10")

ggplot(CHEM_RESULTS, aes(x = `Date and Time`, y = `Discharge (cumec)`, group = 1)) +
  geom_line(color = "powderblue", linewidth = 1, alpha = 0.9, linetype = 1)
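For the follow-up about overlaying water quality samples, one option (a sketch, not the only approach) is to keep a second data frame of sampling events, draw a vertical line per event, and use the event times themselves as the x-axis breaks so that only those dates and times are labeled. The site names, times and pH values below are hypothetical placeholders, not real measurements.

```r
library(ggplot2)

# Discharge series: the six rows from the question
flow <- data.frame(
  time = as.POSIXct(c("2024-03-05 00:00:01", "2024-03-05 00:35:01",
                      "2024-03-05 01:00:01", "2024-03-05 01:30:01",
                      "2024-03-05 02:00:01", "2024-03-05 03:00:01"),
                    tz = "Etc/GMT-10"),
  discharge = c(3.202, 3.124, 3.04, 2.956, 2.919, 2.867)
)

# HYPOTHETICAL sampling events: site names, times and pH values are placeholders
samples <- data.frame(
  site = c("Site A", "Site B"),
  time = as.POSIXct(c("2024-03-05 00:35:01", "2024-03-05 02:00:01"),
                    tz = "Etc/GMT-10"),
  pH = c(7.1, 6.8)
)

p <- ggplot(flow, aes(x = time, y = discharge)) +
  geom_line(color = "powderblue", linewidth = 1) +
  # dashed vertical line at each sampling time, coloured by site
  geom_vline(data = samples, aes(xintercept = time, colour = site),
             linetype = "dashed") +
  # pH value written near the top of the panel next to each line
  geom_text(data = samples,
            aes(x = time, y = Inf, label = paste("pH", pH), colour = site),
            vjust = 1.5, hjust = -0.1, show.legend = FALSE) +
  # label the x axis only at the sampling times
  scale_x_datetime(breaks = samples$time, date_labels = "%d %b %H:%M") +
  labs(x = "Date and time", y = "Discharge (cumec)", colour = "Site")
```

Writing pH as text labels avoids plotting it against the discharge axis, since the two quantities have different units.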
