Just as we can't escape the influence of time, neither can data. When we work with air pollutant data, date is the variable that deals with time. If you define this variable well, it's much easier to make calculations (mean, sd, etc.) over different time intervals (e.g. seconds, hours, days, months).
I usually define my date variable using as.POSIXct() rather than as.Date(). The biggest difference I notice is that the latter doesn't deal with hours, minutes and seconds.
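A minimal sketch of that difference, using a made-up time stamp (the value of stamp is only for illustration) and assuming GMT as the time zone:

```r
# Hypothetical time stamp, used only for illustration
stamp <- "2018-02-01 14:30:00"

as_date <- as.Date(stamp)                 # keeps the calendar date only
as_time <- as.POSIXct(stamp, tz = "GMT")  # keeps the date and the time of day

format(as_date)           # "2018-02-01"
format(as_time, "%H:%M")  # "14:30"
```

The Date object has simply dropped the time of day, so hourly calculations are not possible with it.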
- From character to POSIXct
To tell R that you are working with date data just type:
```r
date <- "01/02/2018 02:00"
date <- as.POSIXct(strptime(date, format = "%d/%m/%Y %H:%M"), tz = "America/Sao_Paulo")
```
With strptime() we parse the character string into a date-time object, and with as.POSIXct() we store it as a POSIXct variable, which prints in the standard "%Y-%m-%d %H:%M" layout. In this example, the format argument of strptime() tells R how the day, month, year, hour and minute are ordered in the string. I think it is important to define the time zone (tz); it will be useful later when changing time zones.
To look up time zone code names, check this.
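The names can also be looked up from within R. The sketch below uses base R's OlsonNames(); the list it returns depends on the time zone database installed on your system, so the search result may differ between machines:

```r
# List every time zone name known to this R installation
tz_names <- OlsonNames()

# Search for zones whose name contains "Sao_Paulo"
grep("Sao_Paulo", tz_names, value = TRUE)

# Check that a name is valid before passing it to the tz argument
"America/Sao_Paulo" %in% tz_names
```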
- A sequence of dates
To create a sequence of dates we combine the seq() and as.POSIXct() functions:
```r
start <- as.POSIXct('2018-01-01 00:00', tz = 'GMT')
end <- as.POSIXct('2018-12-31 23:00', tz = 'GMT')
# By day
daily <- seq(start, end, by = 'day')
# Only 24 hours
hourly <- seq(start, by = 'hour', length.out = 24)
```
The by argument can be 'hour', 'min', 'month' or 'sec', and it can also be a multiple such as '30 mins' or '2 hours'.
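As a short illustration of those other step sizes, here is a sketch with a half-hourly and a monthly sequence. The variable names half_hourly and monthly are mine, and GMT is assumed to keep daylight saving time out of the picture:

```r
start <- as.POSIXct("2018-01-01 00:00", tz = "GMT")
end   <- as.POSIXct("2018-01-01 23:30", tz = "GMT")

# One time stamp every 30 minutes over a single day
half_hourly <- seq(start, end, by = "30 mins")
length(half_hourly)  # 48

# First day of 12 consecutive months
monthly <- seq(start, by = "month", length.out = 12)
format(monthly, "%Y-%m-%d")
```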
- Calculation with dates: How long does it last?
We can use difftime() to answer this question. For example:
```r
start <- as.POSIXct('2018-06-20 17:17')
end <- as.POSIXct('2018-06-22 01:36')
run.time <- difftime(end, start, units = 'secs')
# To float
run.time <- as.numeric(run.time)
```
The units argument can be 'secs', 'mins', 'hours', 'days' and even 'weeks'.
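The same interval can be expressed in other units. This sketch reuses the two time stamps from the example, with GMT added as an assumption so the result does not depend on the local time zone:

```r
start <- as.POSIXct("2018-06-20 17:17", tz = "GMT")
end   <- as.POSIXct("2018-06-22 01:36", tz = "GMT")

# Same interval expressed in different units
in_mins  <- as.numeric(difftime(end, start, units = "mins"))   # 1939
in_hours <- as.numeric(difftime(end, start, units = "hours"))

# Subtracting two POSIXct values also returns a difftime,
# but R then chooses the units automatically
auto <- end - start
units(auto)
```

Setting units explicitly is the safer choice before calling as.numeric(), since otherwise the number you get depends on the units R picked.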
- Calculation with dates: Calculations over different time intervals
In this post, we examined how to do different calculations using aggregate(). Here, I present another example:
```r
# Creating a data frame
start <- as.POSIXct('2018-01-01 00:00')
end <- as.POSIXct('2018-12-31 23:00')
dates <- seq(start, end, by = 'hours')
o3 <- runif(length(dates)) * 100
df <- data.frame(date = dates, o3 = o3)
# Calculating daily mean (grouping by year and day of year)
daily.mean <- aggregate(df['o3'], list(day = format(df$date, '%Y-%j')), mean, na.rm = TRUE)
# We fix the format of date
daily.mean$date <- seq(min(df$date), max(df$date), by = 'day')
```
To calculate the monthly mean, just change the format string from '%Y-%j' to '%m' (or to '%Y-%m' if the series covers more than one year and you want each month kept separate). Another important point is that mean can be replaced by other functions like sd(), max(), min(), sum(), etc., or even by a function that you created. The openair manual presents a nice table of date-time formats for different calculations:
Format | Function |
---|---|
%Y | Annual means |
%m | Monthly means |
%Y-%m | Monthly averages for whole series |
%Y-%j | Daily averages for whole series |
%Y-%W | Weekly averages for whole series |
%w-%H | Day of week – hour of day |
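To show how the formats in the table plug into aggregate(), here is a sketch on a made-up hourly series (random values, GMT assumed). The helper p95 and the result names are hypothetical, chosen only for this illustration:

```r
# Hypothetical hourly ozone series for 2018 (random values)
set.seed(1)
dates <- seq(as.POSIXct("2018-01-01 00:00", tz = "GMT"),
             as.POSIXct("2018-12-31 23:00", tz = "GMT"), by = "hour")
df <- data.frame(date = dates, o3 = runif(length(dates)) * 100)

# Monthly averages for the whole series: group by "%Y-%m"
monthly.mean <- aggregate(df["o3"],
                          by = list(month = format(df$date, "%Y-%m")),
                          FUN = mean, na.rm = TRUE)

# Day of week and hour of day: group by "%w-%H" (0 = Sunday)
weekday.hour <- aggregate(df["o3"],
                          by = list(wday_hour = format(df$date, "%w-%H")),
                          FUN = mean, na.rm = TRUE)

# A user-defined statistic: 95th percentile per month
p95 <- function(x) quantile(x, 0.95, na.rm = TRUE, names = FALSE)
monthly.p95 <- aggregate(df["o3"],
                         by = list(month = format(df$date, "%Y-%m")),
                         FUN = p95)
```

Each result has one row per group: 12 rows for the monthly tables and 168 (7 days times 24 hours) for the day-of-week table.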
Once you have the date data as POSIXct, these calculations are easier with the openair package, using the function timeAverage().
```r
library(openair)
# timeAverage() expects the date-time column of df to be named 'date'
daily <- timeAverage(df, avg.time = 'day', statistic = 'mean')
```
- Changing time zone
I found the answer here. We need to change the tzone attribute of the object:
```r
start <- as.POSIXct('2018-01-01 00:00', tz = 'GMT')
attributes(start)$tzone <- 'America/Sao_Paulo'
# Now start is "2017-12-31 22:00:00 -02"
```
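It may help to see that changing tzone only changes how the time is displayed, not the instant it represents. A small sketch, where start_sp is a name I made up for the relabelled copy:

```r
start <- as.POSIXct("2018-01-01 00:00", tz = "GMT")

# Copy the value and relabel its time zone
start_sp <- start
attr(start_sp, "tzone") <- "America/Sao_Paulo"

# The underlying instant is identical; only the printed clock time differs
as.numeric(start_sp) == as.numeric(start)  # TRUE

# format() can also display another zone without modifying the object
format(start, tz = "America/Sao_Paulo", usetz = TRUE)
```

If what you need is to reinterpret the same clock time in another zone (a different instant), that is a different operation and requires re-parsing the string with the new tz.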