Working with dates in R

Just as we can't escape the influence of time, neither can data. When we work with air pollutant data, the date is the variable that deals with time. If you define this variable well, it's much easier to make calculations (mean, sd, etc.) over different time intervals (e.g. seconds, hours, days, months).

I usually define my date variable using as.POSIXct() rather than as.Date(). The biggest difference I notice is that the latter stores only the calendar date, so it doesn't deal with hours, minutes and seconds.
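As a quick illustration of that difference, here is a minimal sketch. The example string and the variable names are my own, chosen only for the demonstration:

```r
# The same character string read with both functions
x <- "2018-02-01 14:30:00"

d <- as.Date(x)                  # keeps only the calendar date
p <- as.POSIXct(x, tz = "GMT")   # keeps the date and the time of day

format(d)                        # "2018-02-01"
format(p, "%Y-%m-%d %H:%M:%S")   # "2018-02-01 14:30:00"
```

If your data are hourly or finer, as.Date() silently throws the time away, which is why I stick with POSIXct.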

  • From character to POSIXct

To tell R that you are working with date data just type:


date <- "01/02/2018 02:00"

date <- as.POSIXct(strptime(date, format = "%d/%m/%Y %H:%M"), tz = "America/Sao_Paulo")

With strptime() we parse the character string into a date-time object (it returns a POSIXlt, which prints as "%Y-%m-%d %H:%M"), and with as.POSIXct() we convert it to a POSIXct variable. In this example, the format argument of strptime() tells R how the day, month, year, etc. are ordered in the string. I think it is important to define the time zone (tz); it will be useful later when changing time zones.

To look up time zone names, check this. In R, OlsonNames() also lists the names available on your system.
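as.POSIXct() also accepts the format argument directly, so the strptime() step can be skipped. The sketch below uses made-up strings of my own to show that, and also what happens when the format does not match the string:

```r
# Day comes first in these strings: "01/02/2018" is 1 February 2018
date.chr <- c("01/02/2018 02:00", "15/02/2018 13:30")

parsed <- as.POSIXct(date.chr, format = "%d/%m/%Y %H:%M",
                     tz = "America/Sao_Paulo")
format(parsed, "%Y-%m-%d %H:%M")

# A string that does not follow the format is returned as NA,
# so it is worth checking for NAs after parsing
bad <- as.POSIXct("2018-02-01 02:00", format = "%d/%m/%Y %H:%M",
                  tz = "America/Sao_Paulo")
is.na(bad)   # TRUE
```

Checking anyNA() on the parsed column right after reading a file is a cheap way to catch a wrong format string early.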

  • A sequence of dates

To create a sequence of dates we combine the seq() and as.POSIXct() functions:


start <- as.POSIXct('2018-01-01 00:00', tz = 'GMT')

end <- as.POSIXct('2018-12-31 23:00', tz = 'GMT')

# By day

daily <- seq(start, end, by = 'day')

# Only 24 hours

hourly <- seq(start, by = 'hour', length.out = 24)

'by' can be 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', and it can also be a multiple such as '30 mins' or '2 hours'.
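To make the multiples concrete, here is a small sketch with a step of 30 minutes and a step of one month. The variable names are mine:

```r
start <- as.POSIXct('2018-01-01 00:00', tz = 'GMT')

# Four time stamps, 30 minutes apart
half.hourly <- seq(start, by = '30 mins', length.out = 4)
format(half.hourly, '%H:%M')   # "00:00" "00:30" "01:00" "01:30"

# First day of each month of 2018
monthly <- seq(start, by = 'month', length.out = 12)
format(monthly, '%Y-%m-%d')
```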

  • Calculation with dates: How long does it last?

We can use difftime() to answer this question. For example:


start <-  as.POSIXct('2018-06-20 17:17')

end  <-  as.POSIXct('2018-06-22 01:36')

run.time <- difftime(end, start, units = 'secs')

# Convert the difftime object to a plain number (seconds)

run.time <- as.numeric(run.time)

'units' can be 'secs', 'mins', 'hours', 'days' and even 'weeks'.
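Here is the same interval expressed in two units. This is only a sketch: I set tz = 'GMT' (my assumption, not part of the example above) so the result does not depend on the time zone of your machine:

```r
start <- as.POSIXct('2018-06-20 17:17', tz = 'GMT')
end   <- as.POSIXct('2018-06-22 01:36', tz = 'GMT')

# 1 day, 8 hours and 19 minutes
run.secs  <- as.numeric(difftime(end, start, units = 'secs'))   # 116340
run.hours <- as.numeric(difftime(end, start, units = 'hours'))  # about 32.32

# Without 'units', difftime() picks a unit by itself, so I prefer
# to always state it before converting to a number
units(difftime(end, start))
```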

  • Calculation with dates: Calculations over different time intervals

In this post, we examined how to do different calculations using aggregate(). Here, I present another example:


# Creating a data frame

start <- as.POSIXct('2018-01-01 00:00')

end  <- as.POSIXct('2018-12-31 23:00')

dates <- seq(start, end, by = 'hours')

o3     <- runif(length(dates)) * 100

df <- data.frame(date = dates, o3 = o3)  # the column must be named 'date', as used below

# Calculating daily mean

daily.mean <- aggregate(df['o3'], list(date = format(df$date, '%Y-%j')), mean, na.rm = TRUE)

daily.mean$date <- seq(min(df$date), max(df$date), by = 'day')  ## We fix the format of date

To calculate monthly means, just change the format argument from '%Y-%j' to '%m'. Another important point is that mean can be replaced by other functions such as sd(), max(), min(), sum(), etc., or even a function that you created. The openair manual presents a nice table of date-time formats for these calculations:

Format   Function
%Y       Annual means
%m       Monthly means
%Y-%m    Monthly averages for whole series
%Y-%j    Daily averages for whole series
%Y-%W    Weekly averages for whole series
%w-%H    Day of week – hour of day
Once you have the date data as POSIXct, these calculations are easier with the openair package, using the function timeAverage(). Note that timeAverage() expects the date column of the data frame to be named date.


library(openair)

daily <- timeAverage(df, avg.time = 'day', statistic = 'mean')

  • Changing time zone

I found the answer here. We need to change the tzone attribute of the object:


start <- as.POSIXct('2018-01-01 00:00', tz = 'GMT')

attributes(start)$tzone <- 'America/Sao_Paulo'

# Now start is  "2017-12-31 22:00:00 -02"
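One thing worth checking is that changing tzone only changes how the time is printed, not the instant it represents. The sketch below, with variable names of my own, compares both objects and also shows format() with its tz argument as an alternative when you only need the converted text:

```r
start <- as.POSIXct('2018-01-01 00:00', tz = 'GMT')

start.sp <- start
attr(start.sp, 'tzone') <- 'America/Sao_Paulo'

format(start.sp, '%Y-%m-%d %H:%M')        # "2017-12-31 22:00"
as.numeric(start.sp) == as.numeric(start) # TRUE, it is the same instant

# Only the printed text is converted, 'start' itself stays in GMT
format(start, '%Y-%m-%d %H:%M', tz = 'America/Sao_Paulo')
```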
