Functions: 24-hour mean starting at a different hour

Here is the problem: for whatever reason, you need to calculate a 24-hour mean from an hourly concentration data set, but it is not a daily mean from 00:00 to 23:00. Instead, you need an average that starts at 2017-05-01 12:00 and ends at 2017-05-02 11:00, and so on until the end of the month. I tried to use timeAverage from the openair package, but it gave me a daily mean. So I decided to do it “by hand” and write a function. (I’m not sure this is the best or fastest way to do it, but at least it works!)

First, we create some fictional data:


set.seed(111)
df <- data.frame(date = seq(as.POSIXct("2017-05-01 00:00"), by = "hour", length.out = 744),
                 o3 = runif(744, 1, 160))

 

The algorithm is the following: we create another column that contains only the hour of each timestamp, as a character string. Then we use this new column to get the index (row number) of every row whose hour is “12”. Finally, we use these indexes to build blocks of 24 elements (24 hours) and calculate the mean of each block in a loop.


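df$char <- format(df$date, "%H")    # hour of each timestamp, e.g. "12"
idx <- which(df$char == "12")       # rows where a 24-hour block starts
ans <- data.frame(date = df$date[idx], mean = NA)

# Skip the last start: only 12 hours of data remain after it
for (i in seq_len(length(idx) - 1)) {
   ans$mean[i] <- mean(df$o3[idx[i]:(idx[i] + 23)], na.rm = TRUE)
}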

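The "length(idx) - 1" in the loop skips the last block: only 12 values remain after the final 12:00 (up to 23:00 on the last day), so its mean is left as NA instead of being calculated from half a day of data.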
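To see why the last block needs special care, here is a small sketch using a toy vector of 744 values in place of the o3 column (the names x and last_block are only for illustration). In the example data, row 733 is 12:00 on the last day, so a 24-element block starting there runs past the end of the data.

```r
# Toy vector standing in for an hourly series of 744 values
x <- seq_len(744)

# Row 733 is the last 12:00; 733 + 23 = 756 is beyond the last row (744)
last_block <- x[733:756]

# Indexing past the end does not fail, it returns NA for the missing positions
n_missing <- sum(is.na(last_block))

# With na.rm = TRUE the mean is quietly computed from the 12 values that exist
half_day_mean <- mean(last_block, na.rm = TRUE)

print(n_missing)
print(half_day_mean)
```

So without the "- 1" the loop would not stop; it would report a 12-hour mean as if it were a 24-hour one.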

WARNING: This script only works if your data are complete (no missing hours) and the date column is in POSIXct format. For more information, look at this and this.
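One way to check both conditions before running the script is sketched below. IsCompleteHourly is a hypothetical helper, not part of the original script, and it assumes that "complete" means every consecutive pair of timestamps is exactly one hour apart. The tz = "UTC" argument is also an assumption, used here only to keep the toy example free of daylight saving jumps.

```r
# Hypothetical helper: TRUE when `dates` is POSIXct and every consecutive
# pair of timestamps is exactly one hour apart, FALSE otherwise.
IsCompleteHourly <- function(dates) {
  if (!inherits(dates, "POSIXct")) return(FALSE)
  steps <- as.numeric(diff(dates), units = "hours")
  isTRUE(all(steps == 1))
}

# Two days of complete hourly timestamps
ok <- seq(as.POSIXct("2017-05-01 00:00", tz = "UTC"), by = "hour", length.out = 48)

# Same series with one hour dropped to simulate a missing record
gap <- ok[-10]

print(IsCompleteHourly(ok))                # complete series
print(IsCompleteHourly(gap))               # series with a missing hour
print(IsCompleteHourly(as.character(ok)))  # dates stored as text, not POSIXct
```

If the check fails, the 24-row blocks no longer line up with 24 hours, so the gaps have to be filled (for example with NA rows) before the means are calculated.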

If we had other data frames, for example for other pollutants, we could edit the script, replacing the name of the data frame and the name of the column (e.g. df1$no2), and run it again. Or we could write a separate loop for each data frame. To avoid repeating ourselves, we can create a function.

Now that we have the recipe (aka the algorithm), it’s quite easy. We will create a function named MeanDiffTime that receives a data frame with columns named “date” and “con” (concentration) and returns the 24-hour means starting at 12:00.


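MeanDiffTime <- function(df){
   # df must have a POSIXct column "date" and a numeric column "con"
   df$char <- format(df$date, "%H")
   idx <- which(df$char == "12")
   ans <- data.frame(date = df$date[idx], mean = NA)

   # The last start is skipped: fewer than 24 hours remain after it
   for (i in seq_len(length(idx) - 1)) {
      ans$mean[i] <- mean(df$con[idx[i]:(idx[i] + 23)], na.rm = TRUE)
   }
   return(ans)
}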
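If the starting hour or the column name changes from one data set to another, the same recipe can take them as arguments. The sketch below is only one possible variant: MeanFromHour and its arguments start and col are hypothetical names, not part of the original function. Unlike the loop version, it drops any start that does not have 24 rows after it instead of leaving an NA.

```r
# Hypothetical variant: 24-hour means starting at hour `start` (0 to 23),
# computed on column `col` of a data frame with a POSIXct column "date".
MeanFromHour <- function(df, start = 12, col = "con") {
  # Rows whose hour matches the requested start, e.g. "06" or "12"
  idx <- which(format(df$date, "%H") == sprintf("%02d", start))
  # Keep only the starts followed by a full 24-row block
  idx <- idx[idx + 23 <= nrow(df)]
  means <- vapply(idx,
                  function(i) mean(df[[col]][i:(i + 23)], na.rm = TRUE),
                  numeric(1))
  data.frame(date = df$date[idx], mean = means)
}

# Toy data: 72 hours with con = 1, 2, ..., 72 so the means are easy to check
toy <- data.frame(date = seq(as.POSIXct("2017-05-01 00:00", tz = "UTC"),
                             by = "hour", length.out = 72),
                  con = 1:72)

res12 <- MeanFromHour(toy)             # blocks starting at 12:00
res06 <- MeanFromHour(toy, start = 6)  # blocks starting at 06:00
print(res12)
print(res06)
```

With this toy data the first block starting at 12:00 covers rows 13 to 36, so its mean is 24.5, which makes the result easy to verify by hand.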

 

To test it, let's create some new fictional data:

set.seed(111)
o3 <- data.frame(date = seq(as.POSIXct("2017-05-01 00:00"), by = "hour", length.out = 744),
                 con = runif(744, 1, 160))

pm10 <- data.frame(date = seq(as.POSIXct("2017-06-01 00:00"), by = "hour", length.out = 720),
                   con = runif(720, 30, 200))

no2 <- data.frame(date = seq(as.POSIXct("2017-01-01 00:00"), by = "hour", length.out = 720),
                  con = runif(720, 10, 230))

o3.mean <- MeanDiffTime(o3)
pm10.mean <- MeanDiffTime(pm10)
no2.mean <- MeanDiffTime(no2)

 
