Introduction

In this demo we will introduce the main control structures (if, for, while, etc.) used in R and then we will apply them to build a simple weather generator to create artificial precipitation amount series. To this aim we will use the output of some loading functions of the R package downscaleR, that we are using during the next days in the SPECS Workshop.

Control structures in R

Control structures in R allow you to control the flow of execution of the program, depending on runtime conditions. Common structures are

Most control structures are not used in interactive sessions, but rather when writing functions or longer expresisons.

Control structures: if

if(condition1) {
   ## do something
}

If the condition is FALSE we can include an else clause to do other alternatives instructions.

if(condition) {
   ## do something
} else {
   ## do something else
}´

We could nest consecutive conditions:

if(condition1) {
   ## do something
} else if(condition2) {
   ## do something different
} else {
   ## do something else
}

This is a valid if/else structure.

x <- 5
if(x > 3) {
      y <- 10
} else {
      y <- 0
}

We could display the y value:

str(y)
##  num 10

So is this one.

x <- 2
y <- if(x > 3) {
      10
} else {
      0
}

We could display the y value:

str(y)
##  num 0

Control structures: for

for loops take an interator variable and assign it successive values from a sequence or vector. for loops are most commonly used for iterating over the elements of an object (list, vector, etc.).

for(i in 1:10) {
      print(i)
}

This loop takes the i variable and in each iteration of the loop gives it values 1, 2, 3, …, 10, and then exits.

These three loops have the same behavior.

x <- c("a", "b", "c", "d")
for(i in 1:4) {
      print(x[i])
}
## [1] "a"
## [1] "b"
## [1] "c"
## [1] "d"
for(i in seq_along(x)) {
      print(x[i])
}

for(letter in x) {
      print(letter)
}

Type ?seq_along for more information about sequence generation in R.

for loops can be nested.

x <- matrix(1:6, 2, 3)
for(i in seq_len(nrow(x))) {
      for(j in seq_len(ncol(x))) {
            print(x[i, j])
      }
}
## [1] 1
## [1] 3
## [1] 5
## [1] 2
## [1] 4
## [1] 6

Be careful with nesting though. Nesting beyond 2-3 levels is often very dificult to read/understand.

Control structures: while

while loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth.

count <- 0
while(count < 10) {
      print(count)
      count <- count + 1
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9

while loops can potentially result in infinite loops if not written properly. Use with care! Sometimes there will be more than one condition in the test.

z <- 5
while(z >= 3 && z <= 10) {
      print(z)
      coin <- rbinom(1, 1, 0.5)
      if(coin == 1) { ## random walk
            z <- z + 1
      } else {
            z <- z - 1
      }
}
## [1] 5
## [1] 6
## [1] 7
## [1] 6
## [1] 5
## [1] 4
## [1] 3

Conditions are always evaluated from left to right. Type ?rbinom for more information about the generation of random number with a specific theoretical distribution.

Control structures: try

These functions provide a mechanism for handling unusual conditions, including errors and warnings. Type ?tryCatch for more details. This function evaluates its expression argument and, then, depending on the result obtained it apply the different expressions included for each case:

  • Warning:
x <- 4
y1 <- tryCatch(sqrt(x), warning = function(w){1+sqrt(abs(x))})
x <- -4
y2 <- tryCatch(sqrt(x), warning = function(w){1+sqrt(abs(x))})

Display the results

c(str(y1),str(y2))
##  num 2
##  num 3
## NULL

We could include a final instruction or message:

x <- -4
y <- tryCatch(sqrt(x), warning = function(w){1+sqrt(abs(x))}, finally = print("x <=0") )
## [1] "x <=0"
str(y)
##  num 3
  • error:
x1 <- matrix(data = runif(20, min = 0, max = 1), nrow = 4, ncol = 5)
x2 <- matrix(data = runif(20, min = 0, max = 1), nrow = 5, ncol = 4)
y <- tryCatch(x1*x2, error = function(e){x1*t(x2)})

Display the results

str(y)
##  num [1:4, 1:5] 0.00647 0.29827 0.53995 0.25926 0.23712 ...

We could include a final instruction or message:

y <- tryCatch(x1*x2, error = function(e){x1*t(x2)}, finally = print("x2 has been transposed."))
## [1] "x2 has been transposed."

This code is equivalent to:

y <- tryCatch({
      x1*x2
      }, error = function(e){
            x1*t(x2)
            }, finally = {
                  print("The second matrix has been transposed.")
                  }
      )
## [1] "The second matrix has been transposed."
str(y)
##  num [1:4, 1:5] 0.00647 0.29827 0.53995 0.25926 0.23712 ...

Other control structures

  • repeat initiates an infinite loop; these are not commonly used in statistical applications but they do have their uses. The only way to exit a repeat loop is to call break.
x0 <- 1
tol <- 1e-3
repeat {
      x1 <- runif(1, min = 0, max = 1)
      if(abs(x1 - x0) < tol) {
            break # If x0 and x1 are close enough (< tol) exit of the loop with break
      } else {
            x0 <- x1
      }
}
str(x0)
##  num 0.894

The loop in the previous slide is a bit dangerous because there’s no guarantee it will stop. Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.

  • next is used to skip an iteration of a loop.
for(i in 1:10) {
      if(i <= 9) {
            ## Skip the first 9 iterations
            next
      }
      ## Do something here
      print(i)
}
## [1] 10
  • return signals that a function should exit and return a given value.

Control structures: Summary

  • Control structures like if, while, and for allow you to control the flow of an R program.
  • Infinite loops should generally be avoided, even if they are theoretically correct.
  • Control structures mentiond here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful. Type ? apply for more information about these functions.
  • When possible, avoid loops in favor of matrix mainpulations which are computationally most efficient.

Demo: Building a weather generator

In the previous section, we have used some functions which simulate (pseudo)random numbers from different families of distributions (uniform, exponential, normal, etc.). This is a key task for several computational statistics problems.

? distributions
## starting httpd help server ... done

For these families, R includes functions to directly simulate random series (runif, rexp, rnorm, etc.). For example,

runif(365, min = 0, max = 1)

Simulates a series of 365 uniform numbers (between 0 and 1). This could be used, for instance, to generate random probabilities.

Precipitation occurrence in Santander

Weather generators are computational models which generate (daily) synthetic series of random values simulating the distribution of a particular meteorological variable. Precipitation is the most popular example due to its mixed character: discrete (dry or wet) and continuous (rain amount for wet days). The discrete part is typically characterized by the frequency of wet days (say, e.g. 0.4 in Santander). Therefore, we could simulate a random series of dry/wet (0/1) values for Santander:

# Parameters:
 # We consider a year
ndays <- 365
# Probability of dry days occurrence for Santander.
probDry <- 0.6
# Random values with uniform distribution in the [0,1] interval
u <- runif(ndays, min = 0, max = 1)
par(mfrow = c(1,1))
plot(u, type = "p", col = "black", ylim = c(0,1))
lines(c(0,ndays),c(probDry,probDry), col = "red")

plot of chunk unnamed-chunk-26

The days under the red line will be considered dry days.

dry <- rep(NaN, ndays)
for (k in 1:ndays){
      if (u[k] < probDry) {
            dry[k] <- 1
      }else {
            dry[k] <- 0 
      }
}

In the previous series we have not taken into account the autocorrelation existing in a real rain series. This autocorrelation structure can be characterized by the probability of observing a rain day after a rainy/wet day (transition probabilities).

# For example:
probDry2Wet <- 0.3 # Probability of observing a rain day after a dry day
probWet2Wet <- 0.5 # Probability of observing a rain day after a wet day

We will modify the previously generated dry vector considering the autocorrelation.

dryWet <- dry
for (k in 2:ndays){
      if (u[k-1] < probDry) {
            if (u[k] < probDry2Wet){
                  dryWet[k] <- 0
            }else {
                  dryWet[k] <- 1
            }
      }else {
            if (u[k] < probWet2Wet){
                  dryWet[k] <- 0
            }else {
                  dryWet[k] <- 1
            }
      }
}

Type dry == dryWet to compare both synthetic series, with and without autocorrelation.

Simulating the precipitation in Iberia

we will repeat the previous experiment but with real precipitation series from Spain. To this aim, the predefined data should be loaded:

# Load the data
load(url("http://meteo.unican.es/work/downscaler/aux/iberiaGSN_precip.rda"), verbose = TRUE)
## Loading objects:
##   station.data
str(station.data)
## List of 5
##  $ Variable:List of 1
##   ..$ varName: chr "precip"
##  $ Data    : num [1:12419, 1:6] 0.9 0 1.4 0.2 1.2 0 0.2 0.1 1.2 2.7 ...
##   ..- attr(*, "dimensions")= chr [1:2] "time" "station"
##  $ xyCoords: num [1:6, 1:2] -2.04 2.07 -5.5 -4.01 -1.86 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ Dates   :List of 2
##   ..$ start: chr [1:12419] "1979-01-01 00:00:00 GMT" "1979-01-02 00:00:00 GMT" "1979-01-03 00:00:00 GMT" "1979-01-04 00:00:00 GMT" ...
##   ..$ end  : chr [1:12419] "1979-01-02 00:00:00 GMT" "1979-01-03 00:00:00 GMT" "1979-01-04 00:00:00 GMT" "1979-01-05 00:00:00 GMT" ...
##  $ Metadata:List of 5
##   ..$ station_id  : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
##   ..$ altitude    : int [1:6] 251 4 790 1894 704 90
##   ..$ location    : chr [1:6] "SAN SEBASTIAN - IGUELDO" "BARCELONA/AEROPUERTO" "SALAMANCA AEROPUERTO" "NAVACERRADA" ...
##   ..$ WMO_Id      : int [1:6] 8027 8181 8202 8215 8280 8410
##   ..$ Koppen.class: chr [1:6] "Cfb" "Csa" "BSk" "Csb" ...

In this case, this is a type of object returned by downscaleR including the stations belonging the GSN Network over the Iberian Peninsula. Note that we should define a threshold to identify the dry/wet days.

# Useful parameters
Nsta <- dim(station.data$Data)[2] # Number of stations
Ndays <- dim(station.data$Data)[1] # Number of days
pr.threshold <- 1 # We define as dry days with precipitation below than 1mm.

In the previous case, the probability was predefined. However, in this case we should estimate the needed probabilities. In particular, the probability of dry days and of observing a rain/dry day after a dry day.

probDry <- apply(matrix(as.numeric(station.data$Data < pr.threshold 
                     & !is.na(station.data$Data)), nrow = Ndays, ncol = Nsta), 
                 FUN = mean, MARGIN = 2, na.rm = TRUE)
probDry2Wet <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] < pr.threshold 
                     & !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
               *matrix(as.numeric(station.data$Data[2:Ndays,] >= pr.threshold 
                     & !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta), 
               FUN = mean, MARGIN = 2, na.rm = TRUE)
probWet2Wet <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] >= pr.threshold 
                     & !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
               *matrix(as.numeric(station.data$Data[2:Ndays,] >= pr.threshold 
                     & !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta), 
               FUN = mean, MARGIN = 2, na.rm = TRUE)

Note that we should take into account that the observations could have missing data. We could estimate also the probabilities of wet days and the transitions to dry days: dry -> dry and wet -> dry

probWet <- apply(matrix(as.numeric(station.data$Data >= pr.threshold 
                     & !is.na(station.data$Data)), nrow = Ndays, ncol = Nsta), 
                 FUN = mean, MARGIN = 2, na.rm = TRUE)
probDry2Dry <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] < pr.threshold 
                     & !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
               *matrix(as.numeric(station.data$Data[2:Ndays,] < pr.threshold 
                     & !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta), 
               FUN = mean, MARGIN = 2, na.rm = TRUE)
probWet2Dry <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] >= pr.threshold 
                     & !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
               *matrix(as.numeric(station.data$Data[2:Ndays,] < pr.threshold 
                     & !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta), 
               FUN = mean, MARGIN = 2, na.rm = TRUE)

Once the probabilities have been estimated, we could simulate the occurrence of dry days

occuDry <- matrix(data = 0, nrow = Ndays, ncol = Nsta)
for (i in 1:Nsta){
      u <- runif(Ndays) # uniform
      for (k in 2:Ndays){
            if (u[k-1] < probDry[i]) {
                  if (u[k] < probDry2Wet[i]){
                        occuDry[k,i] <- 0
                  }else {
                        occuDry[k,i] <- 1
                  }
            }else {
                  if (u[k] < probWet2Wet[i]){
                        occuDry[k,i] <- 0
                  }else {
                        occuDry[k,i] <- 1
                  }
            }
      }
}

and the precipitation amount for the wet days (occuDry == 0). To this aim, we need to fit a theoretical distribution to the observed data for each station. The two distributions most commonly used to simulate precipitation are the Gamma and Exponential distributions. We should load the library MASS and use the fitdistr function.

library(MASS)
## Warning: package 'MASS' was built under R version 3.1.1
? fitdistr
# Using the gamma distribution:
amount <- matrix(data = 0, nrow = Ndays, ncol = Nsta)
for (i in 1:Nsta){
      indWet <- which(station.data$Data[,i] >= pr.threshold & !is.na(station.data$Data[,i]))
      auxGamma <- fitdistr(station.data$Data[indWet,i],"gamma")
      indWet <- which(occuDry[,i] != 1)
      amount[indWet,i] <- rgamma(length(indWet), auxGamma$estimate[1], rate = auxGamma$estimate[2])
}
## Warning: NaNs produced
# Using the exponential distribution:
amount1 <- matrix(data = 0, nrow = Ndays, ncol = Nsta)
for (i in 1:Nsta){
      indWet <- which(station.data$Data[,i] >= pr.threshold & !is.na(station.data$Data[,i]))
      auxExp <- fitdistr(station.data$Data[indWet,i],"exponential")
      indWet <- which(occuDry[,i] != 1)
      amount1[indWet,i] <- rexp(length(indWet), rate = auxExp$estimate[1])
}

# We compare the simulated and observed time series
par(mfrow = c(2,3))
plot(station.data$Data[,5], type = "l", col = "black", ylim = c(0,100))
plot(amount[,5], type = "l", col = "black", ylim = c(0,100))
plot(amount1[,5], type = "l", col = "black", ylim = c(0,100))
indWet <- which(station.data$Data[,5] >= pr.threshold)
hist(station.data$Data[indWet,5], breaks = 150)
indWet <- which(occuDry[,5] != 1)
hist(amount[indWet,5], breaks = 150)
hist(amount1[indWet,5], breaks = 150)

plot of chunk unnamed-chunk-36

Acknowledgments

Although most of the material included in this tutorial is original, some of the is based on the material of the course “Computing for Data Analysis” from Coursera Platform.