In this demo we will introduce the main control structures (`if`

, `for`

, `while`

, etc.) used in R and then we will apply them to build a simple weather generator to create artificial precipitation amount series. To this aim we will use the output of some loading functions of the R package `downscaleR`

, that we are using during the next days in the SPECS Workshop.

Control structures in R allow you to control the flow of execution of the program, depending on runtime conditions. Common structures are

`if`

,`else`

: testing a condition`for`

: execute a loop a fixed number of times`while`

: execute a loop while a condition is true`repeat`

: execute an infinite loop`break`

: break the execution of a loop`next`

: skip an interation of a loop`return`

: exit a function`try`

,`catch`

: error handling and recovery

Most control structures are not used in interactive sessions, but rather when writing functions or longer expresisons.

`if`

```
if(condition1) {
## do something
}
```

If the condition is `FALSE`

we can include an `else`

clause to do other alternatives instructions.

```
if(condition) {
## do something
} else {
## do something else
}´
```

We could nest consecutive conditions:

```
if(condition1) {
## do something
} else if(condition2) {
## do something different
} else {
## do something else
}
```

This is a valid `if/else`

structure.

```
x <- 5
if(x > 3) {
y <- 10
} else {
y <- 0
}
```

We could display the `y`

value:

`str(y)`

`## num 10`

So is this one.

```
x <- 2
y <- if(x > 3) {
10
} else {
0
}
```

We could display the `y`

value:

`str(y)`

`## num 0`

`for`

`for`

loops take an interator variable and assign it successive values from a sequence or vector. `for`

loops are most commonly used for iterating over the elements of an object (`list`

, `vector`

, etc.).

```
for(i in 1:10) {
print(i)
}
```

This loop takes the `i`

variable and in each iteration of the loop gives it values 1, 2, 3, …, 10, and then exits.

These three loops have the same behavior.

```
x <- c("a", "b", "c", "d")
for(i in 1:4) {
print(x[i])
}
```

```
## [1] "a"
## [1] "b"
## [1] "c"
## [1] "d"
```

```
for(i in seq_along(x)) {
print(x[i])
}
for(letter in x) {
print(letter)
}
```

Type `?seq_along`

for more information about sequence generation in R.

`for`

loops can be nested.

```
x <- matrix(1:6, 2, 3)
for(i in seq_len(nrow(x))) {
for(j in seq_len(ncol(x))) {
print(x[i, j])
}
}
```

```
## [1] 1
## [1] 3
## [1] 5
## [1] 2
## [1] 4
## [1] 6
```

Be careful with nesting though. Nesting beyond 2-3 levels is often very dificult to read/understand.

`while`

`while`

loops begin by testing a condition. If it is true, then they execute the loop body. Once the loop body is executed, the condition is tested again, and so forth.

```
count <- 0
while(count < 10) {
print(count)
count <- count + 1
}
```

```
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
```

`while`

loops can potentially result in infinite loops if not written properly. Use with care! Sometimes there will be more than one condition in the test.

```
z <- 5
while(z >= 3 && z <= 10) {
print(z)
coin <- rbinom(1, 1, 0.5)
if(coin == 1) { ## random walk
z <- z + 1
} else {
z <- z - 1
}
}
```

```
## [1] 5
## [1] 6
## [1] 7
## [1] 6
## [1] 5
## [1] 4
## [1] 3
```

Conditions are always evaluated from left to right. Type `?rbinom`

for more information about the generation of random number with a specific theoretical distribution.

`try`

These functions provide a mechanism for handling unusual conditions, including errors and warnings. Type `?tryCatch`

for more details. This function evaluates its expression argument and, then, depending on the result obtained it apply the different expressions included for each case:

`Warning`

:

```
x <- 4
y1 <- tryCatch(sqrt(x), warning = function(w){1+sqrt(abs(x))})
x <- -4
y2 <- tryCatch(sqrt(x), warning = function(w){1+sqrt(abs(x))})
```

Display the results

`c(str(y1),str(y2))`

```
## num 2
## num 3
```

`## NULL`

We could include a final instruction or message:

```
x <- -4
y <- tryCatch(sqrt(x), warning = function(w){1+sqrt(abs(x))}, finally = print("x <=0") )
```

`## [1] "x <=0"`

`str(y)`

`## num 3`

`error`

:

```
x1 <- matrix(data = runif(20, min = 0, max = 1), nrow = 4, ncol = 5)
x2 <- matrix(data = runif(20, min = 0, max = 1), nrow = 5, ncol = 4)
y <- tryCatch(x1*x2, error = function(e){x1*t(x2)})
```

Display the results

`str(y)`

`## num [1:4, 1:5] 0.00647 0.29827 0.53995 0.25926 0.23712 ...`

We could include a final instruction or message:

`y <- tryCatch(x1*x2, error = function(e){x1*t(x2)}, finally = print("x2 has been transposed."))`

`## [1] "x2 has been transposed."`

This code is equivalent to:

```
y <- tryCatch({
x1*x2
}, error = function(e){
x1*t(x2)
}, finally = {
print("The second matrix has been transposed.")
}
)
```

`## [1] "The second matrix has been transposed."`

`str(y)`

`## num [1:4, 1:5] 0.00647 0.29827 0.53995 0.25926 0.23712 ...`

`repeat`

initiates an infinite loop; these are not commonly used in statistical applications but they do have their uses. The only way to exit a`repeat`

loop is to call`break`

.

```
x0 <- 1
tol <- 1e-3
repeat {
x1 <- runif(1, min = 0, max = 1)
if(abs(x1 - x0) < tol) {
break # If x0 and x1 are close enough (< tol) exit of the loop with break
} else {
x0 <- x1
}
}
str(x0)
```

`## num 0.894`

The loop in the previous slide is a bit dangerous because there’s no guarantee it will stop. Better to set a hard limit on the number of iterations (e.g. using a for loop) and then report whether convergence was achieved or not.

`next`

is used to skip an iteration of a loop.

```
for(i in 1:10) {
if(i <= 9) {
## Skip the first 9 iterations
next
}
## Do something here
print(i)
}
```

`## [1] 10`

`return`

signals that a function should exit and return a given value.

- Control structures like
`if`

,`while`

, and`for`

allow you to control the flow of an R program. - Infinite loops should generally be avoided, even if they are theoretically correct.
- Control structures mentiond here are primarily useful for writing programs; for command-line interactive work, the *apply functions are more useful. Type
`? apply`

for more information about these functions. - When possible, avoid loops in favor of matrix mainpulations which are computationally most efficient.

In the previous section, we have used some functions which simulate (pseudo)random numbers from different families of distributions (uniform, exponential, normal, etc.). This is a key task for several computational statistics problems.

`? distributions`

`## starting httpd help server ... done`

For these families, R includes functions to directly simulate random series (`runif`

, `rexp`

, `rnorm`

, etc.). For example,

`runif(365, min = 0, max = 1)`

Simulates a series of 365 uniform numbers (between 0 and 1). This could be used, for instance, to generate random probabilities.

Weather generators are computational models which generate (daily) synthetic series of random values simulating the distribution of a particular meteorological variable. Precipitation is the most popular example due to its mixed character: discrete (dry or wet) and continuous (rain amount for wet days). The discrete part is typically characterized by the frequency of wet days (say, e.g. 0.4 in Santander). Therefore, we could simulate a random series of dry/wet (0/1) values for Santander:

```
# Parameters:
# We consider a year
ndays <- 365
# Probability of dry days occurrence for Santander.
probDry <- 0.6
# Random values with uniform distribution in the [0,1] interval
u <- runif(ndays, min = 0, max = 1)
par(mfrow = c(1,1))
plot(u, type = "p", col = "black", ylim = c(0,1))
lines(c(0,ndays),c(probDry,probDry), col = "red")
```

The days under the red line will be considered dry days.

```
dry <- rep(NaN, ndays)
for (k in 1:ndays){
if (u[k] < probDry) {
dry[k] <- 1
}else {
dry[k] <- 0
}
}
```

In the previous series we have not taken into account the autocorrelation existing in a real rain series. This autocorrelation structure can be characterized by the probability of observing a rain day after a rainy/wet day (transition probabilities).

```
# For example:
probDry2Wet <- 0.3 # Probability of observing a rain day after a dry day
probWet2Wet <- 0.5 # Probability of observing a rain day after a wet day
```

We will modify the previously generated `dry`

vector considering the autocorrelation.

```
dryWet <- dry
for (k in 2:ndays){
if (u[k-1] < probDry) {
if (u[k] < probDry2Wet){
dryWet[k] <- 0
}else {
dryWet[k] <- 1
}
}else {
if (u[k] < probWet2Wet){
dryWet[k] <- 0
}else {
dryWet[k] <- 1
}
}
}
```

Type `dry == dryWet`

to compare both synthetic series, with and without autocorrelation.

we will repeat the previous experiment but with real precipitation series from Spain. To this aim, the predefined data should be loaded:

```
# Load the data
load(url("http://meteo.unican.es/work/downscaler/aux/iberiaGSN_precip.rda"), verbose = TRUE)
```

```
## Loading objects:
## station.data
```

`str(station.data)`

```
## List of 5
## $ Variable:List of 1
## ..$ varName: chr "precip"
## $ Data : num [1:12419, 1:6] 0.9 0 1.4 0.2 1.2 0 0.2 0.1 1.2 2.7 ...
## ..- attr(*, "dimensions")= chr [1:2] "time" "station"
## $ xyCoords: num [1:6, 1:2] -2.04 2.07 -5.5 -4.01 -1.86 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
## .. ..$ : chr [1:2] "longitude" "latitude"
## $ Dates :List of 2
## ..$ start: chr [1:12419] "1979-01-01 00:00:00 GMT" "1979-01-02 00:00:00 GMT" "1979-01-03 00:00:00 GMT" "1979-01-04 00:00:00 GMT" ...
## ..$ end : chr [1:12419] "1979-01-02 00:00:00 GMT" "1979-01-03 00:00:00 GMT" "1979-01-04 00:00:00 GMT" "1979-01-05 00:00:00 GMT" ...
## $ Metadata:List of 5
## ..$ station_id : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
## ..$ altitude : int [1:6] 251 4 790 1894 704 90
## ..$ location : chr [1:6] "SAN SEBASTIAN - IGUELDO" "BARCELONA/AEROPUERTO" "SALAMANCA AEROPUERTO" "NAVACERRADA" ...
## ..$ WMO_Id : int [1:6] 8027 8181 8202 8215 8280 8410
## ..$ Koppen.class: chr [1:6] "Cfb" "Csa" "BSk" "Csb" ...
```

In this case, this is a type of object returned by downscaleR including the stations belonging the GSN Network over the Iberian Peninsula. Note that we should define a threshold to identify the dry/wet days.

```
# Useful parameters
Nsta <- dim(station.data$Data)[2] # Number of stations
Ndays <- dim(station.data$Data)[1] # Number of days
pr.threshold <- 1 # We define as dry days with precipitation below than 1mm.
```

In the previous case, the probability was predefined. However, in this case we should estimate the needed probabilities. In particular, the probability of dry days and of observing a rain/dry day after a dry day.

```
probDry <- apply(matrix(as.numeric(station.data$Data < pr.threshold
& !is.na(station.data$Data)), nrow = Ndays, ncol = Nsta),
FUN = mean, MARGIN = 2, na.rm = TRUE)
probDry2Wet <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] < pr.threshold
& !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
*matrix(as.numeric(station.data$Data[2:Ndays,] >= pr.threshold
& !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta),
FUN = mean, MARGIN = 2, na.rm = TRUE)
probWet2Wet <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] >= pr.threshold
& !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
*matrix(as.numeric(station.data$Data[2:Ndays,] >= pr.threshold
& !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta),
FUN = mean, MARGIN = 2, na.rm = TRUE)
```

Note that we should take into account that the observations could have missing data. We could estimate also the probabilities of wet days and the transitions to dry days: dry -> dry and wet -> dry

```
probWet <- apply(matrix(as.numeric(station.data$Data >= pr.threshold
& !is.na(station.data$Data)), nrow = Ndays, ncol = Nsta),
FUN = mean, MARGIN = 2, na.rm = TRUE)
probDry2Dry <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] < pr.threshold
& !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
*matrix(as.numeric(station.data$Data[2:Ndays,] < pr.threshold
& !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta),
FUN = mean, MARGIN = 2, na.rm = TRUE)
probWet2Dry <- apply(matrix(as.numeric(station.data$Data[1:(Ndays-1),] >= pr.threshold
& !is.na(station.data$Data[1:(Ndays-1),])), nrow = Ndays-1, ncol = Nsta)
*matrix(as.numeric(station.data$Data[2:Ndays,] < pr.threshold
& !is.na(station.data$Data[2:Ndays,])), nrow = Ndays-1, ncol = Nsta),
FUN = mean, MARGIN = 2, na.rm = TRUE)
```

Once the probabilities have been estimated, we could simulate the occurrence of dry days

```
occuDry <- matrix(data = 0, nrow = Ndays, ncol = Nsta)
for (i in 1:Nsta){
u <- runif(Ndays) # uniform
for (k in 2:Ndays){
if (u[k-1] < probDry[i]) {
if (u[k] < probDry2Wet[i]){
occuDry[k,i] <- 0
}else {
occuDry[k,i] <- 1
}
}else {
if (u[k] < probWet2Wet[i]){
occuDry[k,i] <- 0
}else {
occuDry[k,i] <- 1
}
}
}
}
```

and the precipitation amount for the wet days (occuDry == 0). To this aim, we need to fit a theoretical distribution to the observed data for each station. The two distributions most commonly used to simulate precipitation are the Gamma and Exponential distributions. We should load the library `MASS`

and use the `fitdistr`

function.

`library(MASS)`

`## Warning: package 'MASS' was built under R version 3.1.1`

```
? fitdistr
# Using the gamma distribution:
amount <- matrix(data = 0, nrow = Ndays, ncol = Nsta)
for (i in 1:Nsta){
indWet <- which(station.data$Data[,i] >= pr.threshold & !is.na(station.data$Data[,i]))
auxGamma <- fitdistr(station.data$Data[indWet,i],"gamma")
indWet <- which(occuDry[,i] != 1)
amount[indWet,i] <- rgamma(length(indWet), auxGamma$estimate[1], rate = auxGamma$estimate[2])
}
```

`## Warning: NaNs produced`

```
# Using the exponential distribution:
amount1 <- matrix(data = 0, nrow = Ndays, ncol = Nsta)
for (i in 1:Nsta){
indWet <- which(station.data$Data[,i] >= pr.threshold & !is.na(station.data$Data[,i]))
auxExp <- fitdistr(station.data$Data[indWet,i],"exponential")
indWet <- which(occuDry[,i] != 1)
amount1[indWet,i] <- rexp(length(indWet), rate = auxExp$estimate[1])
}
# We compare the simulated and observed time series
par(mfrow = c(2,3))
plot(station.data$Data[,5], type = "l", col = "black", ylim = c(0,100))
plot(amount[,5], type = "l", col = "black", ylim = c(0,100))
plot(amount1[,5], type = "l", col = "black", ylim = c(0,100))
indWet <- which(station.data$Data[,5] >= pr.threshold)
hist(station.data$Data[indWet,5], breaks = 150)
indWet <- which(occuDry[,5] != 1)
hist(amount[indWet,5], breaks = 150)
hist(amount1[indWet,5], breaks = 150)
```

Although most of the material included in this tutorial is original, some of the is based on the material of the course “Computing for Data Analysis” from Coursera Platform.