Introduction

In this demo we will load a R object containing climate data. In particular, the objects we are using are the output of some loading functions of the R package downscaleR, that we are using during the next days in the SPECS Workshop.

Loading a package

First of all, it is necessary to invoke the library downscaleR, whose data we are using. As previously indicated, many packages contain example data that is useful in order to illustrate examples or to show the user the characteristics or functionalities of the package they are using. After loading the package downscaleR that we have already installed, we will load one of such examples datasets:

library(downscaleR)
## Loading required package: rJava
## Loading required package: downscaleR.java
## NetCDF Java Library v4.3.22 (27 May 2014) loaded and ready
## Loading required package: maps
## downscaleR version 0.4-1 (06-Sep-2014) is loaded

Note that a number of messages appear when loading the library. Many packages include such startup messages, displaying different information deemed relevant for the user, or just as a confirmation that the library has been successfully loaded.

Loading an example dataset

Next, the command data is invoked in order to load an example dataset. In this case, this is a type of object returned by downscaleR that we call a field. However, we shall enter in details during the next sessions this week, and now we shall just concentrate on data structures and how to handle them.

data(iberia_ncep_psl, verbose = TRUE)
## name=iberia_ncep_psl:     found in Rdata.rds
# Note the verbose argument, which displays some information on screen

Remember that the command str is our best friend (excepting some particular situations) for exploring an object:

str(iberia_ncep_psl)      
## List of 4
##  $ Variable:List of 3
##   ..$ varName   : chr "psl"
##   ..$ isStandard: logi TRUE
##   ..$ level     : NULL
##  $ Data    : num [1:1805, 1:6, 1:10] 101095 100928 100760 100958 101388 ...
##   ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
##  $ xyCoords:List of 2
##   ..$ x: num [1:10] -12.5 -10 -7.5 -5 -2.5 0 2.5 5 7.5 10
##   ..$ y: num [1:6] 35 37.5 40 42.5 45 47.5
##   ..- attr(*, "projection")= chr "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
##  $ Dates   :List of 2
##   ..$ start: chr [1:1805] "1990-12-01 00:00:00 GMT" "1990-12-02 00:00:00 GMT" "1990-12-03 00:00:00 GMT" "1990-12-04 00:00:00 GMT" ...
##   ..$ end  : chr [1:1805] "1990-12-02 00:00:00 GMT" "1990-12-03 00:00:00 GMT" "1990-12-04 00:00:00 GMT" "1990-12-05 00:00:00 GMT" ...
##  - attr(*, "dataset")= chr "NCEP"
##  - attr(*, "source")= chr "ECOMS User Data Gateway"
##  - attr(*, "URL")= chr "<http://meteo.unican.es/ecoms-udg>"

The output of str says that the object loaded is a list.

class(iberia_ncep_psl)
## [1] "list"
length(iberia_ncep_psl)
## [1] 4

As we already know, a list is a special R data structure, in the sense that it is the only one able to contain data of different classes. Next we will expore its different components:

Exploring the structure of an object

In this case, each element of the list is named, so we can retrieve them bu name using the $:

The first element of the list is names Variable. It contains information of the name of the variable (as a character), a logical indicating if the variable is standard (logical) and the vertical level of the variable. In this case, the element level is empty (it is assigned a NULL):

iberia_ncep_psl$Variable
## $varName
## [1] "psl"
## 
## $isStandard
## [1] TRUE
## 
## $level
## NULL

We get an identical output as above using the [[ symbol, as in iberia_ncep_psl[["Variable"]] and iberia_ncep_psl[[1]]

The next element is named Data, but we will focus on this at the end. Next, there is an element called xyCoords, which is in turn another list:

str(iberia_ncep_psl$xyCoords)
## List of 2
##  $ x: num [1:10] -12.5 -10 -7.5 -5 -2.5 0 2.5 5 7.5 10
##  $ y: num [1:6] 35 37.5 40 42.5 45 47.5
##  - attr(*, "projection")= chr "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"

The element xyCoords contains the georeference information of the data loaded. It is composed of a named list of length 2, with components x and y, and it also has an attribute, that can be extracted as follows:

attributes(iberia_ncep_psl$xyCoords) # Note that the names of the list are another attribute
## $names
## [1] "x" "y"
## 
## $projection
## [1] "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
attributes(iberia_ncep_psl$xyCoords)$projection
## [1] "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"

The attribute is a character string. We won’t enter now in details explaining its meaning. It is enough by now to know that it contains georeference information regarding the projection of the data.

Let’s have a quick look to the spatial domain of the object:

coordinate.grid <- expand.grid(iberia_ncep_psl$xyCoords$x, iberia_ncep_psl$xyCoords$y)
plot(coordinate.grid, asp = 1, col = "blue", xlab = "lon", ylab = "lat")
require(fields) # We load the library fields to add a world map
## Loading required package: fields
## Loading required package: spam
## Loading required package: grid
## Spam version 0.41-0 (2014-02-26) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## 
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
world(add = TRUE)

plot of chunk unnamed-chunk-9

We have also the time information, which is another list:

str(iberia_ncep_psl)
## List of 4
##  $ Variable:List of 3
##   ..$ varName   : chr "psl"
##   ..$ isStandard: logi TRUE
##   ..$ level     : NULL
##  $ Data    : num [1:1805, 1:6, 1:10] 101095 100928 100760 100958 101388 ...
##   ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
##  $ xyCoords:List of 2
##   ..$ x: num [1:10] -12.5 -10 -7.5 -5 -2.5 0 2.5 5 7.5 10
##   ..$ y: num [1:6] 35 37.5 40 42.5 45 47.5
##   ..- attr(*, "projection")= chr "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"
##  $ Dates   :List of 2
##   ..$ start: chr [1:1805] "1990-12-01 00:00:00 GMT" "1990-12-02 00:00:00 GMT" "1990-12-03 00:00:00 GMT" "1990-12-04 00:00:00 GMT" ...
##   ..$ end  : chr [1:1805] "1990-12-02 00:00:00 GMT" "1990-12-03 00:00:00 GMT" "1990-12-04 00:00:00 GMT" "1990-12-05 00:00:00 GMT" ...
##  - attr(*, "dataset")= chr "NCEP"
##  - attr(*, "source")= chr "ECOMS User Data Gateway"
##  - attr(*, "URL")= chr "<http://meteo.unican.es/ecoms-udg>"

In order to unequivocally define the time resolution of a variable, there are two vectors of dates instead of one, called start and end respectively. As their name indicate, the represent give the lower (inclusive) and upper (exclusive) bounds of the time interval for which each data verifies.

Note that the dates are stored as a character string. It is strightforward in this case to convert them to the POSIXlt time class in R in order to extract time information and perform operations on the data:

start <- as.POSIXlt(iberia_ncep_psl$Dates$start, tz = "GMT")
end <- as.POSIXlt(iberia_ncep_psl$Dates$end, tz = "GMT")      
end[1]-start[1] # Difference between start and end (daily data in this case)
## Time difference of 1 days
range(start) # This gives the period encompassed by the dataset (10 years)
## [1] "1990-12-01 GMT" "2010-02-28 GMT"
unique(start$year + 1900)
##  [1] 1990 1991 1992 1993 1994 1995 1996 1997 1998
## [10] 1999 2000 2001 2002 2003 2004 2005 2006 2007
## [19] 2008 2009 2010
season <- unique(start$mon + 1) # This is the season (DJF, winter)
season
## [1] 12  1  2
month.name[season]
## [1] "December" "January"  "February"

Handling data arrays

Until now we have seen that the object loaded contains variable information, geo-referencing information and time information, but.. Where are the data?

The data are stored in the element Data of the object:

str(iberia_ncep_psl$Data)
##  num [1:1805, 1:6, 1:10] 101095 100928 100760 100958 101388 ...
##  - attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
class(iberia_ncep_psl$Data)
## [1] "array"
attributes(iberia_ncep_psl$Data)
## $dim
## [1] 1805    6   10
## 
## $dimensions
## [1] "time" "lat"  "lon"

The object is 3-dimensional data array containing the dimensions time, lat and lon.

The dim attribute returns the length of each dimensions, that we have previously inspected. So, for instance, the dimension time has a length of 1805. In this case, we know the data is daily because we have already inspected the Dates element, so we know that our datasets contains data for 1805 days.

length(iberia_ncep_psl$Dates$start)
## [1] 1805

In the same vein, the lat and lon dimensions contain 6 and 10 units respectively, as we have previously displayed in the previous figure.

length(iberia_ncep_psl$xyCoords$y)
## [1] 6
length(iberia_ncep_psl$xyCoords$x)
## [1] 10

Subsetting data arrays

So we can think of our Data array as a cube of data. Suppose we want to access the data for the 11th day for one point, say the 4th latitude and the 7th longitude:

iberia_ncep_psl$Dates$start[11] # 11 December 1990
## [1] "1990-12-11 00:00:00 GMT"
iberia_ncep_psl$xyCoords$x[7] # 2.5 degrees east
## [1] 2.5
iberia_ncep_psl$xyCoords$y[4] # 42.5 degrees north
## [1] 42.5
iberia_ncep_psl$Data[11, 2, 7] 
## [1] 100675

This is the sea-level pressure value for the selected point for that particular day.

If instead of one single day, we want the whole time series (1805 days) for that particular point:

psl.time.series <- iberia_ncep_psl$Data[, 2, 7]
length(psl.time.series)
## [1] 1805

This is the representation of the time series:

plot(psl.time.series, ty = "l", xlab = "time", ylab = "sea-level pressure (Pa)")

plot of chunk unnamed-chunk-17

Note that our data are discontinuous, because we are only considering winter time. Thus, if we want to maintain the temporal dimensions adequately represented in the plot, we have to include in the x-axis the time vector. Remember that for convenience, dates are stored as character strings, so prior to analysing times we need to convert them to a time class in R:

date.vec <- as.POSIXlt(iberia_ncep_psl$Dates$start, tz = "GMT")
plot(date.vec, psl.time.series, ty = "l", xlab = "time", ylab = "sea-level pressure (Pa)")

plot of chunk unnamed-chunk-18

Time aggregations (example of tapply usage)

The plot above is a bit congested. Actually, it is not really useful to plot a daily time series for such a long period. It may be more useful to have a annual time series, displaying the mean winter sea-level pressure for each year. Next we aggregate the data on a yearly basis using tapply:

year.vec <- date.vec$year + 1900 # vector of months for each daily observation
length(year.vec)
## [1] 1805
length(psl.time.series)
## [1] 1805
annual.ts <- tapply(psl.time.series, INDEX = year.vec, FUN = mean, na.rm = TRUE)
plot(unique(year.vec), annual.ts, ty = "o", xlab = "year", ylab = "Sea-level pressure (Pa)")

plot of chunk unnamed-chunk-19

This has been done just as an example, although in this case we are analysing winter, which is troublesome because it is a “year-crossing” season. You can find out more about this problem by typing help(getYearsAsINDEX), which is a downscaleR function intended to compute the annual aggregations as we have done now.

Suppose now that we are only interested in the data from February. Then it is necessary to filter the time series and retain only the data from February:

months <- date.vec$mon + 1 # Remember that POSIXlt months start in 0 (January)
feb.index <- which(months == 2)
psl.feb <- psl.time.series[feb.index]
yr.feb <- year.vec[feb.index]
psl.feb.series <- tapply(psl.feb, INDEX = yr.feb, FUN = mean)
plot(unique(yr.feb), psl.feb.series, ty = "o", xlab = "year", ylab = "Pa")
title(main = "February Mean Sea-level pressure")

plot of chunk unnamed-chunk-20

Spatial slices

Similarly, we can subset a particular day for the whole spatial field. In this example, we choose the Christmas day of year 2001:

day.index <- which(date.vec$mday == 25 & date.vec$mon + 1 == 12 & date.vec$year + 1900 == 2001)
day.index
## [1] 1018
spatial.field <- iberia_ncep_psl$Data[day.index,,]
str(spatial.field)
##  num [1:6, 1:10] 101548 101788 101985 102152 102048 ...

The spatial.field object is a 2D matrix with 6 rows (latitudes, y coordinates) and 10 columns (longitudes, x coordinates).

lon <- iberia_ncep_psl$xyCoords$x  
lat <- iberia_ncep_psl$xyCoords$y

Plotting spatial fields

There are many plot functions in R to display this kind of data. Here they go some examples from the “basic” R installation (plus an example from the previously loaded package fields)

Note that in oder to match the shape of the coordinates in the right order, we traspose the spatial data matrix (type ?graphics::image) for details on how this kind of data is represented.

z <- t(spatial.field)
str(z)
##  num [1:10, 1:6] 101548 101450 101418 101415 101475 ...

image function

image(lon, lat, z, asp = 1) # asp = 1 preserves the x-y aspect ratio
world(add = TRUE)

plot of chunk unnamed-chunk-24

contour function

contour(lon, lat, z)