wiki:udg/ecoms/RPackage/examples/bias

Example 3. Calculating the bias of an ensemble forecast

In this example we will calculate the bias of the multimember forecast used in the in the previous example, but this time considering the monthly aggregated data, which dramatically reduces the size of the data loaded.

First of all, we load the same data as in the previous example, but using a monthly aggregation. To this aim, first a daily aggregation is specified (time = DD, aggr.m = min), and then the aggregation function for the daily data is specified to obtain the monthly aggregation via the aggr.m argument:

ex3.cfs <- loadECOMS(dataset = "CFSv2_seasonal",
                       var = "tasmin",
                       members = 1:2,
                       lonLim = c(-15,35),
                       latLim = c(32, 75),
                       season = c(12,1,2),
                       years = 1991:2000,
                       leadMonth = 3,
                       time = "DD",
                       aggr.d = "min",
                       aggr.m = "mean")

## [2016-05-13 13:41:56] Defining homogeneization parameters for variable "tasmin"
## [2016-05-13 13:41:56] Opening dataset...
## [2016-05-13 13:41:56] The dataset was successfuly opened
## [2016-05-13 13:41:56] Defining geo-location parameters
## [2016-05-13 13:41:56] Defining initialization time parameters
## NOTE: Daily aggregation will be computed from 6-hourly data
## NOTE: Daily data will be monthly aggregated
## [2016-05-13 13:41:58] Retrieving data subset ...
## [2016-05-13 13:47:30] Done

Note the difference in size of the daily-aggregated data of the in the previous example (35.1 Mb) as compared to the new monthly-aggregated data size (1.2 Mb). This is particularly important if our goal is to perform a validation of a large domain (possibly global), that can be easily handled on a monthly basis but that will cause memory problems using daily/sub-daily data.

print(object.size(ex3.cfs), units = "Mb")
## 1.2 Mb

We the load the reference observations (WFDEI dataset) for the spatio-temporal domain previously chosen, using the same monthly aggregation (note that in this case the original data are daily, so there is no need to specify a daily aggregation):

ex3.obs <- loadECOMS(dataset = "WFDEI",
                     var = "tasmin",
                     lonLim = c(-15,35),
                     latLim = c(32, 75), 
                     season = c(12,1,2),
                     years = 1991:2000,
                     aggr.m = "mean")

## [2016-05-13 13:48:39] Defining homogeneization parameters for variable "tasmin"
## [2016-05-13 13:48:39] Opening dataset...
## [2016-05-13 13:48:39] The dataset was successfuly opened
## [2016-05-13 13:48:39] Defining geo-location parameters
## [2016-05-13 13:48:39] Defining time selection parameters
## NOTE: Daily data will be monthly aggregated
## [2016-05-13 13:48:39] Retrieving data subset ...
## [2016-05-13 13:49:08] Done

This is the map of the observed mean minimum surface temperature observed for DJF 2001-2010:

plotMeanGrid(ex3.obs)

Note that WFDEI provides data for land areas only, and its spatial resolution is finer than CFS (1º vs 0.5º). In order to compare both datasets, it is first necessary to put them in the same grid (i.e., to interpolate). We use bilinear interpolation to this aim, using the downscaleR function interpGrid in combination with the getGrid method, useful to recover the parameters defining the grid of a dataset to pass them to the interpolator:

obs.regridded <- downscaleR::interpGrid(grid = ex3.obs,
                                        new.coordinates = getGrid(ex3.cfs),
                                        method = "nearest")
## [2016-05-13 13:51:37] Calculating nearest neighbors...
## [2016-05-13 13:51:37] Performing nearest interpolation... may take a while
## [2016-05-13 13:51:37] Done
## Warning messages:
## 1: In downscaleR::interpGrid(grid = ex3.obs, new.coordinates = getGrid(ex2),  :
##   The new longitudes are outside the data extent
## 2: In downscaleR::interpGrid(grid = ex3.obs, new.coordinates = getGrid(ex2),  :
##   The new latitudes are outside the data extent

Note the warnings reminding us that the extent of the input grid is wider than that from CFS. However, in this case we can safely ignore this warnings, since all the land areas we are interest in are within the CFS domain.

plotMeanGrid(obs.regridded)

After regridding, both model data and observations are in the same grid. We can compute the bias.

In order to compute the mean bias, we need to calculate the climatologies from both the observations, and the predictions. This can be easily accomplished using the function climatology in downscaleR. By default, it will compute the mean along the time dimension, but it is flexible and any other function could be defined by the user. In addition, the logical flag by.member allows to compute the climatology on each member by sepparate, rather than on the ensemble mean:

obs.clim <- climatology(obs.regridded)
## [2016-05-13 14:19:15] - Computing climatology...
## [2016-05-13 14:19:16] - Done.
pred.clim <- climatology(ex3.cfs, by.member = TRUE)
## [2016-05-13 14:19:31] - Computing climatology...
## [2016-05-13 14:19:31] - Done.

Now, the biases can be computed, member by member, by just subtracting the observed climatology from the member climatologies. In this example, we create a new data array of the same dimensions than the predictions, include it in a new multimember grid object of the same characteristics as ex3.cfs and plot it:

n.members <- dim(pred.clim$Data)[1]
arr <- pred.clim$Data
for (i in 1:n.members) {
      arr[i,,,] <- pred.clim$Data[i,,,] - obs.clim$Data[1,,,]
}
bias.grid <- ex3.cfs
bias.grid$Data <- arr
plotMeanGrid(bias.grid, multi.member = TRUE)

NOTE: a more elaborated example to compute bias as well as other verification metrics is provided is this EXAMPLE

Last modified 6 years ago Last modified on May 13, 2016 2:27:10 PM

Attachments (3)

Download all attachments as: .zip