udg/ecoms/RPackage/homogeneization (Version 7)

Timestamp: Feb 14, 2014 1:22:07 PM
Author: juaco
The heterogeneous nature of climate products, models and variables, together with the idiosyncratic naming and storage conventions often applied by the various modelling centres, makes a prior homogenization across datasets necessary in order to implement a truly user-friendly toolbox for data access. The `ecomsUDG.Raccess` package achieves this aim by defining a common ''vocabulary'' for all climate datasets. The particular variables of each dataset are translated (and transformed if necessary) into the common vocabulary by means of a ''dictionary''. Both features are described next.

== Vocabulary definition

In order to set a common framework with a precise definition of the variables, the `ecomsUDG.Raccess` package relies on a vocabulary. In essence, the vocabulary is a table containing the standard names of a number of variables commonly used in impact studies and downscaling applications. It is subject to ongoing revision and to the addition of new standard variables. The naming conventions and the units are based on the standard name [http://www.specs-fp7.eu/wiki/index.php/Data#SPECS_convention:_Standard_output_and_data_management_description table] provided within the SPECS project. In case of conflict, and in order to maximize the interoperability of the vocabulary, the nomenclature is also compliant with the [http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/16/cf-standard-name-table.html/ NetCDF Climate and Forecast Metadata Convention].

The vocabulary contains the following columns:
 * `identifier`: the standard name that the `loadSeasonalForecast` function recognizes automatically.
 * `standard_name`: the standard name of the variable as defined by the CF convention.
 * `units`: the units in which the standard variable is returned.

The vocabulary is included as a built-in dataset of the `ecomsUDG.Raccess` package in order to provide the user with a reference of the standard variables:
{{{
> library(ecomsUDG.Raccess)
Loading required package: rJava
Loading required package: sp
> data(vocabulary)
> print(vocabulary)
  identifier              standard_name           units
1        tas        2-meter temperature degrees Celsius
2     tasmax    maximum 2-m temperature degrees Celsius
3     tasmin    minimum 2-m temperature degrees Celsius
4         tp Total precipitation amount              mm
5        psl  air pressure at sea level              Pa
}}}
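
The vocabulary works as a lookup table keyed by `identifier`. The sketch below is for illustration only. It rebuilds the five rows printed above as a plain `data.frame`, so that it runs without the package installed. It then uses a hypothetical helper, `vocabularyUnits`, to check whether a requested variable is a standard one and to report its units. This helper is an assumption for illustration and is not part of `ecomsUDG.Raccess`.

{{{
# Illustrative copy of the vocabulary printed above (not the package dataset)
vocabulary <- data.frame(
  identifier    = c("tas", "tasmax", "tasmin", "tp", "psl"),
  standard_name = c("2-meter temperature", "maximum 2-m temperature",
                    "minimum 2-m temperature", "Total precipitation amount",
                    "air pressure at sea level"),
  units         = c("degrees Celsius", "degrees Celsius", "degrees Celsius",
                    "mm", "Pa"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the standard units of a variable and stops
# with an informative error if the identifier is not in the vocabulary
vocabularyUnits <- function(var, vocab = vocabulary) {
  idx <- match(var, vocab$identifier)
  if (is.na(idx)) {
    stop("'", var, "' is not a standard variable. Valid identifiers: ",
         paste(vocab$identifier, collapse = ", "))
  }
  vocab$units[idx]
}

vocabularyUnits("tasmax")  # "degrees Celsius"
vocabularyUnits("tp")      # "mm"
}}}

Failing early on an unknown identifier, and listing the valid ones in the error message, tells the user immediately which names the vocabulary accepts.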

The dictionary is a table whose aim is twofold:
 1. First, the dictionary translates the variables, as idiosyncratically defined in each particular dataset, into the standard variables defined in the vocabulary, with their corresponding nomenclature and units. To do this, it maps the name of the variable as encoded in the dataset (i.e., the variable name returned by the `datasetInventory` function) to the name of the corresponding standard variable in the vocabulary (i.e., its `identifier`). It also transforms the native variable to match the standard units by means of an `offset` and a `scale` factor. In some particular cases (e.g. the precipitation provided by the System4 model outputs), the variables are also de-accumulated.
 2. Second, the dictionary provides additional metadata, often not explicitly declared in the datasets, regarding the ''time'' aggregation of the data (often referred to as the ''cell method''). This includes the field `time_step`, which is merely informative and describes the time interval between two consecutive values. It also includes the fields `lower_time_bound` and `upper_time_bound`, which are the values that should be added to each verification time to unequivocally delimit the time span encompassed by each value.

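The conversion described in the first item can be sketched as follows. This is a minimal illustration, not the package's internal code. It makes two assumptions. First, a de-accumulated series is the difference between consecutive accumulated totals, with the first value kept as is. Second, the standard value is computed as `native * scale + offset`, after de-accumulation. The function name `applyDictionaryEntry` and the sample numbers are hypothetical.

{{{
# Hypothetical helper (not part of ecomsUDG.Raccess): converts a native series
# into the standard variable using the fields of one dictionary row.
# Assumption: standard = native * scale + offset, applied after de-accumulation.
applyDictionaryEntry <- function(x, offset = 0, scale = 1, deaccum = 0) {
  if (deaccum == 1) {
    # De-accumulate: each value becomes the amount accumulated since the
    # previous time step; the first value is kept as is
    x <- c(x[1], diff(x))
  }
  x * scale + offset
}

# Temperature in Kelvin to degrees Celsius (offset = -273.15, scale = 1)
applyDictionaryEntry(c(273.15, 283.15), offset = -273.15)
# returns c(0, 10), up to floating-point rounding

# Precipitation accumulated since the forecast start, in metres (an assumed
# native unit), converted to mm per time step (scale = 1000, deaccum = 1)
applyDictionaryEntry(c(0.002, 0.005, 0.005, 0.012), scale = 1000, deaccum = 1)
# returns c(2, 3, 0, 7), up to floating-point rounding
}}}

De-accumulating before scaling keeps a non-zero `offset` from being cancelled out by the differencing.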
In essence, the dictionary is a comma-separated text file (csv). By default, it is named after the argument `dataset` of the `loadSeasonalForecast` function. The dictionaries for the currently available datasets are included in the `ecomsUDG.Raccess` package, within the ''dictionaries'' folder. `loadSeasonalForecast` reads them internally to undertake the conversions needed to return the standard variables, so by default the user does not need to deal with them. A user who wants to inspect a particular dictionary can proceed as follows:

{{{
> ip <- installed.packages()
> # Path to the installed library
> libPath <- ip[grep("ecomsUDG.Raccess", ip[ ,1]), 2]
> # Path to the dictionaries folder
> dicPath <- paste(libPath, "/ecomsUDG.Raccess/dictionaries", sep = "")
> list.files(dicPath)
[1] "CFSv2_seasonal_16.csv"   "System4_annual_15.csv"   "System4_seasonal_15.csv"
[4] "System4_seasonal_51.csv"
> # Read the CFSv2 dictionary
> dic.cfs <- read.csv(list.files(dicPath, full.names = TRUE)[grep("CFSv2", list.files(dicPath))])
> str(dic.cfs)
'data.frame':      5 obs. of  9 variables:
 $ identifier      : Factor w/ 5 levels "psl","tas","tasmax",..: 3 4 2 5 1
 $ short_name      : Factor w/ 5 levels "Maximum_temperature_height_above_ground" ...
 $ time_step       : Factor w/ 1 level "6h": 1 1 1 1 1
 $ lower_time_bound: int  0 0 0 0 0
 $ upper_time_bound: int  6 6 6 6 0
 $ aggr_fun        : Factor w/ 5 levels "max","mean","min",..: 1 3 2 5 4
 $ offset          : num  -273 -273 -273 0 0
 $ scale           : int  1 1 1 21600 1
 $ deaccum         : int  0 0 0 0 0
}}}
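
A dictionary is an ordinary csv file, so its rows can be handled with base R. The sketch below uses a small hypothetical dictionary written inline with `read.csv(text = ...)`. The `short_name` values are placeholders; the remaining fields mimic the CFSv2 entries shown above. The code selects the entry of a standard variable by its `identifier` and reads the fields by column name rather than by position. The helper `getDictionaryEntry` is hypothetical.

{{{
# Hypothetical dictionary; the short_name values are placeholders
dicText <- "identifier,short_name,time_step,lower_time_bound,upper_time_bound,aggr_fun,offset,scale,deaccum
tasmax,native_tasmax,6h,0,6,max,-273.15,1,0
tp,native_tp,6h,0,6,sum,0,21600,0
psl,native_psl,6h,0,0,none,0,1,0"
dic <- read.csv(text = dicText, stringsAsFactors = FALSE)

# Hypothetical helper: returns the single dictionary row of a standard variable
getDictionaryEntry <- function(dic, var) {
  row <- dic[dic$identifier == var, , drop = FALSE]
  if (nrow(row) != 1) {
    stop("Expected exactly one dictionary entry for '", var, "'")
  }
  row
}

entry <- getDictionaryEntry(dic, "tp")
entry$short_name  # "native_tp"
entry$scale       # 21600
}}}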

The latest version of the dictionaries can be checked out in the development version of the package at the [https://github.com/SantanderMetGroup/ecomsUDG.Raccess/tree/master/inst/dictionaries GitHub repository].

The columns of the dictionary are described next:

 * `identifier`: the name of the standard variable, as defined in the vocabulary.

 * `deaccum`: a logical flag (0 = FALSE, 1 = TRUE) indicating whether the variable should be de-accumulated at each time step. This is typically applied to precipitation in some forecast datasets.
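
The sketch below shows how `lower_time_bound` and `upper_time_bound` delimit the period represented by each value: it adds both bounds to a vector of verification times. It assumes that the bounds are expressed in hours, as the `time_step` values (e.g. `6h`) suggest. The helper `timeBounds` and the dates are hypothetical.

{{{
# Hypothetical helper: time span covered by each value.
# Assumption: lower and upper bounds are given in hours.
timeBounds <- function(times, lower, upper) {
  data.frame(start = times + lower * 3600,
             end   = times + upper * 3600)
}

# Two 6-hourly verification times (UTC)
times <- as.POSIXct(c("2001-01-01 00:00", "2001-01-01 06:00"), tz = "UTC")

# Aggregated variable (e.g. tasmax in the CFSv2 dictionary): bounds (0, 6),
# so each value spans the 6 hours following its verification time
timeBounds(times, lower = 0, upper = 6)

# Instantaneous variable (e.g. psl in the CFSv2 dictionary): bounds (0, 0)
timeBounds(times, lower = 0, upper = 0)
}}}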