# Changes between Version 3 and Version 4 of udg/ecoms/RPackage

Timestamp: May 17, 2013 10:01:02 AM

== Overview of the R package under development ==

Version 3:

Since the [http://www.r-project.org/ R language] has been adopted for some key tasks in the EUPORIAS and SPECS projects (including the development of comprehensive validation and statistical-downscaling packages), an R package is currently under development. At the current stage of this task, a first function has been created and a first trial version is already available for exploring and accessing the data portal in a user-friendly way, allowing the retrieval of dimensional slices of selected simulation members from the ECMWF's SYSTEM4 model. A full R package with added capabilities (including specific plot methods) and access to new datasets will be released for the SPECS/EUPORIAS community as soon as new simulation datasets are incorporated into the project's THREDDS Data Server and new user needs and requirements are identified and discussed.

Version 4:

Since the [http://www.r-project.org/ R language] has been adopted for some key tasks in the EUPORIAS and SPECS projects (including the development of comprehensive validation and statistical-downscaling packages), an R package is currently under development. At the current stage of this task, some functions for data exploration and access have been created. These functions allow the creation of accessible datasets from locally stored climate files, the creation of data inventories providing an overview of the characteristics of the data (variables stored, units, time resolution ...) and access to local and remote datasets in a straightforward manner by means of simple arguments, allowing the retrieval of dimensional slices of observational, reanalysis and forecast (System4) climate data. A full R package with added capabilities (including specific plot methods) and access to new datasets will be released for the SPECS/EUPORIAS community as soon as new databases are incorporated into the SPECS-EUPORIAS THREDDS Data Server and new user needs and requirements are identified and discussed.

Version 3:

'''Vocabulary definition'''

In order to set a common framework of work allowing a precise definition of the variables, the R package is based on the use of a vocabulary, containing the standard names of a number of variables commonly used in impact studies and downscaling applications. The naming conventions and the units are based on the standard name table provided by the [http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/16/cf-standard-name-table.html/ NetCDF Climate and Forecast Metadata Convention]. The vocabulary consists of a table with

* Identifier: this is the standard name that the loading functions require as argument when we set the standard.vars' argument to TRUE.
* Standard_name: standar name of the variable as defined by the CF convention.

Version 4:

== Vocabulary definition ==

In order to set a common framework with a precise definition of the variables, the R package is based on the use of a vocabulary. Essentially, the vocabulary is a table containing the standard names of a number of variables commonly used in impact studies and downscaling applications. The naming conventions and the units are based on the standard name table provided by the [http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/16/cf-standard-name-table.html/ NetCDF Climate and Forecast Metadata Convention]. The vocabulary consists of a table with:

* Identifier: this is the standard name that the loading functions require as argument when we set the standard.vars argument to TRUE.
* Standard_name: standard name of the variable as defined by the CF convention.

Unmodified:

* Units: units in which the standard variable is returned
{{{

Version 3:

'''Dictionary'''

The dictionary is a table that defines the conversion between the variables of the model and the standard variables defined in the Vocabulary. The dictionary is a comma-separated text file (csv) that has the same name as the dataset and the extension .dic. In addition, it should be stored in the same directory as the dataset. The user must create the dictionary by hand, because doing so requires knowledge of the characteristics of the data stored in the dataset. The columns of the dictionary are described next:

Version 4:

== Dictionary ==

The dictionary is a table that defines the conversion between the variables of the model and the standard variables defined in the Vocabulary. The dictionary is a comma-separated text file (csv) that has the same name as the dataset and the extension ''.dic''. In addition, it should be stored in the same directory as the dataset. The user must create the dictionary by hand, because doing so requires knowledge of the characteristics of the data stored in the dataset. The columns of the dictionary are described next:

Unmodified:

* identifier: this is the name of the standard variable, as defined in the vocabulary
* short_name: this is the name with which the original variable has been coded in the dataset
* time_step: the time interval between consecutive times in the time dimension axis (in hours)
* lower_time_bound: lower time bound of the variable
* upper_time_bound: upper time bound of the variable. For instance, if a variable has identical lower and upper time bounds, it means that it is instantaneous.
* aggr_fun: time aggregation function. Type of aggregation function applied to the variable between the lower and upper time bound.
* offset: constant added to the original variable for units conversion (e.g.: offset = -273.15 for conversion from Kelvin to Celsius)
* scale: scale factor applied to the original variable for units conversion (e.g.: scale = 1000 for conversion from m to mm)
{{{

Version 3:

Note that the column names matter (their relative order does not), because the loadData.R' function converts the variable to the standard format by looking up the corresponding values by column name.

Version 4:

Note that the column names matter (their relative order does not), because the loadData.R and loadObservations.R functions convert the variable to the standard format by looking up the corresponding values by column name.
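The lookup by column name and the offset/scale conversion described above can be illustrated with a small sketch. It is written in Python for illustration only and is not the package's R code. The sample rows, the helper names `load_dictionary` and `to_standard`, the identifiers and short names, the `aggr_fun` values, and the order of operations (scale applied first, then offset) are all assumptions rather than documented behaviour of loadData.R.

```python
import csv
import io

# Hypothetical dictionary contents. The columns are deliberately shuffled
# to show that only the column names matter, not their order.
SAMPLE_DIC = """short_name,identifier,scale,offset,time_step,lower_time_bound,upper_time_bound,aggr_fun
t2m,tas,1,-273.15,6,0,0,none
msl,psl,0.01,0,6,0,0,none
"""


def load_dictionary(text):
    """Parse dictionary text into {identifier: row}, reading fields by name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["identifier"]: row for row in reader}


def to_standard(value, entry):
    """Convert a raw value to standard units.

    Assumption: the scale factor is applied before the offset.
    """
    return value * float(entry["scale"]) + float(entry["offset"])


dictionary = load_dictionary(SAMPLE_DIC)
celsius = to_standard(293.15, dictionary["tas"])        # Kelvin -> Celsius
hectopascals = to_standard(101325.0, dictionary["psl"])  # Pa -> hPa
```

Because each row is read into a mapping keyed by column name, reordering the columns in the file does not change the result, whereas renaming a column would make the lookup fail.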