Changes between Version 8 and Version 9 of udg/ecoms/RPackage


Timestamp:
May 28, 2013 10:06:19 AM (8 years ago)
Author:
juaco
Comment:

--

  • udg/ecoms/RPackage

    v8 v9  
    11
    22
    3 = Overview of the R package under development
     3= Overview of the `meteoR` package
    44
    55
    66
    7 Since the [http://www.r-project.org/ R language] has been adopted for some key tasks in the EUPORIAS and SPECS projects (including the development of comprehensive validation and statistical-downscaling packages), an R package is currently under development. In the current status of this task, some functions for data exploration and access have been created. These functions allow the creation of accessible datasets from locally stored climate files, the creation of data inventories providing an overview of the characteristics of the data (variables stored, units, time resolution ...) and accessing local and remote datasets in a straightforward manner by means of simple arguments, allowing the retrieval of dimensional slices of observational, reanalysis and forecast (System4) climate data. A full R package with added capabilities (including specific plot methods) and access to new datasets will soon be released for the SPECS/EUPORIAS community, as soon as new databases are incorporated into the SPECS-EUPORIAS THREDDS Data Server and new user needs and requirements are identified and discussed.
     7Since the [http://www.r-project.org/ R language] has been adopted for some key tasks in the EUPORIAS and SPECS projects (including the development of comprehensive validation and statistical-downscaling packages), an R package (`meteoR`) is currently under development. In the current status of this task, some functions for data exploration and access have been created. These functions allow the creation of accessible datasets from locally stored climate files, the creation of data inventories providing an overview of the characteristics of the data (variables stored, units, time resolution ...) and accessing local and remote datasets in a straightforward manner by means of simple arguments, allowing the retrieval of dimensional slices of observational, reanalysis and forecast (System4) climate data. A full R package with added capabilities (including specific plot methods) and access to new datasets will soon be released for the SPECS/EUPORIAS community, as soon as new databases are incorporated into the SPECS-EUPORIAS THREDDS Data Server and new user needs and requirements are identified and discussed.
    88
    99
    1010== Vocabulary definition
    1111
    12 In order to set a common framework with a precise definition of the variables, the R package is based on the use of a vocabulary. Essentially, the vocabulary is simply a table containing the standard names of a number of variables commonly used in impact studies and downscaling applications. The naming conventions and the units are based on the standard name table provided by the [http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/16/cf-standard-name-table.html/ NetCDF Climate and Forecast Metadata Convention]. The vocabulary consists of a table with:
     12In order to set a common framework with a precise definition of the variables, the `meteoR` package is based on the use of a vocabulary. In essence, the vocabulary is a table containing the standard names of a number of variables commonly used in impact studies and downscaling applications, subject to permanent revision or addition of new standard variables. The naming conventions and the units are based on the standard name table provided by the [http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/16/cf-standard-name-table.html/ NetCDF Climate and Forecast Metadata Convention]. The vocabulary consists of a table with:
    1313
    14 * `Identifier`: this is the standard name that the loading functions require as argument when we set the `standard.vars` argument to `TRUE`.
     14* `Identifier`: this is the standard name that the loading functions require as argument when we set the `standard.vars` argument to `TRUE` (see the [wiki:SpecsEuporias/RPackage/Functions defined functions]).
    1515* `Standard_name`: standard name of the variable as defined by the CF convention.
    1616* `Units`: units in which the standard variable is returned
     
    3737
    3838The dictionary is a table whose aim is twofold:
    39  1. On the one hand, the dictionary is intended for the translation of generic variables, as idiosyncratically defined in each particular dataset, to the standard variables defined in the vocabulary with their corresponding nomenclature and units. This is achieved by providing a correspondence between the name of the variable as encoded in the dataset (`short_name`) and the corresponding name of the standard variable as defined in the vocabulary (`identifier`), and by applying the corresponding transformation to the native variable in order to match the standard units by means of a `scale` factor and an `offset`.
    40  2. In addition, the dictionary provides additional metadata often not explicitly declared in the datasets, regarding the ''time'' aggregation of the dataset (often referred to as the ''cell method''). This includes the fields `time_step`, which is merely informative, and describes the time interval between two consecutive values, and the `lower_time_bound` and `upper_time_bound`, which are the values that should be summed to each verification time to unequivocally delimit the time span encompassed by each value.
     39 1. On the one hand, the dictionary is intended for the translation of generic variables, as idiosyncratically defined in each particular dataset, to the standard variables defined in the vocabulary with their corresponding nomenclature and units. This is achieved by providing a correspondence between the name of the variable as encoded in the dataset (`short_name`) and the corresponding name of the standard variable as defined in the vocabulary (`identifier`), and by applying the corresponding transformation to the native variable in order to match the standard units by means of a `scale` factor and an `offset`. In some particular cases (e.g. the precipitation provided by the System4 model outputs), the variables are also deaccumulated.
     40 2. The dictionary also provides additional metadata often not explicitly declared in the datasets, regarding the ''time'' aggregation of the dataset (often referred to as the ''cell method''). This includes the fields `time_step`, which is merely informative, and describes the time interval between two consecutive values, and the `lower_time_bound` and `upper_time_bound`, which are the values that should be summed to each verification time to unequivocally delimit the time span encompassed by each value.
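
The two roles of the dictionary described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the `meteoR` implementation. The function names are hypothetical, and three points are assumptions not stated in the text: that the `scale` factor is applied before the `offset`, that the time bounds are expressed in hours, and that deaccumulation means taking differences between consecutive accumulated values.

```python
from datetime import datetime, timedelta


def to_standard_units(native_value, scale, offset):
    """Linear conversion to the standard units of the vocabulary.

    Assumption: standard = native * scale + offset (order not stated in the text).
    """
    return native_value * scale + offset


def deaccumulate(accumulated):
    """Turn values accumulated since the start into per-step amounts.

    Assumption: the first value is kept unchanged and each later value
    is replaced by its difference from the previous one.
    """
    if not accumulated:
        return []
    steps = [b - a for a, b in zip(accumulated, accumulated[1:])]
    return [accumulated[0]] + steps


def time_span(verification_time, lower_time_bound, upper_time_bound):
    """Delimit the interval covered by a value.

    Both bounds are added to the verification time.
    Assumption: the bounds are given in hours.
    """
    start = verification_time + timedelta(hours=lower_time_bound)
    end = verification_time + timedelta(hours=upper_time_bound)
    return start, end


# Hypothetical usage: a temperature in kelvin converted to degrees Celsius,
# and a daily value whose span runs from the verification time to 24 h later.
celsius = to_standard_units(300.0, scale=1.0, offset=-273.15)
span = time_span(datetime(2013, 5, 28, 0), 0, 24)
per_step = deaccumulate([1.0, 3.0, 6.0])  # [1.0, 2.0, 3.0]
```
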
    4141       
    42 The dictionary is a comma-separated text file (csv) that by default has the same name as the dataset, with the extension ''.dic'', and is stored in the same directory as the dataset, although its name and location can differ if adequately specified in the loading functions. The dictionary must be created ''"by hand"'' by the user, because it requires some ''a priori'' knowledge about the characteristics of the data stored in the dataset, which can be partly obtained using the function [https://www.meteo.unican.es/trac/meteo/wiki/SpecsEuporias/RPackage/Functions dataInventory]. The columns of the dictionary are described next:
     42The dictionary is a comma-separated text file (csv) that by default has the same name as the dataset, with the extension ''.dic'', and is stored in the same directory as the dataset, although its name and location can differ if adequately specified in the loading functions by the argument `dictionary`. The dictionary must be created ''"by hand"'' by the user, because it requires some ''a priori'' knowledge about the characteristics of the data stored in the dataset, which can be partly obtained using the function [wiki:SpecsEuporias/RPackage/Functions#dataInventory dataInventory]. The columns of the dictionary are described next:
    4343 
    4444 * `identifier`: this is the name of the standard variable, as defined in the vocabulary
     
    6464
    6565
    66 Note that the names of the columns are important (not so their relative order), because the `loadData` and `loadObservations` R functions will perform the conversion of the variable to the standard format by finding the corresponding values by the name of the columns.
     66Note that column names matter (their relative order does not), because the data load functions will perform the conversion of the variables to the standard format by finding the corresponding values by the name of the columns. The [https://www.meteo.unican.es/trac/meteo/attachment/wiki/SpecsEuporias/meteoR_v1_0.zip meteoR] package includes some dictionaries, and specific examples are given in the [wiki:SpecsEuporias/RPackage/Examples Examples section].
    6767
    6868
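
Why column names matter while column order does not can be shown with a small sketch. It is written in Python for illustration only and does not reproduce the package's R loading functions. The helper `find_entry` and the dictionary rows (`tas`, `2t` and the numeric values) are hypothetical. Only the column names mentioned in the text are used, which may not be the complete set.

```python
import csv
import io

# Two hypothetical dictionaries with identical content but different column order.
DICTIONARY_A = """identifier,short_name,time_step,lower_time_bound,upper_time_bound,scale,offset
tas,2t,6h,0,6,1,-273.15
"""

DICTIONARY_B = """short_name,offset,scale,identifier,upper_time_bound,lower_time_bound,time_step
2t,-273.15,1,tas,6,0,6h
"""


def find_entry(dictionary_text, identifier):
    """Return the dictionary row for a standard variable, looked up by column name."""
    reader = csv.DictReader(io.StringIO(dictionary_text))
    for row in reader:
        if row["identifier"] == identifier:
            # Plain dict, so comparisons ignore the column order of the file.
            return dict(row)
    raise KeyError("no dictionary entry for identifier " + repr(identifier))


entry_a = find_entry(DICTIONARY_A, "tas")
entry_b = find_entry(DICTIONARY_B, "tas")

# Values are retrieved by name, so both files yield the same conversion parameters.
scale = float(entry_a["scale"])
offset = float(entry_b["offset"])
```

Renaming a column (for example `offset` to `add_offset`) would break the lookup, whereas reordering the columns has no effect.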