{{{#!comment
{{{#!div style="text-align:justify"
}}}
}}}

'''The SPECS-EUPORIAS Data Server'''

Different sector-specific impact activities to be undertaken in the [http://www.specs-fp7.eu SPECS] and [http://www.euporias.eu EUPORIAS] projects require a reduced number of variables (typically at the surface) from different data sources (mainly seasonal forecasts, reanalysis, and observations). The [http://www.meteo.unican.es/tds5/catalogs/system4/System4Datasets.html SPECS-EUPORIAS Data Server] has been established by the Santander Meteorology Group (UC-CSIC) as part of the data management activities in these projects to provide a single point of access to these impact-relevant variables, gathered from existing datasets. The data portal is based on a THREDDS data server providing metadata and data access using OPeNDAP and other remote data access protocols. Moreover, a user-friendly [http://www.r-project.org R] package has also been developed for exploring and remotely accessing subsets of data, thus reducing the burden of data access in these activities. This package will also be a key component of other R-based tasks in the projects, including the validation and downscaling packages to be developed within SPECS and the sector-specific calibration and modeling tools to be developed in EUPORIAS.

This trac/wiki page provides an up-to-date description of the SPECS-EUPORIAS Data Server, including information on the available datasets and the documentation and code of the R data access package.
This page is currently under construction, but both a first tutorial describing the basic functioning and a first version of the R package (an R function) are already available:

 * '''Dataset catalog:''' [http://www.meteo.unican.es/tds5/catalogs/system4/System4Datasets.html]
 * '''R code:''' [mtl:browser:MLToolbox/trunk/MLToolbox_experiments/antonio/system4/r/loadSystem4.R loadSystem4.R]
 * '''Tutorial:''' [attachment:DataPortal_Tutorial.pdf PDF file]

Contents (under development):

[[TOC(SpecsEuporias,depth=2,inline,noheading)]]

 1. [wiki:DataServer Data Server]
   * [wiki:DataPortal/The THREDDS Data Server]
   * [wiki:DataPortal/Datasets Available Datasets]
 2. [wiki:RPackage R Package for Data Access]
   * [wiki:RPackage/Authentication Authentication]
   * [wiki:RPackage/Datasets Available datasets]
   * [wiki:RPackage/Examples Examples]
 3. [wiki:Interfaces Other interfaces for Data Access]
   * [wiki:RPackage/Python Python]
   * [wiki:RPackage/Matlab Matlab]

= Introduction and Motivation = #s.intro

The impact activities on seasonal timescales involved in the [http://www.specs-fp7.eu SPECS] and [http://www.euporias.eu EUPORIAS] projects require the use of different data sources (mainly seasonal forecasts, reanalysis, and observations). These activities include the calibration, downscaling, and modelling of sector-specific indices in agriculture, energy, health, etc., building on meteorological information. Typically, these activities require only a reduced subset of surface variables (precipitation, temperatures, mean sea level pressure, etc.) or of variables on a reduced number of vertical levels (circulation and thermodynamic drivers at, e.g., 850, 500, 200 hPa). The ''SPECS-EUPORIAS Data Portal'' has been established by the '''Santander Meteorology Group (UC-CSIC)''' to gather the relevant information from existing datasets in order to provide a single, homogenized access point to the data for the SPECS and EUPORIAS partners (in particular for impact users).
The ''SPECS-EUPORIAS Data Portal'' is based on a `THREDDS Data Server` ([http://www.unidata.ucar.edu/projects/THREDDS/ TDS]) providing metadata and data access using `OPeNDAP` and other remote data access protocols. Moreover, since the `R` language (http://www.r-project.org) has been adopted for some key tasks in these projects (including the development of comprehensive validation and statistical-downscaling packages), a user-friendly `R` package has been developed to explore and access the data portal. This package can be used in `R` programs to remotely access subsets of data, thus reducing the burden of data access (versions for Python and Matlab are also available on request). This package will be continuously updated (check the documentation URL above for updates) as part of the data management activities to build a data bridge for impact users and for the `R` developments to be carried out in these projects.

This document briefly describes the current state of the data portal, which has initially focused on data from ''ECMWF's System4 seasonal model'', as agreed in the downscaling parallel session of the kick-off meeting.

= The THREDDS Data Server = #s.thredds

The ''SPECS-EUPORIAS Data Portal'' is based on a password-protected `THREDDS data server` (`TDS`) providing metadata and data access to a set of georeferenced atmospheric variables using OPeNDAP and other remote data access protocols. The variable names, units and additional metadata follow the [http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.4/cf-conventions.html CF convention]. The variables are spatial grids based on multidimensional arrays of indexed values, following Unidata's ''_Coordinate convention''[[FootNote(http://www.unidata.ucar.edu/software/netcdf-java/reference/CoordinateAttributes.html)]][[FootNote(http://www.unidata.ucar.edu/software/netcdf-java/tutorial/GridDatatype.html)]].
Typically the data portal will include information at daily resolution, but monthly-aggregated values could also be provided in some cases due to data limitations (in particular, ''Météo-France'' and the ''Met Office'' have agreed to provide monthly mean hindcasts for use by the ''SPECS'' and ''EUPORIAS'' partners). In general, the data available will be typical surface variables (e.g. precipitation and near-surface temperature), although several variables (e.g. geopotential and temperature) on pressure levels will also be stored for the statistical downscaling activities.

The data gathering activities have initially focused on the ''ECMWF System4 seasonal model''. The Meteorological Archival and Retrieval System (`MARS`) is the main repository of meteorological data at the ''ECMWF'' (European Centre for Medium-Range Weather Forecasts). It contains terabytes of operational and research data as well as data from special projects[[FootNote(http://www.ecmwf.int/services/archive/)]]. The large amount of information stored and the inherent complexities of data access, download and post-processing are a first obstacle to the flexible use of these datasets by a large number of partners. To overcome this issue, a reduced subset of surface variables[[FootNote(http://www.ecmwf.int/products/changes/system4/technical_description.html#description)]] (precipitation, temperatures and mean sea level pressure) has been downloaded from `MARS` (as a collection of `GRIB-1` files) at 0.75° spatial resolution and made available through the ''SPECS-EUPORIAS data portal''. The downloaded data have been exposed as three different virtual datasets using `TDS`:

 * '''System4 seasonal range (15 members)''': There are twelve initializations (hereafter called `runtimes`) per year (the first of January, February, ...) running for 7 months (hereafter called simply `times`). An ensemble of 15 members is available for the whole 1981-2010 period.
 * '''System4 seasonal range (51 members)''': There are only four `runtimes` per year (the first of February, May, August and November) and the forecasts run for 7 months. An ensemble of 51 members is available for the whole 1981-2010 period.
 * '''System4 annual range (15 members)''': As in the previous case, there are four `runtimes` per year, but the forecasts run for 13 months. An ensemble of 15 members is available for the whole 1981-2010 period.

Data gathering activities will next move to the CFS (http://cfs.ncep.noaa.gov) version 2 hindcast, developed at the ''Environmental Modeling Center at NCEP'', and also to reanalysis and observational datasets.

Although the `TDS` provides a web interface to explore and access the datasets (shown in the [#s.web.access web access section]), it is strongly recommended to use `OPeNDAP` (a.k.a. `DODS`) client libraries to remotely access the data from scientific computing environments (`R`, `Matlab`, `Python`, etc.). For instance, the `R` function provided in this tutorial is based on the ''NetCDF Java'' OPeNDAP client[[FootNote(http://www.unidata.ucar.edu/software/netcdf-java/documentation.htm)]], using the `rJava` `R` package (a similar approach is also used for the `Matlab` implementation). Alternatively, the most recent ''NetCDF library'' versions provide access to `OPeNDAP` datasets (this is the solution used for the `Python` implementation).

In the following, we show a simple example of data access using the `R` package developed as part of the data portal. In particular, the ''System4'' datasets can be directly accessed using the `loadSystem4` function, allowing the retrieval of slices for a particular variable along any of the dataset dimensions (`member`/space/`runtime`/`time`). Note that a more elaborate worked example using `R` is shown in the [#Appendix.rexample R example section].
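As an aside, the `runtime` axis of the three datasets listed above can be made concrete with a short Python sketch that maps an initialization date to its 0-based position along that axis. It assumes (this is not stated explicitly in the text) that the runs are stored in chronological order, starting with the first initialization of 1981; the helper name `run_index` is hypothetical and is not part of the data portal tools.

{{{#!python
# Sketch: position of an initialization (runtime) along the run axis.
# ASSUMPTION: runs are stored chronologically, starting with the first
# initialization of 1981; run_index is a hypothetical helper.

SEASONAL_15 = list(range(1, 13))   # twelve initializations per year
SEASONAL_51 = [2, 5, 8, 11]        # February, May, August, November

def run_index(year, month, init_months, first_year=1981):
    """Return the 0-based run index for the given initialization."""
    if month not in init_months:
        raise ValueError("no initialization in month %d" % month)
    return (year - first_year) * len(init_months) + init_months.index(month)

# December initializations of the 15-member seasonal dataset, 1981-2010
december_runs = [run_index(y, 12, SEASONAL_15) for y in range(1981, 2011)]
print(december_runs[:3], december_runs[-1], len(december_runs))
}}}

Under this assumption the December initializations of the 15-member seasonal dataset fall at positions 11, 23, ..., 359, i.e. one run per year of the 1981-2010 period.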
Moreover, for a better understanding of the structure of the datasets, the use of the web interface for the `OPeNDAP` service is also illustrated in the [#s.web.access web access section].

= Accessing the Data Portal via `R` = #s.r.access

= Accessing the Data Portal via Web = #s.web.access

The ''SPECS-EUPORIAS Data Portal'' can be accessed through the '''Data Portal URL''' provided in the abstract. First of all, an authentication dialog will request a valid user name and password.

[[Image(loginTHREDDS.png,align=center,width=320px,title=Authentication dialog)]]

{{{#!comment
\begin{figure}[H]
\begin{center}
\includegraphics[width = 0.8 \linewidth]{loginTHREDDS.png}
\caption{Authentication dialog}
\label{fig:login}
\end{center}
\end{figure}
}}}

Afterwards, the different datasets described in the [#s.thredds TDS section] are listed as links in the web browser window.

[[Image(fig01.png,align=center,width=320px,title=Catalog of the EUPORIAS-SPECS System4 datasets. Note that although they only include a few variables, their sizes range from one to four terabytes)]]

{{{#!comment
\begin{figure}[H]
\begin{center}
\includegraphics[width = 0.85 \linewidth]{fig01.png}
\caption{Catalog of the EUPORIAS-SPECS System4 datasets. Note that although they only include a few variables, their sizes range from one to four terabytes. }
\label{fig:dir}
\end{center}
\end{figure}
}}}

By clicking on any of the datasets, a new window will appear providing information on the variables and the geospatial and time coverages, and offering different options for data access and/or visualization.

[[Image(fig02.png,align=center,width=320px,title=Detail of a particular dataset with information on the included variables and geospatial and time coverages. The different options for data access and visualization are also shown.)]]

{{{#!comment
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.8 \linewidth]{fig02.png}
\caption{Detail of a particular dataset with information on the included variables and geospatial and time coverages. The different options for data access and visualization are also shown.}
\label{fig:mainwin}
\end{center}
\end{figure}
}}}

Currently, only the `OPeNDAP` access service is fully operational in the portal. Therefore, in this example, we illustrate the use of this service, which allows selecting temporal/spatial data slices from the `OPeNDAP` data access form shown in the figure below and downloading the resulting data in both ''ASCII'' and ''Binary'' formats.

[[Image(openDAPwindow.png,align=center,width=320px,title=Detail of the OPeNDAP dataset access form for a particular dataset.)]]

{{{#!comment
\begin{figure}[H]
\begin{center}
\includegraphics[width=0.8 \linewidth]{openDAPwindow.png}
\caption{Detail of the OPeNDAP dataset access form for a particular dataset.}
\label{fig:opendapwin}
\end{center}
\end{figure}
}}}

Note that, as explained before, the variables provided by the data portal (e.g. minimum temperature) are stored as gridsets. Thus, in addition to these variables, auxiliary coordinate variables (lat, lon, run, time, member) must also be handled for geo-temporal data referencing ([attachment:openDAPwindow.png see Figure]). Moreover, three time coordinates are included as reference for different grid variables because they are defined for different forecast times (one extra time for precipitation and a different temporal resolution for mean sea level pressure). Note that this greatly complicates the direct analysis of the data and, hence, this option is only recommended for data exploration. In the following we show how to use this service to explore the structure of the datasets and to obtain simple pieces of information in `ASCII format`.
By default, if no specifications are given in the different subsetting boxes of the `OPeNDAP` form, the entire variable, over the whole spatio-temporal and member ranges of the dataset, would be requested. However, such a request will raise an error due to its large size (the maximum size of a single request has been set to 100 Mbytes in the ''SPECS-EUPORIAS data portal'' for the sake of multi-connection efficiency). The basic steps to retrieve subsets of data are the following:

 1. To select a variable, click on the checkbox to its left.
 1. To constrain the variable, edit the information that appears in the text boxes below the variable. Each box holds a vector of three integers indicating index positions, in the following order: `[start:stride:end]`.
 1. To get ''ASCII'' or ''binary'' values for the selected variables, click on the ''Get ASCII'' or ''Get Binary'' buttons of the ''Action'' field.

Note that the URL displayed in the ''Data URL'' field is updated as you select and/or constrain variables. The URL in this field can be copied and pasted into various `OPeNDAP` clients.

The main disadvantage of the `OPeNDAP` service from the end-user point of view is that the specifications for subsetting dimensions are not given in their original magnitudes (i.e., latitudes and longitudes are not given in decimal degrees), but by the indices of their positions along their respective axes (note that the first index value is always 0). Thus, to find the indices for the desired selection, we need to dump and analyze the values defined in the corresponding coordinate variable. For instance, the figure below shows the 241 values defined for the `lat` (latitude) coordinate, as provided by the ''Get ASCII'' option (selecting the corresponding check-box).
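The index lookup and the `[start:stride:end]` notation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the portal's own tooling: the axes are rebuilt locally as regular 0.75° grids (latitude assumed to run from 90 to -90, longitude from 0 to 359.25) instead of being read from the dumped coordinate values, the variable is assumed to be stored as 4-byte floats, the dimension sizes (15, 360, 215, 241, 480) are those of the 15-member seasonal dataset, and the helper names are hypothetical.

{{{#!python
# Sketch: from geographic coordinates to OPeNDAP index ranges.
# ASSUMPTIONS: regular 0.75-degree axes, latitude from 90 to -90 (241 values),
# longitude from 0 to 359.25 (480 values), 4-byte floats. The helper names
# are hypothetical.

def nearest_index(values, target, period=None):
    """Return the 0-based position of the axis value closest to target."""
    def distance(value):
        d = abs(value - target)
        if period is not None:          # circular axis such as longitude
            d = min(d % period, period - d % period)
        return d
    return min(range(len(values)), key=lambda i: distance(values[i]))

def index_range(start, stop, stride=1):
    """Format one dimension as [start:stride:end]; the end is inclusive."""
    return "[%d:%d:%d]" % (start, stride, stop)

def count(start, stop, stride=1):
    """Number of index positions selected by start:stride:end."""
    return (stop - start) // stride + 1

lat = [90.0 - 0.75 * i for i in range(241)]
lon = [0.75 * j for j in range(480)]

# Example location: 40.42 N, 3.70 W
i_lat = nearest_index(lat, 40.42)
i_lon = nearest_index(lon, -3.70, period=360.0)

# Dimension order: member, run, time, lat, lon.
# Member 0, every 12th run starting at position 11, time position 31,
# and the single grid box found above.
ranges = [(0, 0, 1), (11, 359, 12), (31, 31, 1),
          (i_lat, i_lat, 1), (i_lon, i_lon, 1)]
constraint = "".join(index_range(a, b, s) for a, b, s in ranges)

n_values = 1
for a, b, s in ranges:
    n_values *= count(a, b, s)

full_bytes = 15 * 360 * 215 * 241 * 480 * 4   # whole variable, unconstrained
limit_bytes = 100 * 1024 * 1024               # 100 Mbytes request limit

print(i_lat, i_lon)
print(constraint)
print(n_values * 4, "bytes requested;", full_bytes > limit_bytes)
}}}

The size estimate shows why an unconstrained request fails: under these assumptions a single variable amounts to roughly 537 Gbytes, several thousand times the request limit, whereas the constrained selection is only a few bytes. In practice the axis values should be taken from the coordinate dump rather than reconstructed, since the ordering assumed here may not hold for every dataset.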
[[Image(latlonDump.png,align=center,width=320px,title=Text file displaying the values for the lat (latitude) coordinate variable.)]]

{{{#!comment
\begin{figure}[H]
\begin{center}
\includegraphics[width=\linewidth]{latlonDump.png}
\caption{Text file displaying the values for the \texttt{lat} (latitude) coordinate variable.}
\label{fig:latlonDump}
\end{center}
\end{figure}
}}}

Using these facilities, and after some calculations, one finds that the index positions of the `lat` and `lon` coordinates closest to a particular location of interest (e.g. `Madrid`) are 66 and 475, respectively. Thus, the time series for Madrid corresponding to the example described in the previous section (minimum temperature forecasts for January with one-month lead time, i.e. from the simulations started on the first of December) could be requested as shown in the figure below.

[[Image(opendapquery.png,align=center,width=320px,title=Detail of the query from the OPeNDAP dataset access form to retrieve a subset (a time series for a single gridbox) of minimum temperature.)]]

{{{#!comment
\begin{figure}[H]
\begin{center}
\includegraphics[width= 0.95 \linewidth]{opendapquery.png}
\caption{Detail of the query from the OPeNDAP dataset access form to retrieve a subset (a time series for a single gridbox) of minimum temperature.}
\label{fig:opendapquery}
\end{center}
\end{figure}
}}}

Note that the indices selected for the run coordinate correspond to the December initializations (index positions 11, 23, ...; note that indices start at 0) and those selected for the time coordinate correspond to January (positions 31 to 62, in days after the run time). Note that the proper use of this service requires a full understanding of the data structure and, therefore, it is only advised for data exploration.

= Accessing the Data Portal using Python (Pydap version) = #ex.pydap

[[NoteBox(warn,This section needs revision)]]

{{{#!csh
[user@host ~]$ pip install Pydap
........................................................................
[user@host ~]$ python
Python 2.7.2 (default, Mar 3 2012, 10:45:44)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
}}}

{{{#!python
>>> import numpy
>>> from pydap.client import open_url
>>> dataset = open_url('http://username:password@www.meteo.unican.es/tds5/dodsC/system4/System4_Seasonal_15Members.ncml')
>>> print type(dataset)
>>> print dataset.keys()
['lat', 'lon', 'run', 'time', 'time1', 'time2', 'member', 'Maximum_temperature_at_2_metres_since_last_24_hours_surface', 'Minimum_temperature_at_2_metres_since_last_24_hours_surface', 'Mean_temperature_at_2_metres_since_last_24_hours_surface', 'Total_precipitation_surface', 'Mean_sea_level_pressure_surface']
>>> MN2T24 = dataset['Minimum_temperature_at_2_metres_since_last_24_hours_surface']
>>> print MN2T24.dimensions
('member', 'run', 'time', 'lat', 'lon')
>>> print MN2T24.shape
(15, 360, 215, 241, 480)
>>> arr = MN2T24[0,11:360:12,31:62,66,475]
>>> print numpy.squeeze(numpy.mean(arr,2))
[ 270.79171753 273.29437256 271.56661987 271.03707886 271.82745361
  272.49279785 271.48086548 268.59121704 271.53125 273.82156372
  270.99401855 274.23626709 270.99328613 271.56115723 273.98986816
  270.50756836 272.45046997 270.65560913 271.31182861 272.77200317
  273.4359436 271.85021973 273.39648438 274.16384888 269.98248291
  271.30166626 273.11950684 271.27301025 272.29147339 270.46688843]
}}}

= Accessing the Data Portal using Octave = #ex.octave

[[NoteBox(warn,This section needs revision and integration with the [#ex.matlab Matlab example section])]]

{{{#!text/matlab
>> ver
----------------------------------------------------------------------
GNU Octave Version 3.6.1
GNU Octave License: GNU General Public License
Operating System: unknown
----------------------------------------------------------------------
}}}

{{{#!text/matlab
>> urlwrite('http://www.meteo.unican.es/work/netcdfAll-4.3.jar','netcdfAll-4.3.jar')
>> javaaddpath('./netcdfAll-4.3.jar');
>> javaMethod('setGlobalCredentialsProvider','ucar.nc2.util.net.HTTPSession',javaObject('ucar.nc2.util.net.HTTPBasicProvider','username','password'));
>> ncfile = javaMethod('openDataset','ucar.nc2.dataset.NetcdfDataset','http://www.meteo.unican.es/tds5/dodsC/system4/System4_Seasonal_15Members.ncml');
>> v = ncfile.findVariable('Minimum_temperature_at_2_metres_since_last_24_hours_surface');
>> disp(v.getDimensions.toString)
[ member = 15;, run = 360;, time = 215;, lat = 241;, lon = 480;]
>> d = v.read('0,11:359:12,31:61,66,475');
>> tmp = javaObject('org.octave.Matrix',d.reduce.copyToNDJavaArray);
>> oldFlag = java_convert_matrix (1);
>> octaveMatrix = tmp.ident(tmp);
[ (30 by 31) array of double ]
>> disp(squeeze(mean(octaveMatrix,2))')
 Columns 1 through 13:
   270.79 273.29 271.57 271.04 271.83 272.49 271.48 268.59 271.53 273.82 270.99 274.24 270.99
 Columns 14 through 26:
   271.56 273.99 270.51 272.45 270.66 271.31 272.77 273.44 271.85 273.40 274.16 269.98 271.30
 Columns 27 through 30:
   273.12 271.27 272.29 270.47
}}}

= Accessing the Data Portal using Matlab = #ex.matlab

[[NoteBox(warn,This section needs revision)]]

{{{#!text/matlab
>> ver
-------------------------------------------------------------------------------------
MATLAB Version 7.8.0.347 (R2009a)
MATLAB License Number: 161051
Operating System: Microsoft Windows Vista Version 6.1 (Build 7601: Service Pack 1)
Java VM Version: Java 1.6.0_04-b12 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
-------------------------------------------------------------------------------------
>> javaaddpath('http://www.meteo.unican.es/work/netcdfAll-4.3.jar');
>> %javaaddpath('ftp://ftp.unidata.ucar.edu/pub/netcdf-java/v4.3/netcdfAll-4.3.jar');
}}}

{{{#!text/matlab
>> import ucar.nc2.util.net.* %this will download the netcdfAll-4.3.jar
>> HTTPSession.setGlobalCredentialsProvider(HTTPBasicProvider('username','password'));
>> import ucar.nc2.*;
>> import ucar.nc2.dataset.*;
>> ncfile = NetcdfDataset.openDataset('http://www.meteo.unican.es/tds5/dodsC/system4/System4_Seasonal_15Members.ncml');
>> v = ncfile.findVariable('Minimum_temperature_at_2_metres_since_last_24_hours_surface');
>> disp(v.getDimensions)
[ member = 15;, run = 360;, time = 215;, lat = 241;, lon = 480;]
>> data = v.read('0,11:359:12,31:61,66,475').copyToNDJavaArray();
>> disp(squeeze(mean(data,3)))
  Columns 1 through 13
    270.7917 273.2944 271.5666 271.0371 271.8275 272.4928 271.4809 268.5912 271.5313 273.8216 270.9940 274.2363 270.9933
  Columns 14 through 26
    271.5612 273.9899 270.5076 272.4505 270.6556 271.3118 272.7720 273.4359 271.8502 273.3965 274.1638 269.9825 271.3017
  Columns 27 through 30
    273.1195 271.2730 272.2915 270.4669
}}}

= Example of Data Analysis with `R` = #app1

= References =

[[FootNote]]