A tool integrated into ToolsUI NetCDF Java that provides a desktop client for ESGF services. Its features include searching for data across multiple nodes, a download manager, and metadata aggregation, which together allow a full exploration of dataset services.
The added functionality in ToolsUI NetCDF is located in the ESGF tab. This tab contains four sub-tabs, each with a specific function.
JDK from Sun / Oracle (1.5 or newer) or OpenJDK 6
OpenJDK 7 is not supported
java -jar ESGFToolsUI.jar
A search in ESGF returns the records that match the search constraints after querying an index node. The ESGF search service is always served by an index node, and this node can perform both local and distributed searches. This tool uses the distributed capabilities of ESGF, so whichever index node is selected is used to query all nodes in the ESGF system.
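As a rough illustration of a distributed query, the sketch below builds a request URL for one index node. It assumes the ESGF RESTful search endpoint path `/esg-search/search` and its `distrib` flag. The host name `index.example.org` and the helper `build_search_url` are hypothetical, and this is not necessarily how the tool itself issues its requests.

```python
from urllib.parse import urlencode

def build_search_url(index_node, constraints, distributed=True):
    """Build a search URL for one index node (hypothetical helper)."""
    params = dict(constraints)
    # Assumption: distrib=true asks the index node to query the whole
    # federation, distrib=false restricts the search to its own index.
    params["distrib"] = "true" if distributed else "false"
    return "https://{}/esg-search/search?{}".format(index_node, urlencode(params))

url = build_search_url("index.example.org", {"project": "CMIP5"})
print(url)
```

Changing only the host name keeps the same federation-wide query while sending it to a different index node, which is what selecting another index node in the tool amounts to.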
The search panel for ESGF climate data is in the "Search" sub-tab (be patient, the panel takes a while to load). Search constraints are defined either by configuring parameters in the panel or by entering a free-text search query. The result, shown in the bottom right area of the panel, is the number of records found in the federation that satisfy the search constraints. A record is a physical replica of a climate dataset stored on a data node.
Searches are executed on an index node of the federation. Each node has a different response time. For this reason, the user can select the index node to which the search request is sent.
The index node can be selected in the top left drop-down list. After a new index node is selected, the search panel is updated.
A global search in ESGF should return the same result on all ESGF index nodes (according to the ESGF wiki). However, for reasons external to this application, related to bugs in the ESGF search service, different index nodes sometimes return different results.
The search parameters can be configured on the panel in two ways:
The tree of parameters shows the parameters with bounded values, which are defined by the federation. The first level shows the name of each parameter and the number of its bounded values. The second level shows the parameter values, each with the count of records that satisfy the previously configured constraints (the restrictions listed in the "Configured parameters" section) plus the new constraint (parameter-value).
More than one value can be selected for each parameter by double-clicking its checkbox. To add the values selected in the tree to the search constraints, click the "Add parameters" button. All values selected for a parameter are combined with logical OR, and the selected parameters are combined with logical AND. In general, the configuration has the form ((P1,V1) ∨ (P1,V2) ∨ ...) ∧ ((P2,V1) ∨ ...) ∧ ..., where "P" is a parameter and "V" a value.
For example, if the following values are selected:
 * Institute
   * BNU (selected)
   * CCCMA (selected)
 * Model
   * BNU-ESM (selected)
Then the result is the number of records that belong to the BNU-ESM model and were produced by the BNU or CCCMA institutes, i.e. ((Institute,BNU) ∨ (Institute,CCCMA)) ∧ ((Model,BNU-ESM)).
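The OR-within-a-parameter, AND-between-parameters rule can be sketched as a small filter. This is only an illustration of the logic, not the tool's implementation. The function `matches` and the sample records (including the `OTHER` values) are made up for the example.

```python
def matches(record, constraints):
    """True if, for every configured parameter (AND), the record has
    one of the values selected for that parameter (OR)."""
    return all(record.get(param) in values
               for param, values in constraints.items())

# Selection from the example: Institute in {BNU, CCCMA} AND Model in {BNU-ESM}
constraints = {"institute": {"BNU", "CCCMA"}, "model": {"BNU-ESM"}}

# Illustrative records only, not real ESGF search results
records = [
    {"institute": "BNU", "model": "BNU-ESM"},
    {"institute": "CCCMA", "model": "OTHER-MODEL"},
    {"institute": "OTHER", "model": "BNU-ESM"},
]

count = sum(matches(r, constraints) for r in records)
print(count)
```

Only the first record satisfies both parameters, so it is the only one counted.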
After new parameters are added with the "Add parameters" button, the search panel is updated. In the tree, every parameter that already has configured values now shows only those values. This happens because the allowed values for a parameter are always those that do not produce an empty result set when selected. To reconfigure a previously configured parameter, either uncheck its configured values and click "Add parameters", or remove them in the "Configured parameters" section.
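The per-value record counts in the tree, and the fact that values leading to an empty result disappear, can be pictured with the following sketch. It assumes single-valued record fields and an in-memory list of records. The helper `facet_counts` and the sample data are hypothetical, and in practice the counts come from the search service rather than from local filtering.

```python
from collections import Counter

def facet_counts(records, constraints, param):
    """Count records per value of `param` among the records that
    satisfy the constraints configured so far."""
    matching = [r for r in records
                if all(r.get(p) in vs for p, vs in constraints.items())]
    # Values that would give an empty result never appear in the counter.
    return Counter(r[param] for r in matching if param in r)

# Illustrative records only
records = [
    {"institute": "BNU", "model": "BNU-ESM"},
    {"institute": "BNU", "model": "BNU-ESM"},
    {"institute": "CCCMA", "model": "OTHER-MODEL"},
]

before = facet_counts(records, {}, "institute")
after = facet_counts(records, {"institute": {"BNU"}}, "institute")
print(dict(before), dict(after))
```

Once `institute` is configured as BNU, the CCCMA value no longer has any matching record, so only the configured value remains listed for that parameter.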
The drop-down list of parameters (top right) allows the values of some parameters to be configured. The list contains the parameters without bounded values, which for that reason cannot be configured in the tree of parameters, and it also contains the parameters with bounded values. After a parameter is selected in the list, a configuration panel specific to that parameter is displayed below the list.
The figure below shows the panel displayed when "temporal range of data coverage" is selected. To add this configuration, click the button at the bottom, "Add parameter" ("Add temporal range parameter" in this case).
All configured parameters are displayed in a dedicated panel at the bottom left. This panel lists the configured parameters with their assigned values.
The figure below shows a search configured for climate data that belongs to the CMIP5 project, was produced by the BNU or CCCMA institute, has a time frequency of six hours, belongs to the "historical" experiment, has data between 15-01-1920 and 20-01-2000, and contains at least one of these variables: "hus", "ps", "psl", "ta".
This panel also lets you select a parameter and delete its configuration by clicking "Remove", or delete the configuration of all parameters by clicking "Remove all".
The free-text search, or full-text search, allows searches with a rich syntax: arbitrary words, logical operations and wildcards. In this case, the records returned are those whose metadata is related to what is specified in the free-text query.
The text box for the free-text query is at the top center of the search panel. The "Edit" button enables keyboard input in the text box. The "Save" button adds the free-text query parameter to the search. The newly configured parameter is displayed in the "Configured parameters" section with the name "query".
The free-text query may have:
For example, the query:
Will search for all records in ESGF whose institute is CCCMA OR that satisfy the following points:
A configured search can be saved so that the metadata and services of the datasets that satisfy its constraints can be harvested later. The save section is at the bottom right of the search panel. You can overwrite the selected search with the "Save search" button, or save it under a new name by clicking the "Save Search As..." button.
The name of a search must be unique. Duplicate names are not allowed.
You can select a previously saved search from the top right drop-down list. After a search is selected, the search panel is updated.
To reset the search panel, select the option <<New search>> in the drop-down list.
Once completed, metadata harvesting allows datasets to be downloaded from multiple data nodes, and allows climate datasets to be explored without downloading the datasets themselves. This makes it possible to know their nature in detail before selecting them for download, and to know which services ESGF offers to access or explore each dataset. Some of these services can be explored from ToolsUI NetCDF Java, as explained later.
Harvesting is done at dataset level.
A dataset is climate data in a specific version stored by ESGF. One dataset may have several records, i.e. the records are the physical replicas of a dataset on data nodes.
Datasets are also made up of files, i.e. a dataset is a virtual container of data, and the information is contained in files and sometimes in aggregations. New versions of a dataset are generated when errors are found in it, giving rise to a corrected version.
Datasets, files and aggregations have replicas on data nodes, and the ESGF services (THREDDS, LAS, HTTP, GridFTP and OPeNDAP) are always served at replica level. That is why harvesting must be done before downloading.
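The relationship between datasets, files, replicas and services can be pictured with a small data model. This is only a sketch of the concepts described above. All class names, field names, host names and URLs are hypothetical and do not reflect the tool's internal classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileReplica:
    data_node: str
    services: Dict[str, str]  # service name -> access URL on that data node

@dataclass
class DatasetFile:
    name: str
    replicas: List[FileReplica] = field(default_factory=list)

@dataclass
class Dataset:
    dataset_id: str
    version: str
    files: List[DatasetFile] = field(default_factory=list)

def services_for(dataset_file):
    """List the (data node, service) pairs through which a file is reachable."""
    return [(r.data_node, name)
            for r in dataset_file.replicas
            for name in sorted(r.services)]

# Hypothetical harvested metadata: one file replicated on two data nodes
f = DatasetFile("ta_6hr.nc", [
    FileReplica("node-a.example.org", {"HTTP": "http://node-a.example.org/ta_6hr.nc"}),
    FileReplica("node-b.example.org", {"HTTP": "http://node-b.example.org/ta_6hr.nc",
                                       "OPeNDAP": "http://node-b.example.org/dods/ta_6hr.nc"}),
])
ds = Dataset("example.dataset", "v1", [f])
print(services_for(f))
```

Because the service URLs live on the replicas, they are only known after the replicas have been harvested, which is why harvesting precedes download.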
The harvesting panel is opened by clicking the "Search Harvesting" sub-tab. This panel displays a list of searches and their harvesting states. It also provides several flow control options, full exploration of the harvested datasets, and the possibility of manually selecting files to put in the download queue.
Each search harvesting is an element in the list displayed in the harvesting panel. The figure below shows one search harvesting. On the left is the search data (name of the search and configured parameters). On the right are the harvesting state and the flow control options, together with the number of files selected out of the total number of files, and their sizes in bytes.
A harvesting always covers a complete dataset (with all its files). However, a search may match only some of the files in a dataset (e.g. a search with fewer variables than the dataset contains, or a search over a time range). That is why, by default, the files selected for future downloads (in the application or by generating a metalink) are always the files that satisfy ALL the search constraints.
Note that the application also lets you select files manually, in case you want to download files that are not selected by default or deselect some of those that are.
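The default selection and the manual override can be sketched as two set operations. The example handles only value-based constraints (a time range would need an extra comparison). The helpers `default_selection` and `apply_manual_changes` and the file names are hypothetical.

```python
def default_selection(files, constraints):
    """Names of the files that satisfy ALL search constraints."""
    return {f["name"] for f in files
            if all(f.get(p) in vs for p, vs in constraints.items())}

def apply_manual_changes(selected, add=(), remove=()):
    """Manual override: add files not selected by default, drop others."""
    return (set(selected) | set(add)) - set(remove)

# Illustrative harvested dataset with three files
files = [
    {"name": "a.nc", "variable": "hus"},
    {"name": "b.nc", "variable": "ta"},
    {"name": "c.nc", "variable": "ps"},
]
constraints = {"variable": {"hus", "ps"}}

selected = default_selection(files, constraints)
final = apply_manual_changes(selected, add=["b.nc"], remove=["c.nc"])
print(sorted(selected), sorted(final))
```

The whole dataset is harvested, but only `a.nc` and `c.nc` are selected by default. The manual step then swaps `c.nc` for `b.nc`.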
The flow control options are below the progress bar:
At the bottom center are the options to explore and download the harvested datasets:
|Explore Search||is always visible. Allows an individual dataset to be explored and put in the download queue, and shows the individual harvesting state of a dataset|
|Export to metalink||is visible when the harvesting of the search is completed. Allows a metalink file to be generated with the files and their resources (replicas on data nodes)|
|Download ...||is visible when the harvesting of the search is completed. Allows putting in the download queue either all files that satisfy the constraints of the search or a manually selected set of files from the search|
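To give an idea of what a metalink export contains, the sketch below writes one file entry with one URL per replica. It assumes the Metalink 4 layout (RFC 5854), and the version actually produced by the tool may differ. The helper `build_metalink`, the file name, the size and the URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: Metalink 4 namespace (RFC 5854)
NS = "urn:ietf:params:xml:ns:metalink"

def build_metalink(files):
    """files: list of dicts with 'name', 'size' and 'urls' (one per replica)."""
    root = ET.Element("metalink", {"xmlns": NS})
    for f in files:
        file_el = ET.SubElement(root, "file", {"name": f["name"]})
        ET.SubElement(file_el, "size").text = str(f["size"])
        for url in f["urls"]:
            # Each replica on a data node becomes an alternative source
            ET.SubElement(file_el, "url").text = url
    return ET.tostring(root, encoding="unicode")

xml_text = build_metalink([{
    "name": "ta_6hr.nc",
    "size": 1024,
    "urls": ["http://node-a.example.org/ta_6hr.nc",
             "http://node-b.example.org/ta_6hr.nc"],
}])
print(xml_text)
```

A download client that understands metalink can then pick any of the listed replicas for each file.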