A tool integrated into ToolsUI NetCDF Java that adds a desktop client for ESGF services, with features such as data search across multiple nodes, a download manager, and metadata aggregation, allowing full exploration of dataset services.
The functionality added to ToolsUI NetCDF is in the ESGF tab. This tab contains 4 sub-tabs, each with a specific function.
JDK from Sun / Oracle (1.5 or newer) or OpenJDK 6
OpenJDK 7 is not supported
java -jar ESGFToolsUI.jar
A search in ESGF returns the records that match the search constraints after querying an index node. The ESGF search service is always served by an index node, and this node can perform both local and distributed searches. This tool uses the distributed capabilities of ESGF, so whichever index node is selected will be used to query all nodes in the ESGF system.
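As a rough illustration of a distributed search, the sketch below builds the kind of URL that could be sent to an index node. It assumes the ESGF Search REST API (linked further down this page) with its `distrib` and `limit` query parameters; the host name `index-node.example.org` and the helper `build_search_url` are hypothetical, and nothing here is taken from the tool's own source code. No request is actually sent.

```python
from urllib.parse import urlencode

def build_search_url(index_node, constraints, distributed=True):
    """Build (but do not send) a search URL for one index node.

    index_node  -- host name of the index node (hypothetical in this example)
    constraints -- dict mapping a parameter name to a list of values
    distributed -- True asks the index node to query the whole federation
    """
    # limit=0 asks only for the record count, not for the records themselves
    params = [("distrib", "true" if distributed else "false"), ("limit", "0")]
    for name, values in constraints.items():
        params.extend((name, value) for value in values)
    return "https://" + index_node + "/esg-search/search?" + urlencode(params)

url = build_search_url("index-node.example.org", {"project": ["CMIP5"]})
print(url)
```

Changing only the `index_node` argument corresponds to picking a different index node for the same federation-wide search.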
The search panel for ESGF climate data is in the "Search" sub-tab (be patient, the panel takes a while to load). Search constraints are defined by configuring parameters in the panel or by entering a free-text search query. The result is the number of records found in the federation that satisfy the search constraints (shown in the panel in the bottom right area). A record is a physical replica of a climate dataset stored on a data node.
Searches are executed on an index node of the federation. Each node has a different response time. For this reason, the user can select the index node to which the search request is sent.
The index node can be chosen in the top left drop-down list. After a new index node is selected, the search panel is updated.
A global search in ESGF should return the same result on all ESGF index nodes (according to the ESGF wiki). However, for reasons external to this application and related to bugs in the ESGF search service, different nodes may sometimes give different results.
The search parameters can be configured on the panel in two ways:
The parameter tree shows the parameters with bounded values, which are defined by the federation. The first level shows the name of each parameter and the number of its bounded values. The second level shows the parameter values, each with the count of records that satisfy the previously configured constraints (the restrictions listed in the "configured parameters" section) plus the new constraint (parameter-value).
More than one value can be selected for each parameter by double-clicking its checkbox. Finally, to add the values selected in the tree to the search constraints, click the "Add parameters" button. All values selected for a parameter are combined with logical OR, and the selected parameters are combined with logical AND. In general, the parameter configuration is as follows, where "P" is a parameter and "V" a value.
For example, if the following are selected:
 * Institute
   * BNU
   * CCCMA
 * Model
   * BNU-ESM
Then the result will be the number of records that belong to the BNU-ESM model and were produced by the BNU or CCCMA institutes, i.e. ((Institute, BNU) ∨ (Institute, CCCMA)) ∧ ((Model, BNU-ESM))
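The OR/AND combination described above can be sketched in a few lines. This is only an illustration of the logic, not the tool's implementation: the function `matches`, the lower-case parameter names and the three sample records are made up for the example.

```python
def matches(record, constraints):
    """Return True if the record satisfies every configured parameter.

    Values of the same parameter are combined with OR (any),
    different parameters are combined with AND (all).
    """
    return all(
        any(value in record.get(param, []) for value in values)
        for param, values in constraints.items()
    )

# ((Institute, BNU) OR (Institute, CCCMA)) AND ((Model, BNU-ESM))
constraints = {"institute": ["BNU", "CCCMA"], "model": ["BNU-ESM"]}

# Made-up records, only to exercise the logic
records = [
    {"institute": ["BNU"], "model": ["BNU-ESM"]},      # matches both parameters
    {"institute": ["CCCMA"], "model": ["OTHER-MODEL"]},  # institute ok, model not
    {"institute": ["OTHER-INST"], "model": ["BNU-ESM"]},  # model ok, institute not
]

count = sum(matches(record, constraints) for record in records)
print(count)  # 1
```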
After new parameters are added ("Add parameters" button), the search panel is updated. In the tree, every parameter that already has configured values now shows only those values. This happens because the allowed values for a parameter are always those that do not produce an empty result set when selected. To reconfigure a previously configured parameter, you can uncheck the previously configured values and then click "Add parameters", or you can remove them in the "Configured parameters" section.
The parameter drop-down list (top right) allows the values of some parameters to be configured. This list contains the parameters without bounded values, which for that reason cannot be configured in the parameter tree. It also contains the parameters with bounded values. After a parameter is selected in the list, a configuration panel specific to that parameter is displayed below the list.
In the figure below you can see the panel that is displayed when "temporal range of data coverage" is selected. Finally, to add this configuration, click the bottom button "Add parameter" ("Add temporal range parameter" in this case).
All configured parameters are displayed in a dedicated panel at the bottom left. This panel lists the configured parameters with their assigned values.
In the figure below you can see that the current search is configured to find climate data that belongs to the CMIP5 project, was produced by the BNU or CCCMA institute, has a time frequency of six hours, belongs to the "historical" experiment, has data between 15-01-1920 and 20-01-2000, and contains at least one of these variables: "hus", "ps", "psl", "ta".
This panel also lets you select a parameter and delete its configuration by clicking "Remove", or delete the configuration of all parameters by clicking "Remove all".
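To make the example configuration above more concrete, the sketch below writes it as a list of parameters and turns it into a query string. The parameter names (`project`, `institute`, `time_frequency`, `experiment`, `variable`, `start`, `end`), the value `6hr` and the ISO date format are assumptions based on the ESGF Search REST API and CMIP5 conventions; the page itself does not say which names the tool uses internally.

```python
from urllib.parse import urlencode

# Configured parameters from the example (names and formats are assumptions)
configured = [
    ("project", ["CMIP5"]),
    ("institute", ["BNU", "CCCMA"]),           # values of one parameter: OR
    ("time_frequency", ["6hr"]),
    ("experiment", ["historical"]),
    ("variable", ["hus", "ps", "psl", "ta"]),  # at least one of these variables
    ("start", ["1920-01-15T00:00:00Z"]),       # temporal range: 15-01-1920 ...
    ("end", ["2000-01-20T00:00:00Z"]),         # ... to 20-01-2000
]

# One name=value pair per selected value; repeated names express OR
pairs = [(name, value) for name, values in configured for value in values]
query_string = urlencode(pairs)
print(query_string)
```

Removing one entry from `configured` plays the role of the "Remove" button, and emptying the list that of "Remove all".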
The free-text search, or full-text search, allows syntax-rich searches using arbitrary words, logical operations and wildcards. In this case, the records found will be those whose metadata is related to what is specified in the free-text query.
The text box where the free-text query can be entered is at the top center of the search panel. The "Edit" button enables keyboard input in the text box. The "Save" button adds the free-text query parameter to the search. The newly configured parameter is displayed in the "configured parameters" section with the name "query".
The free-text query may have:
For example, the query:
Will search for all records in ESGF whose institute is CCCMA OR that satisfy the following points:
A configured search can be saved so that the metadata and services of the datasets that satisfy the search constraints can be harvested later. The save section is at the bottom right of the search panel. You can overwrite the selected search with the "Save search" button or save it under a new name by clicking the "Save Search As..." button.
The name of a search must be unique. Duplicate names are not allowed.
You can select a previously saved search from the top right drop-down list. After a search is selected, the search panel is updated.
To reset the search panel, select the <<New search>> option in the drop-down list.
Once completed, metadata harvesting allows datasets to be downloaded from multiple data nodes and allows climate datasets to be explored without downloading the dataset itself. This way we can learn the nature of the datasets in detail before selecting them for download, or find out which services ESGF offers to access and/or explore each dataset. Some of these services can be explored from ToolsUI NetCDF Java, as explained later.
Harvesting is done at dataset level.
A dataset is a set of climate data in a specific version stored by ESGF. One dataset may have several records, i.e. the records are the physical replicas of a dataset on data nodes.
Datasets are also made up of files, i.e. a dataset is a virtual container of data, so the information is contained in files and sometimes in aggregations. A new version of a dataset is generated when errors are found in the dataset, giving rise to a new corrected version.
Datasets, files and aggregations have replicas on data nodes, and the ESGF services (THREDDS, LAS, HTTP, GridFTP and OPeNDAP) are always served at replica level. That is why harvesting must be done before the download.
The harvesting panel can be opened by clicking the "Search Harvesting" sub-tab. This panel displays a list of searches and their harvesting states. It also provides several options for flow control, full exploration of harvested datasets, and the possibility of manually selecting files to put in the download queue.
Each search harvesting is an element of the list displayed in the harvesting panel. The figure below shows one search harvesting. On the left is the search data (name of the search and configured parameters). On the right are the harvesting state and the flow options, together with the number of files selected out of the total number of files and their sizes in bytes.
A harvesting always covers a complete dataset (with all its files). However, a search may match only some of the files in a dataset (e.g. a search with fewer variables than the dataset contains, or a search over a time range). That is why, by default, the files selected for future downloads (in the application or by generating a Metalink file) are always the files that satisfy ALL search constraints.
Note that this application lets you manually select files to download, in case you want to download files that are not selected by default or deselect some of those that are.
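The default selection and the manual override described above could look roughly like the following sketch. The file records, their field names and the helper `default_selection` are hypothetical and only two kinds of constraint (variable and time range) are modelled; the real application works on harvested ESGF metadata.

```python
def default_selection(files, variables, start_year, end_year):
    """Return the names of the files that satisfy ALL search constraints."""
    return {
        f["name"]
        for f in files
        if f["variable"] in variables
        # keep files whose time coverage overlaps the requested range
        and f["first_year"] <= end_year
        and f["last_year"] >= start_year
    }

# Hypothetical files of one harvested dataset (sizes in bytes)
files = [
    {"name": "ta_1900-1949.nc", "variable": "ta", "first_year": 1900, "last_year": 1949, "size": 400},
    {"name": "ta_1950-1999.nc", "variable": "ta", "first_year": 1950, "last_year": 1999, "size": 400},
    {"name": "ua_1950-1999.nc", "variable": "ua", "first_year": 1950, "last_year": 1999, "size": 300},
]

# Default: only files matching variable "ta" within 1960-1990
selected = default_selection(files, {"ta"}, 1960, 1990)

# Manual override: the user also ticks a file that was not selected by default
selected.add("ua_1950-1999.nc")

selected_size = sum(f["size"] for f in files if f["name"] in selected)
total_size = sum(f["size"] for f in files)
print(f"{len(selected)} of {len(files)} files, {selected_size} of {total_size} bytes")
```

The final line mirrors the counters shown in the harvesting panel: files selected out of the total, and their sizes in bytes.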
The flow control options are below the progress bar:
At the bottom center are the options to explore and download the harvested datasets:
|Explore Search||is always visible and allows you to explore an individual dataset and put it in the download queue. It also shows the individual harvesting state of a dataset|
|Export to metalink||is visible when the harvesting of the search is completed. Allows you to generate a Metalink file with the files and their resources (replicas on data nodes)|
|Download ...||is visible when the harvesting of the search is completed. Allows you to put in the download queue either all files that satisfy the search constraints or a manually selected set of files from the search|
To explore the harvesting of a search, click the Explore Search option, which is always visible in the harvesting panel. A window then pops up with the exploring options.
This window provides a paginated list of the harvesting states of the datasets that belong to the search. Each dataset is identified by its instance_id. The last dot-separated value is the version of the dataset. The exploring options in this window are provided at dataset level. Each dataset has its own options, which are only available once the harvesting of that dataset is completed.
In ESGF, datasets are identified by id, instance_id and master_id. In this application the "instance_id" is the most important, because it identifies all replicas across the federation and is specific to each version.
For more information: https://github.com/ESGF/esgf.github.io/wiki/ESGF_Search_REST_API#identifiers
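The relationship between the three identifiers can be sketched as follows. The sketch assumes the convention described on the ESGF wiki page linked above, in which an id has the form `instance_id|data_node` and an instance_id has the form `master_id.version`; the sample identifier, the host name and the helper `parse_dataset_id` are made up for illustration.

```python
def parse_dataset_id(dataset_id):
    """Split a dataset id into its parts.

    Assumed layout: <master_id>.<version>|<data_node>
    """
    # The part before "|" is shared by all replicas of one dataset version
    instance_id, _, data_node = dataset_id.partition("|")
    # The last dot-separated value of the instance_id is the version
    master_id, _, version = instance_id.rpartition(".")
    return {
        "id": dataset_id,
        "instance_id": instance_id,
        "master_id": master_id,
        "version": version,
        "data_node": data_node,
    }

# Illustrative identifier, not taken from a real search result
parts = parse_dataset_id(
    "cmip5.output1.BNU.BNU-ESM.historical.6hr.atmos.6hrLev.r1i1p1.v20120504"
    "|data-node.example.org"
)
print(parts["instance_id"])
print(parts["version"])
```

Two replicas of the same dataset version would differ only in the `data_node` part, which is why the instance_id is a convenient key for grouping replicas.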
The figure below shows the exploring window and its options.
The Harvesting View option lets you explore a dataset in a tree view. Here you can explore the metadata of a dataset, including its files, the replicas of the dataset, the replicas of its files and, finally, the ESGF services offered to explore them (by URL endpoint).
The Open metadata option lets you explore a summary of the metadata of a dataset in a table view.
Download Dataset... lets you manually select the files to be put in the download queue. At the top of the window you can see the id of the dataset, the number of files selected, the total number of files in the dataset, the combined size in bytes of the selected files and the total size of the whole dataset.
This window displays a list of files and their sizes in bytes. You can manually select the files you want to download by clicking the check-box associated with each file in the list.
This window also provides the following options:
When you click on Download selected, a window asks for the destination path.
To export the search to Metalink, click "Export to Metalink". This option is visible when the harvesting of the search is completed. A window then pops up with the save options.
Only the files that satisfy the search constraints are included in the Metalink file.
Metalink is an extensible metadata file format that describes one or more computer files available for download. It specifies files appropriate for the user's language and operating system; facilitates file verification and recovery from data corruption; and lists alternate download sources (mirror URIs). For more info: Metalink, Metalink-Wikipedia
Many clients support Metalink: Aria2, GetRight, DownloadThemAll, Orbit Downloader, etc.
Note that if you want to use an external client to download ESGF files, you must configure the client to use the ESGF certificates.
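To give an idea of what a Metalink file with several download sources looks like, the sketch below generates a minimal one. It assumes the Metalink 4 layout of RFC 5854 (a `metalink` root with `file`, `size` and `url` elements); which Metalink version and which optional elements the application actually writes is not stated on this page, and the file name and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:metalink"  # Metalink 4 namespace (RFC 5854)

def build_metalink(files):
    """Return a minimal Metalink document as a string.

    files -- list of dicts with "name", "size" (bytes) and "urls" (replicas)
    """
    ET.register_namespace("", NS)  # serialize with a default namespace
    root = ET.Element(f"{{{NS}}}metalink")
    for f in files:
        file_el = ET.SubElement(root, f"{{{NS}}}file", name=f["name"])
        ET.SubElement(file_el, f"{{{NS}}}size").text = str(f["size"])
        # One <url> per replica, so a client can fall back to another data node
        for url in f["urls"]:
            ET.SubElement(file_el, f"{{{NS}}}url").text = url
    return ET.tostring(root, encoding="unicode")

# Hypothetical file with replicas on two data nodes
xml_text = build_metalink([
    {
        "name": "ta_1950-1999.nc",
        "size": 400,
        "urls": [
            "https://data-node-1.example.org/files/ta_1950-1999.nc",
            "https://data-node-2.example.org/files/ta_1950-1999.nc",
        ],
    }
])
print(xml_text)
```

Listing every replica of a file as a separate `url` element is what lets an external client choose among data nodes.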
To put the files that belong to a search in the download queue, click Download .... This option is visible when the harvesting of the search is completed. A window then pops up with the download options.
This window (shown below) displays the list of datasets that satisfy the search constraints. For each dataset it shows: id, description, the number of files that satisfy the search constraints relative to the total number of files, and the size on disk of the selected files relative to the total size on disk of the dataset.
To download all files that satisfy the constraints (without manual selection), click the "Download" button at the bottom of the window. A dialog box is then shown to confirm the download and select the destination path.
For manual selection, click "Select file to Download", which displays a file selection window as explained in Exploring search harvesting : DownloadDataset - File selection window