Version 27 (modified by terryk, 7 years ago)

--

What is ESGF Tools UI?

ESGF Tools UI is a desktop client for ESGF services, integrated into the NetCDF-Java ToolsUI application. It provides search of data across multiple nodes, a download manager and metadata aggregation, so that dataset services can be explored in full.

The added functionality is in the ESGF tab of ToolsUI. This tab contains four sub-tabs, each with a specific function.


Getting started

Pre-requisites

JDK from Sun/Oracle (1.5 or newer) or OpenJDK 6

OpenJDK 7 is not supported

Installation

  1. Download ESGFToolsUI-0.4.1.zip
  2. Unzip it
  3. Go to the unzipped directory
    • On Windows:
      • Open ESGFToolsUI.jar
    • On Linux:
      • In a shell:
           java -jar ESGFToolsUI.jar
        

Search of ESGF Climate Data

A search in ESGF returns the records that match the search constraints, after querying an index node. The ESGF search service is always served by an index node, and this node can perform both local and distributed searches. This tool uses the distributed capabilities of ESGF, so whichever index node is selected will be used to query all nodes in the ESGF system.

The search panel for ESGF climate data is in the "Search" sub-tab (be patient, the panel takes a while to load). The search constraints are defined by configuring parameters in the panel or by entering a free-text search query. The result is the number of records found in the federation that satisfy the search constraints (shown in the panel in the bottom right area). A record is a physical replica of a climate dataset stored on a data node.

Selecting an index node

Searches are executed on an index node of the federation. Each node has a different response time. For this reason, the user can select the index node to which the search request is sent.
The index node is configured in the drop-down list at the top left. After a new index node is selected, the search panel is updated.

A global search in ESGF should give the same result on all ESGF index nodes (according to the ESGF wiki). However, for reasons external to this application, related to bugs in the ESGF search service, the results may sometimes differ.
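As a rough illustration of what selecting an index node means at the request level, the sketch below builds a distributed, count-only search request for a chosen node and reads the record count from a response. It assumes the ESGF Search REST API conventions (the `/esg-search/search` path, the `distrib`, `limit` and `format` parameters, and a Solr-style JSON body with `response.numFound`). The host name and the sample response are made up for the example, and this is not necessarily how ESGF Tools UI issues its requests.

```python
import json
from urllib.parse import urlencode


def build_search_url(index_node, **constraints):
    """Build a distributed, count-only search URL for the given index node."""
    params = {
        "distrib": "true",                  # ask the node to query the whole federation
        "limit": 0,                         # only the count is needed, not the records
        "format": "application/solr+json",  # assumed JSON output format
    }
    params.update(constraints)
    return "https://%s/esg-search/search?%s" % (index_node, urlencode(params))


def record_count(response_text):
    """Read the number of matching records from a Solr-style JSON response."""
    return json.loads(response_text)["response"]["numFound"]


# Hypothetical host; any index node of the federation could be used instead.
url = build_search_url("index-node.example.org", project="CMIP5")
print(url)

# Hand-written sample response, not real federation output.
sample = '{"response": {"numFound": 42, "start": 0, "docs": []}}'
print(record_count(sample))
```

Only the host part changes when another index node is chosen, so the same request sent to two nodes gives a simple way to check whether their counts agree.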







Selecting search parameters

The search parameters can be configured in the panel in two ways:

  • By selecting values in the tree of parameters (double-click)
  • By selecting a parameter in the drop-down list

Selecting search parameters: Tree of parameters

The tree of parameters shows the parameters with bounded values, which are defined by the federation. The first level holds the name of each parameter and the number of its bounded values. The second level holds the parameter values, each with the count of records that satisfy the previously configured constraints (the restrictions listed in the "Configured parameters" section) plus the new constraint (parameter-value).
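The two levels of the tree can be pictured as a mapping from parameter to value to record count. The sketch below assumes that the search service returns facet counts in Solr's flat layout (a list alternating value and count for each facet field); the function name and the sample data are hypothetical.

```python
def facet_tree(facet_fields):
    """Turn Solr-style flat facet lists into {parameter: {value: count}}."""
    tree = {}
    for parameter, flat in facet_fields.items():
        values = flat[0::2]  # even positions hold the values
        counts = flat[1::2]  # odd positions hold the record counts
        # Keep only the values that would not lead to an empty result set.
        tree[parameter] = {v: c for v, c in zip(values, counts) if c > 0}
    return tree


# Hand-written sample, not real federation output.
sample = {
    "institute": ["BNU", 12, "CCCMA", 30, "IPSL", 0],
    "model": ["BNU-ESM", 12],
}

tree = facet_tree(sample)
for parameter, values in tree.items():
    print("%s (%d values)" % (parameter, len(values)))
    for value, count in values.items():
        print("    %s: %d" % (value, count))
```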

More than one value can be selected for each parameter by double-clicking its checkbox. To add the values selected in the tree to the search constraints, click the "Add parameters" button. All values selected for one parameter are joined by a logical OR, and the selected parameters are joined by a logical AND. In general, the parameter configuration is as follows, where "P" is a parameter and "V" a value:

((P1,V11) ∨ (P1,V12) ∨ ...) ∧ ((P2,V21) ∨ (P2,V22) ∨ ...) ∧ ...

For example, if the following are selected:

* Institute
  * BNU   ✔
  * CCCMA ✔
* Model
  * BNU-ESM ✔

Then the result will be the number of records that belong to the BNU-ESM model and were produced by the BNU or CCCMA institutes, i.e.
((Institute,BNU) ∨ (Institute,CCCMA)) ∧ (Model,BNU-ESM)
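The OR-within-a-parameter, AND-across-parameters rule can be sketched as a small predicate over records. This is only an illustration of the logic on hand-written sample records; the function name and the record layout are hypothetical, and in practice the matching is done by the ESGF search service.

```python
def matches(record, constraints):
    """True if the record satisfies every parameter (AND),
    where a parameter is satisfied by any of its selected values (OR)."""
    return all(
        any(record.get(parameter) == value for value in values)
        for parameter, values in constraints.items()
    )


constraints = {
    "institute": ["BNU", "CCCMA"],
    "model": ["BNU-ESM"],
}

# Hand-written sample records.
records = [
    {"institute": "BNU", "model": "BNU-ESM"},
    {"institute": "CCCMA", "model": "CanESM2"},
    {"institute": "IPSL", "model": "IPSL-CM5A-LR"},
]

selected = [r for r in records if matches(r, constraints)]
print(len(selected))
```

Only the first record is selected: the second passes the institute constraint but fails the model constraint, and the third fails both.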

After new parameters are added with the "Add parameters" button, the search panel is updated. In the tree, every parameter that already has configured values now shows only those values. This happens because the allowed values for a parameter are always those that do not produce an empty result set when selected. To reconfigure a previously configured parameter, either uncheck its configured values and click "Add parameters", or remove them in the "Configured parameters" section.

Selecting search parameters: drop-down list of parameters

The drop-down list of parameters (top right) allows the values of some parameters to be configured. The list contains the parameters without bounded values, which for that reason cannot be configured in the tree of parameters, as well as the parameters with bounded values. After a parameter is selected in the list, a configuration panel specific to that parameter is displayed below the list.

The figure below shows the panel that is displayed when "temporal range of data coverage" is selected. To add the configuration, click the "Add parameter" button at the bottom ("Add temporal range parameter" in this case).

Selecting search parameters: configured parameters

All configured parameters are displayed in a dedicated panel at the bottom left. This panel lists the configured parameters with their assigned values.

The figure below shows a search configured for climate data that belong to the CMIP5 project, were produced by the BNU or CCCMA institute, have a time frequency of six hours, belong to the "historical" experiment, have data between 15-01-1920 and 20-01-2000, and contain at least one of these variables: "hus", "ps", "psl", "ta".
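A search like this one could be written as a query string roughly as follows. The parameter names (`project`, `institute`, `time_frequency`, `experiment`, `variable`, `start`, `end`) and the date format are assumptions based on the ESGF Search REST API and CMIP5 conventions, not taken from the tool itself. Repeating a parameter expresses the OR between its values.

```python
from urllib.parse import urlencode

# Each key is a search parameter; several values for one key are OR-ed,
# and different keys are AND-ed.
constraints = {
    "project": ["CMIP5"],
    "institute": ["BNU", "CCCMA"],
    "time_frequency": ["6hr"],
    "experiment": ["historical"],
    "variable": ["hus", "ps", "psl", "ta"],
    "start": ["1920-01-15T00:00:00Z"],  # 15-01-1920
    "end": ["2000-01-20T00:00:00Z"],    # 20-01-2000
}

# doseq=True emits one key=value pair per value of each parameter.
query_string = urlencode(constraints, doseq=True)
print(query_string)
```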

This panel also lets you select a parameter and delete its configuration by clicking "Remove", or delete the configuration of all parameters by clicking "Remove all".

Using free text search

The free-text search, or full-text search, allows a syntax-rich search using arbitrary words, logical operations and wildcards. In this case, the records returned are those whose metadata are related to what is specified in the free-text query.

The text box for the free-text query is at the top center of the search panel. The "Edit" button enables keyboard input in the text box. The "Save" button adds the free-text query to the search as a parameter. The new parameter is displayed in the "Configured parameters" section under the name "query".

The free-text query may have:

  • Logical operations (AND, OR) between words
  • The wildcard * within words
  • Parentheses ( ) for grouping
  • metadata_name:word to specify the metadata field that must match the given word

For example, the query:

institute:CCCMA OR (id:*IPSL* AND (model:IPSL-CM5A-LR OR model:IPSL-CM5B-LR))

This searches for all records in ESGF whose institute is CCCMA, or that satisfy both of the following conditions:

  1. Have an id that contains the word IPSL (id:*IPSL*)
  2. Belong to IPSL-CM5A-LR model OR IPSL-CM5B-LR model
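The reading of the example query above can be checked with a hand translation into a predicate. This is only an approximation of the intended logic: the search engine behind ESGF applies its own text matching, the record layout is hypothetical, and the ids are made up for the example.

```python
from fnmatch import fnmatchcase


def example_query(record):
    """Hand translation of:
    institute:CCCMA OR (id:*IPSL* AND (model:IPSL-CM5A-LR OR model:IPSL-CM5B-LR))
    """
    return record["institute"] == "CCCMA" or (
        fnmatchcase(record["id"], "*IPSL*")  # wildcard match on the id
        and record["model"] in ("IPSL-CM5A-LR", "IPSL-CM5B-LR")
    )


# Hand-written sample records with made-up ids.
records = [
    {"id": "sample.CCCMA.one", "institute": "CCCMA", "model": "CanESM2"},
    {"id": "sample.IPSL.two", "institute": "IPSL", "model": "IPSL-CM5A-LR"},
    {"id": "sample.IPSL.three", "institute": "IPSL", "model": "IPSL-CM5A-MR"},
]

print([example_query(r) for r in records])
```

The first record matches on the institute alone, the second through the id and model conditions, and the third is rejected because its model is not one of the two listed.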

Save a search

A configured search can be saved so that the metadata and services of the datasets that satisfy its constraints can be harvested later. The save section is at the bottom right of the search panel. You can overwrite the selected search with the "Save search" button, or save it under a new name with the "Save Search As..." button.

The name of a search must be unique. Duplicate names are not allowed.

Select a saved search

You can select a previously saved search from the drop-down list at the top right. After a search is selected, the search panel is updated.

Reset search panel

To reset the search panel, select the option <<New search>> in the drop-down list.

Metadata Harvesting of ESGF Datasets of a search

Once completed, metadata harvesting makes it possible to download datasets from multiple data nodes and to explore climate datasets without downloading the datasets themselves. This way, the nature of a dataset can be examined in detail before it is selected for download, and the services that ESGF offers to access or explore it can be identified. Some of these services can be explored from the NetCDF-Java ToolsUI, as explained later.

Harvesting is done at the dataset level.

What is an ESGF Dataset?

A dataset is a set of climate data in a specific version stored by ESGF. One dataset may have several records, i.e. the records are the physical replicas of a dataset on data nodes.

Datasets are made up of files, i.e. a dataset is a virtual container of data, with the information contained in files and sometimes in aggregations. A new version of a dataset is generated when errors are found in it, giving rise to a corrected version.

Datasets, files and aggregations have replicas on data nodes, and the ESGF services (THREDDS, LAS, HTTP, GridFTP and OPeNDAP) are always served at the replica level. That is why harvesting must be done before downloading.

Search Harvesting Panel

The harvesting panel is opened by clicking the "Search Harvesting" sub-tab. This panel displays a list of searches and their harvesting states. It also provides several flow-control options, full exploration of harvested datasets, and the possibility of manually selecting files to put in the download queue.

Each search harvesting is an element of the list displayed in the harvesting panel. The figure below shows one search harvesting. On the left are the search data (name of the search and configured parameters). On the right are the harvesting state and the flow options, along with the number of files selected out of the total number of files and their sizes in bytes.

A harvesting always covers the complete dataset (with all its files). However, a search may match only some of the files in a dataset (e.g. a search with fewer variables than the dataset contains, or a search over a time range). That is why, by default, the files selected for future download (in the application or through a generated metalink) are always the files that satisfy ALL search constraints.

Note that this application also lets you select files manually, in case you want to download files that are not selected by default or deselect some of those that are.
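The default selection and the "selected out of total" figures shown in the panel can be sketched as follows. The file layout, the function names and the choice of the variable as the only constraint are all hypothetical simplifications.

```python
def default_selection(files, variables):
    """Select the files whose variable is among the searched variables."""
    return [f for f in files if f["variable"] in variables]


def summary(selected, files):
    """Describe how much of the dataset is selected, in files and bytes."""
    return "%d of %d files, %d of %d bytes" % (
        len(selected),
        len(files),
        sum(f["size"] for f in selected),
        sum(f["size"] for f in files),
    )


# Hand-written sample dataset: harvested whole, but only partly selected.
files = [
    {"name": "hus_sample.nc", "variable": "hus", "size": 1000},
    {"name": "ta_sample.nc", "variable": "ta", "size": 2000},
    {"name": "pr_sample.nc", "variable": "pr", "size": 4000},
]

selected = default_selection(files, {"hus", "ta"})
print(summary(selected, files))
```

A manual selection would simply replace the `selected` list with any other subset of `files`.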

The flow control options are below the progress bar:

  • start - start the harvesting
  • pause - pause the harvesting
  • remove - remove the search and its harvesting (the harvested data remain in the file system)
  • reset - remove all harvesting data from the file system and reset the harvesting state to zero

At the bottom center are the options to explore and download the harvested datasets:

  • Explore Search - always visible. Lets you explore an individual dataset and put it in the download queue, and view the harvesting state of each dataset
  • Export to metalink - visible when the harvesting of the search is completed. Generates a metalink file with the files and their resources (replicas on data nodes)
  • Download ... - visible when the harvesting of the search is completed. Puts in the download queue either all files that satisfy the search constraints or a manually selected set of files from the search
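To give an idea of what a metalink file with files and their replicas looks like, the sketch below writes one in the Metalink 4 XML layout (RFC 5854), where each file lists one URL per replica. Whether ESGF Tools UI produces this version of the format is an assumption, and the file names and hosts are made up.

```python
import xml.etree.ElementTree as ET

METALINK_NS = "urn:ietf:params:xml:ns:metalink"  # Metalink 4 namespace (RFC 5854)


def build_metalink(files):
    """Return a Metalink 4 document listing each file with its replica URLs."""
    ET.register_namespace("", METALINK_NS)
    root = ET.Element("{%s}metalink" % METALINK_NS)
    for f in files:
        file_el = ET.SubElement(root, "{%s}file" % METALINK_NS, name=f["name"])
        ET.SubElement(file_el, "{%s}size" % METALINK_NS).text = str(f["size"])
        for url in f["urls"]:  # one <url> element per replica
            ET.SubElement(file_el, "{%s}url" % METALINK_NS).text = url
    return ET.tostring(root, encoding="unicode")


# Hand-written sample with hypothetical data nodes.
files = [
    {
        "name": "ta_sample.nc",
        "size": 2000,
        "urls": [
            "http://data-node-1.example.org/files/ta_sample.nc",
            "http://data-node-2.example.org/files/ta_sample.nc",
        ],
    }
]

xml_text = build_metalink(files)
print(xml_text)
```

A download client that understands metalink can then pick any of the listed replicas for each file.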

Exploring search harvesting

To explore the harvesting of a search, click the Explore Search option, which is always visible in the harvesting panel. A window then pops up with the exploring options.

This window provides a paginated list of the harvesting states of the datasets that belong to the search. Each dataset is identified by its instance_id, whose last dot-separated value is the dataset version. The exploring options in this window are provided at the dataset level, and they are only available once the harvesting of the dataset is completed.

In ESGF, datasets are identified by id, instance_id and master_id. In this application the instance_id is the most important, because it is the same for all replicas across the federation and is specific to each version.

For more information, see https://github.com/ESGF/esgf.github.io/wiki/ESGF_Search_REST_API#identifiers
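The relation between the three identifiers can be sketched by splitting a dataset id. The sketch assumes the layout described in the ESGF documentation linked above: the id is the instance_id followed by `|` and the data node, and the instance_id is the master_id followed by a dot and the version. The function name is hypothetical.

```python
def split_dataset_id(dataset_id):
    """Split '<instance_id>|<data_node>' into the identifiers it contains."""
    instance_id, _, data_node = dataset_id.partition("|")
    # The version is the last dot-separated value of the instance_id.
    master_id, _, version = instance_id.rpartition(".")
    return {
        "id": dataset_id,            # specific to each version and replica
        "instance_id": instance_id,  # same for all replicas of one version
        "master_id": master_id,      # same for all replicas and versions
        "version": version,
        "data_node": data_node,
    }


# Sample id in the style used by the ESGF documentation.
parts = split_dataset_id("obs4MIPs.NASA-JPL.AIRS.mon.v1|esgf-data.jpl.nasa.gov")
for key, value in parts.items():
    print("%s: %s" % (key, value))
```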

The figure below shows the exploring window and its options.

Exploring search harvesting : Harvesting view

The Harvesting View option lets you explore a dataset in a tree view. Here you can explore the metadata of a dataset, including its files, the replicas of the dataset, the replicas of its files and, finally, the ESGF services offered to explore them (by URL endpoint).

  • LAS and THREDDS are services of datasets offered at replica level.
  • HTTP, OPeNDAP and GridFTP are services of files offered at replica level.

Exploring search harvesting : Open metadata

The Open metadata option lets you explore a summary of a dataset's metadata in a table view.

Exploring search harvesting : Download Dataset

Download Dataset... lets you manually select the files to put in the download queue. The top of the window shows the id of the dataset, the number of files selected, the total number of files in the dataset, the combined size in bytes of the selected files and the total size of the whole dataset.

This window displays a list of files and their sizes in bytes. You can manually select the files you want to download by clicking the checkbox next to each file in the list.

This window also provides the following options:

  • Deselect all
  • Select all
  • Filter by constraints of search - select only the files that satisfy the search constraints
  • Download dataset - put the files selected in this window in the download queue
