
Version 8 (modified by terryk, 7 years ago)

ESGF

The Earth System Grid Federation (ESGF)[1] is a spontaneous collaboration of groups, agencies and institutions around the world that are dedicated to the development and operation of a long-term system for the management, access and analysis of climate data. ESGF's primary goal is to facilitate advancements in Earth System Science. Some of the challenges that ESGF is committed to addressing include[2]:

  • The enormous scale of the data holdings, which are growing from petabytes to exabytes
  • Support for both model output and a wide variety of observational data
  • The distributed nature of the data archives, which are geographically distributed and autonomously operated
  • The need to enable users to access and analyze data with a wide variety of client tools: not just web browsers, but also rich desktop clients, libraries and toolkits
  • The need to harmonize and federate multiple local access policies

Sponsors:
ESGF is not a directly funded organization. The current core contributors to the project work for various agencies around the world, including:

  • U.S.: DOE, NASA, NOAA, NSF
  • Europe: IS-ENES
  • Australia: NCI

ESGF Architecture

The ESGF architecture is based on the peer-to-peer (P2P) paradigm[3]: a system of autonomous, distributed Nodes that interoperate through common acceptance of federation protocols and trust agreements. The Nodes are geographically distributed around the world, but they can interoperate because they have adopted a common set of services, protocols and APIs. Nodes exchange information about their data holdings and services, and trust each other to register users and to make access control decisions.

Data and metadata are managed and stored independently at each Node. Internally, each ESGF Node is composed of a set of services and applications that collectively enable data and metadata access and user management. The software components are logically grouped into four areas of functionality (Data, Index, Identity Provider and Compute nodes) so that ESGF can be installed modularly.


Data node

Includes services for secure data publication and access.

Data Publisher

Generates the metadata catalogs: scans data stored on a Data Node and makes it available through the system.
It extracts metadata from the directory structure and filenames, and from the content of the files themselves, and then generates THREDDS XML catalogs.
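To make "extracts metadata from the directory structure and filenames" concrete, here is a minimal sketch. It assumes a hypothetical CMIP5-style directory template (`DIR_TEMPLATE`) and a filename that ends in `_<start>-<end>.nc`; the real publisher uses its own configurable templates, and reading metadata from the file contents (e.g. netCDF attributes) is omitted here.

```python
from pathlib import PurePosixPath

# Hypothetical directory template modelled on a CMIP5-style layout.
# Real publisher deployments define their own templates.
DIR_TEMPLATE = (
    "project", "product", "institute", "model", "experiment",
    "time_frequency", "realm", "cmor_table", "ensemble", "variable",
)


def extract_facets(path: str) -> dict:
    """Map the directories of `path` onto DIR_TEMPLATE and read the
    time range from the filename (assumed to end in _<start>-<end>.nc)."""
    p = PurePosixPath(path)
    dirs = p.parent.parts
    if len(dirs) != len(DIR_TEMPLATE):
        raise ValueError(f"expected {len(DIR_TEMPLATE)} directories, got {len(dirs)}")
    facets = dict(zip(DIR_TEMPLATE, dirs))
    # Last underscore-separated token of the stem is the time range.
    start, end = p.stem.rsplit("_", 1)[-1].split("-")
    facets["time_start"], facets["time_end"] = start, end
    return facets


facets = extract_facets(
    "cmip5/output1/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/tas/"
    "tas_Amon_GFDL-CM3_historical_r1i1p1_186001-186412.nc"
)
print(facets["model"], facets["variable"], facets["time_start"], facets["time_end"])
```

The resulting dictionary is the kind of per-file metadata a publisher would then write into a THREDDS catalog.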

THREDDS Data Server

Provides access to the ESGF data and metadata. It is developed by Unidata.

GridFTP server

Serves data using GridFTP, a protocol based on FTP that provides high-performance, securely authenticated and reliable data transfer. It is developed by Globus.

OpenID Relying Party and Authorization Service

Ensure that users are properly authenticated and authorized before data is served.

Index node

Contains services for indexing and searching metadata, currently implemented using Apache Solr as the back-end server.

Indexing Service

Parses the metadata available at a repository (located by its URL) and ingests it into the back-end metadata storage. At present ESGF parses metadata only from THREDDS catalogs.
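The following sketch illustrates the ingestion step: it parses a small THREDDS catalog with the standard library and flattens it into dataset-level and file-level records ready for an index. The catalog content, the `host` argument and the record layout are hypothetical; real THREDDS catalogs attach services to datasets via `serviceName`/inherited metadata and may nest `catalogRef` links, whereas this sketch simply applies every declared service to every file. It is not the ESGF indexer.

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 documents.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical, heavily trimmed catalog of the kind a publisher might emit.
CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="example">
  <service name="HTTPServer" serviceType="HTTPServer" base="/thredds/fileServer/"/>
  <dataset name="example.dataset.v1" ID="example.dataset.v1">
    <property name="project" value="example"/>
    <property name="variable" value="tas"/>
    <dataset name="tas_file.nc" ID="example.dataset.v1.tas_file.nc" urlPath="example/tas_file.nc"/>
  </dataset>
</catalog>"""


def ingest(catalog_xml: str, host: str) -> list:
    """Flatten a THREDDS catalog into dataset- and file-level index records."""
    root = ET.fromstring(catalog_xml)
    # service name -> base path, used to turn urlPath into full access URLs
    bases = {s.get("name"): s.get("base") for s in root.findall("t:service", NS)}
    records = []
    for ds in root.findall("t:dataset", NS):
        # <property> elements become searchable dataset fields
        props = {p.get("name"): p.get("value") for p in ds.findall("t:property", NS)}
        records.append({"type": "Dataset", "id": ds.get("ID"), **props})
        for f in ds.findall("t:dataset", NS):
            url_path = f.get("urlPath")
            if url_path is None:
                continue  # not a file-level entry
            records.append({
                "type": "File",
                "id": f.get("ID"),
                "dataset_id": ds.get("ID"),
                "url": [f"https://{host}{base}{url_path}" for base in bases.values()],
            })
    return records


records = ingest(CATALOG, "data.example.org")
for rec in records:
    print(rec["type"], rec["id"])
```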

Search Service

Queries the indexed metadata and retrieves matching results that include descriptive information as well as all the available data access points (e.g., HTTP, GridFTP, OPeNDAP, and LAS).
Clients invoke the search service through its REST API.
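A client-side sketch of the two halves of a search: building a REST query and reading access points out of a result. The `/esg-search/search` path and the `type`, `format` and `limit` parameters follow the ESGF search API as it is commonly deployed, and the `url|mime-type|service` encoding of access points mirrors what ESGF search results typically contain; treat both as assumptions to check against your index node. The host names are hypothetical, and no request is sent (the URL could be fetched with `urllib.request.urlopen`).

```python
from urllib.parse import urlencode


def search_url(index_host: str, **constraints: str) -> str:
    """Build a dataset search request against an ESGF index node.

    Path and parameter names are assumptions based on the commonly
    deployed ESGF search API.
    """
    params = {"type": "Dataset", "format": "application/solr+json", "limit": "10"}
    params.update(constraints)  # facet constraints, e.g. project="CMIP5"
    return f"https://{index_host}/esg-search/search?{urlencode(params)}"


def access_points(url_field: list) -> dict:
    """Map service name -> URL from 'url|mime-type|service' strings."""
    points = {}
    for entry in url_field:
        url, _mime, service = entry.split("|")
        points[service] = url
    return points


# Hypothetical index host.
query = search_url("esgf-index.example.org", project="CMIP5", variable="tas")
print(query)

# Hypothetical 'url' field of one file-level search result.
points = access_points([
    "https://data.example.org/thredds/fileServer/cmip5/tas.nc|application/netcdf|HTTPServer",
    "https://data.example.org/thredds/dodsC/cmip5/tas.nc.html|application/opendap-html|OPENDAP",
])
print(points["OPENDAP"])
```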

Apache Solr

Apache Solr is the underlying search engine. Solr is a popular web application used by many commercial web sites. It features high-performance text and faceted search, geospatial and temporal queries, and partitioning of searchable metadata across multiple local indexes (cores) and distributed servers (shards).
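To show what "faceted search" over "distributed servers (shards)" returns, here is a toy in-memory model: each shard counts facet values among its matching documents, and the partial counts are merged. Solr does this natively over inverted indexes (via its `facet`, `facet.field` and `shards` request parameters); the shard contents below are hypothetical and the code is only an illustration of the idea, not how Solr is implemented.

```python
from collections import Counter

# Two hypothetical shards, each holding part of the index.
SHARDS = [
    [{"project": "CMIP5", "variable": "tas"}, {"project": "CMIP5", "variable": "pr"}],
    [{"project": "CORDEX", "variable": "tas"}, {"project": "CMIP5", "variable": "tas"}],
]


def facet_counts(docs, field, **filters):
    """Count values of `field` among docs on one shard that match all filters."""
    return Counter(
        d[field] for d in docs
        if field in d and all(d.get(k) == v for k, v in filters.items())
    )


def distributed_facets(shards, field, **filters):
    """Query every shard and merge the partial facet counts."""
    total = Counter()
    for shard in shards:
        total += facet_counts(shard, field, **filters)
    return total


cmip5_vars = distributed_facets(SHARDS, "variable", project="CMIP5")
print(cmip5_vars)
```

A search UI would typically show these counts next to each facet value so users can narrow their query.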

Web Portal UI

The web UI exposes several ESGF services through the browser: user account management, collection- and file-level search and discovery, the dashboard service, the LAS visualization engine, and the CIM viewer.

Dashboard

The Dashboard is the distributed monitoring system of ESGF. It is responsible for collecting historical information about the status of the federation.

Identity Provider node (IdP node)

Allows user authentication and secure delivery of user attributes.

OpenID Provider

Allows users to register and authenticate with the system, including Single-Sign-On functionality for browser-based access throughout the federation.

MyProxy Server

The MyProxy server, developed by NCSA, is used to issue short-term certificates that client libraries and toolkits can use to authenticate the user during a data product request.

Attribute Service and Registration Service

Make user attributes available to trusted clients. When authorization is required, the local Authorization Service enforces the Node's security policies by querying the Attribute Service in the federation that manages the configured access control group (e.g. CMIP5 Research, CMIP5 Commercial, CORDEX Research, ...).
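The authorization flow described above can be modelled in a few lines. This is a hypothetical sketch: `AttributeService`, `AuthorizationService`, the prefix-based policies and the group registry are illustrative names, not ESGF classes, and in the real federation the attribute lookup is a remote call to another Node rather than a dictionary access. The key point it shows is that the local service decides *which* Attribute Service to ask based on the group the resource requires.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeService:
    """Stand-in for a (remote) Attribute Service: user OpenID -> groups."""
    name: str
    memberships: dict = field(default_factory=dict)

    def attributes(self, openid: str) -> set:
        return self.memberships.get(openid, set())


@dataclass
class AuthorizationService:
    """Hypothetical local policy enforcement point on a Data Node."""
    policies: dict        # resource path prefix -> required access control group
    group_registry: dict  # group -> AttributeService that manages it

    def is_authorized(self, openid: str, resource: str) -> bool:
        for prefix, group in self.policies.items():
            if resource.startswith(prefix):
                # Ask the Attribute Service that manages this group.
                return group in self.group_registry[group].attributes(openid)
        return False  # deny by default when no policy matches


ALICE = "https://idp.example.org/openid/alice"  # hypothetical OpenID
cmip5_attrs = AttributeService("cmip5-attrs", {ALICE: {"CMIP5 Research"}})
cordex_attrs = AttributeService("cordex-attrs")
authz = AuthorizationService(
    policies={"cmip5/": "CMIP5 Research", "cordex/": "CORDEX Research"},
    group_registry={"CMIP5 Research": cmip5_attrs, "CORDEX Research": cordex_attrs},
)
print(authz.is_authorized(ALICE, "cmip5/output1/tas.nc"))
```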

Compute node

Contains higher-level services for data analysis and visualization.

Live Access Server (LAS) + Ferret

The Live Access Server (LAS), developed by NOAA/PMEL, is an analysis and visualization engine that allows users to request advanced data and imaging products from multiple ESGF Nodes at once.
It can be configured with a pluggable visualization engine such as Ferret (the default), NCL or CDAT.
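The modular grouping of components into node types can be summarized as data. In the hypothetical sketch below, the dictionary lists components in the order they appear on this page, and `components_for` returns the de-duplicated set a site would deploy for the node types it chooses to run. It illustrates the idea of modular installation only; it is not an ESGF installer configuration.

```python
# Components per node type, as listed on this page (illustrative only).
NODE_TYPES = {
    "data": ["Data Publisher", "THREDDS Data Server", "GridFTP server",
             "OpenID Relying Party", "Authorization Service"],
    "index": ["Indexing Service", "Search Service", "Apache Solr",
              "Web Portal UI", "Dashboard"],
    "idp": ["OpenID Provider", "MyProxy Server", "Attribute Service",
            "Registration Service"],
    "compute": ["Live Access Server (LAS)", "Ferret"],
}


def components_for(*node_types: str) -> list:
    """Return the de-duplicated components needed for the chosen node types."""
    unknown = set(node_types) - set(NODE_TYPES)
    if unknown:
        raise ValueError(f"unknown node type(s): {sorted(unknown)}")
    selected = []
    for node_type in node_types:
        for component in NODE_TYPES[node_type]:
            if component not in selected:
                selected.append(component)
    return selected


# e.g. a site that serves data and also acts as an identity provider
print(components_for("data", "idp"))
```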
