Enhancing the ESGF data node to load balance a distributed cluster of THREDDS instances

The THREDDS Data Server (TDS) is designed to work as a standalone web application serving data from a single server instance. This design makes TDS performance scale badly, and management difficulties arise when a huge catalog tree has to be maintained or when the TDS has to deal with overloads, causing a degraded or faulty service. Currently, the ESGF node includes a gateway to the services running in the node. One of these services is the TDS web application, which runs in the ESGF node sharing the existing resources of the host. The ESGF-node design considers only one TDS instance running alongside the rest of the ESGF-node services. Moreover, this TDS instance deploys the complete catalog hierarchy automatically generated by the esg-publisher, which can become difficult to maintain and to scale if many datasets and collections are generated. In this contribution, we show a way of deploying a load-balanced and automatically provisioned cluster of TDS instances. The desired infrastructure is declared in a YAML file for Ansible (Infrastructure as Code, IaC), whose roles and playbooks automatically deploy the cluster of TDS instances and catalogs. This definition of the deployment infrastructure follows the TDS Deployment Model, which is composed of Collections, Replicas and Instances deployed on Hosts that form Clusters. A Collection is a hierarchy of THREDDS file catalogs that can be deployed to a regular TDS instance on its own. TDS instances are Apache Tomcat server instances running the TDS web application, accessed from the outside through a gateway (i.e. a reverse proxy) in a load-balanced way. We refer to each publication of a Collection in a TDS instance as a Replica. Following this TDS Deployment Model, a sysadmin can define both the instances and the hosts where each Replica will be deployed, forming a cluster.
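As a sketch, the cluster definition consumed by such Ansible roles and playbooks could look like the following YAML. All key and entity names here are illustrative assumptions chosen to mirror the Deployment Model vocabulary (Clusters, Hosts, Instances, Collections, Replicas); the abstract does not specify the actual schema:

```yaml
# Hypothetical IaC definition of a TDS cluster following the
# TDS Deployment Model. Names, keys and ports are illustrative,
# not the actual configuration schema.
tds_cluster:
  hosts:
    - name: tds-host-1
      instances:               # Tomcat instances running the TDS webapp,
        - name: tds-a          # reached through the gateway (reverse proxy)
          http_port: 8081
        - name: tds-b
          http_port: 8082
  collections:
    - name: cmip6-subset       # a deployable hierarchy of THREDDS catalogs
      replicas:                # each replica is one publication of the
        - instance: tds-a      # collection in a given TDS instance
        - instance: tds-b
```

Under this kind of definition, adding capacity for a heavily used Collection would amount to declaring an extra Replica on another Instance and re-running the playbook.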
The TDS Deployment Model and its implementation have been tested against a current ESGF-node deployment in order to extend the data node to act as a fully functional gateway to a distributed cluster of TDS instances. We conclude that it is feasible to add automatically deployed load-balancing support, catalog partitioning and integration with the current publication workflow to the architecture of the ESGF data node.