Changes between Version 5 and Version 6 of WRF4G/ExecutionEnvironments


Timestamp:
Feb 15, 2013 6:14:16 PM (9 years ago)
Author:
carlos
Comment:

--

  • WRF4G/ExecutionEnvironments

    v5 v6  
    1 = [[http://www.dkrz.de/Nutzerportal-en/doku/getting-started|DKRZ]] =
    2 
    3 '''How to use DKRZ facilities?'''
    4 
    5 Workflows in climate modelling research are complex and comprise, in general, a number of different tasks, such as model formulation and development (including debugging, platform porting, and performance optimization), generation of input data, performing model simulations, postprocessing, visualization and analysis of output data, long-term archiving of the data, documentation and publication of results. The '''DKRZ''' hardware and software infrastructure is optimally adapted to accomplish these tasks in an efficient way. In the graphic below we give a schematic overview on the '''DKRZ''' systems.
    6 
    7 [[Image(http://www.dkrz.de/bilder/bilder-nutzerportal/bilder-dokumentation/DKRZsystems.png,50%,)]]
    8 
    9 For a more detailed description of the different systems shown in the picture and basic software installed on these systems [[http://www.dkrz.de/Nutzerportal-en/doku/getting-started/dkrz_system|click here]].
    10 == Blizzard ==
    11 
    12 [http://www.dkrz.de/Nutzerportal-en/doku/blizzard]
    13 {{{
    14 ssh  <userid>@blizzard.dkrz.de
    15 }}}
    16 
    17 == Lizard ==
    18 [http://www.dkrz.de/Nutzerportal-en/doku/blizzard/lizard]
    19 {{{
    20 ssh  <userid>@lizard.dkrz.de
    21 }}}
     1[[PageOutline(1-10,Page Contents)]]
    222
    233= RES - Red Española de Supercomputación =
    24 
    254
    265== Altamira ==
     
    171150
    172151=== Running Jobs ===
     152
    173153LSF is the utility used at MareNostrum III for batch processing support, so all jobs must be run through it. This document provides information for getting started with job execution at the Cluster.
    174 5.1. Submitting jobs
    175 A job is the execution unit for LSF. A job is defined by a text file containing a set of directives
    176 describing the job, and the commands to execute. Please, bear in mind that there is a limit of 3600
    177 bytes for the size of the text file.
    178 5.1.1. LSF commands
     154
     155=== Submitting jobs ===
     156A job is the execution unit for LSF. A job is defined by a text file containing a set of directives that describe the job, followed by the commands to execute. Please bear in mind that the text file is limited to 3600 bytes.
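
The 3600-byte limit mentioned above can be checked before submission. The following is a minimal sketch, not part of the original guide: the function name `job_script_fits` is hypothetical, and it only assumes a POSIX shell with `wc` and `tr`.

```shell
# Hypothetical helper (not an LSF command): succeeds only when the
# job script is at most 3600 bytes, the limit quoted in the text.
job_script_fits() {
    [ -r "$1" ] || return 2                  # unreadable or missing file
    # wc -c counts bytes; tr strips the padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -le 3600 ]
}
```

A possible use is `job_script_fits job_script && bsub < job_script`, so that an oversized script is never handed to the queue system.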
     157
     158=== LSF commands ===
    179159These are the basic directives to submit jobs:
    180 • bsub < job_script
    181 submits a “job script” to the queue system (see below for job script directives). Remember to pass
    182 it through STDIN '<'
    183 • bjobs [-w][-X][-l job_id]
    184 shows all the submitted jobs.
    185 • bkill <job_id>
    186 remove the job from the queue system, canceling the execution of the processes, if they were still
    187 running.
    188 5.1.2. Job directives
    189 A job must contain a series of directives to inform the batch system about the characteristics of the
    190 job. These directives appear as comments in the job script, with the following syntax:
     160
     161    '''bsub < job_script''' submits a “job script” to the queue system (see below for job script directives). Remember to pass it through STDIN with '<'.
     162    '''bjobs [-w][-X][-l job_id]''' shows all the submitted jobs.
     163
     164    '''bkill <job_id>''' removes the job from the queue system, cancelling the execution of its processes if they are still running.
     165
     166=== Job directives ===
     167A job must contain a series of directives to inform the batch system about the characteristics of the job. These directives appear as comments in the job script, with the following syntax:
     168{{{
     169#!sh
    191170#BSUB -option value
     171}}}
     172{{{
     173#!sh
    192174#BSUB -J job_name
     175}}}
    193176The name of the job.
     177{{{
     178#!sh
    194179#BSUB -q debug
    195 This queue is only intended for small tests, so there is a limit of 1 job per user, using up to 64 cpus
    196 (4 nodes), and one hour of wall clock limit.
     180}}}
     181This queue is intended only for small tests, so there is a limit of 1 job per user, using up to 64 CPUs (4 nodes), with a wall clock limit of one hour.
     182{{{
     183#!sh
    197184#BSUB -W HH:MM
    198 NOTE: take into account that you can not specify the amount of seconds in LSF. The limit of wall
    199 clock time. This is a mandatory field and you must set it to a value greater than the real execution
    200 time for your application and smaller than the time limits granted to the user. Notice that your job
    201 will be killed after the elapsed period
     185}}}
     186The wall clock time limit. This is a mandatory field: set it to a value greater than the real execution time of your application and smaller than the time limits granted to the user. NOTE: LSF does not let you specify seconds here, and your job will be killed once this period has elapsed.
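
Because the limit is given as HH:MM without seconds, a run-time estimate in seconds has to be rounded to whole minutes. The sketch below rounds up so that the limit is never shorter than the estimate; the function name `to_lsf_walltime` is hypothetical and the rounding policy is an assumption, not something the guide prescribes.

```shell
# Hypothetical helper: convert an estimate in seconds to the HH:MM
# form used by "#BSUB -W", rounding up to the next whole minute.
to_lsf_walltime() {
    secs=$1
    mins=$(( (secs + 59) / 60 ))             # ceiling division by 60
    printf '%02d:%02d\n' $(( mins / 60 )) $(( mins % 60 ))
}

# Example: to_lsf_walltime 3601 prints 01:01
```

The result still has to stay below the time limits granted to the user and to the chosen queue.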
     187{{{
     188#!sh
    202189#BSUB -cwd pathname
    203 The working directory of your job (i.e. where the job will run). If not specified, it is the current
    204 working directory at the time the job was submitted.
     190}}}
     191The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted.
     192{{{
     193#!sh
    205194#BSUB -e/-eo file
    206 The name of the file to collect the stderr output of the job. You can use %J for job_id. -e option will
    207 APPEND the file, -eo will REPLACE the file.
     195}}}
     196The name of the file that collects the stderr output of the job. You can use %J for the job_id. The -e option APPENDS to the file; -eo REPLACES it.
     197{{{
     198#!sh
    210199#BSUB -o/-oo file
    211 The name of the file to collect the standard output (stdout) of the job. -o option will APPEND the
    212 file, -oo will REPLACE the file.
     200}}}
     201The name of the file that collects the standard output (stdout) of the job. The -o option APPENDS to the file; -oo REPLACES it.
     202{{{
     203#!sh
    213204#BSUB -n number
     205}}}
    214206The number of processes to start.
     207{{{
     208#!sh
    215209#BSUB -R"span[ptile=number]"
     210}}}
    216211The number of processes assigned to a node.
    217 We really encourage you to read the manual of bsub command to find out other specifications that
    218 will help you to define the job script.
     212
     213We strongly encourage you to read the manual page of the bsub command to find other options that will help you define the job script.
     214{{{
     215#!sh
    219216man bsub
    220 5.1.3. Examples
     217}}}
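
The `-n` and `span[ptile=number]` directives described earlier together determine how many nodes a job occupies: the process count divided by the processes per node, rounded up. A minimal sketch of that arithmetic follows; the function name `nodes_needed` is hypothetical, and the value of 16 processes per node in the example is only inferred from the debug queue limit of 64 CPUs on 4 nodes.

```shell
# Hypothetical helper: number of nodes occupied by a job with
# $1 processes (-n) placed $2 per node (span[ptile=...]).
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))           # ceiling division
}

# Example: nodes_needed 64 16 prints 4
```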
     218
     219=== Job Examples ===
    221220
    222221Sequential job :
     
    297296}}}
    298297
     298= [[http://www.dkrz.de/Nutzerportal-en/doku/getting-started|DKRZ]] =
     299
     300'''How to use DKRZ facilities?'''
     301
     302Workflows in climate modelling research are complex and comprise, in general, a number of different tasks, such as model formulation and development (including debugging, platform porting, and performance optimization), generation of input data, performing model simulations, postprocessing, visualization and analysis of output data, long-term archiving of the data, documentation and publication of results. The '''DKRZ''' hardware and software infrastructure is optimally adapted to accomplish these tasks in an efficient way. The graphic below gives a schematic overview of the '''DKRZ''' systems.
     303
     304[[Image(http://www.dkrz.de/bilder/bilder-nutzerportal/bilder-dokumentation/DKRZsystems.png,50%,)]]
     305
     306For a more detailed description of the different systems shown in the picture and basic software installed on these systems [[http://www.dkrz.de/Nutzerportal-en/doku/getting-started/dkrz_system|click here]].
     307== Blizzard ==
     308
     309[http://www.dkrz.de/Nutzerportal-en/doku/blizzard]
     310{{{
     311ssh  <userid>@blizzard.dkrz.de
     312}}}
     313
     314== Lizard ==
     315[http://www.dkrz.de/Nutzerportal-en/doku/blizzard/lizard]
     316{{{
     317ssh  <userid>@lizard.dkrz.de
     318}}}
     319
    299320= National Computational Infrastructure (Australia) =
    300321[http://nf.nci.org.au/facilities/]