Changes between Version 4 and Version 5 of WRF4G/ExecutionEnvironments


Timestamp: Feb 15, 2013 6:04:44 PM
Author: carlos
  • WRF4G/ExecutionEnvironments (v4 → v5)

}}}

= RES - Red Española de Supercomputación =


== Altamira ==

{{{
ssh <userid>@altamira1.ifca.es
}}}

=== Running Jobs ===
     
== !MareNostrum ==

{{{
ssh <userid>@mn1.bsc.es
}}}

=== Running Jobs ===

LSF is the utility used at MareNostrum III for batch processing, so all jobs must be run through it. This section provides the information needed to get started with job execution on the cluster.

==== Submitting jobs ====

A job is the execution unit for LSF. A job is defined by a text file containing a set of directives describing the job and the commands to execute. Please bear in mind that there is a limit of 3600 bytes on the size of this text file.

==== LSF commands ====

These are the basic commands to submit and manage jobs:

 * `bsub < job_script`: submits a job script to the queue system (see below for the job script directives). Remember to pass it through STDIN with '<'.
 * `bjobs [-w][-X][-l job_id]`: shows all the submitted jobs.
 * `bkill <job_id>`: removes the job from the queue system, cancelling the execution of its processes if they are still running.
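
For instance, a typical submit/monitor/cancel cycle might look like the following sketch (the script name `ptest.cmd` and the job id `123456` are illustrative placeholders):

{{{
#!sh
bsub < ptest.cmd   # submit the job script through STDIN
bjobs -w           # list your submitted jobs (wide output)
bkill 123456       # cancel job 123456 if it is no longer needed
}}}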

==== Job directives ====

A job must contain a series of directives to inform the batch system about the characteristics of the job. These directives appear as comments in the job script, with the syntax `#BSUB -option value`:

 * `#BSUB -J job_name`: the name of the job.
 * `#BSUB -q debug`: this queue is only intended for small tests, so there is a limit of 1 job per user, using up to 64 CPUs (4 nodes) and one hour of wall clock time.
 * `#BSUB -W HH:MM`: the wall clock time limit. This is a mandatory field; set it to a value greater than the real execution time of your application and smaller than the time limit granted to the user. Note that seconds cannot be specified in LSF, and that your job will be killed once this period has elapsed.
 * `#BSUB -cwd pathname`: the working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted.
 * `#BSUB -e/-eo file`: the name of the file to collect the stderr output of the job. You can use %J for the job_id. The -e option APPENDS to the file, -eo REPLACES it.
 * `#BSUB -o/-oo file`: the name of the file to collect the standard output (stdout) of the job. The -o option APPENDS to the file, -oo REPLACES it.
 * `#BSUB -n number`: the number of processes to start.
 * `#BSUB -R"span[ptile=number]"`: the number of processes assigned to each node.

We strongly encourage you to read the manual of the bsub command (`man bsub`) to find out other options that will help you define the job script.
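
Putting the directives above together, a minimal job-script skeleton might look like the sketch below; the job name, queue, working directory, resource counts, time limit and executable are placeholders to adapt to your own case:

{{{
#!sh
#!/bin/bash
#BSUB -J my_job                # job name (placeholder)
#BSUB -q debug                 # test queue: 1 job per user, up to 64 cpus, 1 hour
#BSUB -W 00:30                 # wall clock limit (HH:MM, seconds not allowed)
#BSUB -cwd /path/to/workdir    # working directory (placeholder path)
#BSUB -n 16                    # total number of processes
#BSUB -R"span[ptile=16]"       # processes per node
#BSUB -oo output_%J.out        # stdout file, replaced (%J = job id)
#BSUB -eo output_%J.err        # stderr file, replaced (%J = job id)

./my_app.exe                   # command to execute (placeholder)
}}}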

==== Examples ====

Sequential job:
{{{
#!sh
#!/bin/bash
#BSUB -n 1
#BSUB -oo output_%J.out
#BSUB -eo output_%J.err
#BSUB -J sequential
#BSUB -W 00:05

./serial.exe
}}}

The job would be submitted using:
{{{
#!sh
bsub < ptest.cmd
}}}
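
Once submitted, the job can be followed with the commands described above; when it finishes, the %J placeholder in the output directives resolves to the numeric job id, so for a hypothetical job id 123456 the results would end up in files like these:

{{{
#!sh
bjobs -l 123456         # detailed status of the job
cat output_123456.out   # stdout collected via -oo output_%J.out
cat output_123456.err   # stderr collected via -eo output_%J.err
}}}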

Sequential job using OpenMP:
{{{
#!sh
#!/bin/bash
#BSUB -n 1
#BSUB -oo output_%J.out
#BSUB -eo output_%J.err
#BSUB -J sequential_OpenMP
#BSUB -W 00:05

export OMP_NUM_THREADS=16

./serial.exe
}}}

Parallel job:
{{{
#!sh
#!/bin/bash
#BSUB -n 128
#BSUB -o output_%J.out
#BSUB -e output_%J.err
# In order to launch 128 processes with 16 processes per node:
#BSUB -R"span[ptile=16]"
#BSUB -x  # Exclusive use
#BSUB -J parallel
#BSUB -W 02:00
# You can choose the parallel environment through modules

module load intel openmpi
mpirun ./wrf.exe
}}}

Parallel job using threads:
{{{
#!sh
#!/bin/bash
# The total number of MPI processes:
#BSUB -n 128
#BSUB -oo output_%J.out
#BSUB -eo output_%J.err
# It will allocate 4 MPI processes per node:
#BSUB -R"span[ptile=4]"
#BSUB -x  # Exclusive use
#BSUB -J hybrid
#BSUB -W 02:00
# You can choose the parallel environment through modules

module load intel openmpi

# 4 MPI processes per node and 16 cpus available
# (4 threads per MPI process):
export OMP_NUM_THREADS=4

mpirun ./wrf.exe
}}}

= National Computational Infrastructure (Australia) =
[http://nf.nci.org.au/facilities/]