Changes between Version 5 and Version 6 of WRF4G/ExecutionEnvironments
Timestamp: Feb 15, 2013 6:14:16 PM
[[PageOutline(1-10,Page Contents)]]

= RES - Red Española de Supercomputación =

== Altamira ==

…

=== Running Jobs ===

LSF is the utility used at MareNostrum III for batch processing support, so all jobs must be run through it. This document provides information for getting started with job execution at the Cluster.
=== Submitting jobs ===
A job is the execution unit for LSF. A job is defined by a text file containing a set of directives describing the job and the commands to execute. Please bear in mind that there is a limit of 3600 bytes on the size of this text file.

=== LSF commands ===
These are the basic commands to submit and manage jobs:

'''bsub < job_script''' submits a "job script" to the queue system (see below for job script directives). Remember to pass it through STDIN with '<'.

'''bjobs [-w][-X][-l job_id]''' shows all the submitted jobs.

'''bkill <job_id>''' removes the job from the queue system, canceling the execution of its processes if they were still running.

=== Job directives ===
A job must contain a series of directives to inform the batch system about the characteristics of the job. These directives appear as comments in the job script, with the following syntax:
{{{
#!sh
#BSUB -option value
}}}
{{{
#!sh
#BSUB -J job_name
}}}
The name of the job.
{{{
#!sh
#BSUB -q debug
}}}
This queue is only intended for small tests, so there is a limit of 1 job per user, using up to 64 cpus (4 nodes), and one hour of wall clock time.
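As a quick illustration of the bsub, bjobs and bkill commands described above (the script name and job ID here are hypothetical, not from the guide):
{{{
#!sh
# submit the job script, passing it through STDIN with '<'
bsub < my_job.lsf
# list all of our submitted jobs (wide output)
bjobs -w
# cancel job 123456, killing its processes if they are still running
bkill 123456
}}}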
{{{
#!sh
#BSUB -W HH:MM
}}}
The wall clock time limit. Note that you cannot specify seconds in LSF. This is a mandatory field: set it to a value greater than the real execution time of your application and smaller than the time limits granted to the user. Your job will be killed once this period has elapsed.
{{{
#!sh
#BSUB -cwd pathname
}}}
The working directory of your job (i.e. where the job will run). If not specified, it is the current working directory at the time the job was submitted.
{{{
#!sh
#BSUB -e/-eo file
}}}
The name of the file to collect the stderr output of the job. You can use %J for the job_id. The -e option will APPEND to the file; -eo will REPLACE it.
{{{
#!sh
#BSUB -o/-oo file
}}}
The name of the file to collect the standard output (stdout) of the job. The -o option will APPEND to the file; -oo will REPLACE it.
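Putting the directives described so far together, a minimal job script might look like the following sketch (the job name, working directory, and executable are illustrative assumptions, not part of the guide):
{{{
#!sh
#!/bin/bash
#BSUB -J wrf_test                      # job name (hypothetical)
#BSUB -q debug                         # small-test queue: up to 64 cpus, 1 hour
#BSUB -W 00:30                         # wall clock limit, HH:MM (no seconds)
#BSUB -cwd /gpfs/scratch/myuser/run    # hypothetical working directory
#BSUB -oo wrf_%J.out                   # stdout; -oo replaces the file on rerun
#BSUB -eo wrf_%J.err                   # stderr; -eo replaces the file on rerun

./my_program                           # hypothetical executable
}}}
Such a script would be submitted with {{{bsub < wrf_test.lsf}}}.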
{{{
#!sh
#BSUB -n number
}}}
The number of processes to start.
{{{
#!sh
#BSUB -R"span[ptile=number]"
}}}
The number of processes assigned to each node.

We strongly encourage you to read the bsub manual page to find other options that will help you define the job script:
{{{
#!sh
man bsub
}}}

=== Job Examples ===

Sequential job:

…

= [[http://www.dkrz.de/Nutzerportal-en/doku/getting-started|DKRZ]] =

'''How to use DKRZ facilities?'''

Workflows in climate modelling research are complex and comprise, in general, a number of different tasks, such as model formulation and development (including debugging, platform porting, and performance optimization), generation of input data, performing model simulations, postprocessing, visualization and analysis of output data, long-term archiving of the data, and documentation and publication of results. The '''DKRZ''' hardware and software infrastructure is optimally adapted to accomplish these tasks in an efficient way. The graphic below gives a schematic overview of the '''DKRZ''' systems.

[[Image(http://www.dkrz.de/bilder/bilder-nutzerportal/bilder-dokumentation/DKRZsystems.png,50%,)]]

For a more detailed description of the different systems shown in the picture and the basic software installed on them, [[http://www.dkrz.de/Nutzerportal-en/doku/getting-started/dkrz_system|click here]].
== Blizzard ==

[http://www.dkrz.de/Nutzerportal-en/doku/blizzard]
{{{
ssh <userid>@blizzard.dkrz.de
}}}

== Lizard ==
[http://www.dkrz.de/Nutzerportal-en/doku/blizzard/lizard]
{{{
ssh <userid>@lizard.dkrz.de
}}}

= National Computational Infrastructure (Australia) =
[http://nf.nci.org.au/facilities/]