Version 28 (modified by carlos, 10 years ago)
WRF4G Tutorial part 2
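How to manage WRF4G errors
In this section, we are going to see how to manage WRF4G errors. To do that, we are going to create a new experiment called test_1, based on single_test, in which end_date is "2011-08-30_12:00:00" instead of "2011-08-30_00:00:00". Copy the single_test directory to single_test_1, edit experiment.wrf4g so that experiment_name and end_date take the new values, and then prepare and submit the experiment. Follow the steps below.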
[user@mycomputer~]$ cd $WRF4G_LOCATION/experiments
[user@mycomputer~]$ ls
single_test  wrfuc_physics  wrfuc_single_serial
[user@mycomputer~]$ cp -r single_test single_test_1
[user@mycomputer~]$ cd single_test_1
[user@mycomputer~]$ cat experiment.wrf4g | grep experiment_name
experiment_name = "test"
[user@mycomputer~]$ cat experiment.wrf4g | grep "end_date "
end_date = "2011-08-30_00:00:00"
[user@mycomputer~]$ cat experiment.wrf4g | grep experiment_name
experiment_name = "test_1"
[user@mycomputer~]$ cat experiment.wrf4g | grep "end_date "
end_date = "2011-08-30_12:00:00"
[user@mycomputer~]$ wrf4g_prepare
Warning: You are using resources.wrf4g located in the /home/user/WRF4G/experiments/single_test_1 directory.
Preparing namelist...
WRFV3/run/namelist.input
WRF Check Warning: CAM radiation selected but paerlev/levsiz/cam_abs_dim1/cam_abs_dim2 was not set. Fixing...
WRF Check Warning: radt is shorter than dx (0.500000)
---> Single params run
---> Continuous run
---> cycle_chunks: test_1 2011-08-28_12:00:00 2011-08-30_12:00:00
---> chunks 1: test_1 2011-08-28_12:00:00 2011-08-29_00:00:00
---> chunks 2: test_1 2011-08-29_00:00:00 2011-08-29_12:00:00
---> chunks 3: test_1 2011-08-29_12:00:00 2011-08-30_00:00:00
---> chunks 4: test_1 2011-08-30_00:00:00 2011-08-30_12:00:00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
test           2   D    3/3    mycomputer ciclon     Finished  0   100.00
test_1         -   P    0/4    -          -          Prepared  -   0.00
[user@mycomputer~]$ wrf4g_submit
Submitting realization: "test_1"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
Submitting Chunk 4: 2011-08-30_00:00:00 2011-08-30_12:00:00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
test           2   D    3/3    mycomputer ciclon     Finished  0   100.00
test_1         4   R    2/4    mycomputer ciclon     WRF       -   25.00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
test           2   D    3/3    mycomputer ciclon     Finished  0   100.00
test_1         5   W    3/4    -          -          Submitted -   50.00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
test           2   D    3/3    mycomputer ciclon     Finished  0   100.00
test_1         6   W    4/4    -          -          Submitted -   75.00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
test           2   D    3/3    mycomputer ciclon     Finished  0   100.00
test_1         6   W    4/4    -          -          Submitted -   75.00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
test           2   D    3/3    mycomputer ciclon     Finished  0   100.00
test_1         6   F    4/4    mycomputer ciclon     Failed    62  75.00
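The chunk boundaries printed by wrf4g_prepare can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not WRF4G's own code: the function name split_into_chunks is hypothetical, and the 12-hour chunk size is an assumption inferred from the output above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d_%H:%M:%S"  # date format used in experiment.wrf4g


def split_into_chunks(start, end, chunk_hours=12):
    """Split [start, end] into consecutive chunks of chunk_hours hours.

    chunk_hours=12 is an assumption inferred from the wrf4g_prepare output.
    """
    t0 = datetime.strptime(start, FMT)
    t_end = datetime.strptime(end, FMT)
    step = timedelta(hours=chunk_hours)
    chunks = []
    while t0 < t_end:
        t1 = min(t0 + step, t_end)  # the last chunk may be shorter
        chunks.append((t0.strftime(FMT), t1.strftime(FMT)))
        t0 = t1
    return chunks


chunks = split_into_chunks("2011-08-28_12:00:00", "2011-08-30_12:00:00")
for number, (chunk_start, chunk_end) in enumerate(chunks, start=1):
    print(f"chunk {number}: {chunk_start} {chunk_end}")
```

Moving end_date from 2011-08-30_00:00:00 to 2011-08-30_12:00:00 adds exactly one chunk, which is why test has three chunks and test_1 has four.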
As the last status output shows, the realization test_1 has failed with exit code 62. What happened? Exit code 62 indicates that the ungrib binary failed during its execution. To find the cause of the error, we are going to check the log of chunk number 4.
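When an experiment has many realizations, it can help to scan the status table for failures automatically. The following sketch parses text in the layout printed by wrf4g_status --long; the function name failed_realizations is hypothetical, and it assumes that every row has the nine whitespace-separated columns shown in this tutorial.

```python
STATUS_OUTPUT = """\
Realization GW Stat Chunks Comp.Res WN Run.Sta ext %
test 2 D 3/3 mycomputer ciclon Finished 0 100.00
test_1 6 F 4/4 mycomputer ciclon Failed 62 75.00
"""


def failed_realizations(status_text):
    """Return {realization: (chunks, exit_code)} for rows whose Stat is F."""
    failures = {}
    for line in status_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 9:
            continue  # not a table row in the assumed layout
        name, _gw, stat, chunks, _res, _wn, _run_sta, ext, _percent = fields
        if stat == "F":
            exit_code = int(ext) if ext.isdigit() else None
            failures[name] = (chunks, exit_code)
    return failures


for name, (chunks, exit_code) in failed_realizations(STATUS_OUTPUT).items():
    print(f"{name} failed at chunk {chunks} with exit code {exit_code}")
```

In practice the text would come from running wrf4g_status --long; here it is pasted in as a string so that the example is self-contained.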
[user@mycomputer~]$ cat $WRF4G_LOCATION/etc/resources.wrf4g | grep WRF4G_BASEPATH=
WRF4G_BASEPATH="/home/user/WRF4G/repository/output"
[user@mycomputer~]$ cd $WRF4G_LOCATION/repository/output/test_1/test_1/log/
[user@mycomputer~]$ ls
log_1_4.tar.gz  log_2_5.tar.gz  log_3_6.tar.gz  log_4_7.tar.gz
The name of a chunk log is composed of the chunk number and the job identifier (GW):
- log_{chunk_number}_{job_identifier}.tar.gz
In our case, the chunk log name is log_4_7.tar.gz because the chunk number is 4 and the job identifier is 7.
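The naming rule above is easy to apply programmatically, for instance to pick the log of a given chunk out of a directory listing. A minimal sketch, in which the helper names parse_log_name and find_chunk_logs are hypothetical:

```python
import re

# log_{chunk_number}_{job_identifier}.tar.gz
LOG_NAME = re.compile(r"^log_(?P<chunk>\d+)_(?P<job>\d+)\.tar\.gz$")


def parse_log_name(filename):
    """Return (chunk_number, job_identifier) for a chunk log file name."""
    match = LOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a chunk log name: {filename!r}")
    return int(match["chunk"]), int(match["job"])


def find_chunk_logs(filenames, chunk_number):
    """Return the log file names that belong to the given chunk."""
    return [name for name in filenames if parse_log_name(name)[0] == chunk_number]


listing = ["log_1_4.tar.gz", "log_2_5.tar.gz", "log_3_6.tar.gz", "log_4_7.tar.gz"]
print(find_chunk_logs(listing, 4))
```

A chunk that has been submitted more than once would have several logs with different job identifiers, which is why find_chunk_logs returns a list.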
[user@mycomputer~]$ tar xzfv log_4_7.tar.gz
WRF4G.log
configure.wps
ls.wps
ls.wrf
ungrib_GFS_2011083000.out
Each log package contains the logs of the WRF binaries as well as WRF4G logs such as WRF4G.log, wrfgel.out and monitor.log. In our case, we are going to focus on WRF4G.log, which is the main log of the remote simulation.
[user@mycomputer~]$ cat WRF4G.log
* Mon Oct 1 17:27:45 CEST 2012: Creating WRF4G structure ...
`/home/user/WRF4G/repository/apps/WRFbin-3.1.1_r832INTEL_OMPI.tar.gz' -> `/home/user/.gw_user_6/WRFbin-3.1.1_r832INTEL_OMPI.tar.gz'
`/home/user/WRF4G/repository/output/test_1/test_1/namelist.input' -> `/home/user/.gw_user_6/WRFV3/run/namelist.input'
* Mon Oct 1 17:27:46 CEST 2012: Preparing WRF4G binaries ...
* Mon Oct 1 17:27:46 CEST 2012: Creating parallel environment ...
* Mon Oct 1 17:27:46 CEST 2012: Using default configuration ...
* Mon Oct 1 17:27:46 CEST 2012: Checking restart information ...
WRFGEL(download_file)> START: ['rst', '20110830T000000Z']
WRFGEL(download_file)> END: ['rst', '20110830T000000Z']
* Mon Oct 1 17:27:46 CEST 2012: The boundaries and initial conditions are not available ...
* Mon Oct 1 17:27:46 CEST 2012: Downloading geo_em files and namelist.wps ...
/home/user/.gw_user_6/WRFGEL/vcp -v /home/user/WRF4G/repository/domains/Santander_50km/* .
cp -v -R /home/user/WRF4G/repository/domains/Santander_50km/* /home/user/.gw_user_6/WPS
`/home/user/WRF4G/repository/domains/Santander_50km/geo_em.d01.nc' -> `/home/user/.gw_user_6/WPS/geo_em.d01.nc'
`/home/user/WRF4G/repository/domains/Santander_50km/namelist.wps' -> `/home/user/.gw_user_6/WPS/namelist.wps'
* Mon Oct 1 17:27:46 CEST 2012: Modifying namelist ...
Updating parameter start_date in file: namelist.wps
Updating parameter end_date in file: namelist.wps
Updating parameter max_dom in file: namelist.wps
Updating parameter prefix in file: namelist.wps
Updating parameter interval_seconds in file: namelist.wps
* Mon Oct 1 17:27:46 CEST 2012: About to run preprocessor and Ungrib ...
* Mon Oct 1 17:27:46 CEST 2012: Running preprocessor.default ...
Linking global data from: /home/user/WRF4G/repository/input/NCEP/GFS
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_00.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_00.grb'
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_06.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_06.grb'
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_12.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_12.grb'
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_18.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_18.grb'
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_24.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_24.grb'
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_30.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_30.grb'
`/home/user/.gw_user_6/WPS/grbData/gfs2011082812_36.grb' -> `/home/user/WRF4G/repository/input/NCEP/GFS/2011/gfs2011082812_36.grb'
* Mon Oct 1 17:27:47 CEST 2012: Running ungrib ...
**********************************************************************************
WRF4G was deployed in ... /home/user/.gw_user_6
and it ran in ... /home/user/.gw_user_6
**********************************************************************************
WRF4G.log shows that ungrib was the last WRF binary to run. Therefore, the ungrib log will probably reveal the error.
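For long logs, the last step recorded in WRF4G.log can be found without reading the whole file. The sketch below assumes that each step is written on its own line as "* <timestamp>: <message>", as in the log shown in this section; the function name last_step is hypothetical.

```python
SAMPLE_LOG = """\
* Mon Oct 1 17:27:46 CEST 2012: Modifying namelist ...
Updating parameter start_date in file: namelist.wps
* Mon Oct 1 17:27:46 CEST 2012: Running preprocessor.default ...
* Mon Oct 1 17:27:47 CEST 2012: Running ungrib ...
**********************************************************************************
WRF4G was deployed in ... /home/user/.gw_user_6
"""


def last_step(log_text):
    """Return the message of the last '* <timestamp>: <message>' line."""
    step = None
    for line in log_text.splitlines():
        # Step lines start with "* "; the banner of asterisks does not.
        if line.startswith("* ") and ": " in line:
            _timestamp, message = line[2:].split(": ", 1)
            step = message.strip()
    return step


print(last_step(SAMPLE_LOG))
```

The step that was running when the job stopped tells you which binary log to open next.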
[user@mycomputer~]$ tail ungrib_GFS_2011083000.out
100.0 X X X X X
70.0 X X X X X
50.0 X X X X X
Subroutine DATINT: Interpolating 3-d files to fill in any missing data...
Looking for data at time 2011-08-30_00
Found file: GFS:2011-08-30_00
Looking for data at time 2011-08-30_06
ERROR: Data not found: 2011-08-30_06:00:00.0000
Begin rrpr
----------------------------------------------------
The problem is that there is no input data to simulate the last chunk. ungrib needs data for 2011-08-30_06:00:00, but the input directory only contains files whose names end in _00 to _36. If those suffixes are forecast hours counted from 2011-08-28_12, as the names suggest, the data stops at 2011-08-30_00:00:00.
[user@mycomputer~]$ cat experiment.wrf4g | grep "extdata_path"
extdata_path = "${WRF4G_INPUT}/NCEP/GFS"
[user@mycomputer~]$ cat $WRF4G_LOCATION/etc/resources.wrf4g | grep "WRF4G_INPUT="
WRF4G_INPUT="/home/user/WRF4G/repository/input"
[user@mycomputer~]$ ls -l /home/user/WRF4G/repository/input/NCEP/GFS/2011/
total 36388
-rw-r--r-- 1 user user 4909850 2012-09-13 11:38 gfs2011082812_00.grb
-rw-r--r-- 1 user user 5411705 2012-09-13 11:38 gfs2011082812_06.grb
-rw-r--r-- 1 user user 5411214 2012-09-13 11:38 gfs2011082812_12.grb
-rw-r--r-- 1 user user 5415031 2012-09-13 11:38 gfs2011082812_18.grb
-rw-r--r-- 1 user user 5397677 2012-09-13 11:38 gfs2011082812_24.grb
-rw-r--r-- 1 user user 5386190 2012-09-13 11:38 gfs2011082812_30.grb
-rw-r--r-- 1 user user 5316014 2012-09-13 11:38 gfs2011082812_36.grb
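The gap in the input data can be checked before submitting. The sketch below rests on two assumptions that this page does not state explicitly: that the files are named gfs<cycle>_<forecast hour>.grb, and that boundary data is needed every 6 hours (ungrib looks for 2011-08-30_00 and then 2011-08-30_06). The function name required_forecast_hours is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d_%H:%M:%S"


def required_forecast_hours(cycle_start, chunk_start, chunk_end, interval_hours=6):
    """Forecast hours (relative to cycle_start) needed to cover one chunk.

    interval_hours=6 is an assumption based on the times ungrib looks for.
    """
    base = datetime.strptime(cycle_start, FMT)
    current = datetime.strptime(chunk_start, FMT)
    end = datetime.strptime(chunk_end, FMT)
    hours = []
    while current <= end:
        hours.append(int((current - base).total_seconds() // 3600))
        current += timedelta(hours=interval_hours)
    return hours


# Forecast hours present in the directory listing above.
available = {0, 6, 12, 18, 24, 30, 36}

needed = required_forecast_hours(
    "2011-08-28_12:00:00", "2011-08-30_00:00:00", "2011-08-30_12:00:00"
)
missing = [hour for hour in needed if hour not in available]
for hour in missing:
    print(f"missing input file: gfs2011082812_{hour:02d}.grb")
```

Under these assumptions, chunk 4 needs forecast hours 36, 42 and 48, and the last two are not in the input directory.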
Now, try to ...
How to use wrf4g_kill command
In this example, we are going to run an experiment with independent realizations, which has the multiple_parameters flag activated. The experiment is composed of five realizations with three chunks per realization. In order to use the wrf4g_kill command, we first have to submit the experiment.
[user@mycomputer~]$ cd $WRF4G_LOCATION/experiments/wrfuc_physics
[user@mycomputer~]$ ls
experiment.wrf4g
[user@mycomputer~]$ cat experiment.wrf4g | grep "param"
multiple_parameters=1
multiparams_variables="mp_physics,cu_physics,ra_lw_physics,ra_sw_physics,sf_sfclay_physics,bl_pbl_physics,sf_surface_physics"
multiparams_nitems="${max_dom},${max_dom},${max_dom},${max_dom},${max_dom},${max_dom},${max_dom}"
multiparams_combinations="5,1:1:0,1,1,2,2,2/4,1:1:0,1,1,1,1,2/4,1:1:0,1,1,2,2,2/4,1:1:0,1,1,7,7,2/4,3:3:0,1,1,7,7,2 "
multiparams_labels="phys1/phys2/phys3/phys4/phys5"
[user@mycomputer~]$ wrf4g_prepare
Warning: You are using resources.wrf4g located in the /home/carlos/WRF4G/etc/ directory.
Preparing namelist...
WRFV3/run/namelist.input
WRF Check Warning: CAM radiation selected but paerlev/levsiz/cam_abs_dim1/cam_abs_dim2 was not set. Fixing...
WRF Check Warning: radt is shorter than dx (0.500000)
--->Realization: multiparams=phys1 2011-08-28_12:00:00 2011-08-30_00:00:00
Updating parameter mp_physics in file: namelist.input
Updating parameter cu_physics in file: namelist.input
Updating parameter ra_lw_physics in file: namelist.input
Updating parameter ra_sw_physics in file: namelist.input
Updating parameter sf_sfclay_physics in file: namelist.input
Updating parameter bl_pbl_physics in file: namelist.input
Updating parameter sf_surface_physics in file: namelist.input
---> Continuous run
---> cycle_chunks: uc_phys__phys1 2011-08-28_12:00:00 2011-08-30_00:00:00
---> chunks 1: uc_phys__phys1 2011-08-28_12:00:00 2011-08-29_00:00:00
---> chunks 2: uc_phys__phys1 2011-08-29_00:00:00 2011-08-29_12:00:00
---> chunks 3: uc_phys__phys1 2011-08-29_12:00:00 2011-08-30_00:00:00
--->Realization: multiparams=phys2 2011-08-28_12:00:00 2011-08-30_00:00:00
Updating parameter mp_physics in file: namelist.input
Updating parameter cu_physics in file: namelist.input
Updating parameter ra_lw_physics in file: namelist.input
Updating parameter ra_sw_physics in file: namelist.input
Updating parameter sf_sfclay_physics in file: namelist.input
Updating parameter bl_pbl_physics in file: namelist.input
Updating parameter sf_surface_physics in file: namelist.input
---> Continuous run
---> cycle_chunks: uc_phys__phys2 2011-08-28_12:00:00 2011-08-30_00:00:00
---> chunks 1: uc_phys__phys2 2011-08-28_12:00:00 2011-08-29_00:00:00
---> chunks 2: uc_phys__phys2 2011-08-29_00:00:00 2011-08-29_12:00:00
---> chunks 3: uc_phys__phys2 2011-08-29_12:00:00 2011-08-30_00:00:00
--->Realization: multiparams=phys3 2011-08-28_12:00:00 2011-08-30_00:00:00
Updating parameter mp_physics in file: namelist.input
Updating parameter cu_physics in file: namelist.input
Updating parameter ra_lw_physics in file: namelist.input
Updating parameter ra_sw_physics in file: namelist.input
Updating parameter sf_sfclay_physics in file: namelist.input
Updating parameter bl_pbl_physics in file: namelist.input
Updating parameter sf_surface_physics in file: namelist.input
---> Continuous run
---> cycle_chunks: uc_phys__phys3 2011-08-28_12:00:00 2011-08-30_00:00:00
---> chunks 1: uc_phys__phys3 2011-08-28_12:00:00 2011-08-29_00:00:00
---> chunks 2: uc_phys__phys3 2011-08-29_00:00:00 2011-08-29_12:00:00
---> chunks 3: uc_phys__phys3 2011-08-29_12:00:00 2011-08-30_00:00:00
--->Realization: multiparams=phys4 2011-08-28_12:00:00 2011-08-30_00:00:00
Updating parameter mp_physics in file: namelist.input
Updating parameter cu_physics in file: namelist.input
Updating parameter ra_lw_physics in file: namelist.input
Updating parameter ra_sw_physics in file: namelist.input
Updating parameter sf_sfclay_physics in file: namelist.input
Updating parameter bl_pbl_physics in file: namelist.input
Updating parameter sf_surface_physics in file: namelist.input
---> Continuous run
---> cycle_chunks: uc_phys__phys4 2011-08-28_12:00:00 2011-08-30_00:00:00
---> chunks 1: uc_phys__phys4 2011-08-28_12:00:00 2011-08-29_00:00:00
---> chunks 2: uc_phys__phys4 2011-08-29_00:00:00 2011-08-29_12:00:00
---> chunks 3: uc_phys__phys4 2011-08-29_12:00:00 2011-08-30_00:00:00
--->Realization: multiparams=phys5 2011-08-28_12:00:00 2011-08-30_00:00:00
Updating parameter mp_physics in file: namelist.input
Updating parameter cu_physics in file: namelist.input
Updating parameter ra_lw_physics in file: namelist.input
Updating parameter ra_sw_physics in file: namelist.input
Updating parameter sf_sfclay_physics in file: namelist.input
Updating parameter bl_pbl_physics in file: namelist.input
Updating parameter sf_surface_physics in file: namelist.input
---> Continuous run
---> cycle_chunks: uc_phys__phys5 2011-08-28_12:00:00 2011-08-30_00:00:00
---> chunks 1: uc_phys__phys5 2011-08-28_12:00:00 2011-08-29_00:00:00
---> chunks 2: uc_phys__phys5 2011-08-29_00:00:00 2011-08-29_12:00:00
---> chunks 3: uc_phys__phys5 2011-08-29_12:00:00 2011-08-30_00:00:00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
uc_phys__phys1 -   P    0/3    -          -          Prepared  -   0.00
uc_phys__phys2 -   P    0/3    -          -          Prepared  -   0.00
uc_phys__phys3 -   P    0/3    -          -          Prepared  -   0.00
uc_phys__phys4 -   P    0/3    -          -          Prepared  -   0.00
uc_phys__phys5 -   P    0/3    -          -          Prepared  -   0.00
[user@mycomputer~]$ wrf4g_submit
Submitting realization: "uc_phys__phys1"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
Submitting realization: "uc_phys__phys2"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
Submitting realization: "uc_phys__phys3"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
Submitting realization: "uc_phys__phys4"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
Submitting realization: "uc_phys__phys5"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
uc_phys__phys1 0   W    1/3    -          -          Submitted -   0.00
uc_phys__phys2 3   W    1/3    -          -          Submitted -   0.00
uc_phys__phys3 6   W    1/3    -          -          Submitted -   0.00
uc_phys__phys4 9   W    1/3    -          -          Submitted -   0.00
uc_phys__phys5 12  W    1/3    -          -          Submitted -   0.00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
uc_phys__phys1 0   R    1/3    mycomputer ciclogenes WRF       -   0.00
uc_phys__phys2 3   W    1/3    -          -          Submitted -   0.00
uc_phys__phys3 6   W    1/3    -          -          Submitted -   0.00
uc_phys__phys4 9   W    1/3    -          -          Submitted -   0.00
uc_phys__phys5 12  W    1/3    -          -          Submitted -   0.00
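To see which physics options each realization uses, the multiparams_* settings from experiment.wrf4g can be unpacked into one table per label. This is an illustrative sketch, not WRF4G's parser: the function name parse_multiparams is hypothetical, and it assumes that combinations are separated by "/" and values by ",", which matches the settings shown above. Values such as "1:1:0" are kept as plain strings and are not interpreted.

```python
variables = (
    "mp_physics,cu_physics,ra_lw_physics,ra_sw_physics,"
    "sf_sfclay_physics,bl_pbl_physics,sf_surface_physics"
)
combinations = (
    "5,1:1:0,1,1,2,2,2/4,1:1:0,1,1,1,1,2/4,1:1:0,1,1,2,2,2/"
    "4,1:1:0,1,1,7,7,2/4,3:3:0,1,1,7,7,2 "
)
labels = "phys1/phys2/phys3/phys4/phys5"


def parse_multiparams(variables, combinations, labels):
    """Map each label to a {variable: value} dict for its combination."""
    names = variables.split(",")
    combos = combinations.strip().split("/")
    label_list = labels.split("/")
    if len(combos) != len(label_list):
        raise ValueError("number of labels and combinations differ")
    result = {}
    for label, combo in zip(label_list, combos):
        values = combo.split(",")
        if len(values) != len(names):
            raise ValueError(f"{label}: expected {len(names)} values")
        result[label] = dict(zip(names, values))
    return result


params = parse_multiparams(variables, combinations, labels)
for label, settings in params.items():
    print(label, settings)
```

Each label becomes the suffix of a realization name, so phys1 corresponds to uc_phys__phys1 in the status table.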
Now that the experiment is running, we are going to stop the chunks of the uc_phys__phys1 realization using the wrf4g_kill command.
[user@mycomputer~]$ wrf4g_kill -r uc_phys__phys1
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
uc_phys__phys1 -   P    0/3    -          -          Prepared  -   0.00
uc_phys__phys2 3   W    1/3    -          -          Submitted -   0.00
uc_phys__phys3 6   W    1/3    -          -          Submitted -   0.00
uc_phys__phys4 9   W    1/3    -          -          Submitted -   0.00
uc_phys__phys5 12  W    1/3    -          -          Submitted -   0.00
Note that Run.Sta has changed to Prepared. If you want to submit the realization again, you only need to execute wrf4g_submit -r uc_phys__phys1.
[user@mycomputer~]$ wrf4g_submit -r uc_phys__phys1
Submitting realization: "uc_phys__phys1"
Submitting Chunk 1: 2011-08-28_12:00:00 2011-08-29_00:00:00
Submitting Chunk 2: 2011-08-29_00:00:00 2011-08-29_12:00:00
Submitting Chunk 3: 2011-08-29_12:00:00 2011-08-30_00:00:00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
uc_phys__phys1 15  W    1/3    -          -          Submitted -   0.00
uc_phys__phys2 3   R    1/3    mycomputer ciclogenes real      -   0.00
uc_phys__phys3 6   W    1/3    -          -          Submitted -   0.00
uc_phys__phys4 9   W    1/3    -          -          Submitted -   0.00
uc_phys__phys5 12  W    1/3    -          -          Submitted -   0.00
[user@mycomputer~]$ wrf4g_status --long
Realization    GW  Stat Chunks Comp.Res   WN         Run.Sta   ext %
uc_phys__phys1 15  W    1/3    -          -          Submitted -   0.00
uc_phys__phys2 3   R    1/3    mycomputer ciclogenes WRF       -   0.00
uc_phys__phys3 6   W    1/3    -          -          Submitted -   0.00
uc_phys__phys4 9   W    1/3    -          -          Submitted -   0.00
uc_phys__phys5 12  W    1/3    -          -          Submitted -   0.00
The uc_phys__phys2 realization is running because all the realizations are independent.
How to add new computing resources to WRF4G
The framework4g.conf file, which is located in the $WRF4G_LOCATION/etc directory, has a section called Computing Resources.
WRF4G uses DRM4G to configure computing resources. Using that tool, users are able to access different Distributed Resource Management (DRM) systems such as:
- PBS/Torque
- SGE
- FORK
- LoadLeveler
- MN SLURM (only for Red Española de Supercomputación)
This file contains one resource per line, with the format:
FQDN attributes
...
...
FQDN attributes
where:
- FQDN: is the name of the resource.
- attributes: are the static attributes of the resource. The syntax is:
<scheme>://<username>@<host>?<query>
- scheme: the URL schemes available are "ssh" and "local".
- ssh: access to remote DRM via SSH
- local: use the local DRM
- username: user name
- host: host name
- query: contains additional information. The query string syntax is:
- key1=value1;key2=value2;key3=value3
Variable options:
- LRMS_TYPE (mandatory) : DRM system for execution [pbs | sge | fork | loadleveler | mnslurm ]
- PROJECT (optional for SGE, PBS and LoadLeveler): specifies the project to which the jobs are assigned
- GW_RUNDIR (optional) : directory on the resource in which jobs are deployed. By default, it is the user's home directory
- GW_LOCALDIR (optional) : defines the working directory on the Working Node (it has to be an absolute path)
- NODECOUNT (optional) : total number of slots on the DRM system
- QUEUE_NAME (optional) : the name of the queue to configure
Examples:
mycomputer local://localhost?LRMS_TYPE=fork;NODECOUNT=1
PBS_cluster local://localhost?LRMS_TYPE=pbs;QUEUE_NAME=estadistica
SGE_cluster local://localhost?LRMS_TYPE=sge;PROJECT=l.project
remote_PBS_cluster ssh://user@hostname_submitting_machine?LRMS_TYPE=pbs;QUEUE_NAME=short
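Before reloading the framework, a resource line can be checked against the rules described above. The following sketch is not part of WRF4G or DRM4G: the function name parse_resource_line and the regular expression are hypothetical, and the syntax is taken from the scheme://host?key=value;key=value form used in the examples.

```python
import re

RESOURCE_URL = re.compile(
    r"^(?P<scheme>ssh|local)://"
    r"(?:(?P<username>[^@/?]+)@)?"  # the user name is absent in the local examples
    r"(?P<host>[^?]+)"
    r"(?:\?(?P<query>.*))?$"
)
LRMS_TYPES = {"pbs", "sge", "fork", "loadleveler", "mnslurm"}


def parse_resource_line(line):
    """Split 'FQDN attributes' into its parts and check LRMS_TYPE."""
    name, url = line.split(None, 1)
    match = RESOURCE_URL.match(url.strip())
    if match is None:
        raise ValueError(f"{name}: malformed attributes {url!r}")
    options = {}
    for pair in filter(None, (match["query"] or "").split(";")):
        key, _, value = pair.partition("=")
        options[key] = value
    if options.get("LRMS_TYPE") not in LRMS_TYPES:
        raise ValueError(f"{name}: LRMS_TYPE is mandatory and must be one of {sorted(LRMS_TYPES)}")
    return {
        "name": name,
        "scheme": match["scheme"],
        "username": match["username"],
        "host": match["host"],
        "options": options,
    }


resource = parse_resource_line(
    "remote_PBS_cluster ssh://user@hostname_submitting_machine?LRMS_TYPE=pbs;QUEUE_NAME=short"
)
print(resource)
```

The check only covers the syntax and the mandatory LRMS_TYPE key; whether the queue or project actually exists can only be verified on the resource itself.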
After modifying this file, in order to make changes effective, users will have to execute: wrf4g_framework reload