Changes between Version 26 and Version 27 of WRF4GWRFReforecast


Timestamp:
May 2, 2013 5:06:18 PM (9 years ago)
Author:
MarkelGarcia
Comment:

--

  • WRF4GWRFReforecast

    v26 v27  
    9494=== Experiment monitoring ===
    9595
    96 wrf4g_status permits us to monitor the state of all the realizations of the experiment. If the experiment is large, wrf4g_status can be used in combination with shell tools as grep or awk to filter the list with some criteria. For example:
     96With wrf4g_status, the user can monitor the state of all the realizations of the experiment. If the experiment is large, wrf4g_status can be combined with shell tools such as grep or awk to filter the list by some criterion. For example:
    9797
    9898{{{
     
    108108Returns all the realizations that are currently in some stage of the WRF4G workflow (Down. Bin., ungrib, metgrid..., etc.)
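The same filtering idea can be extended with awk to summarize a large experiment. The sketch below is only illustrative: the listing produced by `sample_status`, its column layout and the realization names are hypothetical stand-ins, since the real wrf4g_status output may be laid out differently. The helper names `count_by_status` and `names_with_status` are likewise assumptions made for this example.

```shell
# Hypothetical status listing: the real wrf4g_status column layout may differ.
sample_status() {
cat <<'EOF'
Realization      Status    Chunks
exp__20100101    Finished  3/3
exp__20100102    Failed    1/3
exp__20100103    WRF       2/3
exp__20100104    Failed    2/3
EOF
}

# Count realizations per status (column 2), skipping the header line.
count_by_status() {
    awk 'NR > 1 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Print only the names (column 1) of realizations in the given status.
names_with_status() {
    awk -v st="$1" 'NR > 1 && $2 == st { print $1 }'
}

sample_status | count_by_status
sample_status | names_with_status Failed
```

If the real listing has a comparable layout, the output of wrf4g_status could be piped into the same helpers instead of `sample_status`, after adjusting the column numbers.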
    109109
    110 If wrf4g_submit is called again, the realizations in Failed status are submitted again. This is very useful to resubmit simulations after some problems with the infrastructure.
     110If wrf4g_submit is called again, the realizations in Failed status are re-submitted. This is very useful to re-submit simulations after they crashed because of some problems in the computing infrastructure.
    111111
    112 Other more complicated situation can occur. For example, if there is a blackout, the jobs can fail before they can send the "Failed" signal to the database. In that case, they can appear to be indefinitely in "WRF" status. These kind of problems need a less confortable monitoring, such as entering to the working nodes and use the commands "top" or "ps -ef" to see it WRF is really running.
     112Also, some realizations may fail because of particular problems. Unfortunately, many things can fail, and covering them all in this tutorial is not possible. There is a page in this wiki called [wiki:WRFKnownProblems] where many of them are discussed. After years of using WRF we still find new error messages sometimes. In a few realizations, WRF may crash because some grid points do not fulfil the "cfl" criterion. These numerical instabilities arise when too strong gradients appear in vertical velocity or some other variable. They can be solved by using a lower timestep. However, each WRF4G experiment has a fixed timestep defined. Thus, in this case, a new experiment with a lower timestep_dxfactor (e.g. sw_failed_days with timestep_dxfactor = 5) must be created.
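To see what lowering timestep_dxfactor means in practice, here is a minimal arithmetic sketch. It assumes that the time step in seconds is obtained by multiplying timestep_dxfactor by the grid spacing in kilometres, in line with the usual WRF rule of thumb. The 15 km spacing, the starting factor of 6 and the helper `timestep_seconds` are hypothetical values and names chosen for illustration.

```shell
# Assumption: time step (s) = timestep_dxfactor * grid spacing (km).
timestep_seconds() {
    # $1 = timestep_dxfactor, $2 = grid spacing in km (both integers)
    echo $(( $1 * $2 ))
}

dx_km=15   # hypothetical grid spacing
echo "factor 6: $(timestep_seconds 6 "$dx_km") s"
echo "factor 5: $(timestep_seconds 5 "$dx_km") s"
```

Under these assumptions, going from a factor of 6 to 5 shortens the time step from 90 s to 75 s, which gives the model more margin in the days that failed.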
     113
     114Other more complicated situations can occur. For example, if there is a blackout, the jobs can fail before they can send the "Failed" signal to the database. In that case, they can appear to be indefinitely in "WRF" status. These kinds of problems need less comfortable monitoring, such as logging in to the working nodes and using the commands "top" or "ps -ef" to see if WRF is really running.
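The manual check on a working node can be wrapped in a small helper. This is a sketch and not part of WRF4G: the function name `is_running` is hypothetical, and it assumes the model executable shows up in the process table as `wrf.exe`. It uses `ps -e -o comm=`, a variant of the "ps -ef" check that prints only the command names, so the match is exact.

```shell
# Succeed if some process has exactly the given command name.
is_running() {
    # -e: all processes; -o comm=: command name only, without a header line.
    # grep -x matches the whole line, -F treats the name as a literal string.
    ps -e -o comm= | grep -qxF -- "$1"
}

# Assumption: the WRF executable appears as "wrf.exe" in the process table.
if is_running wrf.exe; then
    echo "wrf.exe is running on $(hostname)"
else
    echo "wrf.exe is NOT running on $(hostname)"
fi
```

If the realization is shown in "WRF" status but this check reports that nothing is running on any of its nodes, the job most likely died without notifying the database.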
    113115
    114116When a realization has failed but does not appear with the "Failed" status, it can be resubmitted using the --force flag of wrf4g_submit. Note also that wrf4g_submit can be applied to a single realization or chunk, using the flags -r or -c.
     117
     118The monitoring and maintenance of an experiment can be shared by different users, provided they connect their WRF4G to the same database.