[wrfems] MPICH timeout of metgrid...
Brett McDonald
brett.mcdonald at noaa.gov
Tue Apr 10 09:52:37 MDT 2012
Hi All:
Twice a day I run a WRF NMM forecast that should go out 72 hours.
Lately the model has either crashed during the initial processing or
only gone out XX hours (something less than 72). This
happened again today. On the first attempt, the horizontal
interpolation of the files to the domain failed even though all of the
NAM files appeared to come in just fine. In section IV of
ems_autoruns.log, I got an "MPICH Timeout of metgrid after 299
seconds..." message, and later on the creation of the WRF initial and
boundary conditions failed.
On the second attempt (see attached log file), the "MPICH Timeout..."
also occurred. The model kept running, but the log states that it will
only go out 27 hours instead of 72.
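A minimal sketch of one possible explanation for the 27-hour run. It assumes (this is not confirmed by the WRF EMS documentation) that metgrid was killed at the timeout after writing output for only the first ten 3-hourly times, and that ems_run then shortens the simulation to the last boundary time it can reach without a gap. The helper name `usable_length_hours` is hypothetical; the start and end times and the 180-minute boundary interval come from the attached log.

```python
from datetime import datetime, timedelta


def usable_length_hours(start, end, bc_freq_minutes, available):
    """Hours from start to the last boundary time reached with no gap.

    Hypothetical helper: walks forward one boundary interval at a time and
    stops at the first time missing from `available` (or once past `end`).
    """
    step = timedelta(minutes=bc_freq_minutes)
    t, last = start, None
    while t <= end and t in available:
        last = t
        t += step
    if last is None:
        return 0  # not even the initial time was processed
    return int((last - start).total_seconds() // 3600)


start = datetime(2012, 4, 10, 18)  # Initialization Start Time from the log
end = datetime(2012, 4, 13, 18)    # Initialization End Time from the log
step = timedelta(minutes=180)      # Boundary Condition Frequency from the log

all_times = [start + i * step for i in range(25)]  # 0, 3, ..., 72 h
partial = set(all_times[:10])                      # assume only 0-27 h were written

print(usable_length_hours(start, end, 180, set(all_times)))  # 72
print(usable_length_hours(start, end, 180, partial))         # 27
```

Under these assumptions, ten processed times (0 through 27 h at 3-hour spacing) yield exactly the 27-hour simulation reported in the log.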
This seems to occur only on my daytime runs, when I'm doing other
things on the machine. Is the MPICH timeout occurring because metgrid
takes longer to run when the machine is busier? If so, can I increase
the timeout for this portion of the processing?
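This minimal Python sketch illustrates what a wall-clock timeout of this kind does, using `subprocess.run` with its `timeout` argument. It is not how WRF EMS implements its metgrid check. The helper name and the timeout values are hypothetical, and the WRF EMS setting that controls the real limit, if one exists, is not shown here. The sketch shows why a fixed limit (299 seconds in the log) can cut off a step that is merely running slower on a loaded machine, and why raising the limit would let that step finish.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return True if it finished in time, False if it was killed.

    Hypothetical helper: subprocess.run kills the child and waits for it
    before re-raising TimeoutExpired when the deadline passes.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    # Stand-in for a slow step: a child process that needs about 2 seconds.
    slow = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(run_with_timeout(slow, 0.5))  # False: killed at the deadline
    print(run_with_timeout(slow, 5.0))  # True: extra headroom lets it finish
```

The same job succeeds or fails depending only on how much headroom the limit leaves, which matches the observation that the failures occur on the busier daytime runs.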
Thanks,
Brett McDonald
RIW WY WFO - SOO
-------------- next part --------------
AUTORUN: Domain to be included in the simulation : 1
AUTORUN: Domains to be processed concurrently with autopost : 1
WRF EMS Program ems_prep started on riw-lw-sac at Tue Apr 10 15:10:00 2012 UTC
The WRF EMS Says: "Who's Awesome? You're Awesome!"
I. WRF EMS ems_prep Model Initialization Summary
Initialization Start Time : Tue Apr 10 18:00:00 2012 UTC
Initialization End Time : Fri Apr 13 18:00:00 2012 UTC
Boundary Condition Frequency : 180 Minutes
Initialization Data Set : namptile
Boundary Condition Data Set : namptile
Static Surface Data Sets : None
Land Surface Data Sets : None
II. Search out requested files for WRF model initialization
* Locating namptile files for model initial and boundary conditions
Areal coverage of the NAM 218 Lambert Conformal personal tiles
Corner Lat-Lon points of the domain:
    46.58, -115.83                 47.53, -102.70
          *                               *
                    *  42.98, -108.68
          *                               *
    38.12, -114.13                 39.04, -102.06
Initiating HTTP connection to soostrc.comet.ucar.edu
Making request #1 of 3 for personal tile data
-> Attempting to acquire 12041012.nam.t12z.awphys06.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys09.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys12.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys15.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys18.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys21.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys24.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys27.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys30.grb2.tm00 - Success (0.04 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys33.grb2.tm00 - Success (0.04 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys36.grb2.tm00 - Success (0.01 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys39.grb2.tm00 - Success (0.01 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys42.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys45.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys48.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys51.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys54.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys57.grb2.tm00 - Success (0.02 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys60.grb2.tm00 - Success (0.04 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys63.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys66.grb2.tm00 - Success (0.03 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys69.grb2.tm00 - Success (0.04 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys72.grb2.tm00 - Success (0.05 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys75.grb2.tm00 - Success (0.05 mb/s)
-> Attempting to acquire 12041012.nam.t12z.awphys78.grb2.tm00 - Success (0.05 mb/s)
* All requested namptile files are available for model initialization
Excellent! - Your master plan is working!
III. Create the WPS NMM intermediate format files
* Processing namptile files for use as model initial and boundary conditions - Fantastic!!
NMM core intermediate file processing completed in 48.53 seconds
IV. Horizontal interpolation of the intermediate files to the computational domain
! MPICH Timeout of metgrid after 299 seconds - continuing with some trepidation
* Metgrid processed files are located in
/data/wrfems/runs/wrfnmmriw04/wpsprd
Horizontal interpolation to computational domain completed in 5 minutes
AUTORUN: The ems_prep routine completed successfully - Moving forward
WRF EMS Program ems_run started on riw-lw-sac at Tue Apr 10 15:31:23 2012 UTC
The WRF EMS Says: "Who's Awesome? You're Awesome!"
I. Preforming configuration in preparation for your EMS experience
* You are running the WRF NMM core. Hey Ho! Let's go! - model'n!
* Simulation start and end times:
Domain Start End
1 2012-04-10_18:00:00 2012-04-11_21:00:00
* Simulation length will be 27 hours
* Doing MPI check before running WRF Model
* Large timestep to be used for this simulation is 8 seconds
II. Creating the initial and boundary condition files for the user domain(s)
* The WRF REAL program shall be run on the following systems and processors:
2 processors on riw-lw-sac (1 tiles per processor)
* Creating the WRF initial and boundary condition files
* WRF initial and boundary conditions successfully created in 2 minutes 16 seconds
Moving on to bigger and better delusions of grandeur
III. Running NMM WRF while thinking happy thoughts
* The WRF NMM core shall be run on the following systems and processors:
4 processors on warf1 (192.168.0.10) with 1 tiles per processor
4 processors on warf2 (192.168.0.11) with 1 tiles per processor
* Run Output Frequency Primary wrfout Aux File 1
---------------------------------------------------
Domain 01 : 1 hour Off
! EMS AUTOPOST: It is not recommended that you run the autopost routine on the
same machine as the model (riw-lw-sac)
* Checking connection between riw-lw-sac and riw-lw-sac for the autopost - Success
* WRF EMS Auto Post-Processing Routine Initiated on riw-lw-sac at Tue Apr 10 15:33:53 2012 UTC
If you don't believe me then you can check it out for yourself:
% tail -f /data/wrfems/runs/wrfnmmriw04/log/ems_autopost.log
Or you can just trust me.
* Runnning your simulation with enthusiasm!
You can sing along to the progress of the simulation while watching:
% tail -f /data/wrfems/runs/wrfnmmriw04/rsl.out.0000
Unless you have something better to do with your time