[wrfems] wrfpost trouble with big domain

Robert Rozumalski rozumal at ucar.edu
Mon Sep 13 13:01:02 MDT 2010

Matt,

I was able to replicate the error with one of your files.  The
problem appears to be a file I/O limitation in the 32-bit wrfpost
executable: your files are simply too large for a 32-bit binary to
read.
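
As a quick sanity check, the 'file' command reports whether a binary
was built 32- or 64-bit (the path below is illustrative; substitute
wherever your wrfpost executable actually lives):

      file /path/to/wrfpost   # a 32-bit build reports "ELF 32-bit ..."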

Note that the 64-bit EMS release ships with the 32-bit wrfpost
executable because, until now, it has caused very few problems.  This
will change in the next release.

In the meantime, I built a new 64-bit wrfpost executable for you to try:

      http://soostrc.comet.ucar.edu/strc/wrfems/users/wrfpost.x64.tgz
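
Installing it should amount to something like the following (the name
of the unpacked executable is my assumption; use whatever the tarball
actually contains):

      wget http://soostrc.comet.ucar.edu/strc/wrfems/users/wrfpost.x64.tgz
      tar -xzf wrfpost.x64.tgz
      file wrfpost*   # should now report a 64-bit ELF binary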

Bob


Matt Foster wrote:
> Bob,
>
> OK..."big-ish" domain.  >400k grid points is big for us.  ;-)
>
> This doesn't appear to be the problem.  The kernel.shmmax value on
> our system is very large; in fact, it is higher than the amount of
> physical memory installed in the machine.  I also ran the 'limit'
> command to see whether any limits were imposed on user-level
> processes, but all of the relevant values showed 'unlimited'.  I did
> find an environment variable, OMP_STACKSIZE, set to 16M.  I unset
> it, since documentation I found online indicated that doing so lets
> a 64-bit system use as much stack as it will allow.  That didn't
> help either.
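>
> (For reference, those checks amount to something like the following;
> 'limit' is the csh built-in and 'ulimit -a' is the bash equivalent:)
>
>     /sbin/sysctl kernel.shmmax   # current shared-memory segment cap
>     ulimit -a                    # bash; use 'limit' under csh/tcsh
>     unset OMP_STACKSIZE          # bash; 'unsetenv OMP_STACKSIZE' in csh/tcsh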
>
> Interestingly, the v3.2 wrfpost that I compiled locally is working, 
> but only on one CPU at a time, so it's very slow.  It also required 
> some changes to wrfpost.in and the wrfpost control file headers.  But 
> at least I'll be able to see some output.
>
> Matt
>
>
> Robert Rozumalski wrote:
>>
>> Matt,
>>
>> Others have encountered a similar problem, albeit with far more
>> ambitious domains.  I suspect you are getting a memory allocation
>> error caused by a kernel limit on the amount of shared memory a
>> process may use.  I have not encountered this problem myself, but
>> you may be able to raise the maximums by adjusting a few kernel
>> parameters; see:
>>
>> http://www.redhat.com/docs/manuals/database/RHDB-7.1.3-Manual/admin_user/kernel-resources.html 
>>
>> http://www.esrf.eu/computing/scientific/FIT2D/FIT2D_REF/node252.html
>>
>> The values you will most likely be concerned with are (from
>> '/sbin/sysctl -a | grep kernel.shm'):
>>
>> kernel.shmmax = 33554432
>> kernel.shmall = 2097152
>> kernel.shmmni = 4096
>>
>> Try increasing the kernel.shmmax value.
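>>
>> A sketch of how to raise it at runtime (the value below is purely
>> illustrative, e.g. 2 GB):
>>
>>     /sbin/sysctl -w kernel.shmmax=2147483648
>>
>> To make the change persistent across reboots, set kernel.shmmax in
>> /etc/sysctl.conf and reload it with '/sbin/sysctl -p'.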
>>
>> Bob
>>
>> I will be compiling all the WRF executables for distributed memory in 
>> the next release.
>>
>> Bob
>>
>>
>> Matt Foster wrote:
>>> Clarification...I see the wrfpost process using only 1 CPU, as 
>>> indicated by top.  IIRC, wrfpost is a shared-memory executable.
>>>
>>> Matt
>>>
>>>
>>> Matt Foster wrote:
>>>> I did an experiment on our cluster today with a 675x640, 1.25 km 
>>>> domain.  The model run went OK, but wrfpost is not processing the 
>>>> output.  I only see one wrfpost process at a time, even though it's 
>>>> supposed to be processing at "warp factor 8" ;-).  Also, when
>>>> watching with 'top', the memory usage for wrfpost never goes above
>>>> 0.0%.  I'm guessing wrfpost is having trouble with the ~1.5 GB output
>>>> files.  Is there something I can do to get it going?
>>>>
>>>> Matt @ OUN

-- 
Robert A. Rozumalski, PhD
NWS National SOO Science and Training Resource Coordinator

COMET/UCAR PO Box 3000   Phone:  303.497.8356
Boulder, CO 80307-3000




