[wrfems] wrfpost trouble with big domain
Robert Rozumalski
rozumal at ucar.edu
Mon Sep 13 13:01:02 MDT 2010
Matt,
I was able to replicate the error with one of your files. The problem
appears to be due to a file I/O limitation in the 32-bit wrfpost
executable: your files are simply too large for it to read.
Note that the 64-bit EMS release uses the 32-bit wrfpost executable
because it has caused very few problems, until now. This will change
in the next release.
In the meantime, I built a new 64-bit wrfpost executable for you to try:
http://soostrc.comet.ucar.edu/strc/wrfems/users/wrfpost.x64.tgz
Bob
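Because the fix comes down to whether wrfpost is a 32-bit or a 64-bit build, it can help to confirm which one an EMS install is actually running. A minimal sketch, assuming a Linux ELF executable: it reads only the ELF identification bytes (the "\x7fELF" magic followed by the EI_CLASS byte, which is 1 for 32-bit and 2 for 64-bit). The standard `file` utility reports the same information. The script itself and the path you pass it are placeholders, not part of the EMS distribution.

```python
import sys

# ELF identification (e_ident): bytes 0-3 are the magic b"\x7fELF",
# byte 4 (EI_CLASS) is 1 for ELFCLASS32 and 2 for ELFCLASS64.
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_class(path: str) -> str:
    """Return '32-bit' or '64-bit' for the ELF file at `path`."""
    with open(path, "rb") as f:
        ident = f.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        raise ValueError(f"{path} is not an ELF file")
    try:
        return ELF_CLASSES[ident[4]]  # indexing bytes yields an int
    except KeyError:
        raise ValueError(f"{path}: unknown ELF class {ident[4]}") from None


if __name__ == "__main__":
    # Usage (hypothetical path): python3 elf_class.py /path/to/wrfpost
    for exe in sys.argv[1:]:
        print(f"{exe}: {elf_class(exe)}")
```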
Matt Foster wrote:
> Bob,
>
> OK..."big-ish" domain. >400k grid points is big for us. ;-)
>
> This doesn't appear to be the problem. The kernel.shmmax value on our
> system is very large, higher even than the amount of memory installed
> in the system. I also ran the 'limit' command to see whether any
> limits were imposed on user-level processes, but all of the relevant
> values showed 'unlimited'. I found that the environment variable
> OMP_STACKSIZE was set to 16M. I unset it, since documentation I found
> online indicated that doing so would let a 64-bit system use as much
> as the system allows. That didn't help either.
>
> Interestingly, the v3.2 wrfpost that I compiled locally is working,
> but only on one CPU at a time, so it's very slow. It also required
> some changes to wrfpost.in and the wrfpost control file headers. But
> at least I'll be able to see some output.
>
> Matt
>
>
> Robert Rozumalski wrote:
>>
>> Matt,
>>
>> Others have encountered a similar problem, albeit with far more
>> ambitious domains. I suspect you are getting a memory allocation error
>> caused by a kernel limit on the amount of memory that can be used in a
>> shared-memory environment. I have not encountered this problem myself,
>> but you may be able to raise the maximum values by editing a few of the
>> kernel parameter files:
>>
>> http://www.redhat.com/docs/manuals/database/RHDB-7.1.3-Manual/admin_user/kernel-resources.html
>>
>>
>> http://www.esrf.eu/computing/scientific/FIT2D/FIT2D_REF/node252.html
>>
>> The values you are most likely to be concerned with are (from
>> /sbin/sysctl -a | grep kernel.shm):
>>
>> kernel.shmmax = 33554432
>> kernel.shmall = 2097152
>> kernel.shmmni = 4096
>>
>>
>> Try increasing the kernel.shmmax value.
>>
>> Bob
>>
>> I will be compiling all the WRF executables for distributed memory in
>> the next release.
>>
>> Bob
>>
>>
>> Matt Foster wrote:
>>> Clarification...I see the wrfpost process using only 1 CPU, as
>>> indicated by top. IIRC, wrfpost is a shared-memory executable.
>>>
>>> Matt
>>>
>>>
>>> Matt Foster wrote:
>>>> I did an experiment on our cluster today with a 675x640, 1.25 km
>>>> domain. The model run went OK, but wrfpost is not processing the
>>>> output. I only see one wrfpost process at a time, even though it's
>>>> supposed to be processing at "warp factor 8" ;-). Also, when
>>>> watching with 'top' the memory usage for wrfpost never goes above
>>>> 0.0%. I'm guessing wrfpost is having trouble with ~1.5 GB output
>>>> files. Is there something I can do to get it going?
>>>>
>>>> Matt @ OUN
>>>>
>>>> _______________________________________________
>>>> wrfems mailing list
>>>> wrfems at comet.ucar.edu
>>>>
>>>
>>>
>>
>
>
--
Robert A. Rozumalski, PhD
NWS National SOO Science and Training Resource Coordinator
COMET/UCAR PO Box 3000 Phone: 303.497.8356
Boulder, CO 80307-3000
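Bob's kernel.shmmax suggestion depends on reading the kernel.shm* values in the right units: kernel.shmmax is the size of the largest single shared-memory segment in bytes, kernel.shmall is the total shared memory in pages, and kernel.shmmni is the maximum number of segments. A minimal sketch, assuming a 4096-byte page (check with `getconf PAGE_SIZE`), that parses output in the format quoted in the thread and converts it to bytes. With the values Bob listed, shmmax is only 32 MiB, while shmall permits 8 GiB in total. The 1.5 GB request is purely illustrative, chosen to match the output file size Matt mentions. The thread does not establish that wrfpost allocates System V shared memory at all, and Matt reported that shmmax on his system was already very large.

```python
import re

# Matches "name = value" lines as printed by: /sbin/sysctl -a | grep kernel.shm
LINE_RE = re.compile(r"^\s*(kernel\.shm\w+)\s*=\s*(\d+)\s*$")


def parse_shm_settings(text):
    """Return {parameter: int value} for every kernel.shm* line in `text`."""
    settings = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            settings[m.group(1)] = int(m.group(2))
    return settings


def summarize(settings, page_size=4096):
    """Express the three limits in consistent units.

    kernel.shmmax is already in bytes; kernel.shmall is in pages, so it is
    multiplied by page_size (4096 is an assumption, typical on x86 Linux).
    """
    return {
        "max_segment_bytes": settings["kernel.shmmax"],
        "total_bytes": settings["kernel.shmall"] * page_size,
        "max_segments": settings["kernel.shmmni"],
    }


def segment_fits(request_bytes, settings):
    """True if a single segment of request_bytes is within kernel.shmmax."""
    return request_bytes <= settings["kernel.shmmax"]


if __name__ == "__main__":
    sample = "kernel.shmmax = 33554432\nkernel.shmall = 2097152\nkernel.shmmni = 4096\n"
    s = parse_shm_settings(sample)
    print(summarize(s))
    # Illustrative request roughly the size of one ~1.5 GB output file.
    print(segment_fits(1_500_000_000, s))
```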