[wrfems] Running slower on cluster vs single workstation

bkolts at firstenergycorp.com
Wed Feb 15 09:22:06 MST 2012


Hi Bob,

Thanks for sharing that.  I added a gigabit switch yesterday and saw some
improvement; however, it still runs faster on one machine than on the
two-machine cluster, as you've experienced.

It could be the communication overhead, since we have a fairly small
domain (260*150).

Sounds like I have it running as fast as I can right now.  I'm sure I will
find a way to make use of the extra processors!

Best regards,

Brian


Brian Kolts
Advanced Scientist
Environmental Energy Delivery Services
FirstEnergy Corp.
330.384.5474



From:	Robert Rozumalski <Robert.Rozumalski at noaa.gov>
To:	wrfems at comet.ucar.edu
Date:	02/14/2012 04:30 PM
Subject:	Re: [wrfems] Running slower on cluster vs single workstation
Sent by:	wrfems-bounces at comet.ucar.edu




Hello Brian,


I've finally had the opportunity to look at your problem and cluster
configuration, and noticed that your benchmark results and experience
bear a striking resemblance to mine.

A while back I replaced one machine on my 3-machine cluster with a newer
six-core AMD system, just like yours only slightly faster.

Here was my configuration:

  (1)  2 X  Six-Core   AMD Opteron(tm) Processor 2435 @ 2600MHz
  (2)  2 X  Four-Core  INTEL Xeon(R) CPU W5590 @ 3.33GHz
________________
28 Total Cores   (16 Xeon & 12 Opteron)


Here is your configuration:

  (1)  2 X  Six-Core   AMD Opteron(tm) Processor 2427 @ 2200MHz
  (1)  2 X  Six-Core   INTEL Xeon(R) CPU X5690 @ 3.47GHz
________________
24 Total Cores   (12 Xeon & 12 Opteron)


What I found was that when I added the new 2 x six-core AMD system to the
cluster, my benchmark times were significantly slower than when I ran on
only the 2 INTEL systems, and about the same as when I ran on a single
INTEL machine.

I could not find an explanation for this result.  I thought it might be
due to over-decomposition of the domain, so I created a much larger
domain, with similar results.  I ended up requesting a replacement for the
AMD system, which is now used for testing.  The new all-Xeon cluster works
well.

BTW - More CPUs are not always better if your domain gets decomposed to
the point where communication becomes a large bottleneck.  This is pretty
easy to do.
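
To illustrate the decomposition point, here is a rough sketch of how the
ratio of halo (boundary-exchange) points to computed points grows as a
260*150 domain is split across more processes.  The halo width of 3 points,
the near-square process grid, and the function names are all assumptions
made for illustration; they are not taken from WRF's actual decomposition
logic.

```python
# Sketch only: halo width and the near-square factorization are assumed,
# not WRF's real settings.

def near_square_factors(n):
    """Return (px, py) with px * py == n, px <= py, as close to square as possible."""
    px = int(n ** 0.5)
    while n % px != 0:
        px -= 1
    return px, n // px

def halo_fraction(nx, ny, nranks, halo=3):
    """For an interior tile, return (tile_nx, tile_ny, halo points / computed points)."""
    px, py = near_square_factors(nranks)
    # Put the larger process count along the longer domain dimension.
    tiles_x, tiles_y = (py, px) if nx >= ny else (px, py)
    tile_nx = nx / tiles_x
    tile_ny = ny / tiles_y
    computed = tile_nx * tile_ny
    # Four edges of width `halo`; corner points are ignored for simplicity.
    exchanged = 2 * halo * (tile_nx + tile_ny)
    return tile_nx, tile_ny, exchanged / computed

if __name__ == "__main__":
    for ranks in (12, 24, 28):
        tx, ty, frac = halo_fraction(260, 150, ranks)
        print(f"{ranks:2d} ranks: tile ~{tx:.1f} x {ty:.1f}, halo/compute = {frac:.2f}")
```

Under these assumptions the halo-to-compute ratio rises as the tiles
shrink, so each added process does less useful work per point exchanged.
Real communication cost also depends on the interconnect and on how many
fields are exchanged, which this sketch does not model.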


Also, your timing on the 2 x six-core Xeon is similar to that on my system
with a similar CPU.

So, I have no solutions or explanations, but it may not be anything you
are doing wrong.
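
One factor sometimes considered with mixed clusters, though nothing in this
thread confirms it is the cause here, is that a synchronous time step can
only finish as fast as the slowest process.  The toy model below sketches
that idea.  The relative speeds (1.0 and 0.5), the work units, and the
communication cost are made-up numbers, not measurements from either
cluster.

```python
# Toy model only: all speeds, work units, and communication costs are
# hypothetical values chosen for illustration.

def step_time(work_per_rank, rank_speeds, comm_time=0.0):
    """Time for one synchronous step: slowest rank's compute time plus communication."""
    return max(work_per_rank / speed for speed in rank_speeds) + comm_time

total_work = 24.0  # arbitrary units of work per time step

# 12 fast cores on one machine, no inter-node communication.
fast_only = step_time(total_work / 12, [1.0] * 12)

# 12 fast + 12 slower cores with equal-sized tiles and some network cost.
mixed = step_time(total_work / 24, [1.0] * 12 + [0.5] * 12, comm_time=0.5)

if __name__ == "__main__":
    print(f"fast machine only: {fast_only:.2f}")
    print(f"mixed cluster:     {mixed:.2f}")
```

With these invented numbers the mixed cluster is no faster than the single
fast machine, because the fast cores wait for the slow ones and then pay
the network cost on top.  This is only a way to reason about the symptom;
it does not by itself account for the size of the slowdown reported.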

Bob



On 2/9/12 2:04 PM, bkolts at firstenergycorp.com wrote:

      Hi All,

      I'm trying to cluster 2 workstations to run WRF.  After setting this
      up I've noticed a significant slowdown in run time.  When running the
      benchmark on the master workstation only (no cluster) the benchmark
      test took a little over 4 minutes to complete.  When running with the
      cluster, it took over 18 minutes.

      I've attached the results from the two benchmark tests.  The second
      machine in the cluster is a slightly slower machine.  Could it be
      that this is causing the slowdown?  Or have I configured things
      incorrectly?

      Thanks,
      Brian

      (See attached file: singleWorkstation_benchmark.info)
      (See attached file: cluster_benchmark.info)

      Brian Kolts
      Advanced Scientist
      Environmental Energy Delivery Services
      FirstEnergy Corp.
      330.384.5474


      _______________________________________________
      wrfems mailing list
      wrfems at comet.ucar.edu


--
Robert A. Rozumalski, PhD
NWS National SOO Science and Training Resource Coordinator

COMET/UCAR PO Box 3000   Phone:  303.497.8356
Boulder, CO 80307-3000
