Date: Fri, 03 Mar 2000 12:54:10 -0600
From: Ricky Kendall <rickyk@scl.ameslab.gov>
Organization: Scalable Computing Laboratory, Ames Laboratory
Subject: Re: parallel job on linux cluster
To: Todd Raeker <raeker@umich.edu>
Cc: nwchem-users@emsl.pnl.gov

Todd,

This seems like a problem with the network. We have an Alpha cluster, and
I enclose an Excel spreadsheet with the data. The time decreases and the
speedup (wall time) gets to 50% at 8 nodes. The only thing I would offer is
that the timing variance in such a short job may be large enough to cause
this. Try something with a little more meat in it time-wise. Also, as you
can tell from the spreadsheet, the CPU-based scaling is quite good; i.e.,
we need a better network :)

Regards,
Ricky

Todd Raeker wrote:

> Hi all,
>
> I am testing a two node linux cluster on a 100 Mbits/sec standalone
> switched network. A small scf single point energy calculation on one and
> two nodes results in the CPU time going from 44 sec. to 22 sec.
> respectively. This is great, but when I look at the wall time I get 44 sec.
> for one node versus 49 sec. for the two node calculation. This really
> surprises me as the local net is running at 100 Mbits/sec and completely
> isolated from other networks. I would like to talk to anybody out there
> with experience running nwchem on linux clusters.
> My hardware consists of two machines with a 600 MHz AMD Athlon in each
> with 128 MB memory, 3com905 NIC and 3com superstack 3300 at 100 Mbits/sec
> network, running RedHat 6.0 Linux. I hope the slow wall time is a mistake
> in my network configuration rather than a high amount of nwchem built-in
> communication between nodes which increases overhead. Any advice or info
> would be appreciated.
>
> Todd.
>
> Dr. Todd Raeker
> Coordinator of Computer Services          raeker@umich.edu
> Department of Chemistry                   (734)647-2867
> University of Michigan
> Ann Arbor, MI 48109
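
For reference, the timings quoted in the original question can be turned into
speedup and parallel-efficiency figures. This is only a minimal sketch: it
assumes the usual definitions (speedup = T(1)/T(p), efficiency = speedup/p),
and it assumes the reported CPU time is per process, so that wall time minus
CPU time approximates time spent waiting rather than computing. The function
names are illustrative and are not part of NWChem.

```python
def speedup(t_one_node, t_p_nodes):
    """Speedup S = T(1) / T(p)."""
    return t_one_node / t_p_nodes


def efficiency(t_one_node, t_p_nodes, nodes):
    """Parallel efficiency E = S / p."""
    return speedup(t_one_node, t_p_nodes) / nodes


# Timings in seconds, taken from the quoted message.
nodes = 2
cpu_1, cpu_2 = 44.0, 22.0
wall_1, wall_2 = 44.0, 49.0

cpu_speedup = speedup(cpu_1, cpu_2)      # 2.0: CPU time scales ideally
wall_speedup = speedup(wall_1, wall_2)   # about 0.90: slower than one node
wall_efficiency = efficiency(wall_1, wall_2, nodes)

# Assumption: CPU time is per process, so this difference is time spent
# outside computation (communication, I/O, startup) in the two-node run.
non_cpu_wall = wall_2 - cpu_2            # 27.0 seconds

print(f"CPU speedup:      {cpu_speedup:.2f}")
print(f"Wall speedup:     {wall_speedup:.2f}")
print(f"Wall efficiency:  {wall_efficiency:.2f}")
print(f"Non-CPU wall time on two nodes: {non_cpu_wall:.0f} s")
```

On these numbers the CPU time halves while the wall time rises, which is
consistent with the suggestion in the reply that the network, or the timing
variance of a very short job, accounts for the difference.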