Date: Fri, 03 Feb 2006 06:51:59 -0800
From: "Straatsma, T P"
Subject: RE: [NWCHEM] Problem with MD simulations on Opteron cluster
To: Sarah Wilsey, nwchem-users@emsl.pnl.gov

Sarah,

We had a similar problem some time ago on some platforms, and thought we had it fixed. I will see if I can reproduce the error here again on our Myrinet cluster. How many processors were you using?

T.P.Straatsma, Associate Division Director
Computational Biology and Bioinformatics
Computational Sciences and Mathematics Division
Pacific Northwest National Laboratory
P.O.Box 999, MSIN K7-90
Richland, WA 99352
Tel. (509) 375-2802, (509) 372-4625
Fax (509) 374-4720

-----Original Message-----
From: owner-nwchem-users@emsl.pnl.gov [mailto:owner-nwchem-users@emsl.pnl.gov] On Behalf Of Sarah Wilsey
Sent: Friday, February 03, 2006 2:00 AM
To: nwchem-users@emsl.pnl.gov
Subject: [NWCHEM] Problem with MD simulations on Opteron cluster

Hi,

I've just compiled NWChem4.7 on our Opteron cluster with Redhat Enterprise Linux AS release 3. I used the Portland Group compiler pgf90, with MPICH and myrinet (version 2.014) and Global Arrays version 3-3.1.

As well as running the PSPW jobs, I also have some problems running the MD test jobs in parallel. In particular the test jobs crown and ethanol crash with errors such as:

    MA_verify_allocator_stuff: starting scan ...
    stack block 'lst', handle 124, address 0xb2ae120:
        current checksum 184675992 != stored checksum 187359768
    stack block 'lst', handle 124, address 0xb2ae120:
        current left signature 0 != proper left signature 2863311530
    stack block 'lst', handle 124, address 0xb2ae120:
        current right signature 0 != proper right signature 1431655765
    1:Segmentation Violation error, status=: 11

Strangely, some of the MD jobs work fine e.g. membrane and nak_md. The jobs that don't work run fine in serial but crash when run on more than one processor (both on the same node and on different nodes).

I've attached examples of the input and output files below. I would be very grateful for any advice that you could offer me.

Sarah Wilsey
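
A note on reading the MA output quoted above: the two "proper" signature values are the 32-bit patterns 0xAAAAAAAA and 0x55555555, and the "current" values of 0 suggest that the words on either side of the 'lst' block were overwritten with zeros. The following is only a minimal sketch of that kind of guard-word check, written in Python for illustration. The function names and the block layout are hypothetical and are not taken from the MA library source.

```python
# Illustrative sketch only: NOT the actual MA (memory allocator) code.
# The two constants are the "proper" values printed in the error output;
# the helper names below are hypothetical.
LEFT_SIGNATURE = 2863311530   # "proper left signature" from the log
RIGHT_SIGNATURE = 1431655765  # "proper right signature" from the log


def describe(value):
    """Return the 32-bit hexadecimal form of a guard word."""
    return f"0x{value:08X}"


def check_block(name, left, right):
    """Compare the guard words found around a block with the expected ones.

    Returns a list of mismatch messages; an empty list means both
    guard words are intact.
    """
    problems = []
    if left != LEFT_SIGNATURE:
        problems.append(
            f"block '{name}': current left signature {left} "
            f"!= proper left signature {LEFT_SIGNATURE}"
        )
    if right != RIGHT_SIGNATURE:
        problems.append(
            f"block '{name}': current right signature {right} "
            f"!= proper right signature {RIGHT_SIGNATURE}"
        )
    return problems


if __name__ == "__main__":
    print(describe(LEFT_SIGNATURE))
    print(describe(RIGHT_SIGNATURE))
    # Both guard words zeroed, as in the reported failure for 'lst'.
    for message in check_block("lst", 0, 0):
        print(message)
```

Both guard words failing at once, together with the checksum mismatch, is consistent with a write that ran past the bounds of a neighbouring block, though the output alone does not identify which routine did the writing.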