From owner-nwchem-users@emsl.pnl.gov Tue Sep 21 17:00:57 2004
Date: Tue, 21 Sep 2004 18:00:53 -0600 (MDT)
From: Jeremy S Archuleta
Subject: Solved: NWCHEM 4.6 and LAM-MPI 7.0.6 fails to run on P3, P4, or Xeon
In-reply-to: <415096E4.1010607@pnl.gov>
To: Edoardo Apra
Cc: NWChem Users

Solution below works on P3, P4, Xeon.

Thanks. Will let you know if it also works on x86_64 32bit tomorrow.

-J

On Tue, 21 Sep 2004, Edoardo Apra wrote:

> Jeremy
> please try the following
> 1) cd $NWCHEM_TOP/src/tools/tcgmsg-mpi
> 2) edit the file pbeginf.c
> 3)
> option a)
> change line 9 from
> #if defined(HPUX) || defined(SUN) || defined(SOLARIS) ||defined(PARAGON) ||defined(FUJITSU) || defined(WIN32) ||defined(LINUX64) || defined(NEC)|| defined(LINUX) || defined(HITACHI) || defined(__crayx1)
> to
> #if defined(HPUX) || defined(SUN) || defined(SOLARIS) ||defined(PARAGON) ||defined(FUJITSU) || defined(WIN32) ||defined(LINUX64) || defined(NEC)|| defined(LINUX_) || defined(HITACHI) || defined(__crayx1)
> in other words, "defined(LINUX)" becomes "defined(LINUX_)" (underscore added)
> 4) cd ..
> 5) make
> 6) make link
>
> Edo
>
> > > > Hello all,
> > > >
> > > > I can't run NWCHEM 4.6 using LAM-MPI 7.0.6 on P3, P4, and Xeon machines.
> > > > Interestingly, it does run on Opteron and I have been able to get NWCHEM
> > > > 4.5 to run on the P3, P4, and Xeon machines with this version of LAM-MPI.
> > > >
> > > > The command:
> > > > /mnt/radar4/software/lam-mpi/p3/lam-7.0.6/gcc-3.3.4-libc-2.3.2/bin/mpirun
> > > > -vv -np 2 /home/jeremy/NWCHEM/mpi-nwchem/nwchem-4.6/bin/LINUX/nwchem
> > > > tests/auh2o/auh2o.nw
> > > > /home/jeremy/NWCHEM/mpi-nwchem/nwchem-4.6/QA/testoutputs/auh2o.out
> > > >
> > > > The output/error:
> > > > 24198 /home/jeremy/NWCHEM/mpi-nwchem/nwchem-4.6/bin/LINUX/nwchem running on n0 (o)
> > > > 24199 /home/jeremy/NWCHEM/mpi-nwchem/nwchem-4.6/bin/LINUX/nwchem running on n0 (o)
> > > > mpirun: waiting for MPI_INIT from 2 processes...
> > > > mpirun: someone died before MPI_INIT -- rank 0
> > > > -----------------------------------------------------------------------------
> > > > It seems that [at least] one of the processes that was started with
> > > > mpirun did not invoke MPI_INIT before quitting (it is possible that
> > > > more than one process did not invoke MPI_INIT -- mpirun was only
> > > > notified of the first one, which was on node n0).
> > > >
> > > > mpirun can *only* be used with MPI programs (i.e., programs that
> > > > invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> > > > to run non-MPI programs over the lambooted nodes.
> > > > -----------------------------------------------------------------------------
> > > > Killed
> > > > mpirun: receiving 1 useless MPI_INIT/MPI_FINALIZE messages...
> > > >
> > > > Things that I have checked:
> > > > 1) There is a lamd running.
> > > > ps ax | grep lamd
> > > > 19794 ? S 0:00 /mnt/radar4/software/lam-mpi/p3/lam-7.0.6/gcc-3.3.4-libc-2.3.2/bin/lamd -H 127.0.0.1 -P 52314 -n 0 -o 0
> > > >
> > > > 2) I can run other MPI programs using my LAM-MPI binaries/libraries.
> > > > mpirun -np 6 a.out
> > > > Greetings from process 1!
> > > > Greetings from process 2!
> > > > Greetings from process 3!
> > > > Greetings from process 4!
> > > > Greetings from process 5!
> > > >
> > > > 3) The environment variables are set:
> > > > LARGE_FILES=TRUE
> > > > NWCHEM_TARGET=LINUX
> > > > NWCHEM_EXECUTABLE=/home/jeremy/NWCHEM/mpi-nwchem/nwchem-4.6/bin/LINUX/nwchem
> > > > TCGRSH=/usr/bin/ssh
> > > > USE_MPI=y
> > > > MPI_LIB=/mnt/radar4/software/lam-mpi/p3/lam-7.0.6/gcc-3.3.4-libc-2.3.2/lib
> > > > MPIRUN_PATH=/mnt/radar4/software/lam-mpi/p3/lam-7.0.6/gcc-3.3.4-libc-2.3.2/bin/mpirun
> > > > NWCHEM_TOP=/home/jeremy/NWCHEM/mpi-nwchem/nwchem-4.6
> > > > LIBMPI=-lmpi -llam -lpthread
> > > > MPI_INCLUDE=/mnt/radar4/software/lam-mpi/p3/lam-7.0.6/gcc-3.3.4-libc-2.3.2/include
> > > >
> > > > -- ****** UNCLASSIFIED correspondence ******
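
For anyone who wants to script step 3 of Edo's instructions (renaming the LINUX guard in pbeginf.c) rather than edit by hand, here is a minimal shell sketch. It assumes an sh with sed and mktemp available. patch_guard is a hypothetical helper name, not something shipped with NWChem, and the demonstration runs on a scratch file containing the guard line quoted in this thread, not on the real source tree.

```shell
#!/bin/sh
# Sketch only: rename the LINUX guard to LINUX_ as described in the thread.
# patch_guard is a hypothetical helper, not part of NWChem.
patch_guard() {
    # $1: file to patch in place; the unmodified file is kept as $1.orig.
    cp "$1" "$1.orig" &&
    # The closing parenthesis is part of the pattern, so defined(LINUX64)
    # does not match and is left unchanged.
    sed 's/defined(LINUX)/defined(LINUX_)/' "$1.orig" > "$1"
}

# Demonstration on a scratch file holding the original guard line.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#if defined(HPUX) || defined(SUN) || defined(SOLARIS) ||defined(PARAGON) ||defined(FUJITSU) || defined(WIN32) ||defined(LINUX64) || defined(NEC)|| defined(LINUX) || defined(HITACHI) || defined(__crayx1)
EOF

patch_guard "$demo"
cat "$demo"   # print the patched guard line for inspection
```

On a real tree the helper would be called as "patch_guard pbeginf.c" from $NWCHEM_TOP/src/tools/tcgmsg-mpi, followed by "cd ..", "make" and "make link" as in steps 4 to 6. Keeping the .orig copy makes it easy to revert if the rebuilt binary behaves differently than expected.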