Date: Tue, 19 Jul 2005 15:53:19 -0400
From: "Y. Huang"
To: Edoardo Apra`
Cc: nwchem-users@emsl.pnl.gov
Subject: Re: Parallel execution errors

Hi Edoardo,

When I tried to use MPICH2 to compile NWCHEM, I got these errors:

...
/home/huang/software/nwchem/nwchem-4.7/lib/LINUX64/libnwpwlib.a(D3dB-mpi.o)
(.text+0x479): In function `d3db_c_transpose_jk_':
: undefined reference to `mpi_irecv_'
/home/huang/software/nwchem/nwchem-4.7/lib/LINUX64/libnwpwlib.a(D3dB-mpi.o)
(.text+0x52f): In function `d3db_c_transpose_jk_':
: undefined reference to `mpi_irecv_'
...

Please help!

Yiye

On Mon, 18 Jul 2005, Y. Huang wrote:

> Hello Edoardo,
>
> Yes, I used FC=g77 USE_INTEGER4=y to compile NWCHEM on my Opteron box.
>
> Thanks.
>
> Yiye
>
> On Mon, 18 Jul 2005, Edoardo Apra` wrote:
>
> > Yiye
> > did you use FC=g77 USE_INTEGER4=y to compile on your Opteron box?
> > Edo
> >
> > Y. Huang wrote:
> >
> > >Hi Edoardo,
> > >
> > >Attached is the input file "h3tr1.nw"; it is just the sample input file
> > >in the $NWCHEM_TOP/examples/dirdyvtst/h3 directory.
> > >
> > >Thanks for your help.
> > >
> > >Yiye
> > >
> > >On Mon, 18 Jul 2005, Edoardo Apra` wrote:
> > >
> > >>Yiye
> > >>could you please send us your input file?
> > >>Thanks, Edo
> > >>
> > >>Y. Huang wrote:
> > >>
> > >>>Hi,
> > >>>
> > >>>I compiled NWCHEM with the MPI option under LAM/MPI on a dual-Opteron
> > >>>system with two nodes running Rocks Cluster. The LAM run-time environment
> > >>>was launched with the lamboot command; the hostfile for LAM/MPI contains:
> > >>>
> > >>>vdw01 cpu=2
> > >>>compute-0-0 cpu=2
> > >>>
> > >>>However, when I ran an NWCHEM example:
> > >>>
> > >>>[huang@vdw01 h3]$ mpirun -np 2 nwchem h3tr1.nw
> > >>>
> > >>>I got these errors:
> > >>>
> > >>>...
> > >>>stpr_wrt_fd_from_sq: overwrite of existing file:./h3.hess
> > >>>stpr_wrt_fd_dipole: overwrite of existing file./h3.fd_ddipole
> > >>>1:Segmentation Violation error, status=: 11
> > >>>1:Segmentation Violation error, status=: 11
> > >>>---------------------------------------------------------------------
> > >>>One of the processes started by mpirun has exited with a nonzero exit
> > >>>code. This typically indicates that the process finished in error.
> > >>>If your process did not finish in error, be sure to include a "return
> > >>>0" or "exit(0)" in your C code before exiting the application.
> > >>>
> > >>>PID 20009 failed on node n0 (10.1.1.1) with exit status 1.
> > >>>---------------------------------------------------------------------
> > >>>
> > >>>Please note that I have no problem running my own MPI programs on this
> > >>>system.
> > >>>
> > >>>I also compiled NWCHEM without the MPI option, and it ran fine on a
> > >>>single processor. However, when I ran it in parallel using TCGMSG (a file
> > >>>named "h3tr1.p" was in the working directory):
> > >>>
> > >>>[huang@vdw01 h3]$ parallel h3tr1 h3tr1.nw
> > >>>
> > >>>I got:
> > >>>
> > >>>...
> > >>>1:Segmentation Violation error, status=: 11
> > >>>1:Segmentation Violation error, status=: 11
> > >>>Last System Error Message from Task 1:: No such file or directory
> > >>> 1: ARMCI aborting 11 (0xb).
> > >>> 1: ARMCI aborting 11 (0xb).
> > >>>system error message: No such file or directory
> > >>> stpr_wrt_fd_from_sq: overwrite of existing file:./h3.hess
> > >>>stpr_wrt_fd_dipole: overwrite of existing file./h3.fd_ddipole
> > >>>0:Child process terminated prematurely, status=: 256
> > >>>0:Child process terminated prematurely, status=: 256
> > >>>Last System Error Message from Task 0:: No such file or directory
> > >>> 0: ARMCI aborting 256 (0x100).
> > >>> 0: ARMCI aborting 256 (0x100).
> > >>>system error message: No such file or directory
> > >>> 2: interrupt(1)
> > >>>WaitAll: No children or error in wait?
> > >>>
> > >>>Please help!
> > >>>
> > >>>Yiye
> > >>>
> > >>>**********************************
> > >>> Department of Chemistry
> > >>> University of Waterloo
> > >>> Waterloo, Ontario N2L 3G1
> > >>> Tel: (519) 888-4567 ext.6110
> > >>> E-mail: huang@uwaterloo.ca
> > >>>**********************************
> > >>>http://hpc.uwaterloo.ca
> > >>
> > >>------------------------------------------------------------------------
> > >>
> > >>start h3
> > >>
> > >>basis
> > >>  h library 3-21G
> > >>end
> > >>
> > >>scf
> > >>  uhf
> > >>  doublet
> > >>  thresh 1.0e-6
> > >>end
> > >>
> > >>dirdyvtst autosym 0.001
> > >>  theory scf
> > >>*GENERAL
> > >>  TITLE
> > >>  Test run: H+H2 reaction, Euler integration, no restart
> > >>
> > >>  ATOMS
> > >>  1 H
> > >>  2 H
> > >>  3 H
> > >>  END
> > >>
> > >>*REACT1
> > >>  GEOM
> > >>  1 0.0 0.0 0.0
> > >>  2 0.0 0.0 1.3886144
> > >>  END
> > >>
> > >>  SPECIES LINRP
> > >>
> > >>*REACT2
> > >>
> > >>  GEOM
> > >>  3 0.0 0.0 190.3612132
> > >>  END
> > >>
> > >>  SPECIES ATOMIC
> > >>
> > >>*PROD2
> > >>
> > >>  GEOM
> > >>  1 0.0 0.0 190.3612132
> > >>  END
> > >>
> > >>  SPECIES ATOMIC
> > >>
> > >>*PROD1
> > >>  GEOM
> > >>  2 0.0 0.0 1.3886144
> > >>  3 0.0 0.0 0.0
> > >>  END
> > >>
> > >>  SPECIES LINRP
> > >>
> > >>*START
> > >>  GEOM
> > >>  1 0.0 0.0 -1.76531973
> > >>  2 0.0 0.0 0.0
> > >>  3 0.0 0.0 1.76531973
> > >>  END
> > >>
> > >>  SPECIES LINTS
> > >>
> > >>*PATH
> > >>  SSTEP 0.01
> > >>  SSAVE 0.05
> > >>  SLP 0.5
> > >>  SLM -0.5
> > >>  SCALEMASS 0.6718993
> > >>
> > >>  INTEGRA EULER
> > >>
> > >>  PRINTFREQ
> > >>end
> > >>
> > >>task dirdyvtst
> > >
> >
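The MPICH2 link errors above mean the linker found no definition of `mpi_irecv_` in the libraries on the link line. One way to narrow this down is to check which spelling of the Fortran MPI symbols the MPI library actually exports. Fortran compilers decorate external names differently; g77, for example, by default appends a second underscore to names that already contain one. The following is a minimal Python sketch that assumes you have saved the text printed by `nm` on the MPI Fortran library. The helper names `fortran_symbol_variants` and `defined_variants` are hypothetical, the variant list covers only a few common conventions, and the sample `nm` text is made up for illustration.

```python
def fortran_symbol_variants(name):
    """Return common link-time spellings of the Fortran external `name`.

    Assumption: only a few widespread conventions are covered: plain
    lowercase, lowercase plus one trailing underscore, lowercase plus two
    underscores (g77's default when the name already contains an
    underscore), and all uppercase.
    """
    lower = name.lower()
    variants = [lower, lower + "_"]
    if "_" in lower:
        variants.append(lower + "__")
    variants.append(name.upper())
    return variants


def defined_variants(nm_output, name):
    """Return the variants of `name` that `nm` output lists as defined globals."""
    defined = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Symbol lines look like "<address> <type> <symbol>"; undefined
        # symbols have no address, just "<type> <symbol>". Other lines
        # (archive member headers such as "foo.o:", blank lines) are skipped.
        if len(fields) < 2 or len(fields[-2]) != 1:
            continue
        sym_type, symbol = fields[-2], fields[-1]
        # Uppercase type letters mark global symbols; "U" means undefined.
        if sym_type.isupper() and sym_type != "U":
            defined.add(symbol)
    return [v for v in fortran_symbol_variants(name) if v in defined]


# Hypothetical nm excerpt, for illustration only (not real MPICH2 output).
sample = """
irecvf.o:
0000000000000000 T mpi_irecv__
                 U MPI_Irecv
0000000000000040 T MPI_IRECV
"""
print(defined_variants(sample, "mpi_irecv"))  # ['mpi_irecv__', 'MPI_IRECV']
```

To use it, pass the output of `nm` run on the MPICH2 Fortran library (for example `libfmpich.a`, if that is what your installation provides). Suppose only `mpi_irecv__` appears while the NWChem objects reference `mpi_irecv_`. That would suggest the two sides were built with different underscore conventions. If no variant appears at all, the Fortran MPI library is probably not on the link line.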