Date: Mon, 19 Nov 2007 16:15:26 -0800
From: Dunyou Wang
To: Jörg Saßmannshausen
CC: nwchem-users@emsl.pnl.gov
Subject: Re: [NWCHEM] NWChem 5.0 compilation Opteron: Problems with pspw.nw (NaN)

Yes. You need to recompile the nwpw module. So go to $NWCHEM_TOP/src/nwpw and run:

    make clean
    make FC=ifort
    cd ..
    make FC=ifort link

Have you considered the mismatch between the 32-bit integers in your acml, mkl, and atlas libs and the 64-bit integers we use in NWChem? If you haven't, your binary might not work as you expect (see the INSTALL file under $NWCHEM_TOP for linking with 32-bit libs on a 64-bit platform). I also see you linked your program using an MPI compiler wrapper; please make sure it is wrapped around the compiler you used here (ifort in your case).

Actually, you don't need to link your math libs manually; just set them up in the 'BLASOPT' option and the nwchem script will take care of the rest.

Hope this helps
Dunyou

Jörg Saßmannshausen wrote:
> Dear Dunyou,
>
>> First, set your LIBMPI as "-lmpich".
>
> As far as I gathered it, that is only important for the linking, or? I
> actually manually linked it so I can use different libraries (acml, ATLAS,
> mkl) and the libmpich.a is included there. Other jobs are working well in
> parallel.
>
>> Second, for the plane wave module tests (pspw etc.), please unset
>> USE_MPIF, which will get rid of the NaN error you saw.
>
> Do I need to recompile here? Simply unsetting does not do the trick, I still
> get NaN :-(
> Again, I thought that is only relevant for the linking, which I did that way:
>
> mpif90 -i8 -align -w -g -vec_report3 -O2 -g -L/usr/local/src/nwchem-5.0/lib/LINUX64\
> -L/usr/local/src/nwchem-5.0/src/tools/lib/LINUX64 -o \
> /usr/local/src/nwchem-5.0/bin/LINUX64/nwchem nwchem.o \
> stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver \
> -ldftgrad -lnwdft -lgradients -lcphf -lesp -lddscf -lguess -lhessian -lvib -lnwcutil\
> -lrimp2 -lproperty -lnwints -lprepar -lnwmd -lnwpw -lpaw -lpspw -lband -lnwpwlib\
> -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd \
> -letrans -lpspw -ltddft -ltce -lbq -lcons -lperfm -lneb -lnwcutil -lpario -lglobal\
> -lma -lpeigs -lperfm -lcons -lbq -lnwcutil /opt/acml3.6.0/ifort64/lib/libacml.a\
> /opt/acml3.6.0/ifort64/lib/libacml.a -larmci -L/opt/mpich2-1.0.6_gcc_ifc/lib/libmpich.a\
> -ltcgmsg-mpi -lrt
>
> Obviously I exchange the different maths libraries whilst testing them out,
> but without recompilation.
>
>> Third, the tce tests you mentioned here won't work on 2 cpus, to make it
>> work, you need to use more processors for those.
>
> If you increase the shmmax they will work on 2 CPUs ;-) At least this is what
> I did on the dual Xeon (32bit) I got.
>
>> By the way, is there a network you use to interconnect the nodes on
>> your cluster?
>
> Yes and no. I ran the tests on the AMD x2, so within the box (SMP). Otherwise
> we got GbE.
>
> Best wishes
>
> Jörg
>
>> Best regards
>> Dunyou
>>
>> Jörg Saßmannshausen wrote:
>>> Dear all,
>>> thanks for the prompt reply.
>>>
>>>> Would you please post your environmental variables setup for your
>>>> compilation?
>>>
>>> Sure:
>>> LIBMPI="lmpich"
>>> LD_LIBRARY_PATH="/opt/intel/mkl/10.0.011/lib/em64t"
>>> MPI_INCLUDE="/opt/mpich2-1.0.6_gcc_ifc/include"
>>> MPI_LIB="/opt/mpich2-1.0.6_gcc_ifc/lib"
>>> NWCHEM_EXECUTABLE="/usr/local/src/nwchem-5.0/bin/LINUX64/nwchem"
>>> NWCHEM_MODULES="all"
>>> NWCHEM_TARGET="LINUX64"
>>> NWCHEM_TOP="/usr/local/src/nwchem-5.0"
>>> PATH="/opt/mpich2-1.0.6_gcc_ifc/bin:/opt/intel/fce/9.1.039/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11"
>>> SCRATCH_DEF_DIR="/scr/nwchem"
>>> TCGRSH="/usr/bin/ssh"
>>> USE_MPI="y"
>>> USE_MPIF="y"
>>> LANG="de_AT@euro
>>> LANGUAGE="de_AT"
>>> LARGE_FILES="TRUE"
>>>
>>>> Have you tried with no BLAS libraries, except the Netlib default which
>>>> comes with NWChem?
>>>
>>> Yes, same problem
>>>
>>>> Did you compile mpich with the same compiler as NWChem?
>>>
>>> Yes.
>>>
>>>> Lastly, you might try the GNU compilers for completeness in debugging.
>>>
>>> Yes and no. I did not manage to build NWChem on 64bit with
>>> gfortran-4.1.2. It is bombing out at one point complaining about a
>>> missing object file and (presumably earlier) problem with the fortran. I
>>> gave up on that point. I would like to add that I manage to build it on
>>> some old PIII machines using gfortran without these problems (also with
>>> mpich2)
>>>
>>> For sake of easier communication I have merged two emails here, I hope
>>> nobody minds.
>>>
>>> Thanks for the prompt response
>>>
>>> Jörg
>>>
>>>> Jörg Saßmannshausen wrote:
>>>>> Dear all,
>>>>> I am currently fighting with NWChem5.0 compilation on our Opteron
>>>>> Cluster. Following instructions, I managed to build the binaries using
>>>>> the Intel Fortran Compiler (9.1.039) and mpich2-1.0.6. I have tried
>>>>> the following math libraries:
>>>>> acml3.6.0, acml4.0.0, acml4.0.1, ATLAS3.8.0, Intel mkl10.0.011
>>>>> but I am having problems with above test jobs. It is not only the
>>>>> pspw.nw, it appears to me that most of the PW testjobs are generating
>>>>> NaN when I am running them. Also similar things happen in band.nw,
>>>>> tce_cr_eom_t_ch_rohf.nw, tce_cr_eom_t_ozone.nw, tce_active_ccsdt.nw
>>>>> .... Running them on the cluster with one CPU works, with 2 CPUs I get
>>>>> NaN in the calculation and hence the job fails.
>>>>>
>>>>> I am somehow lost, I did not have these problems on my dual Xeon 32bit
>>>>> machine (using Intel Fortran compiler and mkl).
>>>>>
>>>>> I am running Debian Etch, for information just ask.
>>>>>
>>>>> Could it be a problem with the basis sets? I have noticed there are
>>>>> some *.F files in the basis set libraries but I somehow cannot see how
>>>>> that could cause problem.
>>>>>
>>>>> Any help would be appreciated, I am fighting with this now for some
>>>>> time.
>>>>>
>>>>> All the best from Graz
>>>>>
>>>>> Jörg
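
The advice in the reply at the top of this thread (LIBMPI with a leading dash, USE_MPIF unset for the plane-wave tests, math libs passed through BLASOPT, then a rebuild of nwpw and a relink) could be collected into one shell session roughly as follows. This is only a sketch under assumptions: the BLASOPT value is a hypothetical example built from the ACML path in the quoted link line and says nothing about whether that library uses 32-bit or 64-bit integers, and the wrapper check relies on the `-show` option of the MPICH compiler wrappers, which prints the underlying compiler command without running it.

```shell
# Environment as suggested in the reply.
export LIBMPI="-lmpich"   # note the leading dash; the posted setup had "lmpich"
unset USE_MPIF            # advised for the plane-wave tests (pspw, band, ...)

# ASSUMPTION: example value only, taken from the ACML path in the quoted
# link line. Check the INSTALL file for the integer-size requirements.
export BLASOPT="-L/opt/acml3.6.0/ifort64/lib -lacml"

# Show which compiler the MPI wrapper calls; it should be ifort here.
if command -v mpif90 >/dev/null 2>&1; then
    mpif90 -show || true
fi

# Rebuild the nwpw module and relink, only if the source tree is present.
if [ -n "${NWCHEM_TOP:-}" ] && [ -d "$NWCHEM_TOP/src/nwpw" ]; then
    (cd "$NWCHEM_TOP/src/nwpw" && make clean && make FC=ifort) &&
    (cd "$NWCHEM_TOP/src" && make FC=ifort link)
fi
```

Merely unsetting USE_MPIF at run time is not enough, as the thread notes: the variable is read at build time, so the nwpw rebuild and the relink have to happen after the environment is changed.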