From: Jörg Saßmannshausen
Organization: TU-Graz, ICTOS
To: nwchem-users@emsl.pnl.gov
Subject: Re: [NWCHEM] NWChem 5.0 compilation Opteron: Problems with pspw.nw (NaN)
Date: Tue, 20 Nov 2007 01:01:09 +0100
References: <200711192209.05229.sassmannshausen@tugraz.at> <200711200018.11607.sassmannshausen@tugraz.at> <47421DF6.2030706@pnl.gov>
In-Reply-To: <47421DF6.2030706@pnl.gov>
Message-Id: <200711200101.09756.sassmannshausen@tugraz.at>

Dear Dunyou,

> First, set your LIBMPI as "-lmpich".

As far as I understand, that only matters for the linking, doesn't it? I actually linked manually so that I can use different libraries (acml, ATLAS, mkl), and libmpich.a is included there. Other jobs are working well in parallel.

> Second, for the plane wave module tests (pspw etc.), please unset
> USE_MPIF, which will get rid of the NaN error you saw.

Do I need to recompile here? Simply unsetting it does not do the trick, I still get NaN :-(
Again, I thought that is only relevant for the linking, which I did this way:

mpif90 -i8 -align -w -g -vec_report3 -O2 -g -L/usr/local/src/nwchem-5.0/lib/LINUX64 \
  -L/usr/local/src/nwchem-5.0/src/tools/lib/LINUX64 -o \
  /usr/local/src/nwchem-5.0/bin/LINUX64/nwchem nwchem.o \
  stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver \
  -ldftgrad -lnwdft -lgradients -lcphf -lesp -lddscf -lguess -lhessian -lvib -lnwcutil \
  -lrimp2 -lproperty -lnwints -lprepar -lnwmd -lnwpw -lpaw -lpspw -lband -lnwpwlib \
  -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd \
  -letrans -lpspw -ltddft -ltce -lbq -lcons -lperfm -lneb -lnwcutil -lpario -lglobal \
  -lma -lpeigs -lperfm -lcons -lbq -lnwcutil /opt/acml3.6.0/ifort64/lib/libacml.a \
  /opt/acml3.6.0/ifort64/lib/libacml.a -larmci -L/opt/mpich2-1.0.6_gcc_ifc/lib/libmpich.a \
  -ltcgmsg-mpi -lrt

Obviously I swap the different maths libraries while testing them, but without recompiling.

> Third, the tce tests you mentioned here won't work on 2 cpus, to make it
> work, you need to use more processors for those.

If you increase shmmax they will work on 2 CPUs ;-) At least this is what I did on the dual Xeon (32bit) I have.

> By the way, is there a network you use to interconnect the nodes on
> your cluster?

Yes and no. I ran the tests on the AMD x2, so within the box (SMP).
Otherwise we have GbE.

Best wishes

Jörg

> Best regards
> Dunyou
>
> Jörg Saßmannshausen wrote:
> > Dear all,
> > thanks for the prompt reply.
> >
> >> Would you please post your environmental variables setup for your
> >> compilation?
> >
> > Sure:
> > LIBMPI="lmpich"
> > LD_LIBRARY_PATH="/opt/intel/mkl/10.0.011/lib/em64t"
> > MPI_INCLUDE="/opt/mpich2-1.0.6_gcc_ifc/include"
> > MPI_LIB="/opt/mpich2-1.0.6_gcc_ifc/lib"
> > NWCHEM_EXECUTABLE="/usr/local/src/nwchem-5.0/bin/LINUX64/nwchem"
> > NWCHEM_MODULES="all"
> > NWCHEM_TARGET="LINUX64"
> > NWCHEM_TOP="/usr/local/src/nwchem-5.0"
> > PATH="/opt/mpich2-1.0.6_gcc_ifc/bin:/opt/intel/fce/9.1.039/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11"
> > SCRATCH_DEF_DIR="/scr/nwchem"
> > TCGRSH="/usr/bin/ssh"
> > USE_MPI="y"
> > USE_MPIF="y"
> > LANG="de_AT@euro"
> > LANGUAGE="de_AT"
> > LARGE_FILES="TRUE"
> >
> >> Have you tried with no BLAS libraries, except the Netlib default which
> >> comes with NWChem?
> >
> > Yes, same problem.
> >
> >> Did you compile mpich with the same compiler as NWChem?
> >
> > Yes.
> >
> >> Lastly, you might try the GNU compilers for completeness in debugging.
> >
> > Yes and no. I did not manage to build NWChem on 64bit with
> > gfortran-4.1.2. It bombs out at one point, complaining about a
> > missing object file and (presumably earlier) a problem with the Fortran. I
> > gave up at that point. I would like to add that I managed to build it on
> > some old PIII machines using gfortran without these problems (also with
> > mpich2).
> >
> > For the sake of easier communication I have merged two emails here, I hope
> > nobody minds.
> >
> > Thanks for the prompt response
> >
> > Jörg
> >
> >> Jörg Saßmannshausen wrote:
> >>> Dear all,
> >>> I am currently fighting with NWChem5.0 compilation on our Opteron
> >>> Cluster. Following instructions, I managed to build the binaries using
> >>> the Intel Fortran Compiler (9.1.039) and mpich2-1.0.6.
> >>> I have tried the following math libraries:
> >>> acml3.6.0, acml4.0.0, acml4.0.1, ATLAS3.8.0, Intel mkl10.0.011
> >>> but I am having problems with the above test jobs. It is not only
> >>> pspw.nw, it appears to me that most of the PW test jobs are generating
> >>> NaN when I am running them. Similar things also happen in band.nw,
> >>> tce_cr_eom_t_ch_rohf.nw, tce_cr_eom_t_ozone.nw, tce_active_ccsdt.nw
> >>> .... Running them on the cluster with one CPU works, with 2 CPUs I get
> >>> NaN in the calculation and hence the job fails.
> >>>
> >>> I am somewhat lost, I did not have these problems on my dual Xeon 32bit
> >>> machine (using Intel Fortran compiler and mkl).
> >>>
> >>> I am running Debian Etch, for more information just ask.
> >>>
> >>> Could it be a problem with the basis sets? I have noticed there are
> >>> some *.F files in the basis set libraries, but I cannot see how
> >>> that could cause a problem.
> >>>
> >>> Any help would be appreciated, I have been fighting with this for some
> >>> time now.
> >>>
> >>> All the best from Graz
> >>>
> >>> Jörg

-- 
*************************************************************
Jörg Saßmannshausen
Institut für chemische Technologie organischer Stoffe
TU-Graz
Stremayrgasse 16
8010 Graz
Austria

phone: +43 (0)316 873 8954
fax: +43 (0)316 873 4959
homepage: http://sassy.formativ.net/
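
P.S. On the shmmax remark above: a minimal shell sketch for checking the kernel's shared memory segment limit on a Linux box. It only assumes the usual /proc/sys/kernel/shmmax interface. The helper name check_shmmax and the 256 MiB threshold are made up for illustration; nothing in this thread says how much the tce tests actually need, so treat the number as a placeholder.

```shell
#!/bin/sh
# Sketch: report the System V shared memory segment limit (Linux only).
# needed_bytes is a hypothetical figure, not a value from the NWChem docs.

shmmax_file=/proc/sys/kernel/shmmax
needed_bytes=268435456   # 256 MiB, illustrative only

# Print "ok" if limit ($1) >= needed ($2), otherwise "too small".
# awk compares as floating point, so very large limits do not overflow
# the shell's integer arithmetic.
check_shmmax() {
    if awk -v limit="$1" -v needed="$2" \
        'BEGIN { exit !(limit + 0 >= needed + 0) }'; then
        echo "ok"
    else
        echo "too small"
    fi
}

if [ -r "$shmmax_file" ]; then
    current=$(cat "$shmmax_file")
    echo "kernel.shmmax = $current bytes: $(check_shmmax "$current" "$needed_bytes")"
else
    echo "cannot read $shmmax_file (not a Linux system?)"
fi

# To raise the limit until the next reboot (as root):
#   sysctl -w kernel.shmmax=268435456
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   kernel.shmmax = 268435456
```

The script itself only reads the current value, so it is safe to run as a normal user; the two commented commands at the end are the part that changes anything and need root.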