From: Dunyou Wang
To: Shenggang Li
Cc: nwchem-users@emsl.pnl.gov
Date: Wed, 24 May 2006 08:53:59 -0700
Subject: Re: [NWCHEM] Failed to run large parallel jobs

Could you set IB_LIB_NAME to "-lvapi -lmosal -lpthread" and unset the
USE_MPIF variable? Then give it another try.

Best,
DY Wang

Shenggang Li wrote:
> Dunyou,
>
> Sorry for the delay.
>
> Here is what you have to do.
> 1.
>
> # NWChem 4.7 LINUX64 Intel 9.0 MPICH MELLANOX (Done)
> #setenv LARGE_FILES TRUE
> #setenv LIB_DEFINES -DDFLT_TOT_MEM=134217728
> #setenv NWCHEM_TOP /home/sli/source/nwchem-4.7/nwchem-4.7
> #setenv NWCHEM_TARGET LINUX64
> #setenv NWCHEM_MODULES all
> #setenv FC ifc
> #setenv CC icc
> #setenv ARMCI_NETWORK MELLANOX
> #setenv IB_INCLUDE /usr/local/topspin/include/vapi
> #setenv IB_LIB /usr/local/topspin/lib64
> #setenv P4_RSHCOMMAND ssh
> #setenv USE_MPI y
> #setenv USE_MPIF y
> #setenv LIBMPI "-lfmpich_i -lmpich_i -lmpichfsup_i"
> #setenv MPI_LIB /usr/local/topspin/mpi/mpich/lib64
> #setenv MPI_INCLUDE /usr/local/topspin/mpi/mpich/include
>
> 2. Replace the GA with GA 4.0 by
>
> cd $NWCHEM_TOP/src/tools
> mv GNUmakefile ../GNUmakefile.tools
> rm -rf *
> (move all files from the GA 4.0 distribution into tools)
> mv -f ../GNUmakefile.tools GNUmakefile
>
> 3. Slightly modify GA 4.0 if you are using topspin 3.1
>
> cd armci/config
> (replace -lmtl_comm -lmpga with -lts_ib_cm_user)
>
> 4. Build
>
> cd $NWCHEM_TOP/src
> make FC=ifc CC=icc nwchem_config >& nwchem_config.log &
> make FC=ifc CC=icc >& make.log &
>
> 5. Execute with LSF
>
> $MPIRUN_SSH -np $NPROC \$LSB_HOSTS $NWCHEM_EXECUTABLE $INPUT.nw >& $CURRDIR/$INPUT.nwout
>
> Hope this helps.
>
> Quoting Dunyou Wang:
>
>> Hi Shenggang,
>>
>> Would you please send us your environment variables when building your
>> version of NWChem?
>> Thanks
>> DY Wang
>>
>> Shenggang Li wrote:
>>
>>> The platform is a Linux cluster with dual Xeon processors on each node.
>>> The NWChem 4.7 package was built with GA 4.0 and InfiniBand support.
>>> We tried to run an RB3LYP frequency calculation with ~125 atoms and ~1000
>>> basis functions. First I used 8 nodes and 16 processors. It finished
>>> the two electron contributions to Hessian, and then never updated the
>>> file again. The second time, I used 32 nodes and 64 processors, and it
>>> got stuck at the same point. We have 4GB memory and 50GB scratch space per
>>> node. I am wondering if the job failed because of a lack of resources or
>>> something to do with the cluster or NWChem setup. Thanks!
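
The change suggested at the top of this message could be applied roughly as follows. This is only a csh sketch, assuming the same shell session and build tree as the quoted setup. The rebuild commands are copied from step 4 of the quoted instructions, without the trailing "&" so that they run one after the other. That a full rebuild is needed is an assumption; the thread only says to "give it another try".

```csh
# Use the InfiniBand libraries named in the reply.
setenv IB_LIB_NAME "-lvapi -lmosal -lpthread"

# Remove USE_MPIF from the environment (the quoted setup lists it as "y").
unsetenv USE_MPIF

# Assumption: rebuild as in step 4 of the quoted instructions.
cd $NWCHEM_TOP/src
make FC=ifc CC=icc nwchem_config >& nwchem_config.log
make FC=ifc CC=icc >& make.log
```

Running `printenv IB_LIB_NAME` and `printenv USE_MPIF` before the build is a quick way to confirm that the first is set and the second is gone.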