Date: Fri, 08 Jun 2007 09:58:53 -0700
From: Dunyou Wang
To: 정동현
CC: nwchem-users@emsl.pnl.gov
Subject: Re: [NWCHEM] HESSIAN calculation error

Dear Dong,

We have seen this error before. It is caused by an unequal number of
processors per node, so you can work around it by requesting the same
number of processors on every node.

We also have a patch that fixes the problem. Please use the enclosed
patch file to patch base.c in your $NWCHEM_TOP/src/tools/global/src
directory. This should resolve your current problem as well.

Cheers,
Dunyou

정동현 wrote:
> Dear NWCHEM users,
>
> I am trying to install NWChem into our Quadcore Xeon 2.66GHz linux server.
> I tried a binary version of NWChem-5.0 (Intel-EM64T-MPICH2) and also
> compiled the source using MPICH1 library and intel compiler(v9.0).
> When I use only one core per node, all things seem to be ok.
> But, when I use multi cores per node, after a successful geometry
> optimization step, the HESSIAN calculation does not go on.
> Our system is
> 6-node(2-way) Quadcore Xeon 2.66GHz
> OS: RHEL 4
>
> Please help me...
> Thank you in advance...
>
> Sincerely,
> Dong Hyun Jung
>
> The error messages are following:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NWChem Input Module
> -------------------
>
>
>
>
> NWChem Nuclear Hessian and Frequency Analysis
> ---------------------------------------------
>
>
>
> NWChem Analytic Hessian
> -----------------------
>
>
> HESSIAN: the one electron contributions are done in 0.2s
>
> 4:4:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> 5:5:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> 3:3:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> 3:3:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> 7:7:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> 7:7:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> Last System Error Message from Task 7:: Resource temporarily unavailable
> 6:6:ga_copy:ga_merge_mirrored:nga_access_ptr:locate top failed:: 0
> Last System Error Message from Task 3:: No such file or directory
> p3_9121: p4_error: : 0
> [7] MPI Abort by user Aborting program !
> [7] Aborting program!
> p7_26654: p4_error: : 0
> p5_7693: p4_error: : 0
> p4_4960: p4_error: : 0
> 1:armci_rcv_data: read failed: -1
> rm_l_3_9122: (49.640625) net_send: could not write to fd=5, errno = 32
> [3] MPI Abort by user Aborting program !
> [3] Aborting program!
> ocess terminated prematurely, status=: 0
> ram !
> [4] Aborting program!
> p6_25248: p4_error: : 0
> rm_l_6_25249: (49.285156) net_send: could not write to fd=5, errno = 32
> ld not write to fd=5, errno = 32
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The input file is the example file for h2o optimization and frequency:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> start h2o
> title Water
> geometry units au autosym
> O 0.00000000 0.00000000 0.00000000
> H 0.00000000 1.93042809 -1.10715266
> H 0.00000000 -1.93042809 -1.10715266
> end
> basis noprint
> H library sto-3g
> O library sto-3g
> end
> scf; thresh 1e-6; end
> driver; tight; end
> task scf optimize
> scf; thresh 1e-8; print none; end
> task scf freq
> freq
> reuse; temp 4 298.15 300.0 350.0 400.0
> end
> task scf freq
> freq
> reuse; mass H 2.014101779
> temp 1 298.15
> end
> task scf freq
> freq
> reuse; mass 2 2.014101779
> end
> task scf freq
> freq
> reuse; mass 2 2.014101779 ; mass 3 3.01604927
> end
> task scf freq
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Dong Hyun Jung, Ph. D
> Chief Scientist
>
> Insilicotech Co. Ltd.
> A-1101, Kolontripolis, 210, Geumgok-Dong, Seongnam,
> Gyeonggi-Do, 463-805, Korea
> Tel. +82-31-728-0443 Fax. +82-31-728-0444
> dhjung@insilicotech.co.kr
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>

Content-Type: text/plain; name="base.patch"

--- base.c 2007-06-08 09:29:51.769388000 -0700
+++ base.c.bak 2007-06-08 09:10:20.103183000 -0700
@@ -3503,7 +3503,6 @@
 float f_one = 1.0;
 long l_one = 1;
 double c_one[2];
- int chk = 1;
 c_one[0] = 1.0;
 c_one[1] = 0.0;
@@ -3530,25 +3529,12 @@
 ga_error("Unable to create work array for merge",GAme);
 ga_zero_(&_ga_tmp);
 /* Find data on this processor and accumulate in temporary global array */
- inode = GAme - zproc;
- nga_distribution_(g_a,&inode,lo,hi);
-
- /* Check to make sure processor has data */
- chk = 1;
- for (i=0; i