Date: Sun, 03 Jul 2005 15:30:53 +0800 (CST)
From: Jason Shih
Subject: nwchem fail on ibm sp(pwr4) (fwd)
To: "NWChem User's Mailing List"

Dear NWChem users,

I am not sure whether anyone has seen a similar problem before with NWChem compiled on an IBM SP machine. The following error is returned when executing on two processors:

------------------------
gasc01:~/nwchem/example> ./nwchem h2o_scf.nw
0:lapi_init failed 410(19a)
0:lapi_init failed 410(19a)
system message: Error 0
system message: Error 0
ERROR: 0031-250 task 1: Terminated
ERROR: 0031-250 task 0: Terminated
------------------------

After turning on the MP debug level, further information is dumped, as shown below:

--------------------------
gasc01:~/nwchem/example> ./nwchem h2o_scf.nw
INFO: DEBUG_LEVEL changed from 0 to 2
D1: Open of file /euler6/user3/sci/hlshih/hpc/MPI/Myhosts.j successful
D1: mp_euilib = ip
D1: task 0 gasc01 10.109.12.11 10
D1: node allocation strategy = 0
D1: Entering pm_contact, jobid is 0
D1: Jobid = 1127413211
D1: DCE is not available...processing continues.
D1: Requesting service pmv3
D1: 1 master nodes
D1: Socket file descriptor for master 0 (gasc01) is 4
D1: Leaving pm_contact, jobid is 1127413211
D1: attempting to bind socket to /tmp/s.pedb.7045236.34411
INFO: 0031-724 Executing program: <./nwchem>
INFO: DEBUG_LEVEL changed from 0 to 2
D1: In mp_main, mp_main will not be checkpointable
D1: mp_euilib is LAPI: @(#) 03/12/12 16:08:50 LAPI version # 4.77 Date:11/17/2003
0:lapi_init failed 410(19a)
0:lapi_init failed 410(19a)
system message: Error 0
D1: In pm_child_sig_handler, signal=15, task=0
INFO: 0031-656 I/O file STDOUT closed by task 0
INFO: 0031-656 I/O file STDERR closed by task 0
ERROR: 0031-250 task 0: Terminated
D1: All remote tasks have exited: maxx_errcode = 143
INFO: 0031-639 Exit status from pm_respond = 0
D1: Maximum return code from user = 143
D2: In pm_exit... About to call pm_remote_shutdown
D2: Sending PMD_EXIT to task 0
D2: Elapsed time for pm_remote_shutdown: 0 seconds
D2: In pm_exit... Calling exit with status = 143 at Sun Jul 3 13:19:35 2005
--------------------------

I have searched around but can't find any clue to this problem. What would you suggest I investigate further?

Thanks.

Br,
J

--
---------------------------------
Jason Shih
HPC Team, Academia Sinica Computing Center
No.128, Sec. 2, Academia Rd., Nangang District,
Taipei City 11529, Taiwan (R.O.C.)
Tel: +886-2-27899960
Fax: +886-2-27899949
---------------------------------