From owner-nwchem-users@emsl.pnl.gov Mon Jul 18 09:54:57 2005
Date: Mon, 18 Jul 2005 12:53:53 -0400
From: "Y. Huang"
Subject: Re: Parallel execution errors
In-reply-to: <42DBCF67.7010105@pnl.gov>
To: Edoardo Apra`
Cc: nwchem-users@emsl.pnl.gov

Hi Edoardo,

Attached is the input file "h3tr1.nw". It is just the sample input file under
$NWCHEM_TOP/examples/dirdyvtst/h3.

Thanks for your help.

Yiye

On Mon, 18 Jul 2005, Edoardo Apra` wrote:

> Yiye
> could you please send us your input file?
> Thanks, Edo
>
> Y. Huang wrote:
>
> >Hi,
> >
> >I compiled NWCHEM with MPI option under lam-mpi in a dual opteron system
> >with two nodes. Its OS is Rocks Cluster.
> >The LAM run-time environment was launched by the lamboot command; the
> >hostfile for lam-mpi contains:
> >
> >vdw01 cpu=2
> >compute-0-0 cpu=2
> >
> >However, when I ran a NWCHEM example:
> >
> >[huang@vdw01 h3]$ mpirun -np 2 nwchem h3tr1.nw
> >
> >I got errors:
> >
> >...
> > stpr_wrt_fd_from_sq: overwrite of existing file:./h3.hess
> > stpr_wrt_fd_dipole: overwrite of existing file./h3.fd_ddipole
> >1:Segmentation Violation error, status=: 11
> >1:Segmentation Violation error, status=: 11
> >---------------------------------------------------------------------
> >One of the processes started by mpirun has exited with a nonzero exit
> >code. This typically indicates that the process finished in error.
> >If your process did not finish in error, be sure to include a "return
> >0" or "exit(0)" in your C code before exiting the application.
> >
> >PID 20009 failed on node n0 (10.1.1.1) with exit status 1.
> >---------------------------------------------------------------------
> >
> >Please note, I don't have any problem running my own MPI programs on this
> >system.
> >
> >I also compiled NWCHEM without the MPI option, and it ran fine on a
> >single processor. However, when I ran it in parallel using TCGMSG (a file
> >named "h3tr1.p" was in the working directory):
> >
> >[huang@vdw01 h3]$ parallel h3tr1 h3tr1.nw
> >
> >I got,
> >
> >...
> >1:Segmentation Violation error, status=: 11
> >1:Segmentation Violation error, status=: 11
> >Last System Error Message from Task 1:: No such file or directory
> > 1: ARMCI aborting 11 (0xb).
> > 1: ARMCI aborting 11 (0xb).
> >system error message: No such file or directory
> > stpr_wrt_fd_from_sq: overwrite of existing file:./h3.hess
> > stpr_wrt_fd_dipole: overwrite of existing file./h3.fd_ddipole
> >0:Child process terminated prematurely, status=: 256
> >0:Child process terminated prematurely, status=: 256
> >Last System Error Message from Task 0:: No such file or directory
> > 0: ARMCI aborting 256 (0x100).
> > 0: ARMCI aborting 256 (0x100).
> >system error message: No such file or directory
> > 2: interrupt(1)
> >WaitAll: No children or error in wait?
> >
> >
> >Please help!
> >
> >
> >Yiye
> >
> >**********************************
> > Department of Chemistry
> > University of Waterloo
> > Waterloo, Ontario N2L 3G1
> > Tel: (519) 888-4567 ext.6110
> > E-mail: huang@uwaterloo.ca
> >**********************************
> > http://hpc.uwaterloo.ca
> >

[Attachment: h3tr1.nw]

start h3

basis
 h library 3-21G
end

scf
   uhf
   doublet
   thresh 1.0e-6
end

dirdyvtst autosym 0.001
  theory scf
*GENERAL
  TITLE
    Test run: H+H2 reaction, Euler integration, no restart

  ATOMS
    1  H
    2  H
    3  H
  END

*REACT1
   GEOM
     1   0.0   0.0   0.0
     2   0.0   0.0   1.3886144
   END

   SPECIES LINRP

*REACT2

   GEOM
    3    0.0   0.0    190.3612132
   END

   SPECIES  ATOMIC

*PROD2

  GEOM
   1  0.0   0.0 190.3612132
  END

  SPECIES  ATOMIC

*PROD1
  GEOM
   2    0.0   0.0   1.3886144
   3    0.0   0.0   0.0
  END

  SPECIES  LINRP

*START
  GEOM
   1    0.0   0.0  -1.76531973
   2    0.0   0.0   0.0
   3    0.0   0.0   1.76531973
  END

  SPECIES  LINTS

*PATH
   SSTEP  0.01
   SSAVE  0.05
   SLP    0.5
   SLM   -0.5
   SCALEMASS    0.6718993

   INTEGRA EULER

   PRINTFREQ
end

task dirdyvtst