Date: Tue, 25 Apr 2006 11:37:19 -0400
From: So Hirata
Subject: RE: [NWCHEM] armci: problem with tickets
To: "'Kirk Peterson'", "'Nieplocha, Jarek'"
Cc: nwchem-users@emsl.pnl.gov

Thank you Kirk. The CCSDTQ algorithm has hardly changed from 4.6 to 4.7
so I suspect that it has more to do with parallel tools?

So

> -----Original Message-----
> From: Kirk Peterson [mailto:kipeters@wsu.edu]
> Sent: Tuesday, April 25, 2006 11:18 AM
> To: Nieplocha, Jarek
> Cc: hirata@qtp.ufl.edu; nwchem-users@emsl.pnl.gov
> Subject: Re: [NWCHEM] armci: problem with tickets
>
> Jarek and So,
>
> to reiterate a bit, this happens with just a 2-way parallel run on one
> dual processor node. Depending on the node it runs on, sometimes this
> error appears before the 1st iteration, sometimes after 1 and sometimes
> after 2. My student ran this same job (same number of procs, etc.) with
> NWChem 4.6 and it ran ok.
> The nodes have been upgraded from SuSE 8 to Rocks 4.1 in the meantime
> (linux kernel 2.4 to CentOS 4.2 kernel 2.6).
>
> -Kirk
>
> PS - here is the tce output:
>
> General Information
> -------------------
> Number of processors   : 2
> Wavefunction type      : Restricted open-shell Hartree-Fock
> No. of electrons       : 33
> Alpha electrons        : 17
> Beta electrons         : 16
> No. of orbitals        : 110
> Alpha orbitals         : 55
> Beta orbitals          : 55
> Alpha frozen cores     : 10
> Beta frozen cores      : 10
> Alpha frozen virtuals  : 0
> Beta frozen virtuals   : 0
> Spin multiplicity      : doublet
> Number of AO functions : 55
> Number of AO shells    : 18
> Use of symmetry is     : on
> Symmetry adaption is   : on
> Schwarz screening      : 0.10D-09
>
> Correlation Information
> -----------------------
> Calculation type        : Coupled-cluster singles, doubles, triples, & quadruples
> Perturbative correction : none
> Max iterations          : 200
> Residual threshold      : 0.10D-04
> Amplitude update        : 5-th order DIIS
> I/O scheme              : Shared File Library
>
> Memory Information
> ------------------
> Available GA space size is -444602274 doubles
> Available MA space size is 314564461 doubles
>
> Maximum block size 9 doubles
>
> Block   Spin    Irrep   Size        Offset   Alpha
> -------------------------------------------------
>     1   alpha   a1      3 doubles        0       1
>     2   alpha   b1      2 doubles        3       2
>     3   alpha   b2      2 doubles        5       3
>     4   beta    a1      3 doubles        7       4
>     5   beta    b1      2 doubles       10       5
>     6   beta    b2      1 doubles       12       6
>     7   alpha   a1      6 doubles       13       7
>     8   alpha   a1      6 doubles       19       8
>     9   alpha   a1      6 doubles       25       9
>    10   alpha   a2      4 doubles       31      10
>    11   alpha   b1      8 doubles       35      11
>    12   alpha   b2      8 doubles       43      12
>    13   beta    a1      6 doubles       51      13
>    14   beta    a1      6 doubles       57      14
>    15   beta    a1      6 doubles       63      15
>    16   beta    a2      4 doubles       69      16
>    17   beta    b1      8 doubles       73      17
>    18   beta    b2      4 doubles       81      18
>    19   beta    b2      5 doubles       85      19
>
> Global files accessible by all nodes assumed
> Parallel file system coherency ......... OK
> Integral file          = ./bro.aoints.0
> Record size in doubles = 65536    No. of integs per rec = 43688
> Max. records in memory = 0        Max. records in file  = 5186
> No. of bits per label  = 8        No. of bits per value = 64
>
>
> #quartets = 1.471D+04 #integrals = 2.844D+05 #direct = 0.0% #cached =100.0%
>
>
> File balance: exchanges= 0 moved= 0 time= 0.0
>
>
> Fock matrix recomputed
> 1-e file size = 1314
> 1-e file name = ./bro.f1
> Cpu & wall time / sec 1.5 4.2
> 2-e (intermediate) file size = 13025650
> 2-e (intermediate) file name = ./bro.v2i
> Cpu & wall time / sec 10.2 11.0
> 2-e file size = 1801654
> 2-e file name = ./bro.v2
> Cpu & wall time / sec 125.1 217.3
> t1 file size = 165
> t1 file name = ./bro.t1
> t2 file size = 32146
> t2 file name = ./bro.t2
> t3 file size = 4098822
> t3 file name = ./bro.t3
> t4 file size = 404917383
> t4 file name = ./bro.t4
>
> CCSDTQ iterations
> --------------------------------------------------------
> Iter   Residuum          Correlation        Cpu       Wall
> --------------------------------------------------------
>    1   0.6021205170407   -0.2956627051880   15534.5   23247.9
>    2   0.6495573587581   -0.2990325657267   15590.9   23379.0
> 1:armci: problem with tickets: (72373504,72373759)
> p1_12623: p4_error: : 72373504
> 1:armci: problem with tickets: (72373504,72373759)
> Last System Error Message from Task 1:: No such file or directory
> [1] MPI Abort by user Aborting program !
> [1] Aborting program!
>
>
> On Apr 24, 2006, at 10:55 PM, Nieplocha, Jarek wrote:
>
> > So,
> >
> > Overflowing ticket table could indicate a massive contention for a
> > single lock or perhaps acquiring multiple locks within a critical
> > section. Could this be really happening for Kirk's run?
> >
> > Jarek
> >
> >
> > (Sent from my wireless RIM Blackberry PDA)
> >
> >
> > -----Original Message-----
> > From: owner-nwchem-developers@emsl.pnl.gov
> > To: 'Kirk Peterson'; nwchem-users@emsl.pnl.gov
> > Sent: Mon Apr 24 13:48:04 2006
> > Subject: RE: [NWCHEM] armci: problem with tickets
> >
> > Dear Kirk:
> >
> > One possibility is that the offset table for CCSDTQ is getting too
> > large.
> > The table stores the offsets of spin spatial symmetry tiles of T4
> > amplitudes in a one-dimensional array of T4. I might suggest lowering
> > the symmetry of the molecule, if possible, which would make the number
> > of tiles fewer (or increasing the tile size with the tilesize keyword).
> >
> > We have since developed a more compact offset table based on hash
> > tables and this is no longer a memory issue. Also our current
> > development version has active-space CC (CCSDt, CCSDtq, CCSDTq,
> > EOM-CCSDt, EOM-CCSDtq, EOM-CCSDTq), IP/EA-EOM (up to CCSDTQ),
> > spin-orbit CC and EOM, CC + perturbation (CCSD(2), CCSDT(2), etc),
> > CIS + perturbation (CIS(D), CIS(3), CIS(4)). The active-space CCSDTQ
> > and CCSD(2) and CCSDT(2) might be an alternative to CCSDTQ when the
> > latter does not complete. They will be available in NWCHEM soon.
> >
> > So Hirata
> > Quantum Theory Project, University of Florida
> > P.O.Box 118435 Gainesville, FL 32611-8435
> > Tel (352) 392-6976; Fax (352) 392-8722
> > http://www.qtp.ufl.edu/~hirata
> >
> >> -----Original Message-----
> >> From: owner-nwchem-developers@emsl.pnl.gov
> >> [mailto:owner-nwchem-developers@emsl.pnl.gov] On Behalf Of Kirk Peterson
> >> Sent: Monday, April 24, 2006 3:47 PM
> >> To: nwchem-users@emsl.pnl.gov
> >> Subject: [NWCHEM] armci: problem with tickets
> >>
> >> Hi,
> >>
> >> we're attempting to run a somewhat large CCSDTQ calculation with
> >> NWChem 4.7 and run into an error like this after 0-2 iterations:
> >>
> >> 1:armci: problem with tickets: (72373504,72373759)
> >> p1_12623: p4_error: : 72373504
> >> 1:armci: problem with tickets: (72373504,72373759)
> >> Last System Error Message from Task 1:: No such file or directory
> >> [1] MPI Abort by user Aborting program !
> >> [1] Aborting program!
> >>
> >>
> >> This job was run 2-way parallel on a single dual processor node of an
> >> Opteron cluster running a CentOS Linux kernel 2.6.9-22. The network
> >> is just gige.
> >> This build of NWChem used an MPI flavor of GA (FC=pgf90 (v5.1) and
> >> CC=gcc). Any suggestions? The input in the tce section is very simple:
> >>
> >> tce
> >>   scf
> >>   freeze 10
> >>   ccsdtq
> >>   io sf
> >>   maxiter 200
> >>   thresh 1.0e-5
> >> end
> >>
> >>
> >> thanks in advance,
> >>
> >> Kirk
> >>
> >>
> >> --------------------------------------------
> >> Kirk A. Peterson
> >> Professor of Chemistry and Materials Science
> >> Washington State University
> >> Pullman, WA 99164-4630
> >>
> >> Office: (509) 335-7867
> >> Fax: (509) 335-8867
> >> kipeters@wsu.edu
> >> http://tyr0.chem.wsu.edu/~kipeters/
> >> ------------------------------------------------------------------------
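
For reference, the tile-size suggestion made earlier in the thread could be tried by adding a tilesize line to the tce block from the original post. This is only a sketch: the value 20 is an arbitrary illustration and does not come from anyone in the thread, and whether it avoids the ticket error is untested. Larger tiles give fewer but bigger blocks, so the memory available per process may limit how far the value can be raised.

```
tce
  scf
  freeze 10
  ccsdtq
  # hypothetical value, chosen only for illustration
  tilesize 20
  io sf
  maxiter 200
  thresh 1.0e-5
end
```

The "Maximum block size" line and the block table in the tce output show the tiling actually used, so comparing them before and after the change would confirm whether the number of tiles went down.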