Date: Wed, 30 Apr 2008 13:12:16 -0500
From: "Jeff Hammond"
To: "Jeremy Merritt"
Subject: Re: [NWCHEM] CCSDt memory/IO issues
Cc: "NWChem Users"

Jeremy,

You are using the IO scheme
"Replicated Exclusive Access Files" with CCSDt for 181 orbitals, so it is understandable that IO would be significant. The TCE currently does not run in any sort of "direct" mode in the sense that I think of the term. If you have enough memory, running in-core ("io ga" in the tce input deck) will be significantly faster and use no disk (SCF may use disk for integrals unless you put "direct" in the scf block).

In general, you should always try "io ga" first; then, if you run out of memory, try "io dra" when running in serial and "io sf" or "io eaf" when running in parallel. The "io sf" option requires a coherent filesystem, which should always be the case on an SMP machine, while on clusters it depends on the configuration.

It also appears that your memory settings are not optimal. If each node has 8 GB, you should put "memory stack 3600 mb heap 200 mb global 3600 mb" in your input file (outside of any module deck), and if using more than one processor per node, divide each value by the number of processors you are going to use per node (not the total).

If you send me your input file, I can test your job and see what options are best. It may be that your job is too big to run on your current machine.

Jeff

On Wed, Apr 30, 2008 at 12:31 PM, Jeremy Merritt wrote:
> Dear NWChem'ers:
> I am running EOM-CCSDTA (T3_level_1) calculations but have found that my
> jobs are using way more IO time than CPU time. The job is creating ~10
> GB worth of files while it is running. I am not sure if this is a
> reasonable amount, but the jobs are running too slowly to finish, or usually
> erroring out due to memory or disk space issues. For smaller basis sets they
> finish, so maybe this calculation is just not feasible with my computational
> resources. I have 8 GB of memory on my computing nodes, and I have
> specified this on a memory card. (I am also trying to run with 4 processors,
> the max I am allowed on our cluster.) Is there a way to have the TCE
> calculation run in a "direct" mode
> to save IO time? Or are there some other
> options which should be chosen for parallel computing? Part of my output
> file is shown below.
>
> NWChem Extensible Many-Electron Theory Module
> ---------------------------------------------
>
> ======================================================
> This portion of the program was automatically
> generated by a Tensor Contraction Engine (TCE).
> The development of this portion of the program
> and TCE was supported by US Department of Energy,
> Office of Science, Office of Basic Energy Science.
> TCE is a product of Battelle and PNNL.
> Please cite: S.Hirata, J.Phys.Chem.A 107, 9887 (2003).
> ======================================================
>
> General Information
> -------------------
> Number of processors       : 4
> Wavefunction type          : Restricted Hartree-Fock
> No. of electrons           : 16
> Alpha electrons            : 8
> Beta electrons             : 8
> No. of orbitals            : 362
> Alpha orbitals             : 181
> Beta orbitals              : 181
> Alpha frozen cores         : 4
> Beta frozen cores          : 4
> Alpha frozen virtuals      : 0
> Beta frozen virtuals       : 0
> Alpha active occupieds     : 4
> Beta active occupieds      : 4
> Alpha active virtuals      : 12
> Beta active virtuals       : 12
> T3 active excitation level : 1
> Spin multiplicity          : singlet
> Number of AO functions     : 184
> Number of AO shells        : 56
> Use of symmetry is         : on
> Symmetry adaption is       : on
> Schwarz screening          : 0.10D-09
>
> !! WARNING !! The number of MO is less than the number of AO
>
> Correlation Information
> -----------------------
> Calculation type        : Coupled-cluster singles, doubles, & active triples
> Perturbative correction : none
> Max iterations          : 100
> Residual threshold      : 0.10D-04
> DIIS level shift        : 0.00D+00
> Amplitude update        : 5-th order DIIS
> No. of excited states   : 7
> Target root             : 1
> Target symmetry         : a1
> Symmetry restriction    : on
> Dipole & oscillator str : off
> I/O scheme              : Replicated Exclusive Access Files
>
> Memory Information
> ------------------
> Available GA space size is 279586408 doubles
> Available MA space size is 952443723 doubles
>
> Maximum block size 23 doubles
>
> Block   Spin    Irrep   Size         Offset   Alpha
> ---------------------------------------------------
>     1   alpha   a1       2 doubles        0       1
>     2   alpha   b1       1 doubles        2       2
>     3   alpha   b2       1 doubles        3       3
>     4   beta    a1       2 doubles        4       1
>     5   beta    b1       1 doubles        6       2
>     6   beta    b2       1 doubles        7       3
>     7   alpha   a1       5 doubles        8       7
>     8   alpha   a1      16 doubles       13       8
>     9   alpha   a1      17 doubles       29       9
>    10   alpha   a1      17 doubles       46      10
>    11   alpha   a2       1 doubles       63      11
>    12   alpha   a2      15 doubles       64      12
>    13   alpha   a2      16 doubles       79      13
>    14   alpha   b1       3 doubles       95      14
>    15   alpha   b1      20 doubles       98      15
>    16   alpha   b1      20 doubles      118      16
>    17   alpha   b2       3 doubles      138      17
>    18   alpha   b2      20 doubles      141      18
>    19   alpha   b2      20 doubles      161      19
>    20   beta    a1       5 doubles      181       7
>    21   beta    a1      16 doubles      186       8
>    22   beta    a1      17 doubles      202       9
>    23   beta    a1      17 doubles      219      10
>    24   beta    a2       1 doubles      236      11
>    25   beta    a2      15 doubles      237      12
>    26   beta    a2      16 doubles      252      13
>    27   beta    b1       3 doubles      268      14
>    28   beta    b1      20 doubles      271      15
>    29   beta    b1      20 doubles      291      16
>    30   beta    b2       3 doubles      311      17
>    31   beta    b2      20 doubles      314      18
>    32   beta    b2      20 doubles      334      19
>
> Replicated distributed files algorithm will be used
>
> Parallel file system coherency ......... OK
>
> Fock matrix recomputed
> 1-e file size = 8145
> 1-e file name = ./a1-pVTZ.f1.0
> Cpu & wall time / sec 10.6 10.6
>
> 2-e (intermediate) file size = 2221732288
> 2-e (intermediate) file name = ./a1-pVTZ.v2i.0
> Cpu & wall time / sec 497.1 5085.4
>
> 2-e file size = 321237610
> 2-e file name = ./a1-pVTZ.v2.0
> *Cpu & wall time / sec 804.5 40491.9*
>
> t1 file size = 196
> t1 file name = ./a1-pVTZ.t1.0
> before tce_guess_t2
> after tce_guess_t2
>
> t2 file size = 171465
> t2 file name = ./a1-pVTZ.t2.0
> t3a file size = 12870494
> t3a file name = ./a1-pVTZ.t3.0
>
> Thanks in advance,
> Jeremy
>
> --
> Jeremy Merritt, PhD
> Department of Chemistry
> CB 212 Atwood Hall
> Emory University
> Atlanta, Ga 30322
> Voice: 404-727-0029
> Fax: 404-727-6586
> Email: jeremy.merritt@emory.edu
>
>

-- 
Jeff Hammond
The University of Chicago
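
The memory arithmetic in the reply (take the per-node values and divide by the number of processes per node) can be sketched as a small Python helper. This is only an illustration: the function names `per_process_memory` and `io_schemes_to_try` are hypothetical and not part of NWChem, the default values are the 8 GB per-node figures quoted in the reply, and the relative order of "sf" and "eaf" is an assumption, since the reply does not rank them.

```python
def per_process_memory(stack_mb=3600, heap_mb=200, global_mb=3600, procs_per_node=1):
    """Build a memory line for an NWChem input file.

    Defaults are the per-node values suggested in the reply for an 8 GB node.
    Each value is divided by the number of processes per node (not the total
    process count), using integer division so the result is a whole number of MB.
    """
    if procs_per_node < 1:
        raise ValueError("procs_per_node must be at least 1")
    stack = stack_mb // procs_per_node
    heap = heap_mb // procs_per_node
    glob = global_mb // procs_per_node
    return f"memory stack {stack} mb heap {heap} mb global {glob} mb"


def io_schemes_to_try(parallel):
    """Return TCE 'io' settings in the order suggested in the reply.

    In-core ("ga") comes first; the disk-based fallbacks depend on whether the
    job runs in parallel. The order of "sf" versus "eaf" is an assumption.
    """
    return ["ga", "sf", "eaf"] if parallel else ["ga", "dra"]


if __name__ == "__main__":
    # Four processes on one 8 GB node, as in the original question.
    print(per_process_memory(procs_per_node=4))
    for scheme in io_schemes_to_try(parallel=True):
        print(f"io {scheme}")
```

The printed memory line belongs at the top level of the input file, outside any module block, and each "io" setting belongs inside the tce block, as described in the reply.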