High-performance parallel implementation of high-order coupled-cluster theories
Abstract
High-order coupled-cluster theories with iterative triples (CCSDT), perturbative quadruples [CCSDT(Q)], and iterative quadruples (CCSDTQ) provide benchmark-quality correlation energies, but their steep computational scalings, $O(N^8)$, $O(N^9)$, and $O(N^{10})$, respectively, together with the large memory requirements of the high-order amplitude tensors, have historically limited their application to small molecules.
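To make the quoted exponents concrete, the short sketch below computes how much the cost grows when the system size $N$ is scaled by a fixed factor. It assumes an idealized power law with constant prefactors; for CCSDT and CCSDTQ the exponent applies per iteration, while for CCSDT(Q) it applies to the one-shot (Q) correction. The names `SCALING` and `cost_growth` are illustrative only.

```python
# Idealized cost exponents quoted above (prefactors and memory are ignored)
SCALING = {"CCSDT": 8, "CCSDT(Q)": 9, "CCSDTQ": 10}

def cost_growth(method: str, size_factor: float) -> float:
    """Factor by which the cost grows when the system size N grows by size_factor."""
    return size_factor ** SCALING[method]

for method in SCALING:
    print(f"{method:9s} doubling N multiplies the cost by {cost_growth(method, 2):.0f}")
```

Doubling the system size thus multiplies the cost by 256, 512, and 1024, respectively, which is why even modest increases in molecule or basis size quickly exhaust a single node.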
In this work, we develop efficient open-source implementations of spin-restricted CCSDT (RCCSDT), RCCSDT(Q), RCCSDTQ, and spin-unrestricted CCSDT (UCCSDT) within the PySCF package.
The shared-memory implementation combines compact triangular storage of the highest-order amplitude tensors with the multithreaded tensor contraction backend pytblis, enabling efficient use of modern many-core CPU architectures.
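The abstract does not spell out the storage layout, so the following is only a minimal sketch of one common form of "compact triangular storage", assuming spin-restricted triples amplitudes $t_{ijk}^{abc}$ that are invariant under simultaneous permutation of the index pairs $(i,a)$, $(j,b)$, $(k,c)$. Under that assumption only occupied triples with $i \le j \le k$ need to be kept, each with a full $v^3$ virtual block, and any other block is recovered by permuting the virtual axes. The helpers `pack_t3`, `unpack_block`, and `symmetrize` are hypothetical and not part of PySCF.

```python
import numpy as np
from itertools import combinations_with_replacement, permutations

def pack_t3(t3):
    """Keep only occupied triples i <= j <= k; t3 has shape (o, o, o, v, v, v)."""
    nocc = t3.shape[0]
    triples = list(combinations_with_replacement(range(nocc), 3))
    packed = np.stack([t3[i, j, k] for i, j, k in triples])  # shape (n_triples, v, v, v)
    index = {ijk: n for n, ijk in enumerate(triples)}
    return packed, index

def unpack_block(packed, index, i, j, k):
    """Rebuild the (v, v, v) block t3[i, j, k] from packed storage via the pair symmetry."""
    ijk = (i, j, k)
    order = sorted(range(3), key=lambda p: ijk[p])       # argsort of (i, j, k)
    block = packed[index[tuple(ijk[p] for p in order)]]  # virtual axes follow the sorted order
    return block.transpose(tuple(np.argsort(order)))     # restore the caller's (a, b, c) order

def symmetrize(t):
    """Impose t[i,j,k,a,b,c] = t[j,i,k,b,a,c] = ... by summing the 6 simultaneous permutations."""
    return sum(t.transpose(p[0], p[1], p[2], p[0] + 3, p[1] + 3, p[2] + 3)
               for p in permutations(range(3)))

rng = np.random.default_rng(0)
nocc, nvir = 4, 3
t3 = symmetrize(rng.standard_normal((nocc,) * 3 + (nvir,) * 3))
packed, index = pack_t3(t3)
print(packed.size / t3.size)  # 20 of 64 occupied triples kept -> 0.3125
```

For large occupied spaces the kept fraction approaches $1/6$, since $\binom{o+2}{3}/o^3 \to 1/6$.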
This design delivers near-ideal thread scaling up to 90 cores and achieves wall times shorter than or comparable to existing single-node implementations for representative benchmark molecules.
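"Near-ideal" scaling is conventionally quantified by the speedup and parallel efficiency. The standard definitions (general, not specific to this work), with $T_p$ the wall time on $p$ cores, are

$$
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p},
$$

so ideal scaling corresponds to $E_p = 1$. For strong scaling across nodes measured against a baseline run on $n_0$ nodes, the analogous efficiency is $E_n = n_0 T_{n_0} / (n\,T_n)$.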
We further extend RCCSDT, RCCSDT(Q), and RCCSDTQ to distributed-memory architectures using MPI-based algorithms.
By distributing compact high-order amplitudes across MPI ranks and overlapping communication with computation through nonblocking data transfers, the distributed implementation achieves near-ideal strong scaling on up to 32 nodes, corresponding to approximately 3,000 CPU cores.
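The two ideas in this sentence can be illustrated with a small single-process sketch. It assumes, purely for illustration, that the packed $i \le j \le k$ occupied triples are split into contiguous, load-balanced slices, one per rank, and that remote blocks are prefetched while the current block is being contracted (double buffering). In a real MPI code `fetch` would post a nonblocking receive (for example mpi4py's `Irecv`, completed with `Wait`); here a one-thread pool merely mimics that overlap. The functions `owned_triples` and `pipelined` are hypothetical and do not describe the paper's actual algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement

def owned_triples(nocc, nranks, rank):
    """Contiguous, load-balanced slice of the packed i <= j <= k triples owned by one rank."""
    triples = list(combinations_with_replacement(range(nocc), 3))
    base, extra = divmod(len(triples), nranks)
    start = rank * base + min(rank, extra)       # first `extra` ranks get one more triple
    stop = start + base + (1 if rank < extra else 0)
    return triples[start:stop]

def pipelined(keys, fetch, compute):
    """Double buffering: fetch the next block in the background while computing on the current one."""
    results = []
    if not keys:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, keys[0])              # start the first transfer
        for n in range(len(keys)):
            data = pending.result()                        # wait only for the block needed now
            if n + 1 < len(keys):
                pending = pool.submit(fetch, keys[n + 1])  # post the next transfer early
            results.append(compute(data))                  # runs while the next fetch proceeds
    return results

# 10 occupied orbitals give C(12, 3) = 220 packed triples, i.e. 55 per rank on 4 ranks
sizes = [len(owned_triples(10, 4, r)) for r in range(4)]
print(sizes)  # [55, 55, 55, 55]
```

Contiguous slices keep each rank's blocks adjacent in memory, and prefetching hides transfer latency as long as each contraction takes longer than the corresponding transfer.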
These developments substantially extend the practical reach of canonical high-order CC theory, enabling CCSDT(Q) calculations with approximately 100 correlated electrons in 450 orbitals and CCSDTQ calculations with approximately 50 correlated electrons in 115 orbitals.
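A rough, back-of-the-envelope estimate shows why the CCSDTQ case calls for distributed memory. It assumes a closed-shell system in which the 50 correlated electrons occupy 25 orbitals and the remaining 115 − 25 = 90 orbitals are virtual, double-precision amplitudes, and (for the packed case) storage of only occupied quadruples $i \le j \le k \le l$, each with a full $v^4$ block. These assumptions and the helper `t4_bytes` are illustrative; the actual implementation may store more or fewer intermediates.

```python
from math import comb

BYTES_PER_DOUBLE = 8

def t4_bytes(nocc, nvir, packed):
    """Memory for T4 amplitudes t[i,j,k,l,a,b,c,d] in float64.

    packed=False: the full o^4 v^4 array.
    packed=True: only occupied quadruples i <= j <= k <= l, each with a full v^4 block
    (an assumed layout, analogous to triangular T3 storage).
    """
    occ_part = comb(nocc + 3, 4) if packed else nocc ** 4
    return occ_part * nvir ** 4 * BYTES_PER_DOUBLE

# Assumed closed-shell case: 50 correlated electrons -> 25 occupied, 90 virtual orbitals
nocc, nvir = 25, 90
for packed in (False, True):
    print(f"packed={packed}: {t4_bytes(nocc, nvir, packed) / 1e12:.1f} TB")
```

Under these assumptions a single copy of $T_4$ needs about 205 TB in full form and about 10.7 TB in packed form, which is still far beyond one node but roughly 0.34 TB per node when spread over 32 nodes.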
Applications to $\pi$-stacked noncovalent dimers, the CO dissociation energy of Cr(CO)$_6$, and the Cope rearrangement of semibullvalene demonstrate that canonical high-order CC benchmarks are now feasible for chemically realistic molecular systems.