Surrogate-Gated Generation and Foundation-Model Embeddings for Bayesian Materials Design
Abstract
Closed-loop materials discovery iterates between proposing candidate structures and evaluating their properties, and property evaluation dominates the cost.
In the generative variant, a learned prior proposes candidate crystals and a property oracle scores them; we ask whether a cheap probabilistic surrogate can triage the generator's output, and what such a surrogate must do well.
Across three architecturally distinct pretrained diffusion priors (MatterGen, CrystalFlow, ADiT) and two targets (room-temperature heat capacity and bulk modulus), we insert a Gaussian process acquisition gate between structure generation and the oracle in an RL-steered generative workflow.
The gate matches or exceeds ungated fine-tuning of the generative model while capping oracle calls at a fixed per-cycle budget.
Budget-matched ablations isolate the mechanism.
At an identical four-call budget, ranking-based selection outperforms arbitrary selection, confirming that the gain comes from the surrogate's choice; the gate comes within $\sim$9\% of exhaustive oracle spending at roughly one-fifth of the calls.
A density-functional-theory check of the bulk-modulus discoveries agrees with the learned oracle to within 2.5\% on average and supports the surrogate's ranking of the generated structures (Spearman $\rho = 0.94$).
A cross-factorial benchmark of surrogate performance spanning mechanical, electronic, and vibrational properties identifies pretrained ORB embeddings with a Gaussian process as the most reliable combination, which we adopt as the building blocks of the proposed workflow.
The complete pipeline is released as open-source software.
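To make the gating step concrete, the following is a minimal sketch of a budget-capped Gaussian process acquisition gate, not the released pipeline. The abstract does not specify the kernel or the acquisition function, so the RBF kernel with unit signal variance, the upper-confidence-bound score, and the names `rbf_kernel`, `gp_posterior`, and `gate` are all assumptions made for illustration. Toy one-dimensional features stand in for the pretrained embeddings; the budget of four calls matches the ablation described above.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a GP with a constant mean."""
    y_mean = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query, length_scale)
    chol = np.linalg.cholesky(k_tt)
    # Solve K alpha = (y - mean) through the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_tq.T @ alpha + y_mean
    v = np.linalg.solve(chol, k_tq)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))


def gate(x_train, y_train, x_candidates, budget=4, kappa=1.0):
    """Return indices of the `budget` candidates with the highest UCB score.

    Only these candidates would be forwarded to the expensive oracle.
    """
    mean, std = gp_posterior(x_train, y_train, x_candidates)
    ucb = mean + kappa * std  # assumed acquisition: upper confidence bound
    return np.argsort(-ucb)[:budget]


# Toy data: features and property values already scored by the oracle.
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])

# Six generated candidates; only four are sent on to the oracle.
x_candidates = np.array([[0.1], [2.9], [1.5], [0.5], [2.5], [10.0]])
selected = gate(x_train, y_train, x_candidates, budget=4, kappa=1.0)
print("candidates sent to the oracle:", selected.tolist())
```

Setting `kappa=0.0` ranks purely by predicted mean, while larger values favor candidates far from the training data, where the surrogate is uncertain. In a real loop the newly scored candidates would be appended to the training set before the next cycle.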