Budget-Constrained Compound Library Prioritization with Risk Awareness and Uncertainty Quantification
Abstract
Early discovery projects often face a budgeted prioritization problem: many structures can be enumerated or purchased, but only a small fraction can be tested, reviewed, or synthesized first.
I formulate this setting as risk-aware compound-library compression.
Given a molecular library and a fixed Top-k budget, the goal is to return an enriched candidate subset while preserving uncertainty, applicability-domain evidence, ADMET/structural alerts, and audit fields needed for human review.
The framework intentionally uses a transparent 2D activity proxy rather than a complex representation model, combining Morgan fingerprints, RDKit descriptors, a multilayer perceptron, split-conformal uncertainty intervals, leakage auditing, and auditable export.
On ChEMBL 36, the model achieved a Spearman correlation of 0.7674 and an enrichment factor at 1% (EF@1%) of 2.7331 on internal validation, and a Spearman correlation of 0.5171 with an EF@1% of 2.4359 on a temporal holdout.
After controlling for overlap with the fold-0 training set, a scaffold-disjoint BACE subset retained a ROC AUC of 0.7626 and an EF@1% of 2.0253.
In a strict 100-molecule BACE decision-layer replay, risk-aware ordering kept Hit@10 at 0.9000 while exposing review evidence that pure activity sorting omits.
An EGFR/CHEMBL203 label-hidden operational replay supports workflow feasibility but is reported as a same-source sensitivity analysis rather than as independent external validation.
The claim is bounded: the evidence supports risk-aware library compression as an upstream prioritization layer, while prospective blinded validation remains necessary before claiming project-specific hit-rate or cost improvements.
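For readers unfamiliar with the two quantities the abstract relies on most, the following minimal sketch shows one common way to compute an enrichment factor at a Top-k fraction and a split-conformal interval half-width from absolute calibration residuals. It is an illustration under stated assumptions, not the implementation used in this work: the function names `enrichment_factor` and `split_conformal_halfwidth`, the binary active/inactive labels, and the choice of absolute residuals as the nonconformity score are all hypothetical choices made for the example.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """EF@fraction: hit rate in the top-scored fraction over the overall hit rate.

    Hypothetical helper; `is_active` holds 0/1 labels aligned with `scores`.
    """
    n = len(scores)
    if n == 0 or len(is_active) != n:
        raise ValueError("scores and is_active must be non-empty and aligned")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("EF is undefined when the library has no actives")
    # Size of the selected subset; rounding guards against float noise in ceil.
    n_top = max(1, math.ceil(round(fraction * n, 9)))
    # Rank compounds by predicted score, highest first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top_hits = sum(is_active[i] for i in order[:n_top])
    return (top_hits / n_top) / (total_actives / n)


def split_conformal_halfwidth(calib_true, calib_pred, alpha=0.1):
    """Half-width q such that [pred - q, pred + q] targets 1 - alpha coverage.

    Assumes absolute residuals on a held-out calibration set as the
    nonconformity score (an assumption made for this sketch).
    """
    residuals = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(residuals)
    if n == 0:
        raise ValueError("calibration set must be non-empty")
    # Finite-sample rank: ceil((n + 1) * (1 - alpha)), 1-indexed.
    rank = math.ceil(round((n + 1) * (1 - alpha), 9))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return residuals[rank - 1]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.05]
    actives = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    print(enrichment_factor(scores, actives, fraction=0.1))
    print(split_conformal_halfwidth([0] * 9, list(range(1, 10)), alpha=0.2))
```

An EF above 1 means the selected subset is richer in actives than a random draw of the same size, which is why it suits a fixed Top-k budget. The conformal half-width depends only on the calibration residuals, so it can be attached to any point predictor without retraining.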