Grounded autonomous scrutiny at scale: emergent critique from reproduction of published computational physics papers

arXiv CS.AI


CC BY

This outlet displays the full text directly under a public or free license.

Abstract

Autonomous LLM agents now produce complete research artifacts in machine-learning sandboxes, but real computational physics is harder: experiments are first-principles calculations against re-runnable physical ground truth, and meaningful new work almost always builds on a key existing paper.

We ask whether such an agent can perform grounded scrutiny of published computational physics: reading a paper, reproducing it from scratch, and surfacing methodological concerns that arise from execution.

We deploy a single Claude Opus 4.6 configuration at two complementary scopes.

At scale, across 111 open-access Quantum ESPRESSO papers, an autonomous agent runs the read-plan-compute-compare loop and, although never asked to critique, raises substantive methodological concerns on ~42% of papers; 85 of 88 of these critiques (96.6%) surface only after the agent has actually run a calculation, with a reading-only ceiling of 1.8%.

Critique emerges from reproduction, not from reading.

In depth, on one Nature Communications paper on multiscale device simulation of a 2D-material MOSFET, a fresh agent inheriting a verified reproduction pipeline autonomously produces a 14-concern physics inventory and a complete, submission-form six-page Comment that revises the paper's L_G = 5 nm headline.

Two of its challenges to the L_G = 5 nm headline (a source-degeneration contact-resistance bound and an Sb-doping degradation ratio) are absent from the published 21-reviewer peer review.
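The proportions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above (88 critiques, 85 of them raised after a calculation had run); the variable names are illustrative and not taken from the paper.

```python
# Counts taken from the abstract; names are illustrative assumptions.
total_critiques = 88        # substantive critiques raised by the agent
post_run_critiques = 85     # critiques that surfaced only after a calculation ran

# Critiques raised before any calculation was executed.
pre_run_critiques = total_critiques - post_run_critiques

# Share of critiques that depended on execution, in percent.
post_run_share = round(100 * post_run_critiques / total_critiques, 1)
pre_run_share = round(100 * pre_run_critiques / total_critiques, 1)

print(f"post-execution: {post_run_critiques}/{total_critiques} = {post_run_share}%")
print(f"pre-execution:  {pre_run_critiques}/{total_critiques} = {pre_run_share}%")
```

The first figure reproduces the 96.6% stated in the abstract. The remaining three critiques, about 3.4%, are those that did not require a completed calculation.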
