Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture

arXiv CS.AI

이 뉴스, 어떠셨어요?

한 번의 탭으로 반응을 남겨요 · 로그인 불필요

CC BY

이 매체는 공공·자유 라이선스로 본문을 직접 표시합니다.

Abstract

AI-Scientist systems risk manufacturing spurious discoveries through uncontrolled multiple testing.

We present a functional architecture that enforces statistical rigor at two levels: a Haskell embedded domain-specific language (the Research monad) that makes it impossible to test a hypothesis without updating the error budget, and a declarative scaffold, backed by an OS-level sandbox, that makes validation data physically absent from the environment in which LLM-generated code runs.

We ground the design in a machine-checked Lean~4 formalization of LORD++ online false-discovery-rate (FDR) control: we derive its error budget and prove both marginal and full FDR control, then close the gap to the implementation by verifying the budget's wealth invariant over IEEE~754 arithmetic in SPARK/Ada.

To our knowledge this is the first verified chain from theorem to floating-point implementation for an online FDR procedure.

In simulation, the architecture holds the false discovery rate near 1\% against a 5\% target, where a naive approach reaches 41\%.

In end-to-end case studies, a valid test avoids the false discoveries a flawed one produces, yet still finds real effects when the data allow.

An adversarial evaluation confirms that generated code cannot read the held-out data even when given its exact path.

전문 보기

Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture

이 뉴스, 어떠셨어요?

Abstract

관련 뉴스

'research' 카테고리 뉴스

Constructive Alignment: Governing Preference Dynamics in Human-AI Interaction

Bounded Morality: Defining the Space of Moral Computation

The MMM Data Model -- A Normative Specification for Knowledge Interoperability in a Decentralisable Knowledge Commons

arXiv의 다른 기사

RareDxR1: Autonomous Medical Reasoning for Rare Disease Diagnosis Beyond Human Annotation

A Contextual-Bandit Oversight Game with Two-Sided Informational Asymmetry

Constructing Epistemic AI Literacy: Detecting Epistemic Aims and Processes in Student-AI Co-Programming