Regression Test Selection for Updated Capability Modules in Compositional ML Systems via Atomic-Quality Probes

arXiv CS.AI


CC BY

This source's text is displayed directly under a public or free license.

Abstract

Compositional machine-learning (ML) systems assemble runtime behavior from libraries of independently re-trained capability modules.

Replacing one module raises a regression-testing question that static dependence analysis cannot answer: which existing compositions stay valid, and at what test cost?

We frame capability updates as regression test selection (RTS) and contribute four results.

First, a paired cross-version swap protocol isolates the marginal effect of a single module update.
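One way to read the paired swap idea is as a same-seed comparison in which only one module's version changes. The sketch below is illustrative only: the abstract does not give the protocol's details, and `paired_swap_effect`, the `evaluate` callback, and the dictionary-based composition format are all hypothetical names introduced here.

```python
from typing import Callable, Dict, Sequence


def paired_swap_effect(
    composition: Dict[str, str],
    module: str,
    old_version: str,
    new_version: str,
    evaluate: Callable[[Dict[str, str], int], bool],
    seeds: Sequence[int],
) -> float:
    """Mean paired difference in success (new minus old) over shared seeds.

    Assumption: `evaluate(config, seed)` runs one rollout of the composition
    and returns True on success. Everything except `module` is held fixed,
    so the difference is attributable to the single module update.
    """
    diffs = []
    for seed in seeds:
        old_cfg = {**composition, module: old_version}
        new_cfg = {**composition, module: new_version}
        # Same seed for both arms, so each pair shares its initial conditions.
        diffs.append(int(evaluate(new_cfg, seed)) - int(evaluate(old_cfg, seed)))
    return sum(diffs) / len(diffs)
```

Pairing on the seed is a design choice made for this sketch: it removes between-episode variance from the comparison, which matters when the number of rollouts is small.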

Second, on two contact-rich manipulation tasks we characterize a dominant-skill effect: one capability module reaches 88.0% atomic success while its siblings stay at or below 32.0%, and including it shifts composition success by up to 52 percentage points. A controlled weight-space interpolation shows composition success tracking atomic quality point by point (pooled Pearson r=0.94). The effect replicates on a second task, where the governing module must lie on the critical path of the phase sequence.
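For readers who want to see what a pooled Pearson correlation involves, the sketch below concatenates (atomic quality, composition success) points from several interpolation sweeps and computes a single r. The function names and the data layout are assumptions made for illustration; the abstract does not say how the pooling was performed.

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def pooled_pearson_r(sweeps: List[List[Tuple[float, float]]]) -> float:
    """Pool all (atomic_quality, composition_success) points, then correlate.

    Assumption: each sweep is a list of points along one interpolation path.
    """
    xs = [x for sweep in sweeps for x, _ in sweep]
    ys = [y for sweep in sweeps for _, y in sweep]
    return pearson_r(xs, ys)
```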

Third, off-policy behavioral-distance metrics fail to identify the dominant module.

Fourth, a margin-gated Hybrid Selector matches full revalidation at zero per-decision test cost (75.0% gold-label agreement, with no detectable difference) and reaches 81.25% match at half of full-revalidation cost, beating a cost-matched random budget (Monte-Carlo p=0.039).
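A margin gate of this kind can be sketched as follows. This is one plausible reading, not the paper's selector: it assumes the decision rests on the change in atomic success between module versions, and that full revalidation runs only when that change falls inside the margin. `hybrid_select`, `margin`, and the `revalidate` callback are hypothetical names.

```python
from typing import Callable, Tuple


def hybrid_select(
    atomic_old: float,
    atomic_new: float,
    margin: float,
    revalidate: Callable[[], bool],
) -> Tuple[bool, bool]:
    """Return (accept_update, used_revalidation).

    Assumption: atomic success rates lie in [0, 1]. When the probe's change
    is at least `margin` in magnitude, the probe alone decides at no test
    cost; otherwise the decision is deferred to the expensive check.
    """
    delta = atomic_new - atomic_old
    if abs(delta) >= margin:
        # Confident region: accept a clear improvement, reject a clear drop.
        return (delta > 0, False)
    # Ambiguous region: pay for full revalidation of the composition.
    return (revalidate(), True)
```

Under this reading, the margin controls the trade-off between test cost and agreement with full revalidation: a margin of zero never revalidates, while a very large margin always does.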

A resolution analysis shows that coarse evaluation overstates the apparent advantage of full revalidation.

The atomic-quality probe gives a principled test-selection criterion for capability-update regression testing in compositional ML systems.

