Regression Test Selection for Updated Capability Modules in Compositional ML Systems via Atomic-Quality Probes
Abstract
Compositional machine-learning (ML) systems assemble runtime behavior from libraries of independently re-trained capability modules.
Replacing one module raises a regression-testing question that static dependence analysis cannot answer: which existing compositions stay valid, and at what test cost?
We frame capability updates as regression test selection (RTS) and contribute four results.
First, a paired cross-version swap protocol isolates the marginal effect of a single module update.
Second, on two contact-rich manipulation tasks we characterize a dominant-skill effect: one capability module reaches 88.0% atomic success while its siblings stay at or below 32.0%, and including it shifts composition success by up to 52 percentage points. A controlled weight-space interpolation shows composition success tracking atomic quality point by point (pooled Pearson r = 0.94). The effect replicates on a second task, where the governing module must lie on the critical path of the phase sequence.
Third, off-policy behavioral-distance metrics fail to identify the dominant module.
Fourth, a margin-gated Hybrid Selector matches full revalidation at zero per-decision test cost (75.0% gold-label agreement, with no detectable difference from full revalidation) and reaches 81.25% match at half the cost of full revalidation, beating a cost-matched random budget (Monte-Carlo p=0.039).
A resolution analysis shows that coarse evaluation overstates the apparent advantage of full revalidation.
The atomic-quality probe gives a principled test-selection criterion for capability-update regression testing in compositional ML systems.
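
As a rough illustration of what a margin-gated selection rule driven by an atomic-quality probe could look like, consider the minimal sketch below. It is not the Hybrid Selector evaluated in this paper: the names `ProbeResult`, `select_action`, and `revalidation_fraction`, the three-way decision, and the default margin of 0.10 are all assumptions made for illustration. The idea is only that a large change in a module's atomic success rate is trusted directly, while a small change triggers a full composition rerun.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProbeResult:
    """Atomic success rates (in [0, 1]) of one module before and after its update."""
    old_atomic: float
    new_atomic: float


def select_action(probe: ProbeResult, margin: float = 0.10) -> str:
    """Return 'accept', 'reject', or 'revalidate' for one composition.

    Hypothetical rule (not taken from the paper): trust the probe when the
    atomic-quality change is at least `margin` in either direction;
    otherwise pay for a full revalidation of the composition.
    """
    delta = probe.new_atomic - probe.old_atomic
    if delta >= margin:
        return "accept"
    if delta <= -margin:
        return "reject"
    return "revalidate"


def revalidation_fraction(probes: Sequence[ProbeResult], margin: float = 0.10) -> float:
    """Fraction of decisions that fall back to full revalidation (a cost proxy)."""
    if not probes:
        return 0.0
    actions = [select_action(p, margin) for p in probes]
    return actions.count("revalidate") / len(actions)
```

Under these assumptions, widening `margin` sends more decisions to full revalidation and raises test cost, while a margin of zero decides every case from the probe alone.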