AIChilles: Automatically Uncovering Hidden Weaknesses in AI-Evolved Systems

arXiv CS.AI
CC BY
Computer Science > Artificial Intelligence [Submitted on 14 Jun 2026]

Abstract: The computer systems community has recently seen growing interest in AI-driven system evolution, where AI agents iteratively rewrite systems. Frameworks such as AdaEvolve and Engram report 12-60% score improvements over human-designed algorithms. While these results are promising, there are practical concerns that these AI-evolved programs may perform worse on unseen workloads and exhibit scalability regressions. Given the speed and scale of AI-generated code, we need automated mechanisms to uncover hidden weaknesses in AI-evolved systems programs. To this end, we develop AIChilles, which takes as input a baseline program $P$ and an AI-evolved program $P'$ and searches for valid workloads where $P'$ regresses relative to $P$ in correctness, runtime, memory usage, or output quality. To tackle the diversity in system applications, weakness types, and potential bugs, AIChilles combines deterministic workload-parameter extraction, agent-based constraint inference, differential oracles, and code-frequency coverage to discover diverse failures. Across five system applications and 30 AI-evolved programs, AIChilles finds 49 distinct hidden weaknesses. We also show that explicitly including AIChilles in the AI-driven development lifecycle can mitigate several of these weaknesses.
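The differential-oracle idea in the abstract (compare $P$ and $P'$ on the same valid workload and flag cases where $P'$ is worse) can be illustrated with a minimal Python sketch. This is not AIChilles's implementation. The bin-packing programs, the names `baseline`, `evolved`, `is_valid`, and `find_regressions`, and the exhaustive enumeration of small workloads are all assumptions chosen for illustration; only correctness and output quality are checked, and runtime and memory are left out to keep the example deterministic.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple

Packing = List[List[int]]
Packer = Callable[[Sequence[int], int], Packing]


def first_fit(items: Sequence[int], capacity: int) -> Packing:
    """Place each item into the first bin with enough room, opening a new bin if none fits."""
    bins: Packing = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins


def baseline(items: Sequence[int], capacity: int) -> Packing:
    """Hypothetical baseline P: first-fit decreasing."""
    return first_fit(sorted(items, reverse=True), capacity)


def evolved(items: Sequence[int], capacity: int) -> Packing:
    """Hypothetical evolved P': skips the sort, which is cheaper but order-sensitive."""
    return first_fit(list(items), capacity)


@dataclass
class Regression:
    workload: Tuple[int, ...]
    kind: str    # "correctness" or "quality"
    detail: str


def is_valid(packing: Packing, items: Sequence[int], capacity: int) -> bool:
    """A packing is valid if it contains exactly the input items and no bin overflows."""
    packed = sorted(x for b in packing for x in b)
    return packed == sorted(items) and all(sum(b) <= capacity for b in packing)


def find_regressions(p: Packer, p_prime: Packer,
                     workloads: Iterable[Sequence[int]],
                     capacity: int) -> List[Regression]:
    """Differential oracle: report workloads where p_prime is worse than p."""
    found: List[Regression] = []
    for items in workloads:
        base_out = p(items, capacity)
        new_out = p_prime(items, capacity)
        base_ok = is_valid(base_out, items, capacity)
        new_ok = is_valid(new_out, items, capacity)
        if base_ok and not new_ok:
            found.append(Regression(tuple(items), "correctness", "invalid packing"))
        elif base_ok and new_ok and len(new_out) > len(base_out):
            detail = f"{len(base_out)} -> {len(new_out)} bins"
            found.append(Regression(tuple(items), "quality", detail))
    return found


if __name__ == "__main__":
    # Assumed workload space: every length-4 sequence of item sizes 4 or 6.
    workloads = product([4, 6], repeat=4)
    for reg in find_regressions(baseline, evolved, workloads, capacity=10):
        print(reg)
```

In this toy setting, the workload `(4, 4, 6, 6)` with capacity 10 exposes a quality regression: the baseline packs it into 2 bins, while the unsorted variant needs 3. The same items in the order `(6, 6, 4, 4)` show no difference, which is why searching over workloads matters more than testing a single input.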