COMPASS: Grounding Composition-Intent Guidance in Unified Multimodal Models

arXiv CS.AI

CC BY

This outlet displays the full text directly under a public, free license.

Abstract

Composition is a high-level visual intent that governs where subjects are placed and how a scene is organized, yet current unified multimodal models remain unreliable at fine-grained composition recognition and struggle to turn such intent into controllable generation.

We present COMPASS, the first unified multimodal framework that grounds composition-intent control in a single system spanning both composition perception and composition-guided generation, with a shared expert token $\tau_c$ as the central intent anchor.

On the perception side, COMPASS injects composition expertise into an MoE backbone in a minimally invasive manner and distills the inferred intent into $\tau_c$.

On the generation side, COMPASS reuses $\tau_c$ as a global conditioning signal that steers the denoising trajectory, effectively converting passive composition analysis into explicit layout control.

To support systematic instruction-following composition learning and evaluation at scale, we construct Comp-11, a large-scale dataset with an 11-class taxonomy and reasoning-augmented annotations.

Extensive experiments show that COMPASS substantially improves category-level composition understanding and delivers more composition-consistent, prompt-faithful generation than strong baselines.
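
The abstract says that $\tau_c$ acts as a global conditioning signal for the denoiser but does not say how it is injected. As a rough illustration only, the sketch below assumes a FiLM-style scale-and-shift, a common generic pattern for conditioning on a single vector; it is not taken from the paper. The names `linear`, `film_modulate`, `w_gamma`, `b_gamma`, `w_beta`, and `b_beta` are hypothetical, and the numbers are made up for the toy example.

```python
# Toy sketch (assumption, not the COMPASS method): one global token vector
# modulates every spatial position of a denoiser's feature map.

def linear(weights, bias, x):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film_modulate(features, tau_c, w_gamma, b_gamma, w_beta, b_beta):
    """Apply a per-channel scale and shift derived from the token tau_c.

    features: list of positions, each a list of channel values.
    tau_c:    the global conditioning vector (hypothetical stand-in).
    """
    gamma = linear(w_gamma, b_gamma, tau_c)  # per-channel scale offsets
    beta = linear(w_beta, b_beta, tau_c)     # per-channel shifts
    # The same gamma and beta are reused at every position, which is what
    # makes the signal global rather than spatially local.
    return [[(1.0 + g) * f + b for f, g, b in zip(pos, gamma, beta)]
            for pos in features]


# Two spatial positions with two channels each, and a 3-dimensional token.
features = [[1.0, 2.0], [3.0, 4.0]]
tau_c = [1.0, 0.0, -1.0]
w_gamma = [[0.5, 0.0, 0.0], [0.0, 0.0, 0.5]]
b_gamma = [0.0, 0.0]
w_beta = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
b_beta = [0.1, -0.1]

modulated = film_modulate(features, tau_c, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

In this pattern, zero projection weights and biases leave the features unchanged, so the conditioning path can start as an identity and be learned gradually. Whether COMPASS uses this, cross-attention, or another mechanism is not stated in the abstract.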
