Ghost in the Kernel: In-Context Learning with Efficient Transformers via Domain Generalization

arXiv Stat

CC BY

Content from this source is displayed in full under a public or free license.

Abstract

Transformer-based large models have demonstrated remarkable generalization abilities across different tasks by leveraging a context-aware attention module for in-context learning.

With richer context, transformers adapt more effectively to the current use case without any parameter updates.

However, the quadratic computational and memory complexity with respect to context length significantly slows data processing in softmax transformers.

Linear transformers were proposed to address this issue by reducing the complexity to linear dependence on context length, but the design and understanding of the feature mapping in linear attention, from a theoretical viewpoint, remain unclear.

In this paper, we investigate the approximation and generalization abilities of linear transformers under a two-stage sampling process from domain generalization.

We show that linear transformers perform in-context learning as learning a mapping from context distributions to response functions.

Our generalization analysis yields a dimension-independent convergence rate and also exhibits the tradeoff between the regularities of data distributions and latent features.

Guided by our theoretical framework, we propose a new perspective on activation and loss design for linearizing pretrained softmax large language models.
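
As an illustration of the complexity claim in the abstract, the following is a minimal sketch of kernelized (linear) attention in plain Python. It is not the paper's construction: the feature map `phi` (ELU plus one, a commonly used positive choice), the non-causal setting, and all function names are assumptions made for this example. The sketch computes the same output two ways, once by forming all pairwise scores (quadratic in context length) and once by summarizing keys and values a single time (linear in context length).

```python
import math


def phi(vec):
    # Hypothetical feature map: elu(x) + 1, which is strictly positive.
    # The abstract does not specify the paper's feature mapping.
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def attention_pairwise(Q, K, V):
    """Reference version: builds every score phi(q_i) . phi(k_j), O(n^2)."""
    feat_K = [phi(k) for k in K]
    d_v = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        scores = [sum(a * b for a, b in zip(fq, fk)) for fk in feat_K]
        total = sum(scores)
        out.append(
            [sum(s * v[j] for s, v in zip(scores, V)) / total for j in range(d_v)]
        )
    return out


def attention_linear(Q, K, V):
    """Same output, O(n): accumulate S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j)."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d_k))
        out.append(
            [sum(fq[a] * S[a][b] for a in range(d_k)) / denom for b in range(d_v)]
        )
    return out


# Toy context of three tokens with 2-dimensional queries, keys, and values.
Q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
K = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.6]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

slow = attention_pairwise(Q, K, V)
fast = attention_linear(Q, K, V)
max_gap = max(abs(a - b) for r1, r2 in zip(slow, fast) for a, b in zip(r1, r2))
```

The two functions agree up to floating-point rounding because the sum over keys can be moved inside the inner product with `phi(q)`. The per-query cost in `attention_linear` depends only on the feature and value dimensions, not on the number of context tokens.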
