Small edits, large models: How Wikipedia advocacy shapes LLM values

arXiv CS.AI

CC BY

Abstract

Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia?

We show that they can.

Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text.

The Pro-Animal Wikipedians (PAW), a group of advocates who add sourced animal welfare content to relevant articles, have made 125 edits across 115 pages.

Using gradient-based data attribution (Bergson; MAGIC), we traced how these edits influence language model behavior.

TrackStar retrieval attribution on Llama 3.1 8B found that PAW-edited sections made up 68 percent of the highest-attributed documents for animal welfare queries (p < 0.0001) but only 52 percent for unrelated queries about the same companies (p = 0.53, not distinguishable from chance). The model therefore links PAW content specifically to animal welfare topics, not to the entities in general.
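The comparison against chance in this result can be illustrated with an exact two-sided binomial test. This is a sketch, not the authors' procedure: the abstract does not say which test was used or how many top-attributed documents were counted, so the sample size of 100, the counts 68 and 52, and the 50 percent chance level are all assumptions chosen only to show the calculation. The p-values it prints will not match the reported ones.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null success rate p0.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical counts (assumed, not from the paper): 100 top documents per query type.
n_docs = 100
print("animal welfare queries:", binom_two_sided_p(68, n_docs))
print("general queries:       ", binom_two_sided_p(52, n_docs))
```

With a larger or smaller document pool the same percentages give different p-values, which is why the sample size has to be known before the reported figures can be checked.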

MAGIC counterfactual influence estimation on Llama 3.2 1B, run across five random training-order seeds, showed the same pattern more sharply: in all five seeds, the 10 most influential documents for animal welfare queries were all PAW edits (10 of 10), while for general queries the number of PAW edits in the top 10 sat at chance (4 to 6 of 10).
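The top-10 statistic is a simple count over ranked influence scores. The sketch below shows that count under assumed inputs: the function name `paw_count_in_top_k`, the score list, and the PAW/control labels are hypothetical and stand in for whatever per-document influence values the attribution method produces.

```python
def paw_count_in_top_k(scores, is_paw, k=10):
    """Count how many of the k highest-scoring documents are PAW edits."""
    # Rank document indices from most to least influential.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if is_paw[i])


# Hypothetical pool: 20 PAW documents and 20 control documents.
is_paw = [True] * 20 + [False] * 20

# Made-up scores in which every PAW document outranks every control document.
separated_scores = [1.0 - 0.01 * i for i in range(40)]
print(paw_count_in_top_k(separated_scores, is_paw))
```

In a balanced pool like this one, a count near 5 of 10 is what random ranking would produce, which matches the 4 to 6 of 10 reported for general queries.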

Mean PAW influence exceeded mean control influence on animal welfare queries with p < 0.0001 in every seed, an effect 6 to 30 times larger than on general queries.

Leave-subset-out validation gave Spearman rho = 1.00 for all 10 runs.
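Spearman's rho measures whether two lists put items in the same order. The sketch below assumes the validation compares, for each held-out subset, the effect predicted by the attribution scores against the effect measured after retraining without that subset; the abstract does not spell this out, and the numbers are invented for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical predicted vs. measured loss changes for four held-out subsets.
predicted = [0.02, 0.05, 0.11, 0.30]
measured = [0.01, 0.04, 0.20, 0.90]
print(spearman_rho(predicted, measured))
```

A rho of 1.00 says only that the predicted and measured effects are ordered identically; it does not say the predicted magnitudes are accurate.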

When we fine-tuned separate models on PAW content versus control content, each model performed better specifically on the type of text it was trained on: the PAW-trained model cut perplexity on animal welfare text from 12.4 to 8.4, while the control-trained model cut perplexity on control text from 16.1 to 11.4.
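Perplexity is the exponential of the average negative log-probability per token, so lower values mean the model finds the text less surprising. The sketch below uses the standard definition and computes the relative reductions implied by the reported figures; the helper names and the toy log-probabilities are illustrative and not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log token probabilities: exp(-mean log p)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


# Toy example: a model that assigns probability 0.5 to each of four tokens.
print(perplexity([math.log(0.5)] * 4))

# Reported perplexities from the abstract.
print(relative_reduction(12.4, 8.4))   # PAW-trained model on animal welfare text
print(relative_reduction(16.1, 11.4))  # control-trained model on control text
```

On the reported numbers, the two reductions come out at roughly 32 percent and 29 percent, so each model improved by a similar proportion on its own training domain.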

A small, coordinated Wikipedia editing campaign therefore measurably shapes how language models handle the topics those edits address.
