Output-Space Allocation Costs for Calibration-Guided LLM Compression: An Empirical Study
Abstract
Training-free compression methods for large language models (LLMs) often use calibration data to guide compression decisions.
ROCKET, a recent method combining sparse-dictionary factorization with multi-choice knapsack problem (MCKP) allocation, derives its per-layer factorization from an output reconstruction objective but uses weight-space Frobenius error as the MCKP allocation cost.
We investigate whether aligning the allocation cost with the output-space objective improves compressed model fidelity.
On Qwen3-8B at 50\% compression, our variant, ROCKET-ActCost, improves average accuracy across 8 zero-shot benchmarks by 0.8 percentage points (53.1\% vs 52.3\%) but increases WikiText perplexity by 16\% (61.46 vs 52.98).
This accuracy-perplexity tradeoff indicates that the two allocation objectives favor different downstream metrics.
The high correlation ($>$0.99) between weight-space and output-space errors limits how far the two allocations diverge, which explains the modest effect size.
On Llama-3.2-1B at 20\% compression, the two methods produce near-identical results (53.3\% vs 53.5\% accuracy, 14.45 vs 14.66 PPL), suggesting that the effect of the cost function is minor at lower compression ratios.
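For concreteness, the two allocation costs compared above can be sketched as follows. The notation is assumed for illustration and is not taken from ROCKET: $W_\ell$ is the weight matrix of layer $\ell$, $\hat{W}_\ell^{(k)}$ its $k$-th candidate factorization with parameter count $s_{\ell,k}$, $X_\ell$ the calibration activations entering the layer, $B$ the total parameter budget, and $z_{\ell,k}$ a binary selection variable. The squared norms and the $X_\ell W_\ell$ multiplication order are likewise assumptions.
\[
c^{\mathrm{W}}_{\ell,k} = \bigl\| W_\ell - \hat{W}_\ell^{(k)} \bigr\|_F^2,
\qquad
c^{\mathrm{out}}_{\ell,k} = \bigl\| X_\ell W_\ell - X_\ell \hat{W}_\ell^{(k)} \bigr\|_F^2 .
\]
Under either cost $c_{\ell,k}$, a generic MCKP allocation selects exactly one candidate per layer:
\[
\min_{z} \sum_{\ell} \sum_{k} c_{\ell,k}\, z_{\ell,k}
\quad \text{s.t.} \quad
\sum_{k} z_{\ell,k} = 1 \;\; \forall \ell,
\qquad
\sum_{\ell} \sum_{k} s_{\ell,k}\, z_{\ell,k} \le B,
\qquad
z_{\ell,k} \in \{0, 1\}.
\]
Writing $\Delta_{\ell,k} = W_\ell - \hat{W}_\ell^{(k)}$, the output-space cost equals $\operatorname{tr}\bigl(\Delta_{\ell,k}^{\top} X_\ell^{\top} X_\ell\, \Delta_{\ell,k}\bigr)$. When $X_\ell^{\top} X_\ell$ is close to a scaled identity, the two costs differ only by a per-layer scale factor; this is one setting in which they would be highly correlated within a layer.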