Bilevel Data Curation for LLM Fine-tuning: Offline Selection and Online Self-Refining Generation

arXiv Math

CC BY

This outlet displays the full text directly under a public or free license.

Abstract

Supervised fine-tuning (SFT) datasets are critical to the downstream performance of large language models, yet they often contain low-quality or harmful question-response pairs.

To improve SFT data quality, we develop a unified bilevel framework that combines offline data selection with online self-refining generation.

In the offline setting, bilevel data selection (BDS) selects question-response pairs from the offline SFT dataset to maximize the validation performance.
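
The bilevel idea in this sentence can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a scalar linear model `y = theta * x` in place of an LLM, sigmoid-parameterized per-example weights, and a lower-level weighted least-squares problem with a closed-form solution, so the upper-level gradient can be written by hand. The names `select_data`, `lower_level_solution`, and `validation_loss` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lower_level_solution(weights, train):
    # Lower level: weighted least squares for y = theta * x (closed form).
    num = sum(w * x * y for w, (x, y) in zip(weights, train))
    den = sum(w * x * x for w, (x, y) in zip(weights, train))
    return num / den, den

def validation_loss(theta, val):
    # Upper-level objective: mean squared error on held-out validation pairs.
    return sum((theta * x - y) ** 2 for x, y in val) / len(val)

def select_data(train, val, steps=200, lr=0.5):
    # One logit per training pair; weight_i = sigmoid(logit_i) (an assumption).
    logits = [0.0] * len(train)
    for _ in range(steps):
        weights = [sigmoid(s) for s in logits]
        theta, den = lower_level_solution(weights, train)
        # d(val loss)/d(theta)
        dval_dtheta = sum(2 * (theta * x - y) * x for x, y in val) / len(val)
        for i, (x, y) in enumerate(train):
            # d(theta*)/d(weight_i), from differentiating the closed form.
            dtheta_dw = x * (y - theta * x) / den
            dw_ds = weights[i] * (1.0 - weights[i])
            logits[i] -= lr * dval_dtheta * dtheta_dw * dw_ds
    weights = [sigmoid(s) for s in logits]
    theta, _ = lower_level_solution(weights, train)
    return weights, theta

# Toy data: three pairs follow y = 2x, two "harmful" pairs follow y = -2x.
train = [(1, 2), (2, 4), (3, 6), (1, -2), (2, -4)]
val = [(1, 2), (2, 4)]
weights, theta = select_data(train, val)
```

In this toy setting the weights of pairs that agree with the validation data grow while the weights of the conflicting pairs shrink, which is the behavior the selection step is meant to capture.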

We theoretically show that the optimal model given by BDS outperforms the direct data-mixing approach in useful data coverage.

Moreover, we provide a global convergence analysis of the gradient-based BDS approach for a one-layer Transformer, showing that an ε-global optimum of offline BDS is achievable in finite time.

Although efficient, offline BDS discards potentially harmful questions together with their responses, thereby reducing question diversity.

We address this limitation by refining the responses to selected questions using an online self-refining generation framework.

However, BDS is inefficient at updating the response weights when responses are regenerated online.

To address this issue, we introduce bilevel multi-objective optimization (BMO) for response-level weighting.

We show that BMO recovers the same validation-aligned solution as BDS, but admits a closed-form importance-ratio weight that adapts to regenerated responses.
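
The abstract does not state the closed-form weight, so the following is only a generic sketch of what an importance-ratio weight can look like, under stated assumptions. It assumes each response has a log-likelihood under a validation-aligned model (`logp_target`) and under the model that generated it (`logp_reference`), and that weights are normalized over the batch. The function name and both inputs are hypothetical and are not taken from the paper.

```python
import math

def importance_ratio_weights(logp_target, logp_reference):
    # Log of the ratio p_target(response) / p_reference(response) per response.
    log_ratios = [t - r for t, r in zip(logp_target, logp_reference)]
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_ratios)
    unnormalized = [math.exp(v - m) for v in log_ratios]
    total = sum(unnormalized)
    # Normalize so the weights sum to one over the batch (an assumption).
    return [u / total for u in unnormalized]

# Hypothetical log-likelihoods for three regenerated responses.
ratio_weights = importance_ratio_weights([-1.0, -2.0, -3.0], [-2.0, -2.0, -2.0])
```

Because such a weight depends only on likelihoods of the current responses, it can be recomputed directly whenever a response is regenerated, without re-solving a selection problem.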

Experiments on LLM quality enhancement and safety-aware fine-tuning demonstrate that the proposed framework consistently improves both data quality and downstream fine-tuning performance.
