SGD at the Edge of Stability: Stochastic Stabilization with Large Learning Rates

arXiv Math

이 뉴스, 어떠셨어요?

한 번의 탭으로 반응을 남겨요 · 로그인 불필요

CC BY

이 매체는 공공·자유 라이선스로 본문을 직접 표시합니다.

Abstract

Modern deep learning has been shown to operate at the edge of stability, routinely using learning rates far larger than those justified by classical optimization theory.

Most prior analyses of the edge of stability phenomenon focus on deterministic gradient descent, leaving the stochastic setting largely unexplored.

In this work, we provide sharp convergence guarantees for Stochastic Gradient Descent (SGD) applied to the multiclass cross-entropy loss, for both linear classifiers and two-layer neural networks.

We show that the stochasticity of SGD may cause the dynamics to alternate between an edge-of-stability regime that is dominated by curvature-driven oscillations, and a stable regime in which the expected loss decreases at a controlled rate.

Despite that, we prove that SGD self-stabilizes the dynamics, ensuring that the iterates return to stability in a fixed number of iterations and allowing convergence in the best-iterate sense even with large learning rates.

Experiments validate our theoretical findings and illustrate the benefits of SGD in the large-stepsize regime.

전문 보기

SGD at the Edge of Stability: Stochastic Stabilization with Large Learning Rates

이 뉴스, 어떠셨어요?

Abstract

관련 뉴스

'research' 카테고리 뉴스

What Drives Interactive Improvement from Feedback?

Contrastive Reflection for Iterative Prompt Optimization

How Can AI Find My Model? A Model-Finding Experimental Study Considering Data Formats, Embeddings, and Retrieval Strategies

arXiv의 다른 기사

Beyond expert users: agents should help users construct preferences, not just elicit them

Investigating Multi-Agent Deliberation in Law

Why Solve It Twice? Hierarchical Accumulation of Skills for Transfer-Efficient ML Engineering