An adaptive subsampling method for large-sample feature screening

arXiv Stat

CC BY

Abstract

We consider the sure independence screening (SIS) method, a standard feature screening approach that aims to eliminate non-informative features in ultrahigh-dimensional datasets.

Although effective, SIS incurs a computational cost of order $O(np)$ for a predictor matrix of size $n\times p$, which can be prohibitively expensive when both $n$ and $p$ are large.
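To make the $O(np)$ cost concrete, here is a minimal sketch of full-sample screening by marginal Pearson correlation, assuming a NumPy predictor matrix `X` and response `y`. The function name `sis_screen`, the target size `d`, and the simulated data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal Pearson correlation."""
    Xc = X - X.mean(axis=0)          # center each column
    yc = y - y.mean()                # center the response
    # Every one of the n*p entries of X is touched here: the O(np) cost.
    num = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    corr = num / denom
    return np.argsort(-np.abs(corr))[:d]

# Hypothetical data: only features 0 and 1 carry signal.
rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)
selected = sis_screen(X, y, d=10)
```

The sketch assumes no column of `X` is constant; a constant column would give a zero denominator.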

Motivated by the multi-armed bandit (MAB) problem, we propose a more computationally efficient feature screening algorithm that reduces the cost to $O(\sqrt{n}p)$.

The core idea is to progressively increase the subsample size and eliminate variables with small empirical marginal Pearson correlations, thereby avoiding unnecessary computation on unpromising features.
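The elimination idea can be illustrated with a rough sketch. The abstract does not give the actual schedule or elimination rule, so the choices below (doubling the subsample each round, dropping the weaker half of the surviving features, the names `subsample_screen` and `m0`) are assumptions made purely for illustration. The sketch is not the authors' algorithm and is not claimed to attain the stated $O(\sqrt{n}p)$ cost.

```python
import numpy as np

def subsample_screen(X, y, d, m0=50, seed=0):
    """Illustrative successive elimination on growing subsamples (assumed rule)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    order = rng.permutation(n)       # fixed random order; subsamples are nested
    active = np.arange(p)            # indices of features still in play
    m = min(m0, n)                   # current subsample size
    while True:
        rows = order[:m]
        Xs = X[np.ix_(rows, active)] # only surviving features are read
        ys = y[rows]
        Xc = Xs - Xs.mean(axis=0)
        yc = ys - ys.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
        corr = (Xc.T @ yc) / denom
        ranked = np.argsort(-np.abs(corr))
        if m >= n or active.size <= d:
            return active[ranked[:d]]
        # Assumed rule: keep the stronger half, but never fewer than d features.
        active = active[ranked[:max(d, active.size // 2)]]
        m = min(2 * m, n)            # assumed schedule: double the subsample

# Hypothetical data: only features 0 and 1 carry signal.
rng = np.random.default_rng(1)
n, p = 2000, 400
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)
kept = subsample_screen(X, y, d=10)
```

Weak features are discarded after seeing only a few rows, so later, larger subsamples are evaluated on far fewer columns. That is the source of the savings relative to computing every correlation on all $n$ rows.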

We develop a new, interpretable theoretical analysis that characterizes how the subsample size affects screening accuracy, thereby revealing the trade-off between computational efficiency and statistical reliability.

Moreover, we show that the proposed method retains the sure screening property under mild regularity conditions.

Extensive numerical experiments on synthetic and real-world datasets show that the proposed method, BanditSIS, achieves screening and prediction performance comparable to SIS while substantially reducing computational time.

Our method offers a scalable and adaptive alternative to SIS, particularly well-suited for large-sample, high-dimensional applications where computational efficiency is critical.
