Composition as Direction: An Active-Set Ray-Based Model for Sparse High-Dimensional Compositional Data
Abstract
[Working Draft] Compositional data are central to microbial, ecological, and environmental research, yet they often exhibit four features that are difficult to accommodate jointly: exact zeros, latent dependence among components, high dimensionality, and a unit-sum constraint that induces a non-Euclidean geometry.
Conventional Dirichlet-type and logistic-normal models address these features only partially.
Projected Gaussian models offer a directional representation that captures exact zeros and latent dependence; however, ensuring the correct support on the simplex requires either truncation or folding, both of which become computationally prohibitive as the dimension grows.
We develop an Active-set Ray-based Compositional (ARC) framework, which retains the benefits of projected Gaussian models while remaining computationally feasible in high-dimensional settings.
In this framework, we map compositions to the nonnegative orthant of the unit hypersphere and specify an active-set process that governs which components are present.
Conditional on the active set, the positive subcomposition is modeled by evaluating a latent Gaussian density along positive rays of the active subspace with the radius treated as an auxiliary variable.
This construction (i) separates the active-set process from the model for the positive subcomposition on the active components, (ii) preserves a latent Gaussian interpretation, and (iii) accommodates arbitrary latent dependence.
The framework is therefore well suited to high-dimensional applications in which exact zeros and shared positive responses are scientifically central.
Conceptually, the proposed framework reframes a composition as an observed direction of a latent abundance vector with an unobserved magnitude and an explicitly modeled active set.
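
As a rough illustration of the construction described above, the following sketch maps a composition to a direction on the nonnegative orthant of the unit hypersphere, reads off its active set, and evaluates a Gaussian density along the positive ray through the active components with the radius integrated out numerically. It rests on assumptions that the abstract does not state: the map is taken to be Euclidean normalization, the radius is handled by a simple trapezoid rule with the usual polar Jacobian, and the result is left unnormalized over the nonnegative orthant. The names `to_direction` and `ray_density`, the grid settings, and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def to_direction(x):
    """Map a composition (nonnegative, sums to 1) to a unit-norm direction
    and the index set of its nonzero components (assumed L2 normalization)."""
    x = np.asarray(x, dtype=float)
    active = np.flatnonzero(x > 0)   # components that are present
    u = x / np.linalg.norm(x)        # point on the nonnegative orthant of the unit sphere
    return u, active

def ray_density(u_active, mean, cov, r_max=50.0, n_grid=20001):
    """Integrate a Gaussian density along the ray {r * u : r > 0}.
    The radius r is an auxiliary variable removed by a trapezoid rule;
    the value is NOT renormalized over the nonnegative orthant."""
    k = u_active.size
    r = np.linspace(1e-8, r_max, n_grid)
    diff = r[:, None] * u_active[None, :] - mean[None, :]
    prec = np.linalg.inv(cov)
    quad = np.einsum("ij,jk,ik->i", diff, prec, diff)
    _, logdet = np.linalg.slogdet(cov)
    # Gaussian log-density at r * u plus the polar Jacobian term (k - 1) * log r.
    log_f = -0.5 * (quad + logdet + k * np.log(2 * np.pi)) + (k - 1) * np.log(r)
    f = np.exp(log_f)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r)))

# Hypothetical composition with two exact zeros.
x = np.array([0.5, 0.0, 0.3, 0.2, 0.0])
u, active = to_direction(x)
u_active = u[active]                 # still unit norm: the dropped entries are zero
density = ray_density(u_active, mean=np.ones(active.size), cov=np.eye(active.size))
```

Dividing `u` by its sum recovers `x`, so under this assumed map the direction and the composition carry the same information, while the magnitude of the latent vector stays unobserved.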