Tree-Embedded Bayesian Factor Models for Multidimensional Categorical Distributions
이 뉴스, 어떠셨어요?
한 번의 탭으로 반응을 남겨요 · 로그인 불필요
Abstract
Analyzing data collected from multiple observational units to estimate common and heterogeneous structures through a hierarchical model is a central task in Bayesian inference, and to this end, Bayesian factor models are one of the most widely used tools for this purpose.
In this paper, we propose a novel Bayesian latent factor model for categorical distributions from grouped data, providing a parsimonious model for describing many observed distributions through lower-dimensional structures.
Grouped data arise in a wide range of applications in social science, for example, distributions of age composition and income observed across locations.
In these contexts, standard mixture models can be inefficient because the distributions do not necessarily exhibit clear clustering structures, and the distributions can be more accurately approximated as a combination of lower-dimensional characteristics.
To analyze distribution-valued data with the Bayesian factor analysis, we adopt a tree-based transformation that embeds distributions into a Euclidean space and construct a Bayesian latent factor model in the transformed space.
We develop the hierarchical model by incorporating the infinite factor model, which can adaptively estimate the number of effective factors.
In addition, we propose its generalization by incorporating a spatial dependence by introducing a prior based on a SAR model.
The proposed model provides smooth estimates of multivariate distributional structures, because once a tree-based transformation is applied, both univariate and multivariate distributions are essentially treated as the same Euclidean vectors.
Through numerical experiments using real population data, we demonstrate that the proposed model outperforms existing parametric and Bayesian nonparametric models in various scenarios involving smooth spatial variations, especially under small sample sizes.