Improving Efficiency of Regression Analyses by Integrating Data from Population-Representative Surveys: A Model-Assisted Calibration Approach
Abstract
The increasing availability of diverse data sources has motivated great interest in data integration for improving regression efficiency.
Existing data integration methods primarily focus on integrating nonprobability samples and typically assume that the integrated data sources represent the same target population.
While this assumption is often difficult to justify for nonprobability samples, it is naturally satisfied when integrating probability-based surveys designed to represent a common target population.
Such surveys are important research data sources because they provide representative samples and collect rich information on diverse variables, making them well suited to data integration.
However, existing integration methods do not accommodate complex sampling designs.
We propose model-assisted calibration methods to improve regression efficiency by integrating multiple probability-based survey samples.
The proposed framework accommodates settings in which either individual-level data or only summary statistics are available from external surveys, and it preserves valid finite-population inference without requiring correct specification of the outcome model.
We establish the design consistency of the proposed estimators and develop Taylor linearization variance estimators accounting for the complex sampling designs of both surveys.
Simulation studies and an application integrating the National Health and Nutrition Examination Survey and the National Health Interview Survey demonstrate substantial efficiency gains while maintaining valid finite-population inference.
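
To make the idea of calibration concrete, the following is a minimal sketch of generic linear (GREG-type) calibration, not the estimator proposed in this paper. It adjusts the design weights of one sample so that the weighted totals of auxiliary variables match benchmark totals, which in this setting might be estimated from an external survey. The function name, the simulated data, and the benchmark totals are all hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_calibration_weights(d, X, totals):
    """Generic linear calibration (illustrative, not the paper's method).

    d      : design weights, shape (n,)
    X      : auxiliary variables, shape (n, p)
    totals : benchmark totals for the p auxiliary variables, shape (p,)

    Returns weights w_i = d_i * (1 + x_i' lam), with lam chosen so that
    sum_i w_i x_i equals the benchmark totals.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)

    weighted_X = X * d[:, None]            # rows d_i * x_i
    M = X.T @ weighted_X                   # sum_i d_i x_i x_i'
    gap = totals - weighted_X.sum(axis=0)  # benchmark minus design-weighted total
    lam = np.linalg.solve(M, gap)
    return d * (1.0 + X @ lam)

# Hypothetical data: 200 sampled units, an intercept and one covariate.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
d = np.full(n, 50.0)                       # assumed equal design weights
totals = np.array([10000.0, 300.0])        # assumed benchmark totals

w = linear_calibration_weights(d, X, totals)
calibrated_totals = X.T @ w                # matches `totals` up to rounding
```

In a model-assisted version, the auxiliary columns would typically include predictions from a working outcome model, and variance estimation would need to account for the sampling designs of both surveys; neither step is shown here.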