A time-series classification framework for individual-level absenteeism prediction under severe class imbalance
Abstract
Staff absenteeism imposes substantial operational costs in high-demand work environments such as healthcare, emergency services, meat processing, construction, and courier and delivery services, where proactive workforce planning depends on reliable individual-level absence prediction.
Existing regression and classification approaches share a structural limitation: they map features observed at time t to labels at the same time t, which reproduces already-realised outcomes rather than predicting future events, and they discard the sequential behavioural structure inherent in individual attendance histories.
We propose a Time Series Classification (TSC) framework that separates historical attendance sequences from future absence labels, enabling genuinely proactive prediction.
Due to the lack of public longitudinal attendance data, we construct a reproducible simulated dataset calibrated to the UCI dataset.
We analyse Binary Focal Loss (BFL) and Geometric Mean (G-Mean) loss under severe class imbalance using only the imbalance ratio $\rho$.
For BFL, the initial gradient ratio is $\rho\alpha/(1-\alpha)$; setting this ratio to 1 yields the balanced weight $\alpha = 1/(1+\rho) \approx 0.023$.
Experiments show that performance is governed mainly by $\alpha$, with BFL achieving specificity 0.813 and balanced accuracy 0.888, comparable to G-Mean.
Unlike BFL, G-Mean adapts automatically without parameter calibration.
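The balanced weight quoted above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the value $\rho = 42.5$ is an assumption back-derived from the rounded figure $\alpha \approx 0.023$, since the abstract does not state $\rho$ explicitly.

```python
def initial_gradient_ratio(rho: float, alpha: float) -> float:
    """Initial BFL gradient ratio rho * alpha / (1 - alpha), as given in the abstract."""
    return rho * alpha / (1.0 - alpha)


def balanced_alpha(rho: float) -> float:
    """Weight alpha that makes the initial gradient ratio equal to 1."""
    return 1.0 / (1.0 + rho)


# ASSUMPTION: rho = 42.5 is not reported in the abstract; it is chosen so that
# 1 / (1 + rho) rounds to the quoted alpha of 0.023.
rho = 42.5
alpha = balanced_alpha(rho)
ratio = initial_gradient_ratio(rho, alpha)

print(f"alpha = {alpha:.3f}, initial gradient ratio = {ratio:.3f}")
```

Any other $\alpha$ pushes the ratio away from 1, which is in line with the observation that BFL performance is governed mainly by $\alpha$.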
Among the three deep learning architectures evaluated (Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and the hybrid LSTM-Fully Convolutional Network (LSTM-FCN)), the LSTM-FCN delivers strong precision and specificity.
Stable performance is obtained with batch sizes of at least 64 and window sizes between 40 and 80 days, yielding balanced accuracy of approximately 80% on held-out test data.
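The separation of historical attendance sequences from future absence labels can be illustrated with a sliding-window construction. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function `make_windows`, the binary daily encoding (1 = absent), the one-day prediction horizon, and the labelling rule (any absence within the horizon) are all hypothetical choices. The window length of 60 days is simply picked from the 40 to 80 day range reported above.

```python
import numpy as np


def make_windows(series, window=60, horizon=1):
    """Turn one employee's daily absence series (1 = absent) into
    (history window, future label) pairs with no temporal overlap."""
    series = np.asarray(series)
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        # Features: the `window` days strictly before the prediction period.
        X.append(series[start:end])
        # ASSUMPTION: label is 1 if any absence occurs in the next `horizon` days.
        y.append(int(series[end:end + horizon].any()))
    return np.array(X), np.array(y)


# Toy series: 100 days with absences on days 70 and 90 (synthetic, for illustration).
series = np.zeros(100, dtype=int)
series[[70, 90]] = 1

X, y = make_windows(series, window=60, horizon=1)
print(X.shape, y.shape, int(y.sum()))
```

Because each label is taken from days that lie after its window, the classifier never sees the outcome it is asked to predict, in contrast to the same-time-t mapping criticised above.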