A unified approach to outlier identification for mixed-type data

arXiv Stat

CC BY

This outlet displays the full text directly under a public or free license.

Abstract

We present an outlier identification method for mixed-type data sets comprising continuous and ordinal variables.

We define outliers relative to a multivariate Gaussian reference distribution for the non-outliers, with a latent Gaussian assumed for the ordinal variables.
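One common way to write such a definition down is sketched below. The notation (p variables, location \mu, scatter \Sigma, thresholds \tau_{j,k}, level \alpha) is assumed for illustration and is not taken from the paper, whose exact definition may differ.

```latex
% Assumed notation, not the paper's: reference model for the non-outliers.
\[
  Z \sim \mathcal{N}_p(\mu, \Sigma)
\]

% A continuous variable j is observed directly (X_j = Z_j).
% An ordinal variable j with K_j categories is a discretized latent coordinate:
\[
  X_j = k \iff \tau_{j,k-1} < Z_j \le \tau_{j,k},
  \qquad -\infty = \tau_{j,0} < \tau_{j,1} < \dots < \tau_{j,K_j} = \infty
\]

% A fully observed point z would be flagged when its squared Mahalanobis
% distance exceeds a chi-square quantile:
\[
  (z - \mu)^\top \Sigma^{-1} (z - \mu) > \chi^2_{p,\,1-\alpha}
\]
```

For an ordinal variable only the category k is observed, not Z_j itself, so the distance in the last line cannot be evaluated directly from the data.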

The proposed algorithm uses the robust Minimum Covariance Determinant (MCD) estimator to estimate the parameters of the multivariate Gaussian for the non-outliers.
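To make the role of the MCD concrete, here is a minimal sketch of the classical, continuous-only version: random starts followed by concentration steps, then a chi-square cutoff on the robust distances. It is not the paper's algorithm and ignores ordinal variables entirely. The function names, the subset size `h`, the synthetic data, and the 0.975 level are all assumptions made for illustration.

```python
import numpy as np


def squared_mahalanobis(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from (mean, cov)."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def approx_mcd(X, h, n_starts=20, n_iter=50, seed=0):
    """Approximate the MCD: among subsets of size h, look for the one whose
    sample covariance has the smallest determinant (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_det, best_mean, best_cov = np.inf, None, None
    for _ in range(n_starts):
        # Start from a small random subset of p + 1 points.
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_iter):
            mean = X[idx].mean(axis=0)
            # A tiny ridge guards against a singular starting covariance.
            cov = np.cov(X[idx], rowvar=False) + 1e-9 * np.eye(p)
            # Concentration step: keep the h points closest to the current fit.
            new_idx = np.argsort(squared_mahalanobis(X, mean, cov))[:h]
            if np.array_equal(np.sort(new_idx), np.sort(idx)):
                break
            idx = new_idx
        mean = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_mean, best_cov = det, mean, cov
    return best_mean, best_cov


# Synthetic two-dimensional data: 200 clean points plus 10 planted outliers.
rng = np.random.default_rng(42)
clean = rng.standard_normal((200, 2))
planted = np.array([8.0, 8.0]) + 0.1 * rng.standard_normal((10, 2))
X = np.vstack([clean, planted])

mean, cov = approx_mcd(X, h=int(0.75 * len(X)))
d2 = squared_mahalanobis(X, mean, cov)

# For 2 degrees of freedom the chi-square quantile has a closed form:
# q = -2 * log(1 - level). No consistency correction is applied to the raw
# MCD scatter here, so more clean points are flagged than the nominal level.
cutoff = -2.0 * np.log(1.0 - 0.975)
flags = d2 > cutoff
```

In this sketch `flags` marks the rows whose robust distance exceeds the cutoff. Because the fit is driven by the most concentrated subset of points, the planted cluster does not pull the estimated center toward itself the way it would pull a plain sample mean.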

The estimator is extended to account for the fact that the full Gaussian information underlying the ordinal variables is not observed.

A breakdown theorem shows that replacing observations will not stop sufficiently extreme outliers from being identified.

The effectiveness of our approach is demonstrated via simulations on synthetic data with various types of contamination, achieving high detection and low false positive rates.

Practical relevance is illustrated through an application to Airbnb listing data containing both continuous and ordinal attributes.
