Dialogue to Detection: A Multimodal Hybrid NLP Pipeline for Insurance Fraud Detection
Abstract
Insurance fraud imposes substantial financial losses and operational inefficiencies, raising premiums and impacting trust among legitimate policyholders.
Early detection at first notice of loss (FNOL) remains a persistent challenge.
Existing approaches rely largely on private, text-only datasets, limiting progress on multimodal methods that integrate linguistic, behavioural, and speaker-based indicators.
We introduce a synthetic multimodal framework that replicates FNOL conditions.
It generates agent-customer dialogue transcripts and two-speaker audio, then performs automatic speech recognition (ASR) and speaker diarisation.
Downstream modules combine named entity recognition (NER), regex-based feature extraction, LLM-RAG retrieval, and speaker embeddings in a rule-based risk score. The score flags narrative reuse, structural inconsistencies, and cross-case voice repetition while balancing sensitivity against false positives.
Dataset validation and component-level evaluations show stability and transfer potential, offering a reproducible baseline beyond text-only fraud detection.
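
To make the rule-based risk score described above more concrete, the following is a minimal sketch of how such signals might be combined. It is not the authors' implementation: the regex patterns, the point weights, the thresholds, the token-overlap measure used as a stand-in for narrative reuse, and the names `risk_score`, `cosine`, and `jaccard` are all assumptions made for illustration. Speaker embeddings are represented as plain lists of floats, as if produced by some upstream diarisation and embedding step.

```python
import math
import re

# Hypothetical patterns; a real pipeline would use its own formats.
POLICY_RE = re.compile(r"\b[A-Z]{2}\d{6}\b")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

# Hypothetical integer weights (points out of 100) for each rule.
WEIGHTS = {
    "inconsistent_dates": 20,
    "missing_policy_id": 10,
    "narrative_reuse": 30,
    "voice_repetition": 40,
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(text_a, text_b):
    """Token-set overlap, used here as a crude proxy for narrative reuse."""
    a = set(re.findall(r"[a-z']+", text_a.lower()))
    b = set(re.findall(r"[a-z']+", text_b.lower()))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def risk_score(transcript, voice, prior_cases, voice_thr=0.85, text_thr=0.7):
    """Return (score, flags) for one claim against earlier cases.

    prior_cases is a list of dicts with "transcript" and "voice" keys.
    """
    flags = set()
    # Assumed rule: more than one distinct date suggests an inconsistency.
    if len(set(DATE_RE.findall(transcript))) > 1:
        flags.add("inconsistent_dates")
    if not POLICY_RE.search(transcript):
        flags.add("missing_policy_id")
    for case in prior_cases:
        if jaccard(transcript, case["transcript"]) >= text_thr:
            flags.add("narrative_reuse")
        if cosine(voice, case["voice"]) >= voice_thr:
            flags.add("voice_repetition")
    score = min(sum(WEIGHTS[f] for f in flags), 100)
    return score, sorted(flags)
```

Raising `voice_thr` or `text_thr` in this sketch trades sensitivity for fewer false positives, which is the balance the abstract refers to; suitable values would have to be tuned on validation data.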