RISED: A Pre-Deployment Evaluation Framework for High-Stakes AI Decision-Support Systems, with Application to Healthcare

Computer Science > Machine Learning

[Submitted on 13 May 2026 (v1), last revised 30 May 2026 (this version, v2)]

Abstract: Clinical decision-support systems are expert systems whose recommendations clinicians act on directly, yet they are usually cleared on one aggregate accuracy number from a held-out test set. That number says nothing about input reliability under encoding shifts, subgroup gaps, threshold sensitivity, or operational feasibility. We present RISED, a pre-deployment evaluation framework operationalising five dimensions (Reliability, Inclusivity, Sensitivity, Equity, Deployability) through BCa bootstrap 95% confidence intervals, literature-grounded thresholds, and Holm-Bonferroni-corrected PASS / FAIL / INCONCLUSIVE verdicts; Equity is a proxy-dependence diagnostic rather than a gating test. Applied to seven cohorts spanning 35 years (n from 303 to 99,492), RISED surfaces failures invisible to AUROC: on Diabetes 130, Reliability passes by three orders of magnitude (PSS = 0.0004) while Inclusivity (AUC parity gap = 0.262) and Sensitivity (max threshold-flip rate 49.1%) fail decisively; both NHIS cohorts reproduce this. NHANES 2021-2023, with a complete feature profile, achieves INCONCLUSIVE verdicts; BRFSS 2024 produces the suite's most severe Sensitivity failure (max threshold-flip rate 64.2%) after instrument rotation removed hypertension and cholesterol. The pattern recurs on credit- and income-prediction cohorts, confirming domain-agnosticity; a multi-model check shows the failures are data-driven, not model-specific. RISED ships as an open-source Python package complementing TRIPOD+AI, FUTURE-AI, and Fairlearn with the structured numerical evidence those standards require but do not prescribe.
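Two ideas in the abstract can be illustrated with a minimal sketch: a subgroup AUROC gap that a pooled AUROC hides, and the Holm-Bonferroni step-down correction. This is not the RISED package's API. The function names, the toy data, the p-values, and the reading of "AUC parity gap" as the largest minus the smallest per-subgroup AUROC are all assumptions made for illustration; only the Holm-Bonferroni procedure itself is standard.

```python
from typing import Dict, Hashable, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_parity_gap(labels: Sequence[int], scores: Sequence[float],
                   groups: Sequence[Hashable]) -> float:
    """ASSUMED definition: max minus min of the per-subgroup AUROC values."""
    per_group = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        per_group.append(auroc([labels[i] for i in idx], [scores[i] for i in idx]))
    return max(per_group) - min(per_group)


def holm_bonferroni(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...) with
    alpha / (m - k) and stop rejecting at the first comparison that fails."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for k, (name, p) in enumerate(ordered):
        if p > alpha / (m - k):
            break
        rejected[name] = True
    return rejected


# Toy data (made up): the model ranks group "A" perfectly and group "B" at chance.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.4, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(auroc(labels, scores))                   # 0.875 (pooled)
print(auc_parity_gap(labels, scores, groups))  # 0.5

# Hypothetical p-values for the four gating dimensions (Equity is diagnostic only).
p_vals = {"Reliability": 0.001, "Inclusivity": 0.012,
          "Sensitivity": 0.030, "Deployability": 0.040}
print(holm_bonferroni(p_vals))
```

In the toy data the pooled AUROC of 0.875 looks acceptable while the subgroup gap is 0.5, which is the kind of failure a single aggregate number cannot show. In the Holm example, the two largest p-values fall below an uncorrected 0.05 yet are not rejected after correction. How the package maps corrected tests and confidence intervals onto PASS / FAIL / INCONCLUSIVE is not stated in the abstract and is not modelled here.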
Submission history
From: Rohith Reddy Bellibatlu
[v1] Wed, 13 May 2026 02:17:13 UTC (93 KB)
[v2] Sat, 30 May 2026 04:52:59 UTC (69 KB)
