Evaluation Metrics as Averaged Outcomes of Fair Gambles

arXiv Stat

CC BY

Abstract

In current machine learning practice, the evaluation of forecasts has become a cornerstone of scientific progress.

A multitude of evaluation metrics have been suggested and used to qualify "good" forecasts.

What do those metrics share?

How are they related?

In this work, we use a protocol borrowed from game-theoretic probability to show that a large class of evaluation metrics can be viewed as averaged outcomes of fair gambles.

Intuitively, a fair gambler is one that the forecaster would expect to fail, that is, to make no gain.

Hence, the gambler's ability to gain is evidence against the quality of the forecast.

Standard evaluation metrics then correspond to different choices of such fair gambles.

In particular, this choice is structured along two dimensions, one of which separates calibration-type and regret-type metrics.

The framework also sheds light on the relationship between calibration and regret: when scaled appropriately, the two are theoretically equivalent in their ability to evaluate forecasts, yet the scores they produce are not comparable.
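
The idea of a metric as the averaged outcome of a fair gamble can be illustrated with a minimal sketch. It is not the paper's protocol: it assumes binary outcomes, a probability forecast p, and a hypothetical gamble that pays stake * (y - p). The names `payoff`, `expected_payoff`, and `averaged_outcome` are introduced here purely for illustration. Under the forecaster's own belief (y = 1 with probability p) the expected payoff is zero for any stake, so a clearly non-zero average on real outcomes counts against the forecast.

```python
def payoff(stake, forecast, outcome):
    """Payoff of a gamble on a binary outcome, priced by the forecast (assumed form)."""
    return stake * (outcome - forecast)


def expected_payoff(stake, forecast):
    """Expected payoff if the outcome really is 1 with probability `forecast`."""
    return (forecast * payoff(stake, forecast, 1)
            + (1 - forecast) * payoff(stake, forecast, 0))


def averaged_outcome(stake_fn, forecasts, outcomes):
    """Average payoff of the gambler over a sequence of forecast/outcome pairs."""
    total = sum(payoff(stake_fn(p), p, y) for p, y in zip(forecasts, outcomes))
    return total / len(forecasts)


# Hypothetical data: the forecaster always says 0.5, but the event occurs 3 times out of 4.
forecasts = [0.5, 0.5, 0.5, 0.5]
outcomes = [1, 1, 1, 0]


def unit_stake(p):
    """Illustrative stake that depends only on the forecast (here a constant)."""
    return 1.0


fair = expected_payoff(unit_stake(0.3), 0.3)             # zero under the forecast
avg = averaged_outcome(unit_stake, forecasts, outcomes)  # positive: the gambler gains
print(fair, avg)
```

In this sketch the stake depends only on the forecast, which is loosely in the spirit of a calibration-type check; other choices of stake function would give other scores.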
