Efficient and Stable Multi-Dimensional Kolmogorov-Smirnov Distance
이 뉴스, 어떠셨어요?
한 번의 탭으로 반응을 남겨요 · 로그인 불필요
Abstract
We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multi-dimensional setting, and make new arguments about the proper way to approach this generalization.
Our proposed formulation maximizes the difference over orthogonal dominating rectangular ranges (d-sided rectangles in R^d), and is an integral probability metric.
We also prove that the distance between a distribution and a sample from the distribution converges to 0 as the sample size grows, and bound this rate.
Moreover, we show that one can, up to this same approximation error, compute the distance efficiently in 4 or fewer dimensions; specifically, the runtime is near-linear in the size of the sample needed for that error.
With this, we derive a delta-precision two-sample hypothesis test using this distance.
Finally, we show these metrics and approximation properties do not hold for other popular variants.