Kolmogorov–Smirnov test

From WikiMD's Wellness Encyclopedia

<gallery>
File:KS_Example.png|Example of a Kolmogorov–Smirnov test
File:KolmogorovDistrPDF.png|Probability density function of the Kolmogorov distribution
File:KS2_Example.png|Another example of a Kolmogorov–Smirnov test
</gallery>

Latest revision as of 01:56, 18 February 2025

Kolmogorov–Smirnov test (K–S test) is a nonparametric test used in statistics to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test). It is named after Andrey Kolmogorov and Nikolai Smirnov. The K–S test quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null hypothesis of the test is that the sample comes from the same distribution as the reference distribution (in the one-sample case), or that the two samples come from the same distribution (in the two-sample case).

Definition

The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. For a one-sample K–S test, the test statistic is:

\[D_n = \sup_x |F_n(x) - F(x)|\]

where \(F_n(x)\) is the empirical distribution function of the sample and \(F(x)\) is the cumulative distribution function of the reference distribution. For a two-sample K–S test, the test statistic is:

\[D_{n,m} = \sup_x |F_{n}(x) - G_{m}(x)|\]

where \(F_{n}(x)\) and \(G_{m}(x)\) are the empirical distribution functions of the two samples, with \(n\) and \(m\) being the sizes of the samples, respectively.
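Because the empirical distribution function is a step function, the supremum in \(D_n\) is attained at a sample point, just before or just after a jump of \(F_n\). A minimal sketch of the one-sample computation in pure Python, using the standard normal as the reference distribution purely for illustration:

```python
import math

def normal_cdf(x):
    # CDF of the standard normal distribution, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_statistic_one_sample(sample, cdf):
    """One-sample K-S statistic D_n = sup_x |F_n(x) - F(x)|.

    Since F_n jumps by 1/n at each sorted sample point x_(i), the
    supremum is the largest of i/n - F(x_(i)) and F(x_(i)) - (i-1)/n
    over all i.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d
```

For a single observation at 0, the empirical CDF jumps from 0 to 1 there while the normal CDF is 0.5, so the statistic is 0.5.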

Applications

The K–S test is widely used in situations where the form of the distribution is not known and for comparing the goodness-of-fit of empirical data to a theoretical model. It is particularly useful in the fields of statistics, economics, psychology, and environmental science, among others.

Advantages and Limitations

One of the main advantages of the K–S test is its nonparametric nature, meaning it does not assume a specific distribution for the data. However, the test has less power than some alternatives, such as the Anderson–Darling test, especially for small sample sizes or when the differences between distributions are in the tails.

Implementation

The K–S test has been implemented in various statistical software packages, including R, Python's SciPy library, and MATLAB. These implementations typically provide functions to perform both one-sample and two-sample tests, along with options to adjust for the effect of discrete data or ties.
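For example, SciPy exposes the one-sample test as `scipy.stats.kstest` and the two-sample test as `scipy.stats.ks_2samp`; a brief sketch, with data simulated only to illustrate the calls:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# One-sample test: is the sample consistent with a standard normal?
sample = rng.normal(loc=0.0, scale=1.0, size=200)
stat, p_value = stats.kstest(sample, "norm")

# Two-sample test: could the two samples share a common distribution?
other = rng.uniform(low=-2.0, high=2.0, size=200)
stat2, p2 = stats.ks_2samp(sample, other)

print(f"one-sample D = {stat:.3f}, p = {p_value:.3f}")
print(f"two-sample D = {stat2:.3f}, p = {p2:.3f}")
```

In each case the returned statistic is the supremum distance \(D\) and the p-value is computed from the Kolmogorov distribution (with finite-sample corrections handled internally).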

See Also


This article is a statistics-related stub. You can help WikiMD by expanding it!