Pearson's chi-squared test

From WikiMD's Wellness Encyclopedia


Latest revision as of 21:58, 16 February 2025

Pearson's chi-squared test (often simply called the chi-squared test) is a statistical hypothesis test used to determine whether the observed frequencies in one or more categories differ significantly from the frequencies expected under a null hypothesis. Depending on the application, it is also known as the chi-square goodness-of-fit test or the chi-square test of independence. In the field of statistics, it is one of the most common tests for analyzing categorical data.

Overview

The test applies when the data can be arranged as counts per category, for example in a contingency table, and the sample is large enough for the chi-squared approximation to hold. It measures the discrepancy between observed and expected frequencies under the null hypothesis that the observed counts follow the expected distribution. Karl Pearson introduced the test in 1900, hence the name.

Assumptions

Before applying Pearson's chi-squared test, certain assumptions must be met:

  • Observations are independently drawn from the population.
  • The sample size is large enough. As a rule of thumb, all expected counts should be at least 5.
  • The data are categorical rather than numerical.
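The expected-count rule of thumb above can be checked before running the test. The following is a minimal sketch; the helper name `expected_counts_ok` and the example counts are hypothetical and chosen only for illustration.

```python
def expected_counts_ok(expected, minimum=5.0):
    """Return True if every expected count meets the rule-of-thumb minimum."""
    return all(e >= minimum for e in expected)


# Hypothetical expected counts for four categories.
print(expected_counts_ok([10.0, 10.0, 10.0, 10.0]))  # True
print(expected_counts_ok([18.0, 15.0, 4.0, 3.0]))    # False
```

When the check fails, sparse categories are often merged or an exact test is used instead.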

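Calculation

The test statistic is calculated as:
\[ \chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i} \]
where \(O_i\) is the observed frequency for category \(i\), \(E_i\) is the expected frequency for category \(i\), and the summation is over all categories. Under the null hypothesis, the statistic approximately follows a chi-squared distribution. The degrees of freedom are \(k - 1\) for a goodness-of-fit test with \(k\) categories (when no parameters are estimated from the data) and \((r - 1)(c - 1)\) for a test of independence on a table with \(r\) rows and \(c\) columns.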
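The formula can be sketched in a few lines of plain Python. The function name `chi_squared_statistic` and the die-roll counts are assumptions made for illustration; they are not taken from the article.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 13, 7]
expected = [10.0] * 6

stat = chi_squared_statistic(observed, expected)
print(f"{stat:.2f}")  # 2.80
```

With six categories the statistic would be compared against a chi-squared distribution with 5 degrees of freedom.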

Applications

Pearson's chi-squared test is widely used in two major scenarios:

  • Goodness-of-fit test: To determine how well an observed distribution fits with an expected distribution.
  • Test of independence: To determine if there is a significant association between two categorical variables.
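In a goodness-of-fit test the expected counts come from the hypothesized distribution, whereas in a test of independence they are derived from the table's own row and column totals. The sketch below illustrates the second case; the function names and the 2 x 2 table are hypothetical, and no continuity correction is applied.

```python
def expected_from_margins(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table."""
    expected = expected_from_margins(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical 2 x 2 table of counts (rows: groups, columns: preferences).
table = [[30, 20], [20, 30]]
stat, df = independence_statistic(table)
print(stat, df)  # 4.0 1
```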

Limitations

While widely used, the test has limitations:

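  • It is not suitable for small samples, because the chi-squared approximation becomes unreliable when expected counts are small. Fisher's exact test is a common alternative in that case.
  • It applies only to categorical data, that is, counts per category.
  • The test is sensitive to sample size: with very large samples, even practically trivial differences can reach statistical significance.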
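The sample-size sensitivity can be illustrated by holding the observed proportions fixed and scaling the counts. This is a sketch with hypothetical numbers: a 51% / 49% split is tested against an expected 50% / 50% split, and 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_VALUE_DF1 = 3.841  # 5% critical value, 1 degree of freedom

# Same 51% / 49% split at three hypothetical sample sizes.
for n in (100, 10_000, 1_000_000):
    observed = [51 * n // 100, 49 * n // 100]
    expected = [n / 2, n / 2]
    stat = chi_squared_statistic(observed, expected)
    print(n, f"{stat:.2f}", stat > CRITICAL_VALUE_DF1)
# 100 0.04 False
# 10000 4.00 True
# 1000000 400.00 True
```

The statistic grows in proportion to the sample size even though the difference in proportions never changes.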

Examples

An example of a goodness-of-fit test would be comparing the observed color distribution of M&Ms to the expected distribution claimed by the manufacturer. For a test of independence, one might analyze data from a survey to see if there is an association between gender and preference for a particular type of music.
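The candy-color comparison can be sketched end to end. All numbers below are hypothetical and are not real manufacturer figures. The helper `chi2_sf` is an illustrative name; it computes the upper-tail probability of the chi-squared distribution for integer degrees of freedom from its closed-form series, so that the example needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    for j in range((df - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total


# Hypothetical claimed proportions and observed counts for six colors.
claimed = {"red": 0.2, "blue": 0.2, "green": 0.2, "yellow": 0.2, "orange": 0.1, "brown": 0.1}
observed = {"red": 25, "blue": 18, "green": 22, "yellow": 15, "orange": 12, "brown": 8}

n = sum(observed.values())
stat = sum((observed[c] - claimed[c] * n) ** 2 / (claimed[c] * n) for c in claimed)
df = len(claimed) - 1
p_value = chi2_sf(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

For these made-up counts the statistic is about 3.7 on 5 degrees of freedom, well below the 5% critical value of 11.07, so the sample gives no evidence against the claimed proportions.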


