{{Short description|A statistical rule for determining histogram bin width}}
== Freedman–Diaconis rule ==
The '''Freedman–Diaconis rule''' is a statistical method used to determine the optimal bin width for a [[histogram]]. This rule is particularly useful in [[descriptive statistics]] for creating histograms that accurately represent the underlying distribution of a dataset.
[[File:Histogram-rules.png|thumb|right|300px|Comparison of different rules for determining histogram bin width, including the Freedman–Diaconis rule.]]

== Formula ==
The Freedman–Diaconis rule calculates the bin width using the following formula:
: <math>\text{Bin width} = 2 \times \frac{\text{IQR}}{\sqrt[3]{n}}</math>
where:
* <math>\text{IQR}</math> is the [[interquartile range]] of the data.
* <math>n</math> is the number of observations in the dataset.
The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1) of the data, and it measures the spread of the middle 50% of the data.
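
The calculation is simple to implement directly. The sketch below uses Python with NumPy; the helper name <code>fd_bin_width</code> is chosen here for illustration and is not part of any standard library, although NumPy exposes the same rule through the <code>bins='fd'</code> option of <code>numpy.histogram_bin_edges</code>.

<syntaxhighlight lang="python">
import numpy as np

def fd_bin_width(data):
    """Freedman–Diaconis bin width: 2 * IQR / n^(1/3)."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles
    iqr = q3 - q1                           # interquartile range
    return 2 * iqr / np.cbrt(data.size)

# Example: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
print(fd_bin_width(sample))

# NumPy applies the same rule when asked for the 'fd' estimator.
print(np.histogram_bin_edges(sample, bins='fd'))
</syntaxhighlight>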

== Application ==
The Freedman–Diaconis rule is applied in the construction of histograms to ensure that the bins are neither too wide nor too narrow, which helps avoid over-smoothing or under-smoothing the data distribution. Because it is based on the interquartile range, the rule is robust to [[outliers]] and often gives a more reliable bin width than alternatives such as [[Sturges' rule]] or [[Scott's rule]].
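
As a sketch of this usage (same assumptions as the snippet above), the bin width can be turned into a bin count by dividing the data range by the width and rounding up; front ends such as Matplotlib's <code>hist</code> also accept <code>bins='fd'</code> because they delegate bin selection to NumPy.

<syntaxhighlight lang="python">
import numpy as np

def fd_bin_count(data):
    """Number of bins implied by the Freedman–Diaconis bin width."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    width = 2 * (q3 - q1) / np.cbrt(data.size)
    span = data.max() - data.min()
    return int(np.ceil(span / width))  # round up so the bins cover all data

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=500)
counts, edges = np.histogram(sample, bins=fd_bin_count(sample))
print(len(edges) - 1, "bins of width", edges[1] - edges[0])
</syntaxhighlight>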

== Advantages ==
* '''Robustness to Outliers''': The use of the interquartile range makes the Freedman–Diaconis rule less sensitive to outliers, which can skew the results of other binning methods (illustrated in the sketch after this list).
* '''Adaptability''': The bin width shrinks with the cube root of the sample size, so larger datasets automatically receive finer bins and a more accurate representation of the data distribution.
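
The comparison below is a small illustrative sketch, assuming the standard form of [[Scott's rule]] (bin width <math>3.49 \, s \, n^{-1/3}</math> with sample standard deviation <math>s</math>): adding two extreme outliers barely moves the Freedman–Diaconis width, while the outlier-inflated standard deviation widens Scott's bins.

<syntaxhighlight lang="python">
import numpy as np

def fd_width(data):
    q1, q3 = np.percentile(data, [25, 75])
    return 2 * (q3 - q1) / np.cbrt(len(data))

def scott_width(data):
    # Standard form of Scott's rule: 3.49 * sample std. dev. / n^(1/3).
    return 3.49 * np.std(data, ddof=1) / np.cbrt(len(data))

rng = np.random.default_rng(2)
clean = rng.normal(size=1000)
contaminated = np.append(clean, [50.0, -50.0])  # two extreme outliers

# The IQR barely changes, so the FD width is nearly unaffected,
# while the standard deviation (and hence Scott's width) inflates.
print(fd_width(clean), fd_width(contaminated))
print(scott_width(clean), scott_width(contaminated))
</syntaxhighlight>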

== Limitations ==
* '''Computational Complexity''': Computing the interquartile range requires sorting the data or using a selection algorithm, which adds some cost for very large datasets compared with rules that only need the mean and standard deviation.
* '''Data Dependency''': The effectiveness of the rule depends on the distribution of the data, and it may not perform well for very irregular distributions; for example, heavily discretized data can have an interquartile range of zero, which makes the computed bin width zero (see the sketch after this list).
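
The snippet below illustrates the zero-IQR case; the fallback to a fixed bin count is an arbitrary choice made for this sketch, not a standard prescription of the rule.

<syntaxhighlight lang="python">
import numpy as np

def fd_width_with_fallback(data, fallback_bins=10):
    """FD bin width, with an illustrative fallback for the IQR == 0 case."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    width = 2 * (q3 - q1) / np.cbrt(data.size)
    if width == 0:
        # Degenerate case: the middle 50% of the data is a single value,
        # so fall back to an arbitrary fixed number of bins.
        width = (data.max() - data.min()) / fallback_bins
    return width

# Heavily discretized data: 80% of the values are 0, so Q1 == Q3 == 0.
discretized = np.array([0] * 80 + [1] * 10 + [2] * 10)
print(fd_width_with_fallback(discretized))  # 0.2 from the fallback, not 0
</syntaxhighlight>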

== Related pages ==
* [[Histogram]]
* [[Interquartile range]]
* [[Sturges' rule]]
* [[Scott's rule]]
* [[Descriptive statistics]]

[[Category:Statistical rules]]
[[Category:Descriptive statistics]]