Statistics - Variation
Variation is a measure of how spread out the data is around the centre of the data.
The Variation of the Data
Measures of variation are statistics of how far away the values in the observations (data points) are from each other.
There are different measures of variation. The most commonly used are:
- Range
- Quartiles and percentiles
- Interquartile range
- Standard deviation
Measures of variation combined with an average (measure of centre) give a good picture of the distribution of the data.
Note: These measures of variation can only be calculated for numerical data.
Range
The range is the difference between the smallest and the largest value in the data.
Range is the simplest measure of variation.
Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing the range:
The youngest winner was 17 years old and the oldest was 97 years old. The range of ages for Nobel Prize winners is therefore 97 - 17 = 80 years.
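The calculation can be sketched in a few lines of Python. The `ages` list below is a small made-up sample, not the real Nobel Prize data. It was chosen so that its smallest and largest values match the ages quoted above.

```python
# Hypothetical sample of ages (illustrative only, not the real dataset).
ages = [17, 35, 51, 58, 62, 66, 69, 88, 97]

smallest = min(ages)
largest = max(ages)

# The range is the largest value minus the smallest value.
data_range = largest - smallest

print(smallest, largest, data_range)  # 17 97 80
```

Because the range uses only the two most extreme values, a single unusual observation can change it a lot.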
Quartiles and Percentiles
Quartiles and percentiles divide the sorted data into parts that each contain an equal number of values.
Quartiles are values that separate the data into four equal parts.
Percentiles are values that separate the data into 100 equal parts.
Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing the quartiles:
The quartiles (Q0, Q1, Q2, Q3, Q4) are the values that mark the boundaries of each quarter.
The lowest 25% of the values lie between Q0 and Q1, the next 25% lie between Q1 and Q2, and so on.
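- Q0 is the smallest value in the data.
- Q1 is the value separating the first quarter from the second quarter of the data.
- Q2 is the middle value (median).
- Q3 is the value separating the third quarter from the fourth quarter of the data.
- Q4 is the largest value in the data.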
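Here is a minimal sketch of how quartiles and percentiles could be computed with Python's standard `statistics` module. The `ages` list is a made-up sample, not the real Nobel Prize data. There are several conventions for computing quartiles. This example assumes the `"inclusive"` method, which treats the smallest and largest values as Q0 and Q4, so other tools may give slightly different results.

```python
import statistics

# Hypothetical sample of ages (illustrative only, not the real dataset).
ages = [17, 35, 51, 58, 62, 66, 69, 88, 97]

# Q0 and Q4 are simply the smallest and largest values.
q0 = min(ages)
q4 = max(ages)

# quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

print(q0, q1, q2, q3, q4)  # 17 51.0 62.0 69.0 97

# Q2 is the same value as the median.
median = statistics.median(ages)

# quantiles(n=100) returns the 99 cut points between the 100 parts.
percentiles = statistics.quantiles(ages, n=100, method="inclusive")
p25 = percentiles[24]  # 25th percentile, the same value as Q1
p50 = percentiles[49]  # 50th percentile, the same value as Q2
```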
Interquartile Range
The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1).
The 'middle half' of the data is between the first and third quartile.
Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing the interquartile range (IQR):
Here, the middle half of the ages is between 51 and 69 years. The interquartile range for Nobel Prize winners is therefore 69 - 51 = 18 years.
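The same subtraction can be sketched in Python. The `ages` list is a made-up sample, not the real Nobel Prize data. It was chosen so that its first and third quartiles match the 51 and 69 years quoted above. The `"inclusive"` quartile method is an assumption, since the text does not say which convention was used.

```python
import statistics

# Hypothetical sample of ages (illustrative only, not the real dataset).
ages = [17, 35, 51, 58, 62, 66, 69, 88, 97]

# Q1 and Q3 mark the edges of the 'middle half' of the data.
q1, _q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

# The interquartile range is Q3 minus Q1.
iqr = q3 - q1

print(q1, q3, iqr)  # 51.0 69.0 18.0
```

Unlike the range, the interquartile range ignores the lowest and highest quarters of the data, so it is less affected by extreme values.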
Standard Deviation
Standard deviation is the most commonly used measure of variation.
Standard deviation (σ) measures how far a 'typical' observation is from the average of the data (μ).
Standard deviation is important for many statistical methods.
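As a rough sketch, the standard deviation can be calculated step by step in Python. The `ages` list is a made-up sample, not the real Nobel Prize data. The example assumes the population form of the standard deviation (dividing by the number of values), which matches the symbols σ and μ used here. A sample standard deviation would divide by one less than the number of values.

```python
import math
import statistics

# Hypothetical sample of ages (illustrative only, not the real dataset).
ages = [17, 35, 51, 58, 62, 66, 69, 88, 97]

# Step 1: the average (mu) of the data.
mu = sum(ages) / len(ages)

# Step 2: the average squared distance from mu (the variance).
variance = sum((x - mu) ** 2 for x in ages) / len(ages)

# Step 3: the square root of the variance is the standard deviation (sigma).
sigma = math.sqrt(variance)

print(round(mu, 1), round(sigma, 1))  # 60.3 23.2

# The standard library gives the same result with pstdev().
sigma_builtin = statistics.pstdev(ages)
```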
Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing standard deviations:
Note: Values within one standard deviation (σ) of the average (μ) are often considered to be typical.
Values more than three standard deviations from the average are often considered to be outliers.
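These two rules of thumb can be sketched as a small Python function. The function name `classify` and the `ages` list are made up for illustration, and the sample is not the real Nobel Prize data. The population standard deviation is assumed.

```python
import statistics

def classify(values):
    """Split values into 'typical' ones and outliers using mu and sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    # Typical: at most one standard deviation away from the average.
    typical = [x for x in values if abs(x - mu) <= sigma]
    # Outliers: more than three standard deviations away from the average.
    outliers = [x for x in values if abs(x - mu) > 3 * sigma]
    return typical, outliers

# Hypothetical sample of ages (illustrative only, not the real dataset).
ages = [17, 35, 51, 58, 62, 66, 69, 88, 97]

typical, outliers = classify(ages)
print(typical)   # [51, 58, 62, 66, 69]
print(outliers)  # []
```

This small sample has no outliers. Values such as 17 and 97 are further than one standard deviation from the average, but still well within three.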