
Statistics - Variation


Variation is a measure of how spread out the data is around the center of the data.


The Variation of the Data

Measures of variation are statistics that describe how far the values of the observations (data points) are from each other.

There are different measures of variation. The most commonly used are:

  • Range
  • Quartiles and percentiles
  • Interquartile range
  • Standard deviation

Measures of variation combined with an average (measure of center) give a good picture of the distribution of the data.

Note: These measures of variation can only be calculated for numerical data.


Range

The range is the difference between the smallest and the largest value of the data.

Range is the simplest measure of variation.

Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing the range:

Histogram of the age of Nobel Prize winners with range shown between the minimum and maximum values.

The youngest winner was 17 years old and the oldest was 97 years old. The range of ages for Nobel Prize winners is then 97 - 17 = 80 years.
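As a minimal sketch in Python, the range is the largest value minus the smallest. The `ages` list below is a small hypothetical sample, not the actual Nobel Prize data; it was chosen only so that its youngest and oldest values match the figures above. The function name `value_range` is also just an illustration.

```python
# Hypothetical sample of ages; NOT the real data for the 934 winners.
ages = [17, 35, 51, 55, 60, 64, 69, 80, 97]


def value_range(values):
    """Return the difference between the largest and the smallest value."""
    return max(values) - min(values)


print(value_range(ages))  # 80
```

Because the range uses only the two most extreme values, a single unusual observation can change it a lot.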


Quartiles and Percentiles

Quartiles and percentiles divide the sorted data into parts that each contain an equal number of values.

Quartiles are values that separate the data into four equal parts.

Percentiles are values that separate the data into 100 equal parts.

Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing the quartiles:

Histogram of the age of Nobel Prize winners with quartiles shown.

The quartiles (Q0, Q1, Q2, Q3, Q4) are the values that mark the boundaries of each quarter.

Between Q0 and Q1 are the lowest 25% of the values in the data. Between Q1 and Q2 are the next 25%. And so on.

  • Q0 is the smallest value in the data.
  • Q2 is the middle value (median).
  • Q4 is the largest value in the data.
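The quartiles can be sketched with Python's standard `statistics` module. The `ages` list is a small hypothetical sample (not the real Nobel Prize data), and the helper name `quartiles` is an assumption made for this example. Different software uses slightly different rules for interpolating between values, so results on real data may differ a little between tools.

```python
import statistics

# Hypothetical sample of ages; NOT the real data for the 934 winners.
ages = [17, 35, 51, 55, 60, 64, 69, 80, 97]


def quartiles(values):
    """Return [Q0, Q1, Q2, Q3, Q4] for a list of numbers."""
    # method="inclusive" treats the values as the whole population, so the
    # smallest and largest values sit at the 0th and 100th percentiles.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), q1, q2, q3, max(values)]


print(quartiles(ages))  # [17, 51.0, 60.0, 69.0, 97]
```

Calling `statistics.quantiles` with `n=100` instead of `n=4` returns the 99 cut points that separate the percentiles.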


Interquartile Range

The interquartile range (IQR) is the difference between the third and the first quartile: IQR = Q3 - Q1.

The 'middle half' of the data is between the first and third quartiles.

Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing the interquartile range (IQR):

Histogram of the age of Nobel Prize winners with interquartile range shown.

Here, the middle half of the winners is between 51 and 69 years old. The interquartile range for Nobel Prize winners is then 69 - 51 = 18 years.
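A minimal Python sketch of the same calculation, using the standard `statistics` module. The `ages` list is a hypothetical sample, not the real Nobel Prize data; it was chosen so that Q1 and Q3 happen to match the 51 and 69 years mentioned above. The function name `interquartile_range` is an assumption for illustration.

```python
import statistics

# Hypothetical sample of ages; NOT the real data for the 934 winners.
ages = [17, 35, 51, 55, 60, 64, 69, 80, 97]


def interquartile_range(values):
    """Return Q3 - Q1, the width of the middle half of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


print(interquartile_range(ages))  # 18.0
```

Unlike the range, the interquartile range ignores the lowest and highest quarters of the data, so it is much less sensitive to extreme values.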


Standard Deviation

Standard deviation is the most commonly used measure of variation.

Standard deviation (σ) measures how far a 'typical' observation is from the average of the data (μ).

Standard deviation is important for many statistical methods.

Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing standard deviations:

Histogram of the age of Nobel Prize winners with standard deviations shown.

Note: Values within one standard deviation (σ) of the average (μ) are considered to be typical.

Values more than three standard deviations away from the average are often considered to be outliers.
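The rule of thumb above can be sketched in Python with the standard `statistics` module. The `values` list is a small hypothetical dataset picked so the numbers come out round, and the helper `distance_in_sigmas` is an assumption made for this example. The sketch uses `pstdev`, which treats the data as the whole population (σ); for a sample drawn from a larger population, `statistics.stdev` would be the usual choice.

```python
import statistics

# Hypothetical data, chosen so that the mean and standard deviation are round.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(values)      # average of the data
sigma = statistics.pstdev(values)  # population standard deviation


def distance_in_sigmas(x):
    """Return how many standard deviations x lies from the average."""
    return abs(x - mu) / sigma


# Within one standard deviation: "typical" values.
typical = [x for x in values if distance_in_sigmas(x) <= 1]
# More than three standard deviations away: possible outliers.
outliers = [x for x in values if distance_in_sigmas(x) > 3]

print(mu, sigma)  # 5.0 2.0
print(typical)    # [4, 4, 4, 5, 5, 7]
print(outliers)   # []
```

In this small dataset no value is more than three standard deviations from the average, so the list of outliers is empty.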