# Statistics - Hypothesis Testing

Hypothesis testing is a formal way of checking whether sample data support a hypothesis about a population.

## Hypothesis Testing

A **hypothesis** is a claim about a population parameter.

A **hypothesis test** is a formal procedure for checking whether the data from a sample support a hypothesis.

Examples of claims that can be checked:

- The average height of people in Denmark is **more** than 170 cm.
- The share of left-handed people in Australia is **not** 10%.
- The average income of dentists is **less** than the average income of lawyers.

## The Null and Alternative Hypothesis

Hypothesis testing is based on making two different claims about a population parameter.

The **null** hypothesis (\(H_{0} \)) and the **alternative** hypothesis (\(H_{1}\)) are the claims.

The two claims need to be **mutually exclusive**, meaning only one of them can be true.

The alternative hypothesis is typically what we are trying to prove.

For example, we want to check the following claim:

"The average height of people in Denmark is more than 170 cm."

In this case, the **parameter** is the average height of people in Denmark (\(\mu\)).

The null and alternative hypotheses would be:

**Null hypothesis**: The average height of people in Denmark **is** 170 cm.

**Alternative hypothesis**: The average height of people in Denmark is **more** than 170 cm.

The claims are often expressed with symbols like this:

\(H_{0}\): \(\mu = 170 \: cm \)

\(H_{1}\): \(\mu > 170 \: cm \)

If the data supports the alternative hypothesis, we **reject** the null hypothesis and **accept** the alternative hypothesis.

If the data does **not** support the alternative hypothesis, we **keep** the null hypothesis.

**Note:** The alternative hypothesis is also written as \(H_{A}\).

## The Significance Level

The significance level (\(\alpha\)) is the **uncertainty** we accept when rejecting the null hypothesis in the hypothesis test.

The significance level is the probability of rejecting the null hypothesis when it is actually true, usually stated as a percentage.

Typical significance levels are:

- \(\alpha = 0.1\) (10%)
- \(\alpha = 0.05\) (5%)
- \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

**Note:** A 5% significance level means that when the null hypothesis is **true**, we expect to wrongly reject it in about 5 out of 100 tests.
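The meaning of the significance level can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: the population is standard normal (so the null hypothesis \(\mu = 0\) is true), the standard deviation is known to be 1, and the test is a right-tailed z-test. The function name `simulate_false_rejections` and all numbers are made up for this example.

```python
import random
from statistics import NormalDist


def simulate_false_rejections(alpha=0.05, trials=10_000, n=30, seed=1):
    """Return the share of tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    # Right-tailed critical value of the standard normal distribution.
    critical_value = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 holds (mu = 0, sigma = 1).
        sample = [rng.gauss(0, 1) for _ in range(n)]
        sample_mean = sum(sample) / n
        # z statistic, assuming the population standard deviation is known (1).
        z = sample_mean / (1 / n ** 0.5)
        if z > critical_value:
            rejections += 1
    return rejections / trials


rate = simulate_false_rejections()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

The printed share should land close to the chosen \(\alpha\) of 0.05, and it shrinks if a lower significance level is passed in.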

## The Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a **standardized** value calculated from the sample.

Standardization means converting the statistic to a common scale where it follows a well-known **probability distribution**.

The type of probability distribution depends on the type of test.

Common examples are:

- Standard Normal Distribution (Z): used for Testing Population Proportions
- Student's T-Distribution (T): used for Testing Population Means

**Note:** You will learn how to calculate the test statistic for each type of test in the following chapters.
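As a preview, here is a minimal sketch of one standardized test statistic: the z statistic for a population proportion. It assumes the usual formula \(z = (\hat{p} - p_{0}) / \sqrt{p_{0}(1 - p_{0})/n}\). The function name `z_statistic_proportion` and the sample (13 left-handed people out of 100) are hypothetical and only serve as an illustration.

```python
from math import sqrt


def z_statistic_proportion(successes, n, p0):
    """Standardized test statistic for a sample proportion under H0: p = p0."""
    p_hat = successes / n                      # sample proportion
    standard_error = sqrt(p0 * (1 - p0) / n)   # spread of p_hat if H0 is true
    return (p_hat - p0) / standard_error


# Hypothetical sample: 13 of 100 people are left-handed, H0 claims 10%.
z = z_statistic_proportion(successes=13, n=100, p0=0.10)
print(f"z = {z:.2f}")
```

The result says how many standard errors the sample proportion lies from the value claimed by the null hypothesis.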

## The Critical Value and P-Value Approach

There are two main approaches used for hypothesis tests:

- The **critical value** approach compares the test statistic with the critical value of the significance level.
- The **p-value** approach compares the p-value of the test statistic with the significance level.

### The Critical Value Approach

The critical value approach checks if the test statistic is in the **rejection region**.

The rejection region is an area of probability in the tails of the distribution.

The size of the rejection region is decided by the significance level (\(\alpha\)).

The value that separates the rejection region from the rest is called the **critical value**.

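In a right-tailed test, for example, the rejection region is the area under the distribution curve to the right of the critical value, and that area equals \(\alpha\).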

If the test statistic is **inside** this rejection region, the null hypothesis is **rejected**.

For example, if the test statistic is 2.3 and the critical value is 2 for a significance level (\(\alpha = 0.05\)):

We reject the null hypothesis (\(H_{0} \)) at a 0.05 significance level (\(\alpha\)).
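The comparison can be sketched in code. The example below assumes a right-tailed test whose test statistic follows the standard normal distribution. In that case the critical value for \(\alpha = 0.05\) is about 1.645, not the round value of 2 used in the illustration above. The function name `reject_right_tailed` is hypothetical.

```python
from statistics import NormalDist


def reject_right_tailed(test_statistic, alpha):
    """Return (decision, critical value) for a right-tailed z-test."""
    # The critical value cuts off an area of alpha in the right tail.
    critical_value = NormalDist().inv_cdf(1 - alpha)
    return test_statistic > critical_value, critical_value


reject, critical_value = reject_right_tailed(test_statistic=2.3, alpha=0.05)
print(f"critical value = {critical_value:.3f}, reject H0: {reject}")
```

With a stricter level such as \(\alpha = 0.01\), the critical value moves further into the tail, so the same test statistic of 2.3 no longer falls in the rejection region.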

### The P-Value Approach

The p-value approach checks if the p-value of the test statistic is **smaller** than the significance level (\(\alpha\)).

The p-value is the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed.

It corresponds to the area in the tail (or tails) of the distribution beyond the value of the test statistic.

If the p-value is **smaller** than the significance level, the null hypothesis is **rejected**.

The p-value directly tells us the **lowest significance level** where we can reject the null hypothesis.

For example, if the p-value is 0.03:

We reject the null hypothesis (\(H_{0} \)) at a 0.05 significance level (\(\alpha\))

We keep the null hypothesis (\(H_{0}\)) at a 0.01 significance level (\(\alpha\))
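A short sketch of the p-value approach follows. It assumes a right-tailed test with a standard normal test statistic. The value 1.88 is a made-up test statistic chosen because its p-value is close to the 0.03 used in the example, and `p_value_right_tailed` is a hypothetical helper name.

```python
from statistics import NormalDist


def p_value_right_tailed(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)


p_value = p_value_right_tailed(1.88)
print(f"p-value = {p_value:.3f}")

# Compare the same p-value with two different significance levels.
for alpha in (0.05, 0.01):
    decision = "reject" if p_value < alpha else "keep"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

The p-value is computed once and can then be compared with any significance level, which is why it is often reported on its own.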

**Note:** The two approaches are only different in how they present the conclusion.

## Steps for a Hypothesis Test

The following steps are used for a hypothesis test:

- Check the conditions
- Define the claims
- Decide the significance level
- Calculate the test statistic
- Draw the conclusion

One **condition** is that the sample is randomly selected from the population.

The other conditions depend on what type of parameter you are testing the hypothesis for.

Common parameters to test hypotheses about are:

- Proportions (for qualitative data)
- Mean values (for numerical data)

You will learn the steps for both types in the following pages.
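To show how the five steps fit together, here is a rough end-to-end sketch for the claim "the share of left-handed people is not 10%". It makes several assumptions that are not part of this page: a two-tailed z-test for a proportion, a rule-of-thumb size check (at least 5 expected cases in each category), and invented data (65 left-handed people in a random sample of 500). The function name `two_tailed_proportion_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_proportion_test(successes, n, p0, alpha=0.05):
    """Return (z, p_value, reject) for H0: p = p0 against H1: p != p0."""
    # Step 1: check the conditions. Random sampling cannot be verified in
    # code; the size check below is an assumed rule of thumb.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("sample too small for the normal approximation")

    # Step 2: define the claims. H0: p = p0, H1: p != p0 (two-tailed).
    # Step 3: the significance level is passed in as alpha.

    # Step 4: calculate the test statistic.
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

    # Step 5: conclude, here with the p-value approach (both tails count).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


z, p_value, reject = two_tailed_proportion_test(successes=65, n=500, p0=0.10)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Because the alternative hypothesis is "not 10%", deviations in both directions count as evidence, so the tail area is doubled.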