Correlation measures how strongly one set of numbers is related to another set of numbers. The strength of that relationship is expressed by the Correlation Coefficient.
When To Use Correlation
Correlation analysis is a useful tool for seeing how strongly two sets of numbers are related.
If instead you are running experiments and want to know whether one option outperformed another, then you might want to perform a Hypothesis Test.
What Is the Correlation Coefficient?
The Correlation Coefficient is a number between -1 and 1. A value close to 1 means that when the numbers in your first set increase or decrease, the numbers in your second set tend to move in the same direction. A value close to -1 means that the numbers in your second set tend to move in the opposite direction: for example, when the numbers in your first set increase, the numbers in your second set decrease.
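As a sketch of what the coefficient is doing, the Python below computes the Pearson correlation coefficient, the most common kind of Correlation Coefficient. The text above does not say which kind it means, so treat Pearson as an assumption. The function name `pearson_r` and the study-hours data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when both sets tend to be above
    # (or below) their means together, negative when they move in opposite directions.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each set around its own mean.
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a set has no variation")
    # Dividing by both spreads scales the result into the range -1 to 1.
    return cov / (sx * sy)


# Made-up example data: hours studied and the resulting test score.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 61, 70, 78]
print(round(pearson_r(hours_studied, test_score), 3))
```

Dividing the co-movement by the spread of each set is what keeps the result between -1 and 1 and makes it independent of the units either set is measured in. Here the scores rise fairly steadily with study time, so the printed value is close to 1.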
Datasets with a Correlation Coefficient that is close to -1 or 1 are said to be highly correlated. This means that there is a strong relationship between the numbers in the first and second sets.
Datasets with a Correlation Coefficient close to 0 are said to be weakly correlated. This means that there is a weak relationship, or none at all, between the numbers in the first and second sets.
Correlation Does Not Imply Causation
A strong correlation between two sets of numbers does not mean that one causes the other.
For example, ice cream stands at the beach tend to sell more ice cream on days when there are more shark attacks. But selling ice cream does not cause shark attacks, and removing the stands would not stop them. A confounding variable is at play: the temperature of the day.
On hot, sunny days, more people go swimming at the beach. More swimmers means a higher chance of shark attacks, and more people at the beach also means that more ice cream is sold.