Mean and median are two closely related measures of central tendency: each boils a set of data down to a single "typical" value. Once you know how they differ, you can choose the one that suits a particular data set. Let's walk through the difference between mean and median and learn how to pick the right one for our next analysis.

### What Are Mean And Median?

The mean is the arithmetic average of the data values. To compute it:

- Find the sum of all the numbers
- Divide the sum by the number of values in the data set
- The result is the mean
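
The steps for the mean translate directly into code. Here is a minimal sketch in Python; the function name `arithmetic_mean` and the sample values are made up for illustration, and the standard library's `statistics.mean` is shown as a cross-check.

```python
import statistics


def arithmetic_mean(values):
    """Sum the values, then divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


sample = [2, 4, 4, 5, 10]  # hypothetical data, not taken from the article

print(arithmetic_mean(sample))   # 25 divided by 5
print(statistics.mean(sample))   # standard-library equivalent
```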

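The mean is usually reported as a plain number, rounded to a sensible number of decimals and often accompanied by the standard deviation (SD). For example, suppose the mean batting average over the last ten years is .260 with a standard deviation of .045. The mean tells us where the center of the data sits and the SD tells us how far a typical player is from that center. Neither figure by itself tells us what share of players batted over .300.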

The median is the middle value of a data set once the values are sorted. To compute it:

- Sort the values from smallest to largest
- If the number of values is odd, take the middle one
- If the number of values is even, average the two middle values
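
Unlike the mean, the median depends on the order of the values rather than their sum. A minimal sketch in Python, assuming the usual convention of sorting first and averaging the two middle values when the count is even (the function name `middle_value` and the sample lists are hypothetical):

```python
import statistics


def middle_value(values):
    """Median: the middle of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle element exists.
        return ordered[mid]
    # Even count: average the two elements around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [3, 1, 4, 1, 5]   # sorted: 1, 1, 3, 4, 5
even_sample = [3, 1, 4, 1]     # sorted: 1, 1, 3, 4

print(middle_value(odd_sample), statistics.median(odd_sample))
print(middle_value(even_sample), statistics.median(even_sample))
```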

Like the mean, the median is reported as a plain number. For example, if the median batting average over the last ten years is .300, then half of the players batted .300 or lower and half batted .300 or higher. The median and the mean are close to each other when the data are roughly symmetric and drift apart when the data are skewed.

### Why Are Mean And Median Important?

Mean and median are both important because they tell us where the center of a data set lies. If we only reported the bounds (the minimum and maximum), we would not have an accurate picture of the data. For example, batting averages over the last ten years could range from .200 to .400 while the mean is .260. The bounds show the extremes but say nothing about where most of the values fall.

Comparing the two measures also hints at the shape of the distribution. If the mean is 4 and the median is 3, the mean has been pulled above the middle of the data, which suggests that a few large values are stretching the distribution to the right (right skew). Because at least half of the values are at or below the median, at least half of them lie below the mean in this situation. The exact share depends on the data; no fixed ratio follows from the two numbers alone.
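
To see what a mean of 4 and a median of 3 can look like in practice, here is a small sketch in Python. The five values are invented so that they produce exactly those two numbers; a different data set with the same mean and median could have a different share of values below the mean.

```python
import statistics

# Hypothetical data chosen to give mean 4 and median 3.
data = [1, 2, 3, 3, 11]

mean_value = statistics.mean(data)      # 20 divided by 5
median_value = statistics.median(data)  # middle of the sorted values

# Fraction of values strictly below the mean.
share_below_mean = sum(x < mean_value for x in data) / len(data)

print(mean_value, median_value, share_below_mean)
```

Here a single large value (11) pulls the mean above the median, and four of the five values end up below the mean.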

### When Do You Need To Use One Or The Other?

Both measures describe the center of a data set, so the choice comes down to the shape of the data. As a general rule, we want to use the mean if the data are roughly symmetric (for example, approximately normally distributed) and the median if the data are skewed or contain outliers.

If the data are normally distributed, the values are symmetric around the mean: approximately 68% of the data fall within one SD of the mean and approximately 95% fall within two SDs. In this case the mean and the median are nearly identical, and the mean together with the SD summarizes both the central tendency and the spread of the data. The mean is also the natural choice when we want to estimate a population average from a sample.
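
The 68% and 95% figures can be checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later); the variable names are arbitrary:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass between -1 and +1 SD, and between -2 and +2 SD.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(f"within one SD:  {within_one_sd:.4f}")   # roughly 0.68
print(f"within two SDs: {within_two_sd:.4f}")   # roughly 0.95
```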

If the data are not normally distributed (e.g., they have long tails or a few extremely high or low values), the mean can still be computed, but it is pulled toward the extreme values and may no longer describe a typical observation. The median is the safer summary here because it depends only on the order of the values, not on how extreme the largest and smallest ones are. In skewed data the share of values below the mean is no longer about 50%, and it can be far from it.
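
A quick way to see why the median holds up better is to add one extreme value to a small data set and recompute both measures. The salary figures below (in thousands) are invented for illustration:

```python
import statistics

# Hypothetical salaries in thousands; symmetric to start with.
salaries = [40, 45, 50, 55, 60]
with_outlier = salaries + [1000]  # add one extremely high value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"before: mean={mean_before}, median={median_before}")
print(f"after:  mean={mean_after:.1f}, median={median_after}")
```

The single outlier roughly quadruples the mean, while the median moves only from 50 to 52.5.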

No special handling is needed when the data are all negative or all positive. The mean and the median are computed exactly as before, both always lie between the minimum and the maximum, and both carry the same sign as the data: a data set consisting of only negative values has a negative mean and a negative median. Taking absolute values first is not necessary and would change the sign of the result. The mean or median is close to zero only if the values themselves are close to zero.
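
As a small check on how the mean and median behave for data of a single sign, here is a sketch in Python with an invented all-negative data set. It also shows what taking absolute values first would do to the mean:

```python
import statistics

# Hypothetical all-negative data, e.g. yearly losses.
losses = [-8, -5, -3, -2, -2]

mean_loss = statistics.mean(losses)      # -20 divided by 5
median_loss = statistics.median(losses)  # middle of -8, -5, -3, -2, -2

# Both measures stay between the smallest and largest value.
mean_in_range = min(losses) <= mean_loss <= max(losses)
median_in_range = min(losses) <= median_loss <= max(losses)

# Taking absolute values first flips the sign of the result.
mean_of_abs = statistics.mean([abs(x) for x in losses])

print(mean_loss, median_loss, mean_of_abs)
```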

### Example: Calculating The Mean And Median Of A Baseball Batting Average

Suppose we have two sets of data: the batting averages of all major league baseball players over the last ten years, and the salaries of the same players for the same time period. The mean and median of each set are calculated separately, following the steps above.

The mean batting average is the sum of all the batting averages divided by the number of players, and the median is the middle batting average once the players are sorted from lowest to highest. Let's take a look at some of the statistics for major league baseball players in the last ten years:

- 366 players have batted over .300 (approximately 60% of the 612 players counted here)
- 246 players have batted under .300 (approximately 40% of the 612 players counted here)
- The mean batting average is .265 (the mean alone does not tell us what share of players batted over or under .300)
- The standard deviation of the batting average is .045 (a typical player's average is about .045 away from the mean; if the data were normal, about 68% of players would fall between .220 and .310)
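
The figures in this list can be sanity-checked with a few lines of Python. This sketch takes the two player counts at face value and assumes they cover all players in the data set. It then asks what share of players a normal distribution with the stated mean and SD would place above .300, using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Counts as stated in the list above.
over_300 = 366
under_300 = 246
total = over_300 + under_300

observed_share_over = over_300 / total

# Hypothetical normal model with the stated mean and SD.
model = NormalDist(mu=0.265, sigma=0.045)
expected_share_over = 1 - model.cdf(0.300)

# Range covering one SD on either side of the mean.
low = model.mean - model.stdev
high = model.mean + model.stdev

print(f"observed share over .300: {observed_share_over:.3f}")
print(f"normal-model share over .300: {expected_share_over:.3f}")
print(f"one-SD range: {low:.3f} to {high:.3f}")
```

The gap between the observed share and the share implied by the normal model is one sign that a normal distribution describes these figures poorly.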

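These figures do not look normally distributed. A normal distribution with mean .265 and SD .045 would put only about 22% of players above .300, yet more players batted over .300 than under it. With data this lopsided, we should use the median for our summary instead of the mean.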

To check formally whether the data are normally distributed, we can use the Shapiro-Wilk test. Simply put, the test asks whether the sample as a whole is consistent with having been drawn from a normal distribution. If the resulting p-value is small (conventionally below 0.05), we reject normality, conclude that the data are not normally distributed, and summarize them with the median rather than the mean.
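
One way to run the Shapiro-Wilk test is `scipy.stats.shapiro`, which returns the test statistic and a p-value. The sketch below assumes NumPy and SciPy are installed. The two samples are simulated rather than real batting data, and the helper `looks_non_normal` and the 0.05 threshold are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Simulated samples: one drawn from a normal distribution,
# one drawn from a strongly right-skewed (exponential) distribution.
symmetric = rng.normal(loc=0.265, scale=0.045, size=200)
skewed = rng.exponential(scale=1.0, size=200)


def looks_non_normal(sample, alpha=0.05):
    """Return True when Shapiro-Wilk rejects normality at level alpha."""
    statistic, p_value = stats.shapiro(sample)
    return bool(p_value < alpha)


for name, sample in [("symmetric", symmetric), ("skewed", skewed)]:
    statistic, p_value = stats.shapiro(sample)
    summary = "median" if looks_non_normal(sample) else "mean"
    print(f"{name}: W={statistic:.3f}, p={p_value:.4f}, suggested summary: {summary}")
```

A large p-value does not prove the data are normal; it only means the test found no strong evidence against normality.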

### Summary

Mean and median are both measures of central tendency that summarize a data set with a single number. Bounds such as the minimum and maximum show the extremes but not where most of the data lie, so we need a measure of the center as well. As a general rule, we want to use the mean if the data are roughly symmetric or normally distributed, and the median if the data are skewed or contain extremely high or low values. Data that are all negative or all positive need no special treatment: the mean and median are computed in the same way and take the sign of the data.
