The Ultimate Guide to Understanding Mean, Median, and Mode

Mean, median, and mode are three concepts in mathematics that you have almost certainly come across. If you are here today, chances are you have an issue with the subject and need some help. That’s fine, I am here to help.

Statistics is not limited to the field of mathematics alone. It is important in many fields, from science to business and beyond. Its main value is that it helps people make sense of data by turning it into useful information.

There are various ways this happens, such as through charts and different kinds of graphs. Central to this is the concept of central tendency, which gives us an idea of the 'average' or 'typical' value in a dataset.

Today, we will be looking at three core measures of central tendency in statistics: mean, median, and mode. By the end of this article, I fully expect you to have a clear understanding of these concepts and how to calculate them, even if you're new to statistics.

Y’all ready?

What is the Measure of Centrality?

Central tendency measures provide a single value that represents the centre of a dataset. There are three measures of centrality:

1. Mean

The mean is the most popular of all measures of central tendency, perhaps because you first met the idea back in your basic school days.

It is otherwise referred to as the average. Remember? When you want to calculate the average or mean of a given set of data, you total the values in that set and divide by the number of values.

In other words, the mean is calculated by adding up all the values in a dataset and dividing by the number of values. It provides a general idea of the overall level of the data.

We’ll delve deep into the mean later.

2. Median

The median is the sneaky part of the three measures of centrality. It is the middle value of a dataset when it is ordered from smallest to largest.

To get the median, you must first arrange the set of data in order. If there's an even number of values, the median is the average of the two middle numbers. The median is less affected by outliers and skewed data than the mean.
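To see why the median resists outliers, here is a small sketch using Python's standard statistics module. The salary figures are made up purely for illustration.

```python
import statistics

# Hypothetical monthly salaries; the extra value is a deliberate outlier.
salaries = [3000, 3200, 3100, 2900, 3300]
salaries_with_outlier = salaries + [50000]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(salaries_with_outlier)
median_after = statistics.median(salaries_with_outlier)

print("Without outlier: mean =", mean_before, "median =", median_before)
print("With outlier:    mean =", mean_after, "median =", median_after)
```

The single extreme value drags the mean from 3100 to roughly 10917, while the median only moves from 3100 to 3150.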

3. Mode

The mode is the value that appears most frequently in a dataset. There can be more than one mode if multiple values share the same highest frequency.

The mode is particularly useful for categorical data, where we wish to know which category is the most common. If you are ever confused about any of these measures of central tendency, don’t stray from these definitions. You can come back here to remind yourself what they mean.

Mathematical Notation for Calculating the Mean

To calculate the mean of a variable, we add all its values together and then divide by the total number of values.

This concept of mean can be represented by the following equation.

$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

This is the formula for calculating the mean when given a set of data. While it applies to almost all mean questions, we can streamline this formula to:

$$\text{mean} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

This might seem overwhelming, but both equations are just the same. Let’s break it down.

The sigma symbol (Σ) means "add up" or "summation": add up all the values "x" (heights, for instance). The "i = 1" part tells us which term to start with, and we always start with the first observation, so i = 1.

The number on top of the summation symbol is the last observation we include. N is the total number of observations.
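One way to read the sigma notation is as a loop. The sketch below mirrors the description above step by step. The variable names x, N, and total are illustrative choices, and the heights are made-up numbers.

```python
# Made-up heights in centimetres, purely for illustration.
x = [150, 160, 170, 180, 190]
N = len(x)  # total number of observations

total = 0
# i runs from 1 (the first observation) up to N (the last one),
# just like the limits written below and above the sigma.
for i in range(1, N + 1):
    total += x[i - 1]  # Python lists start at index 0, so x_i is x[i - 1]

mean = total / N
print("Sum:", total)
print("Mean:", mean)
```

Here the loop plays the role of the sigma, and the final division by N turns the sum into the mean.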

Function to Calculate the Mean

Let's write a simple function in Python to calculate the mean of any vector (list of numbers).

def calculate_mean(vector):
    # Add up every value, then divide by how many values there are.
    return sum(vector) / len(vector)


# Example usage:
data = [10, 20, 30, 40, 50]
mean = calculate_mean(data)
print("Mean:", mean)

This function works by summing all the elements of the vector and dividing by the count of elements.
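One edge case worth keeping in mind: an empty list has no mean, and the function above would fail with a ZeroDivisionError because the count of elements is 0. Here is a hedged variant, assuming you would rather get a descriptive error (the name calculate_mean_checked is hypothetical):

```python
def calculate_mean_checked(vector):
    """Return the mean of vector, rejecting empty input explicitly."""
    if len(vector) == 0:
        raise ValueError("cannot compute the mean of an empty list")
    return sum(vector) / len(vector)


print("Mean:", calculate_mean_checked([10, 20, 30, 40, 50]))

try:
    calculate_mean_checked([])
except ValueError as error:
    print("Error:", error)
```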

Function to Calculate the Median

Next, we'll create a function to calculate the median. This requires sorting the dataset and then finding the middle value.

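def calculate_median(vector):
    sorted_vector = sorted(vector)  # sorted() leaves the original list unchanged
    n = len(sorted_vector)
    mid = n // 2  # index of the middle (or upper-middle) element

    if n % 2 == 0:
        # Even count: average the two middle values.
        return (sorted_vector[mid - 1] + sorted_vector[mid]) / 2
    else:
        # Odd count: the middle value is the median.
        return sorted_vector[mid]


# Example usage:
data = [10, 20, 30, 40, 50]
median = calculate_median(data)
print("Median:", median)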

This function sorts the vector and checks if the number of elements is even or odd to determine how to calculate the median.
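The mode can be computed in the same spirit. Below is a minimal sketch, assuming we want every value tied for the highest frequency returned as a list. The name calculate_mode and the choice to return an empty list for empty input are assumptions made for this example.

```python
from collections import Counter


def calculate_mode(vector):
    """Return a list of every value tied for the highest frequency."""
    if not vector:
        return []  # assumption: an empty input has no mode
    counts = Counter(vector)        # maps each value to how often it occurs
    highest = max(counts.values())  # the largest frequency in the data
    return [value for value, count in counts.items() if count == highest]


# Example usage:
data = [10, 20, 20, 30, 30, 40]
print("Mode:", calculate_mode(data))

colors = ["red", "blue", "red", "green"]
print("Mode:", calculate_mode(colors))
```

Returning a list keeps the function honest about ties: the first dataset has two modes (20 and 30), and the second shows that the same code works for categories as well as numbers.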

https://www.youtube.com/watch?v=zjHfAhcU6kE

Mean, Median, and Mode of Grouped Data & Frequency Distribution Tables Statistics

Reliability of Sample Mean with Increasing Sample Size

The reliability of the sample mean improves as the sample size increases. This concept is rooted in the Law of Large Numbers. The Law states that "as the size of a sample increases, the sample mean gets closer to the population mean".

That's heavy on your ears, ehn? Okay, let's make it more practical.

  1. Small sample sizes can be heavily influenced by outliers and may not represent the population accurately.

  2. Larger sample sizes tend to average out anomalies and provide a more accurate estimation of the population mean.

To visualize this, imagine you’re trying to estimate the average height of people in a city by measuring a few individuals. If you measure only ten people, your estimate could be far off due to random variations. However, if you measure a thousand people, your estimate will be much closer to the true average height of the entire city.

Increasing the sample size reduces the variability of the sample mean and enhances its accuracy as an estimator of the population mean.
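The effect can be simulated. The sketch below assumes a made-up city where heights follow a normal distribution with a true mean of 170 cm and a standard deviation of 10 cm (both numbers are invented for illustration). It compares how much the sample mean wobbles across repeated samples of 10 people versus 1000 people.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable


def sample_mean(sample_size, true_mean=170.0, spread=10.0):
    """Draw sample_size simulated heights and return their mean."""
    heights = [random.gauss(true_mean, spread) for _ in range(sample_size)]
    return sum(heights) / sample_size


def spread_of_sample_means(sample_size, repeats=200):
    """Standard deviation of the sample mean across repeated samples."""
    means = [sample_mean(sample_size) for _ in range(repeats)]
    return statistics.stdev(means)


spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)

print("Spread of the mean with 10 people:  ", round(spread_small, 2))
print("Spread of the mean with 1000 people:", round(spread_large, 2))
```

The spread for samples of 1000 should come out far smaller than for samples of 10, which is the "reduced variability" described above.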


How are mean, median, and mode used in data analysis?

The 3 most common measures of central tendency are the mode, median, and mean. Mode: the most frequent value. Median: the middle number in an ordered dataset. Mean: the sum of all values divided by the total number of values.

When are mean, median, and mode appropriately used?

The mode is the best measure of central tendency (indeed, the only appropriate one) when dealing with nominal data. The mean and/or median are usually preferred for all other types of data, but this does not mean the mode is never used with those data types.
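A quick sketch of why nominal data leaves the mode as the only option. The survey answers below are hypothetical.

```python
from collections import Counter

# Hypothetical survey answers: nominal data with no numeric meaning.
favorite_colors = ["red", "blue", "red", "green", "blue", "red"]

# The mode is well defined: it is simply the most common category.
most_common_color, frequency = Counter(favorite_colors).most_common(1)[0]
print("Mode:", most_common_color, "appears", frequency, "times")

# The mean is not: adding category labels together makes no sense,
# and Python refuses to do it.
try:
    mean_color = sum(favorite_colors) / len(favorite_colors)
except TypeError:
    mean_color = None
    print("Mean: not defined for nominal data")
```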

In Conclusion

When you understand these measures of centrality and their calculations, you can better interpret data and make informed decisions based on statistical analysis.

Whether it's simple datasets or complex statistical models, it is important to grasp the foundational concepts.

