📈 Maths #002 - Statistics - Descriptive Statistics - Part 1📊
Understanding the Basics...
Hello TWFers & Hello November, 🌟
As we close out the year, our journey here at The Weekend Freelancer is just starting. Every day I see more and more interaction with my content, and it really encourages me to build quality content for you guys. Thanks a lot! 🙏
On to the “ordre du jour”: today, in this second edition of MATHS, we start our exploration of Statistics! It’s one of my favorite subjects to study. In fact, if I remember correctly, this was the only subject during my engineering studies in which I got a perfect grade. Now, seeing the impact this subject has had on my career, I don’t shy away from bragging about my grade. 📈📚😂
So, What is Statistics?
At its core, statistics is the science and methodology behind collecting, analyzing, interpreting and presenting data. It is not merely a set of mathematical tools; it's a comprehensive approach to understanding the patterns and structures hidden within data.
Statistics provides a framework to process information, make informed decisions and extract meaningful insights from vast and often complex datasets. From simple calculations to sophisticated models, it empowers us to derive knowledge from the numbers and make predictions about the world around us. 📊📈🔍
A Glimpse into History: The Birth of Statistics 📜
I don’t know whether you are a history nerd or not, but as I grow older, my desire to get to the origins of things keeps increasing. That’s why I try to add some history to our newsletters. Is that something you value too? 🤓
The foundations of statistics can be traced back through the annals of human civilization. Ancient societies collected and analyzed data to inform agricultural practices, trading decisions, and governance. However, the formal discipline of statistics began to take shape in the 17th century. Scholars like Sir William Petty, John Graunt, and later, visionaries such as Sir Francis Galton and Karl Pearson, contributed to the systematic development of statistical methods. They established concepts and tools that laid the groundwork for the statistical theories and techniques used today.
Fast-forward to the 20th century, and Sir Ronald Fisher emerged as a key figure. His work in statistical theory and methodology laid the foundation for many modern statistical techniques. Fisher's contributions, particularly in experimental design, hypothesis testing, and the development of the analysis of variance, have had a profound and lasting impact on the field of statistics. 🌐📊📚
Statistics is around you, every day… (no really)
Statistics is the silent force shaping our everyday choices, influencing health decisions, educational policies, economic forecasts and beyond. It guides the development of governmental policies, shapes marketing strategies and aids advancements in technology.
From understanding consumer behavior to predicting election outcomes, statistics underpins these crucial decisions, often operating inconspicuously behind the scenes. It’s the tool behind tailored streaming recommendations, optimized traffic flow and personalized healthcare approaches.
Moreover, in the realm of analytics, statistics is the bedrock upon which insights are built. It unravels hidden patterns, aids in predictive modeling, and forms the foundation for risk assessment in fields such as finance, healthcare, marketing, and numerous other domains. Statistics isn't just about numbers; it's the language that empowers analysts to decode data, construct models and make informed decisions based on empirical evidence.✨
Now that you have some background on Statistics, let’s talk about
Descriptive Statistics -
Descriptive statistics, in essence, are the tools and methods used to summarize and describe the essential features of a dataset. They provide a clear and concise understanding of the information within the data, helping to distill large volumes of numbers into comprehensible and meaningful insights.
At its core, descriptive statistics encompasses various measures that aid in simplifying complex data. These measures include central tendency metrics such as the mean, median, and mode, which offer a snapshot of the 'typical' value in a dataset. 🎯
Measures of variability, like the range and standard deviation, depict the spread or dispersion of the data points. These statistics are essential in capturing the distribution, shape and the overall pattern of the data, allowing for a better understanding of the underlying characteristics. 🌐📏📉
By employing histograms, bar charts, or box plots, descriptive statistics also provide visual representations of the data, making it easier to interpret and communicate insights. They serve as the initial steps in analyzing data, providing a comprehensive framework to explore, interpret, and communicate the nature of the information, enabling individuals to make informed decisions based on a clear understanding of the data's characteristics and patterns.
Today we will start with a very basic but important topic in Descriptive Statistics -
Central tendency measures
Central tendency is like finding the middle or 'center' of a group of numbers. It helps us understand what the 'typical' or 'average' value is in a set of data. Imagine you have a group of numbers, say, the ages of a few people. Central tendency measures like the mean, median, and mode help us figure out what number is most typical in that group.
1. Mean (Average):
The mean is the average of a set of numbers. To find the mean, you add up all the numbers and divide by how many numbers there are.
For instance, consider the ages of five friends: 20, 22, 25, 28, and 30. To find the mean age, add up all the ages (20 + 22 + 25 + 28 + 30 = 125) and then divide by the number of friends (5). The mean age is 25.
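The calculation above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Python's built-in statistics module; the variable names are my own and purely illustrative.

```python
from statistics import mean

# Ages of the five friends from the example above.
ages = [20, 22, 25, 28, 30]

# Manual calculation: add everything up, then divide by the count.
total = sum(ages)
manual_mean = total / len(ages)

# The standard library should give the same result.
library_mean = mean(ages)

print(total)                        # 125
print(manual_mean)                  # 25.0
print(manual_mean == library_mean)  # True
```

Doing it by hand first and then checking against the library is a handy habit when you are learning a new measure.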
2. Median:
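The median is the middle value when the numbers are arranged in ascending or descending order. When there is an even number of values, it is the average of the two middle values.
For instance, if you have the ages of a group of people: 18, 20, 22, 25, 30, 40. With six values there is no single middle one, so the median age is the average of the two middle values, 22 and 25, which is 23.5.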
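Here is a small Python sketch of how the median can be found, both by hand and with the built-in statistics module. The helper name manual_median is hypothetical, made up for this illustration. It sorts the values, then picks the middle one for an odd count, or averages the two middle ones for an even count.

```python
from statistics import median


def manual_median(values):
    """Sort the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: there is exactly one middle value.
        return ordered[middle]
    # Even count: average the two values around the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_ages = [20, 22, 25, 28, 30]        # five values
even_ages = [18, 20, 22, 25, 30, 40]   # six values

print(manual_median(odd_ages))   # 25
print(manual_median(even_ages))  # 23.5
print(median(even_ages))         # 23.5
```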
3. Mode:
The mode is the number that appears most frequently in a dataset.
For example, take a list of test scores: 85, 90, 75, 90, 80. Here, the mode is 90 because it appears twice, while every other score appears only once.
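Finding the mode is really just counting. The sketch below is one possible way to do it in Python, assuming the standard library's Counter and statistics.multimode (available from Python 3.8). The second list, tied_scores, is a made-up example to show that a dataset can have more than one mode.

```python
from collections import Counter
from statistics import multimode

# Test scores from the example above.
scores = [85, 90, 75, 90, 80]

# Count how often each score appears and take the most frequent one.
counts = Counter(scores)
value, frequency = counts.most_common(1)[0]
print(value, frequency)  # 90 2

# multimode returns every value that shares the highest count.
print(multimode(scores))  # [90]

# Hypothetical data with a tie: both 1 and 2 appear twice.
tied_scores = [1, 1, 2, 2, 3]
print(multimode(tied_scores))  # [1, 2]
```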
Each measure serves a distinct purpose based on the nature of the dataset and the analytical requirements. The table below illustrates the advantages, disadvantages, and ideal usage scenarios for these measures. This comparative overview can guide the selection of the most suitable measure depending on the characteristics of the dataset and the specific analytical objectives.🔍
Don’t know what normal distribution, outliers, and skewness mean?
No problem, because next time those are the topics we will cover alongside Understanding Variability and Data Spread.
This is not the end yet, because it’s your turn to practice what you just learnt today! 🧠✍️
Homework or 😂 Substack work -
Imagine a dataset representing the heights of students in a college basketball team:
Heights (in inches): 68, 70, 72, 72, 74, 74, 76, 78, 82, 86
Now, based on this dataset:
Question for You:
In this scenario, which measure of central tendency (Mean, Median or Mode) would best represent the typical height of the basketball team?
Why would you choose a specific measure given the nature of this dataset?
Comment below with your answers! Would love to see your views.
Until the weekend,
Raghunandan 🎯
P.S. - “The Weekend Freelancer” is a reader-backed publication. Share this newsletter with your friends and relatives, and consider becoming a free or paid member. Every subscription gives me an extra ounce of motivation to keep going! 💪