Statistics Basics in R
Statistics helps us understand data by summarizing it, identifying patterns, and making informed decisions.
In this lesson, you will learn the most important statistical concepts and how to calculate them using R.
What Is Statistics?
Statistics is the science of collecting, analyzing, interpreting, and presenting data.
In data analysis, statistics helps answer questions like:
- What is the average value?
- How spread out is the data?
- Are there extreme values?
Types of Statistics
There are two main types of statistics:
- Descriptive Statistics – Summarizes data
- Inferential Statistics – Draws conclusions about a larger population from a sample of data
In this lesson, we focus on descriptive statistics.
Mean (Average)
The mean is the sum of all values divided by the number of values.
It describes the center of the data, but a few extreme values can pull it strongly in their direction.
numbers <- c(10, 20, 30, 40, 50)
mean(numbers)
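As a check on the definition above, the mean can be computed by hand with sum() and length(). The name manual_mean is only illustrative; in everyday code, mean() is the function to use.

```r
numbers <- c(10, 20, 30, 40, 50)

# Mean = sum of the values divided by the number of values
manual_mean <- sum(numbers) / length(numbers)

manual_mean                   # 30
mean(numbers) == manual_mean  # TRUE
```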
Median
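The median is the middle value when the data is sorted. With an even number of values, it is the average of the two middle values.
Because it depends only on the order of the values, the median is much less affected by extreme values than the mean.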
median(numbers)
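To see why the median is preferred when extreme values are present, the sketch below appends one made-up outlier (1000) and compares the two measures. The vector with_outlier is a hypothetical example.

```r
numbers <- c(10, 20, 30, 40, 50)
with_outlier <- c(numbers, 1000)  # hypothetical extreme value appended

mean(with_outlier)    # about 191.67: pulled far above most of the data
median(with_outlier)  # 35: six values, so the average of the middle two (30 and 40)
```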
Mode
The mode is the value that appears most frequently.
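R's built-in mode() function returns an object's storage mode (for example, "numeric"), not the statistical mode, so we calculate the mode manually by counting how often each value appears.
values <- c(2, 4, 4, 6, 8, 4)
counts <- table(values)  # frequency of each distinct value
mode_value <- as.numeric(names(counts)[which.max(counts)])  # most frequent value, converted back from text to a number
mode_value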
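The code above keeps only one value when several values tie for the highest count. Below is a minimal sketch of a helper that returns every tied mode. stat_mode is a hypothetical name, not a base R function, chosen so it does not clash with the built-in mode(). It assumes numeric input.

```r
# Hypothetical helper: returns every value that shares the highest frequency
# (assumes a numeric vector as input)
stat_mode <- function(x) {
  counts <- table(x)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # all values with the top count
  as.numeric(modes)                              # table names are text; convert back
}

stat_mode(c(2, 4, 4, 6, 8, 4))  # 4
stat_mode(c(1, 1, 2, 2, 3))     # 1 2 (a tie)
```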
Minimum and Maximum
These values show the smallest and largest observations in a dataset.
They help identify the range of data.
min(numbers)
max(numbers)
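Real data often contains missing values (NA). min() and max() return NA if any value is missing unless you pass na.rm = TRUE, and the same argument is accepted by mean(), median(), var(), and sd(). The vector with_na is a made-up example.

```r
with_na <- c(10, NA, 30, 5)  # hypothetical data with one missing value

min(with_na)                # NA, because one value is missing
min(with_na, na.rm = TRUE)  # 5: NA is dropped before computing
max(with_na, na.rm = TRUE)  # 30
```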
Range
The range is the difference between the maximum and minimum values.
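It gives a quick sense of how spread out the data is, although a single extreme value can stretch it. In R, range() returns the minimum and maximum as a pair of numbers; wrap it in diff() to get the range itself.
range(numbers)        # minimum and maximum: 10 50
diff(range(numbers))  # maximum minus minimum: 40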
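Since the range is defined as the maximum minus the minimum, diff(range(x)) should match a direct subtraction. A quick check, where spread is just an illustrative name:

```r
numbers <- c(10, 20, 30, 40, 50)

spread <- max(numbers) - min(numbers)  # range by direct subtraction: 40
spread == diff(range(numbers))         # TRUE
```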
Variance
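Variance measures how far values are spread from the mean, based on the squared difference between each value and the mean. R's var() returns the sample variance.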
A higher variance means greater variability.
var(numbers)
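To make the formula concrete, the sketch below reproduces var() by hand: the sum of squared deviations from the mean divided by n - 1, which is the sample definition R uses. Dividing by n instead gives the population variance. The names manual_var and pop_var are illustrative.

```r
numbers <- c(10, 20, 30, 40, 50)

deviations <- numbers - mean(numbers)  # -20 -10 0 10 20
n <- length(numbers)

# Sample variance: sum of squared deviations divided by (n - 1)
manual_var <- sum(deviations^2) / (n - 1)
# Population variance: divide by n instead
pop_var <- sum(deviations^2) / n

manual_var                           # 250
pop_var                              # 200
all.equal(manual_var, var(numbers))  # TRUE
```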
Standard Deviation
Standard deviation is the square root of variance.
It is easier to interpret than variance because it is expressed in the same units as the data, while variance is in squared units.
sd(numbers)
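Because the standard deviation is the square root of the variance, sqrt(var(x)) and sd(x) agree; like var(), sd() uses the sample (n - 1) definition. all.equal() is used instead of == to allow for floating-point rounding.

```r
numbers <- c(10, 20, 30, 40, 50)

sd(numbers)                                 # about 15.81, in the same units as numbers
all.equal(sd(numbers), sqrt(var(numbers)))  # TRUE
```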
Summary Statistics
R's summary() function calculates several statistics at once: the minimum, first quartile, median, mean, third quartile, and maximum.
This is commonly used in exploratory data analysis.
summary(numbers)
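The quartiles that summary() reports can also be computed directly with quantile(), which is handy when you need a specific percentile. This sketch uses quantile()'s default method; the name q is illustrative.

```r
numbers <- c(10, 20, 30, 40, 50)

# 25th and 75th percentiles (first and third quartiles)
q <- quantile(numbers, probs = c(0.25, 0.75))
q  # 25%: 20, 75%: 40

# The same first quartile as reported by summary()
as.numeric(summary(numbers)["1st Qu."])  # 20
```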
Why Statistics Matters
- Helps understand data behavior
- Identifies patterns and trends
- Supports data-driven decisions
- Foundation for machine learning
📝 Practice Exercises
Exercise 1
Find the mean and median of a numeric vector.
Exercise 2
Calculate the minimum and maximum values.
Exercise 3
Compute variance and standard deviation.
Exercise 4
Use the summary function on a dataset.
✅ Practice Answers
Answer 1
data <- c(5, 10, 15, 20)
mean(data)
median(data)
Answer 2
min(data)
max(data)
Answer 3
var(data)
sd(data)
Answer 4
summary(data)
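summary() also works on a whole data frame, reporting the statistics for each column separately. The sketch below uses R's built-in mtcars dataset as an example; cars_subset is an illustrative name.

```r
# mtcars ships with R (datasets package); keep two numeric columns
cars_subset <- mtcars[, c("mpg", "hp")]

summary(cars_subset)  # one set of statistics per column
```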
What’s Next?
In the next lesson, you will learn about hypothesis testing, where statistics is used to make decisions based on data.