Boxplots and Scatterplots
While bar charts, pie charts, and histograms help summarize data, some analytical questions require deeper insight into data spread, outliers, and relationships.
Boxplots show how a single numerical variable is distributed, and scatterplots show how two numerical variables relate to each other. Both are widely used in research, quality control, and business analytics.
Understanding Boxplots
A boxplot (box-and-whisker plot) summarizes the distribution of a numerical variable using five key values:
- Minimum
- First Quartile (Q1)
- Median
- Third Quartile (Q3)
- Maximum
Boxplots are especially useful for:
- Detecting outliers
- Comparing distributions across groups
- Understanding data spread
Example: Salary Distribution
Consider the following salary data:
| Employee_ID | Department | Monthly_Salary |
|---|---|---|
| 1001 | IT | 52000 |
| 1002 | IT | 58000 |
| 1003 | HR | 42000 |
| 1004 | Sales | 75000 |
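A boxplot of Monthly_Salary shows at a glance whether salaries are spread evenly around the median or whether extreme values exist. The following syntax draws the boxplot and suppresses the descriptive statistics table: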
EXAMINE VARIABLES=Monthly_Salary
/PLOT=BOXPLOT
/STATISTICS=NONE.
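To compare distributions across groups, the same command can take a grouping variable after BY. This is a minimal sketch, assuming the active dataset has many more cases than the four rows shown above, since a box drawn from one or two salaries per department says very little.

```spss
* One box per department on a shared salary axis.
EXAMINE VARIABLES=Monthly_Salary BY Department
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.
```

Here /NOTOTAL leaves out the boxplot for all departments combined, so only the per-department boxes are drawn.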
Interpreting a Boxplot
Key interpretation points:
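- The box spans the first to the third quartile, so it contains the middle 50% of the data
- The line inside the box represents the median
- The whiskers extend to the smallest and largest values that are not outliers
- Points outside the whiskers indicate outliers. SPSS draws values more than 1.5 box lengths from the edge of the box as circles and values more than 3 box lengths away as asterisks (extreme values)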
Outliers should be investigated, not automatically removed. They may represent valid but rare observations.
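One way to investigate flagged points is to label them and list the most extreme cases next to the quartiles. The sketch below assumes the dataset contains the Employee_ID and Monthly_Salary variables from the example table and has more cases than the four rows shown. The choice of three extreme values per side is arbitrary.

```spss
* Label outliers with the employee ID instead of the case number.
* Also list the 3 highest and 3 lowest salaries and the quartiles.
EXAMINE VARIABLES=Monthly_Salary
  /ID=Employee_ID
  /PLOT=BOXPLOT
  /STATISTICS=EXTREME(3)
  /PERCENTILES(25,50,75)=HAVERAGE.
```

The Extreme Values table identifies which employees sit at the ends of the distribution, which makes it easier to check whether a flagged salary is a data entry error or a valid but rare observation.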
Understanding Scatterplots
Scatterplots visualize the relationship between two numerical variables. Each point represents one observation.
Scatterplots help answer questions like:
- Does salary increase with experience?
- Is there a relationship between study time and exam score?
The direction in which the points run indicates the type of relationship, and how tightly they cluster around a line or curve indicates its strength.
Example: Experience vs Salary
| Experience_Years | Monthly_Salary |
|---|---|
| 1 | 35000 |
| 3 | 42000 |
| 5 | 55000 |
| 8 | 70000 |
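To try the example without typing values into the Data Editor, the table can be entered as inline data. This is a minimal sketch. The FORMATS line is optional and only changes how the values are displayed.

```spss
* Enter the four example cases as inline data.
DATA LIST FREE / Experience_Years Monthly_Salary.
BEGIN DATA
1 35000
3 42000
5 55000
8 70000
END DATA.

* Show whole numbers instead of the default two decimals.
FORMATS Experience_Years (F2.0) Monthly_Salary (F8.0).
```

Once these commands have run, the two variables are available in the active dataset for plotting.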
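The following syntax plots Experience_Years on the horizontal axis and Monthly_Salary on the vertical axis:
GRAPH
  /SCATTERPLOT(BIVAR)=Experience_Years WITH Monthly_Salary.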
Interpreting Scatterplots
Key patterns to look for:
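- Positive relationship – as one variable increases, the other tends to increase, so the points rise from left to right
- Negative relationship – as one variable increases, the other tends to decrease, so the points fall from left to right
- No relationship – the points are scattered with no visible trend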
Scatterplots are often examined before correlation or regression analysis to check that the relationship is roughly linear and to spot unusual points.
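As a follow-up to the plot, the strength of a linear relationship can be summarized with a Pearson correlation. The sketch below assumes the Experience_Years and Monthly_Salary variables from the example table are in the active dataset.

```spss
* Pearson correlation between experience and salary.
CORRELATIONS
  /VARIABLES=Experience_Years Monthly_Salary
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```

For the four example cases the coefficient works out to roughly 0.996, a strong positive relationship. With so few cases this figure is only an illustration, not evidence about salaries in general.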
Common Mistakes
Beginners often make these mistakes:
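- Using scatterplots for categorical variables, which call for bar charts or grouped boxplots instead
- Ignoring outliers in boxplots instead of investigating them
- Assuming that a relationship seen in a scatterplot proves causation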
Correct interpretation is essential to avoid misleading conclusions.
Quiz 1
What does a boxplot primarily show?
Data distribution and outliers.
Quiz 2
Which variables are suitable for scatterplots?
Two numerical variables.
Quiz 3
What does an outlier represent?
An unusually high or low value.
Quiz 4
What pattern indicates a positive relationship?
Points trending upward from left to right.
Quiz 5
Why are scatterplots used before regression?
To visually assess relationships between variables.
Mini Practice
Create a dataset with:
- Experience_Years
- Monthly_Salary
Perform:
- A boxplot for Monthly_Salary
- A scatterplot between Experience_Years and Monthly_Salary
Use Analyze → Descriptive Statistics → Explore for boxplots and Graphs → Chart Builder for scatterplots.
What’s Next
In the next lesson, you will learn how to save, export, and manage SPSS output, which is essential for reporting and documentation.