Interpreting Regression Output and R-Squared
In the previous lesson, we learned how to build a simple linear regression model.
In practice, regression results are usually reported in tables generated by statistical software (Excel, Python, R, SPSS, etc.).
This lesson focuses on understanding what those numbers actually mean.
What Is Regression Output?
Regression output summarizes how well the model fits the data and how each variable contributes to the prediction.
Although the layout varies between software packages, the core components are essentially the same.
Typical Regression Output Table
| Term | Coefficient | Standard Error | t-value | p-value |
|---|---|---|---|---|
| Intercept | 40 | 5 | 8.0 | 0.001 |
| Hours Studied | 6 | 0.8 | 7.5 | 0.002 |
We now break down each column.
Coefficients
The coefficient is the estimated change in the outcome associated with the variable.
- The intercept is the predicted value of Y when X = 0
- The slope tells us how much the predicted Y changes for a one-unit increase in X
In this example:
- Intercept = 40 → predicted score with 0 study hours
- Hours Studied coefficient = 6 → each additional hour studied is associated with a 6-point increase in the predicted score
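The two coefficients define the fitted equation: predicted score = 40 + 6 × hours. A minimal Python sketch of using it for prediction (the function name `predict_score` is a hypothetical helper):

```python
# Coefficients taken from the example regression table.
INTERCEPT = 40.0  # predicted score at 0 hours studied
SLOPE = 6.0       # change in predicted score per additional hour studied


def predict_score(hours: float) -> float:
    """Predicted exam score from the fitted line: 40 + 6 * hours."""
    return INTERCEPT + SLOPE * hours


for h in (0, 1, 5):
    # Prints predicted scores of 40, 46 and 70.
    print(f"{h} hours -> predicted score {predict_score(h):.0f}")
```

Each additional hour raises the prediction by exactly the slope, 6 points.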
Standard Error
The standard error measures the uncertainty in the estimated coefficient, that is, how much the estimate would be expected to vary from one sample to another.
Smaller standard error means:
- More precise estimate
- Narrower confidence intervals around the coefficient
t-value
The t-value is the coefficient divided by its standard error (for Hours Studied, 6 / 0.8 = 7.5).
It answers the question:
“How far is this estimate from zero, relative to its variability?”
Larger absolute t-values indicate stronger evidence that the coefficient is not zero.
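Computing the t-values from the table takes one division per row. This sketch reproduces the t-value column from the coefficient and standard-error columns; `t_value` is an illustrative helper, not a library function.

```python
# (term, coefficient, standard error) from the example table
rows = [
    ("Intercept", 40.0, 5.0),
    ("Hours Studied", 6.0, 0.8),
]


def t_value(coefficient: float, std_error: float) -> float:
    """t-statistic for testing whether the true coefficient is zero."""
    return coefficient / std_error


for term, coef, se in rows:
    print(f"{term}: t = {t_value(coef, se):.1f}")  # 8.0 and 7.5
```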
p-value
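The p-value is the probability of obtaining a t-value at least as extreme as the one observed if the true coefficient were zero. A coefficient is called statistically significant when its p-value is at or below a significance level α chosen in advance (commonly 0.05).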
Decision rule:
- p-value ≤ α → coefficient is statistically significant
- p-value > α → not statistically significant
In our example, both p-values (0.001 and 0.002) are well below α = 0.05, so both coefficients are statistically significant at that level.
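The decision rule translates directly into code. This sketch assumes the conventional α = 0.05 (the table does not state a level) and applies the rule to the two p-values from the example table; `is_significant` is a hypothetical helper name.

```python
ALPHA = 0.05  # assumed significance level; the table does not specify one

# p-values from the example table
p_values = {"Intercept": 0.001, "Hours Studied": 0.002}


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Decision rule: significant when p-value <= alpha."""
    return p_value <= alpha


for term, p in p_values.items():
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"{term}: p = {p} -> {verdict}")
```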
What Is R-Squared?
R-squared (R²) measures how much of the variability in the dependent variable is explained by the model.
For an ordinary least-squares model with an intercept, its value lies between 0 and 1.
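R² can be computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals (observed minus predicted) and SS_tot is the sum of squared deviations of Y from its mean. The sketch below fits a least-squares line to five hypothetical (made-up) students and computes R² that way; the data and the helper names `fit_line` and `r_squared` are illustrative assumptions, not part of any library.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, y_hat):
    """R² = 1 - SS_res / SS_tot for observed y and predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # variation left unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total variation around the mean
    return 1 - ss_res / ss_tot


# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [48, 50, 60, 62, 70]

intercept, slope = fit_line(hours, scores)
predicted = [intercept + slope * h for h in hours]
print(f"R^2 = {r_squared(scores, predicted):.3f}")  # R^2 = 0.956
```

As a sanity check, predicting every value as the mean of Y gives SS_res = SS_tot and R² = 0, while predicting every value exactly gives SS_res = 0 and R² = 1.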
Interpreting R-Squared
| R² Value | Meaning |
|---|---|
| 0.00 | No explanatory power (no better than predicting the mean of Y) |
| 0.50 | 50% of variability explained |
| 1.00 | Perfect fit: 100% of variability explained |
If R² = 0.72, it means 72% of the variation in Y is explained by the model.
Important Notes About R-Squared
- A high R² does not imply causation
- A low R² does not mean the model is useless
- What counts as a "good" R² depends on context and field
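To see why a low R² does not make a model useless, this sketch simulates hypothetical data in which Y truly increases by 1.0 per unit of X but is buried in noise. The sample size (1,000), the noise level (standard deviation 10), and the seed are arbitrary assumptions, and `fit_with_stats` is an illustrative helper. With this setup the slope is typically estimated close to 1.0 with a large t-value, while R² stays small, because most of the variation in Y comes from the noise rather than from X.

```python
import math
import random


def fit_with_stats(x, y):
    """OLS fit for one predictor; returns the slope, its t-value, and R²."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se_slope, 1 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.uniform(0, 10) for _ in range(1000)]
# Simulated data: true slope 1.0, but noise with standard deviation 10.
y = [1.0 * xi + rng.gauss(0, 10) for xi in x]

slope, t, r2 = fit_with_stats(x, y)
print(f"slope = {slope:.2f}, t = {t:.1f}, R^2 = {r2:.2f}")
```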
Real-World Interpretation
In studies of human behavior, R² values are often low, because many unmeasured factors influence the outcome.
In physical systems governed by well-understood laws, R² values tend to be much higher.
Always interpret R² within the problem domain.
Common Misinterpretations
- Assuming R² close to 1 means a perfect model
- Ignoring residual patterns
- Focusing only on R² and not coefficients
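The first two misinterpretations can be seen together in a small hypothetical example. Fitting a straight line to data that actually follow y = x² still produces a high R² (about 0.95). However, the residuals follow a clear U-shaped pattern: positive at both ends and negative in the middle. This shows that a straight line is the wrong shape for the data. The helper `fit_line` is illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical curved relationship: y grows with the square of x.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_y = sum(y) / len(y)
r2 = 1 - sum(e ** 2 for e in residuals) / sum((yi - mean_y) ** 2 for yi in y)

print(f"R^2 = {r2:.2f}")        # R^2 = 0.95, despite the wrong model shape
print("residuals:", residuals)  # 12, 4, -2, -6, -8, -8, -6, -2, 4, 12
```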
Quick Check
What does a statistically significant coefficient mean?
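The data provide evidence that the true coefficient is not zero, that is, that the variable is associated with the outcome. On its own, this does not establish a causal effect or a large effect.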
Practice Quiz
Question 1:
What does R-squared measure?
The proportion of variability explained by the model.
Question 2:
Can a model have significant coefficients but low R²?
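Yes. This is common with noisy data: a predictor can have a real, reliably detected relationship with Y while most of the variation in Y comes from other factors.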
Question 3:
Does a low p-value prove causation?
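No. A low p-value only indicates a statistical association that would be unlikely if the true coefficient were zero. It does not establish causation.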
Mini Practice
A regression model reports:
- Coefficient for X = 3.5
- p-value = 0.01
- R² = 0.40
Interpret these results.
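X has a statistically significant positive association with Y (p = 0.01 is below the conventional α = 0.05). Each one-unit increase in X is associated with a 3.5-unit increase in predicted Y. The model explains 40% of the variation in Y.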
What’s Next
In the next lesson, we will study Regression Residuals and Assumptions, which help validate whether a regression model is reliable.