Correlation & Regression | Dataplexa

Correlation & Regression in R

Correlation and regression are statistical techniques used to understand relationships between variables.

They help answer questions like whether variables move together and how one variable can be used to predict another.


What Is Correlation?

Correlation measures the strength and direction of a relationship between two numeric variables.

It tells us whether the variables tend to move in the same direction (both rising or both falling) or in opposite directions.


Correlation Coefficient

The correlation coefficient ranges between -1 and +1.

  • +1 → Perfect positive relationship
  • 0 → No linear relationship
  • -1 → Perfect negative relationship

Calculating Correlation in R

R provides the cor() function to calculate correlation.

Both vectors must be numeric and of the same length.

x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 55)

cor(x, y)
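
To see the ends of the coefficient's range in practice, here is a minimal sketch using small made-up vectors (the names base, up, down, and bowl are illustrative and not part of the lesson). The last case is a reminder that a value of 0 only rules out a linear relationship, not every kind of relationship.

```r
# Illustrative data: all vector names here are assumptions for this sketch.
base <- c(-2, -1, 0, 1, 2)
up   <- 2 * base + 1   # rises in a straight line with base
down <- -3 * base      # falls in a straight line with base
bowl <- base^2         # determined by base, but not along a line

r_up   <- cor(base, up)    # perfect positive relationship
r_down <- cor(base, down)  # perfect negative relationship
r_bowl <- cor(base, bowl)  # no linear relationship

print(c(up = r_up, down = r_down, bowl = r_bowl))
```

Plotting the data, for example with plot(base, bowl), is a quick way to spot curved patterns that the coefficient alone would hide.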

Types of Correlation

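The method argument of cor() selects how the correlation is calculated. The right choice depends on the data and on the kind of relationship you expect.

  • Pearson – Strength of a linear relationship (default)
  • Spearman – Rank-based; captures monotonic relationships, even curved ones
  • Kendall – Rank-based; compares the ordering of pairs, often used with ordinal data or small samples

cor(x, y, method = "spearman")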
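
The difference between the methods shows up when a relationship is consistent but not linear. The sketch below uses made-up data (x_mono and y_mono are hypothetical names) in which y always increases with x, but along a curve.

```r
# Illustrative data: y grows with x, but along a curve rather than a line.
x_mono <- 1:10
y_mono <- x_mono^3

pearson  <- cor(x_mono, y_mono)                       # default method
spearman <- cor(x_mono, y_mono, method = "spearman")  # correlation of ranks
kendall  <- cor(x_mono, y_mono, method = "kendall")   # agreement in pair ordering

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

Because the ordering of the values is perfectly preserved, the two rank-based methods report a perfect relationship, while Pearson comes out below 1 because the points do not fall on a straight line.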

What Is Regression?

Regression models how a dependent (response) variable changes as an independent (predictor) variable changes.

It is commonly used for prediction and trend analysis.


Simple Linear Regression

Simple linear regression models the relationship between one independent variable and one dependent variable.

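The fitted equation is a straight line, y = b0 + b1 * x, where b0 is the intercept and b1 is the slope. In lm(), the formula y ~ x reads as "y modeled by x".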

model <- lm(y ~ x)
model
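
Printing the model shows the estimated coefficients, but it is often handy to pull them out as numbers. The following sketch repeats the lesson's vectors so it runs on its own; the names estimates, intercept, slope, and by_hand are illustrative.

```r
# Same vectors as in the lesson, repeated so the snippet is self-contained.
x <- c(10, 20, 30, 40, 50)
y <- c(15, 25, 35, 45, 55)

model <- lm(y ~ x)

# coef() returns a named numeric vector with entries "(Intercept)" and "x".
estimates <- coef(model)
intercept <- unname(estimates["(Intercept)"])
slope     <- unname(estimates["x"])

# Rebuild the fitted line by hand and compare it with fitted().
by_hand <- intercept + slope * x
print(all.equal(by_hand, unname(fitted(model))))
```

For this data every y value is exactly x + 5, so the intercept and slope should come out at (approximately) 5 and 1.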

Understanding Regression Output

The regression model provides important information such as:

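  • Intercept – the predicted value of the dependent variable when the independent variable is 0
  • Slope – the predicted change per one-unit increase in the independent variable
  • Residuals – the differences between observed and fitted values

The summary() function prints these together with standard errors, p-values, and R-squared.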

summary(model)
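
The pieces of the summary can also be accessed individually. The sketch below uses hypothetical data (hours studied and test scores, invented for illustration) that does not lie exactly on a line; with perfectly linear data such as the lesson's vectors, R may warn that the fit is essentially perfect and the summary unreliable.

```r
# Hypothetical data for illustration: hours studied and test score.
hours <- c(1, 2, 3, 4, 5)
score <- c(52, 55, 61, 64, 70)

fit  <- lm(score ~ hours)
info <- summary(fit)

coef_table <- info$coefficients  # estimates, standard errors, t and p values
r_squared  <- info$r.squared     # share of variance explained by the line
resid_vals <- residuals(fit)     # observed score minus fitted score

print(coef_table)
print(r_squared)
print(round(resid_vals, 2))
```

In simple linear regression, R-squared equals the squared Pearson correlation between the two variables, which ties this output back to cor().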

Making Predictions

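Regression models can be used to predict the dependent variable for new values of the independent variable.

This is one of the most practical uses of regression analysis. The new values must be passed as a data frame whose column name matches the predictor used in the model formula.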

new_data <- data.frame(x = c(60, 70))
predict(model, new_data)
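
A single predicted number hides how uncertain the prediction is. As a minimal sketch, assuming the same hypothetical hours and score data as an illustration, predict() can also return a prediction interval through its interval argument.

```r
# Hypothetical data for illustration: hours studied and test score.
hours <- c(1, 2, 3, 4, 5)
score <- c(52, 55, 61, 64, 70)
fit   <- lm(score ~ hours)

# The column name must match the predictor used in the formula.
new_hours <- data.frame(hours = c(6, 7))

# Point predictions only.
point_estimates <- predict(fit, newdata = new_hours)

# Point predictions plus lower and upper bounds of a 95% prediction interval.
with_interval <- predict(fit, newdata = new_hours,
                         interval = "prediction", level = 0.95)

print(point_estimates)
print(with_interval)
```

The result with an interval is a matrix with columns fit, lwr, and upr. Note that 6 and 7 lie outside the observed range of hours, so these predictions are extrapolations and deserve extra caution.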

Difference Between Correlation and Regression

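  • Correlation measures the strength and direction of a relationship
  • Regression models how one variable changes with another; on its own it does not establish cause and effect
  • Correlation is symmetric and gives no equation for prediction
  • Regression produces an equation that supports prediction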
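
The two ideas are closely linked, which a short check can illustrate. This sketch assumes hypothetical hours and score data; all variable names are illustrative.

```r
# Hypothetical data for illustration: hours studied and test score.
hours <- c(1, 2, 3, 4, 5)
score <- c(52, 55, 61, 64, 70)

# Correlation is symmetric: swapping the arguments changes nothing.
r_xy <- cor(hours, score)
r_yx <- cor(score, hours)

# Regression is directional: the slope depends on which variable is predicted.
slope_score_on_hours <- unname(coef(lm(score ~ hours))["hours"])
slope_hours_on_score <- unname(coef(lm(hours ~ score))["score"])

# In simple linear regression the slope is the correlation rescaled
# by the ratio of standard deviations.
slope_from_r <- r_xy * sd(score) / sd(hours)

print(c(r_xy = r_xy, r_yx = r_yx))
print(c(score_on_hours = slope_score_on_hours,
        hours_on_score = slope_hours_on_score,
        from_r = slope_from_r))
```

The correlation stays the same in both directions, while the two slopes differ because each answers a different prediction question.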

Why This Matters

Correlation and regression are used widely in data analysis, business forecasting, research, and machine learning.

They form the foundation for more advanced statistical models.


📝 Practice Exercises


Exercise 1

Calculate the correlation between two numeric vectors.

Exercise 2

Create a simple linear regression model.

Exercise 3

Display the summary of a regression model.

Exercise 4

Use a regression model to predict new values.


✅ Practice Answers


Answer 1

a <- c(2, 4, 6, 8)
b <- c(3, 6, 9, 12)

cor(a, b)

Answer 2

model <- lm(b ~ a)
model

Answer 3

summary(model)

Answer 4

new_points <- data.frame(a = c(10, 12))
predict(model, new_points)

What’s Next?

In the next lesson, you will explore Time Series Analysis, which focuses on data collected over time.