
Linear Regression — A Complete Deep Dive | ML Series Part 2

Linear Regression explained from scratch — math, types, implementation, evaluation metrics, regularization, and real-world projects in Python. Beginner to advanced.

1. Introduction — What is Linear Regression? {#introduction}

Let's start with a real-world scenario.

Imagine you're a real estate agent. You have the following data:

  • 1000 sqft → $200,000

  • 1500 sqft → $300,000

  • 2000 sqft → $400,000

Now someone asks: "What would a 2500 sqft house cost?"

Your brain immediately finds the pattern: "$200 per sqft" → 2500 × 200 = $500,000.

That's exactly what Linear Regression does — it predicts continuous numerical values based on a straight-line (linear) relationship.

Definition

Linear Regression = A mathematical model that finds the linear relationship between variables and uses it to predict future values.
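Before we get to the full sklearn workflow later in the post, here's a minimal sketch of that definition using NumPy's `polyfit` on the three houses from the scenario above (the data is the toy example, not a real dataset):

```python
import numpy as np

# The three houses from the intro: area (sqft) -> price ($)
areas  = np.array([1000, 1500, 2000])
prices = np.array([200_000, 300_000, 400_000])

# Fit a degree-1 polynomial: price = slope * area + intercept
slope, intercept = np.polyfit(areas, prices, deg=1)
print(f"Slope: ${slope:.0f}/sqft")

# Predict the 2500 sqft house
print(f"2500 sqft -> ${slope * 2500 + intercept:,.0f}")  # -> $500,000
```

The fitted slope is the "$200 per sqft" pattern your brain found instantly — the model just finds it numerically.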

Why Is It Important?

  • Simplest ML algorithm — easy to understand and explain

  • Highly interpretable — "What does increasing area by 1 sqft do to the price?" — you get a direct answer

  • Foundation — the base concept behind many other ML algorithms

  • Fast — trains quickly even on large datasets

  • Universal — used in Economics, Finance, Science, Engineering, and more

Real-World Applications

| Industry | Use Case | Predicts |
|---|---|---|
| 🏠 Real Estate | House pricing | Property value |
| 📈 Finance | Stock analysis | Future price |
| 🌡️ Science | Climate | Temperature trends |
| 🏭 Manufacturing | Quality control | Defect rate |
| 🏥 Healthcare | Dosage | Drug effectiveness |
| 🚗 Automotive | Fuel efficiency | MPG from weight |
| 📣 Marketing | Ad spend | Sales revenue |
| 🌾 Agriculture | Yield prediction | Crop output |


2. 🧮 The Math — Explained Simply {#math-behind}

Don't worry! We'll treat the math like a story.

Simple Linear Regression (1 Feature)

Remember the straight-line equation from school?

y = mx + c

Where:
y = Output (what we want to predict)
x = Input (the feature)
m = Slope (steepness of the line)
c = Intercept (where the line starts)

In ML, the same equation is written differently:

ŷ = β₀ + β₁x₁

Where:
ŷ  = Predicted value
β₀ = Intercept (bias)
β₁ = Coefficient (slope) for feature x₁
x₁ = Feature value

Real Example:

Price = β₀ + β₁ × Area

If the model learns:
β₀ = 50,000   (base price)
β₁ = 150      (price per sqft)

Then:
1000 sqft → Price = 50,000 + 150×1000 = $200,000
1500 sqft → Price = 50,000 + 150×1500 = $275,000
2500 sqft → Price = 50,000 + 150×2500 = $425,000

Multiple Linear Regression (Multiple Features)

ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ... + βₙxₙ

Example:
Price = β₀ + β₁×Area + β₂×Bedrooms + β₃×Age + β₄×Distance

If the model learns β₃ and β₄ as negative values (older, more distant homes cost less):
Price = 20,000 + 150×Area + 8,000×Bedrooms - 2,000×Age - 5,000×Distance
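To see the multi-feature equation in action, here's a quick sanity check with a hypothetical listing (the house's numbers below are made up for illustration):

```python
# Coefficients from the learned equation above (price in $)
beta_0, b_area, b_bed, b_age, b_dist = 20_000, 150, 8_000, -2_000, -5_000

# A hypothetical listing: 1800 sqft, 3 bedrooms, 10 years old, 4 km out
area, bedrooms, age, distance = 1800, 3, 10, 4

price = beta_0 + b_area * area + b_bed * bedrooms + b_age * age + b_dist * distance
print(f"Predicted price: ${price:,}")  # -> $274,000
```

Each term is just "coefficient × feature value", summed — that's all a multiple linear regression prediction is.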

Loss Function — How Does the Model Learn?

Learning = finding the best β₀ and β₁ that make predictions closest to actual values.

For this, we use MSE (Mean Squared Error):

MSE = (1/n) × Σ(yᵢ - ŷᵢ)²

Meaning: Average squared difference between actual and predicted values.

Minimize MSE → Find the best-fit line
Python
# Manually computing MSE
import numpy as np

y_actual    = np.array([200, 300, 400, 500, 600])   # Actual prices ($k)
y_predicted = np.array([210, 290, 390, 510, 590])   # Model predictions

errors         = y_actual - y_predicted
squared_errors = errors ** 2
mse            = squared_errors.mean()

print(f"Errors: {errors}")
print(f"Squared Errors: {squared_errors}")
print(f"MSE: {mse:.2f}")
# Lower MSE = better model!

Gradient Descent — The Optimization Magic

How does the model find β₀ and β₁? With Gradient Descent!

Analogy: You're on a dark hillside and need to reach the valley.
- At each step, feel which direction the ground slopes downward
- Take a small step in that direction
- Repeat until you reach the bottom (minimum loss)
Python
# Gradient Descent from scratch
import numpy as np

X = np.array([1000, 1500, 2000, 2500, 3000])  # Area
y = np.array([200, 300, 400, 500, 600])         # Price ($k)

# Initialise parameters
beta_0 = 0   # Intercept
beta_1 = 0   # Slope
lr     = 0.000001  # Learning rate (step size)
n      = len(X)
epochs = 10000

for epoch in range(epochs):
    y_pred = beta_0 + beta_1 * X

    errors = y - y_pred

    # Compute gradients
    d_beta_0 = (-2/n) * errors.sum()
    d_beta_1 = (-2/n) * (errors * X).sum()

    # Update parameters
    beta_0 -= lr * d_beta_0
    beta_1 -= lr * d_beta_1

print(f"Learned Intercept (β₀): {beta_0:.4f}")
print(f"Learned Slope (β₁): {beta_1:.6f}")
print(f"\nPrediction for 2500 sqft: ${beta_0 + beta_1 * 2500:.2f}k")

3. 🌿 Types of Linear Regression {#types}

Linear Regression
├── Simple Linear Regression      (1 feature)
├── Multiple Linear Regression    (multiple features)
├── Polynomial Regression         (curved relationship)
├── Ridge Regression              (L2 regularization)
├── Lasso Regression              (L1 regularization)
└── ElasticNet Regression         (L1 + L2 combined)

Type 1: Simple Linear Regression

Python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1000], [1500], [2000], [2500], [3000]])  # Area
y = np.array([200, 300, 400, 500, 600])                  # Price ($k)

model = LinearRegression()
model.fit(X, y)

print(f"Intercept (β₀): {model.intercept_:.4f}")
print(f"Slope (β₁): {model.coef_[0]:.6f}")
print(f"Prediction for 2200 sqft: ${model.predict([[2200]])[0]:.2f}k")

Type 2: Multiple Linear Regression

Python
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    'area':       [1000, 1500, 2000, 2500, 3000],
    'bedrooms':   [2, 3, 3, 4, 5],
    'age_years':  [10, 5, 15, 2, 8],
    'price_k':    [200, 320, 360, 560, 620]
})

X = data[['area', 'bedrooms', 'age_years']]
y = data['price_k']

model = LinearRegression()
model.fit(X, y)

print("Coefficients:")
for feature, coef in zip(X.columns, model.coef_):
    print(f"  {feature}: {coef:.4f}")
print(f"  Intercept: {model.intercept_:.4f}")

# Interpretation (price_k is in $k, so coefficients are in $k too):
# area:      0.18 → each extra sqft adds ~$180 to the price
# bedrooms:  25   → each extra bedroom adds ~$25,000
# age_years: -5   → each year of age reduces the price by ~$5,000

Type 3: Polynomial Regression (Curved Relationship)

Python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Data that doesn't fit a straight line (curved relationship)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
y = np.array([2.5, 5.5, 11, 19, 30, 44, 60, 79, 101, 126])  # Quadratic!

# ❌ Simple linear — poor fit
linear = LinearRegression()
linear.fit(X, y)
print(f"Linear R²: {linear.score(X, y):.4f}")

# ✅ Polynomial — good fit
poly = PolynomialFeatures(degree=2)  # Creates x, x², etc.
X_poly = poly.fit_transform(X)

poly_model = LinearRegression()
poly_model.fit(X_poly, y)
print(f"Polynomial R²: {poly_model.score(X_poly, y):.4f}")

X_new = np.array([[11]])
X_new_poly = poly.transform(X_new)
print(f"Prediction for x=11: {poly_model.predict(X_new_poly)[0]:.2f}")

Type 4: Ridge Regression (L2 Regularization)

Python
from sklearn.linear_model import Ridge

# Ridge: Prevents coefficients from growing too large
# alpha = regularization strength (higher = more regularization)
ridge = Ridge(alpha=1.0)  # alpha=0 → same as LinearRegression
ridge.fit(X_train, y_train)  # assumes X_train, y_train from an earlier train_test_split

# When to use:
# - Features are highly correlated (multicollinearity)
# - Model is overfitting
# - All features seem important

Type 5: Lasso Regression (L1 Regularization)

Python
from sklearn.linear_model import Lasso

# Lasso: Sets some coefficients exactly to 0 (built-in feature selection!)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# When to use:
# - You need automatic feature selection
# - Many features exist, some are irrelevant
# - You want a sparse, interpretable model

# Check which features survived (feature_names = your training column names)
selected = [(f, c) for f, c in zip(feature_names, lasso.coef_) if c != 0]
print("Selected features:", selected)

Type 6: ElasticNet (Best of Both Worlds)

Python
from sklearn.linear_model import ElasticNet

# L1 + L2 combined
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)  # l1_ratio: 0=Ridge, 1=Lasso
elastic.fit(X_train, y_train)

# When to use:
# - When neither Ridge nor Lasso alone gives great results
# - When you want grouping behaviour for correlated features

4. ✅ Assumptions — When Does It Work? {#assumptions}

Linear Regression performs well only when these conditions hold.

Assumption 1: Linearity

There should be a linear relationship between X and y.

Python
import matplotlib.pyplot as plt

# Check: plot a scatter chart
plt.figure(figsize=(8, 5))
plt.scatter(X, y, alpha=0.5)
plt.xlabel('Feature')
plt.ylabel('Target')
plt.title('Linearity Check')
plt.show()

# If a curved pattern appears → use Polynomial Regression

Assumption 2: No Multicollinearity

Features should not be highly correlated with each other.

Python
import seaborn as sns

# Correlation heatmap
corr_matrix = df[feature_columns].corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix — Multicollinearity Check')
plt.show()

# VIF (Variance Inflation Factor)
from statsmodels.stats.outliers_influence import variance_inflation_factor

vif_data = pd.DataFrame()
vif_data['Feature'] = feature_columns
vif_data['VIF'] = [variance_inflation_factor(X.values, i) for i in range(len(feature_columns))]
print(vif_data)
# VIF > 10 → Multicollinearity problem!

Assumption 3: Homoscedasticity

Residual variance should be constant (not funnel-shaped).

Python
y_pred    = model.predict(X)
residuals = y - y_pred

plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted')
plt.show()

# Funnel shape → Heteroscedasticity → Try log or sqrt transform on y

Assumption 4: Normality of Residuals

Python
import scipy.stats as stats

stats.probplot(residuals, plot=plt)
plt.title('Q-Q Plot — Normality Check')
plt.show()

stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_value:.4f}")
# p > 0.05 → Normal distribution ✅

5. 🛠️ Implementation — Step by Step {#implementation}

Python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import warnings
warnings.filterwarnings('ignore')

# ── 1. Generate Data ───────────────────────────────────────
np.random.seed(42)
n = 500

df = pd.DataFrame({
    'area_sqft':     np.random.randint(500, 5000, n),
    'bedrooms':      np.random.randint(1, 6, n),
    'bathrooms':     np.random.randint(1, 4, n),
    'age_years':     np.random.randint(0, 40, n),
    'floor':         np.random.randint(1, 20, n),
    'parking':       np.random.randint(0, 3, n),
    'metro_dist_km': np.random.uniform(0.5, 15, n),
})

# Realistic pricing formula
df['price_k'] = (
    df['area_sqft'] * 150 +
    df['bedrooms'] * 15000 +
    df['bathrooms'] * 10000 -
    df['age_years'] * 2000 -
    df['metro_dist_km'] * 5000 +
    df['parking'] * 8000 +
    np.random.normal(0, 20000, n)
) / 1000

print("📦 Dataset Shape:", df.shape)
print("\n📊 Price Statistics (in $k):")
print(df['price_k'].describe())

# ── 2. Features & Target ───────────────────────────────────
feature_cols = ['area_sqft', 'bedrooms', 'bathrooms', 'age_years',
                'floor', 'parking', 'metro_dist_km']
X = df[feature_cols]
y = df['price_k']

# ── 3. Train-Test Split ────────────────────────────────────
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(f"\n🔀 Train: {X_train.shape[0]} | Test: {X_test.shape[0]}")

# ── 4. Feature Scaling ─────────────────────────────────────
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

# ── 5. Train ───────────────────────────────────────────────
model = LinearRegression()
model.fit(X_train_sc, y_train)

# ── 6. Predict ─────────────────────────────────────────────
y_pred = model.predict(X_test_sc)

# ── 7. Evaluate ────────────────────────────────────────────
mae  = mean_absolute_error(y_test, y_pred)
mse  = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2   = r2_score(y_test, y_pred)

print("\n📊 Model Performance:")
print(f"  MAE  (Mean Absolute Error):  ${mae:.2f}k")
print(f"  MSE  (Mean Squared Error):   {mse:.4f}")
print(f"  RMSE (Root MSE):             ${rmse:.2f}k")
print(f"  R²   (R-squared):            {r2:.4f} ({r2*100:.2f}%)")

# ── 8. Feature Coefficients ────────────────────────────────
coef_df = pd.DataFrame({
    'Feature': feature_cols,
    'Coefficient': model.coef_
}).sort_values('Coefficient', key=abs, ascending=False)

print("\n🔑 Feature Coefficients:")
print(coef_df.to_string(index=False))
print(f"\n  Intercept: {model.intercept_:.4f}")

6. 🔧 All Important Functions & Parameters {#functions}

LinearRegression() Parameters

Python
from sklearn.linear_model import LinearRegression

model = LinearRegression(
    fit_intercept=True,    # Calculate intercept? Default: True
    copy_X=True,           # Copy X during training? Default: True
    n_jobs=None,           # CPU cores to use (-1 = all)
    positive=False         # Force positive coefficients? Default: False
)

Key Attributes After Fitting

Python
model.fit(X_train, y_train)

print(model.coef_)             # [β₁, β₂, ...] — one per feature
print(model.intercept_)        # β₀ — the intercept
print(model.n_features_in_)    # Number of features used in training
print(model.feature_names_in_) # Feature names (if DataFrame was used)

Key Methods

Python
model.fit(X_train, y_train)          # Train the model
y_pred = model.predict(X_test)       # Make predictions
r2 = model.score(X_test, y_test)     # R² score
print(model.get_params())            # View current parameters
model.set_params(fit_intercept=False) # Change parameters

Ridge, Lasso, ElasticNet Key Parameters

Python
from sklearn.linear_model import Ridge, Lasso, ElasticNet

ridge = Ridge(
    alpha=1.0,           # Regularization strength
    fit_intercept=True,
    solver='auto',       # 'auto', 'svd', 'cholesky', 'lsqr', 'saga'
    max_iter=None,
    tol=1e-4,
    random_state=None
)

lasso = Lasso(
    alpha=0.1,
    fit_intercept=True,
    max_iter=1000,
    warm_start=False,
    selection='cyclic'   # 'cyclic' or 'random'
)

elastic = ElasticNet(
    alpha=0.1,
    l1_ratio=0.5,        # 0 = Ridge, 1 = Lasso
    fit_intercept=True,
    max_iter=1000
)

7. 📊 Evaluation Metrics {#evaluation}

Metric 1: MAE (Mean Absolute Error)

Python
from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(y_test, y_pred)
print(f"MAE: {mae:.4f}")

# Average absolute error
# Interpretable, robust to outliers
# Same units as y

Metric 2: MSE (Mean Squared Error)

Python
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, y_pred)
print(f"MSE: {mse:.4f}")

# Large errors penalised heavily (squared)
# Units are y² — less interpretable
# Commonly used during optimisation

Metric 3: RMSE

Python
import numpy as np
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"RMSE: {rmse:.4f}")

# Square root of MSE → same units as y
# Sensitive to outliers
# Most commonly reported metric

Metric 4: R² Score

Python
from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)
print(f"R² Score: {r2:.4f}")

# R² = 0.85 → Model explains 85% of variance
# R² = 1.0  → Perfect fit
# R² = 0.0  → No better than predicting the mean
# R² < 0    → Worse than predicting the mean!

# ⚠️ Limitation: Keeps increasing as you add features
# Solution: Use Adjusted R²

Metric 5: Adjusted R²

Python
def adjusted_r2(r2, n, p):
    """
    r2 = R² score
    n  = number of samples
    p  = number of features
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = len(y_test)
p = X_test.shape[1]
adj_r2 = adjusted_r2(r2, n, p)
print(f"Adjusted R²: {adj_r2:.4f}")

# Penalises adding useless features
# Better metric for model comparison

Metrics Summary Table

| Metric | Formula | Range | Better When | Units |
|---|---|---|---|---|
| MAE | mean(\|y - ŷ\|) | [0, ∞) | Lower | Same as y |
| MSE | mean((y - ŷ)²) | [0, ∞) | Lower | Squared units of y |
| RMSE | √MSE | [0, ∞) | Lower | Same as y |
| R² | 1 - SS_res/SS_tot | (-∞, 1] | Closer to 1 | Unitless |
| Adj R² | Modified R² | (-∞, 1] | Closer to 1 | Unitless |


8. 🌊 Intermediate Concepts {#intermediate}

Hyperparameter Tuning with Cross-Validation

Python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import numpy as np

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', Ridge())
])

param_grid = {'model__alpha': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]}

grid_search = GridSearchCV(
    pipeline,
    param_grid,
    cv=5,
    scoring='r2',
    n_jobs=-1,
    verbose=1
)

grid_search.fit(X_train, y_train)

print(f"Best Alpha: {grid_search.best_params_}")
print(f"Best CV R²: {grid_search.best_score_:.4f}")
print(f"Test R²: {grid_search.score(X_test, y_test):.4f}")

Learning Curves — Diagnosing Overfitting/Underfitting

Python
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt
import numpy as np

def plot_learning_curve(model, X, y, title="Learning Curve"):
    train_sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 10),
        cv=5,
        scoring='r2',
        n_jobs=-1
    )

    train_mean = train_scores.mean(axis=1)
    val_mean   = val_scores.mean(axis=1)

    plt.figure(figsize=(10, 6))
    plt.plot(train_sizes, train_mean, 'o-', color='blue', label='Training Score')
    plt.plot(train_sizes, val_mean,   'o-', color='red',  label='Validation Score')
    plt.xlabel('Training Size')
    plt.ylabel('R² Score')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()

    final_gap = train_mean[-1] - val_mean[-1]
    print(f"\nDiagnosis:")
    print(f"Train R²: {train_mean[-1]:.4f}")
    print(f"Val   R²: {val_mean[-1]:.4f}")
    print(f"Gap:      {final_gap:.4f}")
    if final_gap > 0.1:
        print("⚠️ Possible OVERFITTING — Try regularization or more data")
    elif val_mean[-1] < 0.6:
        print("⚠️ Possible UNDERFITTING — Try more features or a complex model")
    else:
        print("✅ Good fit!")

from sklearn.linear_model import LinearRegression
plot_learning_curve(LinearRegression(), X, y)

Feature Engineering for Regression

Python
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

df = pd.DataFrame({
    'area':      [1000, 1500, 2000, 2500, 3000],
    'bedrooms':  [2, 3, 3, 4, 4],
    'age':       [5, 10, 15, 3, 20],
    'metro_dist':[2, 5, 3, 1, 8],
})

# 1. Interaction features
df['area_per_bedroom'] = df['area'] / df['bedrooms']
df['area_x_metro']     = df['area'] * df['metro_dist']

# 2. Log transform (for skewed data)
df['log_area']  = np.log1p(df['area'])
df['log_metro'] = np.log1p(df['metro_dist'])

# 3. Binning
df['age_group'] = pd.cut(df['age'], bins=[0, 5, 15, 100],
                          labels=['New', 'Mid', 'Old'])

# 4. Automated polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(df[['area', 'bedrooms']])
print("Polynomial Features:", poly.get_feature_names_out(['area', 'bedrooms']))
# ['area', 'bedrooms', 'area^2', 'area bedrooms', 'bedrooms^2']

9. 🚀 Advanced Concepts {#advanced}

Regularization — Deeper Look

Python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(42)
n = 200

# Introduce multicollinearity — X2 ≈ X1
X1 = np.random.randn(n)
X2 = X1 + np.random.randn(n) * 0.1  # Almost identical to X1!
X3 = np.random.randn(n)
X4 = np.random.randn(n)  # Noise
X5 = np.random.randn(n)  # Noise

X = np.column_stack([X1, X2, X3, X4, X5])
y = 3*X1 + 2*X3 + np.random.randn(n) * 0.5

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_test_sc  = scaler.transform(X_test)

models = {
    'Linear':         LinearRegression(),
    'Ridge(α=0.1)':   Ridge(alpha=0.1),
    'Ridge(α=10)':    Ridge(alpha=10),
    'Lasso(α=0.1)':   Lasso(alpha=0.1),
    'Lasso(α=1.0)':   Lasso(alpha=1.0),
    'ElasticNet':      ElasticNet(alpha=0.1, l1_ratio=0.5)
}

print(f"{'Model':<20} {'Train R²':>10} {'Test R²':>10} {'Coefficients':>40}")
print("-" * 85)

for name, m in models.items():
    m.fit(X_train_sc, y_train)
    train_r2 = m.score(X_train_sc, y_train)
    test_r2  = m.score(X_test_sc, y_test)
    coefs    = [f"{c:.3f}" for c in m.coef_]
    print(f"{name:<20} {train_r2:>10.4f} {test_r2:>10.4f} {str(coefs):>40}")

RidgeCV and LassoCV — Automatic Alpha Selection

Python
from sklearn.linear_model import RidgeCV, LassoCV
import numpy as np

alphas = np.logspace(-4, 4, 50)  # 0.0001 to 10000

ridge_cv = RidgeCV(alphas=alphas, cv=5, scoring='r2')
ridge_cv.fit(X_train_sc, y_train)
print(f"Ridge Best Alpha: {ridge_cv.alpha_:.4f}")
print(f"Ridge Test R²:   {ridge_cv.score(X_test_sc, y_test):.4f}")

lasso_cv = LassoCV(alphas=alphas, cv=5, n_jobs=-1, random_state=42)
lasso_cv.fit(X_train_sc, y_train)
print(f"\nLasso Best Alpha:     {lasso_cv.alpha_:.4f}")
print(f"Lasso Test R²:        {lasso_cv.score(X_test_sc, y_test):.4f}")
print(f"Non-zero features:    {(lasso_cv.coef_ != 0).sum()} / {X_train.shape[1]}")

Statsmodels — Statistical Analysis

Python
import statsmodels.api as sm

X_with_const = sm.add_constant(X_train_sc)

ols_model = sm.OLS(y_train, X_with_const)
results   = ols_model.fit()

print(results.summary())

# Key outputs:
print("\nCoefficients:")
print(results.params)

print("\nP-values (< 0.05 = statistically significant):")
print(results.pvalues)

print("\n95% Confidence Intervals:")
print(results.conf_int())

10. 🌐 Real-World Use Cases {#real-world}

Use Case 1: Sales Forecasting

Python
"""
Problem: Predict monthly sales based on ad spend and season
Business: FMCG company
"""

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split

np.random.seed(42)
n = 120  # 10 years of monthly data

months = pd.date_range('2014-01', periods=n, freq='ME')
data = pd.DataFrame({
    'date':              months,
    'ad_spend_k':        np.random.uniform(50, 500, n),
    'season':            np.where(months.month.isin([10, 11, 12, 1, 2]), 'peak', 'normal'),
    'competitors_count': np.random.randint(2, 8, n),
    'discount_pct':      np.random.uniform(0, 30, n),
})

data['sales_k'] = (
    data['ad_spend_k'] * 0.8 +
    (data['season'] == 'peak').astype(int) * 200 -
    data['competitors_count'] * 30 +
    data['discount_pct'] * 5 +
    np.random.normal(0, 50, n)
)

numeric_cols     = ['ad_spend_k', 'competitors_count', 'discount_pct']
categorical_cols = ['season']

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_cols),
    ('cat', OneHotEncoder(drop='first'), categorical_cols)
])

pipeline = Pipeline([('prep', preprocessor), ('model', LinearRegression())])

X = data[numeric_cols + categorical_cols]
y = data['sales_k']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline.fit(X_train, y_train)

print(f"✅ Sales Forecast Model R²: {pipeline.score(X_test, y_test):.4f}")

# Predict next month
next_month = pd.DataFrame({
    'ad_spend_k': [350], 'competitors_count': [4],
    'discount_pct': [15], 'season': ['peak']
})
print(f"🎯 Predicted Next Month Sales: ${pipeline.predict(next_month)[0]:.2f}k")

Use Case 2: Student Performance Prediction

Python
"""
Problem: Predict a student's final exam score
Business: EdTech platform
"""

import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

np.random.seed(42)
n = 1000

df = pd.DataFrame({
    'daily_study_hours':  np.random.uniform(1, 10, n),
    'attendance_pct':     np.random.uniform(50, 100, n),
    'assignments_done':   np.random.uniform(0, 100, n),
    'mock_score_avg':     np.random.uniform(30, 95, n),
    'sleep_hours':        np.random.uniform(4, 9, n),
    'online_resources':   np.random.uniform(0, 5, n),
})

df['final_score'] = np.clip(
    df['daily_study_hours'] * 4 +
    df['attendance_pct'] * 0.3 +
    df['assignments_done'] * 0.2 +
    df['mock_score_avg'] * 0.5 +
    df['sleep_hours'] * 1.5 +
    df['online_resources'] * 2 +
    np.random.normal(0, 5, n),
    0, 100
)

X = df.drop('final_score', axis=1)
y = df['final_score']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([('scaler', StandardScaler()), ('model', Ridge(alpha=1.0))])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

print(f"📚 Student Score Predictor:")
print(f"  R²:  {r2_score(y_test, y_pred):.4f}")
print(f"  MAE: {mean_absolute_error(y_test, y_pred):.2f} marks")

11. 💻 Practical Examples — Complete Code {#practical}

Full Project: California Housing Price Prediction

Python
# ============================================================
# HOUSE PRICE PREDICTION — COMPLETE ML PROJECT
# ============================================================

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')

# ── Load Data ──────────────────────────────────────────────
housing = fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['Price'] = housing.target  # Median house price (in $100k)

print("🏠 California Housing Dataset")
print(f"Shape: {df.shape}")
print(f"Features: {list(housing.feature_names)}")
print(f"\nPrice Range: ${df['Price'].min()*100:.0f}k — ${df['Price'].max()*100:.0f}k")  # target is in $100k units

# ── Feature Engineering ────────────────────────────────────
df['RoomsPerHouse']     = df['AveRooms'] / df['AveOccup']
df['BedroomRatio']      = df['AveBedrms'] / df['AveRooms']
df['PopulationDensity'] = df['Population'] / df['AveOccup']

# ── Train-Test Split ───────────────────────────────────────
feature_cols = [c for c in df.columns if c != 'Price']
X = df[feature_cols]
y = df['Price']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ── Compare Models ─────────────────────────────────────────
models = {
    'Linear Regression': LinearRegression(),
    'Ridge (α=1)':       Ridge(alpha=1),
    'Ridge (α=10)':      Ridge(alpha=10),
    'Lasso (α=0.01)':    Lasso(alpha=0.01),
}

results = []
for name, m in models.items():
    pipe = Pipeline([('scaler', StandardScaler()), ('model', m)])
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)

    results.append({
        'Model':    name,
        'Train R²': pipe.score(X_train, y_train),
        'Test R²':  pipe.score(X_test, y_test),
        'RMSE':     np.sqrt(mean_squared_error(y_test, y_pred)),
        'MAE':      mean_absolute_error(y_test, y_pred)
    })

results_df = pd.DataFrame(results).round(4)
print("\n🏆 Model Comparison:")
print(results_df.to_string(index=False))

# ── Best Model Analysis ────────────────────────────────────
best_pipe = Pipeline([('scaler', StandardScaler()), ('model', Ridge(alpha=1))])
best_pipe.fit(X_train, y_train)

y_pred = best_pipe.predict(X_test)
r2  = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"\n✅ Best Model: Ridge (α=1)")
print(f"  R² Score: {r2:.4f} — model explains {r2*100:.1f}% of variance")
print(f"  MAE: ${mae*100:.1f}k average prediction error")  # mae is in $100k units

# ── Feature Importance ─────────────────────────────────────
ridge_model = best_pipe.named_steps['model']
coef_df = pd.DataFrame({
    'Feature':     feature_cols,
    'Coefficient': ridge_model.coef_
}).sort_values('Coefficient', key=abs, ascending=False)

print("\n🔑 Feature Importance:")
print(coef_df.to_string(index=False))

12. ⚠️ Edge Cases & Errors {#edge-cases}

Error 1: ValueError — Feature Count Mismatch

Python
# ❌ Error: different number of features at train vs predict time
model.predict(X_different_shape)
# ValueError: X has 5 features, but LinearRegression expects 7

# ✅ Fix: Always use the same features; use Pipelines to enforce this

Error 2: Perfect Multicollinearity

Python
# ❌ Problem: Two features are identical
# The matrix becomes singular → unreliable or failed solutions

# ✅ Fix 1: Remove the duplicate feature
# ✅ Fix 2: Use Ridge which handles multicollinearity gracefully
from sklearn.linear_model import Ridge
ridge = Ridge(alpha=0.1)

Error 3: Skewed Target Variable

Python
import numpy as np

y_skewed = np.array([10000, 15000, 20000, 1000000, 1500000])

# ✅ Fix: Apply log transform
y_log = np.log1p(y_skewed)

model.fit(X_train, np.log1p(y_train))

y_pred_log = model.predict(X_test)
y_pred     = np.expm1(y_pred_log)  # Reverse transform

Error 4: Convergence Warning (Lasso/ElasticNet)

Python
# ❌ ConvergenceWarning: Objective did not converge

# ✅ Fix 1: Increase max_iter
lasso = Lasso(alpha=0.1, max_iter=10000, tol=1e-4)

# ✅ Fix 2: Scale the data first
from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X_train)

13. 💡 Pro Tips {#pro-tips}

Python
# ✅ Tip 1: Always use a Pipeline
from sklearn.pipeline import Pipeline
pipe = Pipeline([('scaler', StandardScaler()), ('model', Ridge())])
# Prevents data leakage and keeps code clean

# ✅ Tip 2: Rely on cross-validation, not a single test score
from sklearn.model_selection import cross_val_score
scores = cross_val_score(pipe, X, y, cv=10, scoring='r2')
print(f"CV R²: {scores.mean():.4f} ± {scores.std():.4f}")

# ✅ Tip 3: Permutation importance for model-agnostic feature ranking
from sklearn.inspection import permutation_importance
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)

# ✅ Tip 4: SHAP values for production explainability
import shap
explainer  = shap.LinearExplainer(model, X_train_sc)
shap_values = explainer.shap_values(X_test_sc)
shap.summary_plot(shap_values, X_test, feature_names=feature_cols)

# ✅ Tip 5: Save the complete pipeline
import joblib
joblib.dump(pipe, 'house_price_model_v1.pkl')
loaded = joblib.load('house_price_model_v1.pkl')

# ✅ Tip 6: Always inspect residuals
residuals = y_test - y_pred
# Mean should be ~0
# No pattern should exist in residuals vs fitted values

14. ⚖️ Comparison {#comparison}

Linear Regression vs Other Regression Models

| Model | Complexity | Interpretability | Handles Non-linearity | Overfitting Risk | Speed |
|---|---|---|---|---|---|
| Linear Regression | ⭐ Low | ⭐⭐⭐⭐⭐ Very High | ❌ No | Low | ⚡ Very Fast |
| Ridge/Lasso | ⭐ Low | ⭐⭐⭐⭐ High | ❌ No | Very Low | ⚡ Very Fast |
| Polynomial Reg | ⭐⭐ Medium | ⭐⭐⭐ Medium | ✅ Limited | Medium-High | ⚡ Fast |
| Decision Tree | ⭐⭐ Medium | ⭐⭐⭐ Medium | ✅ Yes | High | ⚡ Fast |
| Random Forest | ⭐⭐⭐ High | ⭐⭐ Low | ✅ Yes | Low | 🐢 Medium |
| XGBoost | ⭐⭐⭐⭐ High | ⭐ Very Low | ✅ Yes | Very Low | 🐢 Medium |
| Neural Network | ⭐⭐⭐⭐⭐ Very High | ⭐ Very Low | ✅ Yes | Variable | 🐌 Slow |

When to Choose Linear Regression?

✅ Use Linear Regression when:
  - You need a simple, interpretable model
  - The relationship is linear
  - You want to understand individual feature impact
  - Dataset is small to medium
  - Building a quick baseline
  - Regulatory / explainability requirements exist

❌ Avoid when:
  - Complex non-linear patterns exist
  - Many outliers are present
  - Data is very high-dimensional (without regularization)
  - Working with image or text data

15. 📊 Data Science Perspective {#data-science}

Python
# Answering a business question with Linear Regression
import statsmodels.api as sm
import pandas as pd
import numpy as np

# Business Question: "How much does ad spend increase our sales?"
np.random.seed(42)  # reproducibility
df = pd.DataFrame({'ad_spend': np.random.uniform(10, 100, 100)})
df['sales'] = df['ad_spend'] * 4.2 + np.random.normal(0, 20, 100)

X_sm    = sm.add_constant(df['ad_spend'])
sm_model = sm.OLS(df['sales'], X_sm).fit()

coef  = sm_model.params['ad_spend']
pval  = sm_model.pvalues['ad_spend']

print(f"💡 Business Insight:")
print(f"Every $1 increase in ad spend drives ${coef:.2f} in additional sales")
print(f"Statistical significance: p={pval:.4f} ({'Significant ✅' if pval < 0.05 else 'Not Significant ❌'})")
print(f"95% CI: {sm_model.conf_int().loc['ad_spend'].values}")

16. 🎤 Interview Questions {#interview-questions}

🟢 Basic

Q1: What does Linear Regression do?

Predicts continuous output values. Finds the best-fit line through the data by minimising MSE.

Q2: What does the slope (coefficient) mean?

The expected change in the target for a 1-unit increase in a feature, holding all other features constant.

Q3: What is the R² score?

The proportion of variance in the target that the model explains. R²=0.85 → 85% explained. 1.0 = perfect; 0 = no better than predicting the mean (and it can go negative for models worse than that).
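A quick way to internalise R² is to compute it by hand and compare against sklearn (toy numbers chosen purely for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # both give the same value
```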

Q4: MAE vs RMSE — when to use which?

MAE: interpretable (average absolute error) and robust to outliers. RMSE: penalises large errors more heavily, so use it when big misses are especially costly. When outliers are present and shouldn't dominate the score, prefer MAE.
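A tiny sketch shows the difference: one big miss barely moves MAE but inflates RMSE (synthetic numbers for illustration):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([100, 102, 98, 101, 99])
y_pred = np.array([101, 103, 97, 100, 140])  # last prediction is a big miss

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE:  {mae:.2f}")   # every error counted equally
print(f"RMSE: {rmse:.2f}")  # the single large miss dominates
```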

Q5: Why use regularization?

To prevent overfitting. Ridge shrinks coefficients. Lasso zeroes out some coefficients (feature selection).

🟡 Intermediate

Q6: What is multicollinearity and how do you fix it?

Plaintext
Problem:  Highly correlated features → unstable coefficients
Detect:   VIF > 10 or high correlation matrix values
Fix:
  - Remove one of the correlated features
  - Apply PCA
  - Use Ridge regression

Q7: What is Gradient Descent? Name the variants.

Plaintext
Gradient Descent: Algorithm that minimises loss iteratively
Types:
  - Batch GD:     Uses all data per step
  - Stochastic GD: Uses one sample per step
  - Mini-batch GD: Uses small batches (most common in practice)
Learning Rate: Too high → diverge; Too low → very slow
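The mini-batch variant above can be sketched in plain NumPy for simple linear regression (synthetic data with a true slope of 3 and intercept of 5; hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 0.5, size=200)

w, b = 0.0, 0.0   # parameters to learn
lr = 0.01         # learning rate: too high diverges, too low crawls
batch_size = 32

for epoch in range(300):
    idx = rng.permutation(len(X))            # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        w -= lr * 2 * np.mean(err * xb)      # ∂MSE/∂w on this mini-batch
        b -= lr * 2 * np.mean(err)           # ∂MSE/∂b on this mini-batch

print(f"learned w={w:.2f}, b={b:.2f}")  # should land near w=3, b=5
```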

Q8: What is heteroscedasticity?

Residual variance is not constant across fitted values. Appears as a funnel shape in residual plots. Fix: log-transform y, or use Weighted Least Squares.

🔴 Advanced

Q9: How are Ridge and Lasso different mathematically?

Plaintext
Ridge: Minimise MSE + α × Σβᵢ²   (L2 norm — squares the coefficients)
Lasso: Minimise MSE + α × Σ|βᵢ|  (L1 norm — absolute value of coefficients)

L1 pushes some coefficients exactly to 0  → Sparse / feature selection
L2 shrinks all coefficients but keeps them non-zero → Dense solution
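You can see the sparsity difference directly (synthetic data; only the first 3 of 10 features carry signal, and the alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(7)
n, p = 200, 10
X = rng.normal(size=(n, p))
# only the first 3 features actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 0.5, n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(np.abs(lasso.coef_) < 1e-8))
print("Ridge zero coefs:", n_zero_ridge)  # all shrunk, none removed
print("Lasso zero coefs:", n_zero_lasso)  # irrelevant features pushed to exactly 0
```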

Q10: Is Polynomial Regression still "Linear Regression"?

Yes! You create polynomial features (x, x², x³) and then fit a Linear Regression on them. The model is still linear in its coefficients — hence the name holds.
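A short sketch of exactly that: expand the feature space to [x, x²], then fit a plain LinearRegression on it (synthetic quadratic data):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + rng.normal(0, 0.2, 200)  # quadratic target

# Still LinearRegression — only the input features are expanded
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(x, y)
print("R² on quadratic data:", poly_model.score(x, y))
```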

Q11: What is the difference between statistical significance and practical significance?

Plaintext
Statistical Significance: p < 0.05 → Feature is statistically significant
Practical Significance:   Is the coefficient large enough to matter in the real world?

With a large n, tiny and meaningless effects can become statistically significant.
Always check both.

🎭 Scenario-Based

Q12: You deployed a model six months ago and performance has dropped. What happened?

Plaintext
Possible causes:
1. Data Drift:    Input distribution has changed
2. Concept Drift: The relationship between X and y has changed
3. Seasonal effects

Solution:
- Monitor input distributions regularly
- Retrain periodically with fresh data
- Use drift detection tools (Evidently, alibi-detect)
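For a lightweight drift check without extra tooling, a two-sample Kolmogorov-Smirnov test on a single feature is a reasonable sketch (synthetic data; the shift is exaggerated for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(9)
train_feature = rng.normal(loc=50, scale=10, size=1000)  # distribution at training time
live_feature = rng.normal(loc=58, scale=10, size=1000)   # production distribution has shifted

stat, pval = ks_2samp(train_feature, live_feature)
if pval < 0.05:
    print(f"⚠️ Drift detected (KS p={pval:.2e}) — consider retraining")
else:
    print("No significant drift")
```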

17. 🎯 Conclusion {#conclusion}

Key Takeaways

| Topic | Key Learning |
|---|---|
| What | Predict continuous values using a linear relationship |
| Types | Simple, Multiple, Polynomial, Ridge, Lasso, ElasticNet |
| Math | ŷ = β₀ + β₁x₁ + ... + βₙxₙ |
| Loss | Minimise MSE via Gradient Descent or Normal Equation |
| Metrics | MAE, RMSE, R², Adjusted R² |
| Regularization | Ridge=L2, Lasso=L1, ElasticNet=Both |
| Best Practice | Pipeline + Cross-Validation + Feature Scaling |

Series Roadmap

Plaintext
✅ Part 1: ML Introduction
✅ Part 2: Linear Regression (This Blog!)
⏳ Part 3: Logistic Regression — Classification
⏳ Part 4: Decision Trees
⏳ Part 5: Random Forests & Ensemble Methods
⏳ Part 6: Support Vector Machines
⏳ Part 7: K-Means Clustering
⏳ Part 8: Neural Networks

Final Advice

"Master Linear Regression properly — it is the foundation for everything else. Understand regularization here and it will make Ridge/Lasso/dropout in deep learning trivial. Understand MSE here and you'll understand loss functions everywhere."

Do This Now:

  1. ✅ Run the California Housing project code above

  2. ✅ Compare Ridge, Lasso, and ElasticNet yourself

  3. ✅ Use Statsmodels for a full statistical summary

  4. ✅ Try the "House Prices" Kaggle competition


Part 2 of ML A to Z | Next: Logistic Regression!
