
Complete Guide to Anaconda, Conda, and Jupyter for Beginners

Master Anaconda, Conda, and Jupyter Notebook from scratch. Learn installation, environments, packages, and data science workflows in one complete guide.

1. Introduction

When you first start with Python for data science, you quickly hit a wall: "Which Python version? Which library version? Why is import pandas failing? Why is my colleague's code not running on my machine?"

This is exactly why Anaconda, Conda, and Jupyter exist.

What Is Anaconda?

Anaconda is a free, open-source distribution of Python (and R) specifically built for data science, machine learning, and scientific computing. Think of it as a pre-packaged Python bundle that comes with:

  • Python interpreter

  • 250+ pre-installed data science libraries (NumPy, Pandas, Matplotlib, Scikit-learn, etc.)

  • Conda (package + environment manager)

  • Jupyter Notebook and JupyterLab

  • Anaconda Navigator (GUI)

  • Spyder IDE

Analogy: Regular Python is like buying a car engine. Anaconda is like buying a fully assembled car — everything is already there and configured.

What Is Conda?

Conda is the package manager and environment manager that ships inside Anaconda. It does two things:

  1. Installs packages — like pip, but smarter (handles non-Python dependencies too)

  2. Manages virtual environments — isolated Python setups for each project

What Is Jupyter?

Jupyter Notebook is an interactive, browser-based coding environment where you write code in "cells" and see output instantly below each cell. It's the de facto standard for data analysis, ML experimentation, and teaching.


2. Basic Concepts

The Problem Conda Solves — Virtual Environments

Imagine you have two projects:

  • Project A → needs pandas 1.3 and Python 3.8

  • Project B → needs pandas 2.1 and Python 3.11

Without virtual environments, installing one breaks the other. Conda environments give each project its own isolated Python + library sandbox.

Plaintext
Your Machine
├── base environment (Python 3.11)
├── project_a_env (Python 3.8, pandas 1.3)
└── project_b_env (Python 3.11, pandas 2.1)

Key Vocabulary

| Term | Definition |
|------|------------|
| Environment | An isolated directory containing a specific Python version + packages |
| Package | A library (e.g., numpy, pandas) you install and import |
| Channel | A repository from which conda downloads packages (defaults, conda-forge) |
| Base Environment | The default environment created when Anaconda is installed |
| Kernel | The engine that runs your Python code inside Jupyter |
| Cell | A single block of code or text in a Jupyter Notebook |
| .ipynb | The Jupyter Notebook file format (JSON internally) |
| conda-forge | Community-maintained channel with more packages than the defaults channel |
| Miniconda | Lightweight version of Anaconda — just Python + Conda, no extra packages |


3. Installation

Anaconda vs Miniconda — Which Should You Install?

| Feature | Anaconda | Miniconda |
|---------|----------|-----------|
| Size | ~3 GB | ~400 MB |
| Pre-installed packages | 250+ | None (just conda + python) |
| Good for | Beginners, all-in-one | Pros, custom setups |
| Installation time | 10-20 min | 2-3 min |
| Recommendation | Beginners | Intermediate/Advanced |

Installing Anaconda on Windows

  1. Go to https://www.anaconda.com/download

  2. Download the Windows 64-bit installer (.exe)

  3. Run the installer

  4. Important options during install:

    • ✅ Install for "Just Me" (recommended)

    • ✅ Add Anaconda to PATH (optional; the installer warns against it, but it lets you run conda from any terminal instead of only Anaconda Prompt)

    • ✅ Register as default Python

  5. Verify installation — open Anaconda Prompt and run:

Bash
conda --version
# Output: conda 24.x.x

python --version
# Output: Python 3.11.x

Installing on macOS/Linux

Bash
# Download installer (macOS example)
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-MacOSX-x86_64.sh

# Run installer
bash Anaconda3-2024.10-MacOSX-x86_64.sh

# Follow prompts, then initialize conda
conda init

# Restart terminal, then verify
conda --version

Installing Miniconda (Recommended for Advanced Users)

Bash
# Linux/macOS
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

# Windows — download .exe from:
# https://docs.conda.io/en/latest/miniconda.html

4. All Conda Commands and Features

4.1 Conda Info and Help

Bash
# Check conda version
conda --version

# Detailed conda info (Python version, platform, channels)
conda info

# Help for any command
conda --help
conda create --help

# Update conda itself
conda update conda

4.2 Managing Environments

Bash
# Create a new environment
conda create --name myenv

# Create with specific Python version
conda create --name myenv python=3.10

# Create with specific Python + packages
conda create --name myenv python=3.10 numpy pandas matplotlib

# List all environments
conda env list
# OR
conda info --envs

# Activate an environment
conda activate myenv

# Deactivate current environment
conda deactivate

# Remove an environment completely
conda env remove --name myenv

# Clone an environment (copy)
conda create --name myenv_backup --clone myenv

# Rename environment (conda doesn't rename directly)
# Step 1: Clone
conda create --name new_name --clone old_name
# Step 2: Delete old
conda env remove --name old_name

4.3 Managing Packages

Bash
# Install a package (in active environment)
conda install numpy

# Install specific version
conda install numpy=1.24.3

# Install multiple packages at once
conda install numpy pandas matplotlib scikit-learn

# Install from conda-forge channel
conda install -c conda-forge plotly

# Update a package
conda update numpy

# Update all packages in environment
conda update --all

# Remove a package
conda remove numpy

# List all installed packages
conda list

# Search for a package
conda search pandas

# Check if package exists in conda-forge
conda search -c conda-forge tensorflow

# Install using pip inside conda environment
pip install some_package  # only if not available via conda

Pro Tip: Always prefer conda install over pip install inside conda environments. Use pip only when a package is not available via conda.

4.4 Exporting and Sharing Environments

This is one of the most important features of Conda — reproducibility.

Bash
# Export current environment to a YAML file
conda env export > environment.yml

# Export only packages you explicitly installed (cleaner)
conda env export --from-history > environment.yml

# Example environment.yml output:
YAML
name: myenv
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - numpy=1.24.3
  - pandas=2.0.1
  - matplotlib=3.7.1
  - scikit-learn=1.3.0
  - pip:
    - some_pip_only_package==1.0.0
Bash
# Create environment FROM a YAML file
conda env create -f environment.yml

# Update existing environment from YAML
conda env update -f environment.yml --prune

4.5 Conda Channels

Channels are package repositories. Think of them like app stores.

Bash
# Default channels: defaults (Anaconda's official repo)
# conda-forge: community channel with many more packages

# Add conda-forge as default channel
conda config --add channels conda-forge
conda config --set channel_priority strict

# Install from specific channel
conda install -c conda-forge lightgbm

# Show configured channels
conda config --show channels

# Remove a channel
conda config --remove channels conda-forge

Channel Priority:

Plaintext
conda-forge > defaults  (recommended for data science)

4.6 Conda Config

Bash
# Show all configurations
conda config --show

# Set auto-activation of base environment OFF (good practice)
conda config --set auto_activate_base false

# Always use conda-forge
conda config --add channels conda-forge
conda config --set channel_priority strict

# Show config file location
conda config --show-sources

4.7 Conda Clean (Free Up Space)

Conda caches packages aggressively. Over time it can consume GBs.

Bash
# Remove unused packages and caches
conda clean --all

# Remove only cached packages (tarballs)
conda clean --tarballs

# Remove unused packages
conda clean --packages

# Dry run (see what will be deleted)
conda clean --all --dry-run

4.8 Conda vs Pip — Quick Reference

Bash
# Conda install
conda install numpy

# Pip install (use only when conda doesn't have the package)
pip install numpy

# Check what's installed
conda list          # shows conda + pip packages
pip list            # shows only pip packages

# Uninstall
conda remove numpy
pip uninstall numpy

5. Jupyter Notebook — Deep Dive

5.1 Launching Jupyter Notebook

Bash
# Activate your environment first
conda activate myenv

# Install jupyter if not present
conda install jupyter

# Launch Jupyter Notebook
jupyter notebook

# Launch on a specific port
jupyter notebook --port 8889

# Launch without opening browser
jupyter notebook --no-browser

# Launch in a specific directory
jupyter notebook --notebook-dir="C:/Projects/my_project"

This opens Jupyter in your browser at http://localhost:8888.

5.2 Jupyter Notebook Interface

Plaintext
Browser: http://localhost:8888
│
├── Dashboard (File Browser)
│   ├── New → Python 3 (creates new notebook)
│   ├── Upload (upload .ipynb or data files)
│   └── Running (see active notebooks)
│
└── Notebook Editor
    ├── Menu Bar (File, Edit, View, Cell, Kernel, Help)
    ├── Toolbar (Run, Stop, Restart, Cell type selector)
    └── Cells (Code cells, Markdown cells, Raw cells)

5.3 Cell Types

| Cell Type | Purpose | Example |
|-----------|---------|---------|
| Code | Run Python code | `print("Hello")` |
| Markdown | Write formatted text, equations | `## Header`, `**bold**` |
| Raw | Plain text, not executed | Notes, metadata |

Change cell type:
- Via dropdown in toolbar
- Keyboard: Press Esc → M (Markdown), Y (Code), R (Raw)

5.4 Keyboard Shortcuts (The Most Important Ones)

Jupyter has two modes:

  • Command Mode (blue border) — navigate between cells

  • Edit Mode (green border) — type inside a cell

Plaintext
COMMAND MODE (press Esc to enter):
─────────────────────────────────
A          → Insert cell Above
B          → Insert cell Below
D, D       → Delete current cell
M          → Convert to Markdown
Y          → Convert to Code
Z          → Undo cell deletion
Shift + M  → Merge selected cells
Up/Down    → Navigate cells
Shift + Enter → Run cell and move to next
Ctrl + Enter  → Run cell and stay

EDIT MODE (press Enter to enter):
──────────────────────────────────
Tab        → Autocomplete
Shift+Tab  → Show docstring/help
Ctrl + /   → Comment/uncomment code
Ctrl + Z   → Undo
Ctrl + A   → Select all
Ctrl + D   → Delete line

5.5 Running Code in Cells

Python
# Cell 1: Basic Python
x = 10
y = 20
print(x + y)
# Output: 30
Python
# Cell 2: Variables persist across cells in same notebook
z = x * y   # x and y from Cell 1 are available
print(z)
# Output: 200
Python
# Cell 3: Last expression is auto-displayed
x + y       # No need to print — Jupyter shows it automatically
# Output: 30
Python
# Cell 4: Multiple outputs
x = 5
y = 10
x, y        # Shows (5, 10) — last expression displayed

5.6 Magic Commands

Magic commands are special Jupyter commands starting with % or %%. They extend what you can do beyond normal Python.

Line Magics (%) — Apply to one line

Python
# Time a single expression
%time sum(range(1_000_000))
# Output: CPU times: user 12.1 ms, sys: 0 ns, total: 12.1 ms

# Time multiple runs and give average
%timeit sum(range(1_000_000))
# Output: 10.1 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops)

# List all variables in namespace
%who
%whos      # more detailed version

# Load a Python file into cell
%load my_script.py

# Run a Python file
%run my_script.py

# Show current working directory
%pwd

# Change directory
%cd /path/to/folder

# List files in directory
%ls

# Show command history
%history

# Reset all variables (clear namespace)
%reset

# Install packages without leaving notebook
%pip install seaborn
%conda install seaborn

# Show matplotlib plots inline
%matplotlib inline

# Interactive plots (hover, zoom)
%matplotlib widget

Cell Magics (%%) — Apply to entire cell

Python
%%time
# Time entire cell execution
import pandas as pd
df = pd.DataFrame({'a': range(100000)})
df['b'] = df['a'] * 2
Python
%%writefile my_script.py
# Write cell content to a file
def greet(name):
    return f"Hello, {name}!"

print(greet("Himanshu"))
Python
%%html
<!-- Render HTML inside notebook -->
<h1 style="color:blue">Hello from HTML!</h1>
<button>Click Me</button>
Python
%%bash
# Run bash/shell commands in cell
echo "Current directory:"
pwd
ls -la
Python
%%sql
-- If you have ipython-sql installed
SELECT * FROM employees LIMIT 10;
Python
%%capture output
# Capture cell output (don't display, store in variable)
import pandas as pd
df = pd.read_csv('data.csv')
print(df.shape)
# Access via: output.stdout

5.7 Jupyter Notebook with Pandas and Visualization

Python
# Cell 1: Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Show plots inside the notebook (a trailing comment on a magic line can confuse IPython)
%matplotlib inline

print("Libraries loaded!")
Python
# Cell 2: Create sample dataset
np.random.seed(42)
df = pd.DataFrame({
    'Month': pd.date_range('2023-01', periods=12, freq='ME'),  # 'ME' = month-end; use freq='M' on pandas < 2.2
    'Sales': np.random.randint(50000, 200000, 12),
    'Expenses': np.random.randint(30000, 150000, 12),
    'Region': np.random.choice(['North', 'South', 'East', 'West'], 12)
})

df['Profit'] = df['Sales'] - df['Expenses']
df.head()   # Displays as formatted HTML table in Jupyter
Python
# Cell 3: Quick stats
df.describe()
Python
# Cell 4: Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Line chart
axes[0].plot(df['Month'], df['Sales'], marker='o', color='steelblue', label='Sales')
axes[0].plot(df['Month'], df['Expenses'], marker='s', color='salmon', label='Expenses')
axes[0].set_title('Sales vs Expenses (Monthly)')
axes[0].legend()
axes[0].tick_params(axis='x', rotation=45)

# Bar chart
axes[1].bar(df['Month'].dt.strftime('%b'), df['Profit'], color='green')
axes[1].set_title('Monthly Profit')
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

5.8 Jupyter Markdown Cells

Markdown cells let you write formatted documentation inside notebooks.

Markdown
# Main Heading
## Sub Heading
### Sub-Sub Heading

**Bold text**
*Italic text*
`inline code`

- Bullet item 1
- Bullet item 2

1. Numbered item 1
2. Numbered item 2

| Column A | Column B |
|----------|----------|
| Value 1  | Value 2  |

$$E = mc^2$$          ← LaTeX math equations

[Link text](https://example.com)

![Image](image.png)

> Blockquote

5.9 Jupyter Notebook File Management

Bash
# Convert notebook to Python script
jupyter nbconvert --to script notebook.ipynb
# Creates: notebook.py

# Convert to HTML (share with non-Python users)
jupyter nbconvert --to html notebook.ipynb
# Creates: notebook.html

# Convert to PDF (requires LaTeX)
jupyter nbconvert --to pdf notebook.ipynb

# Convert to Markdown
jupyter nbconvert --to markdown notebook.ipynb

# Execute notebook and save output
jupyter nbconvert --to notebook --execute notebook.ipynb --output executed_notebook.ipynb

# Execute a notebook headlessly (nbconvert's default output format is HTML)
jupyter nbconvert --execute notebook.ipynb

5.10 Jupyter Kernels

A kernel is the computational engine that runs your code. Each environment needs its own kernel.

Bash
# Install ipykernel in your environment
conda activate myenv
conda install ipykernel

# Register environment as a Jupyter kernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

# List all available kernels
jupyter kernelspec list

# Remove a kernel
jupyter kernelspec remove myenv

# Check which kernel a notebook is using
# → Kernel menu → Change Kernel

Why this matters: If you have a myenv environment with TensorFlow, you need to register it as a kernel so Jupyter can use it. Otherwise Jupyter only sees the environment it was launched from (typically base).


6. JupyterLab — The Next Level

JupyterLab is the modern, IDE-like successor to Jupyter Notebook. Same concept, better interface.

Bash
# Install JupyterLab
conda install jupyterlab

# Launch
jupyter lab
# Opens at: http://localhost:8888/lab

JupyterLab vs Jupyter Notebook

| Feature | Jupyter Notebook | JupyterLab |
|---------|------------------|------------|
| Interface | Single notebook | Multi-tab IDE |
| File browser | Basic | Advanced sidebar |
| Multiple notebooks | Open in separate tabs | Side-by-side panels |
| Terminal access | No | Built-in terminal |
| Text editor | No | Yes |
| Extensions | Limited | Rich ecosystem |
| CSV viewer | No | Yes |
| Image viewer | No | Yes |
| Drag & drop cells | No | Yes |
| Table of contents | No | Built-in |

JupyterLab Extensions

Bash
# Install useful extensions
pip install jupyterlab-git          # Git integration
pip install jupyterlab-code-formatter  # Auto-format code
pip install jupyterlab_execute_time  # Show cell execution time
pip install aquirdturtle_collapsible_headings  # Collapsible sections

# List installed extensions
jupyter labextension list

7. Interview Questions

Basic Level

Q1. What is Anaconda and how is it different from regular Python?

Anaconda is a Python distribution pre-bundled with 250+ data science libraries, conda package manager, and Jupyter. Regular Python is just the interpreter — you install everything manually with pip.

Q2. What is a conda environment and why do we use it?

A conda environment is an isolated directory with its own Python version and packages. We use it to avoid dependency conflicts between projects.

Q3. What is the difference between conda install and pip install?

conda install can install both Python and non-Python packages (C libraries, CUDA, MKL), handles dependencies with a proper solver, and manages environments. pip only installs Python packages and has a simpler dependency resolver.

Q4. What is a Jupyter kernel?

A kernel is the computational engine that executes code in a Jupyter notebook. Each environment needs its own kernel registered with Jupyter.

Q5. What are the three types of cells in Jupyter Notebook?

Code cells (run Python/R code), Markdown cells (formatted text, equations), and Raw cells (plain text, not executed).


Intermediate Level

Q6. How do you share a conda environment with a teammate?

Bash
# Export
conda env export --from-history > environment.yml

# Teammate creates environment
conda env create -f environment.yml

Q7. What is the difference between %time and %timeit?

%time runs the code once and measures time. %timeit runs it multiple times (default 7 runs × multiple loops) and gives average + standard deviation for statistical accuracy.
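Outside Jupyter, the standard-library timeit module gives the same two styles of measurement. A minimal sketch (run counts kept small so it finishes quickly):

```python
import timeit

# Single run, like %time: one rough wall-clock measurement
once = timeit.timeit(lambda: sum(range(1_000_000)), number=1)

# Repeated runs, like %timeit: repeat gives several samples, and the
# minimum is the most stable estimate (least affected by system noise)
runs = timeit.repeat(lambda: sum(range(1_000_000)), repeat=5, number=10)
best = min(runs) / 10  # average seconds per call within the fastest batch

print(f"one run: {once:.4f} s, best average: {best:.4f} s")
```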

Q8. How do you use a different conda environment in Jupyter?

Bash
conda activate myenv
conda install ipykernel
python -m ipykernel install --user --name myenv
# Then in Jupyter → Kernel → Change Kernel → Select myenv

Q9. What is conda-forge and when should you use it?

conda-forge is a community-maintained package channel with 30,000+ packages (vs Anaconda's 8,000). Use it when a package isn't available in default channels or when you need more up-to-date versions.

Q10. How do you run a Jupyter Notebook without opening a browser (headless)?

Bash
jupyter nbconvert --to notebook --execute input.ipynb --output output.ipynb

Advanced Level

Q11. What is libmamba and why is it faster than the default conda solver?

libmamba is an alternative dependency solver for conda, written in C++ on top of libsolv (a SAT solver). The classic conda solver is written in Python. libmamba resolves dependencies 10-50x faster thanks to its lower-level implementation and better algorithms, and it ships as the default solver since conda 23.10.

Q12. How does Papermill work and what is it used for?

Papermill executes Jupyter notebooks programmatically with injected parameters. It's used to create parameterized notebook pipelines — e.g., running the same analysis notebook for different dates, regions, or datasets without manual editing.

Q13. Explain the internal format of a .ipynb file.

A .ipynb file is JSON containing: notebook metadata (kernel info, language), format version, and an array of cells. Each cell has a type (code/markdown), source (code text), and outputs (execution results, images, errors). This is why notebooks create large diffs in Git when outputs are included.
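Because a .ipynb is plain JSON, the stdlib json module is enough to inspect one. A sketch that builds a minimal notebook in memory and reads back its cells (field names follow the nbformat 4 layout):

```python
import json

# A minimal nbformat-4 notebook: version fields, metadata, and a cell list
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print('hi')"], "outputs": []},
    ],
}

text = json.dumps(nb)    # this string is what a .ipynb file contains on disk
loaded = json.loads(text)

# Outputs live inside each code cell's "outputs" list, which is why
# committed notebooks with results produce large Git diffs
types = [c["cell_type"] for c in loaded["cells"]]
print(types)  # ['markdown', 'code']
```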

Q14. How would you automate a Jupyter Notebook to run daily and email the results?

Bash
# 1. Parameterize notebook with papermill
# 2. Run via cron/Task Scheduler:
papermill template.ipynb output_$(date +%Y%m%d).ipynb -p date $(date +%Y-%m-%d)

# 3. Convert to HTML
jupyter nbconvert output_*.ipynb --to html

# 4. Email via Python smtplib or SendGrid

Scenario-Based

Q15. Your colleague can't run your notebook — it throws ModuleNotFoundError even though the module is installed. What do you check?

First check which Python/environment the notebook is using (import sys; print(sys.executable)). The issue is almost always a kernel-environment mismatch — Jupyter is using a different environment than where the package was installed. Fix: register the correct environment as a kernel.
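A quick diagnostic cell for this situation (run it in the failing notebook). It only uses the stdlib; "json" stands in here for whatever package is reported missing:

```python
import sys
import importlib.util

# Which Python binary is this kernel running? If the path does not point
# inside the environment where you ran `conda install`, that's the mismatch.
print(sys.executable)
print(sys.prefix)

# Check whether a module is importable from THIS kernel, without importing it
spec = importlib.util.find_spec("json")
print("found" if spec is not None else "missing")
```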

Q16. A data scientist in your team says "conda is slow — it takes 20 minutes to install packages." How do you fix this?

Bash
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
# Now conda uses the libmamba C++ solver → 10-50x faster

Q17. Production uses Python 3.11 but your ML model was trained in a notebook with Python 3.8. How do you ensure compatibility?

Export the exact environment using conda env export > environment.yml (not --from-history), include this in the deployment pipeline, and use Docker to containerize the exact environment for production. Also use joblib or pickle with documented Python + library versions for model serialization.
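One lightweight safeguard, sketched here with stdlib pickle and a stand-in "model" (illustrative only, not a substitute for pinned environments): record the interpreter version next to the serialized artifact so a mismatch is detectable at load time.

```python
import json
import pickle
import platform

model = {"weights": [0.1, 0.2]}  # stand-in for a trained model object

# Save the artifact together with a small manifest of version info
blob = pickle.dumps(model)
manifest = json.dumps({
    "python": platform.python_version(),
    "pickle_protocol": pickle.HIGHEST_PROTOCOL,
})

# At load time, compare the recorded version against the running interpreter
meta = json.loads(manifest)
if meta["python"] != platform.python_version():
    print("warning: artifact was serialized under a different Python version")

restored = pickle.loads(blob)
print(restored == model)  # True
```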


8. Conclusion

Key Learnings Summary

| Topic | What You Learned |
|-------|------------------|
| Anaconda | Pre-bundled Python for data science with 250+ packages |
| Conda | Package + environment manager; better than pip for data science |
| Environments | Isolated per-project Python setups to avoid conflicts |
| environment.yml | Reproducibility across machines and teams |
| Jupyter Notebook | Interactive browser-based coding with cells |
| Magic Commands | %timeit, %matplotlib inline, %%time, etc. |
| Kernels | Bridge between Jupyter and conda environments |
| JupyterLab | Modern IDE-like interface, successor to Notebook |
| Papermill | Parameterized notebook automation |

Best Practices

  • Avoid installing project packages into the base environment; create one environment per project

  • Export environments with conda env export --from-history for clean, shareable environment.yml files

  • Strip notebook outputs before committing to Git to keep diffs readable
