Hands-On Introduction to PyTorch for Machine Learning

A comprehensive notebook on PyTorch — tensors, autograd, training loops, and the full ecosystem from core libraries to community tools.

PyTorch is the framework of choice for most deep learning research and production. This is a reference notebook covering everything from the basics to the full ecosystem.


⚡ PyTorch at a Glance

PyTorch is a widely used deep learning framework developed and maintained by Meta.

Why PyTorch?

  1. Dynamic computation graph — flexibility and easy debugging, especially valuable for research and experimentation
  2. Pythonic interface — integrates seamlessly with NumPy, SciPy, and the broader Python ML ecosystem
  3. Strong GPU acceleration — high-performance training and inference via CUDA
  4. Thriving community — extensive documentation, TorchServe for deployment, TorchScript for optimization

🌐 PyTorch’s Ecosystem

PyTorch has a rich ecosystem and continues to expand. Key projects:

| Project | Description |
| --- | --- |
| docTR | OCR library that integrates with PyTorch pipelines |
| Transformers (HuggingFace) | Thousands of pretrained models for text, vision, audio |
| vLLM | High-throughput, memory-efficient LLM inference engine |
| depyf | Decompiles PyTorch compiler bytecode back to source |
| DeepSpeed | Distributed training optimization (Microsoft/PyTorch Foundation) |

🔢 Tensors — The Core Data Structure

A PyTorch tensor is a multidimensional array, similar to a NumPy array, but with GPU acceleration and autograd support.

Key characteristics:

  • Multidimensional: 1D (vectors), 2D (matrices), or higher-dimensional for images, audio, video
  • Type and device flexibility: float32, int64, etc. — on CPU or GPU
  • Autograd support: when requires_grad=True, PyTorch tracks operations for backpropagation
  • Interoperability: convert to/from NumPy with .numpy() and torch.from_numpy()
  • Efficient operations: slicing, matrix multiplication, broadcasting — all optimized in C++/CUDA
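A minimal sketch of these characteristics in code (shapes and values are illustrative):

```python
import torch

# Create tensors with an explicit dtype; .to("cuda") would move them to a GPU
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], dtype=torch.float32)
b = torch.ones(2, 2)

# Efficient operations: slicing and matrix multiplication
first_row = a[0]      # tensor([1., 2.])
c = a @ b             # matrix multiply -> [[3., 3.], [7., 7.]]

# NumPy interoperability (CPU tensors share memory with the array)
arr = a.numpy()
back = torch.from_numpy(arr)

# Autograd support: operations on this tensor are tracked for backprop
w = torch.randn(2, 2, requires_grad=True)
```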

Broadcasting allows arithmetic on tensors of different shapes without manually copying data — handled in optimized C++/CUDA, much faster than Python loops.
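The same idea in code, a small sketch of broadcasting two differently shaped tensors:

```python
import torch

# A (3, 1) column broadcast against a (1, 4) row yields a (3, 4) grid,
# with no explicit copies or Python loops.
col = torch.arange(3.0).reshape(3, 1)   # shape (3, 1)
row = torch.arange(4.0).reshape(1, 4)   # shape (1, 4)
grid = col + row                        # shape (3, 4); grid[i, j] == i + j

# Scalars broadcast too: every element is scaled in one vectorized op
scaled = grid * 10
```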


🔁 Autograd — Automatic Differentiation

Autograd is PyTorch’s automatic differentiation engine, which powers training by computing gradients automatically.

```mermaid
flowchart LR
    Input["📥 Input Tensor\nrequires_grad=True"]:::tensor --> FP["➡️ Forward Pass\n(compute predictions)"]:::step
    FP --> Loss["📉 Loss\n.backward()"]:::loss
    Loss --> BP["⬅️ Backward Pass\n(compute gradients)"]:::step
    BP --> Opt["⚙️ Optimizer\n.step() → update weights"]:::opt

    classDef tensor fill:#4A90D9,stroke:#2c5f8a,color:#fff
    classDef step fill:#5BA85A,stroke:#3a6e39,color:#fff
    classDef loss fill:#D97B4A,stroke:#9e5430,color:#fff
    classDef opt fill:#9B6EBD,stroke:#6b4785,color:#fff
```

Key features:

  • Dynamic computation graph — built on the fly as operations execute; easier to debug than static graphs
  • Automatic gradient computation — call .backward() on the loss, and gradients propagate to all requires_grad=True tensors
  • Gradient storage — gradients stored in .grad attribute, consumed by optimizers like SGD or Adam
  • Custom autograd functions — subclass torch.autograd.Function for advanced use cases

Two phases of neural network training:

  1. Forward propagation: the network makes its best guess about the correct output
  2. Backward propagation: traverses the graph backward from the output → collects derivatives of the loss with respect to each parameter → the optimizer uses them to update the weights via gradient descent
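A tiny worked example of both phases, small enough to check the gradients by hand (the scalar model y = w·x + b and the target value are illustrative):

```python
import torch

# A scalar linear model: y = w * x + b, squared-error loss against target 10
x = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

y = w * x + b                 # forward pass: graph is built as this runs; y = 7
loss = (y - 10.0) ** 2        # loss = 9

loss.backward()               # backward pass: gradients land in .grad

# By hand: d(loss)/dy = 2*(y - 10) = -6,
# so d(loss)/dw = -6 * x = -12 and d(loss)/db = -6
print(w.grad)   # tensor(-12.)
print(b.grad)   # tensor(-6.)
```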

🔌 APIs — Python on Top, C++ Under the Hood

| Layer | Language | What it handles |
| --- | --- | --- |
| User-facing API | Python | torch, torch.nn, torch.optim, torch.utils.data |
| Core tensor ops (ATen) | C++ | Tensor operations, model serialization |
| GPU acceleration | CUDA | NVIDIA GPU kernels for tensor computation |

This hybrid design gives PyTorch the best of both worlds: Python’s ease of use and C++/CUDA’s raw performance.


📚 Libraries Overview

Five Core Libraries

| Library | Purpose |
| --- | --- |
| torch | Base library: tensors, autograd, deep learning workflows |
| torch.nn | Neural network building blocks: layers, loss functions, activations |
| torch.optim | Optimization algorithms: SGD, Adam, RMSprop |
| torch.autograd | Automatic differentiation for backpropagation |
| torch.utils.data | Dataset and DataLoader utilities |
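A minimal sketch of the five core libraries cooperating on a synthetic regression task (the data, shapes, and hyperparameters are illustrative):

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data: 64 samples, 3 features, known linear target
X = torch.randn(64, 3)
y = X @ torch.tensor([[1.0], [2.0], [3.0]]) + 0.5

dataset = TensorDataset(X, y)                        # torch.utils.data
loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = nn.Linear(3, 1)                              # torch.nn
loss_fn = nn.MSELoss()
opt = optim.SGD(model.parameters(), lr=0.1)          # torch.optim

losses = []
for epoch in range(20):
    for xb, yb in loader:
        opt.zero_grad()                  # clear stale .grad values
        loss = loss_fn(model(xb), yb)    # forward pass (torch)
        loss.backward()                  # torch.autograd fills .grad
        opt.step()                       # gradient descent update
        losses.append(loss.item())
```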

Five Supporting Libraries

| Library | Domain |
| --- | --- |
| TorchVision | Computer vision: pretrained models, datasets, transforms |
| TorchAudio | Audio processing: speech and audio models |
| TorchText | NLP: datasets, preprocessing, embeddings |
| TorchServe | Model serving: deploy at scale via RESTful APIs |
| torch.distributed | Distributed training across multiple GPUs and nodes |

Advanced Tools

  1. torch.compile (PyTorch 2.x+) — compiles models for improved performance via TorchDynamo, AOTAutograd, and backend accelerators
  2. torch.jit / TorchScript — static graph compilation and model export for deployment without Python runtime
  3. torch.fx — toolkit for transforming and analyzing PyTorch programs via intermediate representations
  4. torch.multiprocessing — parallelism using multiple processes, useful for data loading and distributed training

Use torch.compile first when optimizing for inference speed. It is the easiest win in PyTorch 2.x: often 10–30% faster with a single added line of code.

Community Ecosystem Add-ons

| Tool | What it does |
| --- | --- |
| PyTorch Lightning | Lightweight wrapper reducing boilerplate for research |
| HuggingFace Transformers | State-of-the-art NLP and multimodal models |
| fastai | User-friendly, high-level training API on top of PyTorch |

Part of my deep learning frameworks series. Next: writing a full training loop from scratch in PyTorch.

This post is licensed under CC BY 4.0 by the author.