Featured image of post PyTorch in a Nutshell

PyTorch in a Nutshell

Python Machine Learning

Once upon a time, deep learning was like a mysterious black box that only wizards (also known as researchers) could understand.

Frameworks like TensorFlow ruled the land, but they were a bit, well, complicated.

Then, one day in 2016, the wizards at Facebook AI Research (FAIR) unleashed PyTorch—a deep learning framework that was dynamic, Pythonic, and, dare I say, fun to use.

PyTorch made neural networks more like playing with Lego bricks rather than assembling a spaceship with a 500-page manual.

If you love Python and hate wrestling with weird syntax, PyTorch was (and still is) a breath of fresh air.

History and Motivation

Before PyTorch, we had Torch, a powerful machine learning framework written in Lua (yes, Lua—because that’s what the cool kids were using).

Torch was great, but Lua wasn’t exactly the most popular language.

Meanwhile, TensorFlow had already taken off, but many researchers found it frustrating to use because of its static computation graph (ugh, so rigid!).

Then, PyTorch entered the scene, keeping the best parts of Torch while ditching Lua for Python. It introduced dynamic computation graphs, making it easy to debug and experiment with neural networks on the fly. It was love at first import.

Getting Started with PyTorch

Let’s get our hands dirty! First, you’ll need to install PyTorch:

1
pip install torch torchvision torchaudio

And now, let’s check if everything is working:

1
2
import torch
print(torch.__version__)

If you see a version number, congratulations! You’re officially in the PyTorch club.

Tensors: The Foundation of PyTorch

At its core, PyTorch is all about tensors, which are like NumPy arrays but way cooler because they can run on GPUs.

Creating a simple tensor:

1
2
3
import torch
x = torch.tensor([[1, 2], [3, 4]])
print(x)

Want a random tensor? No problem:

1
2
rand_tensor = torch.rand(3, 3)
print(rand_tensor)

Need one on a GPU? Easy:

1
2
3
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.tensor([1.0, 2.0], device=device)
print(x)

Autograd: The Magic Behind Gradients

PyTorch’s autograd module automates gradient computation. Let’s see it in action:

1
2
3
4
5
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2

y.backward()
print(x.grad)  # Should print 4.0 (dy/dx = 2x at x=2)

Boom! No need to manually compute derivatives—PyTorch does it for you.

Building a Neural Network in PyTorch

Let’s create a simple feedforward neural network using torch.nn:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(2, 3)
        self.fc2 = nn.Linear(3, 1)
    
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = SimpleNN()
print(model)

Training a Model in PyTorch

Here’s a full training loop:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import torch.optim as optim

dataset = [(torch.tensor([0.5, 1.0]), torch.tensor([1.0]))]
dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)

model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    for data, target in dataloader:
        optimizer.zero_grad()
        output = model(data)
        loss = F.mse_loss(output, target)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")

Using GPUs for Speed

To run your model on a GPU:

1
2
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

Now, every tensor you create needs to be on the same device:

1
data, target = data.to(device), target.to(device)

Saving and Loading Models

After all that training, let’s save the model:

1
torch.save(model.state_dict(), "model.pth")

To load it:

1
model.load_state_dict(torch.load("model.pth"))

Deploying PyTorch Models

PyTorch models can be exported to TorchScript for production:

1
2
scripted_model = torch.jit.script(model)
scripted_model.save("model_scripted.pt")

This makes it easier to deploy in a production environment.


Key Ideas

ConceptSummary
PyTorch OriginsCreated by Facebook AI Research (FAIR) in 2016 to be Pythonic and dynamic
TensorsThe backbone of PyTorch, similar to NumPy arrays but GPU-compatible
AutogradAutomatically computes gradients for backpropagation
Neural NetworksBuilt using torch.nn module with a simple, modular design
Training ModelsUses optimization and loss functions for training
GPU AccelerationEasily moves computations to CUDA-compatible GPUs
Saving & LoadingSave models with torch.save() and load with torch.load()
DeploymentConvert models to TorchScript for production

References