Making a Multi-Layer Perceptron (MLP)¶
this is a simplified intro to MLPs, enough to understand how they process structured/tabular data. their most common use case is classification or regression on feature vectors (lists of numbers), like predicting if a student passes an exam based on hours studied and hours slept (this time we can do it over 100s of students rather than 1 guy).
mlps can technically take any 1d array as input. unlike cnns, they don't exploit spatial locality, so the position of a feature in the input vector carries no special meaning.
Step 0: prereqs and theory¶
an MLP is just a standard feedforward neural network. the input goes through fully connected layers (linear transformations) interleaved with non-linearities.
if you want a deeper understanding, check out Neural Networks from Scratch.
we will use pytorch from the start. first, import what we need:
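these are the same imports used in the full code at the end of the page:

import torch
import torch.nn as nn
import torch.optim as optim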
Step 1: preparing the data¶
suppose we want to predict if a student passes (1) or fails (0) based on hours studied and hours slept. our dataset could look like this:
X = torch.tensor([
    [5.0, 7.0],
    [1.0, 2.0],
    [10.0, 6.0],
    [2.0, 8.0]
])
y = torch.tensor([
    [1.0],
    [0.0],
    [1.0],
    [0.0]
])
X is our input (2 features per student), y is the target, now with 4 students.
batch size is optional here, but for more data, wrap it in a DataLoader like with cnns.
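a minimal sketch of what that wrapping could look like (the batch_size of 2 is just an arbitrary pick for this tiny dataset):

from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X, y)                             # pairs each student with its label
loader = DataLoader(dataset, batch_size=2, shuffle=True)  # yields mini-batches of (X_batch, y_batch)

for X_batch, y_batch in loader:
    print(X_batch.shape, y_batch.shape)                   # torch.Size([2, 2]) torch.Size([2, 1])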
Step 2: building the MLP¶
an MLP consists of:
- fully connected (Linear) layers, which map inputs to outputs
- activation functions, like ReLU or Sigmoid
here's what the architecture looks like, as a minimal 1-hidden-layer MLP:
model = nn.Sequential(
    nn.Linear(2, 4),   # input layer → hidden layer
    nn.ReLU(),         # activation
    nn.Linear(4, 1),   # hidden layer → output layer
    nn.Sigmoid()       # squash into 0–1 probability
)
- input layer: 2 features → hidden layer: 4 neurons
- output layer: 1 neuron → probability of passing
you can test it with a dummy input:
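for example (a sketch, the single random student is arbitrary):

dummy = torch.rand(1, 2)      # 1 fake student, 2 features
out = model(dummy)
print(out.shape, out)         # torch.Size([1, 1]), a value between 0 and 1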
Step 3: defining loss and optimizer¶
for binary classification, we can use BCELoss (Binary Cross Entropy), basically cross entropy for two choices.
learning rate is higher here because the network is tiny.
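the setup, matching the full code at the end:

loss_fn = nn.BCELoss()                         # binary cross entropy: -[y*log(p) + (1-y)*log(1-p)]
opt = optim.Adam(model.parameters(), lr=0.01)  # Adam with a relatively large learning rate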
Step 4: the training loop¶
like with cnns: forward → loss → backprop → update
for epoch in range(1000):
    pred = model(X)
    loss = loss_fn(pred, y)

    opt.zero_grad()
    loss.backward()
    opt.step()

    if epoch % 100 == 0:
        print(f"epoch {epoch}, loss: {loss.item():.4f}")
after training, the model will output probabilities close to 0 or 1.
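a quick sanity check (a sketch, thresholding at 0.5):

with torch.no_grad():
    preds = (model(X) > 0.5).float()   # round the probabilities to 0/1
print(preds.squeeze(), y.squeeze())    # ideally these two match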
Step 5: saving and loading the model¶
saving weights:¶
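same as in the full code below:

torch.save(model.state_dict(), "mlp_student.pth")
print("model saved.")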
running inference later:¶
# define the same architecture
model = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
    nn.Sigmoid()
)
# load weights
model.load_state_dict(torch.load("mlp_student.pth"))
model.eval()
# predict new student
new_student = torch.tensor([[6.0, 5.0]])
with torch.no_grad():
    prob = model(new_student).item()
print("pass probability:", prob)
Step 6: intuition/a way to think about it¶
- each layer linearly combines inputs via weights/biases → applies non-linearity → passes forward
- hidden layer “learns features” from raw input
- output layer maps to final prediction probability
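as a sketch, here's what those bullets look like written out by hand with the trained model's weights (model[0] and model[2] are the two Linear layers inside the Sequential):

h = torch.relu(X @ model[0].weight.T + model[0].bias)        # hidden layer: linear combination + non-linearity
out = torch.sigmoid(h @ model[2].weight.T + model[2].bias)   # output layer: final probability, same as model(X)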
Full code (put together)¶
import torch
import torch.nn as nn
import torch.optim as optim
# data
X = torch.tensor([[5.0, 7.0], [1.0, 2.0], [10.0, 6.0], [2.0, 8.0]])
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
# model
model = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
    nn.Sigmoid()
)
# loss + optimizer
loss_fn = nn.BCELoss()
opt = optim.Adam(model.parameters(), lr=0.01)
# training
for epoch in range(1000):
    pred = model(X)
    loss = loss_fn(pred, y)

    opt.zero_grad()
    loss.backward()
    opt.step()

    if epoch % 100 == 0:
        print(f"epoch {epoch}, loss: {loss.item():.4f}")
# save
torch.save(model.state_dict(), "mlp_student.pth")
print("model saved.")
# inference
model.load_state_dict(torch.load("mlp_student.pth"))
model.eval()
new_student = torch.tensor([[6.0, 5.0]])
with torch.no_grad():
    print("pass probability:", model(new_student).item())