
What is a Fully Connected Neural Network?
A Fully Connected Neural Network (FCNN), also known as a dense neural network, is one of the simplest architectures in deep learning. It consists of layers where every neuron in one layer is connected to every neuron in the next layer. With enough hidden units, these networks can approximate a wide range of functions, which makes them a common starting point for tasks such as classification and regression.
Here’s how it works:
- The input layer receives data (like images, numbers, or text embeddings).
- Hidden layers perform computations, transforming the input into higher-level features.
- The output layer produces the final predictions (e.g., classifying digits, predicting a house price, etc.).
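To make the three layer roles concrete, here is a minimal sketch of a forward pass written with plain Python lists instead of PyTorch. The helper names (`relu`, `dense`) and all weights and biases are made-up illustrative values, not part of the tutorial's model.

```python
# A tiny fully connected network using plain Python lists.
# All weights and biases below are made-up illustrative values.

def relu(values):
    """Replace negative entries with zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: every output neuron sums over every input."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


x = [1.0, 2.0]                                        # input layer: 2 features
w_hidden = [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]]    # 3 hidden neurons x 2 inputs
b_hidden = [0.0, -1.0, 0.5]
w_out = [[1.0, 2.0, -1.0]]                            # 1 output neuron x 3 hidden values
b_out = [0.1]

hidden = relu(dense(x, w_hidden, b_hidden))           # hidden layer with ReLU
output = dense(hidden, w_out, b_out)                  # output layer (raw score)
print(hidden, output)
```

Each row of a weight list holds the connections from every input to one neuron, which is exactly what "fully connected" means.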
In this tutorial, you’ll learn how to code a fully connected neural network in PyTorch and train it to classify handwritten digits from the MNIST dataset. We’ll keep things fun and interactive, exploring every step of the way!
What Will You Achieve in This Tutorial?
By the end of this tutorial, you will:
- Understand Fully Connected Networks: Learn the logic behind how these layers work and what makes them different from CNNs.
- Build Your First Fully Connected Neural Network: Create a custom architecture step by step using PyTorch.
- Train the Model to Classify MNIST Digits: Watch your network come to life as it learns to predict handwritten numbers.
- Analyze Results: See how well your model performs and visualize how it improves with training.
This is going to be an exciting deep dive! You’ll see how raw numerical data transforms into predictions, and by the end, you’ll feel empowered to build your own custom networks for any task.
Let’s dive in!
1. Set Up Your Environment
First, we’ll create a Python virtual environment to keep everything clean and organized.
Step 1.1: Create a Virtual Environment
1. Open your terminal or command prompt.
2. Navigate to your project folder or create a new one:
mkdir fully_connected_nn && cd fully_connected_nn
3. Create a virtual environment:
python -m venv venv
4. Activate the virtual environment:
- Windows:
venv\Scripts\activate
- Mac/Linux:
source venv/bin/activate
🧹 Why? A virtual environment keeps your project dependencies isolated so it doesn’t mess up other Python projects.
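If you are unsure whether the environment is actually active, a quick check from Python can help. This is a small sketch that relies on how environments created with `python -m venv` set `sys.prefix`; the variable name `in_venv` is only illustrative.

```python
import sys

# Inside a venv, sys.prefix points at the environment folder,
# while sys.base_prefix still points at the base Python installation.
in_venv = sys.prefix != sys.base_prefix

print("Running inside a virtual environment:", in_venv)
print("Interpreter:", sys.executable)
```

If the first line prints `False`, activate the environment again before installing anything.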
Step 1.2: Install Required Libraries
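Now, let’s install the libraries we need. Inside your activated environment, run the command below. If you need a build for a specific operating system or CUDA version, use the install selector on the PyTorch website to generate the matching command instead:
https://pytorch.org/get-started/locally/
pip install torch torchvision tqdm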
- torch: The main PyTorch library.
- torchvision: For datasets and data transformations.
- tqdm: For pretty progress bars.
2. Open Jupyter Notebook
If you’re using Jupyter Notebook, make sure it’s installed. You can install it with:
pip install notebook
Then, start Jupyter Notebook in your project directory:
jupyter notebook
Create a new notebook, and let’s get coding! 🎉
3. Explaining the code step by step
0. Import the Libraries
The first step in your notebook is to import all the libraries we’ll need.
# Imports
import torch
import torch.nn.functional as F
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch import optim
from torch import nn
from torch.utils.data import DataLoader
from tqdm import tqdm
🧠 Think of this as bringing your tools to the workbench before starting your project!
- torch: The main PyTorch library for tensor operations.
- torch.nn: Contains modules and classes for building neural networks.
- torch.optim: Provides optimization algorithms.
- torchvision: A library for image processing and datasets.
- tqdm: A library for progress bars in loops.
1. Defining the Neural Network Architecture
We start by defining our neural network class, NN, which inherits from nn.Module.
class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super(NN, self).__init__()
        self.fc1 = nn.Linear(input_size, 50)  # First fully connected layer
        self.fc2 = nn.Linear(50, num_classes)  # Second fully connected layer
    def forward(self, x):
        x = F.relu(self.fc1(x))  # Apply ReLU activation after the first layer
        x = self.fc2(x)  # Output layer
        return x
Explanation:
- __init__ Method: This initializes the network’s layers.
- input_size: The number of input features (for MNIST, it’s 784, corresponding to 28x28 pixel images).
- num_classes: The number of output classes (for MNIST, it’s 10, representing digits 0-9).
- nn.Linear: Fully connected layers that transform inputs to outputs.
  - The first layer reduces the input from 784 to 50 dimensions.
  - The second layer maps the 50 dimensions to 10 output classes.
- forward Method: Defines how data flows through the network.
- ReLU Activation: Adds non-linearity to help the network learn complex patterns.
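As a sanity check on the layer sizes described above, the sketch below rebuilds the same two-layer network, counts its learnable parameters, and pushes a random batch through it. The random batch is a stand-in for real data, and the names `num_params`, `expected`, and `dummy_batch` are illustrative.

```python
import torch
import torch.nn.functional as F
from torch import nn


class NN(nn.Module):
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 50)
        self.fc2 = nn.Linear(50, num_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


model = NN(input_size=784, num_classes=10)

# Each nn.Linear stores a weight matrix plus one bias per output neuron.
num_params = sum(p.numel() for p in model.parameters())
expected = (784 * 50 + 50) + (50 * 10 + 10)

# A random batch of 64 flattened "images" stands in for real MNIST data.
dummy_batch = torch.randn(64, 784)
scores = model(dummy_batch)

print(num_params, expected)  # both are 39760
print(scores.shape)          # one row of 10 class scores per input
```

Running a dummy batch like this before training is a cheap way to catch shape mismatches early.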
2. Setting Up the Device
We need to specify whether to use a GPU or CPU for computations:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Explanation:
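- torch.cuda.is_available(): Returns True when PyTorch can use a CUDA-capable GPU.
- torch.device: Creates a device object, "cuda" when a GPU is available and "cpu" otherwise. Using a GPU can significantly speed up training. The model and every batch of data must be moved to this same device.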
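The short sketch below shows what moving a tensor to the selected device looks like in practice. The tensor `x` is just an example value; on a machine without a GPU both printed devices will be `cpu`.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 3)          # tensors are created on the CPU by default
x_on_device = x.to(device)    # returns a tensor stored on the chosen device

print(x.device, x_on_device.device)
```

Note that `.to(device)` on a tensor returns a new tensor when a move is needed, so the result has to be assigned, as the training loop later does with `data` and `targets`.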
3. Defining Hyperparameters
These are the settings that control how the network is trained:
input_size = 784
num_classes = 10
learning_rate = 0.001
batch_size = 64
num_epochs = 3
Explanation:
- input_size: Number of input features (28x28 flattened into 784).
- num_classes: Number of output classes (digits 0-9).
- learning_rate: Controls how much the model updates weights with each step.
- batch_size: Number of training samples processed in one iteration.
- num_epochs: Number of complete passes through the dataset.
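To see how batch_size and num_epochs translate into actual weight updates, here is a small arithmetic sketch. It assumes the standard MNIST training split of 60,000 images and a DataLoader that keeps the final, smaller batch (the default behavior).

```python
import math

train_samples = 60_000   # size of the MNIST training split
batch_size = 64
num_epochs = 3

# The last batch is smaller when the dataset size is not a multiple of batch_size.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, last_batch_size, total_steps)  # 938 32 2814
```

So three epochs with these settings correspond to 2,814 optimizer steps.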
4. Loading the MNIST Dataset
We’ll use PyTorch’s torchvision library to load the MNIST dataset:
train_dataset = datasets.MNIST(
    root="dataset/", train=True, transform=transforms.ToTensor(), download=True
)
test_dataset = datasets.MNIST(
    root="dataset/", train=False, transform=transforms.ToTensor(), download=True
)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=True)
Explanation:
- datasets.MNIST: Downloads and loads the MNIST dataset.
- transform=transforms.ToTensor(): Converts each image into a PyTorch tensor of shape (1, 28, 28) and scales pixel values from 0-255 to the range 0.0-1.0.
- DataLoader: Creates iterable datasets, enabling us to loop through data in batches.
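To inspect what a DataLoader yields without downloading anything, the sketch below wraps random tensors in a TensorDataset. The 200 random "images" and labels are synthetic stand-ins with MNIST-like shapes, not real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 200 random 1x28x28 "images" with random labels.
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
fake_dataset = TensorDataset(images, labels)

loader = DataLoader(dataset=fake_dataset, batch_size=64, shuffle=True)

batch_sizes = []
for data, targets in loader:
    batch_sizes.append(data.shape[0])

print(len(loader), batch_sizes)   # 4 batches: three of 64 and a final one of 8
print(data.shape, targets.shape)  # shapes of the last batch
```

Each batch keeps the (channels, height, width) layout, which is why the training loop has to flatten it before calling the model.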
5. Initializing the Model
Let’s create an instance of our neural network:
model = NN(input_size=input_size, num_classes=num_classes).to(device)
Explanation:
- Model Initialization: Creates the network based on the architecture defined earlier.
- .to(device): Moves the model to the specified computation device (GPU or CPU).
6. Defining the Loss Function and Optimizer
We need to measure the model’s error and update its weights:
criterion = nn.CrossEntropyLoss() # Loss function for multi-class classification
optimizer = optim.Adam(model.parameters(), lr=learning_rate) # Adam optimizer
Explanation:
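- nn.CrossEntropyLoss: The standard loss for multi-class classification. It expects raw scores (logits) and applies log-softmax internally, which is why the forward method ends without a softmax.
- optim.Adam: An optimization algorithm that adapts the step size for each parameter, which often gives good results with little learning-rate tuning.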
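To show what the loss function computes, here is a sketch that compares nn.CrossEntropyLoss with a manual calculation: take the log-softmax of the scores, pick the entry for the correct class, and average the negated values. The scores and targets are made-up example values.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Made-up raw scores (logits) for 2 samples and 3 classes.
scores = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

criterion = nn.CrossEntropyLoss()
loss_builtin = criterion(scores, targets)

# Manual version: log-softmax, select the true class, negate, average.
log_probs = F.log_softmax(scores, dim=1)
picked = log_probs[torch.arange(scores.shape[0]), targets]
loss_manual = -picked.mean()

print(loss_builtin.item(), loss_manual.item())  # the two values match
```

The loss gets smaller as the score for the correct class grows relative to the others.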
7. Training the Network
Here’s the training loop where the magic happens:
for epoch in range(num_epochs):
    for batch_idx, (data, targets) in enumerate(tqdm(train_loader)):
        data = data.to(device=device)
        targets = targets.to(device=device)
        data = data.reshape(data.shape[0], -1)  # Flatten the images
        # Forward pass
        scores = model(data)
        loss = criterion(scores, targets)
        # Backward pass
        optimizer.zero_grad()  # Clear previous gradients
        loss.backward()  # Backpropagation
        optimizer.step()  # Update weights
Running this cell shows one tqdm progress bar per epoch.

Explanation:
- Epoch Loop: Runs the training process for a specified number of epochs.
- Batch Loop: Processes data in batches:
  - Moves data and targets to the computation device.
  - Flattens images into 1D tensors (28x28 to 784).
  - Performs a forward pass through the network to get predictions.
  - Calculates loss by comparing predictions with actual labels.
  - Performs backpropagation to calculate gradients.
  - Updates weights using the optimizer.
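The same forward, backward, and update steps can be watched in isolation by recording the loss at every step. The sketch below is a simplified setup, not the tutorial's MNIST run: it trains on one fixed batch of random data, and it uses nn.Sequential as a stand-in with the same layer sizes as the NN class.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Synthetic data: 64 flattened "images" with random labels (not real MNIST).
inputs = torch.randn(64, 784)
targets = torch.randint(0, 10, (64,))

# Stand-in for the NN class: same layer sizes, built with nn.Sequential.
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

losses = []
for step in range(100):
    scores = model(inputs)             # forward pass
    loss = criterion(scores, targets)  # compare predictions with labels

    optimizer.zero_grad()              # clear gradients from the previous step
    loss.backward()                    # backpropagation
    optimizer.step()                   # update weights

    losses.append(loss.item())         # store a plain float for later plotting

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Appending `loss.item()` inside the MNIST training loop in the same way would give you a list you can plot to see how the model improves over time.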
8. Checking Model Accuracy
After training, we’ll evaluate the model’s performance:
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():  # Disable gradient tracking
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)
            x = x.reshape(x.shape[0], -1)  # Flatten the images
            scores = model(x)  # Forward pass
            _, predictions = scores.max(1)  # Get the predicted class
            num_correct += (predictions == y).sum()  # Count correct predictions
            num_samples += predictions.size(0)  # Count total samples
    model.train()  # Set the model back to training mode
    return num_correct / num_samples  # Return accuracy
Explanation:
- check_accuracy Function: Evaluates the model’s accuracy on a given dataset.
- Evaluation Mode: Disables training-specific behaviors like dropout.
- No Gradients: Saves memory and speeds up computations during evaluation.
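The counting logic inside check_accuracy is easier to follow on a tiny hand-made example. In the sketch below, the scores and labels are made-up values for four samples and three classes.

```python
import torch

# Made-up scores for 4 samples and 3 classes, plus their true labels.
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [0.9, 0.8, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

# max(1) returns (values, indices) along the class dimension;
# the indices are the predicted classes.
_, predictions = scores.max(1)

num_correct = (predictions == labels).sum().item()
accuracy = num_correct / labels.size(0)

print(predictions.tolist(), num_correct, accuracy)  # [1, 0, 2, 0] 3 0.75
```

Only the last sample is misclassified, so three of four predictions are correct.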
9. Displaying Results
Finally, let’s see how well our model performs:
print(f"Accuracy on training set: {check_accuracy(train_loader, model)*100:.2f}%")
print(f"Accuracy on test set: {check_accuracy(test_loader, model)*100:.2f}%")
Accuracy on training set: 96.22%
Accuracy on test set: 95.70%
Explanation:
- Training Accuracy: Measures performance on data the model has seen.
- Test Accuracy: Evaluates performance on unseen data.
10. Testing the Model with a Single Image
After evaluating the overall accuracy, it’s time to test the model on individual images. This section demonstrates how to visualize an image from the test dataset, pass it through the trained model, and view the predicted label compared to the actual label.
import matplotlib.pyplot as plt
# Extract an image from the test dataset
image, label = test_dataset[0] # Get the first image and its label
image = image.to(device) # Move image to the same device as the model
image_flattened = image.view(-1, 28*28) # Flatten the image to match input size
# Display the image
plt.imshow(image.cpu().squeeze(), cmap="gray")
plt.title(f"Actual Label: {label}")
plt.show()
# Pass the image through the model
model.eval() # Set the model to evaluation mode
with torch.no_grad():
    output = model(image_flattened)  # Forward pass
    predicted_label = output.argmax(dim=1).item()  # Get the predicted label
print(f"Predicted Label: {predicted_label}")
Explanation:
- Importing matplotlib: matplotlib.pyplot is used to visualize images. It is not part of the earlier pip install command, so install it with pip install matplotlib if the import fails. Here, we display an image from the test dataset to see how the model performs on an individual example.
- Extracting the Image: We retrieve the first image from the test_dataset. This gives us both the image and the true label. The image is a tensor representing the pixel values, and the label is the correct class for that image.
- Moving the Image to the Device: We move the image tensor to the same device (GPU or CPU) where the model resides. This ensures the image and model are aligned for processing.
- Flattening the Image: The MNIST images are 28x28 pixels, but our model expects flattened inputs of size 784. image.view(-1, 28*28) reshapes the (1, 28, 28) tensor into shape (1, 784), a batch containing one flattened image.
- Displaying the Image: The plt.imshow() function is used to visualize the image. The cmap="gray" argument renders the single-channel MNIST image in grayscale. squeeze() drops the channel dimension of size 1, turning (1, 28, 28) into the (28, 28) array that imshow expects, and .cpu() copies the tensor back to host memory in case it lives on the GPU.
- Evaluating the Image with the Model: We set the model to evaluation mode using model.eval(). This is important because, during evaluation, behaviors like dropout are turned off, ensuring the model performs consistently.
- Making a Prediction: Inside a torch.no_grad() context (which disables gradient computation to save memory and computation), we pass the flattened image through the model using model(image_flattened). The output is a tensor of raw scores (logits) for each class. We use output.argmax(dim=1) to get the index of the maximum score, which represents the predicted class.
- Displaying the Prediction: Finally, we print the predicted label, which represents the class with the highest probability according to the model.
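The model outputs raw scores rather than probabilities. If you want a confidence value next to the predicted label, softmax can convert the scores. The sketch below uses made-up logits for three classes to show that softmax produces values that sum to 1 and does not change which class wins.

```python
import torch

# Made-up raw scores (logits) for one image and 3 classes.
logits = torch.tensor([[1.0, 3.0, 0.5]])

# Softmax turns the scores into probabilities along the class dimension.
probs = torch.softmax(logits, dim=1)

predicted_from_logits = logits.argmax(dim=1).item()
predicted_from_probs = probs.argmax(dim=1).item()
confidence = probs[0, predicted_from_probs].item()

print(predicted_from_logits, predicted_from_probs)  # 1 1
print(f"sum of probabilities: {probs.sum().item():.4f}")
print(f"confidence of predicted class: {confidence:.4f}")
```

Because softmax preserves the ordering of the scores, calling argmax directly on the logits, as the tutorial does, gives the same label.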
Running the single-image cell displays the digit with its actual label as the title and prints the predicted label.
