What is Batch Normalization in Deep Learning?
Batch Normalization is used to reduce the problem of internal covariate shift in neural networks. It works by normalizing the data within each mini-batch: it computes the mean and variance of the activations in the batch and adjusts the values so that they fall in a similar range. It then scales and shifts the normalized values using learnable parameters so that the model can still learn effectively.
In traditional neural networks, as the input data propagates through the network, the distribution of each layer's inputs changes. This phenomenon is known as internal covariate shift, and it can slow down the training process. Batch Normalization reduces this issue by normalizing the inputs of each layer.
This process keeps the inputs to each layer of the network in a stable range even if the outputs of earlier layers change during training. As a result training becomes faster and more stable.
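To make the idea concrete, the snippet below is a small NumPy sketch (with made-up numbers) of what normalizing within a mini-batch does: after the operation, every feature in the batch has roughly zero mean and unit variance, regardless of its original scale.
import numpy as np

# A toy mini-batch of 4 samples with 3 features on very different scales
batch = np.array([[1.0, 50.0, 0.1],
                  [2.0, 60.0, 0.2],
                  [3.0, 70.0, 0.3],
                  [4.0, 80.0, 0.4]])

# Per-feature mean and variance across the batch dimension
mean = batch.mean(axis=0)
var = batch.var(axis=0)

# Normalize so every feature ends up in a similar range
normalized = (batch - mean) / np.sqrt(var + 1e-5)

print(normalized.mean(axis=0))  # approximately 0 for each feature
print(normalized.std(axis=0))   # approximately 1 for each feature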
Need for Batch Normalization
Batch Normalization keeps the outputs of each layer in a steady range as the model learns. This helps the model train faster and learn more effectively.
- Solves the problem of internal covariate shift.
- Makes training faster and more stable.
- Allows use of higher learning rates.
- Helps avoid vanishing or exploding gradients.
- Can act as a regularizer, sometimes reducing the need for dropout.
Fundamentals of Batch Normalization
In this section we walk through the steps used to perform batch normalization; a short NumPy sketch after Step 3 ties the steps together.
Step 1: Compute the Mean and Variance of Mini-Batches
For a mini-batch of activations $\{x_1, x_2, \dots, x_m\}$, the mean $\mu_B$ and variance $\sigma_B^2$ are computed as

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$
Step 2: Normalization
Each activation $x_i$ is normalized using the mini-batch statistics:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

Additionally, a small constant $\epsilon$ is added to the variance for numerical stability, so the division is well defined even when the batch variance is close to zero.
Step 3: Scale and Shift the Normalized Activations
The normalized activations $\hat{x}_i$ are then scaled by a learnable parameter $\gamma$ and shifted by a learnable parameter $\beta$:

$$y_i = \gamma \hat{x}_i + \beta$$

These parameters let the network recover whatever mean and scale works best for the layer, so normalization does not limit what it can represent.
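Putting the three steps together, the snippet below is a minimal NumPy sketch of the forward pass of Batch Normalization; the function name batch_norm and the toy values are illustrative only, not part of any library.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Step 1: per-feature mean and variance over the mini-batch
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Step 2: normalize to roughly zero mean and unit variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Step 3: scale by gamma and shift by beta (learnable in a real layer)
    return gamma * x_hat + beta

# Toy mini-batch: 4 samples with 2 features each (made-up values)
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 40.0]])
gamma = np.ones(2)   # typically initialized to 1
beta = np.zeros(2)   # typically initialized to 0
print(batch_norm(x, gamma, beta))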
Batch Normalization in TensorFlow
In the code below we build a simple neural network in TensorFlow and add a Batch Normalization layer using tf.keras.layers.BatchNormalization(). This layer normalizes the activations coming from the previous Dense layer before the ReLU activation is applied.
import tensorflow as tf

# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,)),
    # Add Batch Normalization layer
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Activation('softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model (x_train and y_train are assumed to be preprocessed
# training data: flattened 784-dimensional inputs with integer labels)
model.fit(x_train, y_train, epochs=5, batch_size=32)
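One detail worth knowing: during training, BatchNormalization normalizes with the statistics of the current mini-batch, while at inference time Keras switches it to the moving mean and variance accumulated during training. The short sketch below evaluates the trained model, assuming x_test and y_test are held-out data prepared the same way as x_train and y_train.
# Evaluate the trained model; Keras runs BatchNormalization in inference
# mode here, using its moving mean and variance instead of batch statistics
loss, accuracy = model.evaluate(x_test, y_test, batch_size=32)
print(f"Test accuracy: {accuracy:.4f}")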
Batch Normalization in PyTorch
In the following code we build a simple neural network with batch normalization using PyTorch. We define a subclass of 'nn.Module' and add 'nn.BatchNorm1d' after the first fully connected layer to normalize its activations. The model outputs raw logits because 'nn.CrossEntropyLoss' applies the softmax internally.
We use 'nn.BatchNorm1d' because each input sample is one-dimensional; for image data processed by Convolutional Neural Networks, 'nn.BatchNorm2d' is used instead.
import torch
import torch.nn as nn

# Define a simple model
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(784, 64)
        # Add Batch Normalization layer
        self.bn = nn.BatchNorm1d(64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.fc1(x)
        # Apply Batch Normalization
        x = self.bn(x)
        x = self.relu(x)
        # Return raw logits; nn.CrossEntropyLoss applies softmax internally,
        # so no explicit Softmax layer is needed here
        x = self.fc2(x)
        return x

# Instantiate the model
model = Model()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train the model (train_loader is assumed to yield batches of
# flattened 784-dimensional inputs and integer class labels)
for epoch in range(5):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
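As with the TensorFlow model, nn.BatchNorm1d behaves differently at inference time: calling model.eval() makes it use the running mean and variance tracked during training instead of per-batch statistics. The sketch below assumes a test_loader DataLoader that yields batches in the same format as train_loader.
# Switch BatchNorm (and any dropout layers) to inference behaviour
model.eval()
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in test_loader:
        outputs = model(inputs)
        predictions = outputs.argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.4f}")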
Benefits of Batch Normalization
- Faster Convergence: Batch Normalization reduces internal covariate shift, allowing for faster convergence during training.
- Higher Learning Rates: With Batch Normalization, higher learning rates can be used without the risk of divergence.
- Regularization Effect: Batch Normalization introduces a slight regularization effect that can reduce the need for other regularization techniques such as dropout.
By normalizing the inputs to each layer, Batch Normalization stabilizes the learning process and speeds up convergence, making training more effective and reducing the need for careful weight initialization and learning-rate tuning.