import matplotlib.pyplot as plt
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz 11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
some_images = x_train[:12]
some_labels = y_train[:12]
plt.figure(figsize=(10,4))
for i in range(12):
plt.subplot(2,6,i+1)
plt.imshow(some_images[i],cmap='grey')
plt.axis('off')
plt.title(f"Label: {some_labels[i]}")
| Image Type | Channels | Shape Example |
|---|---|---|
| Grayscale | 1 | (28, 28, 1) |
| RGB | 3 | (28, 28, 3) |
x_train = x_train.astype('float32')/255
x_test = x_test.astype('float32')/255
x_train = x_train.reshape(-1,28,28,1)
x_test = x_test.reshape(-1,28,28,1)
print("Train shape:", x_train.shape)
print("Test shape:", x_test.shape)
Train shape: (60000, 28, 28, 1) Test shape: (10000, 28, 28, 1)
from keras.utils import to_categorical
print("y_train shape before:", y_train.shape)
y_train = to_categorical(y_train,10)
y_test = to_categorical(y_test,num_classes=10)
print("y_train shape after:", y_train.shape)
print("y_test shape after:", y_test.shape)
print(y_train,y_test)
y_train shape before: (60000,) y_train shape after: (60000, 10) y_test shape after: (10000, 10) [[0. 0. 0. ... 0. 0. 0.] [1. 0. 0. ... 0. 0. 0.] [0. 0. 0. ... 0. 0. 0.] ... [0. 0. 0. ... 0. 0. 0.] [0. 0. 0. ... 0. 0. 0.] [0. 0. 0. ... 0. 1. 0.]] [[0. 0. 0. ... 1. 0. 0.] [0. 0. 1. ... 0. 0. 0.] [0. 1. 0. ... 0. 0. 0.] ... [0. 0. 0. ... 0. 0. 0.] [0. 0. 0. ... 0. 0. 0.] [0. 0. 0. ... 0. 0. 0.]]
CNN Layers and Their Parameters¶
This note explains Conv2D, MaxPooling2D, Flatten, and Dense layers used in CNNs, with clear parameter meanings and reasoning.
Conv2D¶
from tensorflow.keras.layers import Conv2D
Conv2D(filters, kernel_size, activation=None, input_shape=None, padding='valid', strides=(1,1))
- filters: Number of output channels (feature maps), e.g., 32 or 64. More filters capture more patterns.
- kernel_size: Size of the filter, e.g., (3,3). Controls the receptive field.
- activation: Activation function, e.g., 'relu'.
- input_shape: Shape of the input, only needed in the first layer. For MNIST: (28,28,1).
- padding: 'valid' (no padding, reduces dimensions) or 'same' (pads to keep dimensions).
- strides: Step size of the filter. Default is (1,1).
Purpose: Extract spatial features from images.
MaxPooling2D¶
from tensorflow.keras.layers import MaxPooling2D
MaxPooling2D(pool_size=(2,2), strides=None, padding='valid')
- pool_size: Size of the pooling window, e.g., (2,2).
- strides: Step size, defaults to pool_size.
- padding: 'valid' or 'same'.
Purpose: Downsamples feature maps, reducing computation and adding translation invariance.
Flatten¶
from tensorflow.keras.layers import Flatten
Flatten()
Purpose: Flattens multi-dimensional feature maps into a 1D vector for Dense layers.
Dense¶
from tensorflow.keras.layers import Dense
Dense(units, activation=None)
- units: Number of neurons, e.g., 128 or 10.
- activation: Activation function, e.g., 'relu' or 'softmax'.
Purpose: Fully connected layer for learning high-level patterns and producing output probabilities.
Why 32, 64, 128?¶
- 32, 64 filters in Conv2D: Allow capturing increasingly complex patterns as depth increases.
- 128 units in Dense: Provides enough capacity to learn meaningful combinations of features.
- 10 units in final Dense with softmax: For 10-class digit classification in MNIST.
Example MNIST CNN¶
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
model = Sequential([
Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
MaxPooling2D((2,2)),
Conv2D(64, (3,3), activation='relu'),
MaxPooling2D((2,2)),
Flatten(),
Dense(128, activation='relu'),
Dense(10, activation='softmax')
])
This architecture:
- Learns feature hierarchies (edges, shapes, patterns).
- Reduces spatial dimensions while increasing feature depth.
- Outputs class probabilities for digit classification.
Summary Table¶
| Layer | Key Parameters | Purpose |
|---|---|---|
| Conv2D | filters, kernel_size | Extract spatial features |
| MaxPooling2D | pool_size | Downsample feature maps |
| Flatten | - | Reshape for Dense layers |
| Dense | units, activation | Learn high-level patterns |
This single file provides the practical, structured reference you need while working with CNNs in your MNIST and general image classification projects.
from keras.layers import Dense,Conv2D,MaxPool2D,Flatten, BatchNormalization,Dropout
from keras.models import Sequential
model = Sequential([
Conv2D(32,(3,3),activation='relu',input_shape=(28,28,1)),
MaxPool2D(pool_size=(2,2)),
Conv2D(64,(3,3),activation='relu'),
MaxPool2D(pool_size=(2,2)),
Flatten(),
Dense(512,activation='relu'),
BatchNormalization(),
Dropout(0.5),
Dense(10,activation='softmax')
]
)
/usr/local/lib/python3.11/dist-packages/keras/src/layers/convolutional/base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(activity_regularizer=activity_regularizer, **kwargs)
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ conv2d (Conv2D) │ (None, 26, 26, 32) │ 320 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d (MaxPooling2D) │ (None, 13, 13, 32) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ conv2d_1 (Conv2D) │ (None, 11, 11, 64) │ 18,496 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ max_pooling2d_1 (MaxPooling2D) │ (None, 5, 5, 64) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ flatten (Flatten) │ (None, 1600) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense (Dense) │ (None, 512) │ 819,712 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ batch_normalization │ (None, 512) │ 2,048 │ │ (BatchNormalization) │ │ │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dropout (Dropout) │ (None, 512) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 10) │ 5,130 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 845,706 (3.23 MB)
Trainable params: 844,682 (3.22 MB)
Non-trainable params: 1,024 (4.00 KB)
| Term | Meaning |
|---|---|
| Total params | All weights, biases, statistics in the model |
| Trainable params | Weights and biases updated during training |
| Non-trainable params | Typically moving averages in BatchNorm, not updated by gradients |
history = model.compile(loss='categorical_crossentropy',optimizer="adam",metrics=['accuracy'])
model.fit(x_train,y_train,batch_size=128,epochs=2,validation_data=(x_test,y_test),verbose=1)
Epoch 1/2 469/469 ━━━━━━━━━━━━━━━━━━━━ 39s 80ms/step - accuracy: 0.9259 - loss: 0.2405 - val_accuracy: 0.9824 - val_loss: 0.0639 Epoch 2/2 469/469 ━━━━━━━━━━━━━━━━━━━━ 39s 84ms/step - accuracy: 0.9853 - loss: 0.0465 - val_accuracy: 0.9859 - val_loss: 0.0436
<keras.src.callbacks.history.History at 0x7dd965746c10>
| Term | Meaning |
|---|---|
| Validation Data | Portion of data used to monitor performance during training |
| validation_split=0.1 | Use 10% of training data for validation automatically |
| Purpose | Check generalization, detect overfitting, tune hyperparameters |
score = model.evaluate(x_test,y_test)
print(score)
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - accuracy: 0.9831 - loss: 0.0536 [0.0436282679438591, 0.9858999848365784]
| Item | Meaning |
|---|---|
loss |
Cross-entropy loss on the test set |
accuracy |
Classification accuracy on the test set |
313/313 |
Batches processed during evaluation |
98.31% (bar) vs 98.59% (result) |
Small difference due to computation reporting timing |
print('Test Accuracy: %.2f'%score[1])
Test Accuracy: 0.99
Data Augmentation in Deep Learning¶
What is Data Augmentation?¶
Data augmentation is the process of generating new training samples by modifying existing data while preserving labels.
Example: Rotating, flipping, shifting, or zooming images to create new variations.
Why Use Data Augmentation?¶
- Prevent Overfitting: Adds variability, reducing memorization of training data.
- Improve Generalization: Prepares the model for real-world variations.
- Expand Small Datasets: Increases effective dataset size without extra collection.
How is it Applied?¶
Using utilities like ImageDataGenerator in Keras:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
rescale=1./255,
rotation_range=10,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.1,
horizontal_flip=True
)
Explanation of Parameters¶
rotation_range=10: Randomly rotates images within ±10 degrees.width_shift_range=0.1: Shifts images horizontally by 10%.height_shift_range=0.1: Shifts images vertically by 10%.zoom_range=0.1: Zooms images in/out by 10%.horizontal_flip=True: Randomly flips images horizontally.rescale=1./255: Normalizes pixel values to [0, 1].
Using During Training¶
train_generator = datagen.flow(x_train, y_train, batch_size=64)
model.fit(train_generator, epochs=10, ...)
This allows the model to receive dynamically augmented batches, improving learning robustness.
Common Transformations¶
- Rotation
- Width/Height Shift
- Zoom
- Shear
- Horizontal/Vertical Flip
- Brightness/Contrast Adjustment
- Noise Injection
Benefits of Data Augmentation¶
- Improves model robustness and validation performance.
- Reduces overfitting risks.
- Helps with real-world deployment by preparing the model for varied input conditions.
Summary Table¶
| Aspect | Description |
|---|---|
| Purpose | Create variations for better learning |
| When Used | During training only |
| Tools | ImageDataGenerator, TensorFlow/PyTorch transforms |
| Impact | Improved generalization, reduced overfitting |
Data augmentation is a standard, practical tool for enhancing deep learning pipelines, especially in image classification workflows.
Key Parameters of fit_generator¶
| Parameter | Description |
|---|---|
| generator | The Python generator/iterator yielding (x_batch, y_batch) |
| steps_per_epoch | Number of batches to draw from the generator before declaring one epoch finished |
| epochs | Number of epochs to train |
| validation_data | Generator or tuple (x_val, y_val) for validation |
| validation_steps | Number of batches to draw from validation_data per epoch |
| callbacks | List of Keras callbacks (e.g., EarlyStopping, ModelCheckpoint) |
| class_weight | Optional dictionary mapping class indices to weights |
| max_queue_size | Maximum size for the generator queue |
| workers | Number of worker threads for data loading |
| use_multiprocessing | Whether to use multiprocessing for data loading |
| shuffle | Whether to shuffle the order of batches at each epoch |
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_generator = ImageDataGenerator(rotation_range=7,width_shift_range=0.05, shear_range=0.05,height_shift_range=0.07,zoom_range=0.05)
test_generator = ImageDataGenerator()
train_generator = train_generator.flow(x_train,y_train,batch_size=64)
test_generator = test_generator.flow(x_test,y_test,batch_size=64)
import numpy as np
images, labels = next(train_generator)
plt.figure(figsize=(12, 6))
for i in range(12):
plt.subplot(3, 4, i + 1)
img = images[i].reshape(28, 28)
label = np.argmax(labels[i])
plt.imshow(img, cmap='gray')
plt.title(f"Label: {label}")
plt.axis('off')
plt.tight_layout()
model.fit(train_generator,steps_per_epoch=60000//64, epochs=5,validation_data=test_generator,validation_steps = 10000//64)
Epoch 1/5
/usr/local/lib/python3.11/dist-packages/keras/src/trainers/data_adapters/py_dataset_adapter.py:121: UserWarning: Your `PyDataset` class should call `super().__init__(**kwargs)` in its constructor. `**kwargs` can include `workers`, `use_multiprocessing`, `max_queue_size`. Do not pass these arguments to `fit()`, as they will be ignored. self._warn_if_super_not_called()
937/937 ━━━━━━━━━━━━━━━━━━━━ 50s 53ms/step - accuracy: 0.9664 - loss: 0.1043 - val_accuracy: 0.9818 - val_loss: 0.0618 Epoch 2/5 1/937 ━━━━━━━━━━━━━━━━━━━━ 34s 37ms/step - accuracy: 0.9375 - loss: 0.1369
/usr/local/lib/python3.11/dist-packages/keras/src/trainers/epoch_iterator.py:107: UserWarning: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset. self._interrupted_warning()
937/937 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9375 - loss: 0.1369 - val_accuracy: 0.9816 - val_loss: 0.0630 Epoch 3/5 937/937 ━━━━━━━━━━━━━━━━━━━━ 82s 87ms/step - accuracy: 0.9793 - loss: 0.0636 - val_accuracy: 0.9905 - val_loss: 0.0306 Epoch 4/5 937/937 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.9844 - loss: 0.1673 - val_accuracy: 0.9900 - val_loss: 0.0311 Epoch 5/5 937/937 ━━━━━━━━━━━━━━━━━━━━ 104s 50ms/step - accuracy: 0.9826 - loss: 0.0542 - val_accuracy: 0.9888 - val_loss: 0.0331
<keras.src.callbacks.history.History at 0x7dd94c861ad0>
CNN Architecture: Rules of Thumb¶
This note provides clear, practical guidelines for structuring CNNs effectively in image classification tasks.
1. Convolutional Layers (Conv2D)¶
- Use small filters, typically
(3,3). - Start with fewer filters (e.g., 32), increasing in deeper layers (e.g., 64, 128).
- Doubling filters as you go deeper helps capture increasingly complex patterns.
Example:
Conv2D(32, (3,3))
Conv2D(64, (3,3))
Conv2D(128, (3,3))
2. MaxPooling Layers (MaxPooling2D)¶
- Use after 1-2 Conv layers to reduce spatial dimensions.
- Common pooling window:
(2,2). - Reduces computation while preserving important features.
- Do not overuse to avoid excessive spatial information loss.
Example:
MaxPooling2D(pool_size=(2,2))
3. Batch Normalization (BatchNormalization)¶
- Use after Conv and Dense layers to stabilize and speed up training.
- Helps use higher learning rates and reduces sensitivity to initialization.
- Optional but beneficial for deeper models.
Example:
BatchNormalization()
4. Dropout (Dropout)¶
- Helps prevent overfitting by randomly dropping neurons during training.
- Typical rates:
0.25after Conv layers (if needed).0.5after Dense layers before the output layer.
Example:
Dropout(0.5)
5. Flatten Layer (Flatten)¶
- Converts 3D feature maps
(height, width, channels)into a 1D vector for Dense layers.
Example:
Flatten()
6. Dense Layers (Dense)¶
- Add 1–2 Dense layers after convolutional layers:
- One with
128or512neurons, usingrelu. - Final Dense layer with
num_classesneurons andsoftmaxfor classification.
- One with
Example:
Dense(128, activation='relu')
Dense(10, activation='softmax')
7. Activation Functions¶
- Use
'relu'for hidden Conv and Dense layers. - Use
'softmax'for the final output layer in classification tasks.
Example CNN Structure¶
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
model = Sequential([
Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
MaxPooling2D((2,2)),
Conv2D(64, (3,3), activation='relu'),
MaxPooling2D((2,2)),
Flatten(),
Dense(128, activation='relu'),
Dropout(0.5),
Dense(10, activation='softmax')
])
Recap of Rules¶
- Start with Conv layers: 32, 64, 128 filters.
- Add MaxPooling every 1-2 Conv layers.
- Optionally use BatchNormalization for stability.
- Use Dropout to prevent overfitting.
- Flatten before Dense layers.
- Use Dense layers for final classification.
- Use
'relu'for hidden layers,'softmax'for output.
Following these structured guidelines will help you design stable, effective CNN models for your MNIST and general image classification projects.
model.save("mnist_cnn.h5")
WARNING:absl:You are saving your model as an HDF5 file via `model.save()` or `keras.saving.save_model(model)`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')` or `keras.saving.save_model(model, 'my_model.keras')`.
from google.colab import files
files.download("mnist_cnn.h5")