CNN on the CIFAR-10 Dataset

yoganandha reddy Gali

The CIFAR-10 dataset is a collection of images commonly used for computer vision and machine learning algorithms. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. It is a subset of the 80 Million Tiny Images dataset and consists of 60,000 32x32 color images across 10 object classes, with 6,000 images per class. The 10 classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

We are going to develop a deep learning model in PyTorch to explore this dataset and build a classifier.

What is a Convolutional Neural Network?

The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolutional networks are a specialized type of neural network that uses convolution in place of general matrix multiplication in at least one of its layers.

What is a Convolution?

Convolution is the mathematical concept of moving a window across a given image to extract features. It is used to reduce a complex image to a simpler one that is easier to process. We apply a specific operation over the whole image while moving a window of a particular size across it, which results in a simpler image.

In the above example, a filter with weights of 0.5 is applied. After the operation, the size of the image is reduced from 5x5 to 4x4. In this manner, the window slides throughout the image and extracts features, with the weights of the layer updated during training.
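As a minimal sketch of this sliding-window idea (the input values and the constant-0.5 filter here are illustrative assumptions, not the article's exact example):

```python
import torch
import torch.nn.functional as F

# An illustrative 5x5 single-channel "image" (shape: batch, channels, H, W).
image = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# A 2x2 filter whose weights are all 0.5.
kernel = torch.full((1, 1, 2, 2), 0.5)

# Slide the filter across the image (stride 1, no padding).
out = F.conv2d(image, kernel)
print(out.shape)  # torch.Size([1, 1, 4, 4]) -- reduced from 5x5 to 4x4
```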

What is Max Pooling?

Max pooling is another sliding-window technique, used to downsample the image. No weights are involved in this layer.

In the above example, a 2x2 max-pooling window is applied. The window slides across the image, extracts the maximum value within the window, and replaces the window with that value. After the max-pooling operation, the image size is reduced from 6x6 to 3x3.
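A minimal sketch of max pooling in PyTorch (the 6x6 input here is an illustrative assumption):

```python
import torch
import torch.nn as nn

# An illustrative 6x6 single-channel image.
image = torch.arange(36, dtype=torch.float32).reshape(1, 1, 6, 6)

# A 2x2 max-pooling window with stride 2: keeps only the maximum
# value in each window, with no learnable weights involved.
pool = nn.MaxPool2d(kernel_size=2, stride=2)
out = pool(image)
print(out.shape)  # torch.Size([1, 1, 3, 3]) -- reduced from 6x6 to 3x3
```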

Training an image classifier

We will do the following steps in order:

  1. Importing Libraries
  2. Define a Convolutional Neural Network
  3. Define a loss function and optimizer
  4. Train the network on the training data
  5. Test the network on the test data

Importing libraries

Torchvision is part of the PyTorch project; PyTorch is an open source machine learning framework. The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Transforms are used for common image transformations.

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
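The imports described above would look something like this:

```python
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
```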

Loading Data

The output of torchvision datasets are PILImage images with values in the range [0, 1]. So, we transform them to tensors with a normalized range of [-1, 1].

Then we load the data into trainloader and testloader with a batch size of 4.

The snippet below sketches this loading step and displays a few sample images.
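A sketch of the loading code (the normalization constants and num_workers value follow the standard PyTorch CIFAR-10 tutorial and are assumptions about the article's exact code):

```python
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

# Display a grid of sample training images.
images, labels = next(iter(trainloader))
grid = torchvision.utils.make_grid(images) / 2 + 0.5  # unnormalize
plt.imshow(np.transpose(grid.numpy(), (1, 2, 0)))
plt.show()
```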

Defining the Convolutional Neural Network

Each image in the CIFAR-10 dataset has size 3x32x32 (3 color channels, 32x32 pixels).

The conv1 layer takes 3 input channels, produces 16 output channels, and uses a kernel size of 3. After an image passes through this layer, its size is 16x30x30. After the conv2 layer, it is 32x28x28.

Then the image is passed through a max-pooling layer, which reduces it to 32x14x14.

After the conv3 layer (kernel size 3) and a max-pooling layer, the image is 64x6x6.

After the conv4 layer (kernel size 3) and a max-pooling layer, the image is 128x2x2.

Then we use the view function to flatten the image into a vector of 128*2*2 = 512 values.

Finally, the flattened vector is passed through 3 fully connected layers with 128, 256, and 10 output neurons, respectively.
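Putting the dimensions above together, the network might be defined as follows (a sketch: the ReLU activations and layer names are assumptions consistent with the sizes described):

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # 3x32x32 input -> 16x30x30 (kernel size 3, no padding)
        self.conv1 = nn.Conv2d(3, 16, 3)
        # 16x30x30 -> 32x28x28
        self.conv2 = nn.Conv2d(16, 32, 3)
        # 2x2 max pooling halves height and width
        self.pool = nn.MaxPool2d(2, 2)
        # 32x14x14 -> 64x12x12 (-> 64x6x6 after pooling)
        self.conv3 = nn.Conv2d(32, 64, 3)
        # 64x6x6 -> 128x4x4 (-> 128x2x2 after pooling)
        self.conv4 = nn.Conv2d(64, 128, 3)
        # Fully connected layers: 128*2*2 -> 128 -> 256 -> 10 classes
        self.fc1 = nn.Linear(128 * 2 * 2, 128)
        self.fc2 = nn.Linear(128, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = x.view(-1, 128 * 2 * 2)  # flatten to a 512-dim vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```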

Loss Function and Optimizer

Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.

Stochastic gradient descent (often abbreviated SGD) is an iterative method for optimizing an objective function with suitable smoothness properties.
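A sketch of the loss and optimizer setup (the learning rate and momentum values are assumptions, taken from the standard PyTorch tutorial):

```python
import torch.nn as nn
import torch.optim as optim

net = Net()
criterion = nn.CrossEntropyLoss()  # cross-entropy (log) loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```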

Training Data

To train the network, we simply loop over our data iterator, feed the inputs to the network, and optimize, as sketched below.
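A sketch of the training loop (the 10 epochs match the article; the logging interval is an assumption):

```python
for epoch in range(10):  # the article trains for 10 epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()              # reset gradients
        outputs = net(inputs)              # forward pass
        loss = criterion(outputs, labels)  # compute loss
        loss.backward()                    # backpropagate
        optimizer.step()                   # update weights

        running_loss += loss.item()
        if i % 2000 == 1999:               # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')
```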

Here is a plot of loss versus the number of epochs.

We can see that the loss gradually decreases as the number of epochs increases.

Testing Data

We then evaluate the trained model on the test data: first the overall accuracy across the whole test set, and then the accuracy for each of the 10 classes.
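A sketch of the evaluation code, assuming the testloader, net, and classes defined earlier:

```python
# Overall accuracy on the test set.
correct, total = 0, 0
with torch.no_grad():  # no gradients needed for evaluation
    for images, labels in testloader:
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the 10000 test images: {100 * correct / total:.0f} %')

# Per-class accuracy.
class_correct = [0] * 10
class_total = [0] * 10
with torch.no_grad():
    for images, labels in testloader:
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        for label, ok in zip(labels, predicted == labels):
            class_correct[label.item()] += int(ok.item())
            class_total[label.item()] += 1

for i in range(10):
    print(f'Accuracy of {classes[i]} : {100 * class_correct[i] / class_total[i]:.0f} %')
```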


Contribution

I did this project as part of an assignment. My original goal was to improve the accuracy score of the network.

The accuracy of the given PyTorch tutorial network is 56%.

The accuracy for each label with the given network is as follows:

Accuracy of plane : 65 %
Accuracy of car : 68 %
Accuracy of bird : 33 %
Accuracy of cat : 30 %
Accuracy of deer : 46 %
Accuracy of dog : 57 %
Accuracy of frog : 70 %
Accuracy of horse : 62 %
Accuracy of ship : 66 %
Accuracy of truck : 66 %

The accuracy of my network is 73%.

The accuracy for each label with my network is as follows:

Accuracy of plane : 81%
Accuracy of car : 88%
Accuracy of bird : 70%
Accuracy of cat : 54%
Accuracy of deer : 75%
Accuracy of dog : 55%
Accuracy of frog : 71%
Accuracy of horse : 78%
Accuracy of ship : 81%
Accuracy of truck : 75%

To achieve this, I changed the network topology. The initial network had 2 convolution layers with a sliding window of size 5 and 3 fully connected layers taking 16*5*5, 120, and 84 input neurons, respectively.
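For comparison, the initial topology described above corresponds to the standard PyTorch tutorial network; a sketch (the 6 and 16 channel counts are from that tutorial):

```python
class BaselineNet(nn.Module):
    # Two 5x5 convolution layers and three fully connected layers
    # (16*5*5 -> 120 -> 84 -> 10), as in the PyTorch CIFAR-10 tutorial.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)    # 3x32x32 -> 6x28x28
        self.pool = nn.MaxPool2d(2, 2)     # halves height and width
        self.conv2 = nn.Conv2d(6, 16, 5)   # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))  # -> 16x5x5
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```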

I experimented with different combinations of convolution layers with different window sizes and different numbers of neurons in the fully connected layers.

I ended up with a network of 4 convolution layers with window size 3 and 3 fully connected layers taking 128*2*2, 128, and 256 input neurons, respectively. I also changed the number of epochs from 2 to 10.

Challenge

Getting better accuracy was a challenge for me. I tried different combinations of convolution and fully connected layers to get good accuracy in less run time. Finally, I overcame this challenge by changing the network topology to 4 convolution layers and by increasing the number of epochs to 10. By doing this with a limited number of convolution and fully connected layers, I achieved better accuracy with less run time. In the Experiments section, I have attached images of my different network topologies and plots of my experiments.

Experiments

To achieve better accuracy, I experimented with several things. First, I tried different numbers of convolution layers:

Model with 3 convolution layers
Model with 4 convolution layers
Model with 5 convolution layers
Model with 6 convolution layers

Below is a bar plot of my observations from these experiments.

As you can see, the network with 4 convolution layers gives better accuracy in less time.

Apart from that, I also experimented with different optimizers. Below are bar plots of those experiments.

After all these experiments, SGD seems to be the better optimizer, as it gives better accuracy with less loss and less run time.
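For reference, swapping optimizers in PyTorch is a one-line change; for example, trying Adam in place of SGD (the learning rate here is an assumed value):

```python
# Alternative optimizer experiment: Adam instead of SGD.
optimizer = optim.Adam(net.parameters(), lr=0.001)
```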

Through these experiments and changes, I was able to build a network that gives 74% accuracy.

You can find the code below
