Article Review: Multi-Category Classification with CNN

Classification of Multi-Category Images Using Deep Learning: A Convolutional Neural Network Model

In this article, the article ‘Classifying multi-category images using Deep Learning: A Convolutional Neural Network Model’ presented in India in 2017 by Ardhendu Bandhu, Sanjiban Sekhar Roy is being reviewed. An image classification model using a convolutional neural network is presented with TensorFlow. TensorFlow is a popular open-source library for machine learning and deep neural networks. A multi-category image dataset was considered for classification. Traditional back propagation neural network; has an input layer, hidden layer, and an output. A convolutional neural network has a convolutional layer and a maximum pooling layer. We train this proposed classifier to calculate the decision boundary of the image dataset. Real-world data is mostly untagged and unstructured. This unstructured data can be an image, audio, and text data. Useful information cannot be easily derived from neural networks that are shallow, meaning they are those with fewer hidden layers. A deep neural network-based CNN classifier is proposed, which has many hidden layers and can obtain meaningful information from images.

Keywords: Image, Classification, Convolutional Neural Network, TensorFlow, Deep Neural Network.

First of all, let’s examine what classification is so that we can understand the steps laid out in the project. Image Classification refers to the function of classifying images from a multi-class set of images. To classify an image dataset into multiple classes or categories, there must be a good understanding between the dataset and classes.

In this article;

  1. Convolutional Neural Network (CNN) based on deep learning is proposed to classify images.
  2. The proposed model achieves high accuracy after repeating 10,000 times within the dataset containing 20,000 images of dogs and cats, which takes about 300 minutes to train and validate the dataset.

In this project, a convolutional neural network consisting of a convolutional layer, RELU function, a pooling layer, and a fully connected layer is used. A convolutional neural network is an automatic choice when it comes to image recognition using deep learning.

Convolutional Neural Network

For classification purposes, it has the architecture as the convolutional network [INPUT-CONV-RELU-POOL-FC].

INPUT- Raw pixel values as images.

CONV- Contents output in the first cluster of neurons.

RELU- It applies the activation function.

POOL- Performs downsampling.

FC- Calculates the class score.

In this publication, a multi-level deep learning system for picture characterization is planned and implemented. Especially the proposed structure;

1) The picture shows how to find nearby neurons that are discriminatory and non-instructive for grouping problem.

2) Given these areas, it is shown how to view the level classifier.


A data set containing 20,000 dog and cat images from the Kaggle dataset was used. The Kaggle database has a total of 25000 images available. Images are divided into training and test sets. 12,000 images are entered in the training set and 8,000 images in the test set. Split dataset of the training set and test set helps cross-validation of data and provides a check over errors; Cross-validation checks whether the proposed classifier classifies cat or dog images correctly.

The following experimental setup is done on Spyder, a scientific Python development environment.

  1. First of all, Scipy, Numpy, and Tensorflow should be used as necessary.
  2. A start time, training path, and test track must be constant. Image height and image width were provided as 64 pixels. The image dataset containing 20,000 images is then loaded. Due to a large number of dimensions, it is resized and iterated. This period takes approximately 5-10 minutes.
  3. This data is fed by TensorFlow. In TensorFlow, all data is passed between operations in a calculation chart. Properties and labels must be in the form of a matrix for the tensors to easily interpret this situation.
  4. Tensorflow Prediction: To call data within the model, we start the session with an additional argument where the name of all placeholders with the corresponding data is placed. Because the data in TensorFlow is passed as a variable, it must be initialized before a graph can be run in a session. To update the value of a variable, we define an update function that can then run.
  5. After the variables are initialized, we print the initial value of the state variable and run the update process. After that, the rotation of the activation function, the choice of the activation function has a great influence on the behavior of the network. The activation function for a specific node is an input or output of the specific node provided a set of inputs.
  6. Next, we define the hyperparameters we will need to train our login features. In more complex neural networks, we encounter more hyperparameters. Some of our hyperparameters can be like the learning rate.
    Another hyperparameter is the number of iterations we train our data. The next hyperparameter is the batch size, which chooses the size of the set of images to send for classification at one time.
  7. Finally, after all that, we start the TensorFlow session, which makes TensorFlow work, because without starting a session a TensorFlow won’t work. After that, our model will start the training process.


🖇 As deep architecture, we used a convolutional neural network and also implemented the TensorFlow deep learning library. The experimental results below were done on Spyder, a scientific Python development environment. 20,000 images were used and the batch is fixed at 100 for you.

🖇 It is essential to examine the accuracy of the models in terms of test data rather than training data. To run the convolutional neural network using TensorFlow, the Windows 10 machine was used, the hardware was specified to have an 8GB of RAM with the CPU version of TensorFlow.

📌 As the number of iterations increases, training accuracy increases, but so does our training time. Table 1 shows the number line with the accuracy we got.

Number of iterations vs AccuracyThe graph has become almost constant after several thousand iterations. Different batch size values can lead to different results. We set a batch size value of 100 for images.

✨ In this article, a high accuracy rate was obtained in classifying images with the proposed method. The CNN neural network was implemented using TensorFlow. It was observed that the classifier performed well in terms of accuracy. However, a CPU based system was used. So the experiment took extra training time, if a GPU-based system was used, training time would be shortened. The CNN model can be applied in solving the complex image classification problem related to medical imaging and other fields.


  2. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. science, 313(5786), 504- 507.
  3. Yoshua Bengio, “Learning Deep Architectures for AI”, Dept. IRO, Universite de Montreal C.P. 6128, Montreal, Qc, H3C 3J7, Canada, Technical Report 1312.
  4. Yann LeCun, Yoshua Bengio & Geoffrey Hinton, “Deep learning “, NATURE |vol 521 | 28 May 2015
  5. Yicong Zhou and Yantao Wei, “Learning Hierarchical Spectral–Spatial Features for Hyperspectral Image Classification”, IEEE Transactions on cybernetics, Vol. 46, No.7, July 2016.