Building a Convolutional Neural Network for Image Classification with TensorFlow

A convolutional neural network (CNN) is a type of deep neural network that performs well on computer vision problems such as image classification and object detection.

This article shows how to build an image classifier with TensorFlow by implementing a CNN that distinguishes cats from dogs.

With traditional programming, it is not feasible to build scalable solutions for problems like computer vision, because no hand-written algorithm is general enough to recognize the content of arbitrary images.

With machine learning, however, we can train a model on labeled examples so that it approximates the task well enough to make predictions on unseen data.

A CNN is built from multiple convolution layers, pooling layers, and dense layers.

The idea of the convolution layer is to transform the input image so that features (e.g. the ears, nose, and legs of cats and dogs) can be extracted and used to tell the classes apart. This is done by convolving the image with a kernel. Each kernel is specialized to extract a certain feature, and multiple kernels can be applied to a single image to capture multiple features.

How a kernel is applied to an image to extract features

Usually, an activation function (e.g. tanh or ReLU) is applied to the convolved values to introduce non-linearity.
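To make the kernel idea concrete, here is a minimal sketch in plain Python (no libraries) of sliding a kernel over an image and then applying ReLU. The 4x4 image and the vertical-edge kernel are made-up values for illustration; in a real CNN the kernel weights are learned during training. Like most deep learning libraries, the sketch does not flip the kernel before sliding it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 4x4 grayscale patch: dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(convolve2d(image, vertical_edge))
print(feature_map)  # [[27, 27], [27, 27]]
```

Every 3x3 window in this patch straddles the edge, so each position in the 2x2 output responds strongly. A kernel with different weights would respond to a different pattern.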

The job of the pooling layer is to reduce the image size. It retains only the most important features and discards the rest, which also reduces the computational cost. The most popular pooling strategies are max-pooling and average-pooling.

How max-pooling and average-pooling work

This series of convolution and pooling layers identifies the features, and it is followed by dense layers that learn from them and make the prediction.
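The two pooling strategies can be sketched in plain Python as follows. This is an illustration under simple assumptions (non-overlapping 2x2 windows, a made-up 4x4 feature map), not the implementation a library would use.

```python
REDUCERS = {
    "max": max,
    "average": lambda window: sum(window) / len(window),
}


def pool2d(feature_map, size=2, mode="max"):
    """Apply non-overlapping size x size pooling (stride equal to size)."""
    reduce_window = REDUCERS[mode]
    pooled = []
    for row in range(0, len(feature_map) - size + 1, size):
        pooled_row = []
        for col in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[row + i][col + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(reduce_window(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 2, 1, 0],
    [4, 6, 3, 5],
]

print(pool2d(feature_map, mode="max"))      # [[7, 8], [9, 5]]
print(pool2d(feature_map, mode="average"))  # [[4.0, 5.0], [5.25, 2.25]]
```

Both variants turn the 4x4 input into a 2x2 output. Max-pooling keeps the strongest response in each window, while average-pooling smooths the window into its mean.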

A CNN is a deep neural network that needs a lot of computation power for training, and it needs a large dataset to reach sufficient accuracy and generalize to unseen data. For that reason, we will run the code in Google Colab, a hosted notebook platform widely used for research. Colab supports GPU-enabled hardware, which gives a huge boost to training speed.

The dataset contains 2000 JPG images of cats and dogs. First, we need to download the dataset and extract it (here the data is downloaded to the /tmp directory in the Colab instance).
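A minimal sketch of the download and extraction step, using only the Python standard library. The helper name `download_and_extract` is hypothetical, and the URL is an assumption (a publicly hosted cats-and-dogs archive); substitute whichever archive you are actually working with.

```python
import os
import urllib.request
import zipfile


def download_and_extract(url, dest_dir="/tmp"):
    """Download a zip archive from `url` into `dest_dir` and unpack it there."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path, "r") as archive:
        archive.extractall(dest_dir)
    return archive_path


# Assumed location of the dataset archive; replace it with the URL you use.
DATASET_URL = "https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip"

# In Colab, the download would be triggered with:
# download_and_extract(DATASET_URL, "/tmp")
```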

The download step places the dataset in the /tmp directory. The extracted directory has 2 subdirectories named train and validation, which hold the training and testing data respectively.

Inside both of those directories, there are 2 subdirectories, one for cats and one for dogs. We can load the training and testing data for the 2 classes with the TensorFlow data generator.

Setting the paths of training and validation images

Here we have 2 data generators, one for the training data and one for the test data. When loading the data, a rescaling is applied to normalize the pixel values, which helps the model converge faster.

We load the data in batches of 20 images, and every image is resized to 150x150, so images of different sizes end up with a uniform shape.
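The directory layout described above could be captured like this. The folder name `cats_and_dogs_filtered` is an assumption about what the archive extracts to, and `count_images` is a hypothetical helper for a quick sanity check.

```python
import os

# Assumed name of the extracted folder; adjust it to match your archive.
base_dir = "/tmp/cats_and_dogs_filtered"

train_dir = os.path.join(base_dir, "train")
validation_dir = os.path.join(base_dir, "validation")

# One subdirectory per class inside each split.
train_cats_dir = os.path.join(train_dir, "cats")
train_dogs_dir = os.path.join(train_dir, "dogs")
validation_cats_dir = os.path.join(validation_dir, "cats")
validation_dogs_dir = os.path.join(validation_dir, "dogs")


def count_images(directory):
    """Return the number of files in `directory`, or 0 if it does not exist."""
    return len(os.listdir(directory)) if os.path.isdir(directory) else 0


for path in (train_cats_dir, train_dogs_dir, validation_cats_dir, validation_dogs_dir):
    print(path, count_images(path))
```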
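The loading step might look like the sketch below, assuming the Keras `ImageDataGenerator` class and its `flow_from_directory` method. The helper name `load_batches` and the example paths are hypothetical; the batch size of 20, the 150x150 target size, and the rescaling come from the description above.

```python
import tensorflow as tf

ImageDataGenerator = tf.keras.preprocessing.image.ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1] for both splits.
train_datagen = ImageDataGenerator(rescale=1.0 / 255)
validation_datagen = ImageDataGenerator(rescale=1.0 / 255)


def load_batches(datagen, directory):
    """Return an iterator over batches of 20 images resized to 150x150."""
    return datagen.flow_from_directory(
        directory,
        target_size=(150, 150),  # every image is resized to this shape
        batch_size=20,
        class_mode="binary",     # two classes, so labels are 0 or 1
    )


# Hypothetical usage with the directories from the extracted archive:
# train_generator = load_batches(train_datagen, "/tmp/cats_and_dogs_filtered/train")
# validation_generator = load_batches(validation_datagen, "/tmp/cats_and_dogs_filtered/validation")
```

The class labels are inferred from the subdirectory names, so no separate label file is needed.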

Now that the data is ready, we can build the model. We are going to add 3 convolutional layers, each followed by a max-pooling layer. After those comes a Flatten layer, and finally 2 dense layers.

In the first convolution layer, we have added 16 kernels, each 3x3 in size. Once the image is convolved with the kernels, the result is passed through the ReLU activation to obtain non-linearity. The input shape of this layer is 150x150x3: we resized the images to 150x150, and since they are color images, they have 3 channels for RGB.

In the max-pooling layer, we have added a 2x2 window so that the maximum value of each window is kept, which halves the width and height of the image.

There are 3 such pairs of layers (convolution and max-pooling) to extract the features of the images. If very complex features need to be learned, more layers can be added to make the model deeper.
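To see how the image shrinks as it passes through the three convolution and max-pooling pairs, the arithmetic can be worked through in a few lines. This sketch assumes unpadded convolutions with stride 1 (the Keras default). The 16 kernels in the first layer come from the text, while 32 and 64 for the later layers are assumed values for illustration.

```python
def conv_output_size(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Output width/height of non-overlapping pooling (remainder dropped)."""
    return size // pool


size = 150
# Filter counts: 16 comes from the text; 32 and 64 are assumed for illustration.
for filters in (16, 32, 64):
    size = conv_output_size(size)
    print(f"conv ({filters} kernels): {size}x{size}x{filters}")
    size = pool_output_size(size)
    print(f"max-pool:          {size}x{size}x{filters}")

flattened = size * size * 64
print("values passed on after flattening:", flattened)  # 18496
```

Under these assumptions the width and height go 150, 148, 74, 72, 36, 34, 17, so the last pooling layer outputs a 17x17x64 volume.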

The Flatten layer takes the output of the last max-pooling layer and converts it to a 1D array so that it can be fed to the dense layers (a dense layer is a regular, fully connected layer of neurons in a neural network). These layers learn to map the extracted features to a prediction by adjusting their weights. Here we have 2 dense layers, and since this is a binary classification, there is only 1 neuron in the output layer. The number of neurons in the other dense layer can be tuned as a hyperparameter to obtain the best accuracy.
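Putting the description together, the model could be sketched with the Keras Sequential API as follows. Only the first layer's 16 kernels, the 3x3 kernel size, the 2x2 pooling, and the single output neuron come from the text. The 32 and 64 filters, the 512 hidden neurons, and the sigmoid output activation are assumptions chosen for illustration.

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # 150x150 color images with 3 channels (RGB).
    tf.keras.Input(shape=(150, 150, 3)),
    # Three convolution + max-pooling pairs extract the features.
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # 32 is assumed
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),  # 64 is assumed
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the feature maps into a 1D array for the dense layers.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),          # 512 is assumed
    # A single neuron: the output is the probability of one of the 2 classes.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.summary()
```

With these sizes, almost all of the parameters sit in the first dense layer, which is why the number of neurons there is a natural hyperparameter to tune.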

Now that the model is constructed, we can compile it.

The metrics parameter determines how the quality of the model is measured; here we use accuracy.
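A compile step along these lines would work for this model. Only the accuracy metric is stated in the text. The RMSprop optimizer, its learning rate, and the binary cross-entropy loss are assumptions (a common choice for a single sigmoid output), and `compile_binary_classifier` is a hypothetical helper name.

```python
import tensorflow as tf


def compile_binary_classifier(model, learning_rate=0.001):
    """Attach an optimizer, a loss, and the accuracy metric to `model`."""
    model.compile(
        # Assumed optimizer and learning rate; both are hyperparameters.
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=learning_rate),
        # Assumed loss: the usual choice for a 1-neuron sigmoid output.
        loss="binary_crossentropy",
        # Fraction of images that are classified correctly.
        metrics=["accuracy"],
    )
    return model
```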

Now we can start training the model.
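Training could be wrapped in a small helper like the one below. The function name `train` and its argument names are hypothetical. It assumes a compiled Keras model and two batch iterators such as the data generators described earlier, and the default of 15 epochs matches the run discussed next.

```python
def train(model, train_generator, validation_generator, epochs=15):
    """Fit `model` and return the per-epoch history of loss and accuracy."""
    history = model.fit(
        train_generator,
        validation_data=validation_generator,
        epochs=epochs,
        verbose=2,  # print one summary line per epoch
    )
    return history.history
```

The returned dictionary holds one value per epoch under keys such as `accuracy` and `val_accuracy`, which makes it easy to compare training and validation performance.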

After 15 epochs, the model reaches 98.9% accuracy on the training set but only 71.5% on the validation set. This gap is a clear indication that the model has overfitted: it performs very well on the training set and poorly on unseen data.

To solve the overfitting problem, we can either add regularization to keep the model from becoming too complex, or add more data to the training set so that the model generalizes better to unseen data. Since we have a very small dataset (2000 images) for training, adding more data should help.

Collecting more data to train a model is often a burden in machine learning, since the new data also has to be preprocessed. But when working with images, especially in image classification, we do not necessarily need to collect more data. Instead, we can use a technique called image augmentation.

The idea of image augmentation is to create new images from existing ones by resizing, zooming, rotating, and so on. With this approach, the model is exposed to more variation than before and generalizes better to unseen data.

For example, let's assume most of the cat images in our training set show the full body of the cat. The model will try to learn the shape of the cat's body from these images.
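As a toy illustration of what such transformations do to the pixel grid, here is a plain Python sketch with three simple augmentations. The function names and the 4x4 "image" are made up for the example; real augmentation pipelines use random parameters and interpolation rather than these fixed operations.

```python
def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def center_zoom(image, factor=2):
    """Crop the central region and scale it back up by repeating pixels."""
    height, width = len(image), len(image[0])
    crop_h, crop_w = height // factor, width // factor
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    return [
        [crop[r // factor][c // factor] for c in range(crop_w * factor)]
        for r in range(crop_h * factor)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

print(horizontal_flip(image)[0])  # [4, 3, 2, 1]
print(rotate_90(image)[0])        # [13, 9, 5, 1]
print(center_zoom(image))         # [[6, 6, 7, 7], [6, 6, 7, 7], [10, 10, 11, 11], [10, 10, 11, 11]]
```

Each function returns a new image of the same size that shows the same content from a different view, which is what lets one original picture stand in for several training examples.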

Due to this, the classifier might fail to identify images like the one below, since it hasn't been trained with similar examples.

But with image augmentation, we can construct new images from existing images to make the classifier learn new features. With the zoom feature in image augmentation, we can construct a new image like the one below, to help the learner to classify images like the one above.

Zoomed image from the original image with image augmentation

Adding image augmentation is really easy with the TensorFlow image generator. While image augmentation is being applied, the original dataset will be untouched and all the manipulations will be done in memory.

The following code segment shows how you could add this functionality.
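A minimal sketch of such a generator, assuming the Keras `ImageDataGenerator` class. The parameter names are part of that API, but the specific ranges below are illustrative assumptions rather than the settings behind the results reported here.

```python
import tensorflow as tf

ImageDataGenerator = tf.keras.preprocessing.image.ImageDataGenerator

# Augmentation is applied on the fly, to training batches only.
# The specific ranges are illustrative assumptions.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=40,       # rotate by up to 40 degrees
    width_shift_range=0.2,   # shift horizontally by up to 20% of the width
    height_shift_range=0.2,  # shift vertically by up to 20% of the height
    shear_range=0.2,         # shear intensity
    zoom_range=0.2,          # zoom in or out by up to 20%
    horizontal_flip=True,    # randomly mirror images left to right
    fill_mode="nearest",     # fill pixels exposed by a transformation
)

# Validation images are only rescaled, so the metric reflects unmodified data.
validation_datagen = ImageDataGenerator(rescale=1.0 / 255)
```

The augmented generator is a drop-in replacement for the plain training generator: the directories, batch size, and target size stay the same.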

Here rotation, shifting, zooming, and a few other image manipulation techniques are applied to generate new samples in the training set.

Once we apply the image augmentation, it is possible to obtain 86% training accuracy and 81% testing accuracy.

This model is not overfitted like before, and given how small the dataset is, the accuracy is impressive. Furthermore, you can improve the accuracy by tuning hyperparameters such as the optimizer, the number of dense layers, and the number of neurons in each layer.
