Using Deep Learning To Identify Your Dog Breed

Published in

Artificial Intelligence in Plain English

7 min read · Jan 11, 2021

This post is about my Udacity Data Science Nanodegree capstone project: implementing a dog breed classifier algorithm and a web application to provide online predictions.

Problem Statement

This project addresses two problems: classifying images of dogs by breed using convolutional neural networks (CNNs), and implementing a web application that provides online dog breed detection. Detecting a dog's breed is extremely challenging, even for humans.

This capstone project was one of the options suggested by Udacity, and it is divided into two main components:

Training a model for dog breed detection
The main goal is to implement a deep learning network that classifies dog breeds from images, evaluating different custom CNN architectures along the way. I'll show how I used transfer learning to adapt a pre-trained CNN to this task. I used Keras and TensorFlow to train and validate the network.

Online detection of dog breeds
The second component is an algorithm that consumes the trained model and predicts dog breeds from images. If a dog is detected in the image, it provides an estimated breed. If a human is detected, it provides the dog breed the person most resembles. As explained below, this algorithm is served through a Flask application that receives uploaded photos, encodes the images and generates the predictions.

Evaluation Metrics

Since this is a classification problem with a simple scope, we use the accuracy score to evaluate the model on the training and test sets. Accuracy works well when the dataset classes are balanced.

As presented in the next section, the classes in our dataset are not uniformly distributed, but they are balanced enough for this metric to be a meaningful measure of model quality.
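Accuracy is simply the fraction of predictions that match the true label. A minimal sketch in plain Python, assuming labels are given as breed indices (the function name `accuracy` is illustrative, not taken from the project code):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Example: 3 of 4 breed indices predicted correctly.
print(accuracy([0, 5, 12, 132], [0, 5, 7, 132]))  # 0.75
```

With strongly imbalanced classes, a model that always predicts the majority class could still score well on this metric, which is why the class balance matters.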

Data Exploration and Visualization

The dataset contains a total of 8,351 dog images, each labeled with one of 133 dog breeds (categories):

Figure 2. Examples of training set dog images

To explore and prepare the data for the model, I split the dataset into training, validation and test sets, and checked the distribution of each breed in each set.

Figure 3. Distribution of each class by dataset

As can be observed, the training and test sets have a similar share of samples (%) for each class. This matters because accuracy measured on sets with the same class distribution is comparable across those sets.
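One way to quantify "a similar share of samples" is to compute each class's percentage in every split and look at the largest gap between two splits. A pure-Python sketch; the function names and the toy labels are hypothetical:

```python
from collections import Counter


def class_shares(labels):
    """Return {class: share of samples in %} for one split."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def max_share_gap(split_a, split_b):
    """Largest absolute difference (percentage points) in class share between two splits."""
    a, b = class_shares(split_a), class_shares(split_b)
    classes = set(a) | set(b)
    return max(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in classes)


# Toy example with three breeds (hypothetical labels).
train = ["beagle"] * 50 + ["boxer"] * 30 + ["pug"] * 20
test = ["beagle"] * 5 + ["boxer"] * 3 + ["pug"] * 2
print(max_share_gap(train, test))  # 0.0
```

A gap close to zero for every pair of splits means accuracy on one split is comparable to accuracy on another.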

As described below, all images are pre-processed to the same size. The dataset could be improved, but images with different dimensions are neither anomalies nor blockers for training the model.

Data Preprocessing

Both training the model and generating online predictions require pre-processing (encoding) the dog images. Udacity instructors provided the code for this, following these encoding steps:

Resize the image to a square of 224 x 224 pixels
Convert the image to an array. Since we work with color images, there are 3 channels, so the array has shape (224, 224, 3); adding a batch dimension gives shape (1, 224, 224, 3)
When using TensorFlow as the backend, Keras CNNs require a 4D array as input, with shape (nb_samples, rows, columns, channels), where nb_samples is the total number of images (samples), and rows, columns and channels are the number of rows, columns and channels of each image. Stacking all encoded images therefore gives a final array of shape (nb_samples, 224, 224, 3).

I applied this encoding to all images in the training, validation and test sets.
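The encoding steps can be sketched with NumPy alone. This is not the Udacity-provided code, which used Keras image utilities. The sketch assumes images are already decoded into `(height, width, 3)` RGB arrays and uses a simple nearest-neighbour resize as a stand-in for the Keras image loader. The function names are hypothetical:

```python
import numpy as np

TARGET = 224  # side length of the square input expected by the CNNs


def resize_nearest(img, size=TARGET):
    """Nearest-neighbour resize of an (H, W, 3) array to (size, size, 3)."""
    h, w = img.shape[:2]
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows][:, cols]


def image_to_tensor(img):
    """Encode one image as a 4D tensor of shape (1, 224, 224, 3)."""
    arr = resize_nearest(img).astype("float32")
    return np.expand_dims(arr, axis=0)  # add the batch (nb_samples) axis


def images_to_tensor(images):
    """Stack encoded images into shape (nb_samples, 224, 224, 3)."""
    return np.vstack([image_to_tensor(img) for img in images])


# Two dummy "photos" with different dimensions.
photos = [np.zeros((300, 400, 3), dtype=np.uint8), np.ones((180, 240, 3), dtype=np.uint8)]
print(images_to_tensor(photos).shape)  # (2, 224, 224, 3)
```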

Model implementation

As part of the project, I first tried to create a CNN from scratch. Following best practices for CNN architectures, I created a Keras Sequential model with the following architecture:

As can be observed, the network is quite deep: it combines convolutional and dropout layers with a final dense layer for prediction. To enrich the dataset and improve model performance, I used data augmentation on the images.
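The exact augmentations used are not listed here. For illustration only, a random horizontal flip plus a small random shift (two common transforms for this kind of image data) can be sketched in NumPy. In Keras, `ImageDataGenerator` provides equivalents through its `horizontal_flip`, `width_shift_range` and `height_shift_range` parameters. The function `augment` and its `max_shift` default are hypothetical:

```python
import numpy as np


def augment(img, rng, max_shift=0.1):
    """Randomly flip and shift an (H, W, C) image; the output has the same shape."""
    out = img[:, ::-1] if rng.random() < 0.5 else img  # horizontal flip
    h, w = out.shape[:2]
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    shifted = np.zeros_like(out)  # pixels uncovered by the shift are filled with zeros
    src = out[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted


rng = np.random.default_rng(0)
img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
print(augment(img, rng).shape)  # (224, 224, 3)
```

Because each epoch sees a slightly different version of every photo, the network is less likely to memorize exact pixel positions.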

Then the Keras model is compiled with the categorical cross-entropy loss function and the accuracy metric:

Figure 6: Compiling Keras model
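Categorical cross-entropy compares the softmax output with a one-hot label. For a single image it equals the negative log-probability assigned to the true breed, and the batch loss is the mean over images. A NumPy sketch of what the loss computes; the small clipping constant `eps` is an assumption added to avoid `log(0)`:

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy.

    y_true: one-hot labels, shape (batch, n_classes)
    y_pred: predicted probabilities (rows sum to 1), same shape
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]])
print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.6931
```

A model that outputs uniform probabilities over the 133 breeds has a loss of ln 133 ≈ 4.89, a useful reference point when reading training logs.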

As expected, accuracy was low: 16.38%. As mentioned before, this problem is hard, and many pre-trained networks already exist for similar image classification problems.
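For context, picking one of the 133 breeds uniformly at random would be correct with probability

$$\frac{1}{133} \approx 0.75\%, \qquad \frac{16.38\%}{0.75\%} \approx 22,$$

so the model trained from scratch is about 22 times better than chance, but still low in absolute terms.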

As a refinement, to get a better model and reduce training time without sacrificing accuracy, I trained a CNN using transfer learning.

Since our dataset is small and very similar to the ImageNet dataset, I applied a transfer learning approach based on these steps:

Slice off the end of the neural network
Add a new fully connected layer that matches the number of classes in the new data set
Randomize the weights of the new fully connected layer and freeze all the weights from the pre-trained network
Train the network to update the weights of the new fully connected layer
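The four steps above can be illustrated with a framework-free sketch. A fixed random projection with a ReLU stands in for the pre-trained network with its end sliced off (a hypothetical placeholder, not Xception). Only the new, randomly initialised dense softmax layer is updated by gradient descent on the cross-entropy loss. In Keras, freezing is done by setting `trainable = False` on the pre-trained layers, or by precomputing their outputs once as bottleneck features, which is what this sketch does:

```python
import numpy as np


def extract_features(x, w_frozen):
    """Stand-in for the pre-trained network minus its last layers (step 1); w_frozen is never updated (step 3)."""
    return np.maximum(x @ w_frozen, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_head(feats, y, n_classes, epochs=200, lr=0.01, seed=0):
    """Steps 2-4: add a randomly initialised dense softmax layer and train only it."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        probs = softmax(feats @ w + b)
        losses.append(float(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1))))
        grad = (probs - onehot) / len(feats)  # gradient of the loss w.r.t. the logits
        w -= lr * (feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b, losses


rng = np.random.default_rng(42)
w_frozen = rng.normal(size=(8, 16))  # hypothetical "pre-trained" weights
frozen_before = w_frozen.copy()
x = rng.normal(size=(60, 8))          # hypothetical inputs
y = np.arange(60) % 3                 # hypothetical labels for 3 classes

feats = extract_features(x, w_frozen)  # computed once, since the base is frozen
w, b, losses = train_head(feats, y, n_classes=3)
print(losses[-1] < losses[0])                   # True: the new head is learning
print(np.array_equal(w_frozen, frozen_before))  # True: base weights untouched
```

Precomputing the frozen features once is what makes this approach fast: each training epoch only touches the small new layer.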

Udacity instructors provided networks pre-trained on the ImageNet dataset. For each one, I applied the methodology above, compiling the model with the same parameters as my CNN from scratch. The evaluation gave the following results:

Figure 7: Accuracy of pre-trained Imagenet based networks

Given its results, the winner was the Xception network, for which I defined the following architecture:

Figure 8: Customized Xception architecture

As can be observed, the architecture consists of the pre-trained layers, a single dropout layer to reduce overfitting, and a dense layer with exactly 133 units, one per class to be predicted.
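A NumPy sketch of the forward pass through such a head. It assumes the Xception bottleneck features for a 224 x 224 input have shape (7, 7, 2048) and are reduced by global average pooling before the dropout layer; the pooling step and the dropout rate of 0.3 are assumptions, not read from Figure 8. In Keras these layers correspond to `GlobalAveragePooling2D`, `Dropout` and `Dense(133, activation="softmax")`, and Keras disables dropout automatically at inference time:

```python
import numpy as np

N_CLASSES = 133


def head_forward(bottleneck, w, b, rate=0.3, training=False, rng=None):
    """bottleneck: (batch, 7, 7, 2048) -> breed probabilities (batch, 133)."""
    pooled = bottleneck.mean(axis=(1, 2))  # global average pooling -> (batch, 2048)
    if training:  # inverted dropout: zero units at random, rescale the survivors
        keep = rng.random(pooled.shape) >= rate
        pooled = pooled * keep / (1.0 - rate)
    logits = pooled @ w + b  # dense layer with one unit per breed
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability for softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
features = rng.random((4, 7, 7, 2048)).astype("float32")  # hypothetical batch
w = rng.normal(scale=0.01, size=(2048, N_CLASSES))
b = np.zeros(N_CLASSES)
probs = head_forward(features, w, b)
print(probs.shape)  # (4, 133)
```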

Model Evaluation and Validation

I trained the winning model from the previous section using the following parameters:

Figure 9. Parameters to train the CNN model

Here, we can find the training, validation and test results:

Table 2: Overall model evaluation and validation results

Given the scope of this project, the results were very satisfactory, especially the accuracy score on the test set. All the code used to evaluate and validate the winning model, as well as the training, validation and test logs, can be found in this notebook.

In the next section, we explore the prediction algorithm and the model output for real dog images and human faces.

Testing the prediction algorithm

To improve the overall recommendation, I created an algorithm that first detects whether an image contains a dog or a human face and, only if it does, predicts the dog breed using the Xception model.

Udacity instructors provided the models for detecting a dog or a human face in an image. The complete algorithm is implemented in the DogBreedPredictor class. Running it on valid and invalid images gave interesting results:

Figure 10: Prediction output with real photos
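The decision logic described above can be sketched as follows. The three callables are placeholders for the Udacity-provided detectors and the Xception breed model; the class name `BreedPipeline` and its return format are hypothetical and not the actual `DogBreedPredictor` API:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BreedPipeline:
    is_dog: Callable[[object], bool]
    has_human_face: Callable[[object], bool]
    predict_breed: Callable[[object], str]

    def predict(self, image):
        """Only run the breed model when a dog or a human face is detected."""
        if self.is_dog(image):
            return {"subject": "dog", "breed": self.predict_breed(image)}
        if self.has_human_face(image):
            # For humans, the breed is the one the person most resembles.
            return {"subject": "human", "breed": self.predict_breed(image)}
        return {"subject": None, "breed": None}  # neither detected: no prediction


# Usage with stub detectors standing in for the real models.
pipeline = BreedPipeline(
    is_dog=lambda img: img == "dog.jpg",
    has_human_face=lambda img: img == "me.jpg",
    predict_breed=lambda img: "Beagle",
)
print(pipeline.predict("dog.jpg"))  # {'subject': 'dog', 'breed': 'Beagle'}
print(pipeline.predict("car.jpg"))  # {'subject': None, 'breed': None}
```

Checking for a dog before checking for a human face means a photo containing both is treated as a dog photo.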

Doggo: A web application to run online detections

Doggo is a Flask web application that runs the dog breed detection models. For a photo uploaded by a user, Doggo runs the following steps:

Encode the uploaded photo for the dog detector, human face detector and dog breed detector models
If the photo is classified as a dog or a human face, predict the dog breeds
For the top n predictions, call Wikipedia and dog.ceo to find more information about the predicted breeds
Return the results to the user
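The ranking and lookup steps boil down to sorting the 133 breed probabilities and building requests for the best ones. A sketch, assuming `probs` is one output row of the model and `breed_names` lists the classes in the same order. The Wikipedia REST summary endpoint is one possible way to fetch breed information and is an assumption, not necessarily what Doggo calls; the dog.ceo lookup is omitted:

```python
from urllib.parse import quote


def top_n_breeds(probs, breed_names, n=3):
    """Return the n most likely (breed, probability) pairs, best first."""
    ranked = sorted(zip(breed_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


def wikipedia_summary_url(breed):
    """Build a Wikipedia REST summary URL for a breed name such as 'Border collie'."""
    return "https://en.wikipedia.org/api/rest_v1/page/summary/" + quote(breed.replace(" ", "_"))


names = ["Beagle", "Border_collie", "Dachshund", "Pug"]
probs = [0.10, 0.60, 0.25, 0.05]
for breed, p in top_n_breeds(probs, names, n=2):
    print(breed, p, wikipedia_summary_url(breed))
# Border_collie 0.6 https://en.wikipedia.org/api/rest_v1/page/summary/Border_collie
# Dachshund 0.25 https://en.wikipedia.org/api/rest_v1/page/summary/Dachshund
```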

Code repository: https://github.com/besson/ds-capstone-project/tree/master/doggo

How to run

Build the Docker image: docker build -t doggo .
The app already includes trained models for detecting dogs, human faces and dog breeds. If you update the models, save the new versions in the models folder
Run the app: docker run -p 5000:80 doggo (this maps port 5000 on your machine to port 80 in the container)
Open http://localhost:5000 in your browser

How to use

For an uploaded photo, the app runs the models and displays the most likely dog breed and the probabilities of the top n dog breeds:

Figure 11: Screenshot of web application

Justification

The model works well, and I had a lot of fun building it. All the pre-trained networks performed very well, which is consistent with CNNs being the state of the art for image classification.

Transfer learning is a very common practice in industry, and I was surprised how easy it was to apply it and get a decent solution for a problem as complex as classifying dog breeds from images.

There is plenty to improve in my network, but given the scope and simplicity of the project, it works better than I expected. However, there are some potential issues:

The model seems to be biased toward classifying humans as Dachshunds.
The model classified a fake dog (Photo 4 of Figure 10) as a real dog. This is interesting, but it should not give a prediction in this case, since the photo shows neither a dog nor a human face.
The web application is simple, but its response time could be better.

Possible improvements

Improve the dataset: remove black-and-white images and measure the impact on performance
Retrain the model with a larger dataset and measure the accuracy. Currently, the model reaches ~86% accuracy on the test set.
Analyze further to confirm whether the model is biased toward classifying photos as Dachshund.
Consider caching images and calls to Wikipedia in the web application. The app will also be deployed soon.
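Two of these improvements are easy to prototype. The first function flags black-and-white images, whose three channels are identical. The second measures how often a given breed is predicted, a starting point for checking the Dachshund bias. A minimal sketch, assuming images are (H, W, 3) NumPy arrays and that predictions on human photos are collected as a list of breed names; both function names are hypothetical:

```python
from collections import Counter

import numpy as np


def is_grayscale(img, tol=0):
    """True if the R, G and B channels are (almost) identical."""
    img = img.astype(np.int16)  # avoid uint8 wrap-around in the differences
    return bool(np.abs(img[..., 0] - img[..., 1]).max() <= tol
                and np.abs(img[..., 1] - img[..., 2]).max() <= tol)


def breed_share(predicted_breeds, breed):
    """Fraction of predictions equal to `breed`."""
    if not predicted_breeds:
        return 0.0
    return Counter(predicted_breeds)[breed] / len(predicted_breeds)


gray = np.repeat(np.full((4, 4, 1), 128, dtype=np.uint8), 3, axis=2)
color = gray.copy()
color[0, 0] = [255, 0, 0]
print(is_grayscale(gray), is_grayscale(color))  # True False

human_preds = ["Dachshund", "Beagle", "Dachshund", "Pug"]  # hypothetical outputs
print(breed_share(human_preds, "Dachshund"))  # 0.5
```

A Dachshund share on human photos far above what the breed distribution of the training data would suggest supports the bias hypothesis.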

To see all the code used in this project, please check the GitHub repository: https://github.com/besson/ds-capstone-project

Written by Felipe Besson