Using Deep Learning To Identify Your Dog Breed
This post is about my Udacity Data Science Nanodegree capstone project: implementing a dog breed classifier and a web application that serves online predictions.

Problem Statement
This project addresses two problems: classifying images of dogs by breed with a Convolutional Neural Network (CNN), and implementing a web application that provides online dog breed detection. Identifying a dog's breed is extremely challenging, even for humans:

This capstone project was one of the topics suggested by Udacity, and it is divided into two main components:
Training a model for dog breed detection
The main goal is to implement a deep learning network that classifies dog breeds from images, evaluating different CNN architectures along the way. I’ll present how I used transfer learning on top of a pre-trained CNN to achieve this goal. I used Keras and TensorFlow to train and validate the network.
Online detection of dog breeds
Creating an algorithm that consumes the trained model and predicts dog breeds from images. If a dog is detected in the image, it provides an estimated breed. If a human is detected, it provides the dog breed that the person most resembles. As explained below, this algorithm is served by a Flask application that receives uploaded photos, encodes the images and generates the predictions.
Evaluation Metrics
Given that this project is a classification problem with a simple scope, we use the accuracy score to evaluate the model on the training and test sets. Accuracy works well when the dataset classes are balanced.
As presented in the next section, the classes in our dataset are not uniformly distributed, but the dataset is balanced enough for this metric to reflect the quality of our models.
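As a minimal sketch of the metric (not the project's actual evaluation code), accuracy is the share of predictions that match the true label. The function name `accuracy` and the sample labels below are illustrative assumptions:

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both label lists must have the same length")
    if not true_labels:
        raise ValueError("Cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical breed labels, for illustration only.
y_true = ["Beagle", "Dachshund", "Poodle", "Beagle"]
y_pred = ["Beagle", "Dachshund", "Beagle", "Beagle"]
print(accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```

With strongly imbalanced classes, a model that always predicts the most frequent breed could still score well on this metric, which is why the class distribution matters.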
Data Exploration and Visualization
The dataset has a total of 8351 dog images, each labeled with one of the 133 available dog breeds (categories):

To explore and prepare the data for the model, I split the dataset into train, validation and test sets. I also checked the distribution of each breed in the training set.


As can be observed, the train and test sets have a similar share of samples (%) for each class. That's important because we measure accuracy on sets with the same class distribution, which makes the results comparable.
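This kind of comparison can be sketched in a few lines. The helper `class_shares` and the toy labels are assumptions for illustration; the real sets contain 133 breeds:

```python
from collections import Counter


def class_shares(labels):
    """Map each class to its percentage share of the samples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {breed: 100.0 * n / total for breed, n in counts.items()}


# Hypothetical labels standing in for the real train and test sets.
train_labels = ["Beagle"] * 6 + ["Poodle"] * 4
test_labels = ["Beagle"] * 3 + ["Poodle"] * 2

train_shares = class_shares(train_labels)
test_shares = class_shares(test_labels)

# Largest absolute gap (in percentage points) between the two splits.
max_gap = max(abs(train_shares[b] - test_shares.get(b, 0.0)) for b in train_shares)
print(train_shares, test_shares, max_gap)
```

A small maximum gap suggests both splits follow roughly the same class distribution.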
As described below, images are pre-processed to have the same size. The dataset could be improved, but images with different dimensions are neither anomalies nor blockers for training the model.
Data Preprocessing
To train the model and to generate the online predictions, we need to pre-process (encode) the dog images. Both the code and the encoding methodology were provided by Udacity instructors:
- Resize image to a square image of 224 x 224 pixels
- Convert the image to an array. As we work with color images, we have 3 channels, so after adding a leading sample dimension the image array has shape (1, 224, 224, 3)
- When using TensorFlow as backend, Keras CNNs require a 4D array as input, with shape (nb_samples, rows, columns, channels), where nb_samples is the total number of images (or samples), and rows, columns, and channels are the number of rows, columns, and channels of each image. Stacking all images therefore gives a final array of shape (nb_samples, 224, 224, 3).
I applied this encoding methodology to all images in the train, validation and test sets.
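The encoding steps can be sketched as follows. This is an approximation under assumptions, not the exact code provided by Udacity: it assumes a recent TensorFlow release that exposes `load_img` and `img_to_array` under `tensorflow.keras.utils`, and the names `path_to_tensor` and `stack_tensors` are illustrative:

```python
import numpy as np

IMAGE_SIZE = (224, 224)  # target height and width from the steps above


def path_to_tensor(img_path):
    """Load one image file and return an array of shape (1, 224, 224, 3)."""
    # Imported lazily so the stacking helper below works without TensorFlow.
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(img_path, target_size=IMAGE_SIZE)  # resized RGB image
    array = img_to_array(img)                         # shape (224, 224, 3)
    return np.expand_dims(array, axis=0)              # shape (1, 224, 224, 3)


def stack_tensors(tensors):
    """Stack per-image 4D arrays into one (nb_samples, 224, 224, 3) array."""
    return np.vstack(tensors)


# Stand-in arrays instead of real photos, to show the resulting shape.
fake_batch = stack_tensors([np.zeros((1, 224, 224, 3)) for _ in range(5)])
print(fake_batch.shape)  # (5, 224, 224, 3)
```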
Model implementation
As part of the project, I first tried to create a CNN from scratch. Following best practices for CNN architectures, I created a Keras Sequential model with the following architecture:

As can be observed, the network is very deep. It combines convolutional and dropout layers with a final dense layer for prediction. To improve the dataset and the model performance, I used data augmentation on the images:
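To illustrate the idea only: the sketch below implements one common augmentation, a random horizontal flip, with NumPy. Which transformations the project actually used is not stated here, so the function `random_horizontal_flip` and its parameters are assumptions:

```python
import numpy as np


def random_horizontal_flip(batch, rng, probability=0.5):
    """Mirror images of a (samples, rows, columns, channels) batch at random."""
    flipped = batch.copy()
    flip_mask = rng.random(len(batch)) < probability
    # Reversing axis 2 (columns) mirrors an image horizontally.
    flipped[flip_mask] = flipped[flip_mask][:, :, ::-1, :]
    return flipped


rng = np.random.default_rng(seed=0)
batch = np.arange(2 * 2 * 3 * 1, dtype=float).reshape(2, 2, 3, 1)
always_flipped = random_horizontal_flip(batch, rng, probability=1.0)
print(always_flipped[0, 0, :, 0])  # first row of the first image, mirrored
```

Each epoch then sees slightly different versions of the same photos, which can help a small dataset generalize better.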

Then the Keras model is compiled with the categorical cross-entropy loss function and the accuracy metric:
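For intuition about the loss, here is a small pure-Python sketch of categorical cross-entropy for a single sample. It is not the Keras implementation; the three-class example and the clipping constant `eps` are assumptions for illustration:

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(
        t * math.log(max(p, eps))  # clip to avoid log(0)
        for t, p in zip(one_hot_target, predicted_probs)
    )


# Three hypothetical classes; the true class is index 1.
target = [0, 1, 0]
confident = [0.05, 0.90, 0.05]
unsure = [0.40, 0.30, 0.30]

print(categorical_crossentropy(target, confident))  # -ln(0.90), about 0.105
print(categorical_crossentropy(target, unsure))     # -ln(0.30), about 1.204
```

The loss shrinks as the probability assigned to the true breed grows, which is what training tries to achieve across all 133 classes.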

As expected, I got a low accuracy: 16.38%. As mentioned before, this problem is pretty hard, and many pre-trained networks already exist for similar image recognition problems.
Then, as a refinement, to get a better model and reduce training time without sacrificing accuracy, I trained a CNN using transfer learning.
As our dataset is small and very similar to the ImageNet dataset, I applied a transfer learning methodology based on these steps:
- Slice off the end of the neural network
- Add a new fully connected layer that matches the number of classes in the new data set
- Randomize the weights of the new fully connected layer and freeze all the weights from the pre-trained network
- Train the network to update the weights of the new fully connected layer
Networks pre-trained on the ImageNet dataset were provided by Udacity instructors. For each model, I applied the methodology above, compiling the model with the same parameters I used for my CNN from scratch. After the evaluation, I had the following results:

Given its good results, the winner was the Xception network, for which I defined the following architecture:

As can be observed, we have the pre-trained layers, a single dropout layer to avoid overfitting, and a dense layer with exactly 133 units, which is the number of classes to be predicted.
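A rough Keras sketch of the transfer learning steps and of this kind of architecture is shown below. It is an approximation, not the project's exact model: the pooling layer, the dropout rate, the optimizer and the function name `build_transfer_model` are all assumptions.

```python
NUM_BREEDS = 133  # number of classes reported in the post


def build_transfer_model(dropout_rate=0.3, weights="imagenet"):
    """Build a frozen Xception base with a new 133-way softmax head."""
    # Imported lazily: TensorFlow is only needed when the model is built.
    from tensorflow.keras import Sequential
    from tensorflow.keras.applications import Xception
    from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D

    # include_top=False slices off the original ImageNet classifier.
    base = Xception(weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # freeze all pre-trained weights

    model = Sequential([
        base,
        GlobalAveragePooling2D(),                 # assumed pooling step
        Dropout(dropout_rate),                    # single dropout layer (rate assumed)
        Dense(NUM_BREEDS, activation="softmax"),  # new, randomly initialized head
    ])
    model.compile(
        optimizer="rmsprop",  # assumption: the optimizer is not named in the post
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_transfer_model()` downloads the ImageNet weights on first use; only the weights of the final dense layer are updated when `fit` is called, since the base is frozen.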
Model Evaluation and Validation
I trained the winning model (previous section) using the following parameters:

Here are the training, validation and test results:

Given the scope of this project, the results were very satisfactory, especially the accuracy score on the test set. All the code used to evaluate and validate the winning model, as well as the training, validation and test logs, can be found in this notebook.
In the next section, we explore the prediction algorithm as well as the model output for real dog images and human faces.
Testing the prediction algorithm
To improve the overall predictions, I created an algorithm that first detects whether an image contains a dog or a human face and, only when that happens, predicts the dog breed using the Xception model.
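The control flow can be sketched as below. This is not the project's implementation: the function `predict_breed` and its three callables are hypothetical stand-ins for the real detectors and the Xception-based classifier.

```python
def predict_breed(image, is_dog, is_human_face, classify_breed):
    """Run breed classification only if a dog or a human face is detected.

    All three callables are hypothetical stand-ins for the real detectors
    and the breed classifier.
    """
    if is_dog(image):
        return {"subject": "dog", "breed": classify_breed(image)}
    if is_human_face(image):
        return {"subject": "human", "breed": classify_breed(image)}
    return {"subject": None, "breed": None}  # neither detected: no prediction


# Toy stand-ins that treat the "image" as a plain string label.
result = predict_breed(
    "photo_of_a_person",
    is_dog=lambda img: "dog" in img,
    is_human_face=lambda img: "person" in img,
    classify_breed=lambda img: "Dachshund",
)
print(result)  # {'subject': 'human', 'breed': 'Dachshund'}
```

Passing the detectors in as arguments keeps the gating logic easy to test without loading any model.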
Udacity instructors provided models for detecting a dog or a human face in an image. The complete implementation of this algorithm is in the class DogBreedPredictor. Running the algorithm with valid and invalid images gave interesting results:

Doggo: A web application to run online detections
Doggo is a Flask web application that runs dog breed detection models. Based on a photo uploaded by a user, Doggo runs the following steps:
- Encode the uploaded photo for the dog detector, human face detector and dog breed detector models
- If the photo is classified as a dog or a human face, predict dog breeds
- For the top n predictions, Doggo calls Wikipedia and dog.ceo to find more information about the predicted breeds
- Return results to user
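A simplified sketch of such an upload flow follows. It is not Doggo's actual code: the route `/predict`, the form field `photo`, the extension whitelist and the helper names are assumptions, and the Wikipedia and dog.ceo lookups are left out.

```python
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}  # assumed upload whitelist


def allowed_file(filename):
    """Accept only file names with a whitelisted image extension."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def top_n(probabilities, n=3):
    """Return the n (breed, probability) pairs with the highest probability."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:n]


def create_app(predict_probabilities):
    """Build a Flask app; predict_probabilities maps image bytes to {breed: prob}."""
    from flask import Flask, jsonify, request  # imported lazily

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        photo = request.files.get("photo")  # hypothetical form field name
        if photo is None or not allowed_file(photo.filename):
            return jsonify(error="Please upload a JPG or PNG image"), 400
        probabilities = predict_probabilities(photo.read())
        best = top_n(probabilities)
        return jsonify(predictions=[{"breed": b, "probability": p} for b, p in best])

    return app


print(top_n({"Beagle": 0.7, "Poodle": 0.1, "Dachshund": 0.2}, n=2))
```

The prediction function is injected into `create_app`, so the web layer can be exercised with a fake model before the trained one is plugged in.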
Code repository: https://github.com/besson/ds-capstone-project/tree/master/doggo
How to run
- Build the Docker image: docker build -t doggo .
- The app ships with trained models for detecting dogs, human faces and dog breeds. To update them, save the new models in the models folder
- Run the app: docker run -p 5000:80 doggo
- Go to http://0.0.0.0:5000
How to use
For an uploaded photo, the app runs the model and displays the main dog breed and the probabilities of the top n dog breeds:

Justification
The model looks nice and I had a lot of fun building it. CNNs are indeed the state of the art for image classification, as all pre-trained networks performed very well.
Transfer learning is a very common practice in industry, and I was surprised how easy it was to apply and get a decent solution for such a complex problem as classifying dog breeds from images.
There are many things to improve in my network but, given the scope and simplicity, it works better than I expected. However, there are some potential issues:
- The model seems to have some bias towards classifying humans as Dachshund.
- The model classified a fake dog (Photo 4 of Figure 9) as a real dog. That is interesting, but it should not give a prediction in this case, as the photo shows neither a dog nor a human face.
- The web application is simple but could have a better response time.
Possible improvements
- Improve the dataset: remove black and white images and measure the performance
- Re-train the model with a larger dataset and measure the accuracy. Currently, it is ~86% on the test set.
- Run further analysis to confirm whether the model is biased towards classifying photos as Dachshund.
- Consider caching images and calls to Wikipedia in the web application. Also, the app will be deployed soon.
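As one possible shape for the caching idea, the sketch below memoizes breed lookups with `functools.lru_cache`. The function `breed_info` and its placeholder body are hypothetical; a real version would perform the external request there.

```python
from functools import lru_cache

lookup_calls = []  # records real lookups, to show the effect of the cache


@lru_cache(maxsize=256)
def breed_info(breed):
    """Return descriptive text for a breed, caching repeated requests."""
    lookup_calls.append(breed)
    # Placeholder for the slow external request (e.g. to Wikipedia).
    return f"Summary for {breed}"


breed_info("Dachshund")
breed_info("Dachshund")  # served from the cache, no second lookup
print(len(lookup_calls))  # 1
```

Since there are only 133 breeds, a small in-memory cache like this would cover every possible lookup after a short warm-up.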
To see all the code used in this project, please check my GitHub here.