Deep learning for image classification

Nihar Sarangi · Feb 17, 2013 · Viewed 7.8k times

After reading a few papers on deep learning and deep belief networks, I got a basic idea of how they work. But I'm still stuck on the last step, i.e., the classification step. Most of the implementations I found on the Internet deal with generation (MNIST digits).

Is there some explanation (or code) available somewhere that talks about classifying images (preferably natural images or objects) using DBNs?

Also, some pointers in that direction would be really helpful.

Answer

solvingPuzzles · Jan 16, 2014

The basic idea

These days, the state of the art for image classification problems (e.g. ImageNet) is usually a "deep convolutional neural network" (deep ConvNet). These networks look roughly like the ConvNet configuration by Krizhevsky et al.: [figure: the Krizhevsky et al. ConvNet architecture]

For inference (classification), you feed an image into the left side (notice that the depth on the left is 3, for RGB), crunch it through a series of convolution filters, and the network spits out a 1000-dimensional vector on the right-hand side. This particular architecture is for ImageNet, which focuses on classifying 1000 categories of images, so each entry of the 1000-dimensional vector is a score for how likely it is that the image belongs to that category.
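
To make that concrete, here's a minimal sketch in plain NumPy (not the actual Krizhevsky et al. network) of one convolution + ReLU + max-pooling stage followed by a fully-connected layer and a softmax over 1000 ImageNet-style categories. The image size, filter count, and weights are all made up for illustration:

```python
import numpy as np

def conv2d(image, filters):
    """Valid convolution of an HxWxC image with F filters of shape FxkxkxC."""
    H, W, C = image.shape
    F, k, _, _ = filters.shape
    out = np.zeros((H - k + 1, W - k + 1, F))
    for f in range(F):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[i, j, f] = np.sum(image[i:i+k, j:j+k, :] * filters[f])
    return out

def relu(x):
    return np.maximum(0, x)

def max_pool(x, size=2):
    H, W, C = x.shape
    H2, W2 = H // size, W // size
    return x[:H2*size, :W2*size, :].reshape(H2, size, W2, size, C).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                          # toy RGB input (real nets use ~224x224)
filters = rng.standard_normal((8, 5, 5, 3)) * 0.01       # 8 small convolution filters
W_fc = rng.standard_normal((1000, 8 * 14 * 14)) * 0.01   # fully-connected weights
b_fc = np.zeros(1000)

features = max_pool(relu(conv2d(image, filters)))          # shape (14, 14, 8)
scores = softmax(W_fc @ features.reshape(-1) + b_fc)       # 1000-d "how likely" vector
print(scores.argmax(), scores.max())
```

A real ConvNet stacks several of these convolution/pooling stages before the fully-connected layers, but the flow of data is the same.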

Training the neural net is only slightly more complex. For training, you basically run classification repeatedly on labeled images, and after each pass you do backpropagation (see Andrew Ng's lectures) to improve the convolution filters in the network. Roughly, backpropagation asks: "What did the network classify correctly or incorrectly? For the misclassified examples, let's fix the network a little bit."
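
Here's a hedged sketch of that loop: repeated forward passes, backpropagation of the error, and a small weight update. For brevity it trains a single linear-plus-softmax layer on random toy data rather than a full ConvNet, but the convolution filters in a real network are updated by the same chain-rule idea:

```python
import numpy as np

rng = np.random.default_rng(1)
num_classes, num_features, num_samples = 10, 256, 500
X = rng.standard_normal((num_samples, num_features))
y = rng.integers(0, num_classes, num_samples)        # toy labels

W = np.zeros((num_features, num_classes))
learning_rate = 0.1

for step in range(200):
    # Forward pass: class scores and softmax probabilities.
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Cross-entropy loss: "what did the network get wrong?"
    loss = -np.log(probs[np.arange(num_samples), y]).mean()

    # Backpropagation: gradient of the loss w.r.t. the weights.
    dlogits = probs.copy()
    dlogits[np.arange(num_samples), y] -= 1.0
    dW = X.T @ dlogits / num_samples

    # "Fix the network a little bit."
    W -= learning_rate * dW

    if step % 50 == 0:
        print(f"step {step}: loss {loss:.3f}")
```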


Implementation

Caffe is a very fast open-source implementation of deep convolutional neural networks (faster than cuda-convnet from Krizhevsky et al.). The Caffe code is pretty easy to read; there's basically one C++ file per type of network layer (e.g. convolutional layers, max-pooling layers, etc.).
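
To give a feel for that layer-per-file organization (this is not Caffe's actual C++ API, just a toy Python sketch of the same idea), each layer type implements the same forward/backward interface, and the network is just a list of layers:

```python
import numpy as np

class ReLULayer:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask

class FullyConnectedLayer:
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * 0.01

    def forward(self, x):
        self.x = x
        return x @ self.W

    def backward(self, grad_out):
        self.grad_W = self.x.T @ grad_out   # used by the solver to update W
        return grad_out @ self.W.T

rng = np.random.default_rng(2)
net = [FullyConnectedLayer(128, 64, rng), ReLULayer(), FullyConnectedLayer(64, 10, rng)]

x = rng.standard_normal((4, 128))
for layer in net:                 # forward pass, layer by layer
    x = layer.forward(x)
grad = np.ones_like(x)
for layer in reversed(net):       # backward pass in reverse order
    grad = layer.backward(grad)
```

The nice property of this design is that adding a new layer type (a convolution layer, a dropout layer, etc.) only requires implementing its own forward and backward methods; the rest of the network code doesn't change.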