General approach to developing an image classification algorithm for Dilbert cartoons

Andrew J picture Andrew J · Nov 13, 2011 · Viewed 8k times · Source

As a self-development exercise, I want to develop a simple classification algorithm that, given a particular cell of a Dilbert cartoon, is able to identify which characters are present in the cartoon (Dilbert, PHB, Ratbert etc.).

I assume the best way to do this is to (1) apply some algorithm to the image, which converts it into a set of features, and (2) use a training set and one of many possible machine learning algorithms to correlate the presence/absence of certain features with a particular character being present in the cell.

So my questions are - (a) is this the correct approach, (b) since there's a number of classification algorithms and ML algorithms to test, what is a good methodology for finding the right one, and (c) which algorithms would you start with, given that we're essentially conducting a classification exercise on a cartoon.

Answer

doug picture doug · Nov 13, 2011

So i think you are on the right track w/r/t your step 1 (apply some algorithm to the image, which converts it into a set of features).

This project is more challenging that most ML problems because here you will actually have to create your training data set from the raw data (the individual frames comprising the cartoons). For instance, grab a frame, identify two characters in that frame, Dilbert and the character with horns (Dilbert's boss i believe, don't know his name), extract those two characters from that frame and append to each the appropriate class label (e.g., "1" for Dlibert).

Step 1

To extract the individual characters from each of the frames comprising the a Dilbert cartoon, i would suggest a spectral decomposition of each frame. If you are not familiar with this technique, at its core, it's just an eigenvector decomp.

If you like python (or R, given that you can use python-to-R bindings like RPy) then i would strongly encourage you to look at sklearn. In particular, this excellent library (which was originally developed under the SciPy scikits project umbrella, and therefore uses NumPy + SciPy for matrix computation) has several algorithms for image segmentation, one of which is based on spectral clustering. For this step in your Project, you would most likely want to look at these two scikits.learn modules

  • sklearn.feature_extraction (esp. the image sub-module)

  • sklearn.cluster.spectral_clustering

Included with these two modules are two good example scripts, one segmenting a digital photograph and the other segmenting an image comprised of three partially super-imposed circles with minimal contrast w/r/t each other and w/r/t the background--both, i suspect are more difficult problems that the decompositions you will need to perform. In other words, sklearn has two complete, well-documented example scripts included in the source distribution, both of which process data similar to yours. Either or both would be an excellent template for this step.

Step 2

So that's the first step; here's the second: sort all of the components of the decomposed images into groups, one group for each Dilbert character. Next, assign a class label to each Group, e.g., if there are four characters from your decomposition step, then a decent choice for class labels is "0", "1", "2", and "3." Append those class labels to the component matrices (the decomposition products from step 1) so each character matrix is mapped to its corresponding class (Dilbert character).

Step 3

Select a suitable ML technique. You have many choices for this step; the only criteria are that the technique is in the supervised category (because you have assigned class labels to your data) and that it function as a classifier (i.e., it returns a class label, versus a regressor which outputs a numerical value). Given this is a personal project, i would chose the one that seems most interesting to you. A few that satisfy the criteria i just mentioned are: multi-layer perceptron (neural network), support vector machine (SVM), and k-nearest neighbors (kNN).

Step 4

train, validate, and test your classifier

Alternative Technique: Template Matching

Once Step 1 is completed (each image is decomposed into a set of objects, some of which will no doubt represent the characters) you can manually sift through these decomposition products and collect exemplars for each character in the cartoon. The are the templates.

Next you compare objects segmented from an image with this set of unique templates. In scikit-image, another scipy scikit, you can use the method match_template, to which you pass in a template image and a candidate image, and this method returns a 2D array showing the pixel-by-pixel correlation (between -1 and 1).