How to create my own recommendation engine?

Burak Dede · Sep 10, 2009

I have been interested in recommendation engines lately and want to improve in this area. I am currently reading O'Reilly's "Programming Collective Intelligence", which I think is the best book on the subject. But I have no idea how to implement an engine; by "no idea" I mean I don't know where to start. I have a project like Last.fm in mind.

  1. Where do I start creating a recommendation engine? Should it be implemented on the database side or the backend side?
  2. What level of database knowledge will be needed?
  3. Are there any open-source engines or other resources that could help?
  4. What should my first steps be?

Answer

Josh · Sep 17, 2010

Producing recommendations can be split into two main stages:

  1. Feature extraction
  2. Recommendation

Feature extraction is very specific to the object being recommended. For music, for example, some features of the object might be the frequency response of the song, its power, its genre, etc. The features for the users might be age, location, etc. You then create a vector for each user and each song, with each element of the vector corresponding to a different feature of interest.
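As a rough sketch of what such vectors could look like, the snippet below turns a song's power and genre, and a user's age and location, into plain numeric lists. The genre and location vocabularies, the function names, and the idea of one-hot encoding the categorical features are assumptions made for illustration, not part of the original answer; the frequency response is left out because it is a whole curve rather than a single number.

```python
# Hypothetical vocabularies; a real system would build these from its catalogue.
GENRES = ["rock", "pop", "jazz"]
LOCATIONS = ["US", "UK", "DE"]


def one_hot(value, vocabulary):
    """Encode a categorical value as a list with a single 1.0 in its slot."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def song_vector(power, genre):
    """Song features: one numeric 'power' value followed by a one-hot genre."""
    return [float(power)] + one_hot(genre, GENRES)


def user_vector(age, location):
    """User features: age followed by a one-hot location."""
    return [float(age)] + one_hot(location, LOCATIONS)


print(song_vector(0.72, "rock"))  # → [0.72, 1.0, 0.0, 0.0]
print(user_vector(25, "UK"))      # → [25.0, 0.0, 1.0, 0.0]
```

One-hot encoding keeps unrelated categories (say, "rock" and "jazz") from looking numerically "close", which a single integer code would imply.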

Performing the actual recommendation only requires well-thought-out feature vectors. Note that if you don't choose the right features, your recommendation engine will fail. It would be like asking you to guess my sex based on my age: my age may carry a bit of information, but you could surely think of better questions to ask.

Anyway, once you have feature vectors for each user and song, you need to train the recommendation engine. I think the best way to do this is to get a large number of users to take your demographic test and then tell you which specific songs they like. At that point you have all the information you need, and your job is to draw a decision boundary through it.

Consider a simple example. You want to predict whether or not a user likes AC/DC's "Back in Black" based on age and sex. Imagine a graph showing 100 data points. The x axis is age and the y axis is sex (1 is male, 2 is female). A black mark indicates that the user likes the song, while a red mark means they don't. My guess is that this graph would have a lot of black marks for users who are male and between the ages of 12 and 37, while the rest of the marks would be red. So if we were to select a decision boundary manually, it would be a rectangle around this area containing the majority of the black marks. This is called the decision boundary because, if a completely new person comes to you and tells you their age and sex, you only have to plot them on the graph and ask whether or not they fall within that box.
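The hand-drawn box from the "Back in Black" example can be written down directly. This is only a sketch: the `User` class and `likes_back_in_black` function are hypothetical names, and the box edges (males aged 12 to 37) are the guessed boundary from the example above, not something learned from real data.

```python
from dataclasses import dataclass


@dataclass
class User:
    age: int
    sex: int  # encoded as in the example: 1 = male, 2 = female


# Hand-picked decision boundary from the example: a box covering
# males (sex == 1) aged 12 to 37 inclusive.
BOX_AGE_MIN, BOX_AGE_MAX = 12, 37
BOX_SEX = 1


def likes_back_in_black(user: User) -> bool:
    """"Plot" the user on the age/sex graph and check whether they land in the box."""
    return user.sex == BOX_SEX and BOX_AGE_MIN <= user.age <= BOX_AGE_MAX


print(likes_back_in_black(User(age=25, sex=1)))  # → True
print(likes_back_in_black(User(age=25, sex=2)))  # → False
print(likes_back_in_black(User(age=50, sex=1)))  # → False
```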

So the hard part is finding the decision boundary. The good news is that you don't need to know how to find it yourself; you just need to know how to use some of the common tools. Look into neural networks, support vector machines, linear classifiers, etc. Don't let the big names intimidate you: most people can't tell you what these things are really doing. They just know how to plug data in and get results.
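To give a feel for letting a tool find the boundary instead of drawing it by hand, here is a minimal perceptron, one of the simplest linear classifiers, written in plain Python. The training data is made up purely for illustration. Note that a linear classifier draws a straight line rather than a rectangle; it can separate this toy data only because none of the invented users are under 12. Capturing the full box shape would need a non-linear model such as a neural network or a kernel SVM.

```python
# Toy training data, invented for illustration: (age, sex, likes_song),
# with sex encoded as in the example (1 = male, 2 = female).
TRAINING_DATA = [
    (15, 1, True), (25, 1, True), (30, 1, True), (35, 1, True),
    (50, 1, False), (60, 1, False),
    (20, 2, False), (30, 2, False), (45, 2, False),
]


def features(age, sex):
    # Scale age to decades so both features have a similar range;
    # the trailing 1.0 acts as the bias input.
    return [age / 10, float(sex), 1.0]


def train_perceptron(data, max_epochs=10_000):
    """Learn weights w so that w . x > 0 for liked songs and < 0 otherwise."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for age, sex, liked in data:
            x = features(age, sex)
            y = 1 if liked else -1
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # wrong side of (or on) the line: nudge it
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w


def predict(w, age, sex):
    return sum(wi * xi for wi, xi in zip(w, features(age, sex))) > 0


w = train_perceptron(TRAINING_DATA)
print(predict(w, 28, 1))  # a new 28-year-old male → True
print(predict(w, 55, 1))  # a new 55-year-old male → False
```

Because this toy data is linearly separable, the perceptron is guaranteed to stop making mistakes after finitely many updates; on real, noisy data you would more likely reach for a library implementation of one of the models named above.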

I know it's a bit late, but I hope this helps anyone who stumbles on this thread.