How to do item-based recommendation in Spark MLlib?

user321532 · Dec 17, 2014

Mahout supports item-based recommendation through the API method:

ItemBasedRecommender.mostSimilarItems(int productid, int maxResults, Rescorer rescorer)

In Spark MLlib, however, it appears that the ALS APIs can fetch recommended products only when a user ID is provided:

MatrixFactorizationModel.recommendProducts(int user, int num)

Is there a way to get recommended products based on a similar product, without having to provide a user ID, similar to how Mahout performs item-based recommendation?

Answer

Vedant · Apr 6, 2015

Spark 1.2.x versions do not provide an "item-similarity based recommender" like the ones present in Mahout.

However, MLlib currently supports model-based collaborative filtering, where users and products are described by a small set of latent factors. (When constructing the user-item matrix, make sure you understand whether your use case involves implicit feedback, such as views and clicks, or explicit feedback, such as ratings.)

MLlib uses the alternating least squares (ALS) algorithm, which can be considered similar to the SVD algorithm, to learn these latent factors.
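
As a rough illustration of what "latent factors" means here, the sketch below uses plain Python with made-up factor values (they are not output from MLlib). Each user and each product gets a short vector, and for explicit feedback the predicted rating is the dot product of the two. The names `user_factors`, `product_factors`, and `predict` are hypothetical and chosen only for this example.

```python
# Made-up latent factors of rank 2; a trained ALS model would learn these values.
user_factors = {
    1: [1.0, 0.5],
    2: [0.2, 1.5],
}
product_factors = {
    10: [2.0, 1.0],
    20: [0.5, 2.0],
}


def predict(user, product):
    """Predicted rating: dot product of the user and product factor vectors."""
    u = user_factors[user]
    p = product_factors[product]
    return sum(a * b for a, b in zip(u, p))


print(predict(1, 10))
print(predict(2, 20))
```

Note that the prediction needs a user vector, which is why this kind of model is queried by user ID.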

If you need to build a purely item-similarity based recommender, I would recommend this:

  1. Represent all items by a feature vector
  2. Construct an item-item similarity matrix by computing a similarity metric (such as cosine similarity) for each pair of items
  3. Use this item similarity matrix to find similar items for users
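
The three steps above can be sketched in plain Python on a toy data set. This is a minimal, single-machine illustration, not Spark code: the feature values are invented, and `item_features`, `cosine`, `similarity`, and `most_similar_items` are hypothetical names used only for this example.

```python
import math
from itertools import combinations

# Step 1: hypothetical item feature vectors (item id -> features).
item_features = {
    "A": [1.0, 0.0, 1.0],
    "B": [1.0, 0.0, 0.9],
    "C": [0.0, 1.0, 0.0],
    "D": [0.5, 0.5, 0.5],
}


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


# Step 2: compute the similarity of every unordered pair and store it symmetrically.
similarity = {item: {} for item in item_features}
for i, j in combinations(item_features, 2):
    s = cosine(item_features[i], item_features[j])
    similarity[i][j] = s
    similarity[j][i] = s


# Step 3: look up the items most similar to a given item; no user id is needed.
def most_similar_items(item, max_results):
    ranked = sorted(similarity[item].items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_results]


print(most_similar_items("A", 2))
```

To recommend for a particular user, one option is to call `most_similar_items` for each item that user has already interacted with and merge the results.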

Since similarity matrices do not scale well (imagine how your similarity matrix would grow if you had 10,000 items instead of 100), this article on DIMSUM might be helpful if you're planning to implement this for a large number of items:

https://databricks.com/blog/2014/10/20/efficient-similarity-algorithm-now-in-spark-twitter.html
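
To put numbers on the scaling concern above, here is a small arithmetic sketch. It assumes a symmetric similarity metric, so each unordered pair of items is computed once; `pair_count` is a hypothetical helper name.

```python
def pair_count(n_items):
    """Number of distinct item pairs among n_items: n * (n - 1) / 2."""
    return n_items * (n_items - 1) // 2


for n in (100, 10000):
    print(n, pair_count(n))
# 100 4950
# 10000 49995000
```

Going from 100 to 10,000 items multiplies the item count by 100 but the number of pairwise similarities by roughly 10,000.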