I am a graduate CS student (data mining and machine learning) and have good exposure to core Java (>4 years). I have also read up a fair amount on Hadoop and MapReduce.
I would now like to do a project on this (in my free time, of course) to get a better understanding.
Any good project ideas would be really appreciated. I just want to do this to learn, so I don't mind re-inventing the wheel. Anything related to data mining/machine learning would be an added bonus (it fits with my research), but it is absolutely not necessary.
You haven't written anything about your interests. Graph mining algorithms have been implemented on top of the Hadoop framework; the PEGASUS software http://www.cs.cmu.edu/~pegasus/ and the paper "PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations" may give you a starting point.
Further, this link discusses something similar to your question, although the example is in Python: http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/. There is also a very good paper co-authored by Andrew Ng, "Map-Reduce for Machine Learning on Multicore".
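The core idea in that paper is that many ML algorithms can be written in "summation form": each mapper computes partial sums over its slice of the data, and the reducer adds them up. Here is a minimal sketch of that pattern in plain Hadoop Java (the `FeatureSum` class name and the comma-separated numeric input format are my own assumptions for illustration, not from the paper). It computes per-feature column sums, the building block for statistics like a mean or a batch gradient step:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: per-feature sums over CSV rows of numbers -- the "summation form"
// that lets many ML algorithms be parallelized with MapReduce.
public class FeatureSum {

    public static class SumMapper
            extends Mapper<LongWritable, Text, IntWritable, DoubleWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed input: one comma-separated numeric row per line.
            String[] fields = line.toString().split(",");
            // Emit (featureIndex, value); the reducer sums each feature's column.
            for (int i = 0; i < fields.length; i++) {
                context.write(new IntWritable(i),
                              new DoubleWritable(Double.parseDouble(fields[i])));
            }
        }
    }

    public static class SumReducer
            extends Reducer<IntWritable, DoubleWritable, IntWritable, DoubleWritable> {
        @Override
        protected void reduce(IntWritable feature, Iterable<DoubleWritable> values,
                              Context context) throws IOException, InterruptedException {
            double sum = 0.0;
            for (DoubleWritable v : values) {
                sum += v.get();
            }
            context.write(feature, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "feature-sum");
        job.setJarByClass(FeatureSum.class);
        job.setMapperClass(SumMapper.class);
        // Sums are associative, so the reducer can also run as a local combiner.
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Once you have sums like these working, extending the same skeleton to a real algorithm (e.g., one iteration of k-means, where mappers emit partial centroid sums per cluster) makes a nice self-contained learning project.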
There was also a NIPS 2009 workshop on a similar topic, "Large-Scale Machine Learning: Parallelism and Massive Datasets". You can browse its papers to get ideas.
Edit: There is also Apache Mahout http://mahout.apache.org/: "Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm".