I am looking for a library that, ideally, has the following features:
I would like this to be in C++, as I am most comfortable with that language, but I will also use any other language if the library is worth it. I have googled and found some, but I do not really have the time to try them all out, so I want to hear what experiences other people have had. Please only answer if you have some experience with the library you recommend.
P.S.: I could also use different libraries for the clustering and the SVM.
There are only a few ML libraries that I have used enough to be comfortable recommending; dlib ml is certainly one of them.
Sourceforge download here; and bleeding-edge check-out:
hg clone http://hg.code.sf.net/p/dclib/code dclib-code
The original library creator and current maintainer is Davis King.
Your wishlist versus the relevant dlib features:
good documentation: for free, open-source libraries directed at a relatively small group of users/developers, this is probably as good as it gets; aside from the usual docs, refined over the library's five-year development history, there's a frequently updated Intro to dlib, a (low-traffic) forum, and a large set of excellent examples (including at least one for SVM).
C++: 100% in C++ as far as I know.
Support-Vector Machine algorithm: yep; in fact, the SVM modules have been the focus of the most recent updates to this library.
Hierarchical Clustering algorithm: not out of the box; there is, however, packaged code for k-means clustering. Obviously the results from each technique are very different, but calculation of the similarity metric and the subsequent recursive/iterative partitioning step are at the heart of both--in other words, the computation engine for hierarchical clustering is all there. Adapting the existing clustering module for HC will take more than a couple of lines of code, but it's also not a major endeavor, given that you're working almost at the data-presentation level.
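To give a rough sense of the amount of work involved, here is a minimal sketch of naive agglomerative (bottom-up) hierarchical clustering in plain standard C++. It does not use dlib's API at all; the names Point, euclidean, single_linkage and agglomerate are made up for illustration, and the choice of Euclidean distance with single linkage is my assumption, not something the library prescribes. It is also cubic-or-worse in the number of points, so treat it as a starting point rather than production code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical point type for this sketch: a plain vector of coordinates.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Single-linkage distance between two clusters (given as point indices):
// the smallest pairwise distance between their members.
double single_linkage(const std::vector<Point>& pts,
                      const std::vector<std::size_t>& a,
                      const std::vector<std::size_t>& b) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i : a)
        for (std::size_t j : b)
            best = std::min(best, euclidean(pts[i], pts[j]));
    return best;
}

// Start with one cluster per point and repeatedly merge the two closest
// clusters until only k remain. Returns one label in [0, k) per point.
std::vector<std::size_t> agglomerate(const std::vector<Point>& pts,
                                     std::size_t k) {
    assert(k >= 1 && k <= pts.size());
    std::vector<std::vector<std::size_t>> clusters;
    for (std::size_t i = 0; i < pts.size(); ++i) clusters.push_back({i});

    while (clusters.size() > k) {
        // Find the closest pair of clusters (first pair wins on ties).
        std::size_t bi = 0, bj = 1;
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < clusters.size(); ++i) {
            for (std::size_t j = i + 1; j < clusters.size(); ++j) {
                const double d = single_linkage(pts, clusters[i], clusters[j]);
                if (d < best) { best = d; bi = i; bj = j; }
            }
        }
        // Merge cluster bj into cluster bi, then drop bj.
        clusters[bi].insert(clusters[bi].end(),
                            clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bj));
    }

    std::vector<std::size_t> labels(pts.size());
    for (std::size_t c = 0; c < clusters.size(); ++c)
        for (std::size_t i : clusters[c]) labels[i] = c;
    return labels;
}
```

Calling agglomerate(points, k) gives a flat cut of the hierarchy at k clusters; if you want the full dendrogram, you would record each (bi, bj, best) merge inside the loop instead of only keeping the final labels. Swapping single_linkage for complete or average linkage only changes that one function.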
dlib ml has a few additional points to recommend it. First, it's a mature library (it's at version 17.x now; version 1.x was released sometime in late 2005, I believe), yet it remains under active development, as evidenced by the repo logs (the last update, 17.27, was 17 May 2010) and the last commit (23 May 2010). Second, it includes quite a few other ML techniques (e.g., Bayesian Networks, Kernel Methods, etc.). Third, dlib ml has excellent "support" libraries for matrix computation and optimization--both of which are fundamental building blocks of many ML techniques.
In the source, I've noticed that dlib ml is licensed under the BSL (the Boost Software License), which is an open source license, though I don't know anything else about this type of license.