I'm looking for a lightweight Java library that supports Nearest Neighbor Searches by Locality Sensitive Hashing for nearly equally distributed data in a high dimensional (in my case 32) dataset with some hundreds of thousands data points.
It's totally good enough to get all entries in a bucket for a query. Which ones i really need could then be processed in a different way under consideration of some filter parameters my problem include.
I already found likelike but hope that there is something a bit smaller and without need of any other tools (like Apache Hadoop in the case of likelike).
Maybe this one:
"TarsosLSH is a Java library implementing Locality-sensitive Hashing (LSH), a practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It supports several Locality Sensitive Hashing (LSH) families: the Euclidean hash family (L2), city block hash family (L1) and cosine hash family. The library tries to hit the sweet spot between being capable enough to get real tasks done, and compact enough to serve as a demonstration on how LSH works."
Code can be found here