Importing Java class into a python project

benj rei picture benj rei · Feb 28, 2016 · Viewed 11.7k times · Source

I've been trying to find a method to importing Java-ml into my python project. I have the jar file in the same path as my project.

I want to use it for kmeans clustering, since it allows me to change the distance metric. I am wondering though whether with the implementation that one of you suggest, whether I'll be able to pass a different java class as a parameter for the function?

I tried using:

import sys

sys.path.append(r"C:\Users\X\Desktop\X\javaml-0.1.7\javaml-0.1.7.jar")

import net.sf.javaml as jml

test = jml.clustering.Kmeans()

I considered using jython, however I am unsure of how it works, and it is unclear whether I could continue using idle and whether I would have to reprogram my project.

Lastly I considered using PyJNIus, however it is simply not working.

Answer

BeRecursive picture BeRecursive · Feb 28, 2016

In short, you can't run Java code natively in a CPython interpreter.

Firstly, Python is just the name of the specification for the language. If you are using the Python supplied by your operating system (or downloaded from the official Python website), then you are using CPython. CPython does not have the ability to interpret Java code.

However, as you mentioned, there is an implementation of Python for the JVM called Jython. Jython is an implementation of Python that operates on the JVM and therefore can interact with Java modules. However, very few people work with Jython and therefore you will be a bit on your own about making everything work properly. You would not need to re-write your vanilla Python code (since Jython can interpret Python 2.x) but not all libraries (such as numpy) will be supported.

Finally, I think you need to better understand the K-Means algorithm, as the algorithm is implicitly defined in terms of the Euclidean distance. Using any other distance metric would no longer be considered K-Means and may affect the convergence of the algorithm. See here for more information.


Again, you can't run Java code natively in a CPython interpreter. Of course there are various third party libraries that will handle marshalling of data between Java and Python. However, I stand by my statement that for this particular use case you are likely better to use a native Python library (something like K-Medoid in Scikit-Learn). Attempting to call through to Java, with all the associated overhead, is overkill for this problem, in my opinion.