Which data mining tool to use?

user2670818 · Jul 25, 2016 · Viewed 10.6k times

Can somebody explain to me the main pros and cons of the best-known open-source data mining tools?

Everywhere I read that RapidMiner, Weka, Orange and KNIME are the best ones; see, for example, this blog post.

Can somebody give a quick technical comparison as a short bullet list?

My needs are the following:

  • It should support classification algorithms (Naive Bayes, SVM, C4.5, kNN).
  • It should be easy to use from Java.
  • It should have understandable documentation.
  • It should have reference production projects or use cases built on it.
  • Some additional benchmark comparisons, if possible.

Thanks!

Answer

D3181 · Jul 25, 2016

Firstly, each of the tools on your list has its pros and cons. From my personal experience, however, I would suggest Weka: it is incredibly simple to embed in your own Java application using the Weka jar file, and it comes with its own self-contained data mining tools.
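To illustrate what "embedding the Weka jar" looks like, here is a minimal sketch, assuming Weka 3.8 with `weka.jar` on the classpath. It parses a small ARFF dataset, trains `J48` (Weka's C4.5 decision tree) and classifies one instance. The class name `WekaEmbedSketch` and the toy data are hypothetical; in a real application you would more likely load a file with `ConverterUtils.DataSource.read(path)`.

```java
import java.io.StringReader;

import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;

public class WekaEmbedSketch {

    // Hypothetical toy data in Weka's ARFF format (two numeric features, one nominal class).
    static final String ARFF = String.join("\n",
        "@relation toy",
        "@attribute x numeric",
        "@attribute y numeric",
        "@attribute label {neg,pos}",
        "@data",
        "1.0,1.1,neg",
        "1.2,0.9,neg",
        "0.8,1.0,neg",
        "1.1,1.3,neg",
        "5.0,5.2,pos",
        "5.3,4.8,pos",
        "4.9,5.1,pos",
        "5.2,5.0,pos");

    // Parses the ARFF text and marks the last attribute as the class to predict.
    static Instances loadData() throws Exception {
        Instances data = new Instances(new StringReader(ARFF));
        data.setClassIndex(data.numAttributes() - 1);
        return data;
    }

    // Trains Weka's J48 classifier (its C4.5 implementation) with default options.
    static J48 train(Instances data) throws Exception {
        J48 tree = new J48();
        tree.buildClassifier(data);
        return tree;
    }

    public static void main(String[] args) throws Exception {
        Instances data = loadData();
        J48 tree = train(data);
        System.out.println(tree); // prints the learned tree as text

        // Classify the first instance and map the numeric class index back to its label.
        Instance first = data.instance(0);
        double predicted = tree.classifyInstance(first);
        System.out.println("Predicted: " + data.classAttribute().value((int) predicted));
    }
}
```

The whole integration is ordinary Java method calls on `Instances` and a classifier object, which is the main reason Weka is convenient for developers.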

RapidMiner seems to be a commercial product offering an end-to-end solution; however, most of the external integration examples I have seen for RapidMiner use Python and R scripts rather than Java.

Orange offers tools that seem to be aimed primarily at users who need less custom integration into their own software but want a much easier interactive experience. It is written in Python, its source code is available, and user add-ons are supported.

KNIME is another commercially backed platform offering an end-to-end solution for data mining and analysis, providing all the tools required. It has various good reviews around the internet, but I haven't used it enough to advise you or anyone else on its pros and cons.

See here for a comparison of KNIME vs. Weka.

Best data mining tools

As I said, Weka is my personal favorite as a software developer, but I'm sure other people have their own reasons and opinions for choosing one over another. I hope you find the right solution.

Also, regarding your requirements, Weka supports all of the following:

  • Naive Bayes
  • SVM
  • C4.5
  • kNN
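For the benchmark part of the question, a minimal sketch (assuming Weka 3.8 and `weka.jar` on the classpath) can run all four algorithms on your own dataset with k-fold cross-validation. In Weka these correspond to `NaiveBayes`, `SMO` (an SVM trained with sequential minimal optimization), `J48` (C4.5) and `IBk` (kNN). The class name `WekaClassifierComparison`, the default file name `iris.arff`, k = 3 for `IBk` and the fixed random seed are all illustrative assumptions; every classifier otherwise uses its default options.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.SMO;
import weka.classifiers.lazy.IBk;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaClassifierComparison {

    // Returns the percentage of correctly classified instances under k-fold cross-validation.
    // crossValidateModel works on copies, so the classifier and data passed in are not modified.
    static double cvAccuracy(Classifier classifier, Instances data, int folds, long seed) throws Exception {
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(classifier, data, folds, new Random(seed));
        return eval.pctCorrect();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path; pass your own ARFF (or CSV) file as the first argument.
        String path = args.length > 0 ? args[0] : "iris.arff";
        Instances data = DataSource.read(path);
        data.setClassIndex(data.numAttributes() - 1); // assumes the class is the last attribute

        Classifier[] classifiers = {
            new NaiveBayes(), // Naive Bayes
            new SMO(),        // support vector machine
            new J48(),        // C4.5 decision tree
            new IBk(3)        // k-nearest neighbours with k = 3 (arbitrary choice)
        };
        for (Classifier c : classifiers) {
            double acc = cvAccuracy(c, data, 10, 1L);
            System.out.printf("%-12s %.2f%%%n", c.getClass().getSimpleName(), acc);
        }
    }
}
```

It could be run, for example, as `java -cp weka.jar:. WekaClassifierComparison mydata.arff`. Accuracy on one dataset with default settings is only a rough first comparison, so it is worth repeating with your own data and tuned parameters.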