Cassandra Client Java APIs

java cassandra hector astyanax pelops

arsenal · Apr 13, 2013 · Viewed 26.2k times · Source

I recently started working with Cassandra and am now evaluating which Java client we should go forward with.

I have seen various posts on Stack Overflow about which Cassandra client to use, but none has a definitive answer.

My team has asked me to research this and come up with pros and cons for each of the Cassandra client APIs in Java.

As I mentioned, I only recently got involved with Cassandra, so I don't have much idea why some people choose the Pelops client while others go with Astyanax or some other client.

I know the basics of each of these clients, by which I mean I am able to get each one working and read from and write to a Cassandra database.

Below is the information I have so far.

CASSANDRA APIS

Hector (Production-Ready)
The most stable of the Java APIs, ready for prime-time.
Astyanax (The Up and Comer)
A clean Java API from Netflix. It isn't as widely used as Hector, but it is solid.
Kundera (The NoSQL ORM)
JPA compliant, this is handy when you want to interact with Cassandra via objects.
This constrains you somewhat in that you won't be able to have a dynamic number of columns/names, etc. But it does allow you to port over ORMs, or centralize storage onto Cassandra for more traditional uses.
Pelops
I've only used Pelops briefly. It was a straightforward API, but it didn't seem to have the momentum behind it.
PlayORM (ORM without the constraints?)
I just heard about this. It looks like it is trying to solve the impedance mismatch between traditional JPA-based ORMs and NoSQL by introducing JQL. It looks promising.
Thrift (Avoid Me!)
This is the "low-level" API.

Below are our priorities in choosing a Cassandra client:

Top priorities are low latency overhead, an async API, and reliability/stability in a production environment.
(A more user-friendly API matters less, since that can be provided by the DAL that wraps the client.)
Connection pooling and partition awareness are other good features to have.
The ability to detect new nodes as they are added.
Good support as well (as pointed out by dean below).
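
To illustrate the point about the DAL wrapping the client, here is a minimal sketch of what I have in mind. All names (`UserStore`, `InMemoryUserStore`) are hypothetical and do not come from any of the clients listed above; the in-memory class only stands in for a real client so the sketch runs without a cluster.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAL interface: application code depends on this,
// not on Hector, Astyanax, or any other specific client.
interface UserStore {
    CompletableFuture<Void> save(String userId, String email);
    CompletableFuture<Optional<String>> findEmail(String userId);
}

// In-memory stand-in (an assumption for illustration only).
// A real implementation would delegate to whichever client is chosen.
final class InMemoryUserStore implements UserStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<Void> save(String userId, String email) {
        rows.put(userId, email);
        return CompletableFuture.completedFuture(null);
    }

    @Override
    public CompletableFuture<Optional<String>> findEmail(String userId) {
        // Optional.empty() when the row does not exist.
        return CompletableFuture.completedFuture(Optional.ofNullable(rows.get(userId)));
    }
}
```

With an interface like this, swapping one client for another should only mean writing a new implementation of `UserStore`, which is why the friendliness of the client's own API is lower on my list.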

Can anyone share some thoughts on this? Pros and cons for each Cassandra client, and which client can fulfill my requirements, would be of great help.

Based on my research so far, I believe I will mainly be choosing between the Astyanax client and the new DataStax client that uses the binary protocol. But I don't have solid information to back up my research and present it to my team.

Any comparison between the Astyanax client and the new DataStax client (which uses the new binary protocol) would be of great help.

It will help my research, and I will learn a lot from people who have used different clients in the past.

Answer

Thrift is becoming more of a legacy API:

First, you should be aware that the Thrift API is not going to be getting new features; it's there for backwards compatibility, and not recommended for new projects.
- the paul

So I'd avoid Thrift-based APIs (Thrift is only kept for backwards compatibility).

That said, if you do need to use a Thrift-based API, I'd go for Astyanax. It is very easy to use compared to other Thrift APIs, though in my personal experience DataStax's driver is even easier.

So you should have a look at DataStax's API (and its GitHub repo). I'm not sure if there are any compiled versions of the API for download, but you can easily build it with Maven. Also, if you take a look at the GitHub repo's commit log, you'll see it undergoes very frequent updates.

The driver works exclusively with CQL3 and is asynchronous, but be warned that Cassandra 1.2 is the earliest supported version.

Performance
Astyanax is Thrift-based and DataStax's driver uses the binary protocol. Here are the latest benchmarks I could find between Thrift and CQL (note these are definitely out of date). In fairness, the small difference in performance shown in these benchmarks will rarely matter.

Asynch support
DataStax's async support is a definite advantage over Astyanax (Netflix tried implementing it but decided not to).
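
To show why this matters, here is a rough sketch of what you end up doing when a client only offers blocking calls: you push each call onto a thread pool yourself. `BlockingClient` and `AsyncAdapter` are hypothetical names I made up for illustration; they are not part of Astyanax or the DataStax driver.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical blocking call, standing in for any synchronous client method.
interface BlockingClient {
    String read(String key);
}

// Wraps the blocking call in a future by running it on a caller-supplied pool.
final class AsyncAdapter {
    private final BlockingClient client;
    private final ExecutorService pool;

    AsyncAdapter(BlockingClient client, ExecutorService pool) {
        this.client = client;
        this.pool = pool;
    }

    CompletableFuture<String> readAsync(String key) {
        // Each in-flight request occupies one pool thread until the read returns.
        return CompletableFuture.supplyAsync(() -> client.read(key), pool);
    }
}
```

This works, but concurrency is capped by the pool size, since every pending request holds a thread. A driver that is asynchronous by design hands you the future directly, so you don't have to maintain an adapter like this.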

Documentation
I can't really argue against Netflix's wiki. The documentation is excellent and it's updated fairly frequently. Their wiki includes code examples, and you can find tests in the source code if you need to see the code at work. I struggled to find any documentation for the DataStax driver; however, tests are provided in the GitHub repository, so that is a starting point.

Also have a look at this answer (well... not my one anyway). It looks into some advantages and disadvantages of Thrift and CQL.
