HBase Kerberos connection renewal strategy

user1310957 · Oct 19, 2015

I recently enabled Kerberos on my cluster. Everything works great until my Kerberos login expires, after 12 hours, say. At that point any connections I have created, and any tables created from those connections, throw when I use them. Depending on how I handle this, it could crash my app.

I don't much mind the crash itself, because my app is managed by Slider, which will resurrect it if and when it goes down. However, the crash only happens when HBase is "used" (i.e. I call a method on a table with a now-stale connection). That will probably be triggered by a user interaction, which would lead to poor UX.

I don't want authentication implementation details to pervade my application. I also don't want to create connection objects more often than necessary, because it is a costly operation that makes a large number of RPC calls (the ZooKeeper metadata location lookup, to start with).

Is there a common strategy (preferably built into the HBase client) for managing Kerberos authentication expiry and renewing HBase connections/tables when that happens?

Answer

Samson Scharfrichter · Oct 20, 2015

A Kerberos TGT has a lifetime (e.g. 12h) and a renewable lifetime (e.g. 7 days). As long as the ticket is still valid and still renewable, you can request a "free" renewal (no password required), and the lifetime counter is reset (e.g. 12h to go, again).
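To make the two counters concrete, here is a toy model of the renewal rule in Java. The class and method names are made up for illustration. It assumes the usual Kerberos behaviour where a renewed ticket's end time is capped at the renew-till time. Treat it as a sketch of the arithmetic, not as anything the KDC or the JDK exposes.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy model of a renewable TGT; names are illustrative, not a real Kerberos API. */
final class TicketModel {
    final Duration lifetime;  // e.g. 12h
    final Instant renewTill;  // issue time + renewable lifetime, e.g. 7 days
    Instant endTime;          // current expiry of the ticket

    TicketModel(Instant issuedAt, Duration lifetime, Duration renewableLifetime) {
        this.lifetime = lifetime;
        this.renewTill = issuedAt.plus(renewableLifetime);
        this.endTime = min(issuedAt.plus(lifetime), renewTill);
    }

    /** Renewal only works while the ticket is still valid and still renewable. */
    boolean canRenew(Instant now) {
        return now.isBefore(endTime) && now.isBefore(renewTill);
    }

    /** Resets the lifetime counter, but never past the renew-till limit. */
    boolean renew(Instant now) {
        if (!canRenew(now)) {
            return false; // too late: a fresh login (password or keytab) is needed
        }
        endTime = min(now.plus(lifetime), renewTill);
        return true;
    }

    private static Instant min(Instant a, Instant b) {
        return a.isBefore(b) ? a : b;
    }
}
```

With a 12h lifetime, a renewal at hour 10 moves the expiry to hour 22, while a renewal attempted at hour 23 would fail because the ticket has already expired.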

The Hadoop authentication library spawns a specific Java thread for automatic renewal of the current TGT. It's kind of ugly, shelling out to a kinit -R command line instead of making a JAAS library call, but it works; see HADOOP-6656.

So, if you get Slider to create a renewable ticket on startup, and if you can bribe your SysAdmin to raise the default (cf. client conf) and the max (cf. KDC conf) renewable lifetime to, say, 30 days, then your app could run for 30 days straight with the initial TGT. A nice improvement.

~~~~~~~~~~

If you really crave eternity... sorry, but you will actually have some programming to do. That means a dedicated thread/process in charge of automagically re-creating the TGT.

  • The Java Way: on startup, before you connect to HBase/HDFS/whatever, explicitly create a UGI with loginUserFromKeytab(), then run checkTGTAndReloginFromKeytab() from time to time
  • The Shell Way: start a shell that (a) creates a TGT with kinit, (b) spawns a sub-process that periodically fires kinit again, and (c) launches your Java app, then kills the sub-process when/if your app ever terminates
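The Java Way could be sketched roughly as below. To keep the block compilable without Hadoop on the classpath, the actual Kerberos call is passed in as a lambda; PeriodicRelogin and ReloginAction are names I made up, and the principal, keytab path and one-hour period in the comment are placeholders. The UserGroupInformation methods named in the comment are the ones from org.apache.hadoop.security; as far as I know, checkTGTAndReloginFromKeytab() only does real work when the TGT is expired or close to expiry, so calling it often should be cheap.

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs a relogin action on a fixed schedule in a daemon thread (illustrative helper). */
final class PeriodicRelogin implements AutoCloseable {

    /** The Kerberos call to repeat; abstract so this sketch has no Hadoop dependency. */
    @FunctionalInterface
    interface ReloginAction {
        void run() throws IOException;
    }

    // Intended wiring with Hadoop available (principal and keytab are placeholders):
    //   UserGroupInformation.setConfiguration(conf);
    //   UserGroupInformation.loginUserFromKeytab("myapp@EXAMPLE.COM", "/path/to/myapp.keytab");
    //   new PeriodicRelogin(
    //       () -> UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(),
    //       1, TimeUnit.HOURS).start();

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "kerberos-relogin");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });
    private final ReloginAction action;
    private final long period;
    private final TimeUnit unit;

    PeriodicRelogin(ReloginAction action, long period, TimeUnit unit) {
        this.action = action;
        this.period = period;
        this.unit = unit;
    }

    /** Schedules the action; the first run happens after one full period. */
    void start() {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                action.run();
            } catch (IOException | RuntimeException e) {
                // Swallow and log: an uncaught exception would cancel all future runs.
                System.err.println("Kerberos relogin failed, will try again: " + e);
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Keeping the relogin in one background thread means the rest of the app keeps using its long-lived HBase connection and never touches authentication code directly.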

Caveat: if some other thread happens to open, or re-open, a connection while the TGT is being re-created, that connection may fail because the cache was empty at the exact time it was accessed ("race condition"). The next attempt will be successful, but expect a few rogue warnings in your logs.
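If those rogue failures bother you, one option is to retry the failing call once after a short pause. The helper below is only a sketch: RetryOnce and IoCall are hypothetical names, and it retries on any IOException because I am not assuming a specific exception type for the empty-cache case.

```java
import java.io.IOException;

/** Retries an I/O call a single time, for transient failures such as the TGT race. */
final class RetryOnce {

    /** A call that may fail with IOException (hypothetical interface for this sketch). */
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    static <T> T call(IoCall<T> op, long pauseMillis)
            throws IOException, InterruptedException {
        try {
            return op.call();
        } catch (IOException first) {
            // The ticket cache may have been empty mid-renewal; give it a moment.
            Thread.sleep(pauseMillis);
            try {
                return op.call();
            } catch (IOException second) {
                second.addSuppressed(first); // keep the first failure for diagnosis
                throw second;
            }
        }
    }
}
```

A call site would wrap whatever opens or re-opens the connection, e.g. RetryOnce.call(() -> openConnection(), 500), where openConnection stands in for your own code.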

~~~~~~~~~~

Final advice: you can use a private ticket cache for your app (i.e. you can run multiple apps on the same node with the same Linux account but different Kerberos principals) by setting the KRB5CCNAME environment variable, as long as it's a "FILE:" cache.
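The variable has to be in the environment before the app's JVM starts, so it is normally set by whatever launches the app. If that launcher happens to be Java, a minimal sketch could look like this; PrivateCacheLauncher is a made-up name, and the cache path and command in the usage note are placeholders.

```java
import java.util.List;

/** Builds a child process that sees its own Kerberos ticket cache (illustrative). */
final class PrivateCacheLauncher {

    /** Prepares, but does not start, the command with KRB5CCNAME set to a FILE: cache. */
    static ProcessBuilder withPrivateCache(String cacheFile, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Only the child's environment changes; the parent JVM is left untouched.
        pb.environment().put("KRB5CCNAME", "FILE:" + cacheFile);
        pb.inheritIO(); // let the child share this process's stdin/stdout/stderr
        return pb;
    }
}
```

For example, withPrivateCache("/tmp/krb5cc_myapp", Arrays.asList("java", "-jar", "myapp.jar")).start() would launch the app against that cache. Any kinit that feeds the app needs the same KRB5CCNAME value, otherwise it writes to the default cache instead.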