In my server application I'm connecting to a Kerberos-secured Hadoop cluster from my Java application. I'm using various components like the HDFS file system, Oozie, Hive, etc. On application startup I call UserGroupInformation.loginUserFromKeytabAndReturnUGI( ... ). This returns me a UserGroupInformation instance, and I keep it for the application lifetime. When doing privileged actions I launch them with ugi.doAs(action).
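My setup looks roughly like this (simplified; the principal name, keytab path and HDFS path are just made-up examples):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// One-time login at application startup; the UGI instance is kept for the
// whole application lifetime.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
    "myservice@EXAMPLE.COM", "/etc/security/keytabs/myservice.keytab");

// Later, privileged actions are executed through doAs.
ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
  FileSystem fs = FileSystem.get(conf);
  fs.listStatus(new Path("/tmp"));
  return null;
});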
This works fine, but I wonder if and when I should renew the Kerberos ticket in the UserGroupInformation? I found a method UserGroupInformation.checkTGTAndReloginFromKeytab() which seems to do the ticket renewal whenever it's close to expiry. I also found that this method is called by various Hadoop tools, like WebHdfsFileSystem for example.
Now if I want my server application (possibly running for months or even years) to never experience ticket expiry, what is the best approach? To provide concrete questions:

Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?
Should I ever call checkTGTAndReloginFromKeytab myself in my code?
If so, should I do that before every single call to ugi.doAs(...) or rather set up a timer and call it periodically (and how often)?

Hadoop committer here! This is an excellent question.
Unfortunately, it's difficult to give a definitive answer to this without a deep dive into the particular usage patterns of the application. Instead, I can offer general guidelines and describe when Hadoop would handle ticket renewal or re-login from a keytab automatically for you, and when it wouldn't.
The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Most of the daemon processes in the Hadoop ecosystem handle this by doing a single one-time call to UserGroupInformation#loginUserFromKeytab at process startup. Examples of this include the HDFS DataNode, which must authenticate its RPC calls to the NameNode, and the YARN NodeManager, which must authenticate its calls to the ResourceManager. How is it that daemons like the DataNode can do a one-time login at process startup and then keep on running for months, long past typical ticket expiration times?
Since this is such a common use case, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:
// try re-login
if (UserGroupInformation.isLoginKeytabBased()) {
  UserGroupInformation.getLoginUser().reloginFromKeytab();
} else if (UserGroupInformation.isLoginTicketBased()) {
  UserGroupInformation.getLoginUser().reloginFromTicketCache();
}
You can think of this as "lazy evaluation" of re-login. It only re-executes login in response to an authentication failure on an attempted RPC connection.
Knowing this, we can give a partial answer. If your application's usage pattern is to login from a keytab and then perform typical Hadoop RPC calls, then you likely do not need to roll your own re-login code. The RPC client layer will do it for you. "Typical Hadoop RPC" means the vast majority of Java APIs for interacting with Hadoop, including the HDFS FileSystem API, the YarnClient and MapReduce Job submissions.
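For example, something like the following sketch (reusing the ugi and conf from your startup code; the path and the process call are placeholders) would keep working across ticket expirations without any explicit re-login, because the FileSystem calls go through Hadoop RPC:

// No manual re-login here: if the ticket has expired, the RPC client catches the
// SASL failure and re-logs in from the keytab before retrying the connection.
ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
  FileSystem fs = FileSystem.get(conf);
  for (FileStatus status : fs.listStatus(new Path("/data/incoming"))) {
    process(status);  // placeholder for your application logic
  }
  return null;
});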
However, some application usage patterns do not involve Hadoop RPC at all. An example of this would be applications that interact solely with Hadoop's REST APIs, such as WebHDFS or the YARN REST APIs. In that case, the authentication model uses Kerberos via SPNEGO as described in the Hadoop HTTP Authentication documentation.
Knowing this, we can add more to our answer. If your application's usage pattern does not utilize Hadoop RPC at all, and instead sticks solely to the REST APIs, then you must roll your own re-login logic. This is exactly why WebHdfsFileSystem calls UserGroupInformation#checkTGTAndReloginFromKeytab, just like you noticed. WebHdfsFileSystem chooses to make the call right before every operation. This is a fine strategy, because UserGroupInformation#checkTGTAndReloginFromKeytab only renews the ticket if it's "close" to expiration. Otherwise, the call is a no-op.
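If your application does stick to the REST APIs, the same pattern is easy to apply in your own code; a sketch (the REST call itself is just a placeholder):

// checkTGTAndReloginFromKeytab() is a no-op unless the ticket is close to expiring,
// so it is cheap to call right before each authenticated operation.
ugi.checkTGTAndReloginFromKeytab();
ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
  callWebHdfsOverHttp();  // placeholder for your SPNEGO-authenticated REST call
  return null;
});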
As a final use case, let's consider an interactive process, not logging in from a keytab, but rather requiring the user to run kinit externally before launching the application. In the vast majority of cases, these are going to be short-running applications, such as Hadoop CLI commands. However, in some cases these can be longer-running processes. To support longer-running processes, Hadoop starts a background thread to renew the Kerberos ticket "close" to expiration. This logic is visible in UserGroupInformation#spawnAutoRenewalThreadForUserCreds. There is an important distinction here though compared to the automatic re-login logic provided in the RPC layer. In this case, Hadoop only has the capability to renew the ticket and extend its lifetime. Tickets have a maximum renewable lifetime, as dictated by the Kerberos infrastructure. After that, the ticket won't be usable anymore. Re-login in this case is practically impossible, because it would imply re-prompting the user for a password, and they likely walked away from the terminal. This means that if the process keeps running beyond expiration of the ticket, it won't be able to authenticate anymore.
Again, we can use this information to inform our overall answer. If you rely on a user to login interactively via kinit before launching the application, and if you're confident the application won't run longer than the Kerberos ticket's maximum renewable lifetime, then you can rely on Hadoop internals to cover periodic renewal for you.
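For completeness, a sketch of that interactive case (assuming the user has already run kinit in the same environment):

// Picks up the credentials from the Kerberos ticket cache created by kinit.
// For ticket-cache-based logins, Hadoop spawns the renewal thread itself, but it
// can only renew up to the ticket's maximum renewable lifetime.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation ugi = UserGroupInformation.getLoginUser();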
If you're using keytab-based login, and you're just not sure if your application's usage pattern can rely on the Hadoop RPC layer's automatic re-login, then the conservative approach is to roll your own. @SamsonScharfrichter gave an excellent answer here about rolling your own: HBase Kerberos connection renewal strategy.
Finally, I should add a note about API stability. The Apache Hadoop Compatibility guidelines discuss the Hadoop development community's commitment to backwards-compatibility in full detail. The interface of UserGroupInformation is annotated LimitedPrivate and Evolving. Technically, this means the API of UserGroupInformation is not considered public, and it could evolve in backwards-incompatible ways. As a practical matter, there is a lot of code already depending on the interface of UserGroupInformation, so it's simply not feasible for us to make a breaking change. Certainly within the current 2.x release line, I would not have any fear about method signatures changing out from under you and breaking your code.
Now that we have all of this background information, let's revisit your concrete questions.
Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?
You can rely on this if your application's usage pattern is to call the Hadoop clients, which in turn utilize Hadoop's RPC framework. You cannot rely on this if your application's usage pattern only calls the Hadoop REST APIs.
Should I ever call checkTGTAndReloginFromKeytab myself in my code?
You'll likely need to do this if your application's usage pattern is solely to call the Hadoop REST APIs instead of Hadoop RPC calls. You would not get the benefit of the automatic re-login implemented inside Hadoop's RPC client.
If so, should I do that before every single call to ugi.doAs(...) or rather set up a timer and call it periodically (and how often)?
It's fine to call UserGroupInformation#checkTGTAndReloginFromKeytab right before every action that needs to be authenticated. If the ticket is not close to expiration, then the method will be a no-op. If you're suspicious that your Kerberos infrastructure is sluggish, and you don't want client operations to pay the latency cost of re-login, then that would be a reason to do it in a separate background thread. Just be sure to stay a little bit ahead of the ticket's actual expiration time. You might borrow the logic inside UserGroupInformation for determining if a ticket is "close" to expiration. In practice, I've never personally seen the latency of re-login be problematic.
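If you do decide to take the check off the request path, a minimal sketch of the background-thread variant (the one-minute period is an arbitrary example, not a Hadoop recommendation; it just needs to stay comfortably ahead of the ticket lifetime):

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> {
  try {
    // No-op unless the ticket is near expiry, so frequent checks are cheap.
    ugi.checkTGTAndReloginFromKeytab();
  } catch (IOException e) {
    // Log and try again on the next tick; a transient KDC hiccup should not
    // kill the renewal thread.
  }
}, 1, 1, TimeUnit.MINUTES);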