GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) while connecting Polybase with Kerberos

Gigi · Jun 22, 2018

We want to connect SQL Server 2016 Enterprise via PolyBase to our Kerberized on-premises Hadoop cluster running Cloudera 5.14.

I followed the Microsoft PolyBase guide to configure PolyBase. After working on this for a few days, I'm stuck on the following exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Microsoft provides a built-in diagnostic tool for troubleshooting PolyBase connectivity with Kerberos. The troubleshooting guide from Microsoft describes 4 checkpoints, and I'm stuck on checkpoint 4. A short summary of the checkpoints and where I succeed:

  • Checkpoint 1: Successful! Authenticated against the KDC and received a TGT.
  • Checkpoint 2: Successful! According to the troubleshooting guide, PolyBase attempts to access HDFS and fails because the request does not contain the necessary Service Ticket.
  • Checkpoint 3: Successful! A second hex dump indicates that SQL Server successfully used the TGT and acquired the applicable Service Ticket for the name node's SPN from the KDC.
  • Checkpoint 4: Not successful. At this checkpoint SQL Server should be authenticated by Hadoop using the ST (Service Ticket), and a session should be granted to access the secured resource.

krb5.conf file

[libdefaults]
default_realm = COMPANY.REALM.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
admin_server = ipadress.kdc.host
}
[logging]
default = FILE:/var/log/krb5/kdc.log
kdc = FILE:/var/log/krb5/kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log
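
Because the NameNode error further down complains about permitted_enctypes, it can help to check mechanically what a given copy of krb5.conf actually permits. The following is only a minimal sketch: the helper names `read_libdefaults` and `permits` are made up for illustration, and the parser handles just the simple `key = value` layout shown above, not every krb5.conf feature (includes, nested blocks and so on).

```python
import re

def read_libdefaults(text):
    """Return the key/value pairs found in the [libdefaults] section."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        header = re.fullmatch(r"\[(\w+)\]", line)
        if header:
            section = header.group(1)
            continue
        if section == "libdefaults" and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

def permits(text, enctype):
    """True if enctype is listed in permitted_enctypes.

    Also returns False when the key is absent, although a real Kerberos
    library would fall back to its built-in defaults in that case.
    """
    value = read_libdefaults(text).get("permitted_enctypes", "")
    return enctype in re.split(r"[\s,]+", value.strip())

# Shortened copy of the krb5.conf shown above.
SAMPLE = """
[libdefaults]
default_realm = COMPANY.REALM.COM
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
}
"""

print(permits(SAMPLE, "aes128-cts-hmac-sha1-96"))
```

Running the same check against the file as it exists on each cluster node, rather than against the copy kept in the management UI, shows what the node itself would read.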

core-site.xml for Polybase on SQL-Server

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>ipc.client.connect.max.retries</name>
    <value>2</value>
  </property>
  <property>
    <name>ipc.client.connect.max.retries.on.timeouts</name>
    <value>2</value>
  </property>

<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
    <name>polybase.kerberos.realm</name>
    <value>COMPANY.REALM.COM</value>
  </property>
  <property>
    <name>polybase.kerberos.kdchost</name>
    <value>ipadress.kdc.host</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>KERBEROS</value>
  </property>
</configuration>

hdfs-site.xml for Polybase on SQL-Server

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.block.size</name>
    <value>268435456</value> 
  </property>
  <!-- Client side file system caching is disabled below for credential refresh and 
       setting the below cache disabled options to true might result in 
       stale credentials when an alter credential or alter datasource is performed
  -->
  <property>
    <name>fs.wasb.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.wasbs.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.asv.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.asvs.impl.disable.cache</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
  </property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/[email protected]</value> 
  </property>
</configuration>

Polybase Exception

[2018-06-22 12:51:50,349] WARN  2872[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:53,568] WARN  6091[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:56,127] WARN  8650[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:58,998] WARN 11521[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:59,139] WARN 11662[main] - org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:676) - Couldn't setup connection for [email protected] to IPADRESS_OF_NAMENODE:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

Log Entry on NameNode

Socket Reader #1 for port 8020: readAndProcess from client IP-ADRESS_SQL-SERVER threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]

Auth failed for IP-ADRESS_SQL-SERVER:60484:null (GSS initiate failed) with true cause: (GSS initiate failed)
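
The client and the NameNode report different "Mechanism level" reasons for the same failed handshake, and the server-side one is the more specific. As a rough sketch (the function name `mechanism_causes` is hypothetical, and the regular expression assumes the reason itself contains no closing parenthesis, which holds for the two messages above), the reasons can be pulled out of both logs for a side-by-side comparison:

```python
import re

# Captures the text between "Mechanism level: " and the next ")".
MECHANISM = re.compile(r"Mechanism level: ([^)]*)\)")

def mechanism_causes(log_text):
    """Return every GSS mechanism-level reason found in the log text."""
    return MECHANISM.findall(log_text)

client_log = (
    "javax.security.sasl.SaslException: GSS initiate failed "
    "[Caused by GSSException: No valid credentials provided "
    "(Mechanism level: Failed to find any Kerberos tgt)]"
)
namenode_log = (
    "[javax.security.sasl.SaslException: GSS initiate failed "
    "[Caused by GSSException: Failure unspecified at GSS-API level "
    "(Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type "
    "not in permitted_enctypes list)]]"
)

for name, text in (("client", client_log), ("namenode", namenode_log)):
    print(name, mechanism_causes(text))
```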

What confuses me is the log entry from our NameNode: AES128 CTS mode with HMAC SHA1-96 is already in the list of permitted enctypes, both in krb5.conf (shown above) and in the Cloudera Manager UI.

[Screenshot: Cloudera Manager UI, krb_enc_types]

We appreciate your help!

Answer

Gigi · Jul 2, 2018

The problem resolved itself after we restarted the cluster. I think the cause was that the krb5.conf file could not be distributed to all nodes of our Hadoop cluster because some services were still running. Cloudera Manager also showed a warning about a stale configuration regarding Kerberos. Many thanks to everyone!
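
One way to spot this kind of stale distribution before restarting anything is to compare the krb5.conf contents across nodes. The sketch below is only an illustration under stated assumptions: the file contents have already been collected from each node by some other means, the host names are placeholders, and `group_by_digest` and `stale_hosts` are hypothetical helpers, not part of Cloudera Manager or any Kerberos tooling.

```python
import hashlib
from collections import defaultdict

def group_by_digest(configs):
    """Map SHA-256 digest -> sorted hosts; configs maps host -> file text."""
    groups = defaultdict(list)
    for host, text in configs.items():
        # Normalise line endings and trailing whitespace so that only
        # real content differences change the digest.
        normalized = "\n".join(line.rstrip() for line in text.splitlines())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(host)
    return {digest: sorted(hosts) for digest, hosts in groups.items()}

def stale_hosts(configs):
    """Hosts whose file differs from the most common version.

    On a tie, the version seen first counts as the reference.
    """
    groups = group_by_digest(configs)
    if len(groups) <= 1:
        return []
    majority = max(groups.values(), key=len)
    return sorted(
        host
        for hosts in groups.values()
        if hosts is not majority
        for host in hosts
    )

# Placeholder hosts and contents, for illustration only.
new_conf = "[libdefaults]\npermitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96\n"
old_conf = "[libdefaults]\npermitted_enctypes = aes256-cts-hmac-sha1-96\n"
configs = {"node1": new_conf, "node2": new_conf, "namenode": old_conf}

print(stale_hosts(configs))
```

A non-empty result would point at nodes that still hold an older file, which matches the stale-configuration warning described above.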