We want to connect our SQL Server 2016 Enterprise via Polybase with our Kerberized OnPrem Hadoop-Cluster with Cloudera 5.14.
I followed the Microsoft PolyBase Guide to configure Polybase. After working few days on this topic I'm not able to continue because of an exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Microsoft has an built in diagnostic tool for troubleshooting the connectivity with PolyBase and Kerberos. On this troubleshooting guide from Microsoft there are 4 checkpoints and I'm stuck on checkpoint 4. Short information about the checkpoints (where I'm successfull):
[libdefaults]
default_realm = COMPANY.REALM.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
admin_server = ipadress.kdc.host
}
[logging]
default = FILE:/var/log/krb5/kdc.log
kdc = FILE:/var/log/krb5/kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>2</value>
</property>
<property>
<name>ipc.client.connect.max.retries.on.timeouts</name>
<value>2</value>
</property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
<name>polybase.kerberos.realm</name>
<value>COMPANY.REALM.COM</value>
</property>
<property>
<name>polybase.kerberos.kdchost</name>
<value>ipadress.kdc.host</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>KERBEROS</value>
</property>
</configuration>
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
<!-- Client side file system caching is disabled below for credential refresh and
settting the below cache disabled options to true might result in
stale credentials when an alter credential or alter datasource is performed
-->
<property>
<name>fs.wasb.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.wasbs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.asv.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.asvs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/[email protected]</value>
</property>
</configuration>
[2018-06-22 12:51:50,349] WARN 2872[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:53,568] WARN 6091[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:56,127] WARN 8650[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:58,998] WARN 11521[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:59,139] WARN 11662[main] - org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:676) - Couldn't setup connection for [email protected] to IPADRESS_OF_NAMENODE:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Socket Reader #1 for port 8020: readAndProcess from client IP-ADRESS_SQL-SERVER threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]
Auth failed for IP-ADRESS_SQL-SERVER:60484:null (GSS initiate failed) with true cause: (GSS initiate failed)
The confusing part for me is the log entry from our NameNode because AES128 CTS mode with HMAC SHA1-96 is already in the list of permitted enctypes as shown in krb5.conf and in Cloudera Manager UI
We appreciate your help!
The problem has itself taken care after we restarted the cluster. I think the problem was that the krb5.conf file in our Hadoop-Cluster could not be distributed on all nodes because of some running services. There was also a warning in the Cloudera Manager about a stale configuration regarding Kerberos. Many thanks to everyone!