Pyhive, SASL and Python 3.5

Thomas Bury · Jun 13, 2017

I tried to set up a Hive connection as described in "How to Access Hive via Python?", using hive.Connection with Python 3.5.2 (installed on a Cloudera Linux BDA), but the SASL package seems to cause a problem. I saw on a forum that SASL is compatible only with Python 2.7. Is that right? What did I miss or do wrong?

from pyhive import hive
conn = hive.Connection(host="myserver", port=10000)
import pandas as pd

Error message

TTransportException Traceback (most recent call last)
in ()
1 from pyhive import hive
2 #conn = hive.Connection(host="myserver", port=10000)
----> 3 conn = hive.Connection(host="myserver")
4 import pandas as pd

/opt/anaconda3/lib/python3.5/site-packages/pyhive/hive.py in __init__(self, host, port, username, database, auth, configuration)
102
103 try:
--> 104 self._transport.open()
105 open_session_req = ttypes.TOpenSessionReq(
106 client_protocol=protocol_version,

/opt/anaconda3/lib/python3.5/site-packages/thrift_sasl/__init__.py in open(self)
70 if not ret:
71 raise TTransportException(type=TTransportException.NOT_OPEN,
---> 72 message=("Could not start SASL: %s" % self.sasl.getError()))
73
74 # Send initial response

TTransportException: TTransportException(message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found'", type=1)

Answer

Thomas Bury · Jun 20, 2017

We (or rather, the IT team) found a solution.

We upgraded the Python packages thrift (to version 0.10.0) and PyHive (to version 0.3.0). I don't know why the versions we were using weren't the latest.
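To check whether an environment already meets those versions, something like the sketch below might help. The helper names (version_tuple, installed_version, check_versions) are made up for illustration and are not part of PyHive or thrift. It also assumes importlib.metadata is available (Python 3.8+); on older interpreters the lookup simply returns None.

```python
# Hypothetical helpers for illustration; not part of PyHive or thrift.

def version_tuple(version):
    """Turn '0.10.0' into (0, 10, 0) so versions compare numerically.

    Non-numeric parts (e.g. 'rc1') are skipped, which is a simplification.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def installed_version(package):
    """Return the installed version string, or None if it cannot be found."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python < 3.8
        return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Minimum versions taken from the answer above.
REQUIRED = {"thrift": "0.10.0", "PyHive": "0.3.0"}


def check_versions(required=REQUIRED, lookup=installed_version):
    """Map each package to (installed version or None, meets minimum?)."""
    report = {}
    for package, minimum in required.items():
        found = lookup(package)
        ok = found is not None and version_tuple(found) >= version_tuple(minimum)
        report[package] = (found, ok)
    return report
```

Calling check_versions() and printing the result shows, per package, the installed version and whether it meets the minimum. Comparing tuples rather than strings matters here, because "0.10.0" sorts before "0.9.3" as plain text.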

Added the following:

<property>
<name>hive.server2.authentication</name>
<value>NOSASL</value>
</property>

This goes into the following Hive config parameters in Cloudera Manager:

HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml
Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml

This was necessary so that HUE would work.
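If you want to confirm what a given hive-site.xml actually sets before picking the auth argument on the client, a small sketch like this could do it. The function name read_hive_property and the inline SAMPLE text are assumptions for illustration; in practice you would read the real file from wherever your cluster keeps it.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name):
    """Return the <value> of the <property> whose <name> matches, else None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Illustrative stand-in for the contents of a hive-site.xml file.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>NOSASL</value>
  </property>
</configuration>"""

auth_mode = read_hive_property(SAMPLE, "hive.server2.authentication")
print(auth_mode)
```

The value found this way is the one the client's auth setting has to agree with.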

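import sys

import pandas as pd
from pyhive import hive

# auth must match hive.server2.authentication on the server (NOSASL here)
conn = hive.Connection(host="myserver", auth="NOSASL")

# Load the query result into a DataFrame
df = pd.read_sql("SELECT * FROM my_table", conn)
print(sys.getsizeof(df))  # size of the DataFrame in bytes
df.head()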

This worked without any problems or errors.
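One thing the snippet above leaves out is closing the connection. A possible pattern, going through the plain DB-API cursor, is sketched below. fetch_rows and sqlite_factory are made-up names, and sqlite3 is only a stand-in so the example runs without a Hive server. With Hive you would pass a factory that returns hive.Connection(host="myserver", auth="NOSASL") instead.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query):
    """Run a query through a DB-API connection factory, closing everything after."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query)
            columns = [col[0] for col in cursor.description]
            return columns, cursor.fetchall()


def sqlite_factory():
    """Stand-in for Hive: an in-memory table with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO my_table VALUES (1, 'a'), (2, 'b')")
    return conn


columns, rows = fetch_rows(sqlite_factory, "SELECT * FROM my_table")
print(columns, rows)
```

Taking a factory rather than an open connection keeps the open and close in one place, so the connection is released even if the query raises.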

Best, Tom