pyarrow error: toPandas attempted Arrow optimization

user5768866 · Aug 28, 2018 · Viewed 7.2k times

When I enable PyArrow on the Spark session and then run toPandas(), it throws this error:

"toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true. Please set it to false to disable this"

May I know why it happens?

Answer

Maneesh Bishnoi · Aug 22, 2019

The Arrow optimization is disabled by default, but in your case it has been enabled. To get rid of the error, disable the setting manually, either for the current Spark application session or permanently in the Spark configuration file.
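For the current session, a minimal sketch could look like the following. The helper name disable_arrow is hypothetical; spark.conf.set and spark.conf.get are the regular runtime-config calls on a SparkSession, and the key is the one quoted in the error message. On Spark 3.x the key was renamed to spark.sql.execution.arrow.pyspark.enabled, so adjust it if you are on a newer version.

```python
# Key taken from the error message (Spark 2.x naming).
ARROW_KEY = "spark.sql.execution.arrow.enabled"


def disable_arrow(spark):
    """Turn off the Arrow optimization for an existing session.

    `spark` is expected to be a SparkSession (anything exposing
    conf.set / conf.get works). Returns the value now stored.
    """
    spark.conf.set(ARROW_KEY, "false")
    return spark.conf.get(ARROW_KEY)


# Usage, assuming pyspark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   disable_arrow(spark)
#   pdf = df.toPandas()   # now uses the non-Arrow conversion path
```

This only affects the running session; a new session picks up whatever is in the configuration file again.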

If you want to disable this for all of your Spark sessions, add the line below to your Spark configuration at SPARK_HOME/conf/spark-defaults.conf:

spark.sql.execution.arrow.enabled=false

That said, I would suggest keeping PyArrow enabled if you use pandas in your Spark application, because it speeds up data conversion between Spark and pandas.
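If you want to keep the optimization but avoid the error on machines where it cannot work, one option is to enable it only when the pyarrow package can actually be found. This is a sketch under the assumption that a missing pyarrow installation is what triggers the failure, which the question does not confirm. The helper names pyarrow_available and configure_arrow are hypothetical.

```python
import importlib.util

# Key taken from the error message (Spark 2.x naming).
ARROW_KEY = "spark.sql.execution.arrow.enabled"


def pyarrow_available():
    """Return True if the pyarrow package can be located on this interpreter."""
    return importlib.util.find_spec("pyarrow") is not None


def configure_arrow(spark, available=None):
    """Enable Arrow on the session only if pyarrow is present.

    `available` can be passed explicitly (e.g. for testing); when it is
    None the check is done with pyarrow_available().
    Returns the string value that was written to the session config.
    """
    if available is None:
        available = pyarrow_available()
    value = "true" if available else "false"
    spark.conf.set(ARROW_KEY, value)
    return value


# Usage, assuming pyspark is installed:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   configure_arrow(spark)
#   pdf = df.toPandas()
```

Note that this only checks that the package exists on the driver; it does not verify that the installed pyarrow version is one your Spark release supports.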

For more on PyArrow please visit my blog.