Deploying PySpark on a CDH Cluster

kamisamak, published 2020-06-05, 1686 views


https://docs.cloudera.com/documentation/enterprise/latest/topics/spark_python.html
The Python environment is 3.7.2, installed by deploying the Anaconda-5.3.1-el7.parcel.

 

Configure Spark's Python environment in CM (Cloudera Manager), then restart the affected services:


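# Point both executors and the driver at the Anaconda interpreter,
# unless PYSPARK_PYTHON has already been set elsewhere.
if [ -z "${PYSPARK_PYTHON}" ]; then
  export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda-5.3.1/bin/python
  export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda-5.3.1/bin/python
fi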

Test the setup with the pyspark command.
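One possible way to run that test is sketched below. It is not from the original post: the helper names `interpreter_info` and `check_cluster_python` are hypothetical, and the sketch assumes it is pasted into the pyspark shell, where the SparkContext `sc` is already defined. It reports which Python interpreter the driver and the executors actually use, so the result can be compared with the Anaconda path configured above.

```python
import platform
import sys


def interpreter_info(_=None):
    """Return (executable path, Python version) of the interpreter running this call.

    The unused argument lets the function be passed directly to RDD.map().
    """
    return (sys.executable, platform.python_version())


def check_cluster_python(sc, num_tasks=8):
    """Return the distinct interpreters seen by executor tasks.

    `sc` is assumed to be an active SparkContext (predefined in the pyspark shell);
    `num_tasks` is an arbitrary number of small tasks to spread across executors.
    """
    return (
        sc.parallelize(range(num_tasks), num_tasks)
        .map(interpreter_info)
        .distinct()
        .collect()
    )


# Inside the pyspark shell:
#   interpreter_info()         # interpreter used by the driver
#   check_cluster_python(sc)   # interpreters used by the executors
```

If the configuration took effect, both calls should report a path under /opt/cloudera/parcels/Anaconda-5.3.1/; more than one distinct executor entry would suggest that some nodes are picking up a different interpreter.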