IPython (Jupyter) integration with Spark in Cloudera

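1. Download the Anaconda parcel and its manifest.json file from https://repo.continuum.io/pkgs/misc/parcels/ into a local directory.
2. From that directory, start a simple HTTP server by running “python -m SimpleHTTPServer 8800” (Python 2). On Python 3, the equivalent command is “python3 -m http.server 8800”.
3. In Cloudera Manager, open the Parcels page, add the server from the previous step (http://<hostname>:8800/) to the Remote Parcel Repository URLs, and click “Check for New Parcels”.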

4. Click Download, then click Distribute to copy the downloaded parcel to all the nodes.

5. Activate the Anaconda parcel.

6. Go to the Spark service, click the Configuration tab, and search for “Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh”. Add the following variables to that property:

========== On the driver host ==========
export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda/bin/jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"

========== On the driver and executor hosts ==========

export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

7. Click Save Changes to commit the changes.
8. Restart the Spark service.
9. Deploy the client configuration.
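Before launching anything, it can help to confirm that the deployed client configuration really exports the variables from step 6. The following is a minimal bash sketch, not a Cloudera tool: the function name check_pyspark_env is hypothetical, and the default path /etc/spark/conf/spark-env.sh is an assumption about where the deployed Spark client configuration lives on a host; pass another path if yours differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Cloudera tool): report the PySpark interpreter
# settings exported by a spark-env.sh file and whether each binary exists.
# Usage: check_pyspark_env [path-to-spark-env.sh]
# Assumption: the deployed client config is at /etc/spark/conf/spark-env.sh.
check_pyspark_env() {
  local env_file="${1:-/etc/spark/conf/spark-env.sh}"
  if [ ! -r "$env_file" ]; then
    echo "cannot read $env_file" >&2
    return 1
  fi
  # Run in a subshell so sourcing the file does not change the caller's shell.
  (
    # Clear values inherited from the caller so only the file's exports count.
    unset PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON
    # shellcheck source=/dev/null
    . "$env_file"
    status=0
    for var in PYSPARK_PYTHON PYSPARK_DRIVER_PYTHON; do
      # Bash indirect expansion: the value of the variable whose name is in $var.
      value="${!var:-}"
      if [ -z "$value" ]; then
        echo "$var is not set"
        status=1
      elif [ -x "$value" ]; then
        echo "$var=$value (ok)"
      else
        echo "$var=$value (not executable)"
        status=1
      fi
    done
    exit "$status"
  )
}
```

Run on a host after step 9, check_pyspark_env should print both variables marked “(ok)” if the Anaconda parcel is active there and the snippet was deployed; a non-zero exit status points at the variable that needs attention.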


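Run pyspark on the driver host. Because PYSPARK_DRIVER_PYTHON points to jupyter, this starts a Jupyter notebook server instead of the interactive PySpark shell. Then open http://<hostname>:8880 in a browser to access the IPython web console.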
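The notebook server can take a few seconds to come up, so a short wait loop can confirm that port 8880 is open before switching to the browser. This is a minimal bash sketch under stated assumptions: wait_for_port is a hypothetical helper, it uses bash's built-in /dev/tcp redirection rather than any Cloudera or Jupyter tool, and a successful TCP connection only shows that something is listening on the port, not that it is the notebook. Against a firewalled host, each attempt may block until the TCP connect times out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a TCP connection to host:port, once per second,
# until it succeeds or the attempt limit is reached.
# Relies on bash's /dev/tcp redirection, so curl or nc is not required.
# Usage: wait_for_port HOST PORT [ATTEMPTS]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # The redirection succeeds only if the connection is accepted; the
    # subshell closes the socket again as soon as it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "${host}:${port} is accepting connections"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep 1
    fi
  done
  echo "${host}:${port} did not accept connections after ${attempts} attempts" >&2
  return 1
}
```

For example, “wait_for_port <hostname> 8880 60” waits up to about a minute for the notebook port on the driver host before giving up.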
