1. Create the kernel directory (pyspark is the kernel name):
mkdir -p ~/.ipython/kernels/pyspark
2. Create the kernel file:
touch ~/.ipython/kernels/pyspark/kernel.json
3. Add the following JSON to the file created above:
{
  "display_name": "pySpark (Spark 1.6.0)",
  "language": "python",
  "argv": [
    "/opt/anaconda2/bin/python2.7",
    "-m",
    "IPython.kernel",
    "-f",
    "{connection_file}"
  ],
  "env": {
    "SPARK_HOME": "/opt/cloudera/parcels/CDH/lib/spark/",
    "PYTHONPATH": "/opt/cloudera/parcels/CDH/lib/spark/python/:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip",
    "PYTHONSTARTUP": "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell"
  }
}
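Instead of typing the JSON by hand, you can generate it with a short Python script. This is a sketch that assumes the same CDH and Anaconda paths used in the article; adjust them for your own cluster.

```python
import json
import os

# Paths below mirror the article's CDH / Anaconda layout -- adjust as needed.
spark_home = "/opt/cloudera/parcels/CDH/lib/spark/"
kernel_dir = os.path.expanduser("~/.ipython/kernels/pyspark")

kernel_spec = {
    "display_name": "pySpark (Spark 1.6.0)",
    "language": "python",
    "argv": [
        "/opt/anaconda2/bin/python2.7",
        "-m", "IPython.kernel",
        "-f", "{connection_file}",
    ],
    "env": {
        "SPARK_HOME": spark_home,
        "PYTHONPATH": spark_home + "python/:"
                      + spark_home + "python/lib/py4j-0.9-src.zip",
        "PYTHONSTARTUP": spark_home + "python/pyspark/shell.py",
        "PYSPARK_SUBMIT_ARGS": "--master yarn --deploy-mode client pyspark-shell",
    },
}

# Create the kernel directory if it does not exist, then write kernel.json.
if not os.path.isdir(kernel_dir):
    os.makedirs(kernel_dir)
with open(os.path.join(kernel_dir, "kernel.json"), "w") as f:
    json.dump(kernel_spec, f, indent=2)
```

Writing the file this way also guarantees the JSON is syntactically valid, which a hand-edited file does not.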
4. Create a new IPython profile, then create a startup script inside it:
ipython profile create pyspark
touch ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py
5. Add the following code to the startup script:
import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.9-src.zip'))
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))
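The hard-coded py4j-0.9-src.zip path in the startup script breaks whenever a Spark upgrade bundles a different py4j version. A glob-based variant sidesteps that; this is a sketch, with a fallback SPARK_HOME assumed for illustration (the final execfile line from the script above would still follow it on Python 2).

```python
import glob
import os
import sys

# Fall back to the article's CDH path if SPARK_HOME is unset (illustrative only).
spark_home = os.environ.get('SPARK_HOME',
                            '/opt/cloudera/parcels/CDH/lib/spark/')

# Find whatever py4j zip ships with this Spark build instead of
# hard-coding the py4j version number.
py4j_zips = glob.glob(os.path.join(spark_home, 'python', 'lib',
                                   'py4j-*-src.zip'))

sys.path.insert(0, os.path.join(spark_home, 'python'))
if py4j_zips:
    sys.path.insert(0, py4j_zips[0])
```

With this change the same startup script survives a Spark parcel upgrade without editing.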
6. Launch Jupyter Notebook and select the pySpark kernel:
jupyter notebook