Make sure you have Java 8 or higher installed on your computer. Of course, you will also need Python (I recommend Python 3.5 or higher, from Anaconda).

Select the latest Spark release, a prebuilt package for Hadoop, and download it directly. If you want Hive support or other fancier features, you will have to build your Spark distribution yourself -> Build Spark.

Unzip it and move it to your /opt folder:

$ tar -xzf spark-2.3.0-bin-hadoop2.7.tgz
$ mv spark-2.3.0-bin-hadoop2.7 /opt/spark-2.3.0

Create a symbolic link (this will let you have multiple Spark versions):

$ ln -s /opt/spark-2.3.0 /opt/spark

Finally, tell your bash (or zsh, etc.) where to find Spark. To do so, configure your $PATH variables by adding the following lines to your ~/.bashrc (or ~/.zshrc) file:

export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH

Now, to run PySpark in Jupyter, you'll need to update the PySpark driver environment variables. Just add these lines to your ~/.bashrc (or ~/.zshrc) file:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

Restart (or just source) your terminal and launch PySpark:

$ pyspark

This command should start a Jupyter Notebook in your web browser. Create a new notebook by clicking on 'New' > 'Notebooks Python'.
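As a quick sanity check (this snippet is not part of the original walkthrough; it's a minimal sketch assuming the notebook was launched via pyspark as described above), you can run a tiny job in the first cell to confirm that the kernel is really backed by Spark:

# When Jupyter is started through the PySpark driver, a SparkContext is
# normally already available as `sc`; getOrCreate() reuses it (or creates one).
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd = sc.parallelize(range(1000))   # distribute a small dataset
print(rdd.count())                  # should print 1000
print(sc.version)                   # should print your Spark version, e.g. 2.3.0

If both lines print as expected, Spark and Jupyter are wired up correctly.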