How to Run Scala in the Jupyter Notebook

I recently came across the need to run Scala programs in a notebook. Azure Notebooks is readily available for this, but it is a costly solution for individuals who just want to play around. The Jupyter Notebook is one of the most widely used tools in data science projects, and it has great support for developing software in Python. It can also be used for Scala development with the spylon-kernel.

I am writing this blog for everyone who needs to run Scala programs in a Jupyter Notebook.

There is a kernel called spylon-kernel that allows Scala to run on Jupyter.

Prerequisites:

Software –

  1. Spark (http://spark.apache.org/downloads.html)
  2. Hadoop winutils.exe (http://media.sundog-soft.com/Udemy/winutils.exe)
  3. JDK (https://www.oracle.com/in/java/technologies/downloads/)
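
After installing the JDK, you can quickly confirm it is visible from the command line (a simple check; the exact version string will depend on your install):

– java -version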

Once you have downloaded all the software listed above, you need to make a few modifications, listed below:

Spark:

  1. Create a folder (e.g. C:\Spark) and extract all the content of the downloaded .tar archive into it.
  2. In the conf folder, rename log4j.properties.template to log4j.properties.
  3. Edit that file, replace log4j.rootCategory=INFO, console with log4j.rootCategory=ERROR, console, then save and close it.

Hadoop:

The steps below are needed so that Spark programs can execute on a local Windows machine.

  1. After you’ve successfully downloaded winutils.exe, create the folders c:\winutils\bin and c:\tmp\hive.
  2. Paste winutils.exe into the bin folder.
  3. Open a Command Prompt and run the following commands:

    – cd c:\winutils\bin

    – winutils.exe chmod 777 \tmp\hive

The command should complete without errors.

Environment variables:

  1. SPARK_HOME: e.g. “C:\Spark”
  2. HADOOP_HOME: e.g. “C:\winutils”
  3. JAVA_HOME: the folder where the JDK is installed

(each variable should point to the root folder, not to its bin subfolder)
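
For example, you can set them persistently from a Command Prompt with setx. This is just a sketch: the JDK path below is an assumed example, so substitute your own install location.

– setx SPARK_HOME "C:\Spark"

– setx HADOOP_HOME "C:\winutils"

– setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_301"

Note that setx only affects newly opened prompts, so open a fresh Command Prompt afterwards.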

After you have set the environment variables, check whether Spark is running:

– cd c:\spark

– pyspark

If everything is OK, you should see the Spark welcome banner followed by a Python prompt.
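
You can also verify the Scala side with spark-shell. This is just a minimal sanity check: spark-shell creates a SparkSession named spark for you, and range/count are standard Spark APIs.

– cd c:\spark

– spark-shell

scala> spark.range(100).count()
res0: Long = 100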

For Jupyter Scala, open the Anaconda Prompt and run the following commands.

– pip install spylon-kernel

– python -m spylon_kernel install

– jupyter notebook

Once the installation is complete, you can see spylon-kernel in the New file dropdown.
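
As a quick test, create a new notebook with the spylon-kernel and run a Scala cell like the one below. This is a minimal sketch and assumes the kernel exposes the usual spark SparkSession variable (which it initializes on first use, so the first cell takes a while):

// classic word count on a small in-memory collection
val words = spark.sparkContext.parallelize(Seq("jupyter", "scala", "spark", "scala"))
words.map(w => (w, 1)).reduceByKey(_ + _).collect().foreach(println)

You should see (scala,2), (spark,1) and (jupyter,1) printed in some order.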

If everything goes well, your Scala snippets should run like Usain Bolt (pun intended). If they do not, you will need to perform some additional steps, as follows.

You need to copy the contents of the following zip files

“C:\Spark\python\lib\py4j-0.10.8.1-src.zip”

“C:\Spark\python\lib\pyspark.zip”

to

\anaconda\Lib\site-packages (i.e. the site-packages folder of your Anaconda installation)
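
After copying, you can check that the Python side can see both libraries (a quick check from an Anaconda Prompt):

– python -c "import pyspark, py4j"

If the command prints nothing, the imports succeeded.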

That’s it. Enjoy Scala with Jupyter!


