Getting the Spark Demo to Work

Spark is an open-source cluster computing system that provides primitives for in-memory computing, so for certain tasks it can outperform a system like Hadoop, which has to keep going back and forth to disk. Spark is written in Scala and is, at the time of this writing, in Apache incubation. The documentation is comprehensive, and the getting-started instructions make it sound as if the basics should be up and running in a few minutes without any effort. Of course, it’s never quite that easy. Here is a detailed account of the problems I encountered getting the Spark demo to run and how I worked around them.

According to the instructions, you download the source and build with the Simple Build Tool. This worked as advertised for me. The next step is to run an example program that estimates the value of π. The command line is ./run spark.examples.SparkPi. When I copied this into my terminal I got the following.

./run spark.examples.SparkPi
 SCALA_HOME is not set

Fair enough: the documentation says that SCALA_HOME is required, and having scala on your path is not sufficient. I did not have this environment variable set on my machine. Digging into the run script, I found a SPARK_LAUNCH_WITH_SCALA option, which infers the Scala home directory from the executable. Here is what happens if I set it.

SPARK_LAUNCH_WITH_SCALA=1 ./run spark.examples.SparkPi local
 java.lang.ClassNotFoundException: scala.reflect.ClassManifest
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
...

At the time of this writing, Spark requires Scala version 2.9.3. It is not compatible with later versions of Scala. If you have Scala 2.9.3 on your machine, the above command should work. I, however, am using Homebrew to manage the installation of Scala on my Mac, and it has installed the latest version, which is 2.10.2, hence the error. The same thing happens if you point SCALA_HOME to a non-2.9.3 directory.
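Before launching, it can help to check which Scala version is actually on your path. The following is a minimal sketch, not something from the Spark distribution: the function names scala_version_from_banner and is_spark_compatible_scala are made up for illustration, and it assumes the banner printed by scala -version contains the word "version" followed by the version number.

```shell
# Hypothetical helpers; not part of Spark or Scala.

# Pull the version number out of a banner such as
# "Scala code runner version 2.10.2 -- Copyright 2002-2013, LAMP/EPFL".
scala_version_from_banner() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed only for the version this post says Spark requires.
is_spark_compatible_scala() {
    [ "$1" = "2.9.3" ]
}

# Only query the real scala binary if one is on the path.
# 2>&1 captures the banner whether it goes to stdout or stderr.
if command -v scala >/dev/null 2>&1; then
    version=$(scala_version_from_banner "$(scala -version 2>&1)")
    if is_spark_compatible_scala "$version"; then
        echo "Scala $version should work with Spark"
    else
        echo "Scala $version found; Spark needs 2.9.3"
    fi
fi
```

If the second message appears, the ClassNotFoundException shown above is the likely result of running the demo with that Scala.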

(By the way, if you point SCALA_HOME at a directory that doesn’t contain a Scala installation, you see the following error.

SCALA_HOME=. ./run spark.examples.SparkPi local
Exception in thread "main" java.lang.NoClassDefFoundError: scala/ScalaObject
	at java.lang.ClassLoader.defineClass1(Native Method)
...

To recap, “java.lang.ClassNotFoundException: scala.reflect.ClassManifest” means the wrong version of Scala, and “java.lang.NoClassDefFoundError: scala/ScalaObject” means SCALA_HOME does not point at a Scala installation.)
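The recap above can be written down as a small lookup. This is a sketch with a made-up function name, diagnose_scala_error; it recognizes only the two messages described in this post and makes no claim about any other failure.

```shell
# Hypothetical helper: map an error message from ./run to the cause
# described in this post. Anything else is reported as unrecognized.
diagnose_scala_error() {
    case "$1" in
        *"ClassNotFoundException: scala.reflect.ClassManifest"*)
            echo "incorrect Scala version" ;;
        *"NoClassDefFoundError: scala/ScalaObject"*)
            echo "not a Scala directory" ;;
        *)
            echo "unrecognized error" ;;
    esac
}

# Example: diagnose the first line of an error copied from the terminal.
diagnose_scala_error 'java.lang.ClassNotFoundException: scala.reflect.ClassManifest'
```

To use it on a real run, capture the output with something like output=$(./run spark.examples.SparkPi local 2>&1) and pass "$output" to the function.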

The solution is to install Scala 2.9.3. Multiple-version support in Homebrew is tricky, so I decided the easiest thing to do was to download 2.9.3 directly from the Scala website. I unpacked the tarball into a scala-2.9.3 directory alongside my Spark install.
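Before pointing SCALA_HOME at the unpacked directory, a quick sanity check can confirm that it looks like a Scala installation. This is a minimal sketch, assuming the standard Scala distribution layout with bin/scala and lib/scala-library.jar; the function name looks_like_scala_home is hypothetical.

```shell
# Hypothetical helper: succeed if the directory has the two files a
# standard Scala distribution is assumed to contain.
looks_like_scala_home() {
    [ -f "$1/bin/scala" ] && [ -f "$1/lib/scala-library.jar" ]
}

# Example: check SCALA_HOME if it is set, otherwise the directory used
# in this post, which sits alongside the Spark install.
candidate=${SCALA_HOME:-../scala-2.9.3}
if looks_like_scala_home "$candidate"; then
    echo "$candidate looks like a Scala installation"
else
    echo "$candidate does not look like a Scala installation"
fi
```

This checks only the directory layout, not the Scala version inside it.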

SCALA_HOME=../scala-2.9.3/ ./run spark.examples.SparkPi
 Usage: SparkPi <master> [<slices>]

That looks much better. The final step is to supply the <master> argument. Passing local runs things on the local machine.

SCALA_HOME=../scala-2.9.3/ ./run spark.examples.SparkPi local
 ...
 Pi is roughly 3.13908

Once Spark is compatible with later versions of Scala, this trickiness should go away. Until then, these are snags to be aware of.


4 Responses to Getting the Spark Demo to Work

  1. Pingback: Example Python Machine Learning Algorithm on Spark | Corner Cases

  2. Kanwaldeep Dang says:

    I’ve set up Scala 2.9.3 and have the environment variable set, but I’m still getting the error “Exception in thread "main" java.lang.NoClassDefFoundError: scala/ScalaObject”

    mseakdang:scala-2.9.3 kanwaldeep.dang$ echo $SCALA_HOME
    /Users/kanwaldeep.dang/spark-0.8.0-incubating-bin-cdh4/scala-2.9.3/
    mseakdang:scala-2.9.3 kanwaldeep.dang$ cd $SCALA_HOME
    mseakdang:scala-2.9.3 kanwaldeep.dang$ ls -l
    total 0
    drwxr-xr-x@ 12 kanwaldeep.dang staff 408 Feb 25 2013 bin
    drwxr-xr-x@ 5 kanwaldeep.dang staff 170 Feb 25 2013 doc
    drwxr-xr-x@ 10 kanwaldeep.dang staff 340 Feb 25 2013 lib
    drwxr-xr-x@ 4 kanwaldeep.dang staff 136 Dec 6 01:13 man
    drwxr-xr-x@ 3 kanwaldeep.dang staff 102 Feb 25 2013 misc
    drwxr-xr-x@ 8 kanwaldeep.dang staff 272 Feb 25 2013 src

  3. W.P. McNeill says:

    That’s not one of the two exceptions I saw, so I don’t know what’s going on here. The fact that it’s unable to find the “main” method is usually an indication of some fundamental classpath error, but beyond that I’m out of my depth. It’s StackOverflow time.

  4. There are problems with Mesos too. I don’t know why they are not interested in sharing real, working instructions for getting started!
