We have released Preview 2 of Kotlin for Apache Spark. First of all, we’d like to thank the community for providing us with all their feedback and even some pull requests! Now let’s take a look at what we have implemented since Preview 1.

Scala 2.11 and Spark 2.4.1+ support

The main change in this preview is the introduction of Spark 2.4 and Scala 2.11 support. This means that you can now run jobs written in Kotlin in your production environment.

The syntax remains the same as in the Apache Spark 3.0-compatible version, but the installation procedure differs a bit. You can now use sbt to add the correct dependency, for example:

libraryDependencies += "org.jetbrains.kotlinx.spark" %% "kotlin-spark-api-2.4" % "1.0.0-preview2"

Of course, for other build systems, such as Maven, we would use the standard dependency declaration:

<dependency>
  <groupId>org.jetbrains.kotlinx.spark</groupId>
  <artifactId>kotlin-spark-api-2.4_2.11</artifactId>
  <version>1.0.0-preview2</version>
</dependency>

And for Gradle we would use:

implementation 'org.jetbrains.kotlinx.spark:kotlin-spark-api-2.4_2.11:1.0.0-preview2'

Spark 3 only supports Scala 2.12 for now, so there is no need to specify the Scala version in the artifact name.
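For illustration only, a Spark 3 Gradle line would therefore carry no Scala suffix, along these lines (the artifact ID here is hypothetical – check the project’s documentation for the exact coordinate):

implementation 'org.jetbrains.kotlinx.spark:kotlin-spark-api-3.0:1.0.0-preview2'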

You can read more about the dependencies in the project’s documentation.
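With the dependency in place, a job looks the same against Spark 2.4 as it does against Spark 3. Here is a minimal sketch of a complete job you could use to check your setup, assuming the standard entry point from the project’s API package:

import org.jetbrains.kotlinx.spark.api.*

fun main() {
    // withSpark creates a session, runs the block, and stops the session afterwards
    withSpark {
        // dsOf builds a typed Dataset from the given values
        dsOf(1, 2, 3)
            .map { it to it * it } // Kotlin-friendly typed map, no explicit encoders needed
            .show()                // prints the pairs as a two-column table
    }
}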

Support for a custom SparkSession.Builder

Imagine you have a company-wide SparkSession.Builder that is set up to work with your YARN/Mesos/etc. cluster. You don’t want to re-create it, and the “withSpark” function alone doesn’t give you enough flexibility. Before Preview 2, you had only one option – rewriting the whole builder from Scala to Kotlin. You now have another one, because the “withSpark” function accepts a SparkSession.Builder as an argument:

val builder = // obtain your builder here
withSpark(builder, logLevel = DEBUG) {
  // your Spark session is initialized with the correct config here
}
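For instance, here is a minimal sketch with an illustrative builder standing in for your company-wide one (the app name and master here are hypothetical):

import org.apache.spark.sql.SparkSession
import org.jetbrains.kotlinx.spark.api.*

fun main() {
    // A stand-in for the builder your platform team would normally provide
    val builder = SparkSession.builder()
        .appName("MyCompanyJob")
        .master("local[*]") // in production this would target YARN/Mesos/etc.

    // withSpark uses the supplied builder instead of constructing its own
    withSpark(builder, logLevel = SparkLogLevel.DEBUG) {
        dsOf("a" to 1, "b" to 2).show()
    }
}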
