Spark 2.4 Connector Error using Spark Shell

Couchbase 6.0.1 is set up in a one-node cluster and is up and running.

I am trying to connect to this cluster from the Spark shell on an edge node with the following configuration:
Spark 2.4.0-cdh6.2.1, Scala 2.11.12

Below are the commands, including the errors they produce:

$spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxy_name> -Dhttp.proxyPort=80" --packages com.couchbase.client:spark-connector_2.12:2.4.0 --conf spark.couchbase.nodes=<node_name> --conf spark.couchbase.username=root --conf spark.couchbase.password= --conf "spark.couchbase.bucket.travel-sample="

scala> import com.couchbase.spark._
import com.couchbase.spark._

scala> import com.couchbase.spark.sql._
import com.couchbase.spark.sql._

scala> import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.JsonDocument

scala> import com.couchbase.client.java.document.json.{JsonArray, JsonObject}
import com.couchbase.client.java.document.json.{JsonArray, JsonObject}

scala> import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.client.java.query.N1qlQuery

scala> import com.couchbase.client.java.view.ViewQuery
import com.couchbase.client.java.view.ViewQuery

scala> val airlines = spark.read.couchbase(schemaFilter = org.apache.spark.sql.sources.EqualTo("type", "airline"))
java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
at com.couchbase.spark.sql.N1QLRelation$.attrToFilter(N1QLRelation.scala:223)
at com.couchbase.spark.sql.N1QLRelation$.filterToExpression(N1QLRelation.scala:182)
at com.couchbase.spark.sql.DataFrameReaderFunctions.$anonfun$buildFrame$1(DataFrameReaderFunctions.scala:81)
at scala.Option.map(Option.scala:146)
at com.couchbase.spark.sql.DataFrameReaderFunctions.buildFrame(DataFrameReaderFunctions.scala:81)
at com.couchbase.spark.sql.DataFrameReaderFunctions.couchbase(DataFrameReaderFunctions.scala:61)
… 53 elided

scala> sc.couchbaseGet[JsonDocument](Seq("airline_10123", "airline_10748")).collect().foreach(println)
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at com.couchbase.spark.connection.Credential.<init>(CouchbaseConfig.scala:21)
at com.couchbase.spark.connection.CouchbaseConfig$.apply(CouchbaseConfig.scala:96)
at com.couchbase.spark.rdd.KeyValueRDD.<init>(KeyValueRDD.scala:48)
at com.couchbase.spark.SparkContextFunctions.couchbaseGet(SparkContextFunctions.scala:35)
… 53 elided

scala> sc.couchbaseView(ViewQuery.from("airlines", "by_name").limit(10)).collect().foreach(println)
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at com.couchbase.spark.connection.Credential.<init>(CouchbaseConfig.scala:21)
at com.couchbase.spark.connection.CouchbaseConfig$.apply(CouchbaseConfig.scala:96)
at com.couchbase.spark.rdd.ViewRDD.<init>(ViewRDD.scala:31)
at com.couchbase.spark.rdd.ViewRDD$.apply(ViewRDD.scala:44)
at com.couchbase.spark.SparkContextFunctions.couchbaseView(SparkContextFunctions.scala:73)
… 53 elided

scala>

If this is due to the Spark 2.4 connector being incompatible with Scala 2.11, please provide the coordinates of a connector version that works with Spark 2.4 and Scala 2.11 (configuration given at the beginning and repeated below).
Otherwise, kindly confirm that there is no Spark connector compatible with this configuration:
Spark 2.4.0-cdh6.2.1, Scala 2.11.12

Thanks

Hi @JD
Our Spark Connector 2.4 is currently built against Scala 2.12 only, because Spark itself appeared to be going down the 2.12 route for Spark 2.4. However, recent Spark releases still default to Scala 2.11, and we do not currently have a Spark Connector 2.4 build for 2.11 - though there is a ticket (https://issues.couchbase.com/browse/SPARKC-102) open for it.
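
As a quick sanity check, you can confirm the Scala version your spark-shell runs on from the REPL itself - this is just the standard library, nothing connector-specific:

scala> util.Properties.versionString
res0: String = version 2.11.12

If that prints a 2.11.x version while a _2.12 connector artifact is on the classpath, you get exactly the NoSuchMethodError failures shown above, since Scala 2.11 and 2.12 are not binary compatible.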

Your solutions/workarounds are currently:

  1. Use Scala 2.12, which will let you use the latest and greatest of everything.
  2. Drop back to Spark Connector 2.3, for which we have a Scala 2.11 build (see the example invocation after this list).
  3. Wait for SPARKC-102 to be completed which will provide Spark Connector 2.4 on Scala 2.11, though I can’t offer an ETA on that at this time.
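
For workaround 2, the change amounts to swapping the Scala suffix and version in the --packages coordinate. A sketch of the adjusted invocation, assuming the 2.3.0 release (check Maven Central for the latest 2.3.x patch):

$spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxy_name> -Dhttp.proxyPort=80" --packages com.couchbase.client:spark-connector_2.11:2.3.0 --conf spark.couchbase.nodes=<node_name> --conf spark.couchbase.username=root --conf spark.couchbase.password= --conf "spark.couchbase.bucket.travel-sample="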

Hello Graham,

Thanks for your reply which is indeed very helpful.

A few questions/inputs on the workarounds:
Workaround 1 - I agree
Workaround 2 - With our current edge node configuration (Spark 2.4.0-cdh6.2.1, Scala 2.11.12), Spark Connector 2.3 will not work with Spark 2.4.0, as its target version is Spark 2.3.x.
Kindly confirm.
Workaround 3 - I went through the ticket details that you provided above in the link.
Will the solution cater to Spark 2.4.3 only, or to Spark 2.4.x? We are using Spark 2.4.0.
So, once the build is available, CB will release the Maven coordinates for SPARKC-102 for us to use against Spark 2.4/Scala 2.11, right?

Regards
JD

Workaround 2 - With our current edge node configuration (Spark 2.4.0-cdh6.2.1, Scala 2.11.12), Spark Connector 2.3 will not work with Spark 2.4.0, as its target version is Spark 2.3.x.

Yes, that’s correct.

Will the solution cater to Spark 2.4.3 only, or to Spark 2.4.x? We are using Spark 2.4.0.
So, once the build is available, CB will release the Maven coordinates for SPARKC-102 for us to use against Spark 2.4/Scala 2.11, right?

The intent is to provide a CB Spark Connector 2.4 built against the Spark 2.4.x series, for both Scala 2.11 and 2.12. I’ll use the latest available Spark for building and testing, and that’s what will be supported. That said, I expect it will work just fine against Spark 2.4.0 too.

For release, there will be two CB Spark Connector 2.4.1 releases to Maven, in 2.11 and 2.12 flavours. SBT has a convenient %% shorthand that picks the correct Scala 2.x variant; otherwise, for Maven and Gradle you just explicitly say which one you want - all this will be in the docs, of course.
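
As an illustration, once those artifacts are published an sbt dependency might look like this (the 2.4.1 version follows the post above and is a placeholder until the release is actually out):

// build.sbt - %% appends the Scala binary suffix, so this resolves to spark-connector_2.11
scalaVersion := "2.11.12"
libraryDependencies += "com.couchbase.client" %% "spark-connector" % "2.4.1"

The equivalent explicit Maven coordinate would be com.couchbase.client:spark-connector_2.11:2.4.1.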
