Spark Connector for .NET

I’m trying to write data to Couchbase 6.0.2 using Couchbase-Spark-Connector_2.12-2.4.1. I have Spark 2.4.1 installed.
While saving, the following exception is thrown:

[JvmBridge] java.lang.NoClassDefFoundError: com/couchbase/client/core/CouchbaseException
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
        at java.lang.Class.getConstructor0(Unknown Source)
        at java.lang.Class.newInstance(Unknown Source)
        at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:519)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:281)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)

The code is as follows:

var options = new Dictionary<string, string>
{
    { "spark.couchbase.nodes", "" },
    { "spark.couchbase.username", "<uname>" },
    { "spark.couchbase.password", "<pwd>" },
    { "spark.couchbase.bucket.<bucketName>", "" }
};

// df is the DataFrame being saved
df.Write().Format("com.couchbase.spark.sql").Options(options).Option("idField", "1").Save();

Can you let me know what I’m missing or doing wrong?

I’m not familiar with Spark for .NET, but the exception above indicates that somehow core-io (a dependency of the java-client) is not available on your classpath. How are you deploying it, and how are you making sure all the dependencies are on all the executors?
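For reference, these are the two usual ways to get a connector and its transitive dependencies onto the driver and executor classpaths with spark-submit; the coordinates, jar names, and app jar below are illustrative placeholders, not taken from this thread:

```shell
# 1) Let spark-submit resolve everything through Ivy.
#    NOTE: the _2.12 suffix must match the Scala version your Spark build uses.
spark-submit --packages com.couchbase.client:spark-connector_2.12:2.4.0 \
             --master local my-app.jar

# 2) Ship explicitly resolved jars to every executor.
spark-submit --jars spark-connector_2.12-2.4.0.jar,java-client-2.7.6.jar,core-io-1.7.6.jar \
             --master local my-app.jar
```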

This is how I run it on my local machine:

%SPARK_HOME%\bin\spark-submit --packages com.couchbase.client:spark-connector_2.12:2.4.0 --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.4.x-0.11.0.jar dotnet <DLL_Name>.dll

Where SPARK_HOME is set to C:\bin\spark-2.4.1-bin-hadoop2.7\bin
Ivy picks up the dependencies specified in --packages and caches them in the current user’s .ivy2 directory.
All dependencies are extracted, but I don’t understand what might be missing. Can you point out the actual dependency name?
If this output helps:

com.couchbase.client#spark-connector_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-67edc361-b657-4b4e-8d6f-dc8d946a848a;1.0
confs: [default]
found com.couchbase.client#spark-connector_2.12;2.4.0 in central
found com.couchbase.client#java-client;2.7.6 in central
found com.couchbase.client#core-io;1.7.6 in central
found io.reactivex#rxjava;1.3.8 in central
found io.opentracing#opentracing-api;0.31.0 in central
found com.couchbase.client#dcp-client;0.23.0 in central
found io.reactivex#rxscala_2.12;0.26.5 in central
found org.apache.logging.log4j#log4j-api;2.2 in central
:: resolution report :: resolve 801ms :: artifacts dl 17ms
:: modules in use:
com.couchbase.client#core-io;1.7.6 from central in [default]
com.couchbase.client#dcp-client;0.23.0 from central in [default]
com.couchbase.client#java-client;2.7.6 from central in [default]
com.couchbase.client#spark-connector_2.12;2.4.0 from central in [default]
io.opentracing#opentracing-api;0.31.0 from central in [default]
io.reactivex#rxjava;1.3.8 from central in [default]
io.reactivex#rxscala_2.12;0.26.5 from central in [default]
org.apache.logging.log4j#log4j-api;2.2 from central in [default]
:: evicted modules:
com.couchbase.client#core-io;1.7.2 by [com.couchbase.client#core-io;1.7.6] in [default]
io.reactivex#rxjava;1.2.4 by [io.reactivex#rxjava;1.3.8] in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   10  |   0   |   0   |   2   ||   8   |   0   |
---------------------------------------------------------------------

I modified a few things; now the error says:

java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
        at com.couchbase.spark.connection.Credential.<init>(CouchbaseConfig.scala:21)

Does it mean there is a version conflict? I’m using Couchbase 6.5.1 and com.couchbase.client:spark-connector_2.12:2.4.0.

@SumitMeh I think you need to use the Scala 2.11 version (spark-connector_2.11); IIRC, your Spark distribution is built with Scala 2.11 by default.
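A quick way to confirm which Scala version a Spark distribution targets is to look at the scala-library jar shipped in its jars directory. A sketch; the default path below is only an example:

```shell
# The scala-library jar bundled with Spark carries the binary version in
# its file name, e.g. scala-library-2.11.12.jar -> use the _2.11 artifacts.
SPARK_HOME=${SPARK_HOME:-/opt/spark-2.4.1-bin-hadoop2.7}
scala_ver=$(ls "$SPARK_HOME"/jars 2>/dev/null | sed -n 's/^scala-library-\([0-9]*\.[0-9]*\)\..*/\1/p')
echo "Spark built against Scala ${scala_ver:-unknown}"
```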


@daschl, Thanks. That was indeed the last thing and you directed at it correctly.
Another thing was that the following jars have to be available under spark-2.4.1-bin-hadoop2.7\jars:


I thought resolving the dependencies via --packages would make them available to the current execution.
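For what it’s worth, the jars that --packages resolves do end up in the per-user Ivy cache (~/.ivy2/jars on Linux/macOS, and under the user profile on Windows in my experience), so an alternative to downloading them by hand is copying them from the cache. A sketch; the glob is an assumption about the cache’s file naming:

```shell
# spark-submit --packages caches resolved artifacts under ~/.ivy2/jars;
# copying them into the distribution's jars/ directory makes them visible
# without passing --packages/--jars on every run. Paths are illustrative.
cp ~/.ivy2/jars/com.couchbase.client_*.jar "$SPARK_HOME"/jars/
```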