Upgrade Spark Couchbase Connector to Java Client version 3.0.0+

Hi!

Is there any effort underway to upgrade the Java Client inside the Spark Couchbase Connector to version 3.0.0?

Update: I have just assessed the complexity of upgrading to Core IO 2.0.0. It’s messy; a lot of things have changed. Upgrading to Java Client 3.0.0 (or the Scala client, for that matter) is blocked, since we also rely on the Couchbase Spark Connector in our project. Upgrading to DCP client 0.28.0 and the Spark 3.0.0 release is easy, however, and all tests succeed locally. See [1].

[1] https://github.com/enlivensystems/couchbase-spark-connector

Thanks,
Zoltán

Hi Zoltán,

It is something we want to get to. In fact we’d like to rewrite it to use the Scala SDK that’s now available. But I have to be honest and say that it is not really on a confirmed roadmap at present.
I’d be very interested in seeing your 0.28.0 and Spark 3.0.0 changes as contributions, so we can attribute that work to you. (There’s a CONTRIBUTING.md guide in the repo.)

regards,
Graham


Hi Graham!

Thanks for the update on this issue!

I have contributed in the past to the DCP library. There, I personally found it cumbersome to contribute using Gerrit, but I would be happy if you could pull my work from [1].

In addition to these changes, I attempted to shade (shadow/relocate) the Core IO and Java Client libraries into the Couchbase Spark JAR that is produced through Gradle publish. I observed that not all references to the shaded classes were correctly relocated. The effect seemed strange to me, since relocating classes is not rocket science; moreover, some references were relocated while others were not, even though they all belong to the same package. I suspect that something is not working quite right in Gradle’s shadow plugin.

Nevertheless, I migrated the whole project to SBT, where I used the sbt-assembly plugin to attempt the class relocation again. There, relocation failed as well. I noticed that both sbt-assembly and Gradle shadow use Jar Jar Links, which has limitations when rewriting imports in Scala classes.
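For what it’s worth, a rough way to check whether relocation actually succeeded is to scan the assembled JAR’s class files for the original package prefix (class-file constant pools store package names with ‘/’ separators). This is only a sketch; the `shaded.` relocation prefix and the helper name are my own assumptions:

```scala
import java.nio.charset.StandardCharsets
import java.util.jar.JarFile
import scala.jdk.CollectionConverters._

// Sketch: list class files in a JAR that still reference the original,
// un-relocated package prefix. The "shaded/" prefix below is an assumption
// matching the shade rules discussed in this thread.
def unrelocatedEntries(jarPath: String, originalPrefix: String): List[String] = {
  val needle = originalPrefix.replace('.', '/') // constant pools use '/'
  val jar = new JarFile(jarPath)
  try {
    jar.entries().asScala
      .filter(_.getName.endsWith(".class"))
      .filter { entry =>
        val text = new String(jar.getInputStream(entry).readAllBytes(),
          StandardCharsets.ISO_8859_1)
        // drop already-relocated occurrences before looking for leftovers
        text.replace("shaded/" + needle, "").contains(needle)
      }
      .map(_.getName)
      .toList
  } finally jar.close()
}
```

Running this against the shaded artifact with the original prefix `com.couchbase.client.java` should ideally return an empty list.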

I’ll make further attempts to shade the Core IO and Java Client libraries. That would mitigate the problem of conflicting Core IO and Java Client versions when we want to depend on newer Couchbase Java packages.

[1] https://github.com/enlivensystems/couchbase-spark-connector

Thanks,
Zoltán


Hi Zoltán,

On the shadowing - I have some recollection that problems with the shadowing plugin were the reason we had to abandon our experiment of using Gradle for couchbase-jvm-clients in the end, and return to Maven. @daschl do you recall if that was the case?

Side note - that said, I’ve also had problems with Maven’s shade plugin when using it with Scala. Perhaps that too comes down to this same Jar Jar Links issue? So if we were to take a shadowing approach, I’d be concerned about it not working once we did get to rewriting the connector atop the Scala SDK. (Though the need for shadowing would also be much reduced at that point.)

I’m sorry that you find the Gerrit contribution process cumbersome. I can take a look at your changes, see if I can add them to the Spark Connector, and credit you in the notes. Out of interest, and because we are always keen to find ways of encouraging community contributions, would you submit it if we had a Github PR process instead?

regards,
Graham


Yeah, we ran into a couple of issues with shadowing (renaming, assets, etc.); that’s why we stuck with Maven for now. It seems to do the job well enough, even though it’s also not perfect.


It all comes down to Jar Jar Links. Since sbt-assembly has moved to Jar Jar Abrams [2], it now fixes the shading issues around Scala classes that affect the Couchbase Spark Connector as well. Gradle, as I mentioned, was a no-go. Maven did not work either.

For this reason, we migrated the build model to SBT. [1] It successfully shades the old libraries, and we upgraded our clients to Scala SDK version 1.0.5. This is how we depend on it currently. (The resolver is not public.)

val couchbaseSpark = ("com.couchbase.client" %% "spark-connector" % "3.0.2")
    .classifier("shaded")
    .excludeAll(
      ExclusionRule("org.apache.spark")
    )

Highlights:

  • DCP version 0.28.0.
  • Java client 2.7.15.

Shade rules in the SBT build model of Spark Couchbase Connector:


  assemblyShadeRules in assembly := Seq(
    ShadeRule
      .rename("com.couchbase.client.java.**" -> "shaded.com.couchbase.client.java.@1")
      .inLibrary("com.couchbase.client" % "java-client" % "2.7.15")
      .inProject,
    ShadeRule
      .rename("com.couchbase.client.core.**" -> "shaded.com.couchbase.client.core.@1")
      .inAll,
    ShadeRule
      .rename("com.couchbase.client.encryption.**" -> "shaded.com.couchbase.client.encryption.@1")
      .inAll
  ),
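For completeness: these rules depend on an sbt-assembly release built on Jar Jar Abrams. As far as I know that landed in sbt-assembly 0.15.0 (the exact version here is an assumption), so project/plugins.sbt would contain something like:

```scala
// project/plugins.sbt
// sbt-assembly 0.15.0+ replaced Jar Jar Links with Jar Jar Abrams,
// which can also rewrite references inside compiled Scala classes
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
```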

Tests pass, including our own tests in all of our child projects.

Out of interest, and because we are always keen to find ways of encouraging community contributions, would you submit it if we had a Github PR process instead?

Yes, definitely.

P.S.: The new Java & Scala library design is really good.

[1] https://github.com/enlivensystems/couchbase-spark-connector
[2] https://github.com/sbt/sbt-assembly/pull/398

@zoltan.zvara very interesting, and thanks for sharing this in-depth. This approach (a move to SBT, plus the shadowing) could be a promising direction for the eventual major connector upgrade. As mentioned, I don’t have a roadmap for that currently - are you happy to continue with your fork in the interim?

I wasn’t aware of the Jar Jar Links vs Abrams thing (loving the names), that certainly explains the Scala shadowing pain we’ve seen. Unfortunately it also sounds like we won’t be able to resolve those pains in couchbase-jvm-clients with our current Maven setup anytime soon.

P.S.: The new Java & Scala library design is really good.

Thank you!