Unable to replace and ignore DocumentNotFoundException since Spark connector 3.x?

Hi,

I’m trying to replace a set of documents in Couchbase, but want to skip documents that do not exist in the bucket.

With Spark connector 2.4 I used StoreMode.REPLACE_AND_IGNORE and that worked fine. Now I’m trying to upgrade to the newer 3.0 Spark connector. The migration guide states that I should use couchbaseReplace, but there seems to be no option to ignore DocumentNotFoundException.

It seems that my whole job fails if one of the documents is missing.

Is there a way to be able to do this? Thanks!

@DemonTPx whoops, you are right, we missed that equivalent. We just released 3.2.1, so there is no immediate release planned, but I’ll make sure it gets rolled into the next one. As a workaround, you could perform N couchbaseReplace operations and handle the exception yourself?

Filed https://issues.couchbase.com/browse/SPARKC-157 to track this.

Thanks @daschl ! :slight_smile:

While I can wait for the new version, I did try and look at the workaround you’re suggesting, but I can’t figure out how to implement it. I have something like this:

spark.sql(...)
  .as[Data]
  .rdd
  .map(data => Replace("i::#" + data.itemId, formatContent(data)))
  .couchbaseReplace(Keyspace(bucket = Some(arguments.couchbaseBucket())))
  .collect

The SQL query might return up to 60 million rows. How do I convert this to N couchbaseReplace operations? Will it perform similar to one operation? And where and how do I catch the exception?

If you think it’s too much hassle for the workaround to work, or that it will not be performant with 60M rows, I’ll just wait for a fix. :slight_smile:
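
For illustration only, here is a minimal sketch of the "one replace per document, catch the failure yourself" pattern suggested above. It is not the connector's API: ReplaceSketch, MissingDocument, replaceIgnoringMissing and inMemoryReplace are hypothetical names, MissingDocument stands in for DocumentNotFoundException, and an in-memory map stands in for the bucket so the snippet runs without Spark or Couchbase.

```scala
import scala.collection.mutable
import scala.util.{Failure, Success, Try}

object ReplaceSketch {
  // Hypothetical stand-in for DocumentNotFoundException, so the sketch has no dependencies.
  final class MissingDocument(id: String) extends RuntimeException(s"document not found: $id")

  // Runs one replace per item. Items whose document is missing are counted and skipped;
  // any other failure is rethrown so the surrounding task still fails.
  // Returns (replaced, skipped).
  def replaceIgnoringMissing[A](items: Iterator[A])(replaceOne: A => Try[Unit]): (Long, Long) = {
    var replaced = 0L
    var skipped = 0L
    items.foreach { item =>
      replaceOne(item) match {
        case Success(_)                  => replaced += 1
        case Failure(_: MissingDocument) => skipped += 1
        case Failure(other)              => throw other
      }
    }
    (replaced, skipped)
  }

  // Assumption: an in-memory map plays the role of the bucket. Replace fails if the key is absent.
  def inMemoryReplace(bucket: mutable.Map[String, String])(doc: (String, String)): Try[Unit] = Try {
    val (id, content) = doc
    if (!bucket.contains(id)) throw new MissingDocument(id)
    bucket.update(id, content)
  }

  def main(args: Array[String]): Unit = {
    val bucket = mutable.Map("i::#1" -> "old", "i::#3" -> "old")
    val docs = Iterator("i::#1" -> "new", "i::#2" -> "new", "i::#3" -> "new")
    val (replaced, skipped) = replaceIgnoringMissing(docs)(doc => inMemoryReplace(bucket)(doc))
    println(s"replaced=$replaced skipped=$skipped") // prints "replaced=2 skipped=1"
  }
}
```

In a real job, the same loop would presumably run inside rdd.foreachPartition, with replaceOne wrapping a per-document replace against the cluster and matching on DocumentNotFoundException instead of the stand-in. How to obtain a connection on the executors depends on the connector version and is not shown here. One blocking replace per document is also likely to be slower than the connector's own write path, so for 60M rows waiting for the fix tracked in SPARKC-157 may well be the more practical option.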