Strange error during Java Transactions when a single object participates in many transactions in parallel

We run about 80 unit and integration tests centered on a single object that updates a counter, in an invoicing module. The tests are very stable with the Transactions API as long as they do not run in parallel.

When we started testing parallel calls, however, tests began failing as follows:

RawJsonTranscoder can only decode into either byte[] or String!
com.couchbase.client.core.error.DecodingFailureException: RawJsonTranscoder can only decode into either byte[] or String!
	at com.couchbase.client.java.codec.RawJsonTranscoder.decode(RawJsonTranscoder.java:56)
	at com.couchbase.transactions.TransactionGetResult.contentAs(TransactionGetResult.java:141)
	at com.couchbase.transactions.components.DocumentGetter.lambda$atrFound$9(DocumentGetter.java:241)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:44)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:153)
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73)
	at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73)
	at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:123)
	at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1782)
	at com.couchbase.client.core.Reactor$SilentMonoCompletionStage.lambda$subscribe$0(Reactor.java:178)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at com.couchbase.client.core.msg.BaseRequest.succeed(BaseRequest.java:143)
	at com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.decodeAndComplete(KeyValueMessageHandler.java:322)
	at com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.decode(KeyValueMessageHandler.java:301)
	at com.couchbase.client.core.io.netty.kv.KeyValueMessageHandler.channelRead(KeyValueMessageHandler.java:228)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at com.couchbase.client.core.io.netty.kv.MemcacheProtocolVerificationHandler.channelRead(MemcacheProtocolVerificationHandler.java:84)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
	at com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at com.couchbase.client.core.deps.io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
	at com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
	at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
	at com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at com.couchbase.client.core.deps.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at com.couchbase.client.core.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
	at com.couchbase.client.core.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
	at com.couchbase.client.core.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
	at com.couchbase.client.core.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at com.couchbase.client.core.deps.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at com.couchbase.client.core.deps.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at com.couchbase.client.core.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
	Suppressed: com.couchbase.client.core.error.DecodingFailureException: RawJsonTranscoder can only decode into either byte[] or String!
		... 50 more

The above errors appear intermittently and unpredictably. We suspect they are related to transactions running in parallel against the same object.

Debugging into com.couchbase.client.java.codec.RawJsonTranscoder.decode, we see that RawJsonTranscoder is used as we configured, but somehow the final Class&lt;T&gt; target parameter is JsonObject.class, set by the following code:

default:
    if (doc.links().isDeleted() || doc.contentAs(JsonObject.class).isEmpty()) {
        // This document is being inserted, so shouldn't be visible yet
        return Mono.just(Optional.empty());
    } else {
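
The restriction itself is easy to reproduce in isolation. Below is a minimal, Couchbase-free sketch of the type check (a simplification written for illustration, not the SDK's actual code): a raw transcoder can hand back the bytes or a String view of them, but it has no JSON parser, so any other target class, JsonObject included, has to fail.

```java
import java.nio.charset.StandardCharsets;

public class RawDecodeSketch {

    // Simplified stand-in for a raw transcoder's decode(): it can only
    // return the raw bytes or a String view of them. Any richer target
    // class triggers the same kind of failure seen in the stack trace.
    static <T> T decode(byte[] input, Class<T> target) {
        if (target.equals(byte[].class)) {
            return target.cast(input);
        }
        if (target.equals(String.class)) {
            return target.cast(new String(input, StandardCharsets.UTF_8));
        }
        throw new IllegalStateException(
                "RawJsonTranscoder can only decode into either byte[] or String!");
    }

    public static void main(String[] args) {
        byte[] doc = "{\"counter\":1}".getBytes(StandardCharsets.UTF_8);

        System.out.println(decode(doc, String.class)); // works: {"counter":1}

        try {
            decode(doc, java.util.Map.class); // any richer target fails
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So when the transactions internals call doc.contentAs(JsonObject.class) while the configured transcoder is a raw one, this exception is the only possible outcome.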

We would appreciate any tips that could help us identify the root cause.

Thanks!

Hi @zoltan.zvara
That’s interesting; you certainly should be able to write the same doc from multiple concurrent transactions. It will increase latency, of course, as all but one of the transactions will have to back off and will sometimes need to retry. And occasionally you may see a transaction fail if it’s unable to win and lock the document within its expiration time.
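
The back-off-and-retry shape described here can be illustrated without Couchbase at all. The sketch below is my own illustration, not the transactions library's implementation: it uses a plain compare-and-set loop in which several writers race on one counter, all but one lose each round and retry, and every increment eventually lands.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasRetrySketch {

    static final AtomicInteger counter = new AtomicInteger(0);

    // Optimistic write: read the current value, try to install the new one,
    // and retry if another writer committed first. This mirrors the
    // "all but one back off and retry" behaviour described above.
    static void increment() {
        while (true) {
            int current = counter.get();
            if (counter.compareAndSet(current, current + 1)) {
                return; // this attempt won the race
            }
            // lost the race: re-read and try again
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    increment();
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();
        }
        System.out.println(counter.get()); // 8000: no increment is lost
    }
}
```

The analogy is loose (the transactions protocol locks the document rather than using a bare CAS, and adds an expiration time), but the contention cost is the same: throughput on a single hot document drops as writers repeatedly lose and retry.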

Ah, I think I see the problem. So you’ve specified RawJsonTranscoder as the default transcoder at the environment level, is that right? I’m not sure that combination has been tested. At that place in the code we’ve found a document that’s in another transaction, and we’re decoding the content into JSON to check whether it’s an empty document, which was the way inserts were handled in the first version of the transactions protocol. That check assumes a transcoder that can output a JsonObject, and that’s not the case for RawJsonTranscoder.
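
For anyone else hitting this: the configuration in question is setting the default transcoder at the environment level, roughly as in the fragment below. This is a sketch only; the hostname and credentials are placeholders, and the exact builder calls may differ slightly between SDK versions.

```java
// Sketch of the triggering configuration: RawJsonTranscoder installed as
// the environment-wide default. Hostname and credentials are placeholders.
ClusterEnvironment env = ClusterEnvironment.builder()
        .transcoder(RawJsonTranscoder.INSTANCE)
        .build();

Cluster cluster = Cluster.connect("couchbase://127.0.0.1",
        ClusterOptions.clusterOptions("user", "password").environment(env));
```

With this in place, every read inside the transactions code path, including the internal contentAs(JsonObject.class) check quoted earlier, goes through the raw transcoder.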

So, this looks like a bug, I’ll create a ticket for it.


Ah, I think I see the problem. So you’ve specified RawJsonTranscoder as the default transcoder at the environment level, is that right?

Yes, that is it!

It looks like that assumes a transcoder is being used that can output a JsonObject, and that’s not the case for RawJsonTranscoder.

This is what I thought as well. Will you publish a hot-fix patch for this?

I’ll see if I can get it into the next train release, which is meant to happen on the 3rd Tuesday of each month.


Sounds good!

I have removed the RawJsonTranscoder from both (1) the Scala Cluster creation and (2) the Java Cluster creation.

  1. The Scala Cluster is used mainly because the application is written in Scala. This change even simplified a few things, though I now have to call serialization explicitly in other places, for example with the Couchbase Spark Connector or a Kafka producer.

  2. The Java Cluster is used to create the Transactions object (there is no Scala library for that). There I currently have to serialize back and forth on KV operations, since the default transcoder supports only JsonObject, so insert and other KV operations now accept JsonObject. (Remember, I switched to the default transcoder in the first place to mitigate the bug above.)

if (doc.links().isDeleted() || doc.contentAs(JsonObject.class).isEmpty()) {

Since we don’t serialize to JsonObject, we convert our Entity back and forth. This keeps our serialization pluggable with other systems, and with this workaround it works with Transactions (though it is very expensive).

transactionContext.insert(
  javaCollection,
  entity.ID.stringify,
  JsonObject.fromJson(formatted.writeEntity(entity))
)

Thanks!

@zoltan.zvara understood, thanks for the insight.

This change is more impactful than I first thought, as the simple fix (removing the contentAs call) drops all backwards compatibility with existing protocol 1.0 transactions. So I’m not going to get it into the next train release at this point, sorry, as I need to give it a bit more consideration. I’ll see if I can keep the compatibility but do it more efficiently, and of course without the breakage you’ve seen. Hopefully I’ll have it out in the subsequent train release.


Sorry @zoltan.zvara, two big new features in this train release (queries in transactions, and custom metadata collections) consumed all my time, and to be honest it’s now unlikely that I’ll be able to get this resolved this year. Apologies, but it is on the roadmap.

@graham.pople My post above works around the problem, but performance-wise it is still slow. Thanks for the update!