OverflowException when using PersistTo.ACTIVE with SDK 3.1.4

I was trying out the newest Java SDK (3.1.4) and noticed that the client occasionally throws a reactor.core.Exceptions$OverflowException. Full stack trace:

java.util.concurrent.CompletionException: reactor.core.Exceptions$OverflowException: Could not emit value due to lack of requests
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
	at reactor.core.publisher.MonoToCompletableFuture.onError(MonoToCompletableFuture.java:76)
	at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onError(FluxDoFinally.java:136)
	at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
	at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onError(FluxTimeout.java:218)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onError(MonoIgnoreElements.java:83)
	at reactor.core.publisher.FluxTake$TakeSubscriber.onError(FluxTake.java:143)
	at reactor.core.publisher.FluxSkipWhile$SkipWhileSubscriber.onError(FluxSkipWhile.java:152)
	at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
	at reactor.core.publisher.FluxRepeatWhen$RepeatWhenMainSubscriber.whenError(FluxRepeatWhen.java:194)
	at reactor.core.publisher.FluxRepeatWhen$RepeatWhenOtherSubscriber.onError(FluxRepeatWhen.java:244)
	at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerError(FluxConcatMap.java:308)
	at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onError(FluxConcatMap.java:872)
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:127)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: reactor.core.Exceptions$OverflowException: Could not emit value due to lack of requests
	at reactor.core.Exceptions.failWithOverflow(Exceptions.java:233)
	... 8 more

It occurs only when using PersistTo.ACTIVE or PersistTo.ONE (possibly with higher values as well, but I tested on a single node only). With PersistTo.NONE it works fine.

Tested with Couchbase community 6.0 and SDK 3.1.4.

Here is a test that I used to reliably reproduce the issue:

    @Test
    public void reproduce() {
        final Cluster cluster = Cluster.connect(NODE_ADDRESSES, USERNAME, PASSWORD);
        final AsyncCollection collection = cluster.async().bucket(BUCKET_NAME).defaultCollection();

        final UpsertOptions options = UpsertOptions.upsertOptions().durability(PersistTo.ACTIVE, ReplicateTo.NONE);

        for (int i = 0; i < 10000; i++) {
            try {
                collection.upsert(UUID.randomUUID().toString(), JsonObject.create().put("foo", "bar"), options).join();
            } catch (final Exception ex) {
                System.out.println("Exception in iteration " + i);
                ex.printStackTrace();
            }
        }
    }
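
Because join() wraps the failure in a CompletionException (as the trace shows), the catch block in a test like this one could walk the cause chain to count only the overflow failures and ignore unrelated errors. A minimal sketch using only the JDK; the class and method names (OverflowCheck, hasCause) are made up for illustration, and the class name is compared as a string so the example compiles without Reactor on the classpath:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class OverflowCheck {
    // Binary class name copied from the stack trace above.
    static final String OVERFLOW_CLASS = "reactor.core.Exceptions$OverflowException";

    // Returns true if t or any exception in its cause chain has the given class name.
    // The depth limit is only a guard against unusually long or cyclic chains.
    static boolean hasCause(Throwable t, String className) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            if (c.getClass().getName().equals(className)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a failed upsert: a future completed with some exception.
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("stand-in failure"));
        try {
            failed.join();
        } catch (CompletionException ex) {
            // The wrapped cause is found by name; the Reactor overflow is not present here.
            System.out.println(hasCause(ex, IllegalStateException.class.getName()));
            System.out.println(hasCause(ex, OVERFLOW_CLASS));
        }
    }
}
```

In the loop above, the catch block would call hasCause(ex, OverflowCheck.OVERFLOW_CLASS) and increment a counter when it returns true.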

And here is how I set up a Couchbase instance:

docker run -d --name couchbase -p 8091-8094:8091-8094 -p 11210:11210 couchbase:community-6.0.0

docker exec -it couchbase couchbase-cli cluster-init -c localhost:8091 --cluster-username "$COUCHBASE_USER" --cluster-password "$COUCHBASE_PASSWORD" --cluster-ramsize=600

docker exec -it couchbase couchbase-cli bucket-create -c localhost:8091 --bucket default --bucket-type couchbase --bucket-ramsize 500 --bucket-replica 1 --bucket-priority high --username "$COUCHBASE_USER" --password "$COUCHBASE_PASSWORD"

On my machine, exceptions in the test above occur on average once every ~800 upserts. They seemed more frequent in my normal application tests, where I used multiple operations, and even more frequent on the slower machines in our CI pipeline.

The same thing happens when the reactive API is used (cluster.reactive() instead of cluster.async()). I also ran a similar test with the old SDK (2.7) and observed nothing like this.

Has anyone experienced similar issues? I would appreciate any help, since we are stuck on SDK 2.7 because of this.

Thanks.

@kstrek indeed, this looks like an issue with the client. Thanks for reporting; we’ll investigate.

Turns out it’s a race in Reactor, but with the help of the Reactor team I think I know which workaround to apply so it won’t happen anymore.

edit: tracking ticket https://issues.couchbase.com/browse/JVMCBC-967

Hello @daschl, what is the workaround for SDK users? We upgraded to 3.1.4 and encountered the same issue.

@chentaoz the fix will be in 3.1.5, which we are planning to release next week. As a workaround you can use the new sync durability (DurabilityRequirements) instead of PersistTo/ReplicateTo.
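
For readers on 3.1.4, the workaround might look roughly like the sketch below. As far as I know, SDK 3.x exposes synchronous durability through the DurabilityLevel enum passed to durability(...), and it needs Couchbase Server 6.5 or later, so it would not apply to the Community 6.0 setup from the original post. The wrapper class name (SyncDurabilityOptions) and the choice of MAJORITY_AND_PERSIST_TO_ACTIVE as the closest match to PersistTo.ACTIVE are assumptions for illustration, not something confirmed in this thread:

```java
import com.couchbase.client.core.msg.kv.DurabilityLevel;
import com.couchbase.client.java.kv.UpsertOptions;

public class SyncDurabilityOptions {
    // Replaces durability(PersistTo.ACTIVE, ReplicateTo.NONE) from the reproduction test.
    // Assumption: MAJORITY_AND_PERSIST_TO_ACTIVE is the nearest equivalent; it also waits
    // for a majority of replicas, which the PersistTo/ReplicateTo variant did not.
    static UpsertOptions options() {
        return UpsertOptions.upsertOptions()
            .durability(DurabilityLevel.MAJORITY_AND_PERSIST_TO_ACTIVE);
    }
}
```

The returned options object would be passed to collection.upsert(...) in place of the original one. Note that on a single node with a one-replica bucket, the server may reject such writes because a majority cannot be reached.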

Hey @daschl, our server is 6.5. Will 3.1.5 support 6.5, or should we downgrade the SDK to a lower version to avoid the issue?

3.1.5 went out today, so you can pick it up. The issue should be fixed and you can use PersistTo/ReplicateTo again.
