UnambiguousTimeoutException with Java SDK

We call the same endpoint repeatedly and intermittently get an UnambiguousTimeoutException. The application runs on a 10-core machine.

SDK Version: 3.2.6

Java Code:

ClusterEnvironment.Builder couchbaseEnvironmentBuilder = ClusterEnvironment.builder()
    .ioConfig(IoConfig.enableMutationTokens(true).enableDnsSrv(false))
    .timeoutConfig(TimeoutConfig.connectTimeout(Duration.ofMinutes(2)));
clusterEnvironment = couchbaseEnvironmentBuilder.build();
paceCluster = Cluster.connect("xxxxxx110035.xxxx.com", ClusterOptions.clusterOptions(
        "xxxxxxxxxx",
        "xxxxxxxxxx")
    .environment(clusterEnvironment));
paceAsyncBucket = paceCluster.bucket("pace").async();
fallbackAsyncBucket = Cluster.connect("xxxxxx110035.xxxx.com", ClusterOptions.clusterOptions(
        "xxxxxxxxxx",
        "xxxxxxxxxx")
    .environment(clusterEnvironment)).bucket("pace").async();
CompletableFuture.allOf(
        paceAsyncBucket.waitUntilReady(Duration.ofMinutes(2)),
        fallbackAsyncBucket.waitUntilReady(Duration.ofMinutes(2)))
    .get();

java.util.concurrent.CompletionException: com.couchbase.client.core.error.UnambiguousTimeoutException: GetRequest, Reason: TIMEOUT {"cancelled":true,"completed":true,"coreId":"0xd38cbae10000000a","idempotent":true,"lastChannelId":"D38CBAE10000000A/0000000031F9E225","lastDispatchedFrom":"10.99.251.26:56622","lastDispatchedTo":"lpdo50431.xxx.xxxx.com:11210","reason":"TIMEOUT","requestId":16231,"requestType":"GetRequest","retried":0,"service":{"bucket":"pace","collection":"_default","documentId":"XXXX_XXX_XXX::000556C9-9B46-432F-A5B6-042F8400C0B1::V1","opaque":"0x1da0","scope":"_default","type":"kv","vbucket":386},"timeoutMs":2500}
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at com.couchbase.client.core.msg.BaseRequest.cancel(BaseRequest.java:189)
at com.couchbase.client.core.msg.Request.cancel(Request.java:70)
at com.couchbase.client.core.Timer.lambda$register$2(Timer.java:157)
at com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
at com.couchbase.client.core.deps.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
at com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
at com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
at com.couchbase.client.core.deps.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
at com.couchbase.client.core.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.couchbase.client.core.error.UnambiguousTimeoutException: GetRequest, Reason: TIMEOUT {"cancelled":true,"completed":true,"coreId":"0xd38cbae10000000a","idempotent":true,"lastChannelId":"D38CBAE10000000A/0000000031F9E225","lastDispatchedFrom":"10.99.251.26:56622","lastDispatchedTo":"lpdo50431.xxx.xxxx.com:11210","reason":"TIMEOUT","requestId":16231,"requestType":"GetRequest","retried":0,"service":{"bucket":"pace","collection":"_default","documentId":"XXXX_XXX_XXX::000556C9-9B46-432F-A5B6-042F8400C0B1::V1","opaque":"0x1da0","scope":"_default","type":"kv","vbucket":386},"timeoutMs":2500,"timings":{"totalMicros":2517252}}
at com.couchbase.client.core.msg.BaseRequest.cancel(BaseRequest.java:184)
… 9 more
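
For anyone reading the trace above: the timeout arrives wrapped in a CompletionException because it surfaced through a dependent CompletableFuture stage. Below is a minimal sketch of peeling off that wrapper before deciding how to handle the failure. FutureErrors and unwrap are hypothetical names, and java.util.concurrent.TimeoutException is only a stand-in so the example runs without the SDK on the classpath; in real code you would check for the Couchbase exception type instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper, not part of the Couchbase SDK. */
final class FutureErrors {

    /** Strips CompletionException / ExecutionException wrappers and returns the underlying failure. */
    static Throwable unwrap(Throwable t) {
        Throwable current = t;
        while ((current instanceof CompletionException || current instanceof ExecutionException)
                && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        CompletableFuture<String> get = new CompletableFuture<>();
        // Stand-in for the SDK timeout (assumption: a JDK exception is used here
        // only to keep the sketch self-contained).
        get.completeExceptionally(new TimeoutException("GetRequest, Reason: TIMEOUT"));
        try {
            // The dependent stage rethrows the failure wrapped in CompletionException.
            get.thenApply(String::length).join();
        } catch (CompletionException e) {
            Throwable root = unwrap(e);
            System.out.println(root.getClass().getSimpleName() + ": " + root.getMessage());
        }
    }
}
```

Checking the unwrapped type rather than the wrapper keeps retry or fallback logic from treating every CompletionException the same way.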

@himanshu.mps - according to the message, there was no response within the timeout period (2.5 seconds). Perhaps the server (or client) is over-subscribed? It would be interesting to know if other requests that succeed are approaching the timeout. It would also help to see the Cluster Options and the code that calls get() (complete with options).
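
One way to check whether the requests that succeed are approaching the timeout is to time each async call on the client side. This is only a rough sketch: LatencyProbe, Sample, and the 0.8 threshold are made-up names and values for illustration, and the supplier stands in for whatever call returns your CompletableFuture (for example, an async get).

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper: times an async call and reports how close it came to the timeout. */
final class LatencyProbe {

    /** Outcome of one timed call. */
    static final class Sample {
        final Duration elapsed;
        final boolean nearTimeout;
        final boolean failed;

        Sample(Duration elapsed, boolean nearTimeout, boolean failed) {
            this.elapsed = elapsed;
            this.nearTimeout = nearTimeout;
            this.failed = failed;
        }
    }

    /** True when elapsed is at least the given fraction (e.g. 0.8) of the timeout. */
    static boolean isNearTimeout(Duration elapsed, Duration timeout, double fraction) {
        return elapsed.toNanos() >= (long) (timeout.toNanos() * fraction);
    }

    /** Runs the call and completes with a Sample whether the call succeeds or fails. */
    static <T> CompletableFuture<Sample> timed(
            Supplier<CompletableFuture<T>> call, Duration timeout, double fraction) {
        long start = System.nanoTime();
        return call.get().handle((value, error) -> {
            Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
            return new Sample(elapsed, isNearTimeout(elapsed, timeout, fraction), error != null);
        });
    }

    public static void main(String[] args) {
        Duration kvTimeout = Duration.ofMillis(2500); // matches timeoutMs in the error above
        Sample s = timed(() -> CompletableFuture.supplyAsync(() -> "doc"), kvTimeout, 0.8).join();
        System.out.println("elapsed=" + s.elapsed.toMillis() + "ms nearTimeout=" + s.nearTimeout
                + " failed=" + s.failed);
    }
}
```

If many successful calls are flagged as near the timeout, that points toward general slowness rather than a few isolated stalls.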

I suggest opening a case with customer support so they can get accurate details and offer troubleshooting advice. SDK Doctor is a troubleshooting tool, but may not be helpful for intermittent issues.

Hi @himanshu.mps

I wrote this a while back as my personal checklist for looking into performance issues, specifically slower-than-desired KV ops: Couchbase performance issue - slowness - #4 by graham.pople. It may give you some pointers for diagnosing this, particularly the second half of the checklist, which is more centred on diagnostics.

(I know you’re reporting timeouts rather than slow ops, but there’s a lot of overlap between the two: increase your KV timeouts to 10 minutes and now you have the latter rather than the former.)
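
To illustrate that overlap, here is a small simulation using only the JDK. It does not use the Couchbase SDK or its timeout mechanism; slowGet is a made-up stand-in for a KV read that always takes a fixed time, and the class and method names are hypothetical. The same operation shows up as a timeout under a short limit and as a slow success under a long one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Simulation only: same slow operation, two different timeout budgets. */
final class TimeoutVersusSlowOp {

    /** Simulated read that completes after roughly opMillis. */
    static CompletableFuture<String> slowGet(long opMillis) {
        return CompletableFuture.supplyAsync(
                () -> "doc",
                CompletableFuture.delayedExecutor(opMillis, TimeUnit.MILLISECONDS));
    }

    /** Returns true if the simulated read exceeded timeoutMillis, false if it finished in time. */
    static boolean timesOut(long opMillis, long timeoutMillis) {
        try {
            slowGet(opMillis).orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
            return false;
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return true;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // A 200 ms operation against a 50 ms budget is reported as a timeout...
        System.out.println("short timeout -> timed out: " + timesOut(200, 50));
        // ...while the same operation against a 2000 ms budget is just a slow success.
        System.out.println("long timeout  -> timed out: " + timesOut(200, 2000));
    }
}
```

Either way the underlying latency is the same, which is why a slow-op checklist is still useful when the symptom is a timeout.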

It was a mistake on my end. We had a circuit breaker that was causing the issue.

Hi @himanshu.mps, can you tell me which circuit breaker you were using? I’m facing the same issue. Thanks in advance.

@mehak28

I am using resilience4j.

I am also getting the same error. I am not sure whether it is caused by the upgrade to the Couchbase .NET SDK 3.2.9 or by something else.

Hi All

I am getting this error on our new cluster. Any solution or fix you can share would be appreciated.

Please help

Follow the documentation: Handling Errors | Couchbase Docs and Troubleshooting Cloud Connections | Couchbase Docs.

If you still have issues, make a new post describing your specific issue and include your exact exception and stack trace.