AmbiguousTimeoutException during performance test

Hi all,

We have a Spring Boot application that uses Java SDK client 3.0.4 with Couchbase Server 6.5, running inside Kubernetes.

We are trying to execute 50 HTTP requests per second across 4 pods against 6 Couchbase nodes (3 data and 3 query) that contain 500,000 items, and 50% of the requests have an execution time greater than 5 s.

The main error is:
com.couchbase.client.core.error.AmbiguousTimeoutException: QueryRequest
at com.couchbase.client.java.AsyncUtils.block(AsyncUtils.java:51)
at com.couchbase.client.java.Cluster.query(Cluster.java:393)
at com.carrefour.fr.cs.slot.infra.repository.Couchbase.query(Couchbase.java:38)

More detail:
com.couchbase.client.core.error.AmbiguousTimeoutException: QueryRequest {"cancelled":true,"completed":true,"coreId":"0x57cfd6b600000001","idempotent":false,"reason":"TIMEOUT","requestId":2210,"requestType":"QueryRequest","retried":12,"retryReasons":["ENDPOINT_NOT_AVAILABLE","ENDPOINT_TEMPORARILY_NOT_AVAILABLE"],"service":{"operationId":"24d826b0-0574-4de5-86ed-3b46a13a3c9e","statement":"select agendaType from slots where type = 'AGENDA' and metiCode = $metiCode","type":"query"},"timeoutMs":10000,"timings":{"totalMicros":10504532}}] with root cause

We have checked the indexes and everything seems fine.
The server did not capture any slow queries.

The client is configured with query_timeout = 5s.

Do you have any idea what the problem is?

Best regards,
Guillaume

Hi!

Same here. Tried SDK 3.0.6 and 3.0.8, using the reactive API. Couchbase Server 6.5.1. Got this exception on upsert and mutateIn. The only way to continue upserting data is to restart the container.


Fixed the issue by deploying the Couchbase cluster on nodes with SSDs. I think that implementing backpressure would be helpful in that case.
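
For anyone wondering what backpressure could look like here, below is a minimal sketch that caps the number of asynchronous operations in flight using only the JDK. The class name InFlightLimiter and its methods are hypothetical, not part of the Couchbase SDK, and the right limit depends on what the cluster can sustain.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper: caps how many asynchronous operations run at once. */
public class InFlightLimiter {
    private final Semaphore permits;

    public InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a slot is free, then starts the operation. */
    public <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = operation.get();
        } catch (RuntimeException e) {
            // The operation never started, so give the slot back.
            permits.release();
            throw e;
        }
        // Free the slot when the operation finishes, whether it succeeded or failed.
        return future.whenComplete((result, error) -> permits.release());
    }

    /** Number of operations that could be started right now without blocking. */
    public int availableSlots() {
        return permits.availablePermits();
    }
}
```

With the SDK's async API, the supplier would wrap a call that returns a CompletableFuture, for example an upsert on collection.async(). With the reactive API, the concurrency argument of Reactor's flatMap can play a similar role.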

Hi!

I also sometimes face the same issue with SDK 3.0.9 and Couchbase Server 6.5.1. On restarts of my application, all replace/upsert operations start throwing AmbiguousTimeoutException repeatedly. The strange thing is that N1QL queries run successfully. Only after restarting the application do the replace/upsert operations become stable.


We experienced a similar issue.
In our case, we ultimately concluded that the issue was caused by the Java service running out of heap memory, which was completely unexpected. We thought it was a problem with the Couchbase server, but it turned out to be a GC issue on the client that caused the delay.

Take a good look at the GC Time entry.
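
The post does not say which tool shows that GC Time entry. As one way to get a comparable figure from inside the client JVM, here is a small sketch using the standard java.lang.management beans. The class name GcReport is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports how much time the JVM has spent in garbage collection. */
public class GcReport {

    /** Total accumulated GC time in milliseconds since JVM start. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        Runtime rt = Runtime.getRuntime();
        long usedHeap = rt.totalMemory() - rt.freeMemory();
        System.out.printf("GC time: %d ms of %d ms uptime%n", totalGcTimeMillis(), uptimeMillis);
        System.out.printf("Heap used: %d MB of %d MB max%n",
                usedHeap / (1024 * 1024), rt.maxMemory() / (1024 * 1024));
    }
}
```

If GC time grows quickly relative to uptime while the timeouts occur, the client heap is a more likely suspect than the server.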

ENDPOINT_NOT_AVAILABLE on startup of an application against a healthy cluster usually indicates that the SDK has not had enough time to complete initialization before requests were made (initialization is asynchronous). The SDK method waitUntilReady can be called by the application to wait asynchronously for initialization to complete before proceeding to send requests.
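
A minimal sketch of that with the blocking API is shown below. The connection string, credentials, 10-second limit and class name are placeholders, and the bucket name "slots" is assumed from the query in the first post.

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import java.time.Duration;

public class StartupExample {
    public static void main(String[] args) {
        // Placeholder connection details; replace with your own.
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "username", "password");
        Bucket bucket = cluster.bucket("slots"); // bucket name assumed from the query above

        // Block until the SDK has finished initializing, or fail with a
        // timeout exception after 10 seconds (an arbitrary value).
        bucket.waitUntilReady(Duration.ofSeconds(10));

        Collection collection = bucket.defaultCollection();
        // From here on it should be safe to issue upsert/replace/query calls.

        cluster.disconnect();
    }
}
```

On the blocking API this call blocks the calling thread. The async and reactive variants of the cluster and bucket expose a method of the same name that returns a future or publisher to wait on instead.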