AmbiguousTimeoutException during performance test

galbini · September 2, 2020, 6:17am

Hi all,

We have a Spring Boot application that uses Java SDK client 3.0.4 with a Couchbase Server 6.5 cluster that runs inside Kubernetes.

We try to execute 50 HTTP requests per second across 4 pods against 6 Couchbase nodes (3 data and 3 query) that contain 500,000 items, and 50% of the requests take longer than 5 s.

The main error is:
com.couchbase.client.core.error.AmbiguousTimeoutException: QueryRequest
at com.couchbase.client.java.AsyncUtils.block(AsyncUtils.java:51)
at com.couchbase.client.java.Cluster.query(Cluster.java:393)
at com.carrefour.fr.cs.slot.infra.repository.Couchbase.query(Couchbase.java:38)

More detail:
com.couchbase.client.core.error.AmbiguousTimeoutException: QueryRequest {"cancelled":true,"completed":true,"coreId":"0x57cfd6b600000001","idempotent":false,"reason":"TIMEOUT","requestId":2210,"requestType":"QueryRequest","retried":12,"retryReasons":["ENDPOINT_NOT_AVAILABLE","ENDPOINT_TEMPORARILY_NOT_AVAILABLE"],"service":{"operationId":"24d826b0-0574-4de5-86ed-3b46a13a3c9e","statement":"select agendaType from slots where type = 'AGENDA' and metiCode = $metiCode","type":"query"},"timeoutMs":10000,"timings":{"totalMicros":10504532}}] with root cause

We have checked the indexes and everything looks fine.
The server has not captured any slow queries.

The client is configured with query_timeout = 5s.

Do you have any idea what the problem is?

Best regards,
Guillaume

dotnetfx · September 14, 2020, 2:23am

Hi!

Same here. Tried SDK 3.0.6 and 3.0.8, using the reactive API. Couchbase Server 6.5.1. Got this exception on upsert and mutateIn. The only way to continue upserting data is to restart the container.

dotnetfx · September 14, 2020, 11:07pm

Fixed the issue by deploying the Couchbase cluster on nodes with SSDs. I think implementing backpressure would be helpful in that case.
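
The post does not say how backpressure would be implemented. As a minimal sketch of one option, the helper below caps the number of asynchronous operations in flight with a semaphore, so a slow cluster slows the producer down instead of letting requests pile up until they time out. `InFlightLimiter` and its methods are hypothetical names, not part of the Couchbase SDK, and the operation passed in stands for any call that returns a `CompletableFuture` (for example an upsert through the async API).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not an SDK class): limits concurrent async operations.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller until a slot is free, then starts the operation.
    // The slot is handed back when the returned future completes,
    // whether it succeeds or fails.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
        permits.acquireUninterruptibly();
        try {
            return operation.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            // The operation could not even be started: free the slot.
            permits.release();
            throw e;
        }
    }

    // Number of operations that could be started right now without blocking.
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

With the reactive API, a comparable bound can be set through the concurrency argument of Reactor's `flatMap`. The right limit depends on the cluster and has to be found by measurement.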

Sahil333 · April 12, 2021, 5:02am

Hi!

I also sometimes face the same issue with SDK 3.0.9 and Couchbase Server 6.5.1. On restarts of my application, all the replace/upsert operations start throwing AmbiguousTimeoutException repeatedly. Strangely, N1QL queries run successfully. Only after restarting the application do the replace/upsert operations become stable.

snowvil84 · March 20, 2024, 2:36am

We experienced a similar issue.
In our case, we ultimately concluded that the issue was caused by the Java service running out of heap memory. This was completely unexpected: we thought it was a problem with the Couchbase server, but it turned out to be a GC issue on the client that caused the delay.

Take a good look at the GC Time entry.
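
The post does not say which tool's GC Time entry is meant. As a rough sketch, a comparable number can be read from inside the client JVM with the standard `java.lang.management` API. `GcTimeProbe` and its methods are hypothetical names chosen for illustration. Note that for concurrent collectors the reported time is not purely pause time, so treat the result as an indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads accumulated GC time from the running JVM.
final class GcTimeProbe {

    // Total accumulated collection time across all collectors, in milliseconds.
    // getCollectionTime() returns -1 when a collector does not report it.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Fraction of a wall-clock interval spent in GC, given two samples
    // of totalGcMillis() taken at the start and end of that interval.
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return (double) (gcMillisAfter - gcMillisBefore) / wallMillis;
    }
}
```

Sampling `totalGcMillis()` before and after a load test and passing both values to `gcFraction` gives the share of the run spent collecting. A large share on the client would point to the kind of heap pressure described above rather than to the server.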

mreiche · March 20, 2024, 2:56pm

ENDPOINT_NOT_AVAILABLE on startup of an application against a healthy cluster usually indicates that the SDK has not had enough time to complete initialization before requests were made (initialization is asynchronous). The SDK method waitUntilReady can be called by the application to wait asynchronously for initialization to complete before proceeding to send requests.