Intermittent Couchbase insert failure

We have a 5-node (m1.large) Couchbase cluster on AWS.
We are using Java SDK 2.7.9.

We are continuously pushing data to Couchbase. After a few iterations the application starts throwing timeout exceptions like the one below:

2019-09-17T05:04:18.399831461Z app_name=default-java container_name=k8s_default-java_default-java-deployment-12-j85gf_one-data_0f3df934-d708-11e9-b235-fa163e47598a_0 environment=e3_ipc1 ns=one-data pod_container=default-java pod_ip=192.168.31.152 pod_name=default-java-deployment-12-j85gf message={“name”:“com.axp.rest.services.CreateCardApplicationDaoService”,“timestamp”:“2019-09-17T05:04:18.399Z”,“level”:“error”,“schemaVersion”:“0.1”,“message”:“4185bfbb-fa6c-4567-839a-4de0eeb09699538c0a6a-e33c-4f9e-a137-41d8387faaa2 :: one-apply-for-card-sor :: CreateCardApplicationDaoService :: triggerCreateCardAppService :: Exception : {}”,“error”:{“name”:“java.lang.RuntimeException”,“message”:“java.util.concurrent.TimeoutException: {“b”:“ACTIVEAPPLICATIONS”,“s”:“kv”,“t”:2500000,“i”:“0xc7c32”}”,“throwLocation”:{“file”:“CreateCardApplicationDaoService.java”,“method”:“triggerCreateCardAppService”,“lineNumber”:80},“stackTrace”:“java.lang.RuntimeException: java.util.concurrent.TimeoutException: {“b”:“ACTIVEAPPLICATIONS”,“s”:“kv”,“t”:2500000,“i”:“0xc7c32”}\n\tat rx.exceptions.Exceptions.propagate(Exceptions.java:57)\n\tat rx.observables.BlockingObservable.blockForSingle(BlockingObservable.java:463)\n\tat rx.observables.BlockingObservable.single(BlockingObservable.java:340)\n\tat com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:310)\n\tat com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:305)\n\tat com.axp.dao.config.CouchbaseRepository.create(CouchbaseRepository.java:101)\n\tat com.axp.dao.config.CouchbaseDAO.createByObjectType(CouchbaseDAO.java:93)\n\tat com.axp.dao.service.ApplicationServiceImpl.createApplicationByBucket(ApplicationServiceImpl.java:96)\n\tat com.axp.dao.service.ApplicationServiceImpl.createApplicationForSnapshot(ApplicationServiceImpl.java:154)\n\tat com.axp.dao.service.ApplicationServiceImpl.createApplication(ApplicationServiceImpl.java:82)\n\tat 
com.axp.rest.services.CreateCardApplicationDaoService.triggerCreateCardAppService(CreateCardApplicationDaoService.java:54)\n\tat com.axp.services.CreateCardApplicationService.invokeService(CreateCardApplicationService.java:20)\n\tat com.axp.services.CreateCardApplicationService.execute(CreateCardApplicationService.java:37)\n\tat com.axp.servicehandlers.impl.CardApplicationServiceHandlerImpl.createCardApplication(CardApplicationServiceHandlerImpl.java:18)\n\tat com.axp.verticles.CreateCardApplicationCompositeServiceVerticle.lambda$null$3(CreateCardApplicationCompositeServiceVerticle.java:212)\n\tat io.vertx.core.impl.CompositeFutureImpl.lambda$all$1(CompositeFutureImpl.java:49)\n\tat io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:125)\n\tat io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:86)\n\tat com.axp.servicehandlers.impl.CommonServiceHandlerImpl.lambda$null$6(CommonServiceHandlerImpl.java:53)\n\tat io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:125)\n\tat io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:86)\n\tat com.axp.services.BaseServiceAsync.lambda$null$2(BaseServiceAsync.java:49)\n\tat io.vertx.core.impl.FutureImpl.setHandler(FutureImpl.java:79)\n\tat com.axp.services.BaseServiceAsync.lambda$null$3(BaseServiceAsync.java:48)\n\tat io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:125)\n\tat io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:86)\n\tat com.axp.services.ClicService.lambda$null$2(ClicService.java:68)\n\tat io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:125)\n\tat io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:86)\n\tat com.axp.services.BaseServiceAsync.lambda$null$0(BaseServiceAsync.java:33)\n\tat io.vertx.core.eventbus.impl.EventBusImpl.lambda$convertHandler$3(EventBusImpl.java:348)\n\tat io.vertx.core.eventbus.impl.HandlerRegistration.deliver(HandlerRegistration.java:276)\n\tat io.vertx.core.eventbus.impl.HandlerRegistration.handle(HandlerRegistration.java:254)\n\tat 
io.vertx.core.eventbus.impl.EventBusImpl$InboundDeliveryContext.next(EventBusImpl.java:578)\n\tat io.vertx.core.eventbus.impl.EventBusImpl.lambda$deliverToHandler$5(EventBusImpl.java:537)\n\tat io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:320)\n\tat io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:34)\n\tat io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.lang.Thread.run(Thread.java:748)\nCaused by: java.util.concurrent.TimeoutException: {“b”:“ACTIVEAPPLICATIONS”,“s”:“kv”,“t”:2500000,“i”:“0xc7c32”}\n\tat com.couchbase.client.java.bucket.api.Utils$1.call(Utils.java:131)\n\tat com.couchbase.client.java.bucket.api.Utils$1.call(Utils.java:127)\n\tat rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)\n\tat rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber.onTimeout(OnSubscribeTimeoutTimedWithFallback.java:166)\n\tat rx.internal.operators.OnSubscribeTimeoutTimedWithFallback$TimeoutMainSubscriber$TimeoutTask.call(OnSubscribeTimeoutTimedWithFallback.java:191)\n\tat rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\t… 1 common frames omitted\n”}}

When the insert succeeds, it returns within a few hundred ms; otherwise it seems never to return (I tried timeouts of up to 5 minutes).
Also, even though a timeout exception is thrown, the data is actually written to Couchbase.

We are in the middle of a critical production launch, so any help here is really appreciated. Thank you!!!

First off, if you’re going to production soon and you have an Enterprise subscription (which I suspect you do, based on what I see there), you should probably open a case with Couchbase Support. They can analyze telemetry from your cluster, look through your full logs, and help in a more general fashion. The forums have no formal service level.

From what I see here, you are doing an insert() operation, and it is timing out with the default 2.5s timeout.

Caused by: java.util.concurrent.TimeoutException: {“b”:“ACTIVEAPPLICATIONS”,“s”:“kv”,“t”:2500000,“i”:“0xc7c32”}
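
To make that message easier to read, here is a minimal sketch that pulls the fields out of it using only the JDK. The class name TimeoutContext is made up for illustration and is not part of the SDK. It assumes the usual meaning of the keys ("b" bucket, "s" service, "t" timeout in microseconds, "i" operation id) and that the message uses plain ASCII quotes, as it does in the raw log before the forum reformats them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes the compact context in a timeout message. */
final class TimeoutContext {
    final String bucket;       // "b": bucket name
    final String service;      // "s": service type, e.g. "kv"
    final long timeoutMicros;  // "t": timeout that elapsed, assumed to be microseconds
    final String operationId;  // "i": operation id (hex), useful for correlation

    private TimeoutContext(String bucket, String service, long timeoutMicros, String operationId) {
        this.bucket = bucket;
        this.service = service;
        this.timeoutMicros = timeoutMicros;
        this.operationId = operationId;
    }

    static TimeoutContext parse(String message) {
        return new TimeoutContext(
                stringField(message, "b"),
                stringField(message, "s"),
                Long.parseLong(field(message, "\"t\"\\s*:\\s*(\\d+)")),
                stringField(message, "i"));
    }

    // Extracts the quoted string value that follows "key":
    private static String stringField(String message, String key) {
        return field(message, "\"" + key + "\"\\s*:\\s*\"([^\"]*)\"");
    }

    // Returns the first capture group of the first match, or fails loudly.
    private static String field(String message, String regex) {
        Matcher m = Pattern.compile(regex).matcher(message);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + regex);
        }
        return m.group(1);
    }

    double timeoutSeconds() {
        return timeoutMicros / 1_000_000.0;
    }

    public static void main(String[] args) {
        TimeoutContext ctx = parse(
                "{\"b\":\"ACTIVEAPPLICATIONS\",\"s\":\"kv\",\"t\":2500000,\"i\":\"0xc7c32\"}");
        System.out.println(ctx.bucket + " " + ctx.service + " "
                + ctx.timeoutSeconds() + "s " + ctx.operationId);
    }
}
```

For the message above, the "t" value of 2500000 works out to 2.5 seconds, which is where the 2.5s figure comes from.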

A TimeoutException is always the effect of some other cause. That cause could be anything from GC pauses (not likely here) to network interruptions to a failure that hasn’t been detected yet. Since this error comes from the thread where the operation timed out, rather than from wherever the underlying problem occurred, you’ll need to investigate a bit further.

Some things to look for:

  1. Do other parts of the logs show connections being disrupted? You’d see that logged from the threads that are managing IO.
  2. That identifier (the hex value) in the timeout message can be correlated to the server. Since it’s an insert and over 2.5 seconds, if there is no network failure, there may be some correlating information in the server log.
  3. You should check your logs overall for the Response Time Observability output (blog here too). If you see a lot of timeouts in a short period of time across all servers, that may point to something environmental. We’ve seen, for example, overcommitted CPU or network in a virtualized environment cause this kind of issue, with a large number of timeouts in a short period of time.
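
One rough way to check for the "many timeouts in a short period" pattern from the last point is to count timeout lines per minute in the application log. This is only a sketch using the JDK: the class name TimeoutBursts is hypothetical, and it assumes each log line starts with an ISO-8601 UTC timestamp followed by a space, as in the log excerpt in the question.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts TimeoutException log lines per minute. */
final class TimeoutBursts {

    /** Returns a map from minute (UTC, truncated) to number of timeout lines, in time order. */
    static Map<Instant, Integer> perMinute(List<String> logLines) {
        Map<Instant, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            if (!line.contains("TimeoutException")) {
                continue; // not a timeout line
            }
            int space = line.indexOf(' ');
            if (space < 0) {
                continue; // no timestamp prefix to parse
            }
            Instant minute;
            try {
                // Assumed line format: "<ISO-8601 instant> <rest of line>"
                minute = Instant.parse(line.substring(0, space))
                        .truncatedTo(ChronoUnit.MINUTES);
            } catch (DateTimeParseException e) {
                continue; // skip lines that do not start with a timestamp
            }
            counts.merge(minute, 1, Integer::sum);
        }
        return counts;
    }
}
```

Minutes with a sharp spike, especially if the timeouts in them involve every node, would fit the environmental explanation better than a problem with one server or one document.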

p.s.: thanks for posting to another topic