java.lang.RuntimeException: java.util.concurrent.TimeoutException

Hi, I have a problem with my cluster of 8 server nodes.
When one of the server nodes is down, the Couchbase client loops without upserting anything and eventually fails with this response:
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:353)
at com.couchbase.client.java.CouchbaseBucket.upsert(CouchbaseBucket.java:348)

Thanks for any advice

The design is that we will continue retrying an operation until the timeout. If the item you’re trying to work with is on the down node, the operation will time out. If you fail over the node rather than just leaving it down, you should see operations recover. There’s more on this in the docs.
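For example, a blocking upsert waits at most the operation timeout. A minimal sketch, assuming an already-opened Bucket (the key, content, and two-second timeout are illustrative):

import java.util.concurrent.TimeUnit;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

// If the vbucket for "someKey" lives on the down node, this throws
// java.lang.RuntimeException wrapping java.util.concurrent.TimeoutException.
JsonDocument doc = JsonDocument.create("someKey", JsonObject.create().put("field", "value"));
bucket.upsert(doc, 2, TimeUnit.SECONDS);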

Thanks for the answer.
The problem persists even when the node comes back up.
The system always responds with a timeout.
For now, the only workaround is to restart the service that writes the entries to the cluster.
The cluster connection settings are:

DefaultCouchbaseEnvironment.builder()
    .connectTimeout(180 * 1000)     // 180 seconds
    .keepAliveInterval(3600 * 1000) // 3600 seconds
    .build();
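Note that connectTimeout only covers bootstrap; the timeout behind blocking upsert/get calls is kvTimeout (2,500 ms by default in the 2.x SDK). A sketch of raising it, with an illustrative value:

DefaultCouchbaseEnvironment env = DefaultCouchbaseEnvironment.builder()
    .connectTimeout(180 * 1000)      // bootstrap connect timeout, ms
    .keepAliveInterval(3600 * 1000)  // keep-alive interval, ms
    .kvTimeout(10 * 1000)            // per-operation key/value timeout, ms (illustrative)
    .build();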

Which SDK version are you using?

using:
java-client-2.2.3.jar

Very similar to the problem we’re having here:

But our cluster nodes are definitely up.

@unhuman

Did you find a solution?
Thanks

@Alessandro.79

Not yet. We have checked to ensure we have a network path on all ports (8091, 8092, 8093, 11210, 11211) - we do. We are now trying turning off Full Ejection. We are also trying to test from other locations, but that’s not set up yet.
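For anyone who wants to sanity-check reachability from the JVM side, a minimal sketch (host and port are illustrative):

import java.net.InetSocketAddress;
import java.net.Socket;

// Try to open a TCP connection to the data port within 3 seconds.
try (Socket socket = new Socket()) {
    socket.connect(new InetSocketAddress("192.168.1.10", 11210), 3000);
    System.out.println("11210 reachable");
} catch (java.io.IOException e) {
    System.out.println("11210 NOT reachable: " + e.getMessage());
}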

@Alessandro.79

We just turned off Full Ejection, which restarted the bucket. Our test was able to run successfully. This doesn’t confirm it’s a fix for the problem (what is your bucket setting?), but things are working for now.

@Alessandro.79 Looks like I spoke too soon. The problem came back.

@Alessandro.79 can you try 2.3.5 just as a sanity check please?

@Alessandro.79

The application server (Couchbase client) errors mentioned above do indeed seem to have stopped with the change to the eviction policy (full -> value). We had some flaky tests that caused me to jump the gun on saying we weren’t fixed.

Since it was late Friday, I’m hesitant to confirm for sure we’re good, but I’m at least hopeful at this point. I don’t think the client version (2.2.5 -> 2.3.5) had any involvement, but we’re not rolling back to test. We’ll take our 2.3.5 as a benefit of this process.

Also, make sure your Java application is allocated enough memory; we have definitely seen oddball errors when the JVM runs out of memory.
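One cheap check is to log what the JVM actually got at startup; a sketch:

// Log the JVM's max heap to catch undersized deployments (set with -Xmx).
long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
System.out.println("JVM max heap: " + maxHeapMb + " MB");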

I’ll update again if we see anything else.

Thanks for the info!!
For now my workaround is: if I catch a “java.lang.RuntimeException: java.util.concurrent.TimeoutException”, I close the connection and re-open it.
Maybe that solves it.
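A minimal sketch of that workaround, assuming cluster and bucket are mutable fields; the node address, bucket name, and single retry are illustrative (this mirrors the workaround above, not a recommended pattern):

import java.util.concurrent.TimeoutException;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;

// On a timeout, tear down the connection, rebuild it, and retry once.
JsonDocument upsertWithReconnect(JsonDocument doc) {
    try {
        return bucket.upsert(doc);
    } catch (RuntimeException e) {
        if (!(e.getCause() instanceof TimeoutException)) {
            throw e;
        }
        cluster.disconnect();
        cluster = CouchbaseCluster.create("192.168.1.10");
        bucket = cluster.openBucket("test");
        return bucket.upsert(doc);
    }
}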

If you solve it that way, it would seem you’re just working around a client bug. But, certainly an interesting approach.

We are seeing new errors now:

    {"timestamp":"2016-12-05T15:45:46.695Z","level":"WARN","thread":"cb-io-1-2","logger":"com.couchbase.client.deps.io.netty.channel.AbstractChannel",
"message":"Force-closing a channel whose registration task was not accepted by an event loop: [id: 0x7d28468c]","context":"default",
"exception":"java.util.concurrent.RejectedExecutionException: event executor terminated
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:800)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:345)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:338)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:743)
    at com.couchbase.client.deps.io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:422)
    at com.couchbase.client.deps.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:72)
    at com.couchbase.client.deps.io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:60)
    at com.couchbase.client.deps.io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:64)
    at com.couchbase.client.deps.io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:320)
    at com.couchbase.client.deps.io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134)
    at com.couchbase.client.deps.io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:90)
    at com.couchbase.client.core.endpoint.BootstrapAdapter.connect(BootstrapAdapter.java:50)
    at com.couchbase.client.core.endpoint.AbstractEndpoint$4.call(AbstractEndpoint.java:300)
    at com.couchbase.client.core.endpoint.AbstractEndpoint$4.call(AbstractEndpoint.java:297)
    at rx.Single$1.call(Single.java:90)
    at rx.Single$1.call(Single.java:70)
    at rx.Single$2.call(Single.java:171)

and

{"timestamp":"2016-12-05T15:45:46.698Z","level":"ERROR","thread":"cb-io-1-2","logger":"com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.rejectedExecution",
"message":"Failed to submit a listener notification task. Event loop shut down?","context":"default",
"exception":"java.util.concurrent.RejectedExecutionException: event executor terminated
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:800)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:345)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:338)
    at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:743)
    at com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.safeExecute(DefaultPromise.java:767)
    at com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:435)
    at com.couchbase.client.deps.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:129)
    at com.couchbase.client.deps.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:852)

@daschl Please see ^ and provide guidance

Found this topic: “Java client don’t connect another cluster node when first is down”, which links to https://issues.couchbase.com/browse/JCBC-999

I don’t know if that will fix our current issue. We’re also going to try going back to 2.2.5 to see if this issue happens with that version. This app was recently updated from 1.4.x to 2.2.5 and started exhibiting this behavior after the upgrade.

The close-and-reopen connection workaround is not working…
I’m still looking for the reason why this happens.
Have you found a solution?

We haven’t been encountering these problems of late. I wouldn’t say we’re fixed, but for now we’ve got other things to look at. We also had too much running on the server itself, so we eliminated some design documents, etc. That may have helped…

Our cluster has stabilized so that has helped our application stabilize, I assume. Doesn’t mean we’re good forever, though.

I have the same problem with java-client 2.4.3.
My code is like this:

Cluster cluster = CouchbaseCluster.create("192.168.1.10", "192.168.1.11");
Bucket bucket = cluster.openBucket("test");
SerializableDocument doc = SerializableDocument.create("TestKey", 5000000, "Test");
bucket.upsert(doc);
Thread.sleep(50000); // during this sleep I shut down 192.168.1.10
// after the sleep I try to read the entry back
SerializableDocument read = bucket.get("TestKey", SerializableDocument.class); // this call throws the exception

After one node goes down, it throws java.lang.RuntimeException: java.util.concurrent.TimeoutException and the log shows continuous timeout errors.
This should not happen. If one node is down, it should return the value from another node.

Actually, it shouldn’t. It’s specifically designed not to automatically read from a replica unless there is a failover. Please see the documentation on reading from replicas.
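A minimal sketch of such an explicit replica read with the 2.x getFromReplica API (the fallback-to-replica policy here is illustrative, and replica reads can return stale data):

import java.util.List;
import java.util.concurrent.TimeoutException;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.ReplicaMode;
import com.couchbase.client.java.document.SerializableDocument;

// Try the active node first; on a timeout, explicitly read from the replicas.
SerializableDocument getWithReplicaFallback(Bucket bucket, String id) {
    try {
        return bucket.get(id, SerializableDocument.class);
    } catch (RuntimeException e) {
        if (!(e.getCause() instanceof TimeoutException)) {
            throw e;
        }
        List<SerializableDocument> replicas =
                bucket.getFromReplica(id, ReplicaMode.ALL, SerializableDocument.class);
        return replicas.isEmpty() ? null : replicas.get(0);
    }
}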

Also, if you have a question about a scenario, it’s probably better to start a new topic rather than pick up an old loosely related one. Thanks!

This has come back and bitten us again. Googling around, it appears to be a Netty error (Couchbase bundles Netty). I found this open Netty issue: https://github.com/netty/netty/issues/5304

We can’t reproduce this reliably, but it has recurred; we restarted the application to recover.

Thanks for any help!

-H