Problem when using Java SDK 2.1.x

hubo3085632 · May 6, 2015, 7:46am

My environment is:
CouchbaseEnvironment env = DefaultCouchbaseEnvironment
    .builder()
    .managementTimeout(5000)
    .kvTimeout(5000)
    .build();

I use this method to write:
bucket.insert(Corp,ReplicateTo.ONE);

After the failover of my cluster completes, I found that this write method is very slow, as low as 1 op/sec! At the same time, a TimeoutException occurs.

daschl · May 6, 2015, 8:00am

Hi @hubo3085632,

Can you please describe the following in a little more detail so we can help you better:

what workload are you running
what actions are you performing
what are your expectations/actual results

Maybe you can also show us some code and logs?

hubo3085632 · May 6, 2015, 8:23am

Thank you. I am trying to work out how to handle exceptions if I use Couchbase in production. I stop one node while I am writing; about 30 seconds later, the insert rate drops to almost 1 op/sec.
Here is the code:
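for (;;) {
    try {
        JsonDocument isdone = bucket.insert(Corp, ReplicateTo.ONE);
        break; // insert succeeded, leave the retry loop
    } catch (Exception e) {
        if (e instanceof RequestCancelledException) {
            System.out.println("writing failed!");
            continue; // retry immediately
        } else {
            e.printStackTrace(); // logged, then the loop retries as well
        }
    }
}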

Here is the error log:
WARNING: [/192.168.103.136:8092][ViewEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS:
java.net.ConnectException: Connection refused: /192.168.103.136:8092
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at com.couchbase.client.deps.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)

May 06, 2015 3:29:12 PM com.couchbase.client.core.node.CouchbaseNode$1 call
INFO: Disconnected from Node 192.168.103.136
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at a.insertdata(a.java:268)
at a.main(a.java:73)
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:258)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:253)
at a.insertdata(a.java:272)
at a.main(a.java:73)
Caused by: java.util.concurrent.TimeoutException
… 5 more

daschl · May 6, 2015, 9:32am

I think there is a disconnect between your assumption of how Couchbase works and how it actually works.

When you shut down a node, you are not able to write data to the partitions on that node until you fail it over. This means that operations targeting those partitions are going to time out if they can’t be retried in time.
Now, if you fail over and bring your cluster back into a condition where all partitions are active, the client should recover promptly and your operations on all partitions should start working again. If this is not the case, then it may be a client bug, but I can’t say for sure from the code you’ve shown here.

Can you please provide more information on the actions you are performing and the expected/actual behaviour of the SDK?

hubo3085632 · May 6, 2015, 9:40am

Do you mean that if I take down one node, the whole cluster will refuse my write operations? Can't the other nodes continue to accept writes?

daschl · May 6, 2015, 10:06am

No, only this specific node is down, of course. So all document IDs which map to this node won’t be writable until you perform a failover, which promotes the replicas to active. All other nodes are fine. But the point is that, since I guess you are using blocking operations, if you write or read 5 docs and 2 of those 5 are on the downed node, you will get 2 timeouts in the 5 second range (your kvTimeout) and the other 3 will come back fine.
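
To make the mapping from document IDs to nodes a bit more concrete, here is a rough sketch. It assumes a CRC32-based hash over 1024 partitions and a made-up round-robin layout of partitions onto nodes. The class PartitionSketch and its methods are hypothetical; a real client takes the partition map from the cluster configuration rather than computing the layout like this.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

class PartitionSketch {
    // Assumption: 1024 partitions per bucket.
    static final int PARTITIONS = 1024;

    // Assumption: the partition is derived from a CRC32 hash of the document ID.
    static int partitionFor(String documentId) {
        CRC32 crc = new CRC32();
        crc.update(documentId.getBytes(StandardCharsets.UTF_8));
        return (int) ((crc.getValue() >> 16) & 0x7fff) % PARTITIONS;
    }

    // Hypothetical layout: partitions are spread round-robin over the nodes.
    static int nodeFor(int partition, int nodeCount) {
        return partition % nodeCount;
    }

    // Returns the IDs whose active partition lives on the downed node.
    static List<String> affectedIds(List<String> ids, int nodeCount, int downNode) {
        List<String> affected = new ArrayList<>();
        for (String id : ids) {
            if (nodeFor(partitionFor(id), nodeCount) == downNode) {
                affected.add(id);
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("corp::1", "corp::2", "corp::3", "corp::4", "corp::5");
        // Only these IDs would time out while node 1 is down and not failed over.
        System.out.println("Affected by node 1 being down: " + affectedIds(ids, 3, 1));
    }
}
```

Every ID lands on exactly one node, so only the IDs returned by affectedIds would hit the timeout; operations on all other IDs are unaffected.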

We also provide fail-fast capabilities that in this case will fail early with a different exception, but then it is, even more than usual, up to you to perform retry handling as you see fit. Does that make sense?
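
As an illustration of what such retry handling could look like, here is a minimal sketch using only the JDK. It assumes the blocking call surfaces a timeout as a RuntimeException wrapping a TimeoutException, as in the stack trace posted earlier. The class RetrySketch, its method names, and the attempt and delay values are hypothetical and not part of the SDK.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class RetrySketch {
    // True for a TimeoutException or an exception that directly wraps one.
    static boolean isTimeout(Throwable t) {
        return t instanceof TimeoutException || t.getCause() instanceof TimeoutException;
    }

    // Runs op, retrying only on timeouts, with a delay that doubles each time.
    // Non-timeout failures and the last failed attempt are rethrown to the caller.
    static <T> T retry(Supplier<T> op, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                if (!isTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
            try {
                Thread.sleep(delay); // back off instead of hammering the cluster
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("interrupted while waiting to retry", ie);
            }
            delay *= 2;
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for a blocking insert: times out twice, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException(new TimeoutException());
            }
            return "stored";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Bounding the attempts and backing off between them avoids the tight, unbounded loop of a bare for(;;) retry, and rethrowing non-timeout errors keeps real failures visible to the caller.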

hubo3085632 · May 7, 2015, 1:45am

Oh, I see. But why does it write slowly after failover if I use insert with ReplicateTo.ONE?
I also found a strange issue: if I run my client on a Windows server, it always reports timeout exceptions; if I run it on a Linux server, it performs better and I do not need to modify connectTimeout and kvTimeout.

hubo3085632 · May 7, 2015, 2:17am

When I run the command "service couchbase-server stop", it will first flush the in-memory data to disk, right?

daschl · May 7, 2015, 5:44am

@hubo3085632 it is slow with ReplicateTo.ONE after failover because once you fail over, there is no replica available for those partitions (given that you only have one replica specified on the bucket).
You need to run a rebalance to create the replicas again, or use more replicas in the first place so that a failover still leaves replicas available. The original insert will succeed anyway, but we are not able to fulfil the replication constraint (which you are asking for through ReplicateTo.ONE).
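
A toy model may help show why each such insert takes the full timeout. It assumes the client confirms the constraint by repeatedly checking how many replicas hold the document until a deadline passes; that polling loop is an assumption made for illustration, and the class DurabilityPollSketch and its parameters are hypothetical, not SDK code.

```java
import java.util.function.IntSupplier;

class DurabilityPollSketch {
    // Polls until at least `required` replica copies are reported or the timeout
    // elapses. Returns true if the constraint was met, false if it timed out.
    static boolean awaitReplication(IntSupplier replicaCopies, int required,
                                    long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (replicaCopies.getAsInt() >= required) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Healthy case: one replica already holds the document.
        System.out.println("with replica: " + awaitReplication(() -> 1, 1, 200, 10));
        // After failover without rebalance: no replica exists for the partition,
        // so the check can only end when the timeout expires.
        System.out.println("no replica:   " + awaitReplication(() -> 0, 1, 200, 10));
    }
}
```

In this model the write itself is fast, but every call that asks for a replica which does not exist waits out the whole timeout, which would match a throughput of roughly one operation per timeout period.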

Also, disk persistence and replication are asynchronous, and the service will be stopped regardless. Make sure to use PersistTo.MASTER if you want to be sure data gets written (you can be sure once you get a success back).

hubo3085632 · May 7, 2015, 6:20am

I set the bucket to 2 replicas and used insert(ReplicateTo.ONE); after failover it is still slow and times out.
The exception is:
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:258)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:253)
at a.insertdata(a.java:262)
at a.main(a.java:75)
Caused by: java.util.concurrent.TimeoutException

daschl · May 7, 2015, 6:31am

@hubo3085632

Do you think it’s possible that you share:

Code to reproduce
Your cluster setup
The steps to reproduce
Your expectations?

So we can try to reproduce your issue as closely as possible.

hubo3085632 · May 7, 2015, 7:38am

I want to ask a question:

When I handle the node-failure exception in my code, I use .toBlock() to retrieve the data without a timeout, but how can I catch the exception so that I can retry the CRUD operation?