Problem when using Java SDK 2.1.x

#1

My environment is:
CouchbaseEnvironment env = DefaultCouchbaseEnvironment
    .builder()
    .managementTimeout(5000)
    .kvTimeout(5000)
    .build();

I am using this method to write:
bucket.insert(Corp,ReplicateTo.ONE);

Once failover has completed on my cluster, I found that this write method is very slow, even 1 op/sec! At the same time, TimeoutExceptions occur.

#2

Hi @hubo3085632,

can you describe the following a little more please so we can help you better:

  • what workload are you running
  • what actions are you performing
  • what are your expectations/actual results

Maybe you can also show us some code and logs?

#4
Thank you. I am trying to work out how to handle exceptions if I use Couchbase in production. I stopped one node while I was writing; 30 s later, the insert rate had dropped to almost 1 op/sec.
Here is the code:
for (;;) {
    try {
        JsonDocument isdone = bucket.insert(Corp, ReplicateTo.ONE);
        break; // insert succeeded
    } catch (Exception e) {
        if (e instanceof RequestCancelledException) {
            System.out.println("writing failed!");
            continue; // retry immediately
        } else {
            e.printStackTrace(); // any other error is logged, then the loop retries as well
        }
    }
}

Here is the error log:
WARNING: [/192.168.103.136:8092][ViewEndpoint]: Could not connect to endpoint, retrying with delay 4096 MILLISECONDS:
java.net.ConnectException: Connection refused: /192.168.103.136:8092
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at com.couchbase.client.deps.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)

May 06, 2015 3:29:12 PM com.couchbase.client.core.node.CouchbaseNode$1 call
INFO: Disconnected from Node 192.168.103.136
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at a.insertdata(a.java:268)
at a.main(a.java:73)
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:258)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:253)
at a.insertdata(a.java:272)
at a.main(a.java:73)
Caused by: java.util.concurrent.TimeoutException
… 5 more

#5

I think there is a disconnect between your assumptions about how Couchbase works and how it actually works :smile:

When you shut down a node, you are not able to write data to its partitions until you fail over that node. This means that those particular operations are going to time out if they can’t be retried in time.
Now, if you fail over and bring your cluster back into a condition where all partitions are active, the client should recover in a timely manner and your operations to all partitions should start working again. If this is not the case, then it may be a client bug, but I can’t say for sure from the code you’ve shown here.

Can you please provide more information on the actions you are performing and the expected/actual behaviour of the SDK?

#6

Do you mean that if I take one node down, the whole cluster will refuse my write operations? Can't the other nodes continue to accept writes?

#7

No, of course only this specific node is down. All document IDs that map to this node won’t be writable until you perform a failover, which promotes the replicas to active. All other nodes are fine. But the point is that, since I guess you are using blocking operations, if you write or read 5 docs and 2 of those 5 are on the downed node, you will have 2 timeouts in the 5-second range and the other 3 will come back fine.
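
To make the point about document IDs mapping to nodes more concrete, here is a minimal sketch in plain Java with no SDK dependency. It is only a simplified model: the class name `PartitionSketch`, the partition count of 1024, the exact hash formula, and the round-robin partition-to-node layout are assumptions for illustration, not the client's real routing code or your cluster's real map.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Simplified, hypothetical model of document ID -> partition -> node routing.
class PartitionSketch {
    static final int NUM_PARTITIONS = 1024; // assumed partition count

    // Hash the document ID onto a partition (CRC32-based; formula is an assumption).
    static int partitionFor(String docId) {
        CRC32 crc = new CRC32();
        crc.update(docId.getBytes(StandardCharsets.UTF_8));
        long hash = (crc.getValue() >> 16) & 0x7fff;
        return (int) (hash % NUM_PARTITIONS);
    }

    // Assumed layout: partitions are spread round-robin over the nodes.
    static int nodeFor(String docId, int numNodes) {
        return partitionFor(docId) % numNodes;
    }

    // Before failover, a document is writable only if its node is still up.
    static boolean writable(String docId, int numNodes, int downNode) {
        return nodeFor(docId, numNodes) != downNode;
    }

    public static void main(String[] args) {
        int numNodes = 3, downNode = 1, total = 1000, blocked = 0;
        for (int i = 0; i < total; i++) {
            if (!writable("corp::" + i, numNodes, downNode)) {
                blocked++;
            }
        }
        System.out.println(blocked + " of " + total + " IDs map to the downed node");
    }
}
```

Only the IDs counted as blocked would run into the timeout; writes for every other ID go to healthy nodes and return normally.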

We also provide fail-fast capabilities that in this case will fail early with a different exception, but then it is even more up to you to perform retry handling as you see fit. Does that make sense?
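
As a rough idea of what such retry handling could look like, here is a minimal sketch in plain Java. `RetrySketch`, `withRetry`, and `isTimeout` are hypothetical names, not SDK APIs. The sketch assumes the blocking call surfaces a timeout as a `RuntimeException` wrapping a `TimeoutException`, as in the stack trace earlier in this thread; in real code the supplier would wrap something like `bucket.insert(Corp, ReplicateTo.ONE)`.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: bounded retries with a growing delay, instead of an endless for(;;) loop.
class RetrySketch {
    // Runs op up to maxAttempts times; only timeouts are retried, other errors propagate at once.
    static <T> T withRetry(Supplier<T> op, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                if (!isTimeout(e)) {
                    throw e;
                }
                last = e;
            }
            if (attempt < maxAttempts) {
                sleep(delayMillis * attempt); // wait a little longer after each failure
            }
        }
        throw last; // all attempts timed out
    }

    // True if t or any of its causes is a TimeoutException.
    static boolean isTimeout(Throwable t) {
        while (t != null) {
            if (t instanceof TimeoutException) {
                return true;
            }
            t = (t.getCause() == t) ? null : t.getCause();
        }
        return false;
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for a blocking write that times out twice before succeeding.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException(new TimeoutException());
            }
            return "stored";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Capping the number of attempts keeps a permanently unavailable partition from blocking the writer forever; which exceptions are worth retrying is a choice you would adapt to your own workload.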

#8

Oh, I see. But why are writes slow after failover if I use insert with ReplicateTo.ONE?
I also found a strange issue: if I run my client on Windows Server, it always reports timeout exceptions; if I run it on a Linux server, it performs better and I do not need to modify connectTimeout and kvTimeout.

#9

When I use the command service couchbase-server stop, it will first flush the in-memory data to disk, right?

#10

@hubo3085632 it is slow with ReplicateTo.ONE after failover because, once you fail over, there is no replica available for those partitions (given that you only have one replica configured on the bucket).
You need to run a rebalance to create the replicas again, or use more replicas in the first place so that replicas are still available after a failover. The original insert will succeed anyway, but we are not able to fulfil the replication constraint (which you are asking for through ReplicateTo.ONE).
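
The reasoning above can be written down as a tiny model in plain Java. This is only an illustration of the arithmetic, not SDK or server code: `ReplicaSketch` and its methods are hypothetical names, and the model assumes a partition that lost exactly one copy (active or replica) in the failover and has not been rebalanced yet.

```java
// Hypothetical model of replica availability after a single-node failover.
class ReplicaSketch {
    // Replica copies left for an affected partition until a rebalance recreates them.
    static int replicasAfterFailover(int configuredReplicas) {
        return Math.max(0, configuredReplicas - 1);
    }

    // A ReplicateTo-style requirement can only be met if enough replicas exist.
    static boolean canSatisfy(int requiredReplicas, int availableReplicas) {
        return availableReplicas >= requiredReplicas;
    }

    public static void main(String[] args) {
        int required = 1; // corresponds to ReplicateTo.ONE
        for (int configured = 1; configured <= 2; configured++) {
            int left = replicasAfterFailover(configured);
            System.out.println("configured=" + configured
                + " left=" + left
                + " satisfiable=" + canSatisfy(required, left));
        }
    }
}
```

In this model a bucket with one configured replica cannot meet a one-replica requirement on the affected partitions, while a bucket with two configured replicas still can.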

Also, disk persistence and replication are asynchronous; the service will be stopped. Make sure to use PersistTo.MASTER if you want to be sure data gets written (you can be sure once you get a success back).

#11

I set the bucket to 2 replicas and used insert(ReplicateTo.ONE); after failover it is still slow and times out.
The exception is:
java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:93)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:258)
at com.couchbase.client.java.CouchbaseBucket.insert(CouchbaseBucket.java:253)
at a.insertdata(a.java:262)
at a.main(a.java:75)
Caused by: java.util.concurrent.TimeoutException

#12

@hubo3085632

Do you think it’s possible that you share:

  • Code to reproduce
  • Your cluster setup
  • The steps to reproduce
  • Your expectations?

So we can try to reproduce your issue as closely as possible.

#13

I want to ask a question:

When I deal with node failure exceptions in my code, I use .toBlocking() to retrieve the data without a timeout, but how can I catch the exception in order to retry the CRUD operation?