Java client doesn't connect to another cluster node when the first is down

Okay I need to get back to the original questions here for clarification, because there might be a misunderstanding.

When you have a 3-node cluster and one node is down but still part of the cluster, it is expected that the client keeps trying to reconnect to that node until it is removed from the cluster! And if 1 node is down, then roughly 1/3 of your reads/writes won’t succeed if that node still has partitions on it, which is likely the case.

@jemasu6 in your case it makes sense as well, since the list is walked from start to end. If the first node bootstraps properly, everything is fine; if it doesn’t, the client picks the next one.

@slodyczka.stanislaw @jemasu6 can you both please outline exactly what behavior you currently SEE, and also tell me exactly what behavior you EXPECT? Then we can better identify whether it works as designed (and there is just a misunderstanding of how Couchbase works) or whether there is a bug in the SDK.

Thank you!

@daschl, I might be wrong (and it’s better to listen to @slodyczka.stanislaw and @jemasu6), but I don’t think they are talking about “correct functioning at all” (like in the case of a failover), but about the formal part you mentioned as “the list is walked from start to end”. A simple question: OK, there is a problem with the first node in the list, so why not just skip it and connect to the second one?

+UPDATE: we can simply imagine the case of a “particular problem with a particular IP at a particular moment in time”

My general issue is what @egrep wrote about cluster connect order.

Okay, so to be correct: the intended behavior is that if the first node in the list is not reachable, the client tries the next one, and so forth, until it either succeeds or there are no more elements in the list.
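That walk can be sketched in plain Java (a simplified model, not the SDK’s internal bootstrap code; the `isReachable` check here is a stand-in for the real connection attempt):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class BootstrapWalk {
    /** Returns the first seed node that passes the reachability check, walking the list in order. */
    static Optional<String> firstReachable(List<String> seeds, Predicate<String> isReachable) {
        for (String seed : seeds) {
            if (isReachable.test(seed)) {
                return Optional.of(seed); // bootstrap succeeds on this node
            }
            // otherwise fall through and try the next seed in the list
        }
        return Optional.empty(); // no more elements: bootstrap fails
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList("10.200.0.10", "10.200.0.11", "10.200.0.12");
        // Simulate the first node being down:
        Predicate<String> up = host -> !host.equals("10.200.0.10");
        System.out.println(firstReachable(seeds, up).orElse("none")); // prints 10.200.0.11
    }
}
```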

Are you saying that if you have 3 nodes in the bootstrap list, and the first one is not reachable but the other 2 are, the client is not able to connect to the cluster at all? If so, can you please provide TRACE-level logs of the bootstrap process?

Thanks,
Michael

@daschl @egrep

Thank you for responding!

@daschl, yes, that’s right: I have 3 nodes in the bootstrap list, the first one is not reachable but the other 2 are, and the client is not able to connect to the cluster at all. Here are my logs. I set the logging level to FINEST; is that the same as TRACE? My code is below the logs. If I put node 1 (the unreachable one) at the back of the list, then I can connect to the cluster with no issue.

Nov 04, 2016 9:06:17 AM com.couchbase.client.core.logging.CouchbaseLoggerFactory newDefaultFactory
FINE: Using java.util.logging as the default logging framework
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.logging.InternalLoggerFactory newDefaultFactory
FINE: Using java.util.logging as the default logging framework
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.channel.MultithreadEventLoopGroup <clinit>
FINE: -Dio.netty.eventLoopThreads: 8
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0 <clinit>
FINE: java.nio.Buffer.address: available
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0 <clinit>
FINE: sun.misc.Unsafe.theUnsafe: available
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0 <clinit>
FINE: sun.misc.Unsafe.copyMemory: available
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0 <clinit>
FINE: java.nio.Bits.unaligned: true
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent0 <clinit>
FINE: java.nio.DirectByteBuffer.<init>(long, int): available
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.Cleaner0 <clinit>
FINE: java.nio.ByteBuffer.cleaner(): available
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent javaVersion0
FINE: Java version: 8
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent hasUnsafe0
FINE: -Dio.netty.noUnsafe: false
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent hasUnsafe0
FINE: sun.misc.Unsafe: available
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent hasJavassist0
FINE: -Dio.netty.noJavassist: false
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent hasJavassist0
FINE: Javassist: unavailable
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent hasJavassist0
FINE: You don't have Javassist in your class path or you don't have enough permission to load dynamically generated classes.  Please check the configuration for better performance.
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent tmpdir0
FINE: -Dio.netty.tmpdir: /var/folders/t5/6rdwm4gx2v73b1nd11t5tlx98mv1qk/T (java.io.tmpdir)
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent bitMode0
FINE: -Dio.netty.bitMode: 64 (sun.arch.data.model)
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent <clinit>
FINE: -Dio.netty.noPreferDirect: false
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.util.internal.PlatformDependent <clinit>
FINE: com.couchbase.client.deps.io.netty.maxDirectMemory: 3817865216 bytes
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop <clinit>
FINE: -Dio.netty.noKeySetOptimization: false
Nov 04, 2016 9:06:17 AM com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop <clinit>
FINE: -Dio.netty.selectorAutoRebuildThreshold: 512
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.CouchbaseCore <init>
INFO: CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword=false, sslKeystore=null, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=4, computationPoolSize=4, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=1, queryServiceEndpoints=1, searchServiceEndpoints=1, ioPool=NioEventLoopGroup, coreScheduler=CoreScheduler, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.3.4 (git: 2.3.4, core: 1.3.4), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, dcpConnectionName=dcp/core-io, callbacksOnIoPool=false, disconnectTimeout=25000, requestBufferWaitStrategy=com.couchbase.client.core.env.DefaultCoreEnvironment$2@7e0ea639, queryTimeout=75000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, dnsSrvEnabled=false}
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.CouchbaseCore <init>
FINE: Diagnostics {
  gc.ps marksweep.collectionCount=0,
  gc.ps marksweep.collectionTime=0,
  gc.ps scavenge.collectionCount=0,
  gc.ps scavenge.collectionTime=0,
  heap.pendingFinalize=0,
  heap.used=init = 268435456(262144K) used = 16304256(15922K) committed = 257425408(251392K) max = 3817865216(3728384K),
  mem.physical.free=18710528,
  mem.physical.total=17179869184,
  mem.swap.free=364642304,
  mem.swap.total=2147483648,
  mem.virtual.comitted=8412745728,
  offHeap.used=init = 2555904(2496K) used = 9206760(8990K) committed = 10158080(9920K) max = -1(-1K),
  proc.cpu.time=786959000,
  runtime.name=84452@P41272,
  runtime.spec=Oracle Corporation/Java Virtual Machine Specification: 1.8,
  runtime.startTime=1478217977517,
  runtime.sysProperties={gopherProxySet=false, awt.toolkit=sun.lwawt.macosx.LWCToolkit, file.encoding.pkg=sun.io, java.specification.version=1.8, com.couchbase.client.deps.io.netty.packagePrefix=com.couchbase.client.deps., sun.cpu.isalist=, sun.jnu.encoding=UTF-8, java.class.path=/Users/james.camps/Documents/workspace/quickTestDeleteThis/target/classes:/Users/james.camps/.m2/repository/com/couchbase/client/java-client/2.3.4/java-client-2.3.4.jar:/Users/james.camps/.m2/repository/com/couchbase/client/core-io/1.3.4/core-io-1.3.4.jar:/Users/james.camps/.m2/repository/io/reactivex/rxjava/1.1.8/rxjava-1.1.8.jar, java.vm.vendor=Oracle Corporation, sun.arch.data.model=64, java.vendor.url=http://java.oracle.com/, user.timezone=Asia/Tokyo, user.country.format=JP, os.name=Mac OS X, java.vm.specification.version=1.8, user.country=US, sun.java.launcher=SUN_STANDARD, sun.boot.library.path=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib, sun.java.command=quickTestDeleteThis.Main, sun.cpu.endian=little, user.home=/Users/james.camps, user.language=en, java.specification.vendor=Oracle Corporation, java.home=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre, file.separator=/, line.separator=
, java.vm.specification.vendor=Oracle Corporation, java.specification.name=Java Platform API Specification, java.awt.graphicsenv=sun.awt.CGraphicsEnvironment, sun.boot.class.path=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/sunrsasign.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/classes, sun.management.compiler=HotSpot 64-Bit Tiered Compilers, java.runtime.version=1.8.0_102-b14, user.name=james.camps, path.separator=:, os.version=10.11.6, java.endorsed.dirs=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/endorsed, java.runtime.name=Java(TM) SE Runtime Environment, file.encoding=UTF-8, sun.nio.ch.bugLevel=, java.vm.name=Java HotSpot(TM) 64-Bit Server VM, java.vendor.url.bug=http://bugreport.sun.com/bugreport/, java.io.tmpdir=/var/folders/t5/6rdwm4gx2v73b1nd11t5tlx98mv1qk/T/, java.version=1.8.0_102, user.dir=/Users/james.camps/Documents/workspace/quickTestDeleteThis, os.arch=x86_64, java.vm.specification.name=Java Virtual Machine Specification, java.awt.printerjob=sun.lwawt.macosx.CPrinterJob, sun.os.patch.level=unknown, java.library.path=/Users/james.camps/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:., java.vm.info=mixed mode, java.vendor=Oracle Corporation, java.vm.version=25.102-b14, 
java.ext.dirs=/Users/james.camps/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/ext:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java, sun.io.unicode.encoding=UnicodeBig, java.class.version=52.0},
  runtime.uptime=637,
  runtime.vm=Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM: 25.102-b14,
  sys.cpu.loadAvg=2.853515625,
  sys.cpu.num=4,
  sys.os.arch=x86_64,
  sys.os.name=Mac OS X,
  sys.os.version=10.11.6,
  thread.count=13,
  thread.peakCount=13,
  thread.startedCount=13
}
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.config.DefaultConfigurationProvider seedHosts
FINE: Setting seed hosts to [/10.200.0.10, /10.200.0.11, /10.200.0.12]
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.RequestHandler addNode
FINE: Got instructed to add Node 10.200.0.10/10.200.0.10
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.RequestHandler addNode
FINE: Connecting Node 10.200.0.10/10.200.0.10
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.node.CouchbaseNode connect
FINE: [10.200.0.10]: Got instructed to connect.
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.RequestHandler$2 call
FINE: Connect finished, registering for use.
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.RequestHandler addService
FINE: Got instructed to add Service CONFIG, to Node 10.200.0.10/10.200.0.10
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.node.CouchbaseNode addService
FINE: [10.200.0.10]: Adding Service CONFIG
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.node.CouchbaseNode addService
FINE: [10.200.0.10]: Adding Service CONFIG to registry and connecting it.
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.service.AbstractDynamicService connect
FINE: [10.200.0.10][ConfigService]: Got instructed to connect.
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.numHeapArenas: 8
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.numDirectArenas: 8
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.pageSize: 8192
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.maxOrder: 11
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.chunkSize: 16777216
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.tinyCacheSize: 512
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.smallCacheSize: 256
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.normalCacheSize: 64
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.maxCachedBufferCapacity: 32768
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.PooledByteBufAllocator <clinit>
FINE: -Dio.netty.allocator.cacheTrimInterval: 8192
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.node.CouchbaseNode$1 call
FINE: Disconnected (IDLE) from Node 10.200.0.10/10.200.0.10
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.util.internal.ThreadLocalRandom newSeed
FINE: -Dio.netty.initialSeedUniquifier: 0x276328a857d54b19 (took 0 ms)
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.ByteBufUtil <clinit>
FINE: -Dio.netty.allocator.type: unpooled
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.ByteBufUtil <clinit>
FINE: -Dio.netty.threadLocalDirectBufferSize: 65536
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.ByteBufUtil <clinit>
FINE: -Dio.netty.maxThreadLocalCharBufferSize: 16384
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.buffer.AbstractByteBuf <clinit>
FINE: -Dcom.couchbase.client.deps.io.netty.buffer.bytebuf.checkAccessible: true
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.util.ResourceLeakDetector <clinit>
FINE: -Dcom.couchbase.client.deps.io.netty.leakDetection.level: simple
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.util.ResourceLeakDetector <clinit>
FINE: -Dcom.couchbase.client.deps.io.netty.leakDetection.maxRecords: 4
Nov 04, 2016 9:06:18 AM com.couchbase.client.deps.io.netty.util.ResourceLeakDetectorFactory$DefaultResourceLeakDetectorFactory newResourceLeakDetector
FINE: Loaded default ResourceLeakDetector: com.couchbase.client.deps.io.netty.util.ResourceLeakDetector@5f7364c
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.endpoint.AbstractEndpoint$2 operationComplete
WARNING: [null][ConfigEndpoint]: Could not connect to remote socket.
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.node.CouchbaseNode$1 call
FINE: Disconnected (CONNECTING) from Node 10.200.0.10/10.200.0.10
Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException: Connection refused: /10.200.0.10:8091
	at com.couchbase.client.core.utils.Blocking.blockForSingle(Blocking.java:85)
	at com.couchbase.client.java.cluster.DefaultClusterManager.info(DefaultClusterManager.java:59)
	at com.couchbase.client.java.cluster.DefaultClusterManager.info(DefaultClusterManager.java:54)
	at quickTestDeleteThis.Main.main(Main.java:23)
Caused by: java.net.ConnectException: Connection refused: /10.200.0.10:8091
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at com.couchbase.client.deps.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:223)
	at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:285)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:589)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:513)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:427)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:399)
	at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
	at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:745)
Nov 04, 2016 9:06:18 AM com.couchbase.client.core.endpoint.AbstractEndpoint$2 operationComplete
WARNING: [null][ConfigEndpoint]: Could not connect to remote socket.
    import java.util.logging.ConsoleHandler;
    import java.util.logging.Handler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    import com.couchbase.client.java.Cluster;
    import com.couchbase.client.java.CouchbaseCluster;

    public class Main {
    	public static void main(String[] args){
    		
    		Logger logger = Logger.getLogger("com.couchbase.client");
    		logger.setLevel(Level.FINEST);
    		for(Handler h : logger.getParent().getHandlers()) {
    		    if(h instanceof ConsoleHandler){
    		        h.setLevel(Level.FINEST);
    		    }
    		}
    		
    		Cluster cluster = CouchbaseCluster.create("10.200.0.10", "10.200.0.11", "10.200.0.12");
    		System.out.println(cluster.clusterManager("Administrator", "password").info().raw());
    		
    	}
    }

Regarding my setup, I am running 3 trusty/Ubuntu 14.04 VMs with vagrant.
Host machine is OSX 10.11.
My application with Java SDK 2.3.4 is running from the Host Machine. I launched Couchbase on the three VMs with:
docker run -d --name db -v ~/couchbase:/opt/couchbase/var --net=host couchbase
I also edited the Vagrantfile to give each machine a private IP address, so the VMs can connect to each other and the host machine can reach them (the same address I use for the web interface).

@jemasu6 ooh, wait a second, I think I see it now: you are opening the bucket manager and not actually opening a Bucket. Just to double check, can you try opening a bucket? I think that will work; the BucketManager uses different facilities underneath than the actual bucket.

I think you are running into https://issues.couchbase.com/browse/JCBC-999, which is a known issue I just didn’t get around to fixing yet.

To verify, can you try cluster.openBucket with your setup and see if that works? If that works and the BucketManager doesn’t I think you are running into JCBC-999.

@slodyczka.stanislaw is your code the same or are you running into something different?

@daschl I tried: Bucket b = cluster.openBucket("myBucket", "password"); and it worked great, even when the first node in the bootstrap list is dead. On top of that, bucketManager().info() doesn’t work, identical to the known issue. I must be running into JCBC-999; my apologies for not browsing the issue list first. Thanks for the help :slight_smile:

Just to confirm though, I’m using cluster.clusterManager; is it the same as the bucketManager?

@jemasu6 indeed! The background story here is that we open sockets on the fly to the 8091 API for the management stuff. The “Bucket” is more involved, since we need to do much more logic in the background, so the paths diverge.

If I remember correctly, the fix is not trivial, which is why I pushed it out a bit, but I hope I can get to it in one of the next release cycles!

@daschl
Ah ok cool! Thanks for the info and help. Looking forward to the update :slight_smile:

@daschl sorry for the long time without a response.

I updated my SDK to 2.3.5 but still have the issue.

My Code:

When all nodes in the cluster are OK, the output is:

When I shut down the first connection node (“172.17.0.3”) and start the application again, the output is:

And this ‘ConnectTimeoutException’ appears every time; the client never manages to connect to Couchbase.

@slodyczka.stanislaw but that makes sense, right? If you have a cluster of those 3 nodes, the downed node is part of the server cluster map… so the client connects to one of the others, gets a new config, and then keeps trying to connect to the downed node until you bring it up again or remove it from the cluster.

We need to try reconnecting to the downed node since it contains 1/3rd of the data!

@daschl but I cannot save a document in this bucket when the node is down.

Of course! Your downed node contains 1/3rd of the partitions; you need to either bring it up again or do a failover.

@daschl so if one node is down, is the whole cluster unusable until I run a failover or bring the node back up?

@slodyczka.stanislaw, yes. A failover “rebuilds” the entire cluster, excluding the dead node.
As I understand it, your problem (writing to the cluster in the “dead node” case) should be handled at the application level for any write call, using a persistence flag; see http://docs.couchbase.com/sdk-api/couchbase-java-client-2.3.5/com/couchbase/client/java/PersistTo.html (you can also read from replicas, see bucket.getFromReplica)
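To illustrate why a failover restores writes, here is a toy model in plain Java (not the SDK; the node names and the three-partition map are made up, and real Couchbase buckets use 1024 vbuckets): each partition has an active copy and one replica, and failing over a node promotes the replica of every partition whose active copy lived on the dead node.

```java
import java.util.HashMap;
import java.util.Map;

public class FailoverSketch {
    /** Per partition: [active owner, replica owner]. Toy map with 3 partitions. */
    static Map<Integer, String[]> vbucketMap() {
        Map<Integer, String[]> map = new HashMap<>();
        map.put(0, new String[]{"nodeA", "nodeB"});
        map.put(1, new String[]{"nodeB", "nodeC"});
        map.put(2, new String[]{"nodeC", "nodeA"});
        return map;
    }

    /**
     * After failing over deadNode, every partition whose active copy lived on it
     * is served by its promoted replica; all other partitions keep their owner.
     */
    static Map<Integer, String> failover(Map<Integer, String[]> map, String deadNode) {
        Map<Integer, String> active = new HashMap<>();
        map.forEach((vb, owners) ->
            active.put(vb, owners[0].equals(deadNode) ? owners[1] : owners[0]));
        return active;
    }
}
```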

@egrep so if I use e.g. bucket.insert(doc, PersistTo.THREE) and my nodes hold replicas, this should resolve my problem?

@slodyczka.stanislaw,
no, in the case of three nodes use PersistTo.ONE. As I understand it, with a three-node cluster (and a 1-replica setup for the bucket) PersistTo.THREE can never succeed: you have 1 master + 1 replica = 2 copies of the document, so where would the third copy be placed? Now, if 1 node dies (no matter whether it holds nothing, the master copy or the replica copy of your document), you can still successfully persist to at least one copy (the alive master or the alive replica; one of these two is definitely still alive). Make a simple experiment with a failover during insertion and you will see it for yourself.
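The copy-count argument is simple arithmetic: with N replicas configured, at most 1 + N persisted copies of a document can ever exist, so a PersistTo level above that can never be satisfied. A minimal sketch (plain Java, not SDK code):

```java
public class PersistCheck {
    /** Maximum number of nodes a document can ever be persisted to: 1 active + its replicas. */
    static int maxPersistableCopies(int replicas) {
        return 1 + replicas;
    }

    /** A PersistTo requirement is satisfiable only if it does not exceed the available copies. */
    static boolean satisfiable(int persistTo, int replicas) {
        return persistTo <= maxPersistableCopies(replicas);
    }

    public static void main(String[] args) {
        // Bucket configured with 1 replica, as in the discussion above:
        System.out.println(satisfiable(3, 1)); // PersistTo.THREE -> false, can never succeed
        System.out.println(satisfiable(1, 1)); // PersistTo.ONE   -> true
    }
}
```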

@daschl
I am getting the same issue. My Couchbase client stops working after the first node goes down.
As you mention in your comment:
As you mention in your comment:

"When you have a 3 node cluster and one node is down but still part of the cluster it is expected that the client tries to reconnect to that one node until it is removed from the cluster! And if 1 node is down then 1/3 of your reads/writes won’t succeed if that node still has partitions on it, which is likely the case "

How can I remove the failed node from the cluster?
I am using couchbase-client, version 1.4.12.

@nitinvavdiya if a node is down, you need to fail it over in the cluster UI; this will remove it from the cluster. The SDK will pick up the topology change.

@daschl
In production we may not be able to remove a node from the UI as soon as it goes down. How can I handle this?