Bucket connection times out due to slow DNS reverse lookup

Hi Couchbase experts,

I am struggling with the following issue:

I have a Couchbase Community 5.0.1 cluster (2 nodes) deployed on OpenStack on Ubuntu 16.04.

I am trying to do a simple insert operation on the bucket based on the following code snippet (tried with SDK 2.3.1, 2.5.1, and 2.5.8), using the public IP of the nodes, as the client does not reside in the same network as the cluster.

```java
import java.util.concurrent.TimeUnit;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

DefaultCouchbaseEnvironment.Builder builder = DefaultCouchbaseEnvironment
    .builder()
    .queryEndpoints(1)
    .callbacksOnIoPool(true)
    //.runtimeMetricsCollectorConfig(0)
    //.networkLatencyMetricsCollectorConfig(0)
    .socketConnectTimeout(600000) // 10 min socket connect timeout
    .connectTimeout(600000)       // 10 min overall bucket open timeout
    .kvTimeout(600000)            // 10 min instead of 2.5 s for KV ops
    .managementTimeout(600000)
    //.dnsSrvEnabled(true)
    .kvEndpoints(1);
DefaultCouchbaseEnvironment env = builder.build();
CouchbaseCluster cluster = CouchbaseCluster.create(env, "134.60.47.82");
Bucket bucket = cluster.openBucket("ycsb", "password", 20, TimeUnit.SECONDS);

JsonDocument doc = JsonDocument.create("document_id_" + System.currentTimeMillis(),
    JsonObject.create().put("some", "value"));

bucket.upsert(doc);
```

This results in the following exception:

INFO: CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslTruststoreFile='null', sslKeystorePassword=false, sslTruststorePassword=false, sslKeystore=null, sslTruststore=null, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=4, computationPoolSize=4, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=12, queryServiceEndpoints=1, searchServiceEndpoints=12, configPollInterval=2500, configPollFloorInterval=50, ioPool=NioEventLoopGroup, kvIoPool=null, viewIoPool=null, searchIoPool=null, queryIoPool=null, coreScheduler=CoreScheduler, memcachedHashingStrategy=DefaultMemcachedHashingStrategy, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.5.8 (git: 2.5.8, core: 1.5.8), retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, continuousKeepAliveEnabled=true, keepAliveErrorThreshold=4, keepAliveTimeout=2500, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=600000, callbacksOnIoPool=true, disconnectTimeout=25000, requestBufferWaitStrategy=com.couchbase.client.core.env.DefaultCoreEnvironment$2@75f32542, certAuthEnabled=false, coreSendHook=null, forceSaslPlain=false, queryTimeout=75000, viewTimeout=75000, searchTimeout=75000, analyticsTimeout=75000, kvTimeout=600000, connectTimeout=600000, dnsSrvEnabled=false}
May 25, 2018 8:52:24 AM com.couchbase.client.core.node.CouchbaseNode signalConnected
INFO: Connected to Node 134.60.47.82/bwcloud-fip82.rz.uni-ulm.de
May 25, 2018 8:52:33 AM com.couchbase.client.core.config.DefaultConfigurationProvider$8 call
INFO: Opened bucket ycsb
May 25, 2018 8:52:33 AM com.couchbase.client.core.node.CouchbaseNode signalDisconnected
INFO: Disconnected from Node 134.60.47.82/bwcloud-fip82.rz.uni-ulm.de
May 25, 2018 8:52:54 AM com.couchbase.client.core.endpoint.AbstractEndpoint$2 onSuccess
WARNING: [][KeyValueEndpoint]: Could not connect to remote socket.
May 25, 2018 8:52:54 AM com.couchbase.client.core.endpoint.AbstractEndpoint$2 onSuccess
WARNING: [][KeyValueEndpoint]: Could not connect to remote socket.
May 25, 2018 8:52:54 AM com.couchbase.client.core.RequestHandler$1$1 onError
WARNING: Received Error during Reconfiguration.
com.couchbase.client.deps.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /192.168.0.117:11210
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
	at com.couchbase.client.deps.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:225)
	at com.couchbase.client.deps.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:634)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
	at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
	at com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection timed out: no further information
	... 11 more

As you can see, I have already experimented with extending the various timeouts, but this did not help.

Remarks:

- Yes, port 11210 is reachable from the client (but via the public IP 134…), verified via telnet.
- All node-to-client ports are open as specified in the Couchbase docs.
- Updating the hosts file of the client with the DNS names of the nodes is not an option, as the clients are dynamically spawned and removed in different clouds.
- I also experimented with updating the hosts files of the nodes with the public DNS names, but this did not help either (admittedly this was done after the cluster was established, but all servers have been restarted since).

If I try the same code in an identical setup with only one node it works fine.

Thanks a lot in advance for your help, guys.

Cheers,
Daniel

Hi @seybi. I agree that this looks like a network issue. Have you checked that all the ports are open? It's not just 11210; the full list is here: https://developer.couchbase.com/documentation/server/current/install/install-ports.html.
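One way to double-check reachability from the client machine itself is a quick connect test against the main client-facing ports. A minimal sketch using only the JDK; the port list below covers only some of the ports from that docs page, and the IP is your public node address:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // A few of the client-facing Couchbase ports (REST, views, query, search, data).
    private static final int[] PORTS = {8091, 8092, 8093, 8094, 11210};

    public static void main(String[] args) {
        for (int port : PORTS) {
            try (Socket socket = new Socket()) {
                // 3 s connect timeout per port
                socket.connect(new InetSocketAddress("134.60.47.82", port), 3000);
                System.out.println(port + " reachable");
            } catch (Exception e) {
                System.out.println(port + " NOT reachable: " + e.getMessage());
            }
        }
    }
}
```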

Also the way you currently open the bucket is with a bucket password:

CouchbaseCluster cluster = CouchbaseCluster.create(env, "134.60.47.82");
Bucket bucket = cluster.openBucket("ycsb", "password", 20, TimeUnit.SECONDS);

In Couchbase 5.0+, based on customer feedback we added RBAC security, so you’ll now want to authenticate like this:

CouchbaseCluster cluster = CouchbaseCluster.create(env, "134.60.47.82");
cluster.authenticate("Administrator", "password");
Bucket bucket = cluster.openBucket("ycsb");

(Of course you may want to set up real users; using Administrator is just to get you bootstrapped.)
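If you do want a dedicated user straight away, you can create one through the SDK's cluster manager. A minimal sketch, assuming the 2.5.x management API; the user name `ycsb_user` and its password are placeholders, and `bucket_full_access` is the 5.0 role granting full data access to a single bucket:

```java
import java.util.Arrays;

import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.cluster.AuthDomain;
import com.couchbase.client.java.cluster.ClusterManager;
import com.couchbase.client.java.cluster.UserRole;
import com.couchbase.client.java.cluster.UserSettings;

// Connect once as the full admin, create an RBAC user scoped to the ycsb bucket;
// applications can then authenticate as that user instead of Administrator.
CouchbaseCluster cluster = CouchbaseCluster.create("134.60.47.82");
ClusterManager manager = cluster.clusterManager("Administrator", "password");
manager.upsertUser(AuthDomain.LOCAL, "ycsb_user", UserSettings.build()
    .password("ycsb_password")
    .name("YCSB benchmark user")
    .roles(Arrays.asList(new UserRole("bucket_full_access", "ycsb"))));
```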

Hope this helps :slight_smile:

Hi @graham.pople
thanks for your reply.

As I wrote, all required ports are accessible from the client side (verified via telnet from the client).

The strange thing is that the client tries to connect to the private IP of the Couchbase nodes (192…:11210), which is of course not publicly reachable. Instead, it should connect to the public IP 134… as provided.

Could this issue be related to a misconfiguration of the Couchbase Server, which seems to return the private IPs to the client instead of the public IPs for processing the actual operations?
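One way I could verify this is to fetch the cluster config over the REST API and inspect the advertised node hostnames. A minimal sketch using only the JDK; `/pools/default` is the standard cluster info endpoint, and the credentials are placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class ShowAdvertisedNodes {
    public static void main(String[] args) throws Exception {
        // Query the cluster info endpoint; the "hostname" fields of the
        // "nodes" array show the addresses the cluster hands out to clients.
        URL url = new URL("http://134.60.47.82:8091/pools/default");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String creds = Base64.getEncoder()
            .encodeToString("Administrator:password".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + creds);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw JSON; look for "hostname" entries
            }
        }
    }
}
```

If the `hostname` entries there show 192.168.x.x addresses, that would confirm the server is advertising its private interface to clients.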

Regarding the bucket authentication: thanks for the hint, I had also spotted this change in the recent updates but hadn't changed it in my debugging code yet :wink:

Hi @seybi, ah understood. Could you try it with the authentication change quickly? The reason I ask is that I see there is a successful connection to Node 134.60.47.82 in the logs, which is quickly disconnected, then there’s the failed connection to 192.168.0.117 later. I’m wondering if the first connection is only disconnected due to the authentication thing.

Hi @graham.pople
I added the `cluster.authenticate("user", "password");` call and changed to `cluster.openBucket("ycsb");` as you recommended, but the same exception is still thrown and the client still tries to connect to the private IP.

Any further suggestions?
Btw, is it possible to turn off the reverse lookup on the client side and only operate on IPs?
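To illustrate what I mean: the host names in the SDK log lines (like `bwcloud-fip82.rz.uni-ulm.de`) appear to come from plain `java.net` reverse resolution. A small sketch of that behavior, using the IP from the logs above:

```java
import java.net.InetAddress;

public class ReverseLookupDemo {
    public static void main(String[] args) throws Exception {
        // Parsing an IP literal does not resolve a host name...
        InetAddress addr = InetAddress.getByName("134.60.47.82");
        System.out.println(addr); // prints "/134.60.47.82"

        // ...but getHostName() triggers the reverse (PTR) lookup and caches
        // the result, which is presumably where the names in the SDK logs
        // come from.
        System.out.println(addr.getHostName());
        System.out.println(addr); // now prints "name/134.60.47.82"
    }
}
```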

Hi @seybi. Sorry about the delay in getting back to you. I just wanted to let you know that we're now looking at supporting clusters that are on a different network from the client. No ETA as yet, but it is in progress. Please see the Java SDK ticket if you want to follow along.

Hi @graham.pople ,
thanks a lot for the update and for looking into this.
I will definitely follow the progress.

So, one final question about this issue: is it exclusively related to the Java SDK, or is it the same behavior for the other available SDKs?

It’s going to be all SDKs.