Kafka Connector Disconnects from Couchbase after connecting

I am using Couchbase Kafka connector 3.4.5 to connect to Couchbase 6.5.
I configured the connector as sink and connected to my single VM cluster with no issues. Worked great.

Now I have a kubernetes couchbase cluster that requires TLS and configured the connector to use that.
I also setup TLS in the connector. I see in the logs, the connector does connect but disconnects less than a second later with no apparent error.

I copied the private key (self-signed) from Couchbase console and used this command to create the keystore I specified in connector settings:
keytool -importcert -keystore truststore.jks -file couchbase.crt

I know this couchbase server works because i used curl from the same server as the connector with the same userId and password and got response data from couchbase

Anyone have any idea why and what I might need to look for? I suspect a TLS issue since that is the main difference from my old VM.

Log snip below .

[2020-04-09 17:53:57,351] INFO Connected to Node xx.xxx.70.48 (com.couchbase.client.core.node.Node:120)
[2020-04-09 17:53:57,438] INFO Selected network configuration: default (com.couchbase.client.core.config.ConfigurationProvider:120)
[2020-04-09 17:53:57,653] INFO Opened bucket mybucket (com.couchbase.client.core.config.ConfigurationProvider:120)
[2020-04-09 17:53:57,711] INFO JsonConverterConfig values:
        converter.type = value
        schemas.cache.size = 1000
        schemas.enable = false
 (org.apache.kafka.connect.json.JsonConverterConfig:279)
[2020-04-09 17:53:57,712] INFO WorkerSinkTask{id=couchbase-sink-new-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:301)
[2020-04-09 17:53:57,736] INFO Cluster ID: Mf41r6zwS0SxJxZGW_ZaJA (org.apache.kafka.clients.Metadata:379)
[2020-04-09 17:53:57,746] INFO [Consumer clientId=consumer-1, groupId=couchbase-sink-new] Discovered group coordinator 172.16.0.4:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:675)
[2020-04-09 17:53:57,753] INFO [Consumer clientId=consumer-1, groupId=couchbase-sink-new] Revoking previously assigned partitions [] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:459)
[2020-04-09 17:53:57,754] INFO [Consumer clientId=consumer-1, groupId=couchbase-sink-new] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:491)
[2020-04-09 17:53:57,763] INFO Disconnected from Node xx.xxx.70.48 (com.couchbase.client.core.node.Node:120)
1 Like

Hi Barry,

Can you crank the log level up to debug and try it again? There should be a file called connect-log4j.properties somewhere in your Kafka installation you can edit.
Try changing this line:

log4j.rootLogger=INFO, stdout

to

log4j.rootLogger=DEBUG, stdout

Thanks,
David

Thanks . DEBUG helped with something else so I cant believe i forgot about that.

can i send the log directly to you or does something stand out?

I see a few exceptions in there:

[2020-04-09 22:06:52,768] DEBUG Removing and disconnecting node xx.xxx.70.48. (com.couchbase.client.core.RequestHandler:90)
[2020-04-09 22:06:52,769] DEBUG Got instructed to remove Node xx.xxx.70.48 (com.couchbase.client.core.RequestHandler:90)
[2020-04-09 22:06:52,769] DEBUG [xx.xxx.70.48]: Got instructed to disconnect. (com.couchbase.client.core.node.Node:85)
[2020-04-09 22:06:52,770] DEBUG [xx.xxx.70.48]: Got instructed to disconnect. (com.couchbase.client.core.node.Node:85)
[2020-04-09 22:06:52,770] DEBUG [xx.xxx.70.48]: Instructing Service BINARY to disconnect. (com.couchbase.client.core.node.Node:85)
[2020-04-09 22:06:52,770] DEBUG [xx.xxx.70.48][KeyValueService]: Got instructed to disconnect. (com.couchbase.client.core.service.Service:85)
[2020-04-09 22:06:52,770] DEBUG [xx.xxx.70.48][KeyValueService]: New number of endpoints is 1 (com.couchbase.client.core.service.Service:90)
[2020-04-09 22:06:52,771] DEBUG [xx.xxx.70.48][KeyValueService]: Disconnecting Endpoint during Service disconnect. (com.couchbase.client.core.service.Service:85)
[2020-04-09 22:06:52,786] DEBUG No TrustStore provided, using provided KeyStore to initialize both factories. (com.couchbase.client.core.endpoint.SSLEngineFactory:85)
[2020-04-09 22:06:52,798] DEBUG Unhandled exception during channel connect, ignoring. (com.couchbase.client.core.endpoint.Endpoint:105)
java.nio.channels.UnresolvedAddressException
    at java.base/sun.nio.ch.Net.checkAddress(Net.java:130)
	at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:675)
	at com.couchbase.client.deps.io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:83)
[2020-04-09 22:06:52,983] DEBUG [Consumer clientId=consumer-1, groupId=connect-pipeline-couchbase-sink-new] Heartbeat thread started (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:1018)
[2020-04-09 22:06:52,994] DEBUG [/xx.xxx.70.48:11207][KeyValueEndpoint]: Disconnected Endpoint. (com.couchbase.client.core.endpoint.Endpoint:85)
[2020-04-09 22:06:52,995] INFO Disconnected from Node xx.xxx.70.48 (com.couchbase.client.core.node.Node:120)
[2020-04-09 22:06:52,996] DEBUG [xx.xxx.70.48]: Disconnected (DISCONNECTING) from Node (com.couchbase.client.core.node.Node:85)
[2020-04-09 22:06:53,005] DEBUG [Consumer clientId=consumer-1, groupId=connect-pipeline-couchbase-sink-new] Disabling heartbeat thread (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:995)
[2020-04-09 22:06:53,005] INFO [Consumer clientId=consumer-1, groupId=connect-pipeline-couchbase-sink-new] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:491)
[2020-04-09 22:06:53,006] DEBUG [Consumer clientId=consumer-1, groupId=connect-pipeline-couchbase-sink-new] Sending JoinGroup ((type: JoinGroupRequest, groupId=connect-pipeline-couchbase-sink-new, sessionTimeout=10000, rebalanceTimeout=300000, memberId=consumer-1-680d8b9a-915f-4aa1-86d0-a0ec86388045, protocolType=consumer, groupProtocols=org.apache.kafka.common.requests.JoinGroupRequest$ProtocolMetadata@52a4161c)) to coordinator 172.16.0.4:9092 (id: 2147483647 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:499)
[2020-04-09 22:06:53,009] DEBUG Unhandled exception during channel connect, ignoring. (com.couchbase.client.core.endpoint.Endpoint:105)
java.nio.channels.UnresolvedAddressException
	at java.base/sun.nio.ch.Net.checkAddress(Net.java:130)
	at java.base/sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:675)
	at com.couchbase.client.deps.io.netty.util.internal.SocketUtils$3.run(SocketUtils.java:83)  
. . .
[2020-04-09 22:06:53,013] DEBUG [/xx.xxx.70.48:11207][KeyValueEndpoint]: Channel Inactive. (com.couchbase.client.core.endpoint.AbstractGenericHandler:85)

That UnresolvedAddressException stands out. Are the connector and Couchbase inside the same kubernetes network, or are you trying to connect from outside?

Couchbase is in the kubernetes network exposed by exposedServiceType: LoadBalancer, exposedFeatues, and I’m using external-dns to replicate to my DNS Zone.
The connector is on a VM in the same VNet as the k8s host.
The connector is configured to use the external IP of one of the nodes.

I can reach Couchbase by its external IP address using curl from the connector VM (I do have to specify -k because its self-signed cert)

Note that my .com domain is fake but the name is resolved by specifying the Azure DNS in CoreDNS – this all could be chasing into a rabbit hole due to that. Is that coming back to screw me? if something outside kubernetes tries to resolve the FQDN, it will fail.

If there is another way to reach the cluster from the same vnet without LoadBalance and external-dns, i can try that too.

your comment about UnresolvedAddressException and my comment about resolving FQDN from the connector host got me thinking.
I edited /etc/hosts and added the full node DNS name and its IP , low and behold it connected and upserted some items into my bucket.
I see more errors now related to trying to reach the other nodes in the cluster so I’ll add those and see where it gets me.

all of this pain because Couchbase development must wait for no man. the real development domain and certificates are slow to happen and i wont let that stop forward movement. not happening :wink:

will update once i resolve the issues or hit another wall.
Thanks.

EDIT
yup. adding the 3 nodes FQDNs and their dynamic IPs to hosts file got it done (until a node gets replaced and ruins everything). at least it got me moving forward.

Thanks for helping David.

1 Like