Couchbase Elasticsearch Connector - DcpChannel channel close notifications and high cpu utilization




We’re running the couchbase-elasticsearch-connector 4.0.0 GA release. We’ve set up the connector to talk to both CB and ES over SSL (secureConnection=true in the config.toml file), but we’re continuously seeing the following in our logs:

[nioEventLoopGroup-5-15] DEBUG c.c.c.d.c.DcpChannel - Got notified of channel close on Node $nodename/$ip:11207
[nioEventLoopGroup-5-18] INFO c.c.c.d.c.DcpChannel - Node $nodename/$ip:11207 socket closed, initiating reconnect.

Basically, the connector is getting a ton of connection close notifications from Couchbase and is continuously reconnecting. For the couchbase-elasticsearch-connector, we’ve set up a “group” of three workers (each with a unique member id), all belonging to the same group. As I understand it, each worker is in charge of replicating a specific subset of partitions (vBuckets) of CB documents to ES. CB defaults to 1024 partitions. So with three workers, two should handle 341 partitions each and the last one 342, for a total of 1024 partitions.
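
To sanity-check that arithmetic, here is a minimal Python sketch of an even split. It only illustrates the expected counts; it is not the connector's actual assignment logic, and `partition_counts` is a hypothetical helper name. Which worker gets the extra partition is also an assumption.

```python
def partition_counts(total_partitions: int, workers: int) -> list:
    """Split total_partitions as evenly as possible across workers.

    Illustration only: the real connector may assign partitions differently.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, remainder = divmod(total_partitions, workers)
    # Assumption: the first `remainder` workers take one extra partition each.
    return [base + 1 if i < remainder else base for i in range(workers)]


counts = partition_counts(1024, 3)
print(counts, sum(counts))  # prints: [342, 341, 341] 1024
```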

We’ve also set up the keystore properly and have included all the certs/CAs in the keystore in order to talk to Couchbase over a secure channel. We’re using Couchbase 5.5.2 and ES 6.4.2. Is there any reason why the connector keeps getting channel close notifications from CB? As I understand it, 11207 is the SSL port for accessing the bucket.

We’re also seeing very high CPU utilization (>90%), even when we aren’t indexing any new documents into CB. We’ve disabled compression in the CBES config as well. We suspect these connection resets are the cause, but we can’t easily reproduce the conditions under which the connector enters this high-CPU state and stays stuck in it. I’ve looked at the metrics for each connector as well, and there was nothing alarming in terms of replication backlog or latency. Any ideas?