After Connector Restart - Partition Rollbacks

lapral · July 8, 2020, 4:21pm

Couchbase Server Version: Couchbase Enterprise 5.5.3
DCP Client: 0.24.0
Kafka Connect Couchbase: 3.4.5

Problem: On one particular cluster, whenever we restart the Kafka Connector, all partitions roll back to 0.

The Bucket has 5 billion documents and is under constant compaction, so I think that might have something to do with it.

More information: This is reproducible after running the Connector for 12 hours, killing it, and then restarting it 5 minutes later. The Cluster receives around 18K SETs per second, if that matters.

david.nault · July 8, 2020, 6:40pm

Hi Auston,

If you were seeing the rollbacks occur after a failover or rebalance, I would recommend upgrading to the latest version of the connector to pick up the fixes described in the release notes for version 3.4.6. But this looks like a different issue.

I see you’re using Couchbase Enterprise Edition. If you have an Enterprise Subscription License, I’d recommend opening a support ticket so we can get the support team involved. They’ll probably ask you to run the cbcollect_info tool to capture the Couchbase Server logs.

I’m particularly interested in memcached.log which might have a clue about why the server told the connector to roll back to zero.
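Once the logs are collected, one quick way to narrow down memcached.log is to filter it for lines that mention a rollback. The following is a minimal sketch in Python, not an official tool. It assumes only that the relevant lines contain the word "rollback" (or "roll back"); the exact log format is not assumed, and the function names are made up for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: relevant log lines contain "rollback" or "roll back" in some casing.
# The real memcached.log wording may differ; adjust the pattern as needed.
ROLLBACK_PATTERN = re.compile(r"roll\s?back", re.IGNORECASE)


def find_rollback_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for every line that mentions a rollback."""
    for number, line in enumerate(lines, start=1):
        if ROLLBACK_PATTERN.search(line):
            yield number, line.rstrip("\n")


def print_rollback_lines(path: str) -> None:
    """Print matching lines from the log file at `path` (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        for number, line in find_rollback_lines(log):
            print(f"{number}: {line}")
```

Calling `print_rollback_lines` with the path to the extracted memcached.log lists each matching line with its line number, which makes it easier to line the entries up with the time of the connector restart.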

Your hunch about the constant compaction may prove to be correct, in which case there’s not much we can do from the connector side. However, there may be a way to tune the server to mitigate the problem. This is something the support team knows way more about than I do.

Thanks,
David