The configuration looks okay. The behavior you’re seeing is mysterious.
The “obvious” cause would be a second copy of the connector somehow getting launched. You can check the number of DCP connections by going to the bucket statistics, expanding “DCP Queues”, and viewing the “Other” DCP Connections graph.
The connector is expected to open tasks.max (2, in this case) connections per Couchbase node. You might want to take a baseline measurement without the Kafka connector running, then measure again after it’s running. If the number jumps up by more than 2 per Couchbase node, then it’s likely there’s more than one copy of the connector running.
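As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names are hypothetical, and it assumes the two readings you take from the graph are cluster-wide totals; if the graph shows per-node values, pass 1 as the node count.

```python
def expected_connector_connections(tasks_max: int, node_count: int) -> int:
    """Connections one connector instance is expected to open:
    tasks.max connections per Couchbase node."""
    return tasks_max * node_count


def looks_like_duplicate(baseline: int, with_connector: int,
                         tasks_max: int, node_count: int) -> bool:
    """True if the connection count rose by more than a single
    connector instance should account for.

    Assumption: baseline and with_connector are totals across the
    whole cluster, read manually from the bucket statistics.
    """
    increase = with_connector - baseline
    return increase > expected_connector_connections(tasks_max, node_count)
```

For example, with tasks.max set to 2 on a three-node cluster, one connector should add 6 connections, so a jump from 4 to 16 would suggest a second copy is running.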
Another cause would be if the application talking to Couchbase is modifying the documents redundantly. A DCP mutation event is fired even if a document has identical contents before and after an update operation. I would try seeing how many Kafka messages are published when a document is modified via the web console UI.
The connector logs might also have some useful information. Grepping for “INFO Poll returns” should show how many messages are published in each batch. The expected value here would be 1 for each Couchbase mutation or deletion.
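If grep isn’t handy, the same filtering can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, the log path is whatever your installation uses, and it matches on the substring “Poll returns” without assuming anything else about the log line layout.

```python
from typing import Iterable, List

# Substring to look for; taken from the log message mentioned above.
MARKER = "Poll returns"


def matching_lines(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return every line that contains the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


def print_poll_lines(log_path: str) -> None:
    """Print the matching lines from a log file.

    log_path is a hypothetical argument: point it at your own
    Kafka Connect log file.
    """
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        for line in matching_lines(log_file):
            print(line)
```

Calling print_poll_lines with the path to your Connect log prints one line per batch, which you can then compare against the number of mutations you made.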
If you want to enable debug logging, there should be a connect-log4j.properties file somewhere under your Kafka installation. Set the root log level to DEBUG and restart the connector. You’ll then see messages containing “About to send”, which show how many messages the Kafka Connect framework is publishing.
If all else fails, plugging in a custom SourceHandler like you described would indeed give you complete control over the message flow. But I wouldn’t recommend that as a long-term solution; there’s got to be something else going wrong.
Please check for the possibility of duplicate connector instances, and also see if you can reproduce the problem by editing a document in the Web UI. If that doesn’t get us any closer to a solution, then we’ll need to dig deeper.