Couchbase kafka-connector replays a subset of DCPEvents upon restart




I am seeing some behaviour with the kafka-connector that I did not expect, and I am wondering if someone can shed some light on it.

I am using the default state storage, ZooKeeper, and I can see that state is being stored and reloaded correctly. However, when I restart the connector, a subset of events gets replayed.

I tracked a document to a specific partition. A change to the document correctly increases the sequence number stored in ZooKeeper, and I can see that partition being loaded with the correct sequence number when the bridge restarts. Even so, that key was replayed and sent on to Kafka again on every subsequent restart. After I left it overnight, a restart no longer triggered a replay of that key.

It therefore appears that some subset of DCPEvents always gets replayed. Perhaps a minimum number of changes is always resent, or everything within a given timespan is re-triggered when a subscription is restarted? I don’t know whether this is expected behaviour. Perhaps document changes are replayed until they have been flushed to disk, to avoid missing them?

Does anyone have any idea what might be happening here?

I am using kafka-connector version 2.0.0 with core-io 1.2.6, and I am seeing this on a single-node cluster running Couchbase 4.1.0-5005 Enterprise Edition (build-5005).



The interface DCP gives us is ‘at least once delivery’, so that may be what you’re seeing here. The items you’re seeing multiple times are part of the current open checkpoint. At least, I believe that’s what is happening. @avsej can probably shed more light on it.
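
If the replays are a problem downstream, one common way to live with at-least-once delivery is to make the consumer idempotent. Below is a minimal sketch of that idea, assuming the consumer can see the partition and sequence number of each event and that sequence numbers only increase within a partition. `SeqnoDeduplicator` and `shouldProcess` are hypothetical names for illustration, not part of kafka-connector or core-io.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of kafka-connector or core-io).
 * Drops events whose sequence number has already been seen for a partition.
 */
final class SeqnoDeduplicator {
    // Highest sequence number processed so far, keyed by partition.
    private final Map<Integer, Long> highestSeen = new HashMap<>();

    /**
     * Returns true if the event is new and should be processed,
     * false if it looks like a replay of an event already handled.
     */
    boolean shouldProcess(int partition, long seqno) {
        Long last = highestSeen.get(partition);
        if (last != null && seqno <= last) {
            return false; // already handled: treat as a replay
        }
        highestSeen.put(partition, seqno);
        return true;
    }
}
```

In a real consumer the map would need to be persisted alongside the consumer's own progress, since it is lost on restart as written. The sketch also does not handle cases where sequence numbers move backwards, for example after a rollback.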