We use https://github.com/couchbase/java-dcp-client to periodically get data.
Documents are continuously being updated while DCP is sending mutations.
We first get the current sequence number of every vBucket, then start DCP streaming up to that sequence number.
When a StreamEnd message is received, we consider that vBucket done.
However, sometimes DCP just stops sending mutations, and the job fails after waiting for a minute or two.
From the logs: say I asked for seqnos up to 100. At the time of failure, the snapshot start seqno was something like 110 and the snapshot end seqno was something like 120.
So we had already received more data than we requested, but for some reason DCP did not send a STREAM_END.
Why does this happen?
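
For illustration, the bookkeeping described above can be sketched roughly as below. This is a simplified, hypothetical model and not java-dcp-client API: `VbucketProgress`, `onSnapshotMarker`, `onStreamEnd` and `pastTargetWithoutStreamEnd` are made-up names that a real handler would call from the client's event callbacks. It only flags the vBuckets that show the symptom (snapshot already past the requested seqno, no stream end yet); it does not explain the cause.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for a bounded DCP run; not part of java-dcp-client.
class VbucketProgress {
    // Sequence number requested as the end of the stream, per vBucket.
    private final Map<Integer, Long> targetSeqno = new HashMap<>();
    // Start seqno of the most recent snapshot marker seen, per vBucket.
    private final Map<Integer, Long> lastSnapshotStart = new HashMap<>();
    // vBuckets for which a stream-end event has arrived.
    private final Set<Integer> ended = new HashSet<>();

    public VbucketProgress(Map<Integer, Long> targets) {
        targetSeqno.putAll(targets);
    }

    // Assumed to be called for every snapshot marker; only the start seqno is kept.
    public void onSnapshotMarker(int vbucket, long startSeqno, long endSeqno) {
        lastSnapshotStart.put(vbucket, startSeqno);
    }

    // Assumed to be called when the stream-end event for a vBucket arrives.
    public void onStreamEnd(int vbucket) {
        ended.add(vbucket);
    }

    public boolean isDone(int vbucket) {
        return ended.contains(vbucket);
    }

    // vBuckets whose latest snapshot starts after the requested seqno
    // but which have not received a stream end.
    public Set<Integer> pastTargetWithoutStreamEnd() {
        Set<Integer> result = new HashSet<>();
        for (Map.Entry<Integer, Long> e : lastSnapshotStart.entrySet()) {
            Long target = targetSeqno.get(e.getKey());
            if (target != null && e.getValue() > target && !ended.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With the numbers from the logs (requested seqno 100, snapshot 110 to 120, no stream end), the vBucket would be reported by `pastTargetWithoutStreamEnd()` and `isDone` would still return false for it.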
According to https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/concepts.md, DCP streams from the vBucket master by default, so we should not hit the problem discussed in its last section, "Streaming from a replica vbucket".
Rollback mitigation is not enabled.
However, if it were enabled, would the DCP stream hang if a replica went down?
Also, at what point is data guaranteed to be persisted?
Say the DCP stream starts about 10 seconds after we get the sequence numbers.
Are those mutations persisted to disk by the time the stream starts?
Is it okay to close the DCP stream once memory-only snapshots start?