What is a Snapshot in the Spark Connector?

Each record in a stream can be one of:

  • Snapshot
  • Deletion
  • Mutation

What is a Snapshot, and why does it look like an iterable?

Thanks!

@zoltan.zvara a Snapshot is a specific message type from DCP, the database change protocol that Couchbase uses as the source for Spark Streaming. It contains sequence number information and the state it is in on the server side, which you need when building reliable stream infrastructure on top. For now you may want to simply consume the mutations and ignore the rest. We are reworking the underlying streaming bits right now and have big plans going forward, which is why Spark Streaming is marked as experimental: it will very likely change in the future.
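To illustrate "consume the mutations and ignore the rest", here is a minimal Scala sketch. The `StreamMessage`, `Mutation`, `Deletion`, and `Snapshot` types below are hypothetical stand-ins for the three record kinds listed in the question, not the connector's real classes, and their fields (including the snapshot's start and end sequence numbers) are assumptions made for illustration.

```scala
// Hypothetical stand-ins for the three record kinds named in this thread.
// These are NOT the connector's real classes; all field names are assumptions.
sealed trait StreamMessage
final case class Mutation(key: String, content: String) extends StreamMessage
final case class Deletion(key: String) extends StreamMessage
final case class Snapshot(startSeqno: Long, endSeqno: Long) extends StreamMessage

object MutationFilter {
  // Keep only mutations; deletions and snapshot markers are dropped.
  def mutationsOnly(messages: Seq[StreamMessage]): Seq[Mutation] =
    messages.collect { case m: Mutation => m }
}
```

On a real stream the same pattern match could probably be applied per record, for example inside a `flatMap` that returns `Some(m)` for mutations and `None` otherwise, but the exact message types depend on the connector version you are using.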

But these Snapshots will be delivered to the receivers in any case, right?

What are your plans regarding Spark Streaming support? This question is increasingly important for us, since we plan to build applications on top of the current approach. Is there a JIRA ticket that outlines the plans for the streaming part?

@zoltan.zvara we are currently working on a standalone module for DCP that will have all the bells and whistles. Once it is ready, we’ll migrate over to it. I can’t share a roadmap for it right now, but since there is a lot of demand for this, I expect it to be “in the next few months”.