Verifying replication between Sync Gateway and Couchbase Lite

davidj · March 24, 2018, 3:36pm

I’ve developed a mobile app that replicates data from Sync Gateway to a local Couchbase Lite instance. We occasionally run into cases where not all documents appear to have been replicated, either because of problems such as https://github.com/couchbase/couchbase-lite-ios/issues/2024 or because documents are simply missing due to issues in the upstream services that feed documents into Couchbase. I want to provide a tool that helps verify that the documents in Couchbase Lite match what is available in the Couchbase cluster. I have a view on Couchbase Lite that produces a count of documents by type. I’m thinking of providing a way to get the same information from Sync Gateway (filtered by the user’s channels) and comparing the counts as a crude way to see whether they match and, if not, which type(s) are mismatched. I’m also considering generating checksums to go a step further and validate the actual data. (Note: I know this would not be very useful in an environment where documents change frequently, but in my case they don’t, and when they do change it’s usually a bulk, off-hours thing.)
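The count-and-checksum comparison described above could look roughly like the following. This is only a sketch under stated assumptions: documents are plain JSON objects already loaded on each side, the document type lives in a `type` property, and the helper names `summarize` and `mismatched_types` are hypothetical. Replication metadata that legitimately differs between the two sides would have to be stripped before hashing.

```python
import hashlib
import json
from collections import defaultdict


def summarize(docs):
    """Return {type: (count, checksum)} for an iterable of dict documents."""
    by_type = defaultdict(list)
    for doc in docs:
        # Hash a canonical JSON form so property order does not affect the digest.
        canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        # "type" is an assumed property name; adjust to the real schema.
        by_type[doc.get("type", "<none>")].append(digest)

    summary = {}
    for doc_type, digests in by_type.items():
        # Sort the per-document digests so iteration order does not matter.
        combined = hashlib.sha256("".join(sorted(digests)).encode("ascii")).hexdigest()
        summary[doc_type] = (len(digests), combined)
    return summary


def mismatched_types(local, remote):
    """Return the sorted types whose count or checksum differs between two summaries."""
    return sorted(t for t in set(local) | set(remote) if local.get(t) != remote.get(t))
```

Comparing only the counts is the crude check; comparing the combined digests as well catches documents that exist on both sides but differ in content.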

I have the same view in Sync Gateway, but apparently reduce doesn’t work there due to https://github.com/couchbase/sync_gateway/issues/857. I could run the view on Sync Gateway without reduce, fetch all the results (possibly with multiple queries using limit and offset), and then reduce them to counts myself, but that seems like a brute-force, slow, costly approach. Does anyone have a better idea for accomplishing this? Would it be possible to query the server with N1QL to get these counts/checksums filtered by channel? Is there an entirely different approach that would work better?
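For reference, the brute-force fallback (page through the unreduced view and tally the rows on the client) can be sketched as below. The `fetch_page(skip, limit)` callable is a hypothetical stand-in for whatever code issues the view query; it is assumed to return a list of row objects that each carry a `key` holding the document type.

```python
from collections import Counter


def count_by_key(fetch_page, page_size=1000):
    """Tally view rows by key, pulling one page at a time."""
    counts = Counter()
    skip = 0
    while True:
        # fetch_page is a hypothetical callable wrapping the actual view request.
        rows = fetch_page(skip, page_size)
        counts.update(row["key"] for row in rows)
        if len(rows) < page_size:
            # A short (or empty) page means the view has been exhausted.
            return counts
        skip += len(rows)
```

This does on the client what the reduce would have done on the server, so every row still crosses the network; it only keeps memory use bounded by the page size.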

David

hod.greeley · March 26, 2018, 8:49pm

Server doesn’t understand channels, but perhaps depending on your doc structure you might be able to do this with N1QL anyway. It’s very powerful in terms of manipulating results, so, for example, I think it should be possible to come up with a N1QL query that gives you aggregate doc totals broken out by features of the document.

The easy example would be if you have something like channel assignment according to some field in the document. Do you think your doc/channel structure lends itself to this?

There’s also the Eventing Service in developer preview for 5.5, which is quite powerful and flexible and could do what you want. That’s not ready for deployment yet, though.

davidj · March 27, 2018, 6:37pm

Thanks for the reply, @hod.greeley. The channels are not derived from data in the doc, at least not in a consistent way that we could effectively code into a query. So, if I understand right, the channels are in a hidden field, or stored separately and can’t be queried? I guess we could duplicate the channels in the doc, but that introduces the possibility of them getting out of sync.

jens · March 27, 2018, 8:03pm

Channels are always derived from data in the doc, because the sync function which computes the channels has no access to anything other than the doc’s properties.

The channels are associated with the document, but not accessible in a supported way.

davidj · May 25, 2018, 6:50pm

Yes, from Couchbase’s point of view the channels are derived from data in the doc, literally the channels we specify when we post the document in our case. However, we have a distributed architecture in which the computation of the channels to apply to each type of document is not centralized anywhere, and that information is not necessarily in the document we store in Couchbase, which is not the source of record for this data. Our sync function simply relies on the values already specified in the channels property. Incorporating the logic that can compute channels based on the content of each type of document that can flow through Couchbase is not a practical, supportable approach for us.

The behavior I see is that the channels I specify when posting a doc are not available to be used in a query. So as I wrote before, it looks like I would have to duplicate this list of channels in the document to accomplish what I want to do.
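If the channel list were duplicated into an ordinary document property, a per-type count restricted to a user’s channels might be expressed in N1QL along these lines. This is a sketch, not a tested query against a real cluster: the property names `type` and `channels`, the bucket name, and the helper `build_count_query` are all assumptions, and the statement only sees the duplicated property, not the channel assignments Sync Gateway keeps internally.

```python
def build_count_query(bucket, channels):
    """Return (statement, positional_args) for a per-type count limited to channels."""
    if "`" in bucket:
        # The bucket name is interpolated into the statement, so reject backticks.
        raise ValueError("bucket name must not contain a backtick")
    statement = (
        "SELECT d.type AS type, COUNT(*) AS count "
        f"FROM `{bucket}` AS d "
        # Match documents whose (assumed) channels array shares a value with $1.
        "WHERE ANY c IN d.channels SATISFIES c IN $1 END "
        "GROUP BY d.type"
    )
    # $1 is bound to the whole list of channel names as one positional argument.
    return statement, [list(channels)]
```

The statement and arguments would then be handed to whichever Couchbase SDK or query endpoint is in use. As noted in the thread, the duplicated property can drift from the channels actually assigned, so the result is only as trustworthy as that copy.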