lastSequence remains constant across many push replications

In a push replication I can see how the pusher GETs the checkpoint doc (which holds lastSequence) at the beginning of the replication, then does a large number of POST _revs_diff and _bulk_docs requests, and finally PUTs the checkpoint doc with the new lastSequence.

It creates a new _rev on the checkpoint doc, but for some reason lastSequence remains at 352730 across many push replications. This means it redundantly proposes the same revisions via _revs_diff in every push replication, which in my case results in a lot of unnecessary traffic.
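Sketched in TypeScript, the cycle I’m describing looks roughly like this (the URL, checkpoint ID, and revision IDs are placeholders; the real pusher derives the checkpoint ID from the replication configuration):

```ts
// Rough sketch of one push cycle as seen on the wire. The URL, checkpoint
// ID, and revision IDs below are placeholders, not real values.
const db = "http://localhost:4984/btleperftest";
const checkpointID = "cp-example"; // really derived from the replicator config

async function pushOnce(proposed: Record<string, string[]>, newLastSequence: number) {
  // 1. GET the checkpoint doc; its lastSequence is where the last push ended.
  const ckptResp = await fetch(`${db}/_local/${checkpointID}`);
  const lastSequence = ckptResp.ok ? (await ckptResp.json()).lastSequence : "0";
  console.log("resuming push from sequence", lastSequence);

  // 2. POST _revs_diff with the revisions changed since lastSequence;
  //    the server answers with the subset it is missing.
  const diffResp = await fetch(`${db}/_revs_diff`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(proposed),
  });
  const missing = await diffResp.json();

  // 3. POST _bulk_docs with only the missing revisions (document bodies and
  //    revision histories are elided in this sketch).
  const docs = Object.keys(missing).map((id) => ({ _id: id, _rev: missing[id].missing[0] }));
  await fetch(`${db}/_bulk_docs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docs, new_edits: false }),
  });

  // 4. PUT the checkpoint doc back with the new lastSequence, so the next
  //    push can skip all of this work. Updating an existing checkpoint also
  //    has to carry its current _rev (omitted here).
  await fetch(`${db}/_local/${checkpointID}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ lastSequence: String(newLastSequence) }),
  });
}
```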

I see two types of errors in the Sync Gateway log. Please advise whether they could be the reason the pusher gets confused:

BulkDocs: Local Doc "_local/BtleCommandQueue6ejPHTRvbiOxCPCTKFn4J8uXvVnkLrf17JL2a2y8PfM" --> 404 No previous revision to replace (404 No previous revision to replace)

Why is CB Lite pushing a _local document anyway?

2016-09-26T13:31:07.717-07:00 WARNING: Sync fn exception: TypeError; doc = map[locationEnabled:%!s(bool=false) scanEnabled:%!s(bool=false) scanIntervalMS:%!s(float64=10000) interrogateIntervalMS:%!s(float64=10000) purgeOlderThanSec:%!s(float64=604800) _id:Settings6ejPHTRvbiOxCPCTKFn4J8uXvVnkLrf17JL2a2y8PfM _rev:41-6e0038bc92f0a0408761546fc4fa8412 _deleted:%!s(bool=true) appInstanceIdHash:6ejPHTRvbiOxCPCTKFn4J8uXvVnkLrf17JL2a2y8PfM showAnonymousDevices:%!s(bool=false) syncIntervalMS:%!s(float64=10000) userIdHash:1I8mxB5YFK1kChPjOAitLYI8GSnWRT7iUK38HeI5Idg _revisions:map[ids:[6e0038bc92f0a0408761546fc4fa8412] start:%!s(float64=41)] locationIntervalMS:%!s(float64=30000) syncUrl:https://semd4.biotronikusa.com/btleperftest/ timeStamp:%!s(float64=1.474583006289e+15) autoReconnect:%!s(bool=true) compactOnStart:%!s(bool=false) docName: docType:Settings] – db.(*Database).getChannelsAndAccess() at crud.go:871

Apparently something is wrong in my sync function. I can’t figure out what it is.
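My best guess so far is a property access on a field that isn’t there in some revision: the failing doc above is a tombstone (_deleted: true) with an empty docName, and calling a string method on an undefined field would throw exactly this kind of TypeError. A defensive shape like this should avoid it (only the field names come from the logged document; the channel names are invented for illustration):

```ts
// Provided by the Sync Gateway runtime when the function is deployed there;
// declared here only so this sketch type-checks on its own.
declare function channel(name: string): void;

// Defensive sync function sketch (illustrative, not my actual function).
function sync(doc: any, oldDoc: any) {
  // Handle tombstones first so nothing below touches a missing field.
  if (doc._deleted) {
    return;
  }
  // Guard string operations: toLowerCase() on an undefined field throws
  // a TypeError like the one in the log above.
  if (typeof doc.docName === "string" && doc.docName.length > 0) {
    channel("doc-" + doc.docName.toLowerCase());
  }
  if (doc.appInstanceIdHash) {
    channel("device-" + doc.appInstanceIdHash);
  }
}
```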

It is not pushing a local document; it is creating one. That local document stores how far the push replication has gone. Since this is a distributed system, we cannot guarantee anything about what has happened on the other side, so each side stores its progress independently. At the beginning the client asks the server for its progress, and if it matches what the client has stored, the replication continues from that point. Otherwise the checkpoint is considered invalid, and for safety the push starts from the beginning of the database to ensure that the server has every document the client does.

The client writes its progress periodically during the replication, and again after it finishes. If the app is killed before it has a chance to do so, that is one potential reason for the progress not matching. You might think it would make sense to choose the lower of the two numbers in that case, but not every implementation uses numbers for its progress.
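In rough pseudocode, the decision at the start of a replication looks like this (a sketch of the logic only, not the actual CBL source):

```ts
type Checkpoint = { lastSequence: string } | null;

// Sketch of the checkpoint handshake at replication start.
function startingSequence(local: Checkpoint, remote: Checkpoint): string | null {
  // Progress is trusted only when both sides report the same value. It is an
  // equality check rather than a min(), because not every implementation
  // uses numbers for its progress.
  if (local && remote && local.lastSequence === remote.lastSequence) {
    return local.lastSequence;
  }
  // Mismatch, or either side missing: return null to signal a restart from
  // the beginning of the database, guaranteeing the server gets everything.
  return null;
}
```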

The first error indicates that the checkpoint has been erased from Sync Gateway (i.e. the client has a checkpoint but Sync Gateway does not). Does this sound like something that could have happened?

The first error is about "_local/BtleCommandQueue6ejPHTRvbiOxCPCTKFn4J8uXvVnkLrf17JL2a2y8PfM"
This is one of my local application docs on the Android device. SG correctly kicks it out, but I would say CB Lite shouldn’t even send it there. This is probably unrelated to the lastSequence issue.

Oh right, good point. That’s really weird; I don’t think I’ve seen that happen before (CBL should not send it). The rest of the explanation is still valid, though. If the replicator starts over, the logs will indicate it.

@borrrden: I figured out the problem. In a push replication, if a document fails to get pushed, the replication continues with pushing newer documents (so far so good). On the next push replication it starts again from the sequence number of the old document whose push previously failed. That’s fine, assuming the push failure was temporary and might succeed this time. But I have an old _local document, which Sync Gateway correctly rejects, because CB Lite should not have submitted it in the first place. As a result, every push replication gets progressively longer as more revisions get created, and the replicator never advances the lastSequence checkpoint. Should I submit an issue, or does it fall under https://github.com/couchbase/couchbase-lite-net/issues/742?

Similarly, if my sync function rejects an old document for some reason, the lastSequence checkpoint gets stuck on that document, and replication will always restart from that sequence number.
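In both cases my mental model is that the checkpoint can only advance to just below the oldest sequence that is still pending, so a permanently rejected document pins it forever. Something like this (illustrative only, not the actual implementation):

```ts
// Why one permanently rejected doc pins the checkpoint (illustrative only).
const pending = new Set<number>(); // sequences pushed but not yet confirmed

function onDocSent(seq: number): void {
  pending.add(seq);
}

// Returns the highest sequence that is safe to record as lastSequence.
function onDocResult(seq: number, ok: boolean): number {
  if (ok) pending.delete(seq);
  // A doc the server always rejects (like my _local doc) stays in `pending`
  // forever, so the checkpoint can never move past its sequence number.
  return pending.size > 0 ? Math.min(...pending) - 1 : seq;
}
```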

It looks like a new issue has been filed anyway that relates to this. I have no explanation right now as to why it is trying to push a local document; local documents are stored in a different table than other documents, and if one somehow got stored in the document table, that could cause this. Perhaps the forbidden errors need to be handled in the same way as 404, in that they should be skipped without retry. I’ll also think (with the team) about whether we can report this back to the consumer.
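Roughly, the change would look like this (illustrative pseudologic; the callback names are invented, not the real couchbase-lite-net API):

```ts
// Illustrative per-document result handling for a pusher (invented names).
function isPermanentRejection(status: number): boolean {
  // 403 = rejected by the sync function, 404 = e.g. "no previous revision".
  return status === 403 || status === 404;
}

function onPushResult(
  seq: number,
  status: number,
  markComplete: (s: number) => void, // lets the checkpoint advance past seq
  retry: (s: number) => void,        // re-queue for the next replication
  report: (s: number, st: number) => void // surface rejection to the consumer
): void {
  if (status < 300) {
    markComplete(seq); // success
  } else if (isPermanentRejection(status)) {
    markComplete(seq); // permanent: skip without retry, as suggested above
    report(seq, status);
  } else {
    retry(seq); // transient error: worth trying again
  }
}
```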
