Sync Gateway changes feed returns repeated docs in longpoll mode

  1. Query at ‘03-17 09:15:27’:
{"active_only":true,"since":"5297414","limit":1000,"include_docs":true,"feed":"longpoll"}

response:

{
  "results":[
    {
      "seq":"5297415::5297418","id":"-UqvzOYR_CPhbUkAl5YYzWr"
    }
  ],
  "last_seq":"5297415::5297418"
}
  2. A subsequent query with since: 5297415::5297418 returns last_seq 5297415::5297420.

  3. Repeating the changes feed request in a loop returns last_seq values like:

5297415::5297418
5297415::5297420
5297415::5297421
5297415::5297423
5297415::5297424
5297415::5297425
5297415::5297427
5297415::5297428
5297415::5297430
5297415::5297432
...
5297415::5298104
  4. However, a query at ‘03-17 10:35:53’:
{"active_only":true,"since":"5297415::5298104","limit":1000,"include_docs":true,"feed":"longpoll"}

returns many docs that were already fetched in previous queries:

{
  "results":[
    // repeated docs that were already fetched in previous queries
    {
      "seq":"5297418","id":"-UqvzOYR_CPhbUkAl5YYzWr"
    },
    {
      "seq":"5297420","id":"-vf-sv4svjqk829ThtKBAMH"
    }
    ......
  ],
  "last_seq":"5297949::5298114"
}

Why does Sync Gateway send repeated docs? How can I avoid it?

Sync Gateway isn’t sending duplicate documents. A sequence ID of the form 5297415::5298104 is a “compound sequence”. Here 5297415 is the last contiguous sequence observed by Sync Gateway over the database change feed coming in from the server. The compound sequence is an optimization that keeps Sync Gateway from waiting indefinitely for a missing sequence from the server before it sends changes to the client. By default, Sync Gateway waits 5 seconds before skipping a sequence; the wait is configurable via these options.
So the next time the client requests changes since 5297415::5298104, Sync Gateway will try to send all skipped sequences starting from the last stable_seq, which is 5297415.
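
To make the two parts of a sequence value concrete, here is a minimal Python sketch that splits a value into the part before `::` and the part after it. The names `Seq`, `parse_seq`, and `replay_start` are hypothetical helpers, not part of any Sync Gateway client library, and the sketch assumes only the two shapes seen in this thread (a plain number, or `stable::seq`).

```python
from typing import NamedTuple, Optional


class Seq(NamedTuple):
    stable: Optional[int]  # part before "::"; None for a plain sequence
    seq: int               # the sequence of the change itself


def parse_seq(value: str) -> Seq:
    """Parse "5297415::5298104" or "5297418" (the two forms seen in this thread)."""
    first, sep, second = value.partition("::")
    if sep:
        return Seq(int(first), int(second))
    return Seq(None, int(first))


def replay_start(since: str) -> int:
    """Sequence from which changes may be sent again for a given since value.

    For a compound value this is the part before "::", per the explanation above.
    """
    parsed = parse_seq(since)
    return parsed.stable if parsed.stable is not None else parsed.seq
```

With the values from the question, `replay_start("5297415::5298104")` gives 5297415, which is why entries such as 5297418 and 5297420 can show up again.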

Don’t get too attached to the details documented here: these are inner workings of Sync Gateway that can change, and a lot has evolved since that page was written. It should still give you a good overview.

@priya.rajagopal Thank you for your answer.

I am still confused by it.

Our system asks Sync Gateway for all changes since 5306291 at 11:10. The response contains a doc "_id":"-RQC8d34A-ETZgI6hWBCJEu","_rev":"1-02361e471fe1bd3b3de0b141c2545dc74e7fc721", and its seq is the compound sequence 5306297::5306335.

It asks Sync Gateway for all changes since 5306297::5307115 at 12:36. The response contains the same doc "_id":"-RQC8d34A-ETZgI6hWBCJEu","_rev":"1-02361e471fe1bd3b3de0b141c2545dc74e7fc721", this time with seq 5306335.

In my opinion, the entries with seq 5306297::5306335 and seq 5306335 are the same doc. So we have to drop the seq 5306335 entry in our system, since it has already received the doc -RQC8d34A-ETZgI6hWBCJEu with seq 5306297::5306335.

Is there something wrong with my query? Is there some way to make sure that we do not receive duplicate docs as above?

Thanks.

@priya.rajagopal do you have any suggestion for this please?

This is behaving as expected. You will need to perform the deduplication on your side, or switch to a continuous changes feed (which does the deduplication work inside Sync Gateway).

The main problem is that in a scenario with skipped sequences on a non-continuous changes feed, we don’t know the full set of sequences we’ve not sent to the client in previous requests. The first element of the sequence is the earliest skipped sequence, but there may be more between that first one and your since value.
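
For the client-side option, a minimal Python sketch of deduplication could look like the following. It assumes, based only on the examples in this thread, that a repeated entry carries the same doc id and the same trailing sequence number as the original (5306297::5306335 first, then 5306335). `ChangeDeduper` and `seq_number` are hypothetical names, not part of any Sync Gateway API.

```python
def seq_number(value: str) -> int:
    """Return the change's own sequence: the part after "::" if there is one."""
    return int(value.rpartition("::")[2])


class ChangeDeduper:
    """Remembers (doc id, sequence) pairs already handed to the application."""

    def __init__(self):
        self._seen = set()

    def filter(self, results):
        """Return only the entries of a _changes "results" list not seen before."""
        fresh = []
        for change in results:
            key = (change["id"], seq_number(change["seq"]))
            if key in self._seen:
                continue  # already processed in an earlier longpoll response
            self._seen.add(key)
            fresh.append(change)
        return fresh
```

Pass the `results` array of each longpoll response through the same `ChangeDeduper` instance and process only what `filter` returns. The set grows without bound in this sketch, so a long-running client would need some way to trim it.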

Thanks. I will try the continuous mode.