Different sync gateway instances working with same cluster return different results

Hi

My configuration is the following:

  • 2 couchbase 4.5.0 instances
  • 2 sync gateway 1.3.1 instances

Problem is that one sync gateway instance returns not all results. For example
http://sync-gateway1/db/_changes?filter=sync_gateway/bychannel&channels=apptables_mainUser111507_timeRecording
returns:

{“results”:[
{“seq”:530742,“id”:“mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOQTPN5VWS3THOM000000”,“deleted”:true,“changes”:[{“rev”:“3-24082f79195178decab645d13caf2ba3”}]}
,{“seq”:530821,“id”:“mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGORLNOBWG66LFMU000000”,“deleted”:true,“changes”:[{“rev”:“2-68ddf874a53312ee1f8fa4740f519a45”}]}_
,{“seq”:5153971,“id”:“mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0”,“changes”:[{“rev”:“6-4fe67fa2f8af1519a49663ae574515ed”}]}
],
“last_seq”:“5153971”}

But
http://sync-gateway1/db/_changes?filter=sync_gateway/bychannel&channels=apptables_mainUser111507_timeRecording
returns

{“results”:[
{“seq”:530593,“id”:“mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOU3JORSQ0000”,“deleted”:true,“changes”:[{“rev”:“2-ddf0f424d85600be85c65e955eac573f”}]}
,{“seq”:530742,“id”:“mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOQTPN5VWS3THOM000000”,“deleted”:true,“changes”:[{“rev”:“3-24082f79195178decab645d13caf2ba3”}]}
,{“seq”:530821,“id”:“mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGORLNOBWG66LFMU000000”,“deleted”:true,“changes”:[{“rev”:“2-68ddf874a53312ee1f8fa4740f519a45”}]}
,{“seq”:5153969,“id”:“mainUser111507__timeRecording__ONUXIZLT”,“changes”:[{“rev”:“6-519bf1cd5fd2d6e9ab22f0dc4cf25e9b”}]}
,{“seq”:5153971,“id”:“mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0”,“changes”:[{“rev”:“6-4fe67fa2f8af1519a49663ae574515ed”}]}
,{“seq”:5153972,“id”:“mainUser111507__timeRecording__MJXW623JNZTXG000”,“changes”:[{“rev”:“6-bcc20df480bae0fe8c67ca96804d4324”}]}
,{“seq”:5153973,“id”:“mainUser111507__timeRecording__OJSWC43PNZDG64SBMJZWK3TDMVZQ0000”,“changes”:[{“rev”:“4-6dd17da4d72e3d57b6df5fe23d3ed3cb”}]}
,{“seq”:5153974,“id”:“mainUser111507__timeRecording__MFRHGZLOMNSXG000”,“changes”:[{“rev”:“4-cb58b3c54ac3c595fee9e9a4a270ecba”}]}
],
“last_seq”:“5153974”}

When I call /_changes without filter - both instances return identical results including sequences which missed in first instance in example above. After restart of first sync gateway instance - it shows same results as second instance.

What can be the problem?

Thanks

Enabling Changes+ logging can help provide some insight into what’s happening differently between the two nodes.

My first thought was that the first Sync Gateway node had dropped the server mutation feed, and so wasn’t up to date with the latest changes. However, the presence of sequence 530593 in the second results doesn’t fit that scenario - there shouldn’t be a scenario where the first Sync Gateway fails to see older documents.

If you hit this issue again, you can enable “Changes+” logging on the fly using the _logging admin endpoint:

curl -X PUT --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{"Changes+":true}' 'https://[sync-gateway1]:4985/_logging'

https://developer.couchbase.com/documentation/mobile/1.4/references/sync-gateway/admin-rest-api/index.html#!/server/put_logging

It looks like problem not in mutations feed. As I can see in couchbase - sequence 5153974 relates to the following modification timestamp:
“time_saved”: “2017-03-01T18:58:24.514341554+01:00”

But both sync gateway instances where restarted two dates ago - at 2017-03-12.

According to log files when I made requests (not when modification was really done) they are the following:
Working instance:

10:10:13.927090 2017-03-14T10:10:13.927+01:00 HTTP: #71094: GET /ag-dev/_changes?filter=sync_gateway/bychannel&channels=apptables_mainUser111507_timeRecording (ADMIN)
10:10:13.927339 2017-03-14T10:10:13.927+01:00 Changes+: Int sequence multi changes feed…
10:10:13.927607 2017-03-14T10:10:13.927+01:00 Changes: MultiChangesFeed({apptables_mainUser111507_timeRecording}, {Since:0 Limit:0 Conflicts:false IncludeDocs:false Wait:false Continuous:false Terminator:0xc820e703c0 HeartbeatMs:0 TimeoutMs:300000 ActiveOnly:false}) …
10:10:13.929312 2017-03-14T10:10:13.929+01:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{“apptables_mainUser111507_timeRecording”:channels.VbSequence{VbNo:(*uint16)(nil), Sequence:0x0}} …
10:10:13.929556 2017-03-14T10:10:13.929+01:00 Cache: Initialized cache for channel “apptables_mainUser111507_timeRecording” with options: &{ChannelCacheMinLength:50 ChannelCacheMaxLength:500 ChannelCacheAge:1m0s}
10:10:13.929912 2017-03-14T10:10:13.929+01:00 Cache: getCachedChanges(“apptables_mainUser111507_timeRecording”, {0 0 0 0 %!d(string=) 0 0 0 %!d(string=)}) → 0 changes valid from #5285447
10:10:13.930034 2017-03-14T10:10:13.930+01:00 Cache: Querying ‘channels’ view for “apptables_mainUser111507_timeRecording” (start=#1, end=#5285447, limit=0)
10:10:14.069901 2017-03-14T10:10:14.069+01:00 Cache: Got 8 rows from view for “apptables_mainUser111507_timeRecording”: #530593#5153974
10:10:14.069974 2017-03-14T10:10:14.069+01:00 Cache: Initialized cache of “apptables_mainUser111507_timeRecording” with 8 entries from view (#530593#5153974)
10:10:14.070032 2017-03-14T10:10:14.070+01:00 Cache: GetChangesInChannel(“apptables_mainUser111507_timeRecording”) → 8 rows
10:10:14.070065 2017-03-14T10:10:14.070+01:00 DIndex+: [changesFeed] Found 8 changes for channel apptables_mainUser111507_timeRecording
10:10:14.070231 2017-03-14T10:10:14.070+01:00 Changes+: Sending seq:530593 from channel apptables_mainUser111507_timeRecording
10:10:14.070270 2017-03-14T10:10:14.070+01:00 Changes+: Sending seq:530742 from channel apptables_mainUser111507_timeRecording
10:10:14.070484 2017-03-14T10:10:14.070+01:00 Changes+: MultiChangesFeed sending &{Seq:530593 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOU3JORSQ0000 Deleted:true Removed:{} Doc:map Changes:[map[rev:2-ddf0f424d85600be85c65e955eac573f]] Err: allRemoved:false branched:true}
10:10:14.070635 2017-03-14T10:10:14.070+01:00 Changes+: Sending seq:530821 from channel apptables_mainUser111507_timeRecording
10:10:14.070700 2017-03-14T10:10:14.070+01:00 Changes+: MultiChangesFeed sending &{Seq:530742 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOQTPN5VWS3THOM000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:3-24082f79195178decab645d13caf2ba3]] Err: allRemoved:false branched:true}
10:10:14.070783 2017-03-14T10:10:14.070+01:00 Changes+: Sending seq:5153969 from channel apptables_mainUser111507_timeRecording
10:10:14.070863 2017-03-14T10:10:14.070+01:00 Changes+: MultiChangesFeed sending &{Seq:530821 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGORLNOBWG66LFMU000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:2-68ddf874a53312ee1f8fa4740f519a45]] Err: allRemoved:false branched:true}
10:10:14.070950 2017-03-14T10:10:14.070+01:00 Changes+: Sending seq:5153971 from channel apptables_mainUser111507_timeRecording
10:10:14.071010 2017-03-14T10:10:14.070+01:00 Changes+: MultiChangesFeed sending &{Seq:5153969 ID:mainUser111507__timeRecording__ONUXIZLT Deleted:false Removed:{} Doc:map Changes:[map[rev:6-519bf1cd5fd2d6e9ab22f0dc4cf25e9b]] Err: allRemoved:false branched:true}
10:10:14.071092 2017-03-14T10:10:14.071+01:00 Changes+: Sending seq:5153972 from channel apptables_mainUser111507_timeRecording
10:10:14.071163 2017-03-14T10:10:14.071+01:00 Changes+: MultiChangesFeed sending &{Seq:5153971 ID:mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-4fe67fa2f8af1519a49663ae574515ed]] Err: allRemoved:false branched:true}
10:10:14.071280 2017-03-14T10:10:14.071+01:00 Changes+: Sending seq:5153973 from channel apptables_mainUser111507_timeRecording
10:10:14.071340 2017-03-14T10:10:14.071+01:00 Changes+: MultiChangesFeed sending &{Seq:5153972 ID:mainUser111507__timeRecording__MJXW623JNZTXG000 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-bcc20df480bae0fe8c67ca96804d4324]] Err: allRemoved:false branched:true}
10:10:14.071425 2017-03-14T10:10:14.071+01:00 Changes+: Sending seq:5153974 from channel apptables_mainUser111507_timeRecording
10:10:14.071483 2017-03-14T10:10:14.071+01:00 Changes+: MultiChangesFeed sending &{Seq:5153973 ID:mainUser111507__timeRecording__OJSWC43PNZDG64SBMJZWK3TDMVZQ0000 Deleted:false Removed:{} Doc:map Changes:[map[rev:4-6dd17da4d72e3d57b6df5fe23d3ed3cb]] Err: allRemoved:false branched:false}
10:10:14.071601 2017-03-14T10:10:14.071+01:00 Changes+: MultiChangesFeed sending &{Seq:5153974 ID:mainUser111507__timeRecording__MFRHGZLOMNSXG000 Deleted:false Removed:{} Doc:map Changes:[map[rev:4-cb58b3c54ac3c595fee9e9a4a270ecba]] Err: allRemoved:false branched:false}
10:10:14.071634 2017-03-14T10:10:14.071+01:00 Changes: MultiChangesFeed done

Broken instance:

16:27:36.081647 2017-03-13T16:27:36.081+01:00 HTTP: #332784: POST /ag-dev/_changes?feed=normal&heartbeat=30000&style=all_docs&filter=sync_gateway%2Fbychannel (as mainUser111507)
16:27:36.081821 2017-03-13T16:27:36.081+01:00 Changes+: Int sequence multi changes feed…
16:27:36.081886 2017-03-13T16:27:36.081+01:00 Changes: MultiChangesFeed({apptables_mainUser111507_timeRecording}, {Since:0 Limit:0 Conflicts:true IncludeDocs:false Wait:false Continuous:false Terminator:0xc8b8e6e600 HeartbeatMs:10000 TimeoutMs:300000 ActiveOnly:false}) … (to mainUser111507)
16:27:36.090261 2017-03-13T16:27:36.090+01:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{“apptables_mainUser111507_timeRecording”:channels.VbSequence{VbNo:(*uint16)(nil), Sequence:0x8189d}} … (to mainUser111507)
16:27:36.090443 2017-03-13T16:27:36.090+01:00 Cache: Initialized cache for channel “apptables_mainUser111507_timeRecording” with options: &{ChannelCacheMinLength:50 ChannelCacheMaxLength:500 ChannelCacheAge:1m0s}
16:27:36.090522 2017-03-13T16:27:36.090+01:00 Cache: getCachedChanges(“apptables_mainUser111507_timeRecording”, {0 530589 0 0 %!d(string=) 0 0 0 %!d(string=)}) → 0 changes valid from #5285444
16:27:36.090620 2017-03-13T16:27:36.090+01:00 Cache: Querying ‘channels’ view for “apptables_mainUser111507_timeRecording” (start=#1, end=#5285444, limit=0)
16:28:16.232952 2017-03-13T16:28:16.232+01:00 HTTP: #332793: POST /ag-dev/_changes?feed=normal&heartbeat=30000&style=all_docs&filter=sync_gateway%2Fbychannel (as mainUser111507)
16:28:16.233235 2017-03-13T16:28:16.233+01:00 Changes+: Int sequence multi changes feed…
16:28:16.233346 2017-03-13T16:28:16.233+01:00 Changes: MultiChangesFeed({apptables_mainUser111507_timeRecording}, {Since:0 Limit:0 Conflicts:true IncludeDocs:false Wait:false Continuous:false Terminator:0xc85323c420 HeartbeatMs:10000 TimeoutMs:300000 ActiveOnly:false}) … (to mainUser111507)
16:28:16.236981 2017-03-13T16:28:16.236+01:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{“apptables_mainUser111507_timeRecording”:channels.VbSequence{VbNo:(*uint16)(nil), Sequence:0x8189d}} … (to mainUser111507)
16:28:16.237090 2017-03-13T16:28:16.237+01:00 Cache: getCachedChanges(“apptables_mainUser111507_timeRecording”, {0 530589 0 0 %!d(string=) 0 0 0 %!d(string=)}) → 0 changes valid from #5285444
16:28:36.110025 2017-03-13T16:28:36.110+01:00 Cache: Got 3 rows from view for “apptables_mainUser111507_timeRecording”: #530742#5153971
16:28:36.110316 2017-03-13T16:28:36.110+01:00 changes_view: Query took 1m0.019498692s to return 3 rows, options = db.Body{“stale”:false, “startkey”:interface {}{“apptables_mainUser111507_timeRecording”, 0x1}, “endkey”:interface {}{“apptables_mainUser111507_timeRecording”, 0x50a644}}
16:28:36.110386 2017-03-13T16:28:36.110+01:00 Cache: Initialized cache of “apptables_mainUser111507_timeRecording” with 3 entries from view (#530742#5153971)
16:28:36.110420 2017-03-13T16:28:36.110+01:00 Cache: GetChangesInChannel(“apptables_mainUser111507_timeRecording”) → 3 rows
16:28:36.110455 2017-03-13T16:28:36.110+01:00 DIndex+: [changesFeed] Found 3 changes for channel apptables_mainUser111507_timeRecording
16:28:36.110633 2017-03-13T16:28:36.110+01:00 Changes+: Sending seq:530742 from channel apptables_mainUser111507_timeRecording
16:28:36.110672 2017-03-13T16:28:36.110+01:00 Changes+: Sending seq:530821 from channel apptables_mainUser111507_timeRecording
16:28:36.110955 2017-03-13T16:28:36.110+01:00 Changes+: Sending seq:5153971 from channel apptables_mainUser111507_timeRecording
16:28:36.111038 2017-03-13T16:28:36.110+01:00 Cache: 2nd getCachedChanges(“apptables_mainUser111507_timeRecording”, {0 530589 0 0 %!d(string=) 0 0 0 %!d(string=)}) got 3 more, valid from #1!
16:28:36.111066 2017-03-13T16:28:36.111+01:00 DIndex+: [changesFeed] Found 3 changes for channel apptables_mainUser111507_timeRecording
16:28:36.111154 2017-03-13T16:28:36.111+01:00 Changes+: Sending seq:530742 from channel apptables_mainUser111507_timeRecording
16:28:36.111190 2017-03-13T16:28:36.111+01:00 Changes+: Sending seq:530821 from channel apptables_mainUser111507_timeRecording
16:28:36.111387 2017-03-13T16:28:36.111+01:00 Changes+: Sending seq:5153971 from channel apptables_mainUser111507_timeRecording
16:28:36.126195 2017-03-13T16:28:36.126+01:00 Heartbeat: heartbeat written to _changes feed for request received (as mainUser111975)
16:28:36.186343 2017-03-13T16:28:36.186+01:00 Changes+: MultiChangesFeed sending &{Seq:530742 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOQTPN5VWS3THOM000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:3-24082f79195178decab645d13caf2ba3] map[rev:2-0eafc69d566cf3e045cb6d670ca46793] map[rev:2-f4d1d4fc964800157a93c41ea8884dc5] map[rev:2-37abd7df3b43788684b00c04498fbed1] map[rev:2-7d0f2a706e84b66c064cae6c15a888a9]] Err: allRemoved:false branched:true} (to mainUser111507)
16:28:36.190428 2017-03-13T16:28:36.190+01:00 Changes+: MultiChangesFeed sending &{Seq:530742 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOQTPN5VWS3THOM000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:3-24082f79195178decab645d13caf2ba3] map[rev:2-f4d1d4fc964800157a93c41ea8884dc5] map[rev:2-7d0f2a706e84b66c064cae6c15a888a9] map[rev:2-0eafc69d566cf3e045cb6d670ca46793] map[rev:2-37abd7df3b43788684b00c04498fbed1]] Err: allRemoved:false branched:true} (to mainUser111507)
16:28:36.338316 2017-03-13T16:28:36.338+01:00 Heartbeat: heartbeat written to _changes feed for request received (as mainUser111360)
16:28:36.423329 2017-03-13T16:28:36.423+01:00 Changes+: MultiChangesFeed sending &{Seq:530821 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGORLNOBWG66LFMU000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:2-68ddf874a53312ee1f8fa4740f519a45] map[rev:2-1c6e152883adb5c2e814c76ac5ca30ff] map[rev:2-5ca1f0d9577fc31ed1c6628272590d02] map[rev:2-12c56827cee739fbea9b9a42625a503e] map[rev:2-5f01c313907c69a594e35ab3ca366978]] Err: allRemoved:false branched:true} (to mainUser111507)
16:28:36.423497 2017-03-13T16:28:36.423+01:00 Changes+: MultiChangesFeed sending &{Seq:5126039 ID:_user/mainUser111507 Deleted:false Removed:{} Doc:map Changes: Err: allRemoved:false branched:false} (to mainUser111507)
16:28:36.427997 2017-03-13T16:28:36.427+01:00 Changes+: MultiChangesFeed sending &{Seq:530821 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGORLNOBWG66LFMU000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:2-68ddf874a53312ee1f8fa4740f519a45] map[rev:2-12c56827cee739fbea9b9a42625a503e] map[rev:2-1c6e152883adb5c2e814c76ac5ca30ff] map[rev:2-5ca1f0d9577fc31ed1c6628272590d02] map[rev:2-5f01c313907c69a594e35ab3ca366978]] Err: allRemoved:false branched:true} (to mainUser111507)
16:28:36.428105 2017-03-13T16:28:36.428+01:00 Changes+: MultiChangesFeed sending &{Seq:5126039 ID:_user/mainUser111507 Deleted:false Removed:{} Doc:map Changes: Err: allRemoved:false branched:false} (to mainUser111507)
16:28:36.447268 2017-03-13T16:28:36.447+01:00 Changes+: MultiChangesFeed sending &{Seq:5153971 ID:mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-4fe67fa2f8af1519a49663ae574515ed] map[rev:2-1a28c22919356c6d59836bd2bb477df1] map[rev:2-a17fd85429c67c1d7f60cbf66834effb] map[rev:2-079f4a4e9c2afe10e91767407f4de11c] map[rev:2-0eb91891cc4386881608f4693c3fb867] map[rev:2-95201007eff5afdfa3e737832353336d] map[rev:2-528a28f85c2195bf6a992f050050d580] map[rev:2-d8ff2c7c181df31e378ab79787bd14a3] map[rev:2-cd4db053ac93ca297f3518546cefb396] map[rev:3-6a9cf05fa230038c9cf1f7045dab352a] map[rev:2-a5e67febfab26b111d5920815a281fa7] map[rev:2-84a774ba66bc85b72e4b441c5a9b444b] map[rev:2-4917e0b009114f4e23892dad1d1bdd9b] map[rev:2-2d61b59c0783b9ac5a759db977795d71] map[rev:2-fbf51c082c20c5fabfa46fc4b1e83348] map[rev:2-c07c1661792dd66aad763aaf7d11063a] map[rev:2-ab2d362042fd7080d8b30204e9586f58]] Err: allRemoved:false branched:true} (to mainUser111507)
16:28:36.447377 2017-03-13T16:28:36.447+01:00 Changes: MultiChangesFeed done (to mainUser111507)
16:28:36.447628 2017-03-13T16:28:36.447+01:00 HTTP+: #332784: → 200 OK (0.0 ms)
16:28:36.449656 2017-03-13T16:28:36.449+01:00 Changes+: MultiChangesFeed sending &{Seq:5153971 ID:mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-4fe67fa2f8af1519a49663ae574515ed] map[rev:2-84a774ba66bc85b72e4b441c5a9b444b] map[rev:2-c07c1661792dd66aad763aaf7d11063a] map[rev:2-1a28c22919356c6d59836bd2bb477df1] map[rev:2-95201007eff5afdfa3e737832353336d] map[rev:2-d8ff2c7c181df31e378ab79787bd14a3] map[rev:3-6a9cf05fa230038c9cf1f7045dab352a] map[rev:2-a5e67febfab26b111d5920815a281fa7] map[rev:2-4917e0b009114f4e23892dad1d1bdd9b] map[rev:2-528a28f85c2195bf6a992f050050d580] map[rev:2-2d61b59c0783b9ac5a759db977795d71] map[rev:2-fbf51c082c20c5fabfa46fc4b1e83348] map[rev:2-a17fd85429c67c1d7f60cbf66834effb] map[rev:2-079f4a4e9c2afe10e91767407f4de11c] map[rev:2-ab2d362042fd7080d8b30204e9586f58] map[rev:2-cd4db053ac93ca297f3518546cefb396] map[rev:2-0eb91891cc4386881608f4693c3fb867]] Err: allRemoved:false branched:true} (to mainUser111507)
16:28:36.449736 2017-03-13T16:28:36.449+01:00 Changes: MultiChangesFeed done (to mainUser111507)

And after it

10:10:22.688520 2017-03-14T10:10:22.688+01:00 HTTP: #643580: GET /ag-dev/_changes?filter=sync_gateway/bychannel&channels=apptables_mainUser111507_timeRecording (ADMIN)
10:10:22.688822 2017-03-14T10:10:22.688+01:00 Changes+: Int sequence multi changes feed…
10:10:22.689221 2017-03-14T10:10:22.688+01:00 Changes: MultiChangesFeed({apptables_mainUser111507_timeRecording}, {Since:0 Limit:0 Conflicts:false IncludeDocs:false Wait:false Continuous:false Terminator:0xc876e40000 HeartbeatMs:0 TimeoutMs:300000 ActiveOnly:false}) …
10:10:22.692982 2017-03-14T10:10:22.692+01:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{“apptables_mainUser111507_timeRecording”:channels.VbSequence{VbNo:(*uint16)(nil), Sequence:0x0}} …
10:10:22.693265 2017-03-14T10:10:22.693+01:00 Cache: getCachedChanges(“apptables_mainUser111507_timeRecording”, {0 0 0 0 %!d(string=) 0 0 0 %!d(string=)}) → 3 changes valid from #1
10:10:22.693600 2017-03-14T10:10:22.693+01:00 DIndex+: [changesFeed] Found 3 changes for channel apptables_mainUser111507_timeRecording
10:10:22.694242 2017-03-14T10:10:22.693+01:00 Changes+: Sending seq:530742 from channel apptables_mainUser111507_timeRecording
10:10:22.694307 2017-03-14T10:10:22.694+01:00 Changes+: Sending seq:530821 from channel apptables_mainUser111507_timeRecording
10:10:22.694716 2017-03-14T10:10:22.694+01:00 Changes+: Sending seq:5153971 from channel apptables_mainUser111507_timeRecording
10:10:22.694929 2017-03-14T10:10:22.694+01:00 Changes+: MultiChangesFeed sending &{Seq:530742 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGOQTPN5VWS3THOM000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:3-24082f79195178decab645d13caf2ba3]] Err: allRemoved:false branched:true}
10:10:22.695016 2017-03-14T10:10:22.694+01:00 Changes+: MultiChangesFeed sending &{Seq:530821 ID:mainUser111507__timeRecording__ORUW2ZKSMVRW64TENFXGORLNOBWG66LFMU000000 Deleted:true Removed:{} Doc:map Changes:[map[rev:2-68ddf874a53312ee1f8fa4740f519a45]] Err: allRemoved:false branched:true}
10:10:22.695130 2017-03-14T10:10:22.695+01:00 Changes+: MultiChangesFeed sending &{Seq:5153971 ID:mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-4fe67fa2f8af1519a49663ae574515ed]] Err: allRemoved:false branched:true}
10:10:22.695163 2017-03-14T10:10:22.695+01:00 Changes: MultiChangesFeed done

Your first request is being run as ADMIN, and the second request is being run as mainUser111507. I think that’s the root cause of the difference you’re seeing.

And? Permissions are set to channels, not to documents and same documents should be in same channel. And if you see - second request to broken instance was done as admin. And still 3 records instead of 8.

What else I see on broken server which may be can useful - between initial request and response from couchbase there are the following messages:

16:28:20.953915 2017-03-13T16:28:20.953+01:00 Changes: Connection lost from client: (as mainUser111641)
16:28:20.954138 2017-03-13T16:28:20.954+01:00 HTTP+: #332451: → 200 OK (0.0 ms)
16:28:20.954341 2017-03-13T16:28:20.954+01:00 Changes+: Notifying to check for _changes feed termination
16:28:20.954667 2017-03-13T16:28:20.954+01:00 Changes: Connection lost from client: (as mainUser111641)
16:28:20.954731 2017-03-13T16:28:20.954+01:00 HTTP+: #332452: → 200 OK (0.0 ms)
16:28:20.955205 2017-03-13T16:28:20.955+01:00 Changes: Connection lost from client: (as mainUser111641)
16:28:20.955265 2017-03-13T16:28:20.955+01:00 HTTP+: #332450: → 200 OK (0.0 ms)
16:28:20.955278 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5633
16:28:20.955457 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.955583 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.955680 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.955766 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.955863 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.955936 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956019 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956082 2017-03-13T16:28:20.956+01:00 Changes: MultiChangesFeed done (to mainUser111641)
16:28:20.956195 2017-03-13T16:28:20.956+01:00 Changes: MultiChangesFeed done (to mainUser111641)
16:28:20.956308 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956416 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5633
16:28:20.956489 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956565 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956651 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5633
16:28:20.956730 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956781 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956830 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.956945 2017-03-13T16:28:20.956+01:00 Changes+: Waiting for “ag-dev”'s count to pass 7562
16:28:20.957038 2017-03-13T16:28:20.957+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.957117 2017-03-13T16:28:20.957+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:20.957147 2017-03-13T16:28:20.957+01:00 Changes+: Notifying to check for _changes feed termination

Going back to your original comment:

When I call /_changes without filter - both instances return identical results

Is that when calling as the administrator, or as a user that only has access to the channel?

Do you have log output for those calls?

No problem, here are logs from server when it was in broken state:

10:31:11.551964 2017-03-14T10:31:11.551+01:00 HTTP: #649855: GET /ag-dev/_changes?since=5153970 (ADMIN)
10:31:11.552080 2017-03-14T10:31:11.552+01:00 Changes+: Int sequence multi changes feed…
10:31:11.552745 2017-03-14T10:31:11.552+01:00 Changes: MultiChangesFeed({}, {Since:5153970 Limit:0 Conflicts:false IncludeDocs:false Wait:false Continuous:false Terminator:0xc8363aa4e0 HeartbeatMs:0 TimeoutMs:300000 ActiveOnly:false}) …
10:31:11.553112 2017-03-14T10:31:11.553+01:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{"
“:channels.VbSequence{VbNo:(uint16)(nil), Sequence:0x0}} …
10:31:11.553346 2017-03-14T10:31:11.553+01:00 Cache: getCachedChanges("
”, {1 0 0 5153970 %!d(string=) 0 0 0 %!d(string=)}) → 500 changes valid from #5310672
10:31:11.553411 2017-03-14T10:31:11.553+01:00 Cache: Querying ‘channels’ view for “" (start=#5153971, end=#5310672, limit=0)
10:31:13.775288 2017-03-14T10:31:13.775+01:00 Cache: Got 44098 rows from view for "
”: #5153971#5310672
10:31:13.775521 2017-03-14T10:31:13.775+01:00 changes_view: Query took 2.221948974s to return 44098 rows, options = db.Body{“stale”:false, “startkey”:interface {}{““, 0x4ea4b3}, “endkey”:[]interface {}{””, 0x5108d0}}
10:31:13.776220 2017-03-14T10:31:13.776+01:00 Cache: GetChangesInChannel(“*”) → 44597 rows
10:31:13.776253 2017-03-14T10:31:13.776+01:00 DIndex+: [changesFeed] Found 44597 changes for channel *
10:31:13.776498 2017-03-14T10:31:13.776+01:00 Changes+: Sending seq:5153971 from channel *
10:31:13.776549 2017-03-14T10:31:13.776+01:00 Changes+: Sending seq:5153972 from channel *
10:31:13.776646 2017-03-14T10:31:13.776+01:00 Changes+: MultiChangesFeed sending &{Seq:5153971 ID:mainUser111507__timeRecording__MVWXA3DPPFSWK4Y0 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-4fe67fa2f8af1519a49663ae574515ed]] Err: allRemoved:false branched:true}
10:31:13.776905 2017-03-14T10:31:13.776+01:00 Changes+: Sending seq:5153973 from channel *
10:31:13.777052 2017-03-14T10:31:13.776+01:00 Changes+: MultiChangesFeed sending &{Seq:5153972 ID:mainUser111507__timeRecording__MJXW623JNZTXG000 Deleted:false Removed:{} Doc:map Changes:[map[rev:6-bcc20df480bae0fe8c67ca96804d4324]] Err: allRemoved:false branched:true}
10:31:13.777154 2017-03-14T10:31:13.777+01:00 Changes+: Sending seq:5153974 from channel *
10:31:13.777254 2017-03-14T10:31:13.777+01:00 Changes+: MultiChangesFeed sending &{Seq:5153973 ID:mainUser111507__timeRecording__OJSWC43PNZDG64SBMJZWK3TDMVZQ0000 Deleted:false Removed:{} Doc:map Changes:[map[rev:4-6dd17da4d72e3d57b6df5fe23d3ed3cb]] Err: allRemoved:false branched:false}
10:31:13.777311 2017-03-14T10:31:13.777+01:00 Changes+: MultiChangesFeed sending &{Seq:5153974 ID:mainUser111507__timeRecording__MFRHGZLOMNSXG000 Deleted:false Removed:{} Doc:map Changes:[map[rev:4-cb58b3c54ac3c595fee9e9a4a270ecba]] Err: allRemoved:false branched:false}

As I understand from documentation - user either have access to all documents in channel or don’t have access to all of them. So I’m not sure why you concentrate on user who made request. In such case admin made initial request via GET request (opening appropriate URL in browser) and mainUser111507 made POST requests via couchbase lite.

I’ve found one more channel where problem was reproducible and checked the logs. And scenario for both channels was identical:

  1. Working scenario:

10:10:13.927090 2017-03-14T10:10:13.927+01:00 HTTP: #71094: GET /ag-dev/_changes?filter=sync_gateway/bychannel&channels=apptables_mainUser111507_timeRecording (ADMIN)
10:10:13.929556 2017-03-14T10:10:13.929+01:00 Cache: Initialized cache for channel “apptables_mainUser111507_timeRecording” with options: &{ChannelCacheMinLength:50 ChannelCacheMaxLength:500 ChannelCacheAge:1m0s}
10:10:14.069901 2017-03-14T10:10:14.069+01:00 Cache: Got 8 rows from view for “apptables_mainUser111507_timeRecording”: #530593#5153974
10:10:14.069974 2017-03-14T10:10:14.069+01:00 Cache: Initialized cache of “apptables_mainUser111507_timeRecording” with 8 entries from view (#530593#5153974)
10:10:14.070032 2017-03-14T10:10:14.070+01:00 Cache: GetChangesInChannel(“apptables_mainUser111507_timeRecording”) → 8 rows
So as you see records related to channel found pretty fast (withing 1 sec)

  1. Broken scenario:

16:27:36.081647 2017-03-13T16:27:36.081+01:00 HTTP: #332784: POST /ag-dev/_changes?feed=normal&heartbeat=30000&style=all_docs&filter=sync_gateway%2Fbychannel (as mainUser111507)
16:27:36.090443 2017-03-13T16:27:36.090+01:00 Cache: Initialized cache for channel “apptables_mainUser111507_timeRecording” with options: &{ChannelCacheMinLength:50 ChannelCacheMaxLength:500 ChannelCacheAge:1m0s}
16:28:20.953915 2017-03-13T16:28:20.953+01:00 Changes: Connection lost from client: (as mainUser111641)
16:28:20.954341 2017-03-13T16:28:20.954+01:00 Changes+: Notifying to check for _changes feed termination
16:28:20.955457 2017-03-13T16:28:20.955+01:00 Changes+: Waiting for “ag-dev”'s count to pass 5634
16:28:36.110316 2017-03-13T16:28:36.110+01:00 changes_view: Query took 1m0.019498692s to return 3 rows, options = db.Body{“stale”:false, “startkey”:interface {}
16:28:36.110420 2017-03-13T16:28:36.110+01:00 Cache: GetChangesInChannel(“apptables_mainUser111507_timeRecording”) → 3 rows

So what do I see in broken scenario: request to channel is done with ChannelCacheAge:1m0s. During the minute there is a lost connection and in 1 minute request is terminated by sg and sg puts response (possibly with not all data) to cache and returns always same (with not all data) results if someone makes request to channel in future.

Also the following I’ve found in error logs:

_time=2017-03-13T16:28:36.109+01:00 _level=INFO _msg=go-couchbase: call to ViewCustom(“sync_gateway”, “channels”) in github.com/couchbase/sync_gateway/db.(*DatabaseContext).getChangesInChannelFromView took 1m0.0190908s
_time=2017-03-13T16:30:33.209+01:00 _level=INFO _msg=go-couchbase: call to ViewCustom(“sync_gateway”, “channels”) in github.com/couchbase/sync_gateway/db.(*DatabaseContext).getChangesInChannelFromView took 1m0.009040731s
_time=2017-03-13T16:35:32.537+01:00 _level=INFO _msg=go-couchbase: call to ViewCustom(“sync_gateway”, “channels”) in github.com/couchbase/sync_gateway/db.(*DatabaseContext).getChangesInChannelFromView took 6m23.335301153s

So what is the reason? not enough resources? To many views? To many users or channels? Do we have anywhere limitations on such parameters?

Thanks for the additional details. The fact that the changes request is timing out during channel cache initialization when this issue occurs looks suspicious. I’ve filed https://github.com/couchbase/sync_gateway/issues/2383 to investigate this further.

However, the fact that the view queries are taking more than a minute suggests that the system may be overloaded. How many documents are you expecting to be in this particular channel? What’s the sizing on the SG and Couchbase Server nodes?

I had several buckets in cluster. Also couchbase and sync gateway where running on same vms. I’m testing if same problems will happen in couchbase cluster will have with single bucket. Looks like issue is not reproducible yet. But problem with caching still should be solved - in case one of couchbase instances will be temporary unavailable it will be a problem.