SG endless loop

The effect I’m getting is much the same as in Push sync works, pull sync does not, but the solution found there doesn’t apply to me, as I already use SG 1.2.1.

CB Server: 4.0.0-4051, Windows 10
CB Android: com.couchbase.lite:couchbase-lite-android:1.2.1-2
SG: 1.2.1 (4;26c202a)

Everything on localhost. Error occurs both in Android Emulator (API level 23) and a real device (Nexus 7, Android 5.1.1).

Scenario: I create a fresh cluster, start SG, add 3 views via a curl PUT on port 4985. SG configuration file (users will be created programmatically):

 {
     "log": ["Access", "Attach", "Auth", "Bucket", "Cache", "Cache+", "Changes", "Changes+", "CRUD",
     "CRUD+", "HTTP", "HTTP+", "Shadow", "Shadow+"],
     "adminInterface": "localhost:4985",
     "interface": ":4984",
     "databases": {
         "shoppingsync2": {
             "server": "http://localhost:8091",
             "sync": `function(doc) {
                 channel(doc.channels);
             }`,
             "users": {
                 "GUEST": {
                     "disabled": true
                 }
             }
         }
     }
 }

This is the SG log so far:

2016-07-20T15:46:10.631+02:00 Enabling logging: [Access Attach Auth Bucket Cache Cache+ Changes Changes+ CRUD CRUD+ HTTP HTTP+ Shadow Shadow+]
2016-07-20T15:46:10.636+02:00 ==== Couchbase Sync Gateway/1.2.1(4;26c202a) ====
2016-07-20T15:46:10.636+02:00 Opening db /shoppingsync2 as bucket "shoppingsync2", pool "default", server <http://localhost:8091>
2016-07-20T15:46:10.636+02:00 Opening Couchbase database shoppingsync2 on <http://localhost:8091>
2016/07/20 15:46:14  Trying with selected node 0
2016/07/20 15:46:14  Trying with http://127.0.0.1:8091/pools/default/bucketsStreaming/shoppingsync2
2016/07/20 15:46:14 Got new configuration for bucket shoppingsync2
2016-07-20T15:46:14.470+02:00 Bucket: PutDDoc("sync_gateway", ...) [87.0744ms]
2016/07/20 15:46:14  Trying with selected node 0
2016-07-20T15:46:14.638+02:00 Bucket: PutDDoc("sync_housekeeping", ...) [168.1157ms]
2016-07-20T15:46:14.644+02:00 Bucket: Incr("_sync:seq", 0, 0, 0) [5.0031ms]
2016-07-20T15:46:14.644+02:00 Bucket: Incr("_sync:seq", 0, 0, 0) [500.6µs]
2016-07-20T15:46:14.644+02:00 Cache: Initializing changes cache with options {ChannelCacheOptions:{ChannelCacheMinLength:0 ChannelCacheMaxLength:0 ChannelCacheAge:0} CachePendingSeqMaxWait:5s CachePendingSeqMaxNum:10000 CacheSkippedSeqMaxWait:1h0m0s}
2016-07-20T15:46:14.644+02:00 Bucket: StartTapFeed(...) [0]
2016-07-20T15:46:14.645+02:00 Bucket: Update("_sync:syncdata", 0, ...) --> <nil> [500.5µs]
2016-07-20T15:46:14.645+02:00 Bucket: Update("_sync:user:", 0, ...) --> CAS quit [499.8µs]
2016-07-20T15:46:14.645+02:00     Reset guest user to config
2016-07-20T15:46:14.645+02:00 WARNING: The GUEST user is the only user defined in the 'shoppingsync2' database, but is either disabled or has no access to any channels.  This means that you will not be able to get useful data out of the sync gateway over the standard port.  FIX: enable and/or grant access to the GUEST user to channels via the admin_channels parameter. -- rest.emitAccessRelatedWarnings() at server_context.go:1123
2016-07-20T15:46:14.645+02:00 Starting admin server on localhost:4985
2016-07-20T15:46:14.663+02:00 Starting server on :4984 ...
2016-07-20T15:46:14.665+02:00 Shadow: Watching doc changes...

Now I freshly install my app on the Android device. It creates a user via a PUT on 4985. The user afterwards looks like this:

{
  "name": "user1",
  "all_channels": {
    "!": 1
  },
  "sequence": 2,
  "passwordhash_bcrypt": "JDJhJDEwJFp2UGJTWDFPRlMvSko2OFJrcjRxM2VjMC5tYVBVeUJNS1RJTUVsdDdQUDVJUHRhc3hrM0tL",
  "rolesSince": {}
}

Then the Android app fills some 250 records into CBLite, which are nicely synchronized to the SG bucket on the server. A change on a single document also gets synchronized.

18 Seconds after the last action on the Android device (with the app still open and the device still active), SG runs into that endless loop. These are the log lines around this event - the looping starts at 15:54:44:

2016-07-20T15:54:26.186+02:00 Cache: Received #255 after 811ms ("b546a575-6409-4b2a-a9f8-56c8c2be2579" / "2-b61aac385ed340eec194241947860445")
2016-07-20T15:54:26.186+02:00 Cache:     #255 ==> channel "user1"
2016-07-20T15:54:26.186+02:00 Cache:     #255 ==> channel "*"
2016-07-20T15:54:26.186+02:00 Changes+: Notifying that "shoppingsync2" changed (keys="{*, user1}") count=36
2016-07-20T15:54:44.061+02:00 Changes: Connection lost from client:  (as user1)
2016-07-20T15:54:44.061+02:00 HTTP+: #011:     --> 200 OK  (0.0 ms)
2016-07-20T15:54:44.061+02:00 Changes+: Notifying to check for _changes feed termination
2016-07-20T15:54:44.062+02:00 Changes: MultiChangesFeed done   (to user1)
2016-07-20T15:54:44.067+02:00 Bucket: Update("_sync:user:user1", 0, ...) --> CAS quit [499.7µs]
2016-07-20T15:54:44.067+02:00 HTTP:  #054: POST /shoppingsync2/_changes  (as user1)
2016-07-20T15:54:44.067+02:00 Changes+: Int sequence multi changes feed...
2016-07-20T15:54:44.067+02:00 Changes: MultiChangesFeed({user1}, {Since:2 Limit:50 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc0829ba360 HeartbeatMs:300000 TimeoutMs:300000 ActiveOnly:false}) ...   (to user1)
2016-07-20T15:54:44.067+02:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{} ...   (to user1)
2016-07-20T15:54:44.067+02:00 Changes+: MultiChangesFeed waiting...   (to user1)
2016-07-20T15:54:44.067+02:00 Changes+: Waiting for "shoppingsync2"'s count to pass 36
2016-07-20T15:54:44.067+02:00 Changes: MultiChangesFeed done   (to user1)

The last 11 lines repeat at very high speed, endlessly.

And this is the Android log at the same time:

07-20 13:54:24.587 2576-2972/de.tamatzner.android.shoppinglistextended W/Sync: PusherInternal{http://192.168.178.20:4984/shoppingsync2, push, 0d0a7}: removePending: sequence 253 not in set, for rev {b546a575-6409-4b2a-a9f8-56c8c2be2579 #2-b61aac385ed340eec194241947860445}
07-20 13:54:43.275 2576-2981/de.tamatzner.android.shoppinglistextended W/ChangeTracker: ChangeTracker{http://192.168.178.20:4984/shoppingsync2, LongPoll}: Exception in change tracker
                                                                                        java.net.SocketTimeoutException
                                                                                            at java.net.PlainSocketImpl.read(PlainSocketImpl.java:484)
                                                                                            at java.net.PlainSocketImpl.-wrap0(PlainSocketImpl.java)
                                                                                            at java.net.PlainSocketImpl$PlainSocketInputStream.read(PlainSocketImpl.java:237)
                                                                                            at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:108)
                                                                                            at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:196)
                                                                                            at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:225)
                                                                                            at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:188)
                                                                                            at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:160)
                                                                                            at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:180)
                                                                                            at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:183)
                                                                                            at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:194)
                                                                                            at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:153)
                                                                                            at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:140)
                                                                                            at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.loadMore(UTF8StreamJsonParser.java:180)
                                                                                            at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd2(UTF8StreamJsonParser.java:2777)
                                                                                            at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._skipWSOrEnd(UTF8StreamJsonParser.java:2772)
                                                                                            at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:652)
                                                                                            at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:443)
                                                                                            at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringMap(MapDeserializer.java:473)
                                                                                            at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:330)
                                                                                            at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:25)
                                                                                            at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3560)
                                                                                            at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2646)
                                                                                            at com.couchbase.lite.replicator.ChangeTracker.runLoop(ChangeTracker.java:361)
                                                                                            at com.couchbase.lite.replicator.ChangeTracker.run(ChangeTracker.java:231)
                                                                                            at java.lang.Thread.run(Thread.java:818)

(Up to that point, no change to the data on the server has been made via the SG REST API.)

I understand from other posts that I shouldn’t worry about the socket timeout exception. I am migrating this app from CB 3, where syncing runs very stable. I have no idea where to search for the cause of this. Any help greatly appreciated.

Based on this log message, this looks like a case of the Android client dropping connection, and then immediately attempting to reconnect. If you’re seeing this log entry multiple times in a loop, it looks to me like the client is dropping the connection - that log entry occurs when Sync Gateway detects a half-closed connection (i.e. the client has disconnected).

@hideki Do you know of any scenarios that would cause an Android _changes request to be cancelled quickly? Possibly incorrect timeout settings?

I believe no changes occurs on sync-gateway also heart-beat is not thrown for a while (5min). The Socket connection is disconnected by timeout. Because it is not permanent error, so client re-connect to sync-gateway. I don’t know why heart-beat fallback does not work.

@tam5 From the second SG logs provided, it looks like there’s about 18s between iterations. That sounds different than the ‘at very high speed’ issue you describe. How often are you seeing those 11 lines repeated in practice?

Sorry I wasn’t quite clear on the error logs.

The Android log I submitted ended here, i.e. there were no more log entries after the stack trace.

The SG log, however, shows those same lines around 70 times per second, like this:

2016-07-20T15:54:44.061+02:00 Changes: Connection lost from client:  (as user1)
2016-07-20T15:54:44.061+02:00 HTTP+: #011:     --> 200 OK  (0.0 ms)
2016-07-20T15:54:44.061+02:00 Changes+: Notifying to check for _changes feed termination
2016-07-20T15:54:44.062+02:00 Changes: MultiChangesFeed done   (to user1)
2016-07-20T15:54:44.067+02:00 Bucket: Update("_sync:user:user1", 0, ...) --> CAS quit [499.7µs]
2016-07-20T15:54:44.067+02:00 HTTP:  #054: POST /shoppingsync2/_changes  (as user1)
2016-07-20T15:54:44.067+02:00 Changes+: Int sequence multi changes feed...
2016-07-20T15:54:44.067+02:00 Changes: MultiChangesFeed({user1}, {Since:2 Limit:50 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc0829ba360 HeartbeatMs:300000 TimeoutMs:300000 ActiveOnly:false}) ...   (to user1)
2016-07-20T15:54:44.067+02:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{} ...   (to user1)
2016-07-20T15:54:44.067+02:00 Changes+: MultiChangesFeed waiting...   (to user1)
2016-07-20T15:54:44.067+02:00 Changes+: Waiting for "shoppingsync2"'s count to pass 36
2016-07-20T15:54:44.067+02:00 Changes: MultiChangesFeed done   (to user1)
2016-07-20T15:54:44.067+02:00 HTTP+: #054:     --> 200 OK  (0.0 ms)
2016-07-20T15:54:44.072+02:00 Bucket: Update("_sync:user:user1", 0, ...) --> CAS quit [0]
2016-07-20T15:54:44.072+02:00 HTTP:  #055: POST /shoppingsync2/_changes  (as user1)
2016-07-20T15:54:44.072+02:00 Changes+: Int sequence multi changes feed...
2016-07-20T15:54:44.072+02:00 Changes: MultiChangesFeed({user1}, {Since:2 Limit:50 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc0829a6cc0 HeartbeatMs:300000 TimeoutMs:300000 ActiveOnly:false}) ...   (to user1)
2016-07-20T15:54:44.072+02:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{} ...   (to user1)
2016-07-20T15:54:44.072+02:00 Changes+: MultiChangesFeed waiting...   (to user1)
2016-07-20T15:54:44.072+02:00 Changes+: Waiting for "shoppingsync2"'s count to pass 36
2016-07-20T15:54:44.072+02:00 Changes: MultiChangesFeed done   (to user1)
2016-07-20T15:54:44.072+02:00 HTTP+: #055:     --> 200 OK  (0.0 ms)
2016-07-20T15:54:44.098+02:00 Bucket: Update("_sync:user:user1", 0, ...) --> CAS quit [0]
2016-07-20T15:54:44.099+02:00 HTTP:  #056: POST /shoppingsync2/_changes  (as user1)
2016-07-20T15:54:44.099+02:00 Changes+: Int sequence multi changes feed...
2016-07-20T15:54:44.099+02:00 Changes: MultiChangesFeed({user1}, {Since:2 Limit:50 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc0829a7080 HeartbeatMs:300000 TimeoutMs:300000 ActiveOnly:false}) ...   (to user1)
2016-07-20T15:54:44.099+02:00 Changes+: MultiChangesFeed: channels expand to channels.TimedSet{} ...   (to user1)
2016-07-20T15:54:44.099+02:00 Changes+: MultiChangesFeed waiting...   (to user1)
2016-07-20T15:54:44.099+02:00 Changes+: Waiting for "shoppingsync2"'s count to pass 36
2016-07-20T15:54:44.099+02:00 Changes: MultiChangesFeed done   (to user1)
2016-07-20T15:54:44.099+02:00 HTTP+: #056:     --> 200 OK  (0.0 ms)
2016-07-20T15:54:44.112+02:00 Bucket: Update("_sync:user:user1", 0, ...) --> CAS quit [0]
2016-07-20T15:54:44.112+02:00 HTTP:  #057: POST /shoppingsync2/_changes  (as user1)
2016-07-20T15:54:44.112+02:00 Changes+: Int sequence multi changes feed...
2016-07-20T15:54:44.112+02:00 Changes: MultiChangesFeed({user1}, {Since:2 Limit:50 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc082c19740 HeartbeatMs:300000 TimeoutMs:300000 ActiveOnly:false}) ...   (to user1)

So, when I killed the Android device after 10 seconds, I had around 7000 lines.

@tam5 That’s helpful. It might be even better to get the full SG logs posted to a gist, so that we can see exactly what’s going on.

I only see one instance of “Changes: Connection lost from client: (as user1)” in this excerpt - for the rest of the _changes requests, it looks like Sync Gateway is breaking out of wait mode immediately without a connection issue.

I’ve filed https://github.com/couchbase/sync_gateway/issues/1999 to dig into this further - if you can add the full SG logs there, along with any additional details on the exact steps you’ve taken (for user creation, any documents being written, etc).

Did so :slight_smile:

Thanks for your effort!

Just for anyone coming around: The root cause was that my SG user registration process was broken, so the Android user didn’t have access to the channel it tried to synchronize.