CBL 2.5.x Continuous push replicator gives me warnings and errors, stops functioning

I am running a continuous push replicator on iOS using Couchbase Lite 2.5.2:
Couchbase Server: Community Edition 6.0.0 build 1693
Sync Gateway: Couchbase Sync Gateway/2.5.0(271;bf3ddf6) CE
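For reference, the replicator is set up roughly like this (a minimal sketch of the CBL 2.5 Swift API; `database` and the endpoint URL are placeholders for my actual values):

```swift
import CouchbaseLiteSwift

// Assumes `database` is an already-opened Database instance.
let url = URL(string: "wss://sync.server.com:443/app")!
let target = URLEndpoint(url: url)

var config = ReplicatorConfiguration(database: database, target: target)
config.replicatorType = .push   // push-only
config.continuous = true        // continuous, not one-shot

let replicator = Replicator(config: config)

// Log status changes and any error the replicator reports.
replicator.addChangeListener { change in
    if let error = change.status.error {
        print("Replicator error: \(error)")
    }
}
replicator.start()
```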

Shortly after pushing some documents, it gives me:

2019-07-09 00:49:31.714373+0100 App.iOS[21336:648888] [69]| WARNING) [Network] {C4SocketImpl#1}==> litecore::repl::C4SocketImpl wss://sync.server.com:443/app/_blipsync @0x7f7e57e822e0

Then I get:

2019-07-09 00:49:31.730066+0100 App.iOS[21336:648888] [69]| WARNING) [Network] {C4SocketImpl#1} Unexpected or unclean socket disconnect! (reason=errno, code=54)

Then some of this:

2019-07-09 00:49:31.740771+0100 App.iOS[21336:648803] [70]| ERROR) [Replicator] {Repl#2}==> litecore::repl::Replicator /Users/user/Library/Developer/CoreSimulator/Devices/AC3CC977-6FC7-40B6-B7F1-68F2E3A1F1D5/data/Containers/Data/Application/5E2403AD-FDDC-45C7-BE68-B17A22440CF4/Documents/app_30bf0d3d47f146d4ad9ca2a67505dd08.cblite2/ ->wss://sync.server.com:443/app/_blipsync @0x7f7e57e8b1b8

And finally this, after which it no longer functions properly:

2019-07-09 00:49:31.752448+0100 App.iOS[21336:648803] [70]| ERROR) [Replicator] {Repl#2} Got LiteCore error: POSIX error 54 “CouchbaseLiteException (POSIXDomain / 54): Connection reset by peer.”

I have mostly been using one-shot replication, and that doesn't give me these errors, even on longer syncs. The logs on the server don't show any errors or abnormalities.

Anyone got any ideas?

When you see a “Connection reset by peer” error, the issue is usually on the other side, i.e. Sync Gateway. Did it log any errors?
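For what it's worth, POSIX error 54 is `ECONNRESET` on Darwin (iOS/macOS); it just means the remote end sent a TCP RST, i.e. something closed the connection abruptly. You can confirm the mapping yourself (a throwaway snippet, not part of the app; note the number is 104 on Linux):

```swift
import Foundation

// On Darwin (iOS/macOS) ECONNRESET is errno 54.
print(ECONNRESET)                             // 54 on Darwin
print(String(cString: strerror(ECONNRESET)))  // "Connection reset by peer"
```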

That is the thing, the logs on the server show nothing abnormal, no errors. It seems to be all on the client’s side.

The next thing to look at is middleware. What's in between the client and the server? Sometimes these gateways close sockets because they think they're idle, or because the connection has violated some rule they impose. (A few customers have reported problems with Azure killing their connections due to WebSocket messages larger than 4KB, for example.)

We use HAProxy as a reverse proxy in between. We also suspected it might be that and tried connecting directly to the Gateway instead, but couldn't get it working before. Today we finally found what I was doing wrong when connecting directly, tested it, and it worked without the issues. So it's something in HAProxy. We've now increased the timeout from 350 to 500 and it seems to be doing better.
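For anyone hitting the same thing: once HAProxy upgrades an HTTP connection to a WebSocket, the setting that governs it is `timeout tunnel`, which takes over from `timeout client`/`timeout server` for the upgraded connection. A sketch of the kind of haproxy.cfg change involved here — the actual values are whatever suits your deployment:

```
defaults
    timeout connect  5s
    timeout client  50s
    timeout server  50s
    # Applies after the connection is upgraded to a WebSocket;
    # raise it so idle replicator sockets are not cut off.
    timeout tunnel  500s
```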

The Gateway was restarted to enable debug logging, so if it had nothing to do with HAProxy, it's possible the problem will recur once the Gateway has been up for a longer time.