CBLite 1.4.4 - iOS - Understanding Slow Pull Replication / Push Replication Timeout

andrew.mclean · June 24, 2019, 2:51pm

Client: iOS CouchbaseLite 1.4.4
Sync Gateway: 1.5.x
Network: Wi-Fi - Down 178 Mbps / Up 5.74 Mbps (approximate, from a single speed test taken during testing)

Number of Docs in DB: 37000
File size of sqlite store on device: 237 MB

I have a Sync Gateway client that is syncing down a large database. To summarize, I have 3 separate issues that I would like to discuss:

- Initial replication is really, really slow. This may be expected.
- Replication document count progress reporting does not work.
- A push replication times out when run right after the long initial pull replication that imports the large db onto the device.

Observations:

Fresh install of application, login created empty CBL database on device.
Running an initial device sync (push then pull synchronously):
- Push (runs quickly as it should, there are no revisions in empty db)
- The 1st pull replication takes up to 5 minutes, pulling down about 37000 documents as stated in the replicator. The size of the sqlite3 file after the replication completes is about 200 MB. Despite the large dataset, the replicator succeeds with no errors. It appears the replication makes many calls to _bulk_docs, each returning a subset of documents to import into the database. This seems to be why the replication takes so much longer for this large database.
- I then immediately run a push replication, without making any local revisions, on this same large database that has just been pull replicated. The push replication always times out, even though there are no new revisions to push. The replicator makes a series of calls to _revs_diff but never finishes; it just times out.
- Try a pull replication again and it takes a while but does not time out; it succeeds.
- Make a local revision, push replicate, and it succeeds in a reasonable time frame.

Questions:

Is there anything obvious that can be done to optimize the import time for this large pull replication?

Your documentation does state that you should not block the UI when doing an initial import because it may take some time. We likely need to implement channels to only sync documents relevant to the user, not the entire account db.

I realize that we are on a very outdated version of CBLite, but the drastic API change between 1.x and 2.x will require several months of effort in order to upgrade. I am looking for a way to resolve this so I can buy myself the time to do an upgrade.

Why would a push replication with no local revisions always time out, but only right after doing that large pull replication? Is this anything that’s been observed before? Error:
2019-06-21 09:45:31.200478-0600 SpexPro[2419:1985293] Task .<347> load failed with error Error Domain=NSURLErrorDomain Code=-1001 “The request timed out.” UserInfo={_kCFStreamErrorCodeKey=-2102, NSUnderlyingError=0x2814ea940 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 “(null)” UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask .<347>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
), NSLocalizedDescription=The request timed out., NSErrorFailingURLStringKey=http://sg-1.<my_host>.net:4984/default/_revs_diff, NSErrorFailingURLKey=http://sg-1.tryspex.net:4984/default/_revs_diff, _kCFStreamErrorDomainKey=4} [-1001]

jens · June 24, 2019, 7:23pm

None of us on the engineering team have done anything significant with CBL 1.x in years, so our knowledge is pretty stale. If you’re a Couchbase customer you should file a support request; the SEs might be more knowledgeable. (However, 1.x is nearing EOL so I don’t know how much longer support will be available.)

CBL 1.x was not very fast. That said, pulling only ~120 docs/sec at 5KB/doc sounds too slow. There are many factors that could affect performance, especially the hardware capabilities on both the client and the server side, as well as the number of server nodes. Try using the Couchbase admin console to see if there are bottlenecks on the server side. On the client, try running the app on-device with Xcode attached and use the Instruments tool to profile.
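
For anyone checking the numbers, here is a rough back-of-the-envelope sketch in Python using only the figures reported above (37000 docs, up to 5 minutes, about 200 MB on disk). It assumes decimal megabytes and treats the sqlite file size as a stand-in for bytes on the wire, which it is not exactly, so take the result as an order-of-magnitude estimate only.

```python
# Figures as reported in the original post.
docs = 37_000
pull_seconds = 5 * 60                 # "up to 5 minutes"
store_bytes = 200 * 1000 * 1000       # "about 200mb"; assumes decimal MB

# Derived rates. The store size is only a proxy for transferred bytes.
docs_per_sec = docs / pull_seconds
kb_per_doc = store_bytes / docs / 1000
mbit_per_sec = store_bytes * 8 / pull_seconds / 1_000_000

print(f"{docs_per_sec:.0f} docs/sec")
print(f"{kb_per_doc:.1f} KB/doc")
print(f"{mbit_per_sec:.1f} Mbps effective")
```

That works out to roughly 123 docs/sec, 5.4 KB/doc, and about 5.3 Mbps, far below the measured 178 Mbps downlink, which suggests the network link itself is probably not the limiting factor.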

jens · June 24, 2019, 7:25pm

As for the timeout error: If we take the error at face value, it implies the server may be bogged down and unable to handle the request on time. There could be something different going on, but that would imply a bug in CBL; I don’t recall there being such a bug, but it’s been quite a while since I worked on 1.x.
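
For reference, `_revs_diff` is the call in which the pusher sends a map of document IDs to revision IDs and the server replies with the revisions it is missing. If the pusher has no checkpoint covering the ~37000 freshly pulled revisions (an assumption, not something confirmed in this thread), it would ask about all of them even though none need uploading. A rough Python sketch of that request shape follows; the function names and the batch size of 100 are hypothetical and are not CBL internals.

```python
import json
from itertools import islice

def revs_diff_batches(local_revs, batch_size=100):
    """Yield JSON bodies for POST /{db}/_revs_diff, batch_size docs per body.

    local_revs maps doc ID -> list of revision IDs known locally.
    batch_size is a hypothetical value chosen for illustration.
    """
    items = iter(sorted(local_revs.items()))
    while True:
        batch = dict(islice(items, batch_size))
        if not batch:
            return
        yield json.dumps(batch)

def revs_to_push(response):
    """Keep only docs for which the server reported missing revisions."""
    return {doc_id: entry["missing"]
            for doc_id, entry in response.items()
            if entry.get("missing")}

# Example: 250 pulled docs, none changed locally.
local = {f"doc{i}": [f"1-{i:04x}"] for i in range(250)}
bodies = list(revs_diff_batches(local))
print(len(bodies), "requests")

# A server that already has every revision returns nothing to push.
print(revs_to_push({}))
```

With this hypothetical batch size, 37000 documents would mean 370 round trips that each upload nothing, so a slow server response on any one of them could surface as the timeout above.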

lightandshadow · July 8, 2019, 7:19pm

Did you kick off a continuous replication? Tearing down a one-shot replication might have overhead that continuing an existing replication might avoid. I would think the initial replication should return a pointer to where sync left off, so the next push replication wouldn’t need to diff the entire set. I’ve seen problems using continuous replication on Android, but I exclusively use continuous replication on iOS.

If you’re running pulls and pushes manually, could it be that the push replication is hitting the server before you get the last sequence / sync pointer back from the initial pull? Kicking off an asynchronous push might start the process before all of the sync housekeeping is done on the server and sent back to the client.