Sync Failing with Large Channel Calculations


#1
  • Couchbase Lite Android, 1.4.X

We are seeing failures on our mobile clients during synchronization when targeting large numbers of channels. If the response to the _changes request does not come back within 10 seconds, sync fails repeatedly and never actually starts or completes. What is the solution to this?


#2

@ivan.lugo.cbl … what do the Sync Gateway logs say when you do the pull replication?

You can change your log level to * (star) in Sync Gateway to see the whole conversation that CBL and SG are having.

curl localhost:4985/_logging -d '{"*":true}' -H 'Content-Type: application/json' -X POST


#3

Do you see any timeout exceptions? Check out this somewhat related post for enabling logging.


#4

@househippo

Did you find a solution? I am facing the same thing without using any channels.


#5

@pankaj.sharma

How many documents come back on the _changes feed with no filter? Hundreds, thousands, or millions?


#6

To be exact, 52000 documents, with 4700 images each ranging from 2 KB to 10 KB. There are 21 types of documents in total.


#7

That's not a lot of documents for the _changes feed to process.
What's the longest time it takes to get a response back from the _changes feed if you just hit the REST endpoint as a user through a browser?
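If a browser is not convenient, the same measurement can be scripted. Below is a minimal Python sketch that fetches the feed once and reports how long it took and how many entries came back. The URL, database name, and credentials in the usage comment are placeholders (assumptions, not values from this thread), and the `opener` parameter only exists so the function can be exercised without a live Sync Gateway.

```python
import base64
import json
import time
import urllib.request


def time_changes_feed(url, username=None, password=None,
                      opener=urllib.request.urlopen):
    """Fetch a _changes feed once and return (elapsed_seconds, result_count)."""
    request = urllib.request.Request(url)
    if username is not None:
        # HTTP Basic auth header built by hand from the given credentials.
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", "Basic " + token)

    started = time.monotonic()
    # Generous timeout so a slow feed is measured rather than cut off.
    with opener(request, timeout=300) as response:
        body = response.read()
    elapsed = time.monotonic() - started

    # A normal (non-continuous) _changes response is one JSON object
    # with a "results" array.
    results = json.loads(body).get("results", [])
    return elapsed, len(results)


# Example usage (hypothetical host, database name and user):
#   elapsed, count = time_changes_feed(
#       "http://localhost:4984/mydb/_changes", "user", "pass")
#   print(f"{count} changes in {elapsed:.1f}s")
```

Running it a few times as the same user the mobile client syncs as should show whether the feed really takes longer than the 10 seconds mentioned in the first post.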

Also, during the _changes feed request, do you see Couchbase Server getting lots of key GET()s?

If so, you might want to leverage the SG channel cache more.

cache: {
    max_wait_pending: 0,
    max_num_pending: 0,
    max_wait_skipped: 0,
    enable_star_channel: false,
    channel_cache_max_length: 0,
    channel_cache_min_length: 0,
    channel_cache_expiry: 0
},

channel_cache_max_length: (Default: 500)
Maximum number of entries maintained in cache per channel.
source: https://developer.couchbase.com/documentation/mobile/current/guides/sync-gateway/config-properties/index.html

Did you change the default timeout settings for the replicator on CBL? 10 seconds seems a little aggressive for a timeout.


#8

Remember it's a per-channel cache, so set the cache length to the size of your biggest channel plus 10-20%.
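As a rough illustration of that rule of thumb, here is a small Python sketch. The function name and the channel sizes are made up for the example; only the "biggest channel plus 10-20%" rule comes from this post.

```python
def suggested_channel_cache_max_length(channel_sizes, headroom_percent=20):
    """Return the largest channel size plus headroom, rounded up.

    channel_sizes maps channel name -> number of entries in that channel.
    headroom_percent is the extra margin (10 to 20 per the rule of thumb).
    """
    if not channel_sizes:
        raise ValueError("channel_sizes must not be empty")
    largest = max(channel_sizes.values())
    # Integer ceiling division avoids floating-point rounding surprises.
    extra = -(-largest * headroom_percent // 100)
    return largest + extra


# Hypothetical channel sizes, for illustration only.
sizes = {"chat": 1200, "all_docs": 52000}
print(suggested_channel_cache_max_length(sizes, headroom_percent=20))
```

The result would be the value to try for channel_cache_max_length, then adjust after testing.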


#9

The bucket I am most worried about doesn't have many channels, so I can even keep 100k entries in the cache.

I will need to keep the per-channel sizing advice in mind for one of the buckets, which will hold chat data between some 5000 users. The chat data will only cover a few weeks, and then we will purge it, so the volume per channel there won't be much either. And of course it will always use continuous sync, so I guess this optimisation will be the best thing.

Of the cache settings listed above, I have only used channel_cache_max_length and increased its value to 100k. I have read the descriptions of the others in the documentation, but they are still not clear to me. If possible, can you let me know which ones will help most with pull replication?

Another thing: right now I have 9 buckets, but in production we will have only 4-5. Is it OK for every Sync Gateway to point at all of the buckets, and then load balance the Sync Gateways using NGINX or an AWS Elastic Load Balancer?
Another point: what if we run Sync Gateway in Docker behind Kubernetes? This could give us far more flexibility in spinning up new nodes.


#10

The SG channel cache also expires.

channel_cache_expiry: (Default: 60 seconds)

Test and see what works best for you.

People put SG behind a load balancer all the time to scale it. Remember, SG is just a Golang web app and CB is your DB.