Serious performance issue due to number of documents pulled by Couchbase Lite

ronaldwhg · November 26, 2018, 8:31am

We are facing a serious issue where the pull replicator on our app takes up to 30 minutes to finish.

Background:

Our app is used to accept online orders from an eCommerce website.
The number of orders daily can reach up to thousands.
We have a function called ‘Make my app faster’ in the app which will purge old order documents that are older than 1 month.

The issue arises when the user logs out and logs in again. Since the pull replicator can only be filtered by channel, it pulls all those documents since the beginning of time back to the device, and that can take up to 30 minutes to complete.

We have tried changing the channel of those old documents from ‘order’ to ‘order_archive’, but those documents are still being pulled to the app, albeit with empty bodies.

Is there any way for Couchbase Lite to exclude those documents from being pulled at all?

Cheers,
RonaldWH

borrrden · November 26, 2018, 10:40pm

Are you sharing the same db file between multiple users? If so, why not make the local databases per-user? That way the same filter is used each time and there should be no need for it to restart from the beginning. In general, if you are often restarting sync from the beginning then something is off about the logic. What are you doing in the log in and log out logic?
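
To make the per-user suggestion concrete, here is a minimal Python sketch that derives a stable local database name from a user id, so that each user reopens their own database instead of starting from an empty one. The naming scheme and the `local_db_name` helper are hypothetical illustrations, not part of the Couchbase Lite API.

```python
import hashlib

def local_db_name(user_id):
    """Return a stable, per-user database name (hypothetical naming scheme).

    Hashing keeps the name to lowercase letters, digits and an underscore,
    whatever characters the user id contains.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return "userdb_" + digest[:16]

# The same user always maps to the same name; different users do not collide
# in practice, so logging back in can reuse the existing local database.
name_a = local_db_name("alice@example.com")
name_b = local_db_name("bob@example.com")
print(name_a, name_b)
```

Note that this trades disk space for sync time: the databases of logged-out users stay on the device until they are explicitly deleted.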

ronaldwhg · November 27, 2018, 1:44am

Our app can only be used by 1 user at a time, so they need to log in to use it. Every time a user logs out, we delete the local database to free up space. When a user logs in, we create a new local database on the device, link it to the pull replicator and set the channel based on the user id.

So after 1 month, 1 user can have a lot of documents. When this user logs out and signs in again, or deletes the app, reinstalls it and then logs in, the app creates a new local DB, links it to the pull replicator, sets the channel based on the user id and starts the pull. It then pulls all the documents again, even though we have changed the channel of some of the documents.

Best regards,
RonaldWH

borrrden · November 27, 2018, 2:10am

Well if you delete the local DB every time then of course you will need to download things again. Exactly what you download is governed by various factors and I wouldn’t expect that you would download other documents.

If you make a request to your sync gateway, what does the changes feed show (see format below)?

http://(url):(port)/(dbname)/_changes?filter=sync_gateway%2Fbychannel&channels=(channel-name)
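
For reference, a minimal Python sketch that assembles this request URL. The host, port, database name and channel used here are placeholders, not values from this thread; only the `filter` and `channels` parameters come from the URL format above.

```python
from urllib.parse import urlencode

def changes_url(host, port, db, channels):
    """Build a Sync Gateway _changes URL filtered by channel."""
    query = urlencode({
        "filter": "sync_gateway/bychannel",  # encoded as sync_gateway%2Fbychannel
        "channels": ",".join(channels),
    })
    return f"http://{host}:{port}/{db}/_changes?{query}"

# Placeholder values for illustration only.
url = changes_url("localhost", 4984, "mydb", ["order"])
print(url)
```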

ronaldwhg · November 27, 2018, 2:45am

It returns this (I masked the user name):

{"results":[
{"seq":174659,"id":"_user/****","changes":[]}
],
"last_seq":"174659"}

What does that mean?

As for the DB, is there a better approach? We did try changing the channel of old documents to ‘channel_archive’ in the hope that they won’t be pulled, but they are still being pulled.

borrrden · November 27, 2018, 3:54am

That response indicates that there are no documents in that channel except for the one document which defines a user. Are you sure this is the correct channel?
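
As a rough illustration of how to read such a response, the Python sketch below counts the entries whose id starts with `_user/` separately from ordinary documents. The sample text mirrors the shape of the response posted above, with a placeholder user name in place of the masked one; `summarize_changes` is a hypothetical helper.

```python
import json

def summarize_changes(feed_text):
    """Count user-definition entries and ordinary documents in a changes feed."""
    feed = json.loads(feed_text)
    ids = [entry["id"] for entry in feed["results"]]
    user_entries = [i for i in ids if i.startswith("_user/")]
    documents = [i for i in ids if not i.startswith("_user/")]
    return {
        "user_entries": len(user_entries),
        "documents": len(documents),
        "last_seq": feed["last_seq"],
    }

# Placeholder user name; the real one was masked in the post.
sample = (
    '{"results":[{"seq":174659,"id":"_user/example","changes":[]}],'
    '"last_seq":"174659"}'
)
summary = summarize_changes(sample)
print(summary)
```

A `documents` count of zero is what "no documents in that channel" looks like in the feed.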

mliu · December 6, 2018, 11:34pm

Trying to find a solution to the same issue. Each user is replicating with a by_channel filter (on a 1.4.x iOS client).

I’m also attempting to archive obsolete docs by removing channel access (verified by hitting sync_gateway _changes and _all_docs). Is it expected behavior for an empty database to replicate docs a user no longer has access to and create them locally with empty bodies?

As I understand it, when the client initiates a pull replication with an empty database, it sets the “active_only” flag on the _changes feed request to eliminate document removals from the channel. I do see that sync_gateway is hit with the changes request with the appropriate flag, but I don’t understand why the archived docs are being created locally with empty properties, e.g. database.documentCount is much bigger than expected.
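
To check what a given _changes response actually contains, one option is to split its entries into active documents and removals. The Python sketch below assumes that entries for documents removed from a channel carry a `removed` list and that deleted documents carry a `deleted` flag; the sample entries and the `split_changes` helper are made up for illustration.

```python
def split_changes(results):
    """Separate active entries from removed or deleted ones in a changes feed."""
    active, inactive = [], []
    for entry in results:
        # Assumption: removals are marked with "removed", deletions with "deleted".
        if entry.get("removed") or entry.get("deleted"):
            inactive.append(entry["id"])
        else:
            active.append(entry["id"])
    return active, inactive

# Hypothetical feed entries, not taken from a real Sync Gateway response.
sample_results = [
    {"seq": 1, "id": "order_1", "changes": [{"rev": "1-abc"}]},
    {"seq": 2, "id": "order_2", "removed": ["order"], "changes": [{"rev": "2-def"}]},
    {"seq": 3, "id": "order_3", "deleted": True, "changes": [{"rev": "3-ghi"}]},
]
active, inactive = split_changes(sample_results)
print(active, inactive)
```

If the flag is honored, a response fetched with it should yield an empty `inactive` list, so a non-empty one would point at the request rather than at the client.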

In any case as a user’s access to documents is given and revoked, I expect there to be a strategy for a reset pull replication to be brief if the current number of documents active in the user’s channel is small even if the historical changes feed scales from thousands to hundreds of thousands. What is the recommended strategy here?