Syncing only recent data from server to Couchbase Lite

Hi, we have an application that holds a lot of historical data for users. When they log in, we don’t want them to have to sync down hundreds of MB of data. To avoid this we would like to sync only the last couple of weeks of data for time-sensitive documents, plus documents that have no time-based component (like a user doc). Our current application uses Couchbase Lite 1.4.1 and Sync Gateway Community 1.3.1, so we use channels as the filter on our pull replicator to filter data in this way. This means some users have a channel list of 800 channels or more, though, and I’m not sure whether this can cause performance issues, since we sometimes see Sync Gateway use a lot of CPU.

In Couchbase Lite 2.8 and Sync Gateway 2.8, is there a better way to filter synced data so we can still sync some documents based on timestamps and other documents regardless of time?

There are 3 ways to do replication filters:

  • Document IDs
  • Sync Gateway Channel Names
  • Custom function

Having 800 channels is not going to scale. You can try filtering with a custom function instead, if your documents carry some timestamp information.

ReplicatorConfiguration config = new ReplicatorConfiguration(database, target);
config.setPullFilter((doc, flags) -> {
    return true; // validate the timestamp here; return true to accept the document
});

More on using the filter here: Data Sync using Sync Gateway | Couchbase Docs. With a custom function, the filtering happens on the client side. This means SG still sends every update to the client, but SG itself does less work because it is not the one doing the filtering.
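As a rough illustration of the timestamp check that could sit inside such a pull filter, here is a minimal sketch in plain Java. The property name `createdAt`, its ISO-8601 string format, and the class and method names are assumptions made for the example, not anything Couchbase Lite prescribes. Documents without a timestamp (such as a user doc) are always kept.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Map;

final class RecentDocFilter {
    // Hypothetical property name; use whichever field your documents store their timestamp in.
    static final String TIMESTAMP_KEY = "createdAt";

    /** Returns true if a document with these properties should be kept. */
    public static boolean shouldKeep(Map<String, ?> props, Instant now, Duration window) {
        Object raw = props.get(TIMESTAMP_KEY);
        if (!(raw instanceof String)) {
            return true; // no time component (e.g. a user doc): always keep
        }
        try {
            Instant created = Instant.parse((String) raw);
            // Keep the document only if it falls inside the recent window.
            return !created.isBefore(now.minus(window));
        } catch (DateTimeParseException e) {
            return true; // unreadable timestamp: keep rather than silently drop
        }
    }
}
```

Inside the filter this could be called along the lines of `RecentDocFilter.shouldKeep(doc.toMap(), Instant.now(), Duration.ofDays(14))`, assuming the document properties are read as a map.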

If you still want to use channels, can you give more details on how you implemented it?

In our case, users create large docs every few days, say 3 a week. A large account has access to the data of hundreds of users, so for one week that could be 300 documents. For all time that could be tens of thousands of documents. We currently use channels and channel filtering on the client to handle this (Sync Gateway 1.3.1, Couchbase Lite 1.4.1). Every doc is assigned a channel based on the userId of the user that owns it and the week it was created in. When a user logs in we only include the channels for the past week, the current week, and the next week, so that they aren’t downloading thousands of documents that could be months old. Of course, this is for the case where a user has been using the app for a long time and then needs to log out and log back in, or logs in on a new device for the first time. So we’re only trying to limit the initial download of data; we don’t care about it building up over time.
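To make the scheme concrete, the channel list for a login could be built roughly as follows. This is only a sketch: the channel name format `<userId>-<year>-W<week>` and the use of ISO weeks are assumptions for illustration, and the real naming may differ.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WeekChannels {
    /** Hypothetical naming scheme: "<userId>-<ISO week-based year>-W<week number>". */
    public static String channelFor(String userId, LocalDate date) {
        int year = date.get(IsoFields.WEEK_BASED_YEAR);
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format(Locale.ROOT, "%s-%d-W%02d", userId, year, week);
    }

    /** Channels for the previous, current, and next week of every visible user. */
    public static List<String> channelsFor(List<String> userIds, LocalDate today) {
        List<String> channels = new ArrayList<>();
        for (String userId : userIds) {
            for (int offset = -1; offset <= 1; offset++) {
                channels.add(channelFor(userId, today.plusWeeks(offset)));
            }
        }
        return channels;
    }
}
```

With three channels per visible user, an account that can see a few hundred users ends up with a channel list in the high hundreds, which is where the 800-channel figure comes from.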

I don’t think a custom function could work for time-based document filtering. Based on the documentation, a document is run through the filter function once, when it is created. And if the filtering is done on the client side, won’t the client still download thousands of documents and only then filter them? Wouldn’t the initial login for a large client be really slow?

Alternatively, we would love to be able to dynamically query or replicate docs from the database based on what the client is viewing. Our clients are actually only viewing the data for a single week and user at a time, which is a very small amount of data. Is there any way with Sync Gateway 2.8 and Couchbase Lite 2.8 to replicate or download data based on a N1QL query or a single channel without restarting the entire replicator?

The typical pattern is to implement logic on the server side that periodically moves historical documents to an archive channel. Clients only download content in the active channel. Filtering doesn’t necessarily reduce transfer load: the documents are still downloaded, and only then rejected on the client side based on the filter criteria.
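The decision such a periodic job has to make can be sketched as below. The channel names `active-<userId>` and `archive-<userId>` and the helper names are hypothetical, and the step that actually updates a document so that Sync Gateway reassigns its channel is deliberately left out.

```java
import java.time.Duration;
import java.time.Instant;

final class ArchivePolicy {
    /** Hypothetical channel names: "active-<userId>" for recent docs, "archive-<userId>" otherwise. */
    public static String channelFor(String userId, Instant createdAt, Instant now, Duration activeWindow) {
        boolean recent = !createdAt.isBefore(now.minus(activeWindow));
        return (recent ? "active-" : "archive-") + userId;
    }

    /** True if the periodic job should move the document to a different channel. */
    public static boolean needsMove(String currentChannel, String userId,
                                    Instant createdAt, Instant now, Duration activeWindow) {
        return !currentChannel.equals(channelFor(userId, createdAt, now, activeWindow));
    }
}
```

With this layout a client would subscribe to one active channel per visible user rather than one channel per user per week.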

Thanks! I could see a script running once a week on the server to move older documents to a new archived channel working well for us.