Couchbase XDCR - Change document expiration time on replication

andrewdjkilgore · March 1, 2014, 11:57pm

Hi

I am using Couchbase as a high-write cache for events that are used for auditing purposes. I want to use XDCR to move the events into an offline, read-only cluster that we can run data processing and views against without affecting the primary write cluster.

When the documents are written to the primary cluster, they would have an expiration time of ~24 hours; however, I want to be able to change this expiration time (e.g. to 31 days) on replication.

I have also been reading through the docs to understand whether the expiration of an item is replicated (I wouldn't assume it is), but I want to confirm.

Any thoughts on how the above could be achieved?

Cheers,
Andrew

cihangirb · March 3, 2014, 6:24am

Hi Andrew, XDCR replicates the data as it appears on the source cluster. So XDCR will replicate the expiry as part of the data, and the document will be deleted at the destination when it expires. If your intention is to keep the data, you may want to write it without an expiry, but then both clusters will keep the data around.
thanks
-cihan

andrewdjkilgore · March 3, 2014, 3:18pm

Thanks for the information.

If I remove the expiry time from the documents, and instead delete them in the live cluster once they have been replicated, will the delete operation also be replicated to the secondary cluster, removing the data there also?

Cheers
Andrew

cihangirb · March 3, 2014, 5:22pm

Yes, we replicate all operations including deletes. The main use case for uni-directional XDCR is a “hot spare” system, where the destination server acts as a failsafe for the source cluster. So you will need to think of a custom solution here.

I am not sure how much you care about consistency of data, but if you wanted to get creative, you could try using the aggregation setup with a trick. The XDCR topology looks like this: A accumulates all data from many source clusters that produce non-overlapping data: A <- B, A <- C, A <- D and so on. So keys are unique across all clusters and the data on A is a full union of B, C, D and more.

Here is what you can try in your case:

1. Create a bucket per day on the source cluster: ‘day1’, ‘day2’, ‘day3’, etc. (per day because your expiration period is 24 hours).
2. In your app, add data to a new bucket every day and set up XDCR from each ‘dayN’ bucket to the destination bucket ‘all_days’. So on your second day, you are writing to the ‘day2’ bucket and no longer writing to the ‘day1’ bucket.
3. Once all data in ‘day1’ is replicated to the ‘all_days’ destination bucket, delete the XDCR replication from ‘day1’ to ‘all_days’. This won’t delete data from the ‘all_days’ bucket; it will simply stop replicating from the ‘day1’ bucket at that point. Then drop the ‘day1’ bucket and recreate it as a new, empty bucket.

This ensures the ‘all_days’ bucket continues to accumulate data while your source buckets (‘dayN’) get cleaned up at a regular interval.
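
The rotation described above can be sketched roughly as follows. This is only an in-memory model using plain Python dicts, not Couchbase API calls: `bucket_for_day`, `simulate`, the `n_buckets` parameter, the rotation start date and the `event::...` key format are all hypothetical names and values chosen for illustration. It assumes the source buckets are reused in a cycle and that a bucket is only dropped after its replication has caught up and been removed.

```python
from datetime import date, timedelta


def bucket_for_day(day, epoch, n_buckets):
    """Return the source bucket name ('day1'..'dayN') that takes writes on `day`.

    `epoch` is the first day of the rotation; both names are assumptions.
    """
    offset = (day - epoch).days
    if offset < 0:
        raise ValueError("day is before the start of the rotation")
    return f"day{offset % n_buckets + 1}"


def simulate(days, n_buckets=3):
    """Model the rotation with dicts standing in for buckets."""
    epoch = date(2014, 3, 1)  # arbitrary start date for the sketch
    sources = {f"day{i}": {} for i in range(1, n_buckets + 1)}
    all_days = {}  # stands in for the destination bucket
    previous = None
    for d in range(days):
        today = epoch + timedelta(days=d)
        current = bucket_for_day(today, epoch, n_buckets)
        if previous is not None and previous != current:
            # Let yesterday's bucket finish replicating, then stop the
            # replication and drop/recreate the bucket. Because the
            # replication is removed first, nothing is deleted downstream.
            all_days.update(sources[previous])
            sources[previous] = {}
        # One event per day, with a key that is never reused.
        key = f"event::{today.isoformat()}::0"
        sources[current][key] = {"day": today.isoformat()}
        # Models XDCR copying the active bucket into 'all_days'.
        all_days.update(sources[current])
        previous = current
    return sources, all_days
```

In this model the destination keeps every event ever written, while at most one source bucket holds data at any time. On a real cluster you would still need to confirm that replication has fully caught up before removing it.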

This is just an example and I am sure you can come up with other methods with a similar effect. But make sure to test it end to end to ensure it works in your case, as I am making many assumptions about your app. For example, this only works if the keys you are using are perpetually unique and you never reuse them between ‘dayN’ buckets, and if you never update older data after a certain point (your expiring data won’t get updated, but there may be other documents/values in your system without expiry, etc.). So use the technique with a lot of caution.
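
As a rough illustration of the key-uniqueness assumption, one option is to build keys from the date plus a random suffix, and to check exported key lists for reuse during testing. Both helpers below (`make_event_key`, `find_reused_keys`) and the key layout are hypothetical; how you obtain the list of keys per bucket is left out and depends on your setup.

```python
import uuid
from datetime import date


def make_event_key(day):
    """Build a key that embeds the day and a random UUID (assumed layout)."""
    return f"event::{day.isoformat()}::{uuid.uuid4().hex}"


def find_reused_keys(buckets):
    """Return keys that appear in more than one bucket.

    `buckets` maps a bucket name to an iterable of its keys. The result maps
    each reused key to the list of buckets it was found in.
    """
    first_seen = {}
    reused = {}
    for name, keys in buckets.items():
        for key in set(keys):
            if key in first_seen:
                reused.setdefault(key, [first_seen[key]]).append(name)
            else:
                first_seen[key] = name
    return reused
```

An empty result from `find_reused_keys` on your test data is a sign that the ‘dayN’ buckets do not overlap; any reused key would overwrite an older document in the destination bucket.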

thanks
-cihan

andrewdjkilgore · March 7, 2014, 10:16pm

Thanks Cihan - this is useful