Changing Data Between Backup And Restore


I’m working on setting up a pre-production environment for pre-deployment testing, and we want to use a backup of the production Couchbase cluster for setup. Currently, we’re just restoring the backup of the cluster to the pre-production cluster as part of the spin-up process.

However, for security reasons we want to scramble some of the data, such as customer names, addresses, email addresses, etc. We don’t want real data like that in the pre-production environment. What would be the best way to accomplish this? I’d really like it scrambled before it gets restored to the new cluster, if possible. Some vague ideas I had were:

  1. Apply the scramble directly to the backup files (seems difficult)
  2. Restore the backup to a secure cluster first, then scramble, then backup again and restore to pre-production (slow)
  3. Maybe some fun trick with XDCR?

Any advice would be greatly appreciated!




All data is stored in Couchbase as key/value pairs whose values are JSON documents. As far as Couchbase Server is concerned, each JSON document is just a very long string of characters.
To remove sensitive data from the JSON while taking the backup, you would have to parse the data yourself. Couchbase does, however, provide a way to skip certain keys during backup via a user-defined regex on the key.
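Parsing the data yourself could look roughly like the following minimal Python sketch. It assumes you already have each document's key and raw JSON value in hand (for example after restoring into a secure cluster, as in option 2 of the question). The field names `name`, `address`, `email`, the `audit::` key prefix, and `SECRET` are hypothetical placeholders. Only top-level fields are handled. Values are replaced with a keyed hash (HMAC-SHA256), a pseudonymization technique swapped in for illustration and not something Couchbase itself provides. The same real value always maps to the same fake value, so references between documents stay consistent.

```python
import hashlib
import hmac
import json
import re

# Hypothetical top-level fields considered sensitive; adjust to your documents.
SENSITIVE_FIELDS = {"name", "address", "email"}
# Hypothetical regex: documents whose key matches are dropped entirely.
EXCLUDE_KEYS = re.compile(r"^audit::")
# Hypothetical secret for the keyed hash; never copy it to pre-production.
SECRET = b"replace-with-a-real-secret"


def pseudonym(value: str, field: str) -> str:
    """Deterministically map a real value to a fake one (same input -> same output)."""
    digest = hmac.new(SECRET, f"{field}:{value}".encode("utf-8"),
                      hashlib.sha256).hexdigest()[:12]
    if field == "email":
        # Keep an email-like shape so validation code in pre-production still passes.
        return f"user_{digest}@example.invalid"
    return f"{field}_{digest}"


def scramble_document(key: str, raw_json: str):
    """Return the scrambled JSON string, or None if the key should be skipped."""
    if EXCLUDE_KEYS.search(key):
        return None
    doc = json.loads(raw_json)
    if not isinstance(doc, dict):
        # Non-object values (arrays, scalars) are passed through unchanged.
        return raw_json
    for field in doc.keys() & SENSITIVE_FIELDS:
        doc[field] = pseudonym(str(doc[field]), field)
    return json.dumps(doc)
```

Because the replacement is keyed, anyone holding `SECRET` could confirm guesses about original values, so it should stay out of the pre-production environment.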

You can run one or more N1QL queries that ask only for certain fields and then write the results to files.
From there, the Backup Manager (cbbackupmgr) can probably be used to load the data. Here is the link to the docs on the Backup Manager
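A minimal Python sketch of the "query only certain fields, then write to a file" step might look like this. The bucket name `customers` and the fields `plan` and `created` are hypothetical. How the statement is executed (an SDK, the `cbq` shell, etc.) is left out; the sketch only assumes the results arrive as an iterable of JSON objects. Running such a query generally requires a suitable index on the bucket (for example a primary index).

```python
import json
from typing import Iterable, Mapping


def build_select(bucket: str, fields: Iterable[str]) -> str:
    """Build a N1QL SELECT returning the document key plus only the listed fields."""
    # Backticks escape identifiers, so names with dashes or reserved words still work.
    projection = ", ".join(f"`{f}`" for f in fields)
    return f"SELECT META().id AS id, {projection} FROM `{bucket}`"


def write_json_lines(rows: Iterable[Mapping], path: str) -> int:
    """Write one JSON object per line to `path`; return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
            count += 1
    return count
```

For example, `build_select("customers", ["plan", "created"])` yields ``SELECT META().id AS id, `plan`, `created` FROM `customers` ``. Sensitive fields never leave the production cluster because they are simply not selected.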