My apologies, I should have been more detailed about the compression question …
Q: I tried to zip a generated backup file, but saw no size reduction. When I opened the file, I saw binary data plus JSON strings. Do you know of any ZIP algorithm or tool that can reduce the size of such files? I think it would need to be sensitive to the text segments of the file. If not, we should ignore the native backup tool entirely and develop a streamed view results -> CSV tool of our own.
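(For reference, the "streamed view results -> CSV" idea could look something like the sketch below. It is a minimal illustration using only the Python standard library; the newline-delimited JSON input and the field names are assumptions, not the actual Couchbase view response format, which would need its own streaming parser or SDK call.)

```python
import csv
import io
import json

def stream_rows_to_csv(json_lines, out, fields):
    """Stream newline-delimited JSON records to CSV without
    loading the whole result set into memory at once."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in json_lines:
        # Each record is converted and written as it arrives.
        writer.writerow(json.loads(line))

# Example: two hypothetical view-result rows streamed to CSV.
rows = ['{"id": "u1", "name": "Ann"}', '{"id": "u2", "name": "Bob"}']
buf = io.StringIO()
stream_rows_to_csv(rows, buf, fields=["id", "name"])
print(buf.getvalue())
```

In a real tool, `json_lines` would be replaced by an iterator over the HTTP response body so that memory use stays flat regardless of result size.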
JM: As data is added or mutated in Couchbase, the document is Snappy-compressed in the drain queue. In other words, as we read documents from the cache and persist them to disk, we compress them. As a result, I wouldn't expect you to see much improvement from then zipping the file further. This is also why you see only binary data: the contents are already in a compressed format.
Food for thought … The vast majority of customers leverage cross data center replication (aka XDCR) as an online backup. This provides a fully redundant copy of the cluster. You also have the ability to halt replication at a point in time and use the backup utilities on the second cluster for point-in-time scenarios. You could then re-enable XDCR to re-sync the two clusters, all while taking full application load. This might be an alternative to writing your own utility.