My apologies, I should have been more detailed about the compression question …
Q: I tried to zip a generated backup file, but saw no size reduction. When I opened the file, I saw binary data plus JSON strings. Do you know of any ZIP algorithm or tool that can reduce the size of such files? I think it would need to be sensitive to the text segments of the file. If not, we should ignore the native backup tool entirely and develop a streamed view results -> CSV tool of our own.
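(For reference, the "streamed view results -> CSV" idea could look something like the sketch below. It is a minimal illustration using only the Python standard library; the newline-delimited JSON input and the field names are assumptions, not the actual Couchbase view response format, which would need its own streaming parser or SDK call.)

```python
import csv
import io
import json

def stream_rows_to_csv(json_lines, out, fields):
    """Stream newline-delimited JSON records to CSV without
    loading the whole result set into memory at once."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in json_lines:
        # Each record is converted and written as it arrives.
        writer.writerow(json.loads(line))

# Example: two hypothetical view-result rows streamed to CSV.
rows = ['{"id": "u1", "name": "Ann"}', '{"id": "u2", "name": "Bob"}']
buf = io.StringIO()
stream_rows_to_csv(rows, buf, fields=["id", "name"])
print(buf.getvalue())
```

In a real tool, `json_lines` would be replaced by an iterator over the HTTP response body so that memory use stays flat regardless of result size.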
JM: As data is added or mutated in Couchbase, the document is Snappy-compressed in the drain queue. In other words, as we read documents from the cache and persist them to disk, we compress them. As a result, I wouldn't expect you to see much improvement from then zipping the file further. This is also why you see only binary data: the contents are already in a compressed format.
Food for thought … The vast majority of customers leverage cross data center replication (aka XDCR) as an online backup. This provides a fully redundant copy of the cluster. You also have the ability to halt replication at a point in time and use the backup utilities on the second cluster for point-in-time scenarios. You could then re-enable XDCR to re-sync the two clusters, all while taking full application load. This might be an alternative to writing your own utility.