Cbtransfer (5.5 server) - Does it move documents or copy?

I’m learning how to use cbtransfer as I want to MOVE documents from one bucket to another based off their key value. When it is done I want all the documents with key ‘abc*’ MOVED from bucket A to bucket B. The documentation only mentions a ‘transfer’ which I don’t know if it is just doing a ‘copy’ (documents now live in both buckets) or a true ‘move’ (documents only live in the destination bucket now).

Thanks,
Aaron

Hi @AaronColby ,

It does appear that cbtransfer (a Python script) performs only replication, i.e. a copy (I looked at the docs and peeked at the code). If you want to delete the documents from the source bucket after the transfer, you need to use another tool (a N1QL query or an SDK). IMHO this is a “safety first” thing and probably considered a feature. Perhaps the tool would have been more aptly named cbreplicate.

I will propose an alternative yet simple solution: just use Eventing as a point tool. However, to use it you need to be on Couchbase version 6.5 or above (6.6.2 is the latest).

Eventing Function: MoveDataByKeyPrefix

/*
Requires two bucket bindings or aliases
1. src_bkt is a r+w alias to the Function's source bucket
2. dst_bkt is a r+w alias to the Destination bucket
For more performance set the # workers to the # cores
*/
function OnUpdate(doc, meta) {
    // filter out all keys without prefix "abc"
    if (!meta.id.startsWith("abc")) return;
    // copy src to dst
    dst_bkt[meta.id] = doc;
    // remove from src
    delete src_bkt[meta.id];
}
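
To sanity-check the filter-and-move logic before deploying, the same handler can be run in plain Node.js with ordinary objects standing in for the bucket aliases. This is only a rough sketch: the src_bkt and dst_bkt objects below are hypothetical stand-ins rather than real Eventing bindings, the sample keys and documents are made up, and it says nothing about how the Function behaves under real load.

```javascript
// Hypothetical stand-ins for the Eventing bucket aliases (plain objects,
// not real bindings); the sample keys and documents are invented.
const src_bkt = {
    "abc::1": { v: 1 },
    "abc::2": { v: 2 },
    "xyz::1": { v: 3 }
};
const dst_bkt = {};

// Same logic as MoveDataByKeyPrefix.
function OnUpdate(doc, meta) {
    // skip keys without prefix "abc"
    if (!meta.id.startsWith("abc")) return;
    // copy src to dst
    dst_bkt[meta.id] = doc;
    // remove from src
    delete src_bkt[meta.id];
}

// Replay every source document through the handler once, roughly the way
// a mutation feed would. Object.keys() takes a snapshot of the keys, so
// deleting entries inside the loop is safe.
for (const id of Object.keys(src_bkt)) {
    OnUpdate(src_bkt[id], { id: id });
}

console.log(Object.keys(src_bkt)); // [ 'xyz::1' ]
console.log(Object.keys(dst_bkt)); // [ 'abc::1', 'abc::2' ]
```

Only the keys starting with "abc" end up in the destination object, and the non-matching key is left untouched in the source.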

I used 12 workers and deployed the above Function from Everything. I tested it against 10M documents on my single test node, and it moved the data from one bucket to the other at 30K-37K docs/second.

If you want to use cbtransfer to move the data first, you can later use Eventing to purge the old data from the source bucket. The purge speed across 10M docs was 57K-62K docs/sec.

Eventing Function: PurgeDataByKeyPrefix

/*
Requires one bucket binding or alias
1. src_bkt is a r+w alias to the Function's source bucket
For more performance set the # workers to the # cores
*/
function OnUpdate(doc, meta) {
    // filter out all keys without prefix "abc"
    if (!meta.id.startsWith("abc")) return;
    // remove from src
    delete src_bkt[meta.id];
}

Eventing uses JavaScript, and as a first-class programming language it lets you process massive amounts of data in parallel in Couchbase without concern for infrastructure, etc. For more details on Eventing refer to:
https://docs.couchbase.com/server/current/eventing/eventing-overview.html
And for more Eventing examples:
https://docs.couchbase.com/server/current/eventing/eventing-examples.html#examples-scriptlets

Best

Jon Strabala
Principal Product Manager - Server
