How to delete multiple couchbase entities having common key pattern?

I have a use case where I have to remove a subset of entities stored in couchbase, e.g. removing all entities with keys starting with “pii_”. I am using NodeJS SDK but there is only one remove method which takes one key at a time: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/Bucket.html#remove

In some cases thousands of entities need to be deleted and it takes very long time if I delete them one by one especially because I don’t keep list of keys in my application.

Hey @shashanks,

I believe you asked this question on StackOverflow, but it doesn’t hurt for me to explain it here as well.

The best way to accomplish this is to create a Couchbase view by key and then range query over that view via your NodeJS code, making deletes on the results.

For example, your Couchbase view could look like the following:

function(doc, meta) {
    emit(meta.id, null);
}

Then in your NodeJS code, you could have something that looks like this:

var couchbase = require('couchbase');
var ViewQuery = couchbase.ViewQuery;

var query = ViewQuery.from('designdoc', 'by_id');

query.range("pii_", "pii_" + "\u0000", false);

var myBucket = myCluster.openBucket();
myBucket.query(query, function(err, results) {
    for(i in results) {
        // Delete code in here
    }
});

Of course your Couchbase design document and view will be named differently than the example that I gave, but the important part is the ViewQuery.range function that was used.

All document ids prefixed with pii_ would be returned, in which case you can loop over them and start deleting.

For reference, the StackOverflow thread is as follows:

Best,

This means we get keys in bulk but again we are deleting all the entities one-by-one using a loop. If I have 10000+ entities then this for-loop will bombard couchbase server with those many parallel requests instead of sending all keys in a list and let couchbase delete all those internally. Won’t these many requests affect other requests which are being fired by application for other use cases? Can’t I DELETE using views?

Hey @shashanks,

You have a valid concern, however, with Couchbase Server operating as efficiently as it does and NodeJS being asynchronous, this isn’t really an issue.

Every delete you issue in NodeJS will be non-blocking so the application layer won’t lock up. When a delete request hits Couchbase Server, the document is then marked for deletion and is then later deleted when compaction happens.

You are not able to delete data via a view directly.

I wouldn’t be too worried on this, but let me know if you have further questions :smile:

Best,

How to achieve the same using C# .net ?

@jainendra1807 this is a really old discussion, but my guess is that this can be done now with DELETE in N1QL with the META() function.

Something like:

DELETE 
FROM `my-bucket`
WHERE SUBSTR(META(t).id, 0, 4) = 'pii_'

To elaborate on @matthew.groves answer.

I believe: Delete on server should be delete on server, rather than requiring three steps like get data from server, iterate over it on client side and finally for each record fire delete on the server again.

In this regards, I think old fashioned RDBMS were better all you need to do is ‘DELETE * from database where something=something’.

Fortunately, there is something similar to SQL is available in CouchBase called N1QL (pronounced nickle). I am not aware about JavaScript (and other language syntax) but this is how I did it in python.

Query to be used: DELETE from b where META(b).id LIKE “%”

layer_name_prefix = cb_layer_key + "|" + "%"
query = ""
try:
    query = N1QLQuery('DELETE from `test-feature` b where META(b).id LIKE $1', layer_name_prefix)
    cb.n1ql_query(query).execute()
except CouchbaseError, e:
    logger.exception(e)

To achieve the same thing: alternate query could be as below if you are storing ‘type’ and/or other meta data like ‘parent_id’.

DELETE from <bucket_name> where type=‘Feature’ and parent_id=8;

But I prefer to use first version of the query as it operates on key, and I believe Couchbase must have some internal indexes to operate/query faster on key (and other metadata).

I created stackoverflow answer here: http://stackoverflow.com/a/40250817/454023

1 Like