Improve performance: Remove an element from an array in all Couchbase docs

I have a bucket with around 46k docs. The requirement is to delete a specific element from an array in those documents. I will not have the list of document keys, as the number of documents to update might vary from 10 to 46,000.

I am currently doing it with a N1QL query, removing 123456 from the array and keeping all the other elements as they are.

UPDATE Test AS d
SET d.children = ARRAY l FOR l IN d.children WHEN l.childId != "123456" END
WHERE ANY v IN d.children SATISFIES v.childId = "123456" END;

CREATE INDEX ix1 ON Test (DISTINCT ARRAY v.childId FOR v IN children END);

Is there a better way of achieving this?

Check out the Eventing option, similar to the approach in "What is the fastest way to update several million Docs".


Hi, @krishnaa

As @vsr1 indicated, Eventing is a good solution (and you don't need an index). If you provide one or two sample documents, I will be able to quickly gin up an Eventing function for you to try.

Best

Jon Strabala

@jon.strabala
Please find a sample document below. Example scenario: delete one element from the children array (i.e., childId = 100) in all the Couchbase documents, leaving the remaining elements of the children array as they are.

    {
        "parentname": "Ian",
        "email": "ian@gmail.com",
        "children": [
            {
                "childId": "100",
                "Details": [
                    {
                        "fname": "Abama",
                        "age": 16,
                        "gender": "F"
                    },
                    {
                        "fname": "Sophia",
                        "age": 18,
                        "gender": "F"
                    }
                ]
            },
            {
                "childId": "101",
                "Details": [
                    {
                        "fname": "Alex",
                        "age": 17,
                        "gender": "M"
                    }
                ]
            }
        ]
    }

Hi @krishnaa,

Thanks for the sample document - it helps to see what a customer is actually working on.

I created an Eventing Function, 9 lines long (11 with comments), called "parent_child_cleanup". Please note that src_bkt is a binding (bucket alias) to the mutation source in read+write mode.

function OnUpdate(doc, meta) {
    // Ignore documents without a children property, better to use a type or a key prefix
    if (!doc.children) return;
    var children = doc.children.filter(function(value) { return value.childId !== "100" });
    if (children.length !== doc.children.length) {
        // 1 or more items removed, so update the mutation source via the r+w alias src_bkt
        // log('key '+meta.id+' removed ' + ( doc.children.length - children.length ) + ' items where childId === "100"');
        doc.children = children;
        src_bkt[meta.id] = doc;
    }
}
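If you want to try the filtering logic outside of Couchbase first, here is a minimal sketch in plain JavaScript (runnable with Node.js). It assumes a plain object as a stand-in for the src_bkt binding, uses a trimmed copy of the sample document, and invents the keys "parent::1" and "parent::2" purely for illustration; none of that is part of the real Eventing runtime.

```javascript
// Stand-in for the read+write bucket alias (assumption: in Eventing this is a real binding).
var src_bkt = {};

// Same handler logic as above, with the comments trimmed.
function OnUpdate(doc, meta) {
    if (!doc.children) return;
    var children = doc.children.filter(function (value) { return value.childId !== "100"; });
    if (children.length !== doc.children.length) {
        doc.children = children;
        src_bkt[meta.id] = doc;
    }
}

// Trimmed version of the sample document from this thread.
var sample = {
    parentname: "Ian",
    email: "ian@gmail.com",
    children: [
        { childId: "100", Details: [{ fname: "Abama", age: 16, gender: "F" }] },
        { childId: "101", Details: [{ fname: "Alex", age: 17, gender: "M" }] }
    ]
};

// "parent::1" is a made-up document key.
OnUpdate(sample, { id: "parent::1" });
var remainingIds = src_bkt["parent::1"].children.map(function (c) { return c.childId; });
console.log(JSON.stringify(remainingIds)); // prints ["101"]

// A document with no matching child is left alone, so nothing is written back.
OnUpdate({ children: [{ childId: "101" }] }, { id: "parent::2" });
console.log("parent::2" in src_bkt); // prints false
```

The second call shows why a redeploy over already-cleaned data does no KV writes: the lengths match, so the handler returns without touching the bucket alias.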

On my single-node test box I can process about 45K docs/sec when I deploy with the feed boundary set to Everything and every document is updated. If I undeploy and run the Eventing function again, it is much faster (about 20X), since there is no KV work left to do.

If you want to see things work just uncomment the line

// log('key '+meta.id+' removed ' + ( doc.children.length - children.length ) + ' items where childId === "100"');

Again, in the Eventing world, if you're using pure KV there is no need for an index. (Of course, Eventing can also run N1QL while processing a mutation if you need the flexibility, but you will lose some performance.)

Best

Jon Strabala

@Jon Strabala

Thank you very much for the reply. I am still going through the documentation on data enrichment and Eventing functions.
I am working on a scenario where the child ID to be deleted comes from an external source (RabbitMQ messages), and that specific child ID then has to be deleted from all the docs present in Couchbase. So I am trying to understand how to achieve this using Eventing.
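To make the question concrete, here is a rough sketch in plain JavaScript of the filter with the child ID passed in as a parameter instead of being hard-coded. The helper name removeChild is hypothetical, and how the ID would travel from a RabbitMQ message to the code that calls it is exactly the open part, so it is not shown here.

```javascript
// Hypothetical helper: removes every child whose childId equals the given ID.
// Returns true if the document changed, so the caller knows whether to write it back.
function removeChild(doc, childId) {
    if (!Array.isArray(doc.children)) return false;
    var kept = doc.children.filter(function (c) { return c.childId !== childId; });
    if (kept.length === doc.children.length) return false;
    doc.children = kept;
    return true;
}

// "100" stands in for an ID taken from an external message.
var doc = { children: [{ childId: "100" }, { childId: "101" }] };
var changed = removeChild(doc, "100");
console.log(changed, doc.children.length); // prints true 1
```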