Update entry values from upper case to lowercase

Hello,

In our development environment we have a couchbase server running on 3 servers. We have around 600 milion entries and we would like to change some entry values from upper case to lower case.

We could not find any way to do this other than exporting ALL the data to some local files, replace the upper case to lower case and reimport to couchbase.

This seams super stupid, but we could not find a way to extract all the keys ( so that we can query and replacy each value for each key) or to replace all the entry values in a single query.

Is there any way in which we could change the value text for each key to lower case?

That’s unbalivable that the option of changing the value in a database is to export the data, modified with some shell scritp and re-import it in the DB.

Is there anyone who could give us a hint of how to do this?

Thank you

@flaviu you could easily use the Eventing Service (an EE feature) to do what you want transform your data in place.

This transformation technique has a great deal of performance as Eventing is a distributed client for our Database Change Protocol (DCP) feed.

Assume you have a source bucket called “source” with some data (millions or a billion doc is absolutely fine).

For this example I will just seed two documents and I will seed them via the N1QL Query workbench.

UPSERT INTO `source` (KEY,VALUE)
VALUES ("mydoctype:1",{
  "field1": "ABC",
  "field2": "WOW",
  "fieldN": "More",
  "id": 1,
  "other": "ABC ",
  "type": "mydoctype"
} ),
 VALUES ("mydoctype:2",{
  "field1": "ABC",
  "field2": "WOW",
  "fieldN": "More",
  "id": 2,
  "other": "ABC",
  "type": "mydoctype"
} );

We can write and use an Eventing function (just deploy it from the feed boundary Everything):

  • The source bucket is aliased to src_bkt in read write mode.
  • Updates the source bucket thus requires 6.5+ to run
  • Will run faster with more workers the default is 3 if you have a lot of cores on your event node try 12 or 24.

Our Eventing Function:

function setTolower(parent,key) {
    if(!parent || !key) return false;
    var value = parent[key];
    if (!value || (typeof value !== 'string')) return false;
    var lower = value.toLowerCase();
    if (lower == value)  return false;
    parent[key] = lower;
    return true;
}

function OnUpdate(doc, meta) {
    if (doc.type !== "mydoctype") return;
    var updated = false;

    updated = setTolower(doc,'field1') || updated;
    updated = setTolower(doc,'field2') || updated;
    // more lowercase conversions as needed ...
    updated = setTolower(doc,'fieldN') || updated;

    if (updated) src_bkt[meta.id] = doc; 
}

The result after running is all documents are transformed (in this case 2):

KEY mydoctype:1
{
  "field1": "abc",
  "field2": "wow",
  "fieldN": "more",
  "id": 1,
  "other": "ABC ",
  "type": "mydoctype"
}

and

KEY mydoctype:2
{
  "field1": "abc",
  "field2": "wow",
  "fieldN": "more",
  "id": 2,
  "other": "ABC",
  "type": "mydoctype"
}

Okay so I tested the above against 100M docs on a smallish non-MDS 2GHz sever (with 12 workers) and I updated all 100M documents in 48 minutes. Obviously if you had a real production system with say 4 KV nodes and 2 eventing nodes this would be much faster.

For more details on Eventing refer to https://docs.couchbase.com/server/current/eventing/eventing-overview.html

Thank you for this comprehensive answer. The question was from 2019, so I have resolved it in the meantime ( dump+ string to lower + reimport)

But I really appreciate your effort in putting this here. I hope other people would find it useful