DELETE very slow for a ton of records

I have a bucket with ~150 million records. A little over 40 million of those qualify to be deleted based on the following query:

delete from reporting where date < date_add_str(now_str(), -45, "day");

Unfortunately this is VERY slow (running at about 100 deletes/sec). Is there a quicker way to accomplish this? Maybe N1QL isn’t the best route?

  1. DELETE small batches with LIMT(Issue multiple commands with different range of dates).
    delete from reporting where date < date_add_str(now_str(), -45, “day”) LIMIT 10000;
  2. Use covered SELECT query and get the document keys and DELETE them using SDK.
  3. Also set the TTL (time to live) KV automatically deletes it.

You can also set max_parallism with above options.

Also create index once done with it you can drop it.
CREATE INDEX ix1 ON reporting(date);

1 Like

I will try 1 and 2.

TTL hasn’t been working, which is why I am currently working through this. TTL on all documents is 45 days but we have documents ranging from Feb 2017 and up still lingering.

@vsr1 that index already exists. It still runs at about 100 deletes/sec with N1QL delete statement.

I am struggling to see how to do a bulk delete with the .NET SDK as well now. Is that functionality removed?

@jmorris, @ingenthr, can you please answer the .NET question? Thank you.

@rmendoza

Bulk delete in the .NET SDK can be done like this:

var documents = new List<IDocument<object>>
        {
            new Document<object>
            {
                Id = "InsertAsync_Batch_doc1",
                Content = new {Name = "bob", Species = "Cat", Age = 5}
            },
            new Document<object> {Id = "InsertAsync_Batch_doc2", Content = 10},
            new Document<object> {Id = "InsertAsync_Batch_doc3", Content = new Cat {Name = "Cleo", Age = 10}}
        };
 var results = await _bucket.RemoveAsync(documents).ConfigureAwait(false); 

Note, that only the Id is relevant for the deletion, so your query can just build a list of empty documents with the Id set to the key of the document you would like delete.

An alternative way to do this is by building a list of keys and then using it to create a list of Tasks to execute:

  var keys = new List<string>{"key1", "key2"..."nthkey"};
  var tasks = keys.Select(key =>_bucket.RemoveAsync(key));  
  var results = await Task.WhenAll(tasks).ConfigureAwait(false);

As @vsr1 noted, you’ll want to do this in smaller batches while paginating through the keys you wish to delete.

-Jeff

1 Like

In case you didn’t see this, another user had TTL issues, which received the following reply:

Thanks! I was getting hung up using List instead of List. Got it working!

1 Like