Very slow deletes

I am seeing extremely slow delete-by-prefix on a couchbase server:

[8be883] Slow query run for 26.92883515358 seconds {"statement":"DELETE FROM couch WHERE META(couch).id LIKE '5fl6rnt396|cache_form|%'","creds":[{"user":"local:couch","pass":"couch"}]}

The server only has 50,000 documents.

To my understanding operating on document keys in such a simple way should not need an index. Anyways, these are the indexes deployed to the server:

CREATE INDEX `id_ix` ON `couch`((meta().`id`))
CREATE PRIMARY INDEX `#primary` ON `couch`

Any help is appreciated, this has very bad performance impact on the Drupal Couchbase Module:

Unless query supplies document id through uses USE KEYS, it required index. DELTE requires mutation will be slower. Also depends on how may items qualifies the predicate.

You can make it faster on of the following.

  1. set max_parallelism and run the DELETE
  2. You can restrict predicate and use multiple DELTES on different range of documents.
  3. You can use same DELETE with LIMIT clause and run in multiple times until all of them DELTED.

Default max_parallelism is 1. If you set it higher, the operation will be parallelized and the statement will run faster.

You can also increase the pipeline-batch and pipeline-cap so the delete operations will happen in larger batches.

Hi,

Thanks for the advice. I am using PHP SDK, but was UNABLE to set the different configuration settings (max_parallelism, pipeline_batch, n1ql_timeout, etc…) at the query level.

It would be very convenient (this is probably a bug in the PHP SDK) to be able to set these parameters at the query level, ie:

$query = CouchbaseN1qlQuery::fromString(“DELETE FROM {$this->name} WHERE META({$this->name}).id LIKE "{$prefix}%"”);
$query->options[‘max_parallelism’] = ‘2’;
$query->options[‘n1ql_timeout’] = ‘3600’;

The problem of this only being available at the connection level is that I might not want all queries to share the same settings.

It is possible to specify it on query level for PHP. The options you pass to the query should be in format described in the documentation:
https://developer.couchbase.com/documentation/server/current/n1ql/n1ql-rest-api/index.html

The correct form would be:

$query = \Couchbase\N1qlQuery::fromString('DELETE FROM `default` WHERE META(`default`).id LIKE "prefix"');
$query->options['max_parallelism'] = '2';
$query->options['timeout'] = '3600ms';

Thanks for your help.

Actually, looks like some of these settings are set directly on the query object (undocumented?) such as:

$query->max_parallelism = 2;

But even with these changes, timeouts still happen (now the message is different, it now complains about an “Index Scan Timeout”).

I.e. this is the query that timeouts:

EXPLAIN DELETE FROM couch WHERE META(couch).id LIKE “www.aaa.com|cache_menu|admin_menu:%”;

{
    "requestID": "42bfae67-6d69-41e0-bfd4-1564deffa0fd",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan",
                        "covers": [
                            "cover ((meta(`couch`).`id`))",
                            "cover ((meta(`couch`).`id`))"
                        ],
                        "index": "id_ix",
                        "index_id": "dca2f008bdb837aa",
                        "keyspace": "couch",
                        "namespace": "default",
                        "spans": [
                            {
                                "Range": {
                                    "High": [
                                        "\"www.aaa.com|cachf\""
                                    ],
                                    "Inclusion": 1,
                                    "Low": [
                                        "\"www.aaa.com|cache\""
                                    ]
                                }
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(cover ((meta(`couch`).`id`)) like \"www.aaa.com|cache_menu|a
dmin_menu:%\")"
                                },
                                {
                                    "#operator": "SendDelete",
                                    "alias": "couch",
                                    "keyspace": "couch",
                                    "namespace": "default"
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "Discard"
                    }
                ]
            },
            "text": "DELETE FROM couch WHERE META(couch).id LIKE \"www.aaa.com|cache_menu|admin_menu:%\""
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "20.0014ms",
        "executionTime": "20.0014ms",
        "resultCount": 1,
        "resultSize": 2263
    }
}   

The involved index is this one:

CREATE INDEX id_ixONcouch((meta().id))

The couchbase server has a dedicated 4CPU 8Gb RAM machine that is sitting idle all the time, with just a single multi-tenant bucket:

These timeouts just seem to make no sense to me, specially the Index Scan timeout. How can it timeout scanning 260,000 documents?

Looking at the Explain results, seems like first index scan is not filtering the complete prefix, but only part of it.

It is not guaranteed to work, and probably a bug that you can set it directly. You should use options attributed for that.

‘_’ is pattern matching character. Try escaping like below.

DELETE FROM couch WHERE META(couch).id LIKE "www.aaa.com|cache\\_menu|admin\\_menu:%";

Also try with LIMIT clause and repeat this until all of them deleted to avoid timeouts.

DELETE FROM couch WHERE META(couch).id LIKE "www.aaa.com|cache\\_menu|admin\\_menu:%" LIMIT 10000;