N1QL Count() slow with WHERE

Hi

I’m running Couchbase Server 4.0 community and I’ve run into a performance problem.

Doing a simple N1QL query like this one:
SELECT COUNT(id) FROM default WHERE type='Location’
Is very slow (3 seconds) compared to doing this (0.05 seconds):
SELECT COUNT(*) FROM default

I figured it must be my index, so I created an index with id and type, but it had no effect at all.

I’ve been searching for several hours and I have no clue how to solve this. I hope someone can point me in the right direction.

Cheers

Actually all my WHERE statements seems to be slow.

I only have about 20.000 documents.

Hi @Heu,

These have been significantly improved in 4.1 and 4.5.

Yeah I just migrated everything to 4.5, which did help some queries, but i’m still getting slow results when using the WHERE statement.

Take this query for instance:
SELECT name FROM default WHERE
(CONTAINS(LOWER(name), ‘abc’) OR
CONTAINS(LOWER(address), ‘abc’)) AND
type=‘Location’ LIMIT 100

With this index:
CREATE INDEX test_index ON default(name,type,address) USING GSI

This takes around 2 seconds and doesn’t even hit the covered index.

I don’t know what i’m doing wrong, but i’m currently pulling my teeth out with stuff that should be straightforward.

I have plenty of memory and CPU power so that’s not it as well.

Hi @Heu,

Thanks for your patience as we go through this.

Please try the following query and indexes:

SELECT name
FROM default
WHERE LOWER(name) LIKE ‘abc%’ AND
type=‘Location’
UNION
SELECT name
FROM default
WHERE LOWER(address) LIKE ‘abc%’ AND
type=‘Location’
LIMIT 100
;

CREATE INDEX idx_name ON default( LOWER(name) ) WHERE type = ‘Location’;

CREATE INDEX idx_address ON default( LOWER(address), name ) WHERE type = ‘Location’;

Thanks for you fast reply.

It was indeed much faster, but it’s mostly because of the starts with ‘abc%’ syntax. Doing ‘%abc%’ is still slow.

Also the query seems very complex for something simple :confused:

%abc% is not as simple as it seems. It cannot effectively use a standard index. We have added support for indexing that in 4.5.1.

Hi, I am having performance issues on a similar scenario. I have a membase bucket with barely 2,000 documents and the following indexes:

CREATE INDEX id_ix ON couch((meta().id))

CREATE PRIMARY INDEX #primary ON couch

I want to “operate” on all items who’s key starts with a specific prefix using N1QL, so I’m running a query as such:

SELECT * FROM couch WHERE META(couch).id LIKE ‘local.pre%’ -> 160ms
SELECT COUNT(*) FROM couch WHERE META(couch).id LIKE ‘local.pre%’ -> 35ms

On our production servers, where the buckets are much more full, these operations take > 20seconds.

Using EXPLAIN shows that the indexes are being used, so I don’t really understand how can this be so slow.

This is affecting the Drupal Couchbase integration that I maintain: https://www.drupal.org/project/couchbasedrupal

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“covers”: [
“cover ((meta(couch).id))”,
“cover ((meta(couch).id))”
],
“index”: “id_ix”,
“index_id”: “321ddc05c6122a67”,
“keyspace”: “couch”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“local.prf”"
],
“Inclusion”: 1,
“Low”: [
"“local.pre”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “(cover ((meta(couch).id)) like “local.pre%”)”
},
{
"#operator": “InitialGroup”,
“aggregates”: [
“count()"
],
“group_keys”: []
}
]
}
},
{
"#operator": “IntermediateGroup”,
“aggregates”: [
"count(
)”
],
“group_keys”: []
},
{
"#operator": “FinalGroup”,
“aggregates”: [
“count()"
],
“group_keys”: []
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: "count(
)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “SELECT COUNT(*) FROM couch WHERE META(couch).id LIKE ‘local.pre%’”
}
]

Hello David,
I’ve test you query using a Couchbase bucket of 200K documents and is very faster (6-10 ms). Im using 4.5 EE and MOI index.
Can you post the execution plan of the query:
SELECT * FROM couch WHERE META(couch).id LIKE ‘local.pre%’ ;
The EP posted is from SELECT COUNT(*) …

How many rows return a tipical query of this type? (SELECT * FROM couch WHERE META(couch).id LIKE ‘local.pre%’ ).

Couchbase versión?

Number of COUCHBASE nodes and distribution of Data/index/query engines?

Regards
Arturo

1 Like

But even a “COUNT(*)” is slow at 35ms.

This is couchbase 4.5.1 EE Anniversary Edition (windows). Single node on a local development computer, everything using defaults.

I’m running the queries through the web interface of the couchbase control panel.

Total documents ~1550, about 1500 start with the given prefix.

The execution plan for the select (not the count) is:

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “id_ix_2”,
“index_id”: “aec5d21b5ceb1bb5”,
“keyspace”: “couch”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“local.prf”"
],
“Inclusion”: 1,
“Low”: [
"“local.pre”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Fetch”,
“keyspace”: “couch”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((meta(couch).id) like “local.pre%”)”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “SELECT * FROM couch WHERE META(couch).id LIKE ‘local.pre%’”
}
]

And this is a count() in production, with barely 50,000 total documents:

David,

Is this query will run with best performance before?
If true, the performance degradation would be for internal factors like:
Index fragmentation.
Can you try compact the index used, id_ix_2 and test the query again.

Also external factors like CPU, Disk I/O resources used by other applications.

Also take in account that the query is returning 1500 rows, then for every index entry need to fecth document info from the bucket at file level, then do 1500 I/O operations.

Regards
Arturo

I don’t know if this was running better before, and considering it behaves more or less in the same way in different servers, I’d say this is slow “by design”.

The scenarios here are really nothing big, can’t understand why is it so slow. I mean… we are using couchbase as a replacement for key value store (such as Redis). Redis won’t choke at millions of items, while Couchbase is slow on just 50,000?

Yes I understand that a query returning the 15,000 documents will take time to load all the data, but what about the count?

Indeed, the bootleneck was found while profiling a “delete”, we suddenly found out that everything was slow, including count queries (that explains why the delete itself was so slow).

There must be something wrong in the way we are indexing and querying this data, or a bug in couchbase that shows up with this specific scenario, otherwise the poor performance cannot be explained.

Hello David,
How many Cores has the machine and how much RAM is assiged at server and bucket level?
is virtual?

Regards
Arturo

This might sound a bit weird, but after a blind trial and error session looking for the latest changes in the code looks like the problem was the use of named parameters in a N1QL query. Using named parameters instead of the embeded strings in the query itself makes the query “probably” not use the indexes and makes the couchbase backend go on fire.

I can see that something similar to this situation was solved in 4.5.1:

http://developer.couchbase.com/documentation/server/current/introduction/whats-new.html

Optimization for Processing IN Operator with Parameters

In our particular case, the named parameter was used as input for a LIKE comparison:

$query = \CouchbaseN1qlQuery::fromString("DELETE FROM {$this->name} WHERE META({$this->name}).id LIKE \$prefix");
$query->namedParams(['prefix' => "{$prefix}%"]);

Yes, can you use 4.5.1.