Performance issues with GROUP BY on indexed field


#1

Hi all,

I have read a lot of good things about Couchbase and would like to migrate our project from MongoDB to Couchbase. To get started I imported a single Collection (table in MongoDB) into a bucket in Couchbase.

The collection has these metrics:
Number of documents: 6884
Size total: 199,5MiB
Average Document size: 29,7KiB

I tried to compare a typical operation between MongoDB and CouchDB to group documents by a single field and count documents per distinct field entry. The field has a index on MongoDB and CouchDB side.

MongoDB:
Aggregation pipeline consists of Group stage also generating the count, a sort stage sorting by count DESC and an output stage to write results in a temporary collection

Timing: 0.300 - 0.600ms

CouchDB:
Query used:
SELECT destination, Count(*) As count FROM Mice GROUP BY destination ORDER BY count DESC;

Timing: 2.9 - 3.0secs

That is 5-10 times the amount of time it takes in MongoDB.

I created the index for “destination” field like this:
CREATE INDEX idx_destination ON data(destination)

Is there anything WRONG with my attempt?

How can I speed up the grouping?

Thanks for your help.

Andreas Kroll


#2

Checkout 5.5 EE for https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/


#3

please check the EXPLAIN of the query. I don’t think it will use the index. You need to add “destination is not missing” for it to pick up the index.

e.g.
SELECT destination, Count(*) As count FROM data where destination is not missing GROUP BY destination ORDER BY count DESC


#4

You are completely right!

The explain did not show the usage of the index, and I hat to put in the where destination is not missing for it to work.

What is the reason for having to use this clause?

Thanks for your time.

Andreas


#5

Couchbase bucket can have any type of documents. Secondary indexes will not index the document when leading index key is MISSING in the document.
Without WHERE clause on leading index key optimizer will not pick that index because it will not qualify.

Checkout Design Index for query in https://blog.couchbase.com/a-guide-to-n1ql-features-in-couchbase-5-5-special-edition/


#6

@vsr1 gave the short answer. That’s essentially right.

As you’ve seen, In couchbase all the different collections (types of docs) are stored in a single bucket.
When you index on a bucket, the leading key/keys could be missing on non-related documents. Index could get bloated as well…