Best index for GROUP BY query?


#1

I have a documents in the following format:

{
“accountNumber”: “123” ,
“lastName”: “Smith” ,

}

I want to query the list of account numbers that have different lastName values for the same accountNumber using this query:

SELECT accountNumber FROM bucket GROUP BY accountNumber HAVING COUNT(DISTINCT lastName) > 1;

Without indexes I get the results back in ~30 seconds, but would like to index this query to run much faster.

  • Is there a good indexing strategy for GROUP BY clauses or are they primarily useful for WHERE clauses?
  • Is there an indexing strategy that would work best for this specific case or should I try a different approach?

I’m using the 4.1 dev preview for testing currently. Thanks.


#2

Indexing would help you if you had a WHERE clause.

One thing to try is:

CREATE INDEX idx ON mybucket(accountNumber, lastName);

EXPLAIN SELECT accountNumber
FROM mybucket
WHERE accountNumber IS NOT NULL
GROUP BY accountNumber
HAVING …;

Make sure you see cover() in the EXPLAIN output.