How to optimize query performance which aggregate millions of rows

ptulgaa · April 23, 2017, 2:47pm

Hi,
please help me following case but I think its very common.

we have half million of documents like

{
    "amount": 19.949364089383447,
    "create_date": "2017-04-23T17:09:37.6824921+08:00",
    "id": "stockmove_0001",
    "location_dest_id": "wlocation_0003",
    "location_id": "wlocation_0003",
    "model": "stockmove",
    "move_type": "income",
    "product_id": "27",
    "res_type": "",
    "version": "0.1"
  }

And using next n1ql query

select count(*) rows, sum(amount) sum from bucket where model = 'move'

to aggregate result which takes over 30 seconds.
But in production case its still incomplete, it has forecast to grow up 2 millions of rows
The bucket has default indexing,

Please suggest modeling solution, more indexing option or whatever missing.

vsr1 · April 23, 2017, 3:05pm

CREATE INDEX ix1 ON bucket(model,amount);
SELECT COUNT(1) rows, SUM(amount) sum FROM bucket WHERE model ='move';

Use EXPLIN and check if the query using covering index ix1

ptulgaa · April 24, 2017, 4:40pm

Hi @vsr1
thank you very much

index did the trick
result is acceptable range

The question after the index creation is, all index related data’s are missing
Does couchbase omits data when creating index?

thanks again

keshav_m · April 24, 2017, 5:23pm

Couchbase index is maintained asynchronously.

The index is maintained asynchronously. If you’re continously changing data, the index will be bit behind.
You can use consistency options: REQUEST_PLUS, AT_PLUS or Unbounded to set the consistency per query.

See: https://www.youtube.com/watch?v=WvjYKO27Vdk