How to optimize query performance which aggregate millions of rows

Hi,
please help me following case but I think its very common.

we have half million of documents like

{
    "amount": 19.949364089383447,
    "create_date": "2017-04-23T17:09:37.6824921+08:00",
    "id": "stockmove_0001",
    "location_dest_id": "wlocation_0003",
    "location_id": "wlocation_0003",
    "model": "stockmove",
    "move_type": "income",
    "product_id": "27",
    "res_type": "",
    "version": "0.1"
  }

And using next n1ql query

select count(*) rows, sum(amount) sum from bucket where model = 'move'

to aggregate result which takes over 30 seconds.
But in production case its still incomplete, it has forecast to grow up 2 millions of rows
The bucket has default indexing,

Please suggest modeling solution, more indexing option or whatever missing.

CREATE INDEX ix1 ON bucket(model,amount);
SELECT COUNT(1) rows, SUM(amount) sum FROM bucket WHERE model ='move';

Use EXPLIN and check if the query using covering index ix1

Hi @vsr1
thank you very much


index did the trick
result is acceptable range

The question after the index creation is, all index related data’s are missing
Does couchbase omits data when creating index?

thanks again

1 Like

Couchbase index is maintained asynchronously.

The index is maintained asynchronously. If you’re continously changing data, the index will be bit behind.
You can use consistency options: REQUEST_PLUS, AT_PLUS or Unbounded to set the consistency per query.

See: https://www.youtube.com/watch?v=WvjYKO27Vdk