The LIMIT and OFFSET clauses are always applied by the optimizer as the last operation in the sequence. Take, for example, a set of documents with:
- a primary index on the id field
- a secondary index on the tag field
and issue the query:
SELECT a.* FROM default a WHERE tag = 'somevalue' LIMIT y OFFSET x
When x grows to a large value, retrieval time increases proportionally to the value of x.
Cause: the optimizer chooses the secondary index and performs a very fast index scan, but it then retrieves every document by key from the bucket until the record at position x is reached. This is unnecessary: the server could skip the first x entries using the index alone and only then start fetching, retrieving y documents instead of x+y. Fetching all x+y documents overloads the server heavily.
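The behavior can be checked by inspecting the execution plan with EXPLAIN. This is a sketch, assuming the same bucket and field names as above; the concrete values 100000 and 10 stand in for x and y:

```sql
-- Hypothetical plan inspection: in the problematic case, the Fetch operator
-- appears in the plan before the Offset and Limit operators, which means
-- x+y documents are fetched from the bucket before paging is applied.
EXPLAIN SELECT a.*
FROM default a
WHERE a.tag = 'somevalue'
LIMIT 10 OFFSET 100000;
```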
Workaround found: rewrite the query by joining the main query with a subquery that performs only metadata fetch operations, something like this:
SELECT a.* FROM default a
JOIN (SELECT META(b).id FROM default b WHERE tag = 'somevalue' LIMIT y OFFSET x) ON KEYS (…)
This way, the optimizer fetches only key values from the index (the subquery is covered), generates a sequence of y keys starting from position x, joins the outer query with the resulting key sequence, and only then performs the bucket retrieval.
So the number of documents fetched is always y, not x+y.
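For concreteness, one possible complete form of the workaround is sketched below. The key expression in the ON KEYS clause was elided above, so the subquery alias t and the reference t.id are assumptions (META(b).id projects as a field named id by default), and 100000 and 10 again stand in for x and y:

```sql
-- Hypothetical complete workaround: the inner query is covered by the
-- secondary index on tag, so it scans the index without touching the
-- bucket and produces exactly y keys; the lookup join then fetches
-- only those y documents.
SELECT a.*
FROM (
  SELECT META(b).id
  FROM default b
  WHERE b.tag = 'somevalue'
  LIMIT 10 OFFSET 100000
) AS t
JOIN default a ON KEYS t.id;
```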
Is there a way to make the optimizer recognize this situation and perform (in this case) only y document fetches, starting from the index key data at position x?
That would make the trick above unnecessary.