N1QL Covering Indexes / Performance for Sorting and Where Conditions

n1ql
query

#1

I have some concerns related to sorting and where conditions when using n1ql queries.

Let’s say I have 1,000,000 documents similar to:
{ type: “good”, num: 1, other:“stuff” }

With the following index:
CREATE INDEX index_name ON bucket_name(type, num) WHERE type="good"

The following queries are covered by the index right? (no document fetching):
SELECT type,num FROM bucket_name WHERE type="good" AND num=1
SELECT type,num FROM bucket_name WHERE type="good" ORDER BY num

But the following queries are not covered:
SELECT * FROM bucket_name WHERE type="good" AND num=1
SELECT * FROM bucket_name WHERE type="good" ORDER BY num

For the not covered queries, it seems logical that the WHERE and the ORDER BY would be covered since the field they are dealing with is in the index. And then once the resulting set of index entries was determined, then the documents would be fetched in order to get the fields for the SELECT part of the query.

What I am afraid of is that this is not happening, but instead, every single document is fetched, and evaluated for the WHERE and for the ORDER BY.

  1. In the case of the WHERE - this mean a very large number of documents will be fetched if a small number of documents meet the condition. I’m assuming it stops once it reaches the LIMIT. That would be terrible for performance.
  2. In the case of the ORDER BY - this means that ALL documents must be fetched to sort them. That is even more terrible for performance.

I never want to make a query in my application where a where condition or a sort is not covered by the index because of the terrible performance implications. Ideally - the result set will always be determined by what’s in the index, and then additional properties would be fetched from the documents if necessary.

Is all of the above correct? Does anyone have thoughts or suggestions about this?


#2

No, only the matching documents are fetched. That is the meaning of an index scan. What you are afraid of is a full scan / primary scan. Just use EXPLAIN, and if you see index scan, you are good.


#3

So you are saying that (given 1,000,000):
CREATE INDEX index_name ON bucket_name(type, num) WHERE type="good"
SELECT * FROM bucket_name WHERE type="good" ORDER BY num LIMIT 10
Will for sure only fetch 10 documents?


#4

For that specific index and query:

In 4.5, it will only fetch (and sort) those documents where type=‘good’, whatever that cardinality is.

In 4.5.1, it will fetch at most 10 documents, and will not perform any sort.


#5

So the Order By gets ignored in 4.5.1? Not ordering isn’t acceptable in my case, but at least you’re avoiding overloading the server.
Is there no optimization for the fact that the property that is being ordered is directly in the index?


#6

The ORDER BY does not get ignored. It does the right thing.


#7

Can you clarify the above statement then?
Thanks.


#8

Hi @benbenwilde,

You already clarified the statement yourself. The index is already sorted, so no further sort is performed.


#9

Ok sorry I think I finally understand your answer now.

In 4.5.1 the Order By or the Where condition is run on the index directly if possible. Then after that the documents are fetched in order to project the result if needed.
Makes sense.

So 4.5.0 has the issue that it will fetch for sorting and conditions, even if that part is covered by the index. So it seems my best alternative for now with 4.5.0 is to only select the id, and then turn around and fetch the documents manually. Does that sound right?

How long do you expect until 4.5.1 is fully released?
Thanks


#10

Yes, you described 4.5.1 correctly.

4.5 is as I described it. As I said, it will only fetch the documents where type = “good”.

Let’s ask @prasad and @keshav_m. 4.5.1 EE should be available in about a month.