Index Service Returns All Documents and then it Sorts

I have a simple trip document and it has a property called startTs that represents the start of the trip.
Here is the index that I created

CREATE INDEX idx_test ON trips(startTs) WHERE (type = “trip”)

And here is my query

SELECT RAW trips FROM trips
WHERE startTs BETWEEN “0001-01-01T00:00:00.0000000+00:00” AND “9999-12-31T23:59:59.9999999+00:00”
AND type = “trip”
ORDER BY startTs DESC
LIMIT 1 OFFSET 0

When I look at the plan I see the following

#stats
#itemsOut - 197
#phaseSwitches - 791
execTime - 449.9µs
kernTime - 51.3µs
servTime - 1.930301ms
index - idx_test
index_id - 63ff29479857c3f7
index_projection
primary_key - true
keyspace - trips
namespace - default
spans
exact - true
range
high - “9999-12-31T23:59:59.9999999+00:00”
inclusion - 3
low - “0001-01-01T00:00:00.0000000+00:00”
using - gsi
#time_normal - 00:00.002
#time_absolute - 0.002380201

And here is my question: how come the items out is 197? I would expect this number to be 1 in this case. This becomes a big issue in production when I am paging on the large number of documents. Couchbase fetches those documents and it applies order and limit later in the pipeline.

Interesting observation: if I create an index on tripId (GUID) for example and sort/limit/offset by that, the items out is 1 which is expected. Is this a specific issue with date range?

Index will have entries in ASC order.
If query needs DSC it must get all qualified documents and do sort. There is no RVERSE scan.

Use DESC index like below.

CREATE INDEX idx_test ON trips ( startTs DESC ) WHERE ( type = “trip”);

Thank you very much, that worked.

A follow up question @vsr1
Let’s expand the example a bit. I also have a collection in my trip document called tags. It could be null, empty array or an array of tags.
I created the following index

CREATE INDEX idx_test_tags ON trips(startTs DESC,(distinct (array t for t in tags end))) WHERE (type = “trip”)

When I run the following query, items out is 199 which I don’t want

SELECT RAW trips FROM trips
WHERE startTs >= “0001-01-01T00:00:00.0000000+00:00”
AND startTs < “9999-12-31T23:59:59.9999999+00:00”
AND type = “trip”
ORDER BY startTs DESC
LIMIT 1 OFFSET 0

  • #stats
    • #itemsOut - 199
    • #phaseSwitches - 801
    • execTime - 455.1µs
    • kernTime - 66.7µs
    • servTime - 1.342899ms
  • index - idx_test_tags

and if run the following query I get the expected result

SELECT RAW trips FROM trips
WHERE startTs >= “0001-01-01T00:00:00.0000000+00:00”
AND startTs < “9999-12-31T23:59:59.9999999+00:00”
AND (ANY t IN tags SATISFIES t IN [“test”] END)
AND type = “trip”
ORDER BY startTs DESC
LIMIT 1 OFFSET 0

So, seems like the predicate needs to be there.
Now the question is: If the api user does not want to filter on tags and only on start time, how do we write a query that uses the idx_test_tags index.
I tried tags IS VALUED or tags IS NOT NULL but it did not work.

The both queries gives right results. What you are looking at is internal profile information. You should not worry about that unless you concern about performance.

For Array index, indexer will have one entry for each array value (i.e. 1 document of array has 10 values, indexer will have 10 entries). During IndexScan you have not given any array index key, it will give all 10 values. Then query will do dedup (As Part of DistinctScan operator) by document key after IndexScan.

As Index has array index key and query predicate not used on that key, index gives duplicate entries. So Offset,Limit can’t push to indexer. Query will apply those and terminates when satisfied. As query uses Index order no sort applied at the end , so it is not much overhead. If you are concern about offset then you should explore https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/

Yes @vsr1 both queries gives the correct result, but in our application and specifically this call, we are super concern about the performance.
Based on what you said, there is no way we could avoid fetching all the documents from the data node when tag filter is not provided? Also, what happens if the tags array is null or empty? The document does not get indexed?

It will not fetch all the documents From data node. Execute query and check #itemsOut , #itemsIn each operator you will see. (FYI: Indexscan output might have 150, fetch output might have 50, filter might have 10, limit might have 1, ) AS limit reaches query will stop and operators one by one from bottom.

If u concern create non array index, array index and based on query give USE INDEX,
if missing, null or empty and predicate doesn’t have tags it included. Once predicate has tags it becomes false eliminate.

If you concern about this also: Project META().id and see if it uses covered index. In that case USE SDKs to get actual document.

Got it. Here is the plan for the first query:

{
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 1,
“execTime”: “1.5µs”
},
“~children”: [
{
#operator”: “Authorize”,
#stats”: {
#phaseSwitches”: 3,
“execTime”: “2.5µs”,
“servTime”: “937.1µs”
},
“privileges”: {
“List”: [
{
“Target”: “default:trips”,
“Priv”: 7
}
]
},
“~child”: {
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 1,
“execTime”: “1.9µs”
},
“~children”: [
{
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 2,
“execTime”: “452.399µs”,
“kernTime”: “200ns”,
“state”: “running”
},
“~children”: [
{
#operator”: “DistinctScan”,
#stats”: {
#itemsIn”: 199,
#itemsOut”: 199,
#phaseSwitches”: 801,
“execTime”: “130.9µs”,
“kernTime”: “1.960698ms”
},
“scan”: {
#operator”: “IndexScan3”,
#stats”: {
#itemsOut”: 199,
#phaseSwitches”: 801,
“execTime”: “438.699µs”,
“kernTime”: “49.1µs”,
“servTime”: “1.374499ms”
},
“index”: “idx_test_tags”,
“index_id”: “d08290d19ee16ead”,
“index_order”: [
{
“desc”: true,
“keypos”: 0
}
],
“index_projection”: {
“primary_key”: true
},
“keyspace”: “trips”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: “"9999-12-31T23:59:59.9999999+00:00"”,
“inclusion”: 1,
“low”: “"0001-01-01T00:00:00.0000000+00:00"”
}
]
}
],
“using”: “gsi”,
#time_normal”: “00:00.001”,
#time_absolute”: 0.001813198
},
#time_normal”: “00:00.000”,
#time_absolute”: 0.0001309
},
{
#operator”: “Fetch”,
#stats”: {
#itemsIn”: 33,
#itemsOut”: 16,
#phaseSwitches”: 103,
“execTime”: “620.199µs”,
“kernTime”: “2.103898ms”,
“servTime”: “2.742499ms”,
“state”: “running”
},
“keyspace”: “trips”,
“namespace”: “default”,
#time_normal”: “00:00.003”,
#time_absolute”: 0.003362698
},
{
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 2,
“execTime”: “600.999µs”,
“kernTime”: “100ns”,
“state”: “running”
},
“~children”: [
{
#operator”: “Filter”,
#stats”: {
#itemsIn”: 16,
#itemsOut”: 16,
#phaseSwitches”: 67,
“execTime”: “640.3µs”,
“kernTime”: “4.275897ms”
},
“condition”: “(((("0001-01-01T00:00:00.0000000+00:00" <= (trips.startTs)) and ((trips.startTs) < "9999-12-31T23:59:59.9999999+00:00")) and ((trips.tags) is valued)) and ((trips.type) = "trip"))”,
#time_normal”: “00:00.000”,
#time_absolute”: 0.0006403
},
{
#operator”: “InitialProject”,
#stats”: {
#itemsIn”: 2,
#itemsOut”: 1,
#phaseSwitches”: 8,
“execTime”: “36.4µs”,
“kernTime”: “4.872497ms”
},
“raw”: true,
“result_terms”: [
{
“expr”: “trips
}
],
#time_normal”: “00:00.000”,
#time_absolute”: 0.0000364
},
{
#operator”: “FinalProject”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 6,
“execTime”: “2.5µs”,
“kernTime”: “100ns”
},
#time_normal”: “00:00.000”,
#time_absolute”: 0.0000025
}
],
#time_normal”: “00:00.000”,
#time_absolute”: 0.0006009990000000001
}
],
#time_normal”: “00:00.000”,
#time_absolute”: 0.000452399
},
{
#operator”: “Limit”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 5,
“execTime”: “5.6µs”,
“kernTime”: “900ns”
},
“expr”: “1”,
#time_normal”: “00:00.000”,
#time_absolute”: 0.0000056
}
],
#time_normal”: “00:00.000”,
#time_absolute”: 0.0000018999999999999998
},
#time_normal”: “00:00.000”,
#time_absolute”: 0.0009396000000000001
},
{
#operator”: “Stream”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 5,
“execTime”: “141.199µs”,
“kernTime”: “6.047297ms”
},
#time_normal”: “00:00.000”,
#time_absolute”: 0.000141199
}
],
“~versions”: [
“6.5.0-N1QL”,
“6.5.1-6299-enterprise”
],
#time_normal”: “00:00.000”,
#time_absolute”: 0.0000015
}

In the fetch step, items in is 33 and items out is 16. You are saying this does not mean Couchbase would transfer 16 documents from the data node to the query node?

It means it requested 33, 116 completed. As query limit requirement is satisfied and it stopped (terminated) the query to save resources.

SELECT META().id FROM …
and check it must used covered index . Then use SDKs to get 1 document only.
Or SELECT * FROM trips USE KEYS (SELECT RAW META().id FROM trips
WHERE startTs >= “0001-01-01T00:00:00.0000000+00:00”
AND startTs < “9999-12-31T23:59:59.9999999+00:00”
AND type = “trip”
ORDER BY startTs DESC
LIMIT 1 OFFSET 0)

Got it, Thanks for your help @vsr1.
p.s. The limit 1 was just an example, our page size is usually around 20.

Any value can still work