No Limit Pushdown for Keyset Pagination with Partial Index Filters

I am attempting to build an index that can be used for keyset pagination and statistcis. I am basing this work
on several articles including Database Pagination: Using OFFSET and Keyset in N1QL

The index:

CREATE INDEX `by_type_and_size_sg_idx_v1`
    ON `sync_gateway_sw1`(`type`, META().id, `subType`, `changeType`, ENCODED_SIZE(self))
 WHERE (`type` IS VALUED)
   AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
   AND (NOT IFMISSINGORNULL(`_deleted`, FALSE))
   AND (META().`id` NOT LIKE "_sync:%")

the query:

SELECT meta(sg).`id`
  FROM `sync_gateway_sw1` AS sg
 WHERE (sg.`type` = "user")
   AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
   AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
   AND (META(sg).`id` NOT LIKE "_sync:%")
 ORDER BY sg.`type`, META(sg).`id`
OFFSET 0
 LIMIT 50

To me, everything seems to satisfy the rules for pushing offset/limit down to the indexer (single keyspace, single span, exact predicate, order by key order matches, etc.), but obviously I’m wrong. No matter how I play, I can’t seem to get this partial index to push down or get rid of what seem to be extra steps (i.e. re-filtering).

The full query plan is below. Note that when I remove the last three filters from the index/query the limit does get pushed down, but I still end up with a filter, project, and limit step (again, even though the limit gets pushed down). So that makes it feel like the filters for the partial are a factor with the original query above.

My questions are:

  1. What prevents original query from getting pushed down?
  2. Why do I have the filter step in the query plans (why reapplied)?
  3. Is there a way to get this working with the indexer doing the lifting and have it set up for keyset pagination?

While I want to make this work, I’m more interested in truly understanding why it doesn’t.

The full query plan is below:

{
  "#operator": "Sequence",
  "~children": [
    {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan3",
          "as": "sg",
          "covers": [
            "cover ((`sg`.`type`))",
            "cover ((meta(`sg`).`id`))",
            "cover ((`sg`.`subType`))",
            "cover ((`sg`.`changeType`))",
            "cover (encoded_size(`sg`))",
            "cover ((meta(`sg`).`id`))"
          ],
          "filter_covers": {
            "cover (((`sg`.`type`) is valued))": true,
            "cover ((not ((meta(`sg`).`id`) like \"_sync:%\")))": true,
            "cover ((not ifmissingornull((`sg`.`_deleted`), false)))": true,
            "cover ((not ifmissingornull((`sg`.`softDelete`), false)))": true
          },
          "index": "by_type_and_size_sg_idx_v1",
          "index_id": "49f2f5e4012a8f69",
          "index_order": [
            {
              "keypos": 0
            },
            {
              "keypos": 1
            },
            {
              "keypos": 2
            },
            {
              "keypos": 3
            },
            {
              "keypos": 4
            }
          ],
          "index_projection": {
            "entry_keys": [
              0,
              2,
              3,
              4
            ],
            "primary_key": true
          },
          "keyspace": "sync_gateway_sw1",
          "namespace": "default",
          "spans": [
            {
              "range": [
                {
                  "high": "\"location\"",
                  "inclusion": 3,
                  "low": "\"location\""
                },
                {
                  "inclusion": 0,
                  "low": "null"
                }
              ]
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "maxParallelism": 1,
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "((((cover ((`sg`.`type`)) = \"location\") and cover ((not ifmissingornull((`sg`.`softDelete`), false)))) and cover ((not ifmissingornull((`sg`.`_deleted`), false)))) and cover ((not ((meta(`sg`).`id`) like \"_sync:%\"))))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "cover ((meta(`sg`).`id`))"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    {
      "#operator": "Limit",
      "expr": "50"
    }
  ]
}
  1. If Index order is used you will not see “#operator”:“Order” at the end of query, you see “index_order” field in “#operator”: “IndexScan3”. Query avoids explicit sort. In your plan yes.
  2. If Offset is pushed to indexer you will see “offset” field in “#operator”: “IndexScan3” section. As you use offset 0 it will not show. Change to 10. Same is true for limit. You will see “limit” field. In your case it is not because META().id predicate causing all the predicates not push to indexer and false positives possible. When false positives possible it can’t push limit,offset to indexer.
  3. Even if no false positives, Filter operator will always applied, you can’t avoid it.
  4. Limit pushdown to indexer is hint. Indexer can honor or ignore. So limit is re-applied that is not an issue. Once limit reached it terminates query.

For your query to use keyset pagination trick the index and query such that it will not contradict with META().id index condition.

CREATE INDEX `by_type_and_size_sg_idx_v1`
    ON `sync_gateway_sw1`(`type`, SUBSTR(META().id,0), `subType`, `changeType`, ENCODED_SIZE(self))
 WHERE (`type` IS VALUED)
   AND (NOT IFMISSINGORNULL(`softDelete`, FALSE))
   AND (NOT IFMISSINGORNULL(`_deleted`, FALSE))
   AND (META().`id` NOT LIKE "_sync:%");

SELECT meta(sg).`id`
  FROM `sync_gateway_sw1` AS sg
 WHERE (sg.`type` = "user")
   AND (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))
   AND (NOT IFMISSINGORNULL(sg.`_deleted`, FALSE))
   AND (META(sg).`id` NOT LIKE "_sync:%")
 ORDER BY sg.`type`, SUBSTR(META(sg).`id`,0)
OFFSET 10
 LIMIT 50;

If needed use Additional predicate AND SUBSTR(META(sg).id,0) > “previous value” and remove OFFSET.

SUBSTR(META(sg).id,0) is same as META(sg).id i.e. 0 offset to length

I am always grateful for your answers. For anyone else who stumbles on this, there are other threads from vsr1 explaining false positives (e.g. here). I will say that I don’t always understand why the filter shows up a second time - I have other queries using this index where it doesn’t get re-applied. Regardless, thank you for your help!

Only place we don’t apply filter is indexer doing aggregation (https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/) . For all other times we apply filter.
One of main reason is, If doing Fetch document might have changed from indexer keys. Not applying filter can result in wrong results.

Another example :
CREATE INDEX ix1 ON default(c1);
SELECT META().id FROM default WHERE c1 > 5 AND c1 > $val1;
Query must generate plan at prepare time. It sends indexScan one of the value say 5
execution time val1 is 7 then there extra items from indexer (those are false positives) query must apply filter again and eliminate.
If val1 is 3. No false positives.

If you have query , index where Filter not applied post that will take look and respond.