Keyset pagination with N1QL, an index with multiple keys and per-key sort order

Hello there,

Implementing keyset pagination with multiple keys and an index is easy when all the keys have the same ordering (ascending or descending). It is just a matter of using JSON arrays comparison in N1QL selects where clauses. It performs a nice lexicographic comparison, and the index is used. However, Couchbase GSI allow to create indexes with per-key order, each key having its own.

JSON array comparison does not work any longer, and the only solution I see is to revert to a more complex where clause. For instance to implement:

[a, b, c] < [d, e, f]

It is possible to write the following:

(a < d) or ((a == d) and ((b < e) or ((b == e) and ((c < f)))))

It does the same, and it is possible to change the ordering based on each key (replace < with > wherever necessary).

It works, but it seems that the select execution plan does not use the index any longer, even with comparisons matching exactly the index. I guess the expression is becoming too complex for the select interpreter to figure out it is just looking at the index.
Any solution to this? Or we must forget efficient keyset pagination when there is per-key order?

Thanks!!!

It seems over-complex to use two conditions instead of >= or <= but an index should be used nonetheless. Perhaps if you could provide more detail of your indices and query we might be able to figure out why an index isn’t being used.

What version of Couchbase are you using?

With version 7, e.g.

CREATE INDEX ixf ON default(a ASC, b DESC, c ASC);

EXPLAIN SELECT * FROM default WHERE (a < d) or ((a == d) and ((b < e) or ((b == e) and ((c < f)))));

shows

...
                    "#operator": "IndexScan3",
                    "index": "ixf",
...
                    "spans": [
                        {
                            "range": [
                                {
                                    "inclusion": 0,
                                    "index_key": "`a`",
                                    "low": "null"
                                },
                                {
                                    "inclusion": 0,
                                    "index_key": "`b`"
                                },
                                {
                                    "inclusion": 0,
                                    "index_key": "`c`"
                                }
                            ]
                        }
...

Of course in this example since d, e & f are assumed to be other fields there is no real filtering. If we simplify and make d, e & f parameters:

EXPLAIN SELECT * FROM default WHERE a <= $d and b <= $e and c < $f;

We see:

...
                        {
                            "exact": true,
                            "range": [
                                {
                                    "high": "$d",
                                    "inclusion": 2,
                                    "index_key": "`a`",
                                    "low": "null"
                                },
                                {
                                    "high": "$e",
                                    "inclusion": 2,
                                    "index_key": "`b`",
                                    "low": "null"
                                },
                                {
                                    "high": "$f",
                                    "inclusion": 0,
                                    "index_key": "`c`",
                                    "low": "null"
                                }
                            ]
                        }
...

HTH.

I am using Couchbase 7.1.0.

The case I am working with has four keys plus the document identifier in the index. They are all ascending in the index for now. The “upper bound” is all constants.

I am giving the hint to the select to use the index. It is used in the case of JSON arrays comparisons, not used in the case of the complex comparison expression.

The index is created the following way:

CREATE INDEX hx2araw_user__last ON directory(l,f,m,(meta().id)) WHERE (@T = “@u”)

The JSON array select is:

select/+INDEX(directory hx2araw_user__last)/directory.las @0,directory.f as @1,directory.m as @2,directory.u as @3,meta(directory).id as @I from directory where
@T=“@u” and
[l, f, m, meta(directory).id] >= [“Lastname”, “Firstname”, “”, “”]
and
[l, f, m, meta(directory).id] <= [{}, {}, {}, {}]
order by@0asc,@1 asc,@2 asc,@I asc limit 20

The console indicates that the index is used.

The lexicographic comparison select is:

select/+INDEX(directory hx2araw_user__last)/directory.l as @0,directory.f as @1,directory.m as @2,directory.u as @3, meta(directory).id as @I from directory where
@T=“@u” and
(l > “Lastname”) or (
(l == “Lastname”) and (
(f > “Firstname”) or (
(f == “Firstname”) and (
(m > “”) or (
(m == “”) and (
(meta(directory).id >= “”)
)
)
)
)
)
)
and
(l < {}) or (
(l == {}) and (
(f < {}) or (
(f == {}) and (
(m < {}) or (
(m == {}) and (
(meta(directory).id <= {})
)
)
)
)
)
)
order by @0 asc, @1 asc,@2 asc,@I asc limit 20

Except if I made a mistake, it is equivalent to the former. The console says the index is not used. Simplified example, therefore a little silly.
Sorry the form eats up the back quotes…

It looks like you’re missing a pair of brackets surrounding the clauses after @T=“@u”; as it reads you have (@T==“@u” and l > “Lastname”) -or- the rest of the conditions. I’m guessing you mean to apply @T=“@u” always.

With the statement written as (please check this is the intended filtering):

explain select directory.l as `@0`,directory.f as `@1`,directory.m as `@2`,directory.u as `@3`, meta(directory).id as `@I`          
from directory
where `@T`="@u"
and
(
    l > "Lastname"
   or (
         l == "Lastname"
          and
          (
             f > "Firstname"
              or (
                     f == "Firstname"
                     and
                     (
                         m > "" or (m == "" and meta(directory).id >= "")
                     )
                 )
          )
        )
)
and
(
    l < {}
    or (
           l == {}
           and
           (
               f < {}
               or (
                      f == {}
                      and
                      (
                          m < {}
                          or (m == {} and meta(directory).id <= {})
                      )
                  )
           )
       )
)
order by `@0` asc, `@1` asc,`@2` asc,`@I` asc limit 20;

I see your index is used.

HTH.

1 Like

Oh my goodness, but you are correct! :slight_smile:
In the most complex question, the most silly error on my side.
BIG BIG THANKS!

I use strategy some thing like this to allow efficient indexScan pushdown. When complex OR, the planner quickly get confused and try to do more conservative pushdown and may scan unnecessarly.

See if the following performs any better

CREATE INDEX ixf ON default(a , b , c , META().id);

SELECT a, b,c,id
FROM ( SELECT a, b,c, META().id
       FROM default AS d
       WHERE a >=$a AND b >= $b AND c >=$c AND MEAT().id > $id
     ) AS d1
WHERE (a > $a OR (b > $b OR (c > $c OR id > $id)))
LIMIT 100;

Thanks!!! I’ll give it a try.