N1QL ORDER BY performance optimisation

Consider following use-case:

An Index

CREATE INDEX `type-index`ON bucket( type, createdAt) WHERE type='someType';

A query (which perform quite well)

SELECT a.*
FROM `bucket` a
WHERE a.tyoe='someType'
LIMIT 10 OFFSET 10

Execution: 14.51ms

A query (which perform really bad)

SELECT a.*
FROM `bucket` a
WHERE a.tyoe='someType'
ORDER BY a.createdAt DESC
LIMIT 10 OFFSET 10

Execution: 1m

Here is an EXPLAIN

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "type-index",
              "index_id": "ee6b75813c26795c",
              "keyspace": "bucket",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "successor(\"someType\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"someType\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Fetch",
                    "as": "a",
                    "keyspace": "bucket",
                    "namespace": "default"
                  },
                  {
                    "#operator": "Filter",
                    "condition": "((`a`.`_type`) = \"someType\")"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "`a`",
                        "star": true
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "limit": "10",
          "offset": "10",
          "sort_terms": [
            {
              "desc": true,
              "expr": "(`a`.`createdAt`)"
            }
          ]
        },
        {
          "#operator": "Offset",
          "expr": "10"
        },
        {
          "#operator": "Limit",
          "expr": "10"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "SELECT a.* FROM `bucket` a WHERE a.type='someType' ORDER BY a.createdAt DESC IMIT 10 OFFSET 10"
  }
]

The bucket contains 1.7M documents. Does exist any way how to improve the performance of ORDER BY?

Is createdAt a number or a string? If you search the forums, we have posted workarounds for this. But will repost the workaround here. First need to know if createdAt is a number or a string.

It is a number. Unix timestamp.

Here you go. This will be fixed in the next major release.

CREATE INDEX `type-index`ON bucket( -createdAt ) WHERE type='someType';

SELECT a.*
FROM `bucket` a
WHERE type='someType'
AND -createdAt IS NOT NULL
ORDER BY -createdAt
LIMIT 10 OFFSET 10;
1 Like

Thanks, it works.

How about a bit optimize also count(*) query:

SELECT count(*)
FROM `bucket` a 
WHERE type='someType' AND -createdAt IS NOT NULL

Execution: 7.43s
Is quite a lot.

I have to do it, because I have to return total count to be able to generate pagination.

If you can use a dedicated bucket,

SELECT COUNT(*)
FROM bucket;

is fast. cc @keshav_m

1 Like

@petojurkovic, To generate pagination, would you need EXACT count or an approximate count would help?

Something along the lines of: FOUND_ROWS() in MySQL.
https://dev.mysql.com/doc/refman/5.7/en/information-functions.html

@keshav_m
approximate count would be fine

Thanks!

Thanks @petojurkovic. We’re thinking of this. Will keep you posted. Thanks.