ORDER BY Performance really slow


#2

5.0 supports DESC collation during index creation. Try the following.

CREATE INDEX desc_int ON default(AInt DESC);
SELECT default.* FROM default WHERE AInt IS NOT MISSING ORDER BY AInt DESC LIMIT 50;

In pre 5.0.0

CREATE INDEX neg_int ON default(-AInt);
SELECT default.* FROM default WHERE -AInt IS NOT MISSING ORDER BY -AInt LIMIT 50;

Index key here is -AInt. Every where in predicate needs to refer as -AInt and the corresponding values, relation conditions, Order collation needs to be reversed such that query results will not be altered.


#3

Hi, thank you. It works.

But this method requires us to create indexes on all keys that may need to be ordered. If I have highly dynamic entries, It is unlikely for us to create indexes on all of those keys.

Is there any other workaround on this problem?

Thanks


#4

If the query ORDER BY clause not same as that Index keys order including collision it requires sort. Which means it needs to retrieve all qualified documents before applying pagination.


#5

ok, is there any limit on number of indexes in each bucket? and number of buckets in each cluster?

thanks


#6

There is no limit on number of indexes in each bucket.

For number of buckets check this out.

https://developer.couchbase.com/documentation/server/current/clustersetup/server-setup.html
Check #6
https://blog.couchbase.com/10-things-developers-should-know-about-couchbase/


#7

Thank you. Moreover, based on what you posted earlier, if we can certain which variable(s) we need on query (with order by), we can use composite indexes as well, is this right?


#8

You can use composite indexes also.
NOTE: Query to use Index Order. ORDER BY clause and Index keys order including collision need to match from left to right to avoid SORT. When query uses Index order there will not be Order
operator of the EXPLAIN plan of the query.


#9

That’s an interesting approach. Is there any equivalent for string values?


#10

In 5.0.0 you can specify the DESC collation for each index key. In Pre 5.0.0 you can use above approach for numeric values or string of timestamp(convert to string of timestamp to numeric using STR_TO_MILLIS()).


#11

Hi! Sorry to resurrect an old thread, but does this same (pre-5.0) method work for 4.5.1? I’m noticing that it completely ignores my ORDER BY when applying the same logic.

Index:

CREATE INDEX `idx_activity__teamID__public_desc` ON `default`((`team`.`_id`),(-`created_at`)) WHERE ((`type` = "activity") and (`private` = false))

Query:

SELECT a._id, a.team._id as teamID, a.created_at
FROM default a
USE INDEX (`idx_activity__teamID__public_desc`)
WHERE a.`type` = "activity" AND a.`private` = false AND a.`team`.`_id` = "f51e4695-479d-4ea6-beab-425ad9484ee9" AND -created_at IS NOT MISSING
ORDER BY -created_at
LIMIT 20

Sample result:

[..., {
  "_id": "00456f69-7fe7-43fc-98bd-d87820d650c7",
  "created_at": "2017-08-27T16:28:21.453Z",
  "teamID": "f51e4695-479d-4ea6-beab-425ad9484ee9"
},
{
  "_id": "005cf862-9175-4b0d-9aa3-3856ffe2eb38",
  "created_at": "2017-09-24T22:31:30.906Z",
  "teamID": "f51e4695-479d-4ea6-beab-425ad9484ee9"
},
{
  "_id": "0061967b-5c86-4271-9cb0-87b06ab2c504",
  "created_at": "2017-08-04T23:23:46.579Z",
  "teamID": "f51e4695-479d-4ea6-beab-425ad9484ee9"
},
{
  "_id": "00729635-39f9-4c33-969d-3a92a81c69ba",
  "created_at": "2017-10-15T17:24:06.712Z",
  "teamID": "f51e4695-479d-4ea6-beab-425ad9484ee9"
}, ...]

I tried to follow the same negative fieldname as stated in your previous replies, but it seems to ignore the ORDER BY field altogether. If I omit the ORDER BY then it runs insanely fast on 50k+ records (< 20ms). If I change the ORDER BY to have -created_at DESC then it takes 200-250ms, and still does not order the records. If I removed the negative in the index, and keep the ORDER BY created_at DESC then it takes 500+ms. I’ve tried several different ways of omitting, specifying fields (created_at is never missing, by the way), but alas, I have come up empty handed. Any help would be wonderful!


#12

created_at is string. negative can be done on numerics. So change both index and query to use
-MILLIS(created_at) because created_at is timestamp you can convert it to numeric using MILLIS.


#13

Worked amazingly! If I keep the ORDER_BY it still is around 200ms (big improvement over the 500+ms it was before), however, removing the ORDER_BY brings it all the way down to 7ms. Do I need to keep the ORDER_BY to ensure the order? Or will the index with the -MILLIS() do that?


#14

Hi @zeestorm,

Post the index definition and query EXPLAIN. it should not based on query. May be the query might not covered because you are projecting created_at. So add created_at to index at the end.
Or Porject MILLIS_TO_STR(-(-MILLIS(created_at)))


#15

I apologize, I’m not exactly understanding what you’re saying.
This is the updated index:

CREATE INDEX `idx_activity__teamID__public_desc` ON `default`((`team`.`_id`),(-MILLIS(`created_at`))) WHERE ((`type` = "activity") and (`private` = false))

And the EXPLAIN query:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "idx_activity__teamID__public_desc",
              "index_id": "78158b04491e7b47",
              "keyspace": "default",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "successor(\"f51e4695-479d-4ea6-beab-425ad9484ee9\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"f51e4695-479d-4ea6-beab-425ad9484ee9\"",
                      "null"
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "as": "a",
              "keyspace": "default",
              "namespace": "default"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((((`a`.`type`) = \"activity\") and ((`a`.`private`) = false)) and (((`a`.`team`).`_id`) = \"f51e4695-479d-4ea6-beab-425ad9484ee9\")) and ((-str_to_millis((`a`.`created_at`))) is not missing))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "`a`",
                        "star": true
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Order",
          "limit": "20",
          "sort_terms": [
            {
              "expr": "(-str_to_millis((`a`.`created_at`)))"
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "20"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "\nSELECT a.*\nFROM default a\nUSE INDEX (`idx_activity__teamID__public_desc`)\nWHERE a.`type` = \"activity\" AND a.`private` = false AND a.`team`.`_id` = \"f51e4695-479d-4ea6-beab-425ad9484ee9\" AND -MILLIS(a.created_at) IS NOT MISSING\nORDER BY -MILLIS(a.created_at)\nLIMIT 20"
  }
]

Are you saying I may need to have both -MILLIS(created_at), created_at in the index?


#16

Ok. You may need 4.6.x , If use above index by removing ORDER BY the results comes as index order but those are not guaranteed if query changes.


#17

Thank you. I’ll look into upgrading!


#18

Hi @zeestorm,

In 4.5.1 Instead of AND -MILLIS(a.created_at) IS NOT MISSING try AND -MILLIS(a.created_at) <= 0.

check the EXPLAIN. If the query uses Index Order, There should not be any “#operator”: “Order” . This will make things faster.

Also you can set pretty=false in 4.5.1 onwards


#19

Didn’t seem to change anything. I’ve removed the created_at check entirely from the WHERE query (as it always exists, I was just using the examples above for specifying it).


#20

Hi @zeestorm

CREATE INDEX `idx_activity__teamID__public_desc` ON `default`((`team`.`_id`),(-MILLIS(`created_at`)))
WHERE ((`type` = "activity") and (`private` = false));

SELECT a.*
FROM default a USE INDEX (`idx_activity__teamID__public_desc`)
WHERE a.`type` = "activity" AND a.`private` = false AND a.`team`.`_id` = "f51e4695-479d-4ea6-beab-425ad9484ee9"
AND -MILLIS(a.created_at) <= 0
ORDER BY a.`team`.`_id`, -MILLIS(a.created_at)
LIMIT 20;

In 4.5.1 Try above query. added ORDER BY a.team._id, -MILLIS(a.created_at)
In query where clause a.team._id is equality predicate the result will not change. In 4.6.x Query optimizer detects this and takes care of it.


#21

You are some kind of wizard! Worked like a charm, thanks again for everything!