Complex Order By / Performance

Hey all, I have a complex query that’s very slow when we introduce the ORDER BY. I was wondering if anybody could help me think outside the box to get this faster. Current query is taking around 8-10 seconds when processing ~1500 docs for a user.

We have item type documents which look like this:

{
  "type": "item",
  "owner": "dGVzdEBzaG9wc3RyZWFtLmlv",
  "labels": [
    {
      "categoryName": "Price",
      "text": "$10"
    },
    {
      "categoryName": "Style",
      "text": "Dress"
    },
    {
      "categoryName": "Size",
      "text": "M"
    }
  ]
}

Each item has a ‘labels’ array, each label has a ‘text’ and ‘categoryName’

What we need to do is query items, and order them dynamically at query time. The ordering needs to happen on multiple levels. This example below is our current implementation. It will order item docs by Style first in the order of (hat, dress, skirt), then by Size (XXS, XS, S, etc) and finally order by Price (low to high)

SELECT item,
      IFNAN(NANIF(ARRAY_POSITION(["HAT","DRESS","SKIRT"], sortLabel_0), -1), 999) as sort_0,
      IFNAN(NANIF(ARRAY_POSITION(["XXS","XS","S","M","L","XL","XXL"], sortLabel_1), -1), 999) as sort_1,
      IFNAN(NANIF(ARRAY_POSITION(["$5","$10","$20"], sortLabel_2), -1), 999) as sort_2
    FROM database AS item 
    LET  sortLabel_0 = (SELECT RAW UPPER(l.text)
                        FROM database i
                        USE KEYS META(item).id
                        LEFT OUTER UNNEST i.labels AS l
                        WHERE upper(l.categoryName) = "STYLE")[0],
     sortLabel_1 = (SELECT RAW UPPER(l.text)
                        FROM database i
                        USE KEYS META(item).id
                        LEFT OUTER UNNEST i.labels AS l
                        WHERE upper(l.categoryName) = "SIZE")[0],
      sortLabel_2 = (SELECT RAW UPPER(l.text)
                        FROM database i
                        USE KEYS META(item).id
                        LEFT OUTER UNNEST i.labels AS l
                        WHERE upper(l.categoryName) = "PRICE")[0]
    WHERE item.owner = "bHVsYXJvZXNoZXJ5bEBpY2xvdWQuY29t"
     AND item.type = "item"
     AND item._sync.rev IS NOT NULL 
     AND _deleted IS MISSING   
     ORDER BY sort_0 ASC, sortLabel_0 DESC, sort_1 ASC, sortLabel_1 DESC, sort_2 ASC, sortLabel_2 DESC, item.created DESC 
     OFFSET 0 LIMIT 200

This query is basically saying, for each item.label, find the Style and find it’s text position in the Style array. Then do the same for Size and then Price. If any of the texts are not found in the array, order it alphabetically after all the other found ones.
In the query above, labelSort_x is going to be the label text that matches the category, and sort_x is it’s actual position in the “order array”.

We have indexes on doc type, owner, label.text, label.categoryName. The query is fine, until we do Order By.
CREATE INDEX ss_idx_item_label ON database((label.text),(label.categoryName)) WHERE (type = item)

My question is, does anybody seen an immediate way to speed this up - or even a whole new approach to this? Again, the actual order needs to be dynamic. This is slow, because I’m guessing every item doc for this user is unnesting it’s labels and comparing it to the order array, which is very expensive it seems.

We’re new to Couchbase and still learning as much as we can, but this has us stumped.

Thank you very much!

-Tim

Edit: The reason we’re passing in the order to follow is because it’s defined at run / query time

not able to provide the working query because original query has syntax errors and don’t know much info. But you can try in the following lines.

CREATE INDEX ix1 ON database(owner) WHERE type = "item" AND _sync.rev IS NOT NULL;


SELECT item,
  IFNAN(NANIF(ARRAY_POSITION(["HAT","DRESS","SKIRT"], **sortLabel_0), -1), 999) as sort_0,
  IFNAN(NANIF(ARRAY_POSITION(["XXS","XS","S","M","L","XL","XXL"], **sortLabel_1), -1), 999) as sort_1,
  IFNAN(NANIF(ARRAY_POSITION(["$5","$10","$20"], **sortLabel_2**), -1), 999) as sort_2
FROM database AS item
LET sortLabel_0 = FIRST UPPER    (l.text) FOR l IN item.labels WHEN UPPER(l.categoryName) = "STYLE" END,
        sortLabel_1 = FIRST UPPER(l.text) FOR l IN item.labels WHEN UPPER(l.categoryName) = "SIZE" END,
        sortLabel_2 = FIRST UPPER(l.text) FOR l IN item.labels WHEN UPPER(l.categoryName) = "PRICE" END
    WHERE item.owner = "bHVsYXJvZXNoZXJ5bEBpY2xvdWQuY29t"
     AND item.type = "item"
     AND item._sync.rev IS NOT NULL
     AND _deleted IS MISSING
     ORDER BY sort_0 ASC, sortLabel_0 DESC, sort_1 ASC, sortLabel_1 DESC, sort_2 ASC, sortLabel_2 DESC, item.created DESC
     OFFSET 0 LIMIT 200;
1 Like

Incredible. This got it down to < 2 seconds from ~9-10. Thank you so much!
Still would like to see it faster, but that will come with more experience. I greatly appreciate your help.

Sort takes time. As you are projecting whole document set pretty=false
If _deleted is part of the document you can add _deleted IS MISSING to index WHERE clause

1 Like