Index Scan of Documents with Missing Attributes

Assume you have an index like this:

CREATE INDEX `type_idx` ON `shadows`(`type`,`subtype`,`org`)

The type is always present and valued, but subtype and org may not be. In fact, subtype may be missing when org is valued.

With a query like this:

  SELECT META(s).`id`
    FROM `shadows` AS s
   WHERE (s.`type` = "sample-type")
     AND (s.`subtype` IS MISSING)
     AND (s.`org` = "03387460-a1b2-42dd-8325-65b2ba6bde3f")
   LIMIT 1000

You’ll get what looks, to me, like a reasonable range scan (in fact, a pretty bounded one):

  "spans": [
      "exact": true,
      "range": [
          "high": "\"sample-type\"",
          "inclusion": 3,
          "low": "\"sample-type\""
          "high": "null",
          "inclusion": 0
          "high": "\"03387460-a1b2-42dd-8325-65b2ba6bde3f\"",
          "inclusion": 3,
          "low": "\"03387460-a1b2-42dd-8325-65b2ba6bde3f\""

My assumption - perhaps bad - from this EXPLAIN is that MISSING gets indexed and sorted before “null” in the index (hence the high of null with inclusion 0). My understanding from other posts is also that MISSING does get indexed if the attribute is not the first attribute in the index. In other words, type has to be valued but subtype and org do not have to be. Assuming all this is true, then the above range scan should be in good form and be quick.

When run against a relatively large index, however, we see this run very slowly. In fact, it is behaving exactly as though I’m getting a FULL scan of every row of type “sample-type” - like there is no cutting down to the org. Can someone help me understand why the above might yield a really big range scan?

Thank you!

Your assumption on first paragraph is right.

Indexing experts able to answer the second paragraph. cc @jeelan.poola

Thanks for the confirmation on assumptions! Still hoping there’s some insight on the slowdown from @jeelan.poola if they have a moment.