Covered Intersect Scan with Adaptive Index

Using the following documents with four attributes each:

test:1
  {"type":"test", "u":"4b0f86fc", "a1":"e2d27cc7", "a2":"7e16f001", "s":"graceful"}
test:2
  {"type":"test", "u":"4b0f86fc", "a1":"cad907a9", "a2":"7e16f001", "s":"credit"}
test:3
  {"type":"test", "u":"4b0f86fc", "a1":"e2d27cc7", "a2":"89ef6a8f", "s":"overflow"}
test:4
  {"type":"test", "u":"6718d226", "a1":"cad907a9", "a2": "89ef6a8f", "s": "desert"}

The goal was to have a covering index for u and s, and then a separate adaptive index for a1, a2, … an (there are two above, but more in the real case). The u attribute would be a standard filter on every query, the ax attributes extra filters built into the query string conditionally, and the s attribute used for sorting final results. The intent is that there would be an intersect scan with some covered attributes for sorting. Essentially we have paged queries powering a UI that supports filters and sorted columns, where the user gets to pick the filters.

I have seen this post:

Covering Indexes with IntersectScan or UnionScan

Which indicates that covered drops out on intersect scans as late as 4.5. However, the documentation here:

https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/covering-indexes.html

implies that INTERSECT is one of the query types that “. . . can benefit from covering indexes.” We’re running 6.0.2 (Enterprise), so I was hoping maybe the situation had changed since 4.5 and covered would work with intersect.

Unfortunately, I’m finding that:

  1. I cannot get the intersect scan to happen with an adaptive index, only individual.
  2. Without adaptive, I can get the intersect scan to work, but not get covered attributes for sorting (always a fetch).

For #1, here is the test:

  CREATE INDEX `us_idx` ON `test`(`u`,`s`) WHERE `type`="test"
  CREATE INDEX `adp_idx` ON `test`(DISTINCT PAIRS({a1, a2}))
  SELECT meta().id FROM `test` WHERE `u`="4b0f86fc" AND `a1`="cad907a9" AND `type`="test"

It correctly returns test:2, but the explain indicates us_idx is used along with a fetch for a1 - no intersect scan. Hitting:

SELECT meta().id FROM test WHERE a1=“cad907a9”

Happily uses the adaptive index and correctly returns test:2 and test:4, so it looks like I just can’t get the adaptive to come in on an intersect scan (even if I add the WHERE clause for the type attribute in the adaptive index).

For #2, I’m going to change it up a little (no adaptive index - I dropped adp_idx and built a targeted index):

  CREATE INDEX `us_idx` ON `test`(`u`,`s`) WHERE `type`="test"
  CREATE INDEX `attr1_idx` ON `test`(`a1`) WHERE `type`="test"

With these:

SELECT meta().id FROMtestWHEREu="4b0f86fc" ANDa1="cad907a9" ANDtype="test"

Now does the intersect scan between us_idx and attr1_idx, but I’ve lost the coverage for s. Sorting ends up taking a fetch (it’s that coverage that I was hoping made it in post 4.5 based on the 6.0 docs).

Ultimately, I’m really after something like this:

  SELECT RAW meta().id
    FROM `test`
   WHERE `u`="4b0f86fc" AND `a1` LIKE "cad907a9" AND `type`="test"
   ORDER BY `s`

yielding an intersect scan between some kind of indexes with s covered for fast sort (and not having to to do a really broad scan across a 10’s of thousands of potential items).

Am I basically out of luck on this? We’re playing with FTS to solve this, but there are separate challenges there and would like the N1QL option.

Looking forward to a response, and thanks in advance for any help or thoughts!

PS-One last thing… The reason I am not trying to build an adaptive index that includes u (seems obvious) is the clear indication that adaptive doesn’t support covered. Additionally, I have quite a few columns for sorting (over 10), so the nothing that I could add those to the us_idx (s1, s2, s3… sn) was attractive if I could then just put the other filters in separate “light” indexes.

IntersectScan can’t be covered. IntersectScan used multiple indexes are used. As each index maintained independently , asynchronously We can’t combine the data from two indexes and return as query results (This result in wrong results). Due to that It must do Fetch and reapply the Filters on whole document.

Adaptive index also can’t be covered due to the way it stores data and it can result in false positives when used cover and must Fetch the document and reapply the Filters.

If you want to use ORDER from index , add additional predicate (s IS NOT MISSING) (If it is okay for you)

 CREATE INDEX `us_idx` ON `test`(`u`,`s`,a1) WHERE `type`="test";
 SELECT RAW meta().id
    FROM `test`
   WHERE `u`="4b0f86fc" AND `a1` LIKE "cad907a9" AND `type`="test" AND s IS NOT MISSING
   ORDER BY `s`;

https://blog.couchbase.com/create-right-index-get-right-performance/
https://blog.couchbase.com/offset-keyset-pagination-n1ql-query-couchbase/

First off, thank you for answering these queries as thoughtfully as you do, and please be patient with my ignorance.

I don’t quite understand the restrictions you’ve outlined (I get it may not be implemented right now). Conceptually, indexes exist for a set of documents and each document has covered attributes. An intersection scan looks for documents that satisfy criteria from two or more indexes, and therefore, an intersect scan should be able to cover the union of all covered attributes across those individual indexes.

So I’m re-reading what you wrote as carefully as I can to try to understand your point. When you say “. . . maintained independently, asynchronously” are you talking about consistency between the indexes? That the two indexes may not individually be based on the same mutation of the document, and therefore not only can’t you trust the covered values in the index, Couchbase can’t even trust that the document truly meets the query criteria (i.e. it could have found a document that doesn’t really meet all criteria on each index; it may also miss a document that should have been found). For that reason, after getting a “hit” Couchbase fetches and reapply the filters to the current best representation of the document to ensure the “hit” (a potentially very large performance hit, and you may still get missing docs)?

If that is the real issue - if I understood you correctly - wouldn’t that be the point of the scan_consistency flags (not_bounded, at_plus, request_plus)? If the indexes are collectively mutually consistent as of a particular point (or not if you don’t care), then the content between the indexes could be trusted at that point for that query, and you could use those covered values? For performance it may actually be worth it to take the consistency hit.

If I did not properly understand you point, could you offer a few extra words of explanation?

Thank you for your time and energy @vsr1.

INSERT INTO default VALUES (“doc1”, {“address”: “1234 First Street”, “city”: "San Francisco, “state”: “CA”, “zipcode” : 91234});
CREATE INDEX ix1 ON default(zipcode);
CREATE INDEX ix2 ON default(city);
SELECT META().id, city, zipcode FROM default WHERE city=“San Francisco” AND zipcode = 91234;

Above query can use index

  • ix1 only
  • ix2 only
  • ix1, ix2 and do IntersectScan due to AND clause

As ix1 has zipcode, META().id and ix2 has city and META().id , Query use zipcode, city, META().id .
Query can’t use multiple indexes(ix1,ix2) combine the values from different sources and produce covered query .Due to each index maintained independently , asynchronously.

  • Time T1 : The document is what described above.
  • Index ix1, ix2 also upto time T1
  • Query issued it can produce right results.
  • At Time T2 User updated zipcode = 81234, city = “New York”;
  • Index ix1 still at time T1, ix2 is Fast Time T2
  • Now issue following query it gets results with city =“New York”:, zipcdoe = 91234.
  • SELECT META().id, city, zipcode FROM default WHERE city=“New York” AND zipcode = 91234;

At any given time there is no document that has value of city=“New York” AND zipcode = 91234.
By combining the documents from different sources atomicity of the document will be lost. Intersect scan result in false positives, Due to that N1QL Fetch the document and re-applying predicates.

If you really Interested only document key with covering on intersectScan .
Use Two independent queries with SET Operator INTERSECT, UNION.

SELECT META().id FROM default WHERE city = "San Francisco" 
INTERSECT
SELECT META().id FROM default WHERE zipcode = 91234;

Thank you, again. This is VERY much appreciated.