How can an IntersectScan return more items than MIN(items from its input indexes)? It’s returning the MAX

I need some insight into how IntersectScans work. All I have to go on is this article: https://dzone.com/articles/performance-ingredients-for-nosql-intersect-scans

I’m having an issue with an IntersectScan that combines two index scans:
the first returns x items and the second returns y = (x - z) items.

The IntersectScan takes in (x + y) items and returns x items (the max of its inputs).

I’m missing something big: how is that possible? My guess is that it’s a performance optimization where, beyond some threshold, the engine just gives up and returns the larger result set.

(This is not an academic problem: after the 50,000-document fetch and re-application of the predicates, the result set is 163, so I really need to get this right.)

Please help. (CB v6 Enterprise)

An IntersectScan must always be followed by a Fetch, and the Fetch re-applies the predicates, so the output of the IntersectScan itself can contain false positives.
The #itemsIn of an IntersectScan is the sum of the items received from all of its input sources.
The N1QL optimizer is rule-based, so it doesn’t know how many items each input will produce.
Because of that, IntersectScan in N1QL behaves a little differently from a pure intersect.
Whenever any input finishes, it treats all items received from that input as possible candidates and terminates the other inputs early, instead of waiting for all inputs to finish and doing a pure intersect. Sometimes this performs well (input 1 produces 10 items and input 2 produces 1M; here input 2 is terminated early) and sometimes not (input 1 produces 50,000, input 2 produces 40,000, but the intersection is only 100).
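A minimal Python sketch of the behaviour described above, under a simplified model that is an assumption, not Couchbase’s implementation: each index scan is a list of document keys, and a hypothetical per-step `speeds` value stands in for how fast each scan streams its results. The names `early_stop_intersect` and `fetch_and_filter` are illustrative only. It shows the first input to finish becoming the candidate set, #itemsIn counting keys from every input, and the Fetch step removing the false positives.

```python
from typing import Callable, Dict, List, Set, Tuple


def early_stop_intersect(
    inputs: List[List[str]], speeds: List[int]
) -> Tuple[Set[str], int]:
    """Simplified model of an IntersectScan that stops at the first finished input.

    inputs[i] is the list of document keys index scan i would emit.
    speeds[i] (hypothetical, must be > 0) is how many keys scan i emits per step.
    Returns (candidate keys handed to Fetch, total keys received = #itemsIn).
    """
    positions = [0] * len(inputs)
    items_in = 0
    while True:
        for i, keys in enumerate(inputs):
            batch = keys[positions[i]:positions[i] + speeds[i]]
            positions[i] += len(batch)
            items_in += len(batch)
            if positions[i] >= len(keys):
                # First input to finish: all of its keys become candidates and
                # the remaining inputs are abandoned without being intersected.
                return set(keys), items_in


def fetch_and_filter(
    candidates: Set[str], docs: Dict[str, dict], predicate: Callable[[dict], bool]
) -> Set[str]:
    """Fetch each candidate and re-apply the full predicate to drop false positives."""
    return {key for key in candidates if predicate(docs[key])}


def where(doc: dict) -> bool:
    """The query's full WHERE clause in this example: a = 1 AND b = 1."""
    return doc["a"] == 1 and doc["b"] == 1


# Example data: 10 documents with two indexed fields, a and b.
docs = {f"k{i}": {"a": int(i < 8), "b": int(i >= 5)} for i in range(10)}
idx_a = [k for k, d in docs.items() if d["a"] == 1]  # k0..k7 (8 keys)
idx_b = [k for k, d in docs.items() if d["b"] == 1]  # k5..k9 (5 keys)

fast_a, items_in = early_stop_intersect([idx_a, idx_b], speeds=[4, 1])
print(len(fast_a), items_in)                          # 8 candidates, 9 items in
print(sorted(fetch_and_filter(fast_a, docs, where)))  # ['k5', 'k6', 'k7']

even, items_in_even = early_stop_intersect([idx_a, idx_b], speeds=[1, 1])
print(len(even), items_in_even)                       # 5 candidates, 10 items in
```

With speeds [4, 1] the faster but larger scan (8 keys) finishes first, so all 8 keys go to Fetch even though only 3 match both predicates; with equal speeds the smaller scan (5 keys) wins instead. In this model the IntersectScan output is whichever input completed first, not the true intersection, which is consistent with seeing the MAX in the question.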

If you really need a pure intersect, use the set operator INTERSECT (Q1 INTERSECT Q2), write each query as a covered query that projects document IDs, and pass the result to USE KEYS.
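Under the same simplified model (Python dicts standing in for the bucket and its indexes), the pure-intersect approach can be sketched as follows: each covered scan returns only keys, the intersection waits for every input to finish, and only the surviving keys are fetched, which is the role USE KEYS plays. The helpers `covered_key_scan`, `pure_intersect` and `fetch_by_keys` are hypothetical names for illustration, not N1QL or SDK APIs.

```python
from typing import Dict, Iterable, List, Set


def covered_key_scan(docs: Dict[str, dict], field: str, value: int) -> List[str]:
    """Stand-in for a covered index query that projects only document keys.

    A real covered query reads the index alone; here the 'index' is simulated
    by scanning the dict, and no document body is returned.
    """
    return [key for key, doc in docs.items() if doc.get(field) == value]


def pure_intersect(*key_lists: Iterable[str]) -> Set[str]:
    """Wait for every input to complete, then keep only keys present in all of them."""
    key_sets = [set(keys) for keys in key_lists]
    return set.intersection(*key_sets) if key_sets else set()


def fetch_by_keys(docs: Dict[str, dict], keys: Set[str]) -> List[dict]:
    """Model of USE KEYS: fetch only the documents whose keys survived the intersect."""
    return [docs[key] for key in sorted(keys)]


docs = {f"k{i}": {"a": int(i < 8), "b": int(i >= 5)} for i in range(10)}
keys = pure_intersect(covered_key_scan(docs, "a", 1), covered_key_scan(docs, "b", 1))
fetched = fetch_by_keys(docs, keys)
print(sorted(keys), len(fetched))  # ['k5', 'k6', 'k7'] 3
```

Here only 3 documents are fetched, compared with 8 in the early-terminating model on the same data; the trade-off is that every covered scan must run to completion before anything is fetched.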

If an IntersectScan is performing poorly, avoid it by naming a single index with USE INDEX.


Thank you, absolutely amazing explanation.