LCB_ERR_TIMEOUT (201) occurring MANY times per day

It fetched 380K documents because you don’t have right index of your correlated query.
Every main query row it scanning demo for all type=‘choice’.

  1. Create CREATE INDEX ix1 ON demo(type, code, label);
  2. You only doing [0]th element so Add LIMIT 1 inside correlated subquery
SELECT a.*, META().id AS metaid,
       (SELECT RAW b.label
         FROM demo b
         WHERE b.type='choice'
        AND b.code = a.code
        LIMIT 1)[0] AS _signature_reason_label
FROM demo a
WHERE a.type = 'signature_form'
    AND (archived IS MISSING
        OR archived = NULL
        OR archived = FALSE)
1 Like

Thanks for the input. We added a modified version of the index you suggested:

CREATE INDEX `idx_choice_type_code_label` ON `demo`(`type`,`code`,`label`) WHERE (`type` = "choice")

And this was sufficient to improve performance (even without LIMIT 1).

Update: we added the additional suggested index

CREATE INDEX ix1 ON npky(ansDispatched DESC, ansStatus, ansCallSign, ansSource)

which improves the performance of the original query. More importantly though, it seems to now be preventing the timeout error altogether.

In other words, with this index in place, we no longer get random timeouts. We can execute and re-execute the same query over and over with no timeouts.

Of course, the question remains: Why does the absence of this index cause the entire query to fail completely, but only sometimes? If we’re lacking the ideal index, I would expect slower query performance, not random query failures.

Thanks again for everyone’s help.

When used OrderedIntersectScan/IntersectScan there are synchronization problem. Most of them fixed but looks like there corner case left. This is bug track via [MB-52662] (https://issues.couchbase.com/browse/MB-52662). Thanks for reporting the issue.
When used new index it reduced number of intersects or eliminated.

In this situation CBO (EE) might help to choose best index avoid IntersectScans if possible.

1 Like

My understanding is that an IntersectScan occurs when 2 or more separate indexes are referenced in a single WHERE clause. Is this correct? If so, should we now be attempting to avoid ALL IntersectScans until further notice? Because we support ad-hoc querying by our users, it’s not very practical for us to have to try to create a pure covering index for every possible query shape in an attempt to avoid these timeouts. Thanks again.

Most of the issues in this area are addressed. After 7.0.2 this first one found.
Aways it is not possible create single index with all where clause fields.
Most cases should work. Planner will try pick composite index and try any subset indexes.
If issue avoid with USE INDEX.

Yes, it is what I was asking for, but again it does not show an intersect scan being executed.
The only thing that I can spot is one request doing a primary scan (scan/fetch/project)

Well now that we’ve added the covering index, we’re not experiencing the timeouts any more. I guess we can delete the covering index in an attempt to restore the errors so we can capture a debug, but if the previous debug output doesn’t include the intersect scan, then I’m not sure about WHEN the debug needs to be run, i.e. what range of time does debug cover? Do we have to perfectly synchronize a timeout with running debug?

:wink: You would have to produce stacks after the request starts, but before the timeout kicks in and the requests winds down.
You collect too early and the intersect scan hasn’t started, too late, and we don’t have the request anymore.

We have two possible stall scenarios here:
1- the intersect scan waits for one or more of the children to go away: in this case the intersect scan goroutine would show
2- the intersect scan goroutine terminates, but leaves an incorrect state: in this case the intersect scan goroutine would not show among the stacks, but the main request goroutine would show in a specific state

So far I haven’t seen either.

We ran the debug script again. This time, we removed the covering index that prevents the bad query from timing out, then started the bad query, then once it was clear the query was going to timeout (e.g. no response after 10 seconds), we ran the debug script and captured the output.

debug.zip (19.3 KB)

Hope this is correct. I do see an intersect scan captured.