Query performance with string function grinds to a halt

The following query executes in ~50ms:

SELECT META().id AS metaid,
       (ARRAY a.v FOR a IN answers WHEN a.q = 'ePatient.17' END)[0] AS dob
FROM bucket
WHERE `type`='pcr'
    AND ansSource = 'import'
    AND ANY a IN answers SATISFIES a.q = 'ePatient.17'
    AND a.v IS NOT NULL
    END
LIMIT 100

using the following index:

CREATE INDEX idx_whatever ON bucket((DISTINCT(ARRAY(a.q) FOR a IN answers END)), ansSource) WHERE (type = 'pcr')

Adding just about ANY condition with a string function to the query:

* AND a.v LIKE '%-'%'
* AND REGEXP_POSITION(a.v, '^\\\d{4}') > -1 
* etc

Grinds the query performance to nearly a complete halt at 30+ seconds.

Index Advisor has no suggestions.

Is there no way to construct a performant query that includes the use of a string function?

If not able to push the predicate to indexer things will take time. ex: LIKE, regexp etc

See if this helps CB 7.1+

CREATE INDEX ix1 ON bucket(ALL ARRAY FLATTEN_KEYS(ansSource, a.q, a.v) FOR a IN answers END) WHERE type = "pcr";

SELECT META().id AS metaid, a.v AS dob
FROM bucket AS b
UNNEST b.answers AS a
WHERE b.`type` = "pcr"
    AND b.ansSource = "import"
    AND a.q = "ePatient.17"
    AND a.v LIKE "%-%"
LIMIT 100

Thanks for the input. I appreciate that things will “take time”, but that’s not the answer I’m looking for. It just so happens that I have an identical copy of the data in MySQL and running the same query takes just .4 seconds instead of 30+ seconds to execute (an almost unfathomable 750x performance increase) on just ~500K records, all without requiring complex index machinations such as pushing predicates onto the index. This is now at least the 5th time I’ve been forced to inquire about couchbase query performance, which regularly proves completely unsatisfactory even in very simple use cases like the one above.

Is the plan the same simply with the addition of the extra string-function filter or does it change substantially? And is it purely LIKE & the REGEXP functions that result in this drop-off in performance ? Or do simpler ones like LENGTH or CONTAINS also result in the same change to performance.

(And could you confirm the version you’re seeing this problem with?)

Thanks.

@dh The performance drops with just the addition of a string function (LIKE, REGEXP, etc). Further, the performance hit depends entirely on how many items match.

For example, if I check AND a.v LIKE '1%', performance is fine because MOST records have a.v starting with 1. But if I do AND a.v LIKE '3%', performance grinds to a halt, because few or no records have a.v starting with 3.

Any time we execute a query where the WHERE clause has to scan many documents (or in this case, many answers across many documents), and we fail to craft the query to quite precisely do as much arbitrary pre-filtering as possible, we get major performance hit.

Thank you. Just to clarify, it only drops with a string-pattern function? Or does it drop with any non-string-pattern function too? (e.g. if you tried AND POSITION(a.v,"3") == 0 in lieu of AND a.v LIKE '3%')

I’m trying to understand if it is solely the regex - LIKE is implemented as a regex - based filters or any field filter. (I’m not observing a difference in my local tests, but want to be sure.)

In all likelihood it isn’t related to the pattern matching functions but rather the need to fetch more documents to satisfy the LIMIT clause.

(I understand that your basic complaint is the performance of document fetching for query evaluation and the need to avoid this.)