Slow query performance with correlated subquery, even with advised index

Given documents of the following shapes:

{
    "type": "user",
    "firstName": "Fred",
    "middleName": null,
    "lastName": "Smith",
    "agencies": ["agency1", "agency2"],
    "userType": "cc",
    "isLocked": false,
    "isEnabled": true,
    "role": "mainrole"
}

{
    "type": "role",
    "code": "mainrole",
    "permissions": ["qa|m", "users|m", "roles|e"]
}

The following query:

SELECT META().id AS metaid,
       REPLACE(CONCAT2(' ', COALESCE(firstName, ''), COALESCE(middleName, ''), COALESCE(lastName, '')), ' ', ' ') AS name
FROM dev
WHERE type='user'
    AND dev.userType <> 'mobile'
    AND dev.isEnabled = TRUE
    AND dev.isLocked <> TRUE
    -- this subquery increases query time by a factor of 10
    AND EXISTS(
        SELECT NULL
        FROM dev b2
        WHERE b2.type='role'
        AND b2.code = dev.`role`
        AND ANY p IN b2.permissions SATISFIES p IN ['qa|m'] END)
        AND (dev.agencies IS NULL OR (ANY a IN dev.agencies SATISFIES a = 'agency|myems' END)
    )
ORDER BY lastName

Takes ~1.8 seconds to execute, even with what the console suggests is an adequate index.

Advisor shows:

Index Currently Used
CREATE INDEX adv_userType_isLocked_DISTINCT_agencies_agencies_isEnabled_type ON `dev`(`userType`,`isLocked`,(distinct (array `a` for `a` in `agencies` end)),`agencies`) WHERE ((`type` = 'user') and (`isEnabled` = true))
Existing indexes are sufficient.

The problem is that the performance of this query is quite unacceptable, given the extremely low record counts. We have only about 2700 users, 5 roles, 40 permissions and 2 agencies.

So I’m at a bit of an impasse here. With no ability to provide a “better” index, this very basic query is performing miserably, with seemingly no way to improve.

FYI - completely removing the dependent subquery improves performance to around 215ms to execute - still slow, but obviously worlds better. But of course, I need the subquery.

I expect there’s some secret optimization that I can perform and that the index advisor is simply failing to suggest it. But if this is true, then I’m forced to question the usefulness of the advisor.

Help!

Is there reason you can’t move this filter outside subuqery to parent?

AND (dev.agencies IS NULL OR (ANY a IN dev.agencies SATISFIES a = 'agency|myems' END)

CREATE INDEX ix1 ON `dev`(`userType`,`isLocked`) WHERE ((`type` = 'user') and (`isEnabled` = true));
CREATE INDEX ix2 ON `dev`(code, DISTINCT permissions) WHERE type = "role";

SELECT META().id AS metaid,
       REPLACE(CONCAT2(' ', COALESCE(firstName, ''), COALESCE(middleName, ''), COALESCE(lastName, '')), ' ', ' ') AS name
FROM dev
WHERE type='user'
    AND dev.userType <> 'mobile'
    AND dev.isEnabled = TRUE
    AND dev.isLocked <> TRUE
    AND (dev.agencies IS NULL OR (ANY a IN dev.agencies SATISFIES a = 'agency|myems' END)
    AND EXISTS(
        SELECT RAW 1
        FROM dev b2
        WHERE b2.type='role'
        AND b2.code = dev.`role`
        AND ANY p IN b2.permissions SATISFIES p IN ['qa|m'] END
        LIMIT 1)
    )
ORDER BY lastName

OR

SELECT d.metaid, d.name
FROM (SELECT META().id AS metaid,
          REPLACE(CONCAT2(' ', COALESCE(firstName, ''), COALESCE(middleName, ''), COALESCE(lastName, '')), ' ', ' ') AS name,
          lastName AS lastName,
          dev.`role`
      FROM dev
      WHERE type='user'
             AND dev.userType <> 'mobile'
             AND dev.isEnabled = TRUE
             AND dev.isLocked <> TRUE
             AND (dev.agencies IS NULL OR (ANY a IN dev.agencies SATISFIES a = 'agency|myems' END))) AS d
WHERE EXISTS(
        SELECT RAW 1
        FROM dev b2
        WHERE b2.type='role'
        AND b2.code = d.`role`
        AND ANY p IN b2.permissions SATISFIES p IN ['qa|m'] END
        LIMIT 1)
ORDER BY d.lastName

As you have 5 roles only use CTE

CREATE INDEX ix1 ON `dev`(`userType`,`isLocked`) WHERE ((`type` = 'user') and (`isEnabled` = true));
CREATE INDEX ix3 ON `dev`(code, permissions) WHERE type = "role";

WITH lroles AS ( SELECT b2.code, b2.permissions
                FROM dev b2
                WHERE b2.type='role' AND b2.code IS NOT NULL
              )
SELECT metaid,
       REPLACE(CONCAT2(' ', COALESCE(firstName, ''), COALESCE(middleName, ''), COALESCE(lastName, '')), ' ', ' ') AS name
FROM dev
WHERE type='user'
      AND dev.userType <> 'mobile'
      AND dev.isEnabled = TRUE
      AND dev.isLocked <> TRUE
      AND (dev.agencies IS NULL OR (ANY a IN dev.agencies SATISFIES a = 'agency|myems' END))
      AND EXISTS (SELECT RAW 1
                  FROM lroles AS r
                  WHERE r.code = dev.`role`
                  AND ANY p IN r.permissions SATISFIES p IN ['qa|m'] END
                  LIMIT 1)
ORDER BY lastName;

if code is string also try this as you have few roles

WITH objRoles AS (OBJECT o.code:o.permissions
                  FOR o IN (SELECT b2.code, b2.permissions
                            FROM dev b2
                            WHERE b2.type='role' AND b2.code IS NOT NULL
                           )
                  END)
SELECT metaid,
       REPLACE(CONCAT2(' ', COALESCE(firstName, ''), COALESCE(middleName, ''), COALESCE(lastName, '')), ' ', ' ') AS name
FROM dev
WHERE type='user'
      AND dev.userType <> 'mobile'
      AND dev.isEnabled = TRUE
      AND dev.isLocked <> TRUE
      AND (dev.agencies IS NULL OR (ANY a IN dev.agencies SATISFIES a = 'agency|myems' END))
      AND ANY p IN objRoles.[dev.`role`] SATISFIES p IN ['qa|m'] END
ORDER BY lastName;
1 Like

@vsr1 I hope the couchbase team is paying you like a superhero :wink:

I tried all 4 variants you suggested with the following results:

#1 moving agencies check into parent: 750ms
#2 moving EXISTS outside of FROM: 600ms even with fully covering index
#3 lroles CTE: 150ms ← much better
#4 objRoles CTE: 50ms ← best!

Obviously, I’ll go with the 4th variant.

The problem I have now is that I may be forced to try to come up with a myriad of my own variants for any given query just to eek out baseline performance, even after the index advisor has proclaimed it’s already using the best index. This is quite frustrating.

For example, why does my original correlated subquery, which feels like the simplest and most obvious choice, perform so dismally? Is there a ‘secret’ rule that correlated subqueries should be avoided at all costs? And if so, should they even be supported, if they can never be optimized?

Finally, I’ve uncovered what appears to be a bug in the console (7.0.2). When I use the #4 query, the index advisor fails, showing the following instead of the actual advice: Click 'Advise' to generate query index advice.

Thanks again for the excellent help.

1 Like

At present query only advise only on parent query.
Subquery ADVISE/EXPLAIN/Monitor will be there in 7.1.2 MB-31640(Soon)
There is not best index correlated subquery end up doing every outer row full scan on corrected subquery might taking time.

FYI: check #3, #4 fixed few typos

If I understand correctly:

  • The index advisor cannot suggest optimal indexes for any query that contains a subquery.
  • Support for this feature is expected in 7.1.2.
  • Proper indexing cannot guarantee subquery performance.

It seems clear that ANY use of subqueries in N1QL should be avoided, which feels like a significant drawback. I’m deeply concerned that this may apply to the analytics service as well.

If reformatting a subquery as a common table expression (as we did above) offers dramatically improved performance, shouldn’t the query engine automatically do this on our behalf whenever we execute a subquery? It certainly feels like this type of optimization is possible.

Please let me know if any of the above is incorrect.

Thanks again

Pre 7.1.2 The advisor is per query block. One can do separately advise on subquery (any correlated field replace as named parameter just purpose of advise/explain)

In 7.1.2 It will recommend as whole query (including subqueries).
If proper index subquery works optimal way.
Advise or query execution works as user provided query. At present N1QL doesn’t have query rewrite (Some times query rewrite can generate optimal plan like one above simple example: EXISTS/ NON EXISTS sub query doesn’t produce all results LIMIT 1 can make faster )