What is the right approach to create indexes to a complex query

query
n1ql

#1

I’m faced to a quite complex query:

SELECT DISTINCT RAW e.visitId FROM bucket e
WHERE
    e.type = 'EVENT'
    AND e.visitId IN (
        SELECT raw item.visitId FROM (
            SELECT ARRAY_SORT(ARRAY_AGG(event.dateTime))[-1] as dateTime, event.visitId FROM bucket event
            JOIN bucket visit ON KEYS event.visitId
            WHERE event.type = "EVENT"
                AND (visit.status = "OPEN" OR (visit.status IS MISSING AND visit.stopDateTime IS NOT VALUED))
                AND visit.visitType = "VT_CONSULTATION"
                AND event.visitId NOT IN(
                    SELECT raw ev.visitId FROM bucket ev
                        WHERE 
                            ev.type = "EVENT"
                        AND (
                        		ev.eventTypeId IN ["ET_CPN", "ET_CPS", "ET_CPON", "ET_PF"]
                        	OR  ev.configurableMetadataId IN ["ET_CPN", "ET_CPS", "ET_CPON", "ET_PF"]
                        )
                        AND SPLIT(META(ev).`id`,':')[0] != "_sync" 
                        AND (ev.isVoided = false OR ev.isVoided IS MISSING)
                )
                AND SPLIT(META(visit).`id`,':')[0] != "_sync" 
                AND (visit.isVoided = false OR visit.isVoided IS MISSING)
            GROUP BY event.visitId) items
        UNNEST items item
        WHERE DATE_DIFF_MILLIS(NOW_MILLIS(), item.dateTime, "hour") >6
    )

And I’m wondering: what is the best approach to manage indexes for these kind of queries (because I will need to create other queries with such a complexity). Is it better to split the query and create index for each part of the query? Is it possible to group every conditions on a single index?

Based on this example, what are the indexes I should create?

Thank’s.

PS: I’m using Couchbase Server 4.5.1


#2

Each subquery optimized itself. Index will be used when the FROM clause has bucket. You can run the EXPLAIN on each SELECT (if there is correlated replace with dummy $xxx) and check how the plan looks and what index needs to be created.

See if you can reduce complexity by using MAX and HAVING clauses based on your query logic.
Also try with Index Join [Index on joined field] because there are lot of predicates on right side of JOIN

    SELECT RAW DISTINCT event.visitId FROM bucket event
    JOIN bucket visit ON KEYS event.visitId
    WHERE event.type = "EVENT"
        AND (visit.status = "OPEN" OR (visit.status IS MISSING AND visit.stopDateTime IS NOT VALUED))
        AND visit.visitType = "VT_CONSULTATION"
        AND event.visitId NOT IN(
            SELECT raw ev.visitId FROM bucket ev
                WHERE
                    ev.type = "EVENT"
                AND (
                        ev.eventTypeId IN ["ET_CPN", "ET_CPS", "ET_CPON", "ET_PF"]
                    OR  ev.configurableMetadataId IN ["ET_CPN", "ET_CPS", "ET_CPON", "ET_PF"]
                )
                AND SPLIT(META(ev).`id`,':')[0] != "_sync"
                AND (ev.isVoided = false OR ev.isVoided IS MISSING)
        )
        AND SPLIT(META(visit).`id`,':')[0] != "_sync"
        AND (visit.isVoided = false OR visit.isVoided IS MISSING)
    GROUP BY event.visitId
    LETTING mdateTime = MAX(event.dateTime)
    HAVING DATE_DIFF_MILLIS(NOW_MILLIS(), mdateTime, "hour") > 6 ;