Multiple Index JOIN from a single document

I have a type of document like this one:

{
    "_id": "abc",
    "patientId": "xyz",
    "functionalUnitId": "101",
    "lastModifier": "user101"
    "type": "EVENT",
    ...
}

patientId, functionalUnitId and lastModifier points to 3 different documents on which I need to retrieve some data. I actually manage to have an INDEX JOIN on one of those (patient) but the 2 others are retrieved using a “basic” JOIN which, I suppose, slow down my query (actually it takes ~15sec on a 130.000 documents bucket).

Here is the INDEX JOIN I already created:

CREATE INDEX idx_patientId_event ON DATA_BUCKET(patientId) WHERE (type = “EVENT”)

And here is my actual query:

SELECT ARRAY_AGG ({'id': META(event).id,
                    'patientId': event.patientId, 
                    'visitId': event.visitId, 
                    'fields': fields, 
                    'patientFirstName': INITCAP(patient.givenName), 
                    'patientFamilyName': UPPER(patient.familyName), 
                    'patientBirthDate': MILLIS_TO_STR(patient.birthDate), 
                    'patientSex': patient.sex,
                    'functionalUnitId': event.functionalUnitId,
                    'functionalUnitName': functionalUnit.name,
                    'completed': event.isComplete,
                    'lastUpdate': event.lastModificationDate,
                    'lastUpdateById': event.lastModifier,
                    'lastUpdateByName': UPPER(lastModifier.familyName) || ' ' || INITCAP(lastModifier.givenName)}) as events, eventType
FROM DATA_BUCKET patient
JOIN DATA_BUCKET event ON KEY event.patientId FOR patient
JOIN METADATA_BUCKET functionalUnit ON KEYS event.functionalUnitId
JOIN USERS_BUCKET lastModifier ON KEYS event.lastModifier
LET fields = ARRAY {'id': field.elementValueTypeId, 'value': field.`value`, 'vdo': field.valueDataOptionId} FOR field IN event.elementValues END,
    eventType = IFMISSINGORNULL(event.eventTypeId, event.configurableMetadataId)
WHERE event.type = 'EVENT'
AND patient.type = 'PATIENT'
AND eventType IS NOT MISSING
AND ARRAY_LENGTH(fields) > 0
AND IFMISSING(event.isVoided, false) = false
AND IFMISSING(patient.isVoided, false) = false
GROUP BY eventType
ORDER BY eventType

As we can see their is one INDEX JOIN and 2 “basic” JOIN. Is their a way to speed up the query by making multiple INDEX JOIN (or anything else that could speed up the query is welcome) ?

You have lot of filters on event. Can you try all lookup joins.

CREATE INDEX ix1 ON DATA_BUCKET(IFMISSINGORNULL(eventTypeId, configurableMetadataId),IFMISSING(isVoided, false))
WHERE type = 'EVENT' AND ARRAY_LENGTH(elementValues) > 0;

   SELECT eventType,
        ARRAY_AGG ({'id': META(event).id,
                    'patientId': event.patientId,
                    'visitId': event.visitId,
                    'fields': ARRAY {'id': field.elementValueTypeId,
                                     'value': field.`value`,
                                     'vdo': field.valueDataOptionId }
                              FOR field IN event.elementValues END,
                    'patientFirstName': INITCAP(patient.givenName),
                    'patientFamilyName': UPPER(patient.familyName),
                    'patientBirthDate': MILLIS_TO_STR(patient.birthDate),
                    'patientSex': patient.sex,
                    'functionalUnitId': event.functionalUnitId,
                    'functionalUnitName': functionalUnit.name,
                    'completed': event.isComplete,
                    'lastUpdate': event.lastModificationDate,
                    'lastUpdateById': event.lastModifier,
                    'lastUpdateByName': UPPER(lastModifier.familyName) || ' ' ||
                                        INITCAP(lastModifier.givenName)}) as events
FROM DATA_BUCKET event
JOIN DATA_BUCKET patient ON KEYS event.patientId
JOIN METADATA_BUCKET functionalUnit ON KEYS event.functionalUnitId
JOIN USERS_BUCKET lastModifier ON KEYS event.lastModifier
LET eventType = IFMISSINGORNULL(event.eventTypeId, event.configurableMetadataId)
WHERE event.type = 'EVENT'
      AND eventType IS NOT MISSING
      AND ARRAY_LENGTH(event.elementValues) > 0
      AND IFMISSING(event.isVoided, false) = false
      AND patient.type = 'PATIENT'
      AND IFMISSING(patient.isVoided, false) = false
GROUP BY eventType
ORDER BY eventType;

The index doesn’t seems to improve the query time. I can share the explain statement:
(updated with the new query)

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "index": "idx_from_cb_forum",
            "index_id": "8a604dbc321a329f",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "DATA_BUCKET",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "inclusion": 1,
                    "low": "null"
                  },
                  {
                    "high": "false",
                    "inclusion": 3,
                    "low": "false"
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Fetch",
            "as": "event",
            "keyspace": "DATA_BUCKET",
            "namespace": "default"
          },
          {
            "#operator": "Join",
            "as": "patient",
            "keyspace": "DATA_BUCKET",
            "namespace": "default",
            "on_keys": "(`event`.`patientId`)"
          },
          {
            "#operator": "Join",
            "as": "functionalUnit",
            "keyspace": "METADATA_BUCKET",
            "namespace": "default",
            "on_keys": "(`event`.`functionalUnitId`)"
          },
          {
            "#operator": "Join",
            "as": "lastModifier",
            "keyspace": "USERS",
            "namespace": "default",
            "on_keys": "(`event`.`lastModifier`)"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Let",
                  "bindings": [
                    {
                      "expr": "ifmissingornull((`event`.`eventTypeId`), (`event`.`configurableMetadataId`))",
                      "var": "eventType"
                    }
                  ]
                },
                {
                  "#operator": "Filter",
                  "condition": "(((((((`event`.`type`) = \"EVENT\") and (`eventType` is not missing)) and (0 < array_length((`event`.`elementValues`)))) and (ifmissing((`event`.`isVoided`), false) = false)) and ((`patient`.`type`) = \"PATIENT\")) and (ifmissing((`patient`.`isVoided`), false) = false))"
                },
                {
                  "#operator": "InitialGroup",
                  "aggregates": [
                    "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
                  ],
                  "group_keys": [
                    "`eventType`"
                  ]
                }
              ]
            }
          },
          {
            "#operator": "IntermediateGroup",
            "aggregates": [
              "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
            ],
            "group_keys": [
              "`eventType`"
            ]
          },
          {
            "#operator": "FinalGroup",
            "aggregates": [
              "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
            ],
            "group_keys": [
              "`eventType`"
            ]
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "`eventType`"
                    },
                    {
                      "as": "events",
                      "expr": "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
                    }
                  ]
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Order",
        "sort_terms": [
          {
            "expr": "`eventType`"
          }
        ]
      },
      {
        "#operator": "FinalProject"
      }
    ]
  },
  "text": "SELECT eventType,\n        ARRAY_AGG ({'id': META(event).id,\n                    'patientId': event.patientId,\n                    'visitId': event.visitId,\n                    'fields': ARRAY {'id': field.elementValueTypeId,\n                                     'value': field.`value`,\n                                     'vdo': field.valueDataOptionId }\n                              FOR field IN event.elementValues END,\n                    'patientFirstName': INITCAP(patient.givenName),\n                    'patientFamilyName': UPPER(patient.familyName),\n                    'patientBirthDate': MILLIS_TO_STR(patient.birthDate),\n                    'patientSex': patient.sex,\n                    'functionalUnitId': event.functionalUnitId,\n                    'functionalUnitName': functionalUnit.name,\n                    'completed': event.isComplete,\n                    'lastUpdate': event.lastModificationDate,\n                    'lastUpdateById': event.lastModifier,\n                    'lastUpdateByName': UPPER(lastModifier.familyName) || ' ' ||\n                                        INITCAP(lastModifier.givenName)}) as events\nFROM DATA_BUCKET event\nJOIN DATA_BUCKET patient ON KEYS event.patientId\nJOIN METADATA_BUCKET functionalUnit ON KEYS event.functionalUnitId\nJOIN USERS lastModifier ON KEYS event.lastModifier\nLET eventType = IFMISSINGORNULL(event.eventTypeId, event.configurableMetadataId)\nWHERE event.type = 'EVENT'\n      AND eventType IS NOT MISSING\n      AND ARRAY_LENGTH(event.elementValues) > 0\n      AND IFMISSING(event.isVoided, false) = false\n      AND patient.type = 'PATIENT'\n      AND IFMISSING(patient.isVoided, false) = false\nGROUP BY eventType\nORDER BY eventType;"

}

Query did not qualify the index. It looks like you have not modified query as recommended in previous post. Order of FROM clause and ON clause are changed.

Sorry didn’t see their was something new in the query. I tried but it doesn’t really improve the execution time neither. I updated the explain plan inside my previous reply.

The Plan looks optimal. Due to 4 level JOIN, GROUP, ORDER might taking time.

You can enable profiling and check where it taking time. https://dzone.com/articles/query-performance-monitoring-made-easy-with-couchb

I’m on a community plan so I can’t enable query profiling. But I’m thinking that maybe I’m using the wrong tool to achieve my goal? I mean, I’m used to play with N1QL Query but I didn’t think of creating a VIEW, could it make any difference in a performance point of view? I saw that VIEWS are good for computing aggregated data.

The result of this query is made for analysis purposes, maybe setting the ELS connector is a better way to get the output faster?