Multiple Index JOIN from a single document

n1ql

#1

I have a type of document like this one:

{
    "_id": "abc",
    "patientId": "xyz",
    "functionalUnitId": "101",
    "lastModifier": "user101"
    "type": "EVENT",
    ...
}

patientId, functionalUnitId and lastModifier points to 3 different documents on which I need to retrieve some data. I actually manage to have an INDEX JOIN on one of those (patient) but the 2 others are retrieved using a “basic” JOIN which, I suppose, slow down my query (actually it takes ~15sec on a 130.000 documents bucket).

Here is the INDEX JOIN I already created:

CREATE INDEX idx_patientId_event ON DATA_BUCKET(patientId) WHERE (type = “EVENT”)

And here is my actual query:

SELECT ARRAY_AGG ({'id': META(event).id,
                    'patientId': event.patientId, 
                    'visitId': event.visitId, 
                    'fields': fields, 
                    'patientFirstName': INITCAP(patient.givenName), 
                    'patientFamilyName': UPPER(patient.familyName), 
                    'patientBirthDate': MILLIS_TO_STR(patient.birthDate), 
                    'patientSex': patient.sex,
                    'functionalUnitId': event.functionalUnitId,
                    'functionalUnitName': functionalUnit.name,
                    'completed': event.isComplete,
                    'lastUpdate': event.lastModificationDate,
                    'lastUpdateById': event.lastModifier,
                    'lastUpdateByName': UPPER(lastModifier.familyName) || ' ' || INITCAP(lastModifier.givenName)}) as events, eventType
FROM DATA_BUCKET patient
JOIN DATA_BUCKET event ON KEY event.patientId FOR patient
JOIN METADATA_BUCKET functionalUnit ON KEYS event.functionalUnitId
JOIN USERS_BUCKET lastModifier ON KEYS event.lastModifier
LET fields = ARRAY {'id': field.elementValueTypeId, 'value': field.`value`, 'vdo': field.valueDataOptionId} FOR field IN event.elementValues END,
    eventType = IFMISSINGORNULL(event.eventTypeId, event.configurableMetadataId)
WHERE event.type = 'EVENT'
AND patient.type = 'PATIENT'
AND eventType IS NOT MISSING
AND ARRAY_LENGTH(fields) > 0
AND IFMISSING(event.isVoided, false) = false
AND IFMISSING(patient.isVoided, false) = false
GROUP BY eventType
ORDER BY eventType

As we can see their is one INDEX JOIN and 2 “basic” JOIN. Is their a way to speed up the query by making multiple INDEX JOIN (or anything else that could speed up the query is welcome) ?


#2

You have lot of filters on event. Can you try all lookup joins.

CREATE INDEX ix1 ON DATA_BUCKET(IFMISSINGORNULL(eventTypeId, configurableMetadataId),IFMISSING(isVoided, false))
WHERE type = 'EVENT' AND ARRAY_LENGTH(elementValues) > 0;

   SELECT eventType,
        ARRAY_AGG ({'id': META(event).id,
                    'patientId': event.patientId,
                    'visitId': event.visitId,
                    'fields': ARRAY {'id': field.elementValueTypeId,
                                     'value': field.`value`,
                                     'vdo': field.valueDataOptionId }
                              FOR field IN event.elementValues END,
                    'patientFirstName': INITCAP(patient.givenName),
                    'patientFamilyName': UPPER(patient.familyName),
                    'patientBirthDate': MILLIS_TO_STR(patient.birthDate),
                    'patientSex': patient.sex,
                    'functionalUnitId': event.functionalUnitId,
                    'functionalUnitName': functionalUnit.name,
                    'completed': event.isComplete,
                    'lastUpdate': event.lastModificationDate,
                    'lastUpdateById': event.lastModifier,
                    'lastUpdateByName': UPPER(lastModifier.familyName) || ' ' ||
                                        INITCAP(lastModifier.givenName)}) as events
FROM DATA_BUCKET event
JOIN DATA_BUCKET patient ON KEYS event.patientId
JOIN METADATA_BUCKET functionalUnit ON KEYS event.functionalUnitId
JOIN USERS_BUCKET lastModifier ON KEYS event.lastModifier
LET eventType = IFMISSINGORNULL(event.eventTypeId, event.configurableMetadataId)
WHERE event.type = 'EVENT'
      AND eventType IS NOT MISSING
      AND ARRAY_LENGTH(event.elementValues) > 0
      AND IFMISSING(event.isVoided, false) = false
      AND patient.type = 'PATIENT'
      AND IFMISSING(patient.isVoided, false) = false
GROUP BY eventType
ORDER BY eventType;

#3

The index doesn’t seems to improve the query time. I can share the explain statement:
(updated with the new query)

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "index": "idx_from_cb_forum",
            "index_id": "8a604dbc321a329f",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "DATA_BUCKET",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "inclusion": 1,
                    "low": "null"
                  },
                  {
                    "high": "false",
                    "inclusion": 3,
                    "low": "false"
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Fetch",
            "as": "event",
            "keyspace": "DATA_BUCKET",
            "namespace": "default"
          },
          {
            "#operator": "Join",
            "as": "patient",
            "keyspace": "DATA_BUCKET",
            "namespace": "default",
            "on_keys": "(`event`.`patientId`)"
          },
          {
            "#operator": "Join",
            "as": "functionalUnit",
            "keyspace": "METADATA_BUCKET",
            "namespace": "default",
            "on_keys": "(`event`.`functionalUnitId`)"
          },
          {
            "#operator": "Join",
            "as": "lastModifier",
            "keyspace": "USERS",
            "namespace": "default",
            "on_keys": "(`event`.`lastModifier`)"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Let",
                  "bindings": [
                    {
                      "expr": "ifmissingornull((`event`.`eventTypeId`), (`event`.`configurableMetadataId`))",
                      "var": "eventType"
                    }
                  ]
                },
                {
                  "#operator": "Filter",
                  "condition": "(((((((`event`.`type`) = \"EVENT\") and (`eventType` is not missing)) and (0 < array_length((`event`.`elementValues`)))) and (ifmissing((`event`.`isVoided`), false) = false)) and ((`patient`.`type`) = \"PATIENT\")) and (ifmissing((`patient`.`isVoided`), false) = false))"
                },
                {
                  "#operator": "InitialGroup",
                  "aggregates": [
                    "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
                  ],
                  "group_keys": [
                    "`eventType`"
                  ]
                }
              ]
            }
          },
          {
            "#operator": "IntermediateGroup",
            "aggregates": [
              "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
            ],
            "group_keys": [
              "`eventType`"
            ]
          },
          {
            "#operator": "FinalGroup",
            "aggregates": [
              "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
            ],
            "group_keys": [
              "`eventType`"
            ]
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "`eventType`"
                    },
                    {
                      "as": "events",
                      "expr": "array_agg({\"completed\": (`event`.`isComplete`), \"fields\": array {\"id\": (`field`.`elementValueTypeId`), \"value\": (`field`.`value`), \"vdo\": (`field`.`valueDataOptionId`)} for `field` in (`event`.`elementValues`) end, \"functionalUnitId\": (`event`.`functionalUnitId`), \"functionalUnitName\": (`functionalUnit`.`name`), \"id\": (meta(`event`).`id`), \"lastUpdate\": (`event`.`lastModificationDate`), \"lastUpdateById\": (`event`.`lastModifier`), \"lastUpdateByName\": ((upper((`lastModifier`.`familyName`)) || \" \") || title((`lastModifier`.`givenName`))), \"patientBirthDate\": millis_to_str((`patient`.`birthDate`)), \"patientFamilyName\": upper((`patient`.`familyName`)), \"patientFirstName\": title((`patient`.`givenName`)), \"patientId\": (`event`.`patientId`), \"patientSex\": (`patient`.`sex`), \"visitId\": (`event`.`visitId`)})"
                    }
                  ]
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Order",
        "sort_terms": [
          {
            "expr": "`eventType`"
          }
        ]
      },
      {
        "#operator": "FinalProject"
      }
    ]
  },
  "text": "SELECT eventType,\n        ARRAY_AGG ({'id': META(event).id,\n                    'patientId': event.patientId,\n                    'visitId': event.visitId,\n                    'fields': ARRAY {'id': field.elementValueTypeId,\n                                     'value': field.`value`,\n                                     'vdo': field.valueDataOptionId }\n                              FOR field IN event.elementValues END,\n                    'patientFirstName': INITCAP(patient.givenName),\n                    'patientFamilyName': UPPER(patient.familyName),\n                    'patientBirthDate': MILLIS_TO_STR(patient.birthDate),\n                    'patientSex': patient.sex,\n                    'functionalUnitId': event.functionalUnitId,\n                    'functionalUnitName': functionalUnit.name,\n                    'completed': event.isComplete,\n                    'lastUpdate': event.lastModificationDate,\n                    'lastUpdateById': event.lastModifier,\n                    'lastUpdateByName': UPPER(lastModifier.familyName) || ' ' ||\n                                        INITCAP(lastModifier.givenName)}) as events\nFROM DATA_BUCKET event\nJOIN DATA_BUCKET patient ON KEYS event.patientId\nJOIN METADATA_BUCKET functionalUnit ON KEYS event.functionalUnitId\nJOIN USERS lastModifier ON KEYS event.lastModifier\nLET eventType = IFMISSINGORNULL(event.eventTypeId, event.configurableMetadataId)\nWHERE event.type = 'EVENT'\n      AND eventType IS NOT MISSING\n      AND ARRAY_LENGTH(event.elementValues) > 0\n      AND IFMISSING(event.isVoided, false) = false\n      AND patient.type = 'PATIENT'\n      AND IFMISSING(patient.isVoided, false) = false\nGROUP BY eventType\nORDER BY eventType;"

}


#4

Query did not qualify the index. It looks like you have not modified query as recommended in previous post. Order of FROM clause and ON clause are changed.


#5

Sorry didn’t see their was something new in the query. I tried but it doesn’t really improve the execution time neither. I updated the explain plan inside my previous reply.


#6

The Plan looks optimal. Due to 4 level JOIN, GROUP, ORDER might taking time.

You can enable profiling and check where it taking time. https://dzone.com/articles/query-performance-monitoring-made-easy-with-couchb


#7

I’m on a community plan so I can’t enable query profiling. But I’m thinking that maybe I’m using the wrong tool to achieve my goal? I mean, I’m used to play with N1QL Query but I didn’t think of creating a VIEW, could it make any difference in a performance point of view? I saw that VIEWS are good for computing aggregated data.

The result of this query is made for analysis purposes, maybe setting the ELS connector is a better way to get the output faster?