Question about N1QL query performance with composite index


#1

Hi there,

I have a campaign document like this:

{
    "id": "campaign_id",
    "type": "SR_CAMPAIGN",
    "program": {
        "id": "program_id",
        "name": "program_name"
    }
    "targetFlight": {
        "flight": {
            "key": "flight_key"
        }
    },
    "status": "CREATED"
}

It has been simplified a lot, I have just kept the relevant properties.

Then I have these indexes (among others):

CREATE INDEX `#type-idx` ON `tp`(`type`)

CREATE INDEX `sr-campaigns-search-idx` ON `la`(`type`,(`program`.`id`),    ((`targetFlight`.`flightEvent`).`key`),`status`) WHERE (`type` = "SR_CAMPAIGN")

This query returns results immediately:

SELECT *
FROM `tp`
WHERE type = 'SR_CAMPAIGN'
AND program.id = 'TEST'
AND targetFlight.flightEvent.`key` = 'LA7763 BRC#20181128T1435~AEP#20181128T1638'
AND status IN ["CREATED","ACTIVE"]

Plan:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan2",
        "index": "sr-campaigns-search-idx",
        "index_id": "477b36efed778a4e",
        "index_projection": {
          "primary_key": true
        },
        "keyspace": "tp",
        "namespace": "default",
        "spans": [
          {
            "exact": true,
            "range": [
              {
                "high": "\"SR_CAMPAIGN\"",
                "inclusion": 3,
                "low": "\"SR_CAMPAIGN\""
              },
              {
                "high": "\"TEST\"",
                "inclusion": 3,
                "low": "\"TEST\""
              },
              {
                "high": "\"LA7763 BRC#20181128T1435~AEP#20181128T1638\"",
                "inclusion": 3,
                "low": "\"LA7763 BRC#20181128T1435~AEP#20181128T1638\""
              },
              {
                "high": "\"ACTIVE\"",
                "inclusion": 3,
                "low": "\"ACTIVE\""
              }
            ]
          },
          {
            "exact": true,
            "range": [
              {
                "high": "\"SR_CAMPAIGN\"",
                "inclusion": 3,
                "low": "\"SR_CAMPAIGN\""
              },
              {
                "high": "\"TEST\"",
                "inclusion": 3,
                "low": "\"TEST\""
              },
              {
                "high": "\"LA7763 BRC#20181128T1435~AEP#20181128T1638\"",
                "inclusion": 3,
                "low": "\"LA7763 BRC#20181128T1435~AEP#20181128T1638\""
              },
              {
                "high": "\"CREATED\"",
                "inclusion": 3,
                "low": "\"CREATED\""
              }
            ]
          }
        ],
        "using": "gsi"
      },
      {
        "#operator": "Fetch",
        "keyspace": "tp",
        "namespace": "default"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "(((((`tp`.`type`) = \"SR_CAMPAIGN\") and (((`tp`.`program`).`id`) = \"TEST\")) and ((((`tp`.`targetFlight`).`flightEvent`).`key`) = \"LA7763 BRC#20181128T1435~AEP#20181128T1638\")) and ((`tp`.`status`) in [\"CREATED\", \"ACTIVE\"]))"
            },
            {
              "#operator": "InitialProject",
              "result_terms": [
                {
                  "expr": "self",
                  "star": true
                }
              ]
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      }
    ]
  },
  "text": "SELECT *\nFROM `tp`\nWHERE type = 'SR_CAMPAIGN'\nAND program.id = 'TEST'\nAND targetFlight.flightEvent.`key` = 'LA7763 BRC#20181128T1435~AEP#20181128T1638'\nAND status IN [\"CREATED\",\"ACTIVE\"]"
}

However if I remove the program.id predicate, the query performs very poorly and takes 30s to return a result:

SELECT *
FROM `tp`
WHERE type = 'SR_CAMPAIGN'
AND targetFlight.flightEvent.`key` = 'LA7763 BRC#20181128T1435~AEP#20181128T1638'
AND status IN ["CREATED","ACTIVE"]

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IntersectScan",
        "scans": [
          {
            "#operator": "IndexScan2",
            "index": "#type-idx",
            "index_id": "5ba28cfa409ee08",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "tp",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"SR_CAMPAIGN\"",
                    "inclusion": 3,
                    "low": "\"SR_CAMPAIGN\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "IndexScan2",
            "index": "sr-campaigns-search-idx",
            "index_id": "477b36efed778a4e",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "tp",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"SR_CAMPAIGN\"",
                    "inclusion": 3,
                    "low": "\"SR_CAMPAIGN\""
                  }
                ]
              }
            ],
            "using": "gsi"
          }
        ]
      },
      {
        "#operator": "Fetch",
        "keyspace": "tp",
        "namespace": "default"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "((((`tp`.`type`) = \"SR_CAMPAIGN\") and ((((`tp`.`targetFlight`).`flightEvent`).`key`) = \"LA7763 BRC#20181128T1435~AEP#20181128T1638\")) and ((`tp`.`status`) in [\"CREATED\", \"ACTIVE\"]))"
            },
            {
              "#operator": "InitialProject",
              "result_terms": [
                {
                  "expr": "self",
                  "star": true
                }
              ]
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      }
    ]
  },
  "text": "SELECT *\nFROM `tp`\nWHERE type = 'SR_CAMPAIGN'\nAND targetFlight.flightEvent.`key` = 'LA7763 BRC#20181128T1435~AEP#20181128T1638'\nAND status IN [\"CREATED\",\"ACTIVE\"]"
}

There are around 500k documents in the bucket.

My use case is to have an index with multiple fields so that I can perform queries using a different combination of criteria, sometimes matching all the index’s fields and sometimes just some.

When the query contains all the index’s fields it performs perfectly, however if I remove just one criterion, query performance drops dramatically. Any advice on how can I improve the performance of the latter case?

Thanks in advance.


#2

Leading continuous index keys only passed to indexer. rest applied post indexscan/fetch.

If program.id predicate is not there and you are not interested on the document that doesn’t have this filed
you can add predicate program.id IS NOT MISSING predicate.

Please checkout https://blog.couchbase.com/n1ql-practical-guide-second-edition/


#3

Many thanks for your reply @vsr1,

Can you please elaborate more on the following statement, or provide a relevant link to docs/forums/blog?

Additionally, do you think there is any way of achieving the equivalent result just with the index definition?

Many thanks.


#4

Example:
CREATE INDEX ix1 ON (c1,c2,c3,c4);
SELECT c1, c4 FROM default WHERE c1 > 10 AND c4 > 50;

If index ix1 is selected the query has predicates on c1, c4. Only c1 is continuous leading predicate so
only c1 will be pushed to indexer and scan all values of c1 > 10 , then c4 >50 applied as post indexscan filter.
Do EXPLAIN and spans section for details.
https://docs.couchbase.com/server/6.0/performance/index-scans.html

Alter native already provided. If not interested on missing values of c2, c3
SELECT c1, c4 FROM default WHERE c1 > 10 AND c4 > 50 AND c2 IS NOT MISSING and c3 IS NOT MISSING;


#5

Many thanks @vsr1.

I think I more or less grasp what you mean; I’d say there is a lack of deeper understanding of how index scans work on my side.

I’ll read this documentation carefully to get a better understanding of it - I don’t like to use a solution just because I have been told so :smiley:

Cheers.


#6

If you interested check MB-30422