Long query exec time despite using index

username98764158 · July 5, 2018, 2:54pm

I created the following index:

CREATE INDEX `idx_post_tags_score_desc` ON `bucket`((-`score`),`creationTime`,
    `upvotes`,(`creator`.`followerCount`),`countComments`,
    (distinct (array `t` for `t` in `tags` end))) 
    WHERE (`_class` = "some.package.Class") WITH { "num_replica":2 }

This is the query I want to execute:

SELECT * FROM `bucket` 
WHERE `_class` = "some.package.Class" 
AND  -(-score) IS NOT NULL 
AND  creationTime >= 1499140972 AND creationTime <= 1530676972 
AND  upvotes >= 0 AND upvotes <= 182021
AND  creator.followerCount >= 1000 AND creator.followerCount <= 8118747 
AND  countComments >= -1 AND countComments <= 10000000 
AND  id NOT IN ["06727cbe-485a-4ef4-b820-b373231ec336"] 
AND  (ANY AND EVERY t IN ["tag1", "tag2"] SATISFIES t IN tags END) 
ORDER BY -score LIMIT 16 OFFSET 0

This is the explain of the query:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "DistinctScan",
            "scan": {
              "#operator": "IndexScan2",
              "index": "idx_post_tags_score_desc",
              "index_id": "3d18f2353d75c2cc",
              "index_projection": {
                "primary_key": true
              },
              "keyspace": "bucket",
              "namespace": "default",
              "spans": [
                {
                  "exact": true,
                  "range": [
                    {
                      "inclusion": 0,
                      "low": "null"
                    },
                    {
                      "high": "1530676972",
                      "inclusion": 3,
                      "low": "1499140972"
                    },
                    {
                      "high": "182021",
                      "inclusion": 3,
                      "low": "0"
                    },
                    {
                      "high": "8118747",
                      "inclusion": 3,
                      "low": "1000"
                    },
                    {
                      "high": "10000000",
                      "inclusion": 3,
                      "low": "(-1)"
                    }
                  ]
                }
              ],
              "using": "gsi"
            }
          },
          {
            "#operator": "Fetch",
            "keyspace": "bucket",
            "namespace": "default"
          },
          {
            "#operator": "Parallel",
            "maxParallelism": 1,
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(((((((((((((`bucket`.`_class`) = \"some.package.Class\") and ((-(`bucket`.`score`)) is not null)) and (1499140972 <= (`bucket`.`creationTime`))) and ((`bucket`.`creationTime`) <= 1530676972)) and (0 <= (`bucket`.`upvotes`))) and ((`bucket`.`upvotes`) <= 182021)) and (1000 <= ((`bucket`.`creator`).`followerCount`))) and (((`bucket`.`creator`).`followerCount`) <= 8118747)) and ((-1) <= (`bucket`.`countComments`))) and ((`bucket`.`countComments`) <= 10000000)) and (not ((`bucket`.`id`) in [\"06727cbe-485a-4ef4-b820-b373231ec336\"]))) and any and every `t` in [\"tag1\", \"tag2\"] satisfies (`t` in (`bucket`.`tags`)) end)"
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "as": "_ID",
                      "expr": "(meta(`bucket`).`id`)"
                    },
                    {
                      "as": "_CAS",
                      "expr": "(meta(`bucket`).`cas`)"
                    },
                    {
                      "expr": "`bucket`",
                      "star": true
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "16"
      }
    ]
  },
  "text": "SELECT META(`bucket`).id AS _ID, META(`bucket`).cas AS _CAS, `bucket`.* FROM `bucket` \nWHERE `_class` = \"some.package.Class\" \nAND  -score IS NOT NULL \nAND  creationTime >= 1499140972 AND creationTime <= 1530676972 \nAND  upvotes >= 0 AND upvotes <= 182021\nAND  creator.followerCount >= 1000 AND creator.followerCount <= 8118747 \nAND  countComments >= -1 AND countComments <= 10000000 \nAND  id NOT IN [\"06727cbe-485a-4ef4-b820-b373231ec336\"] \nAND  (ANY AND EVERY t IN [\"tag1\", \"tag2\"] SATISFIES t IN tags END) ORDER BY -score LIMIT 16 OFFSET 0"
}

In the query web console the execution takes 1.00m. The same query in my Spring backend seems to execute faster (about 2000 ms) (don’t know why?). But even 2s is quite slow. Is the index or query correctly implemented? Is there any way to improve execution time? There are 264,385 documents of type Class in the database, 3,783,270 documents total. The couchbase cluster contains 3 nodes (4 cores, 32gb ram each).

vsr1 · July 5, 2018, 3:56pm

Plan looks right.

Check your ANY clause according to How to correctly create array index. Check you need array indexing can this apply post index scan.
This is minor instead of negative score, Use score DESC in the index and query
use profiling to find out where it taking time https://blog.couchbase.com/optimize-n1ql-performance-using-request-profiling/

username98764158 · July 5, 2018, 5:10pm

Thanks for your reply.

to 1. : I changed the query according to your advice on how to correctly create array index
to 2. : I created the index with DESC and used it in the query
to 3. : The servTime and kernTime is round about 4:30m

I discovered that the execution only is that long when there are tags used in the query which do not occur in any documents tags. If i use tags which are in some documents, the execution time is about 300ms. So i guess everything works fine and as expected.

If a user requests some tags which are not in any document, the query just times out after 5s and the application knows there are no results. Correct me, if my thinking is wrong here.

Anyway, thanks again for your help, vsr1!

vsr1 · July 5, 2018, 5:21pm

As you have LIMIT query will process until limit is reached or end of the results or query time out (SDKs sets default to 75sec, this can be changed).

username98764158 · July 5, 2018, 5:31pm

I have to correct myself. The query execution time is still high (1:25m) when looking for 1 known tag. Execution time ranges from 300ms to some seconds to over a minute, depending on the tag I query for.

I use the following query:

SELECT * FROM `bucket` 
WHERE `_class` = "some.package.Class" 
AND  score IS NOT NULL
AND  creationTime >= 1499140972 AND creationTime <= 1530676972 
AND  upvotes >= 0 AND upvotes <= 182021
AND  creator.followerCount >= 1000 AND creator.followerCount <= 8118747 
AND  countComments >= -1 AND countComments <= 10000000 
AND ANY t IN  tags SATISFIES t  = "tag1" END
ORDER BY score DESC LIMIT 16 OFFSET 0

And this is the PLAN text as suggested:

{
      "#operator": "Sequence",
      "#stats": {
        "#phaseSwitches": 2,
        "execTime": "1.031µs",
        "kernTime": "1m25.664019573s"
      },
      "~children": [
        {
          "#operator": "Authorize",
          "#stats": {
            "#phaseSwitches": 4,
            "execTime": "1.851µs",
            "kernTime": "1m25.662888765s",
            "servTime": "1.110527ms"
          },
          "privileges": {
            "List": [
              {
                "Target": "default:bucket",
                "Priv": 7
              }
            ]
          },
          "~child": {
            "#operator": "Sequence",
            "#stats": {
              "#phaseSwitches": 3,
              "execTime": "1.44µs",
              "kernTime": "1m25.662883675s"
            },
            "~children": [
              {
                "#operator": "Sequence",
                "#stats": {
                  "#phaseSwitches": 2,
                  "execTime": "1.307µs",
                  "kernTime": "1m25.662856307s"
                },
                "~children": [
                  {
                    "#operator": "DistinctScan",
                    "#stats": {
                      "#itemsIn": 17,
                      "#itemsOut": 16,
                      "#phaseSwitches": 68,
                      "execTime": "39.885µs",
                      "kernTime": "44.633µs",
                      "servTime": "1m25.65005865s"
                    },
                    "limit": "16",
                    "scan": {
                      "#operator": "IndexScan2",
                      "#stats": {
                        "#itemsOut": 38,
                        "#phaseSwitches": 159,
                        "execTime": "63.625µs",
                        "kernTime": "21.037µs",
                        "servTime": "1m25.649995353s"
                      },
                      "index": "idx_post_tags_score_desc",
                      "index_id": "e4910e7bd7fb4cb",
                      "index_projection": {
                        "primary_key": true
                      },
                      "keyspace": "bucket",
                      "namespace": "default",
                      "spans": [
                        {
                          "exact": true,
                          "range": [
                            {
                              "inclusion": 0,
                              "low": "null"
                            },
                            {
                              "high": "1530676972",
                              "inclusion": 3,
                              "low": "1499140972"
                            },
                            {
                              "high": "182021",
                              "inclusion": 3,
                              "low": "0"
                            },
                            {
                              "high": "8118747",
                              "inclusion": 3,
                              "low": "1000"
                            },
                            {
                              "high": "10000000",
                              "inclusion": 3,
                              "low": "(-1)"
                            },
                            {
                              "high": "\"tag1\"",
                              "inclusion": 3,
                              "low": "\"tag1\""
                            }
                          ]
                        }
                      ],
                      "using": "gsi",
                      "#time_normal": "01:25.6500",
                      "#time_absolute": 85.650058978
                    },
                    "#time_normal": "01:25.6500",
                    "#time_absolute": 85.65009853500001
                  },
                  {
                    "#operator": "Fetch",
                    "#stats": {
                      "#itemsIn": 16,
                      "#itemsOut": 16,
                      "#phaseSwitches": 71,
                      "execTime": "42.996µs",
                      "kernTime": "1m25.650165484s",
                      "servTime": "4.799374ms"
                    },
                    "keyspace": "bucket",
                    "namespace": "default",
                    "#time_normal": "00:00.0048",
                    "#time_absolute": 0.00484237
                  },
                  {
                    "#operator": "Sequence",
                    "#stats": {
                      "#phaseSwitches": 5,
                      "execTime": "3.077µs",
                      "kernTime": "1m25.662832141s"
                    },
                    "~children": [
                      {
                        "#operator": "Filter",
                        "#stats": {
                          "#itemsIn": 16,
                          "#itemsOut": 16,
                          "#phaseSwitches": 69,
                          "execTime": "7.643549ms",
                          "kernTime": "1m25.65515201s"
                        },
                        "condition": "((((((((((((`bucket`.`_class`) = \"some.package.Class\") and ((`bucket`.`score`) is not null)) and (1499140972 <= (`bucket`.`creationTime`))) and ((`bucket`.`creationTime`) <= 1530676972)) and (0 <= (`bucket`.`upvotes`))) and ((`bucket`.`upvotes`) <= 182021)) and (1000 <= ((`bucket`.`creator`).`followerCount`))) and (((`bucket`.`creator`).`followerCount`) <= 8118747)) and ((-1) <= (`bucket`.`countComments`))) and ((`bucket`.`countComments`) <= 10000000)) and any `t` in (`bucket`.`tags`) satisfies (`t` = \"tag1\") end)",
                        "#time_normal": "00:00.0076",
                        "#time_absolute": 0.007643549
                      },
                      {
                        "#operator": "InitialProject",
                        "#stats": {
                          "#itemsIn": 16,
                          "#itemsOut": 16,
                          "#phaseSwitches": 69,
                          "execTime": "14.153µs",
                          "kernTime": "1m25.662795958s"
                        },
                        "result_terms": [
                          {
                            "expr": "self",
                            "star": true
                          }
                        ],
                        "#time_normal": "00:00.0000",
                        "#time_absolute": 0.000014153
                      },
                      {
                        "#operator": "FinalProject",
                        "#stats": {
                          "#itemsIn": 16,
                          "#itemsOut": 16,
                          "#phaseSwitches": 71,
                          "execTime": "10.141µs",
                          "kernTime": "1m25.662813296s"
                        },
                        "#time_normal": "00:00.0000",
                        "#time_absolute": 0.000010141
                      }
                    ],
                    "#time_normal": "00:00.0000",
                    "#time_absolute": 0.000003077
                  }
                ],
                "#time_normal": "00:00.0000",
                "#time_absolute": 0.000001307
              },
              {
                "#operator": "Limit",
                "#stats": {
                  "#itemsIn": 16,
                  "#itemsOut": 16,
                  "#phaseSwitches": 71,
                  "execTime": "5.936µs",
                  "kernTime": "1m25.66286469s"
                },
                "expr": "16",
                "#time_normal": "00:00.0000",
                "#time_absolute": 0.000005936
              }
            ],
            "#time_normal": "00:00.0000",
            "#time_absolute": 0.00000144
          },
          "#time_normal": "00:00.0011",
          "#time_absolute": 0.001112378
        },
        {
          "#operator": "Stream",
          "#stats": {
            "#itemsIn": 16,
            "#itemsOut": 16,
            "#phaseSwitches": 39,
            "execTime": "3.492µs",
            "kernTime": "1m25.66401233s"
          },
          "#time_normal": "00:00.0000",
          "#time_absolute": 0.000003492
        }
      ],
      "~versions": [
        "2.0.0-N1QL",
        "5.1.1-5723-enterprise"
      ],
      "#time_normal": "00:00.0000",
      "#time_absolute": 0.000001031
    }

vsr1 · July 5, 2018, 5:38pm

IndexScan taking time. You can increase indexer memory and also you may want to try bigger machine (4 core might be too small with all services)
Also if query has always tags move that to 2nd index key due to quality predicate.

username98764158 · July 5, 2018, 5:51pm

I’ll try increasing the indexer memory.

Are there alternatives to increasing the size of the machine? Only running index service on one machine instead of all service ist probably one possibility?

My concern now is that if using a bigger machine to execute these queries in an acceptable time is the only solution, than the Couchbase argument of scaling horizontally with a lot of small machines would be invalid, right? Can’t the index be distributed over the nodes and therefore the indexscan executed on these nodes in parallel?

vsr1 · July 5, 2018, 5:55pm

If you are using EE version you can separate services and use MOI or plasma. cc @deepkaran.salooja
Also depends on the range predicates causing to scan whole index.