Filter taking too long

Hello,

N1QL: SELECT * from config where __t=‘irdb-net’;
Index: Create index test_idx on config(__t) where __t=‘irdb-net’;

Q1: As we can see, predicate of N1QL is matching the partial index (WHERE clause of the index). Which means that IndexScan3 must have returned only those keys where __t=‘irdb-net’. After IndexScan3, Fetch takes place. So as a part of fetch, only those docs are fetched from data service where __t=‘irdb-net’. After Fetch we see Filter. Why is it again applying the filter to verify __t=‘irdb-net’? Why is the optimizer doing redundant work of applying the filter when the documents it has fetched from the data service are based on IndexScan3 i.e. __t=‘irdb-net’?

Q2: The N1QL returns 1100 docs. For such a small number of docs, filter is taking seconds. Why is filter taking very long? As we can see in the plan below, kernTime for Filter operator is 38 seconds, which is huge.

PFB the plan:

{
  "#operator": "Sequence",
  "#stats": {
    "#phaseSwitches": 1,
    "execTime": "8.264µs"
  },
  "~children": [
    {
      "#operator": "Authorize",
      "#stats": {
        "#phaseSwitches": 3,
        "execTime": "19.079µs",
        "servTime": "2.349551ms"
      },
      "privileges": {
        "List": [
          {
            "Target": "default:config",
            "Priv": 7
          }
        ]
      },
      "~child": {
        "#operator": "Sequence",
        "#stats": {
          "#phaseSwitches": 1,
          "execTime": "6.041µs"
        },
        "~children": [
          {
            "#operator": "IndexScan3",
            "#stats": {
              "#itemsOut": 1096,
              "#phaseSwitches": 4387,
              "execTime": "3.244368ms",
              "kernTime": "161.670692ms",
              "servTime": "6.62889ms"
            },
            "index": "test_idx",
            "index_id": "723e61ca870376a2",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "config",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"irdb-net\"",
                    "inclusion": 3,
                    "low": "\"irdb-net\""
                  }
                ]
              }
            ],
            "using": "gsi",
            "#time_normal": "00:00.0098",
            "#time_absolute": 0.009873258
          },
          {
            "#operator": "Fetch",
            "#stats": {
              "#itemsIn": 1096,
              "#itemsOut": 1096,
              "#phaseSwitches": 4525,
              "execTime": "7.276627ms",
              "kernTime": "7.034375ms",
              "servTime": "243.837747ms"
            },
            "keyspace": "config",
            "namespace": "default",
            "#time_normal": "00:00.2511",
            "#time_absolute": 0.251114374
          },
          {
            "#operator": "Sequence",
            "#stats": {
              "#phaseSwitches": 1,
              "execTime": "19.39µs"
            },
            "~children": [
              {
                "#operator": "Filter",
                "#stats": {
                  "#itemsIn": 1096,
                  "#itemsOut": 1096,
                  "#phaseSwitches": 4387,
                  "execTime": "18.91395ms",
                  "kernTime": "38.269818759s"
                },
                "condition": "((`config`.`__t`) = \"irdb-net\")",
                "#time_normal": "00:00.0189",
                "#time_absolute": 0.01891395
              },
              {
                "#operator": "InitialProject",
                "#stats": {
                  "#itemsIn": 1096,
                  "#itemsOut": 1096,
                  "#phaseSwitches": 3293,
                  "execTime": "2.451859ms",
                  "kernTime": "10.334159ms"
                },
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ],
                "#time_normal": "00:00.0024",
                "#time_absolute": 0.002451859
              },
              {
                "#operator": "FinalProject",
                "#stats": {
                  "#itemsIn": 1096,
                  "#itemsOut": 1096,
                  "#phaseSwitches": 3289,
                  "execTime": "1.74005ms",
                  "kernTime": "1m27.437288175s"
                },
                "#time_normal": "00:00.0017",
                "#time_absolute": 0.0017400500000000001
              }
            ],
            "#time_normal": "00:00.0000",
            "#time_absolute": 0.00001939
          }
        ],
        "#time_normal": "00:00.0000",
        "#time_absolute": 0.000006041
      },
      "#time_normal": "00:00.0023",
      "#time_absolute": 0.00236863
    },
    {
      "#operator": "Stream",
      "#stats": {
        "#itemsIn": 1096,
        "#itemsOut": 1096,
        "#phaseSwitches": 4387,
        "execTime": "1.093579ms",
        "kernTime": "1m27.455749882s"
      },
      "#time_normal": "00:00.0010",
      "#time_absolute": 0.0010935790000000001
    }
  ],
  "~versions": [
    "2.0.0-N1QL",
    "6.0.4-3082-enterprise"
  ],
  "#time_normal": "00:00.0000",
  "#time_absolute": 0.000008263999999999999
}

Q1: IndexScan is based on range scans (i.e index scan give false positives based on how predicate can be transformed into range). Due to that we always apply Filter again. FYI: If Fetch is done need to apply predicate other is underneath data might have changed (index is eventual consistence).

Q2. Filter execTime is low. kernTime (down stream operators too) high, That means waiting for resources. cc @Marco_Greco

1 Like

Thanks.

From the plan, how do we find which downstream operator is taking long? Also, is there a way to find what resources is it waiting for?

But still, 38000 ms to check value of 1 field for 1100 docs, does it not look huge?

So - we have three types of times

  • execTime, this is time actually spent in the CPU
  • servTime, this is time spent waiting for services (indexer for scans, KV for fetches)
  • kernTime, this is time spent waiting for the kernel to find free cpu time to execute. This can be due to two causes: overloaded cpus, or a bottleneck upstream or downstream.

In the case of Filter, the only concern is kernTime: this means that the filter has processed everything quickly (the execTime is 18.9ms) but has had to wait 38s in between receiving the documents and sending them on.
The projections have waited even longer: 1m27s in kernTime.
What I would say is that either the n1ql node is overloaded, or the stream operator has had to wait on the client to send the data.

Useful links to understand monitoring and profiling:

1 Like

Thanks @Marco_Greco . This is making some sense and I will need to meditate more on it to grasp it completely.

One question: based on your comments, I looked at the plan again but could not understand how does one know that stream is a phase after/before filter. Indentation is not indicative. Is there a way to read the plan or am I missing something obvious?

Regards

The query plan flows in general from top to bottom. You see indentation increases when an operator has one or more child operators, e.g., a Sequence or Parallel operator. You see this typically as either “~Children” or “~Child” in the query plan. Another example of child operator is the build side of a HashJoin operator or the inner side of a NestedLoopJoin operator. The indentation decreases when the child operator(s) is done and comes back to the level of the parent operator.

In the query plan here the Filter operator is before the Stream operator. The Stream operator is the one sending the query result back to the client and is typically the last operator in a query plan.

1 Like

Oh ok! Thanks @bingjie.miao. I was mis-reading a part of marco’s comment.