N1QL query is slow

Hi, I have a bucket with 2M documents (each under 2KB). When I run a range scan query with N1QL, it takes more than a minute to finish (the query should scan 1.2M documents). This is slower than I expected. A similar query on a MongoDB cluster takes 4s to scan 1M out of 3M documents.

Any tips to make the query faster? I suspect I'm using Couchbase the wrong way :(

More context:

  • My query is "SELECT COUNT(*), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN '2016-11-22T01:00:00' AND '2016-11-23T06:00:00'". Here is one response:
{
  "requestID": "d37772c4-10cf-4095-a156-8c739ac81c2d",
  "clientContextID": "59::60",
  "signature": {
    "$1": "number",
    "$2": "number"
  },
  "results": [
    {
      "$1": 1266665,
      "$2": 0.49988867718860125
    }
  ],
  "status": 0,
  "errors": [],
  "warnings": [],
  "metrics": {
    "elapsedTime": "1m22.4896596s",
    "executionTime": "1m22.48869s",
    "resultCount": 1,
    "resultSize": 76,
    "mutationCount": 0,
    "errorCount": 0,
    "warningCount": 0,
    "sortCount": 0
  }
}
  • Field "timestampUtc" is a datetime, and I have created a GSI index on it.
Definition: CREATE INDEX timestampUtc_index ON test(`timestampUtc`) USING GSI
  • This is the EXPLAIN output for my query:
{
  "requestID": "a9628394-8d00-42b6-9084-248b4ddc1f75",
  "clientContextID": "61::62",
  "signature": "json",
  "results": [
    {
      "plan": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan",
            "index": "timestamputc_index",
            "index_id": "93504482a482e79c",
            "keyspace": "test",
            "namespace": "default",
            "spans": [
              {
                "Range": {
                  "High": [
                    "\"2016-11-23T06:00:00\""
                  ],
                  "Inclusion": 3,
                  "Low": [
                    "\"2016-11-22T01:00:00\""
                  ]
                }
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Fetch",
            "keyspace": "test",
            "namespace": "default"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "((`test`.`timestampUtc`) between \"2016-11-22T01:00:00\" and \"2016-11-23T06:00:00\")"
                },
                {
                  "#operator": "InitialGroup",
                  "aggregates": [
                    "avg((`test`.`edgeLatency`))",
                    "count(*)"
                  ],
                  "group_keys": []
                }
              ]
            }
          },
          {
            "#operator": "IntermediateGroup",
            "aggregates": [
              "avg((`test`.`edgeLatency`))",
              "count(*)"
            ],
            "group_keys": []
          },
          {
            "#operator": "FinalGroup",
            "aggregates": [
              "avg((`test`.`edgeLatency`))",
              "count(*)"
            ],
            "group_keys": []
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "count(*)"
                    },
                    {
                      "expr": "avg((`test`.`edgeLatency`))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      "text": "SELECT COUNT(*), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN '2016-11-22T01:00:00' AND '2016-11-23T06:00:00'"
    }
  ],
  "status": 0,
  "errors": [],
  "warnings": [],
  "metrics": {
    "elapsedTime": "5.9998ms",
    "executionTime": "4.9983ms",
    "resultCount": 1,
    "resultSize": 3861,
    "mutationCount": 0,
    "errorCount": 0,
    "warningCount": 0,
    "sortCount": 0
  }
}
  • I have set max_parallelism=8 in my query (see the sketch after this list for how it is passed). This makes the query faster, but I see no further improvement when max_parallelism is greater than 8.
  • My Couchbase cluster has 12 nodes; each node is a DS13_v2 Azure VM. My bucket is the only bucket on the cluster apart from an empty default bucket, and it has a 60GB RAM quota.
    |Size|CPU cores|Memory: GiB|Local SSD: GiB|Max data disks|Max cached disk throughput: IOPS / MBps (cache size in GiB)|Max uncached disk throughput: IOPS / MBps|Max NICs / Network bandwidth|
    |------|------|------|------|------|------|------|------|
    |Standard_DS13_v2|8|56|112|16|32,000 / 256 (288)|25,600 / 384|8 / high|
  • My cluster runs Couchbase Server 4.0.0-4051 Community Edition (build-4051). I have tried 4.1 CE and 4.5.1 EE on a single machine with the same performance. With memory-optimized indexes (MOI) in 4.5.1 EE the query is faster, but it still takes more than a minute.
  • A covering index doesn't fit my scenario.
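
For reference, this is how the parameter is passed per request through the N1QL REST API (a minimal sketch; localhost:8093 is assumed here to be one of the query nodes, substitute your own host):

curl http://localhost:8093/query/service \
  --data-urlencode "statement=SELECT COUNT(*), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN '2016-11-22T01:00:00' AND '2016-11-23T06:00:00'" \
  --data-urlencode "max_parallelism=8"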

Why?

I think the data fetch takes most of the time; you could speed up the query with a covering index.

Thanks for looking at this topic.

I need to aggregate over many fields in my documents, so it would be strange to put most of the fields into the index.
Actually, I have tried a covering index by modifying my query; it is still slower than MongoDB.

  • SELECT MIN(timestampUtc) takes 13ms
  • SELECT MAX(timestampUtc) takes 45s
  • SELECT COUNT(timestampUtc) takes 1.4s
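
(If I had to guess at the MIN/MAX asymmetry: with an ascending index on timestampUtc, MIN can be answered from the first index entry, while MAX has no such shortcut in 4.x and ends up scanning the whole index.)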

May I ask what performance you see when you run N1QL queries?

Remove your index and create this one instead:

CREATE INDEX timestampUtc_edgeLatency_index ON test(`timestampUtc`,`edgeLatency`) USING GSI

and try

SELECT COUNT(1), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN '2016-11-22T01:00:00' AND '2016-11-23T06:00:00'
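
To check that the new index actually covers the query, run EXPLAIN again; on a server with covering-index support (4.1 or later, if I remember correctly) the IndexScan operator should list the index keys under "covers" and the Fetch step should disappear:

EXPLAIN SELECT COUNT(1), AVG(edgeLatency) FROM test WHERE timestampUtc BETWEEN '2016-11-22T01:00:00' AND '2016-11-23T06:00:00'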

Much appreciated! Besides a covering index, is there any other way to improve the performance?

You should analyze the query log to find which part of the query takes the most time and improve that part: use the latest Couchbase Server, add more query service nodes, add more index service nodes, redesign the document structure, etc.
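
For example, on 4.5+ the system:completed_requests catalog logs every query that ran longer than the logging threshold (1 second by default), so the slowest recent statements can be pulled with N1QL itself (adjust the LIMIT as needed):

SELECT requestTime, elapsedTime, statement
FROM system:completed_requests
ORDER BY requestTime DESC
LIMIT 10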

Have you thought about splitting the index between nodes using whatever field makes sense, say a-k on node A and l-z on node B?
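
A rough sketch of that idea (the index names, split point, and node hostnames here are made up for illustration; the WITH {"nodes": [...]} clause pins an index to a specific index node):

CREATE INDEX ts_idx_low ON test(`timestampUtc`,`edgeLatency`)
  WHERE timestampUtc < '2016-11-15T00:00:00'
  USING GSI WITH {"nodes": ["nodeA.example.com:8091"]};

CREATE INDEX ts_idx_high ON test(`timestampUtc`,`edgeLatency`)
  WHERE timestampUtc >= '2016-11-15T00:00:00'
  USING GSI WITH {"nodes": ["nodeB.example.com:8091"]};

Note that a query's WHERE clause has to fall entirely within one index's predicate for the planner to choose that partial index.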


How can we achieve this? Could you please let me know? I am facing the same issue. Thanks.

What is the issue? Please post your query and index as a separate post.