Simple count(*) query takes 2+ seconds

Hello,

90% of docs have ‘formname’ out of 46,027,358 documents.

about 3.5 million docs have ‘formname=creative’

Thanks,
Moon

Hi @moon0326,

Even though there are significant new optimizations in 4.5, counting 90% of the documents is not one of them. We have that in our backlog.

If you frequently need to count 90% of your documents, I would suggest using two separate buckets. Counting 100% of the documents is instantaneous, whereas counting 90% is currently very expensive.

Thank you for the answer.

I thought IndexScacnCount is there to avoid counting them.

I think there must be a clear statement about what ‘count’ can do in n1ql. Pagination is such a must feature for almost all application and I thought n1ql can replace a view.

I’ve been struggling with ‘count’ since version 4.0 and we still have the exact same problem with 4.5.

It is my mistake that I did not benchmark (I thought it was a basic feature just like any other query language), but I had to spend at least a week to change all my n1ql code to views back then when I was on 4.0.

Now, even with 4.5…it looks like I don’t have a choice.

This simply means that I can’t never get rid of my views and I can’t utilize flexibility of N1ql for my applications.

Can you tell us why you need to count 90% of the documents? It might help us to think of suggestions…

For this particular query, our use case really simple.

We have product documents in our bucket. We simply want to check # of products. That’s really it. However, that’s just one use case.

We can do counting with views, but we are trying to avoid views if n1ql can do the same thing for us.

We also have another service that is using couchbase. This particular service is designed to be a CMS with custom custom types. As it is a CMS, pagination is a must. I think we are going to be OKAY with count() query up to a point, but I can’t assume it. We currently use views with the service. I was wanting to replace views to n1ql as that simplifies our process.

I have the same issue, using Couchabse Server 4.6DP when running query:

CREATE INDEX book_type_id ON mybucket (type,id)
SELECT count(*) FROM mybucket where type='book';

Execution takes: 24.11s Why it is so slow? Am I doing something wrong?
The bucket contains 4M documents. Just for comparison, PostgreSQL handles this query in <2 sec
Here is an EXPLAIN.

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexCountScan",
          "covers": [
            "cover ((`mybucket`.`type`))",
            "cover ((`mybucket`.`id`))",
            "cover ((meta(`mybucket`).`id`))"
          ],
          "index": "book_type_id",
          "index_id": "15c1a4d18b8ef855",
          "keyspace": "mybucket",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "successor(\"book\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"book\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "IndexCountProject",
          "result_terms": [
            {
              "expr": "count(*)"
            }
          ]
        }
      ]
    },
    "text": "select count(*) from mybucket where `type`='book';"
  }
]

Can you try the following index.

CREATE INDEX idx_books ON mybucket( id ) WHERE type = 'book';

EXPLAIN SELECT COUNT(*)
FROM mybucket USE INDEX ( idx_books )
WHERE type = 'book' AND id IS NOT NULL;

It is the same, The first run ~50s, the second ~25s.

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexCountScan",
          "covers": [
            "cover ((`mybucket`.`id`))",
            "cover ((meta(`mybucket`).`id`))"
          ],
          "index": "idx_books",
          "index_id": "ccc47a7a61786be8",
          "keyspace": "mybucket",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "Inclusion": 0,
                "Low": [
                  "null"
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "IndexCountProject",
          "result_terms": [
            {
              "expr": "count(*)"
            }
          ]
        }
      ]
    },
    "text": "SELECT COUNT(*)\nFROM mybucket USE INDEX ( idx_books )\nWHERE type = 'book' AND id IS NOT NULL;"
  }
]

Yes, @siri and the indexing team are working on this. Will be much faster in 5.0.

For now, try the following query:

SELECT COUNT(*) FROM mybucket;

You will see that the result is instant. So if you put books in a separate bucket, you can use this count.

We are currently testing 5.0.0 DEV (AprilDB) and we are also (still) struggeling with COUNT(*) queries. We need it for pagination.

@geraldss What has be done for Count(*) queries in 5.0? Can you point to some JIRAs?

From this thread I see that the EXPLAIN should output something like IndexCountScan but I don’t see this. I just see normal IndexScan.
Does that IndexCountScan need to be enabled at first somehow?

Could you please post the query, index and EXPLAIN. cc @keshav_m

For IndexCountScan, there needs to be an index covering the query and the query should only have COUNT(*) in its projection.

There is no separate enablement for this feature.
see: https://dzone.com/articles/a-deep-dive-into-couchbase-n1ql-query-optimization

Please try with the 5.0 Beta. https://www.couchbase.com/downloads

We’ll post the jira for improvements soon.