Simple count(*) query takes 2+ seconds

geraldss · April 24, 2016, 3:49am

Ok, getting closer. N1QL does not use a schema, and does not use a cost-based optimizer (data statistics). In short, your data set is skewed in the worst possible way for N1QL.

You can try the following index:

CREATE INDEX idx_form ON default(form) WHERE form = “test”;

You want your EXPLAIN to show IndexCountScan instead of IndexScan.

moon0326 · April 25, 2016, 8:30pm

Hello @geraldss

I deleted the index I created before and added a new index with “where”.

I still get “IndexScan” in the explain output.

vsr1 · April 25, 2016, 8:59pm

IndexCountScan is not available in 4.50 DP. Need to wait for 4.50 Beta.

moon0326 · April 25, 2016, 9:11pm

Thank you @vsr1

If that is the case, what is improved in 4.5DP ? I’m a little confused now.

I was told that 4.5DP has improved count query performance.

geraldss · April 25, 2016, 9:57pm

Hi @moon0326,

My mistake. IndexCountScan is in Beta, not DP.

For DP, please do the following:

(1) Insert 100 or 1000 document with form = test2

(2) Change your query to search for test2, using the same index

This is more representative. What you are doing currently is equivalent to a full scan, but without schema or statistics.

Beta scheduled for next week.

Thanks,
Gerald

moon0326 · April 25, 2016, 10:13pm

Thank you for the clarification.

We’ve decided to go with views until 4.5 stable gets released.

dmitryb · May 20, 2016, 2:44pm

I’ll continue this thread as I’m also experiencing slow performance issue with count(*) query on Couchbase 4.5 Beta.

My use case is very typical and extremely simple. I store user session documents and each document has user attribute which contains user identifier as a string.
Document example:

{
  ...
  "user": "123456789",
  ...
}

I created the following index:

CREATE INDEX `user-idx` ON `default`(`user`) USING GSI;

Now I want to find 10 users with biggest number of open (existing) sessions. The query is simple:

select `user`, count(*) from `default` group by `user` order by count(*) desc limit 10;

That query takes ~8sec on ~50k documents in the database. Extremely long!
I searched the forum and found a suggestion to add where field is not null clause to the query.
And here is the new query:

select `user`, count(*) from `default` where `user` is not null group by `user` order by count(*) desc limit 10;

This query takes ~2.5 sec, yes 3 times faster but still very slow comparing to RDBMS.
Here is the EXPLAIN output:

cbq> explain select `user`, count(*) from `default` where `user` is not null group by `user` order by count(*) desc limit 10;
{
    "requestID": "ca80774d-cd2c-4cbe-a318-924627b68991",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "Sequence",
                        "~children": [
                            {
                                "#operator": "IndexScan",
                                "covers": [
                                    "cover ((`default`.`user`))",
                                    "cover ((meta(`default`).`id`))"
                                ],
                                "index": "user-idx",
                                "index_id": "f631c4c57f717e25",
                                "keyspace": "default",
                                "namespace": "default",
                                "spans": [
                                    {
                                        "Range": {
                                            "Inclusion": 0,
                                            "Low": [
                                                "null"
                                            ]
                                        }
                                    }
                                ],
                                "using": "gsi"
                            },
                            {
                                "#operator": "Parallel",
                                "~child": {
                                    "#operator": "Sequence",
                                    "~children": [
                                        {
                                            "#operator": "Filter",
                                            "condition": "(cover ((`default`.`user`)) is not null)"
                                        },
                                        {
                                            "#operator": "InitialGroup",
                                            "aggregates": [
                                                "count(*)"
                                            ],
                                            "group_keys": [
                                                "cover ((`default`.`user`))"
                                            ]
                                        }
                                    ]
                                }
                            },
                            {
                                "#operator": "IntermediateGroup",
                                "aggregates": [
                                    "count(*)"
                                ],
                                "group_keys": [
                                    "cover ((`default`.`user`))"
                                ]
                            },
                            {
                                "#operator": "FinalGroup",
                                "aggregates": [
                                    "count(*)"
                                ],
                                "group_keys": [
                                    "cover ((`default`.`user`))"
                                ]
                            },
                            {
                                "#operator": "Parallel",
                                "~child": {
                                    "#operator": "Sequence",
                                    "~children": [
                                        {
                                            "#operator": "InitialProject",
                                            "result_terms": [
                                                {
                                                    "expr": "cover ((`default`.`user`))"
                                                },
                                                {
                                                    "expr": "count(*)"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            }
                        ]
                    },
                    {
                        "#operator": "Order",
                        "limit": "10",
                        "sort_terms": [
                            {
                                "desc": true,
                                "expr": "count(*)"
                            }
                        ]
                    },
                    {
                        "#operator": "Limit",
                        "expr": "10"
                    },
                    {
                        "#operator": "FinalProject"
                    }
                ]
            },
            "text": "select `user`, count(*) from `default` where `user` is not null group by `user` order by count(*) desc limit 10"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "4.195335ms",
        "executionTime": "4.101078ms",
        "resultCount": 1,
        "resultSize": 4955
    }
}

As you can see the query uses IndexScan, not IndexCountScan and I don’t know why.

Unless I’m doing something wrong (what could it be??) I don’t really see that the problem is fixed in 4.5 Beta.

geraldss · May 20, 2016, 3:10pm

Hi @dmitryb,

How long does it take without the ORDER BY?

dmitryb · May 20, 2016, 3:14pm

about 2.0 sec, a bit faster

geraldss · May 20, 2016, 3:23pm

Yes, that is very long. Can you try using curl and setting max-parallelism to twice your number of cores?

Whatever the issue is here, we are looking into it. I’m just curious about your environment.

Thanks.

dmitryb · May 20, 2016, 3:48pm

I’m running on a VM with 8 cores.
Request using REST API:

curl -XGET 'http://127.0.0.1:8093/query/service?statement=select%20`user`,%20count(*)%20from%20`default`%20where%20`user`%20is%20not%20null%20group%20by%20`user`%20limit%2010&max_parallelism=16'

takes a bit longer ~2.2 sec

I think the main problem is that instead of IndexCountScan, IndexScan method is used and I don’t know how to force IndexCountScan usage, it doesn’t happen automatically.

geraldss · May 20, 2016, 3:54pm

IndexCountScan cannot be used with GROUP BY. The counting must be done inside the query engine rather than the indexer. This slow performance is surprising, because you are using a covering index. We will look into it.

egrep · May 22, 2016, 9:21pm

@geraldss, @vsr1,

IndexCountScan is not available in 4.50 DP. Need to wait for 4.50 Beta.

Is there any chance of backporting “IndexCountScan” to 4.1.X branch ?

geraldss · May 23, 2016, 4:41am

Hi @egrep,

Unfortunately, this is not likely. There are many interdependent optimizations in 4.5.

amar1991 · August 9, 2016, 8:46am

Hi @geraldss,

I’m using couchbase server 4.5 EE. My count query is taking more than 3 sec which is not expected.

Q -> select count(meta().id) from cms where _type=‘Product’ and _active=true;
and Index is as follow
CREATE INDEX prod_count_idx ON cms((meta().id)) WHERE ((_type = “Product”) and (_active = true)) ;

here is explain

explain select count(meta().id) from cms where _type=‘Product’ and _active=true;
{
“requestID”: “52bd095d-a34d-43b2-b01f-345f2bee2cea”,
“signature”: “json”,
“results”: [
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexCountScan”,
“covers”: [
“cover ((meta(cms).id))”,
“cover ((meta(cms).id))”
],
“index”: “prod_count_idx”,
“index_id”: “43b767402115eb6e”,
“keyspace”: “cms”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“Inclusion”: 0,
“Low”: [
“null”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “IndexCountProject”,
“result_terms”: [
{
“expr”: “count((meta(cms).id))”
}
]
}
]
},
“text”: “select count(meta().id) from cms where _type=‘Product’ and _active=true”
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “55.717264ms”,
“executionTime”: “55.61457ms”,
“resultCount”: 1,
“resultSize”: 1426
}
}

document structure
{
"_active": true,
"_createdAt": 1469801004787,
"_createdBy": “admin”,
"_modifiedAt": 1469801004787,
"_modifiedBy": “admin”,
"_type": “Product”,
“status”: “ONLINE”,
},
“id”: “Product::MA.CH.MA.326662”
}
i have around 1 million such document. more than 2 sec for count query is not expected. Please help.

geraldss · August 9, 2016, 7:30pm

In 4.5, that is the best plan for our counts. Can you make the query more selective?

We will be improving the performance of counts by improving how our indexer maintains counts. But that is not in 4.5, unfortunately.

hanswong · September 8, 2016, 7:35am

We used part of production records to test this. Total records number is 4.8M, indexed 686K.
Definition: CREATE INDEX test_idx ON GuotaiTestTrading(cmd,close_time,rebateStatus,classType) WHERE ((((classType = “Order”) and ((cmd = 0) or (cmd = 1))) and (0 < close_time)) and (rebateStatus = 0))

below is the explain of select.
[
{
“plan”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “DistinctScan”,
“scan”: {
“#operator”: “IndexScan”,
“index”: “test_idx”,
“index_id”: “c57413922161a164”,
“keyspace”: “GuotaiTestTrading”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
“successor(0)”
],
“Inclusion”: 0,
“Low”: [
“0”,
“0”,
“0”,
““Order””
]
}
},
{
“Range”: {
“High”: [
“successor(1)”
],
“Inclusion”: 0,
“Low”: [
“1”,
“0”,
“0”,
““Order””
]
}
}
],
“using”: “gsi”
}
},
{
“#operator”: “Parallel”,
“~child”: {
“#operator”: “Sequence”,
“~children”: [
{
“#operator”: “Fetch”,
“keyspace”: “GuotaiTestTrading”,
“namespace”: “default”
},
{
“#operator”: “Filter”,
“condition”: “(((((GuotaiTestTrading.classType) = “Order”) and ((GuotaiTestTrading.rebateStatus) = 0)) and (((GuotaiTestTrading.cmd) = 0) or ((GuotaiTestTrading.cmd) = 1))) and (0 < (GuotaiTestTrading.close_time)))”
},
{
“#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
}
]
}
}
]
},
{
“#operator”: “Order”,
“limit”: “1”,
“sort_terms”: [
{
“desc”: true,
“expr”: “(GuotaiTestTrading.close_time)”
}
]
},
{
“#operator”: “Limit”,
“expr”: “1”
},
{
“#operator”: “FinalProject”
}
]
},
“text”: “select * from GuotaiTestTrading use index(test_idx USING GSI)\nwhere classType = “Order”\nand rebateStatus = 0\nand (cmd = 0 or cmd =1)\nand close_time >0\norder by close_time desc\nlimit 1”
}
]

isha · September 8, 2016, 4:22pm

Hi @hanswong

This is a different issue from the earlier discussions in this post, since your query doesn’t use a count. Please open a different forum question for this with the issue, version of couchbase. (I assume that you are facing a performance issue ? )

You could also take a look at the following thread to see if you can improve the performance of your query. (if that is the issue you are facing)
https://www.couchbase.com/forums/t/performance-drop-when-system-is-under-load/?source_topic_id=7838

Thanks

moon0326 · September 17, 2016, 6:53am

Hello everyone and @geraldss

We’ve finally upgraded to 4.5.

I have documents with a field ‘form’.

count query still takes over 10+ seconds even with IndexCountScan.

This is my explain.

{
    "requestID": "ab50a005-3178-4d1a-8c24-a362b50e5341",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexCountScan",
                        "covers": [
                            "cover ((`bucket`.`form`))",
                            "cover ((meta(`bucket`).`id`))"
                        ],
                        "index": "idx_form",
                        "index_id": "6eecf52eb064a628",
                        "keyspace": "bucket",
                        "namespace": "default",
                        "spans": [
                            {
                                "Range": {
                                    "High": [
                                        "\"formname\""
                                    ],
                                    "Inclusion": 3,
                                    "Low": [
                                        "\"formname\""
                                    ]
                                }
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "IndexCountProject",
                        "result_terms": [
                            {
                                "expr": "count(1)"
                            }
                        ]
                    }
                ]
            },
            "text": "select count(1) from bucket where form='formname'"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "7.832115ms",
        "executionTime": "7.792515ms",
        "resultCount": 1,
        "resultSize": 1547
    }
}

and this is my query select count(1) from catalog where form='creative';

My hope for under 1 sec is gone. How can I improve this 13+ seconds query to something…faster?

geraldss · September 18, 2016, 7:53pm

Do all your documents have the same value of formname? How many documents total, and how many of those have formname?