N1QL Performance - Are 2.5K requests per second at 2 ms each an expected average?

Hi,

I’m currently running some tests on Couchbase, and I wanted to know whether the numbers I’m getting for N1QL requests are what I should expect, or whether I should be getting better ones.

Using an average cluster, I’m getting at most 2,500 N1QL requests per second (around 2 ms response time each) before maxing out the servers’ CPU. Is that normal?

I’m running Couchbase Enterprise Edition, version 4.5. The client code is in C#, using the official Couchbase SDK.

I’m running a test cluster on MS Azure, with 6 nodes:

  • 2 “data” nodes (16 cores / 40 GB)
  • 1 “data + query” node (8 cores / 24 GB)
  • 3 “index + query” nodes (2 × 8 cores / 24 GB, 1 × 16 cores / 40 GB)

All data is stored on SSD drives.

The cluster contains 40 million documents, for a total size of approximately 100 GB.

When testing key/value access, I’m able to get more than 180K reads per second at 30% CPU, with an average response time under 0.2 ms.

I’m using the following index definitions:

CREATE INDEX idxType ON catalog(type)
CREATE INDEX idxEan ON catalog(ean) WHERE (type = "Product")

(“catalog” is the bucket name)

The test query looks like this:

select catalog.* from catalog where type = 'Product' and ean = '{ean}'

where the “ean” field value changes for each request.

I expected a performance hit when going from key/value access to N1QL, but I’m surprised that it’s so huge - in fact, I’m so surprised that I think I did something wrong :slight_smile:

I tried changing the topology of the cluster (adding more query servers, for example), but to no avail - each query server seems to be able to handle around 700-800 N1QL queries per second at best, which seems very low. The bottleneck is very clearly the query nodes’ CPU.

I have tried using EXPLAIN, but it does not tell me much - the idxEan index is used, as expected.
I also tried creating an index containing the ean field and the document ID, and selecting only the document ID, but I did not notice any significant performance improvement.

I’m at a loss about what to try next…

If any additional information is needed to understand the case, please tell me; I will be happy to provide it.

Thanks for your help,
Quentin.

When testing key/value access, you are connecting directly to the KV service and fetching documents.
With N1QL, the query service first has to scan the index to get the document keys, then fetch the documents, apply the predicate, and send the results to the client. There is a lot more processing involved, hence the high CPU usage and lower throughput.
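
In SDK terms, the difference between the two paths looks roughly like this (just a sketch; the document key format below is hypothetical, only the bucket name and the query come from this thread):

    using System.Linq;
    using Couchbase;
    using Couchbase.N1QL;

    var bucket = new Cluster().OpenBucket("catalog");

    // Key/value path: a single round trip straight to the data (KV) service.
    var kvResult = bucket.Get<dynamic>("Product::2111100989237"); // hypothetical key format

    // N1QL path: index scan -> fetch -> filter -> project, spread over the query and index services.
    var request = new QueryRequest()
        .Statement("select catalog.* from catalog where type = 'Product' and ean = $1 limit 1")
        .AddPositionalParameter("2111100989237")
        .ScanConsistency(ScanConsistency.NotBounded);
    var n1qlResult = bucket.Query<dynamic>(request);
    var doc = n1qlResult.Rows.FirstOrDefault();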

Could you please provide the following info:

  1. Are you using MOI or ForestDB indexes?
  2. On average, how many documents does each query produce?
  3. Are the queries using unbounded or request_plus scan consistency?
  4. Do the queries use prepared statements or ad hoc queries?

As I said, I was expecting a performance hit when going from key/value to N1QL, but I’m surprised at how much of a hit it is.
What I’m really asking here is "are those numbers in line with what should be expected, or should they be better?" :slight_smile:

To answer your questions:

1) Are you using MOI or ForestDB indexes?

Memory-Optimized Global Secondary Indexes

2) On average, how many documents does each query produce?

Average document size is 4K, but that’s fairly variable; the biggest are around 150K.

3) Are the queries using unbounded or request_plus scan consistency?

I’ve tried both (setting the QueryRequest.ScanConsistency to ScanConsistency.RequestPlus and ScanConsistency.NotBounded), with roughly the same performance results.

4) Do the queries use prepared statements or ad hoc queries?

Same as 3: I’ve tried both (setting the QueryRequest.AdHoc to true or false), without any noticeable impact on the response time.

I’ve just run a test with unbounded queries and adhoc = false. I’m getting an average response time of 4 ms, 2,700 requests/sec, with all the query nodes between 60 and 80% CPU.

For reference, in case it helps, here’s the EXPLAIN of an example query, run from the web console:

explain select catalog.* from catalog where type = 'Product' and ean = '2111100989237'

{
    "requestID": "a63b31e8-baf6-4e5c-8a4f-8bbfa80b4215",
    "clientContextID": "34dd3cfe-7298-4431-aaa8-737b11fb4a9c",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan",
                        "index": "idxEan",
                        "index_id": "c0ce9ce2005ca79c",
                        "keyspace": "catalog",
                        "namespace": "default",
                        "spans": [
                            {
                                "Range": {
                                    "High": [
                                        "\"2111100989237\""
                                    ],
                                    "Inclusion": 3,
                                    "Low": [
                                        "\"2111100989237\""
                                    ]
                                }
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Fetch",
                        "keyspace": "catalog",
                        "namespace": "default"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "(((`catalog`.`type`) = \"Product\") and ((`catalog`.`ean`) = \"2111100989237\"))"
                                },
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "`catalog`",
                                            "star": true
                                        }
                                    ]
                                },
                                {
                                    "#operator": "FinalProject"
                                }
                            ]
                        }
                    }
                ]
            },
            "text": "select catalog.* from catalog where type = 'Product' and  ean = '2111100989237'"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2.152512ms",
        "executionTime": "2.105212ms",
        "resultCount": 1,
        "resultSize": 2399
    }
}

Are you using prepared queries, with the $ean provided as a parameter? (You can look in system:prepareds to see whether a prepared statement is actually being created.)

Also, do you absolutely need to fetch the complete document? If you just need a few fields, it might be better to create a custom index that includes the result fields.
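
For example, something along these lines (just a sketch; “name” and “price” are placeholders for whatever fields you actually need):

    using Couchbase;
    using Couchbase.N1QL;

    var bucket = new Cluster().OpenBucket("catalog");

    // Covering index: ean plus the projected fields, restricted to Product documents.
    bucket.Query<dynamic>(new QueryRequest().Statement(
        "CREATE INDEX idxEanCover ON catalog(ean, name, price) WHERE (type = 'Product')"));

    // Selecting only indexed fields lets the planner answer the query from the
    // index alone and skip the Fetch operator seen in the EXPLAIN above.
    var covered = new QueryRequest()
        .Statement("SELECT ean, name, price FROM catalog WHERE type = 'Product' AND ean = $1")
        .AddPositionalParameter("2111100989237")
        .AdHoc(false);
    var rows = bucket.Query<dynamic>(covered).Rows;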

Are you using prepared queries, with the $ean provided as a parameter?

I tried a couple of things (using a parameter, setting the variable in the request myself, etc.).
I found that I got the best performance (not by a huge margin) with the following client code:

    var query = new QueryRequest()
        .Statement("select catalog.* from catalog where type = 'Product' and ean = $1 limit 1")
        .AddPositionalParameter(ean)
        .ScanConsistency(ScanConsistency.NotBounded)
        .AdHoc(true);

I got around 3K requests/sec with that client code.

(You can look in system:prepareds to see whether a prepared statement is actually being created.)

I ran SELECT * FROM system:prepareds in the console while a test was in progress, and I got a couple thousand results.

Also, do you absolutely need to fetch the complete document? If you just need a few fields, it might be better to create a custom index that includes the result fields.

Yes, I need to fetch the complete document.
We tried using a composite index (ean, docId) and then getting the doc by its ID. I did not run that test myself, but it was reported to me that we got similar throughput and response times.

Hi @quentin.a,

How many documents does each query return?

“I ran SELECT * FROM system:prepareds in the console while a test was in progress, and I got a couple thousand results.”

I think that’s the problem. Your system should be preparing this statement once, but it is doing it for every request. The SDK folks should be able to help you avoid this problem.

You can delete the entries from system:prepareds by running “DELETE FROM system:prepareds” as Administrator, to clear them out if necessary.

How many documents does each query return?

1 document in nearly every case, and sometimes ( <0.1% of the requests) none.
In that example use case, the ean field is a secondary document ID.

I have the following recommendations:

  1. Move to 4.6.2, 4.6.1, or 4.5.1.
  2. Use pretty=false on the query service.
  3. Create a duplicate index (same index definition) on each indexer node so that load balancing happens (see the sketch below).
  4. When running the performance test, check whether the queries are distributed across all query nodes.
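
To illustrate recommendation 3 (a sketch only; the second index name and the node host name are placeholders): an equivalent index is simply the same definition under a different name, optionally placed on a specific index node with the WITH {"nodes": [...]} clause.

    using Couchbase;
    using Couchbase.N1QL;

    var bucket = new Cluster().OpenBucket("catalog");

    // Same definition as idxEan, different name, pinned to another index node
    // (host name is a placeholder). The planner treats the two as equivalent,
    // and scans are load-balanced between them.
    bucket.Query<dynamic>(new QueryRequest().Statement(
        "CREATE INDEX idxEan2 ON catalog(ean) WHERE (type = 'Product') " +
        "USING GSI WITH {\"nodes\": [\"index-node-2:8091\"]}"));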

cc @keshav_m

I’ve run the DELETE query and restarted my tests with:

            var query = new QueryRequest()
                .Statement("select catalog.* from catalog where type = 'Product' and ean = $1 limit 1")
                .AddPositionalParameter(ean)
                .ScanConsistency(ScanConsistency.NotBounded)
                .AdHoc(false);

I now have much better results: 7,500 requests/sec, 2 ms average, query nodes at 70% CPU.
I noticed that as I ramp up the number of parallel requests, the number of results for SELECT * FROM system:prepareds scales linearly… I’m not sure whether that’s normal or not.

I quickly reached the point where I had 3,500 entries in system:prepareds.

I also noticed that shutting down a test client does not clear the prepared statements on the cluster. Each time I start a client, it adds more entries to system:prepareds. That does not seem to be expected?

  1. Move to 4.6.2, 4.6.1, or 4.5.1.

I was mistaken, and my OP contains an error on that point: all the tests I’ve run were against a Couchbase Enterprise 4.6.1 cluster, not CBE 4.5.

  1. Use pretty=false on the query service.

No significant difference.

  1. Create a duplicate index (same index definition) on each indexer node so that load balancing happens.

I don’t understand what that means. If I try to create an index with an existing name, I get an “index already exists” error. Do you mean “create an index with a different name and the same definition”?
Is that a normal practice? It seems weird.

  1. When running the performance test, check whether the queries are distributed across all query nodes.

The CPU usage is similar across all query nodes, so I suppose the queries are distributed evenly.

Yes, “create an index with a different name and the same definition”. It is normal practice and helps with scaling. At prepare time we use one of the indexes; at execution time we recognize that these are identical indexes, and the indexer client load-balances based on the load on each indexer node.

Ok ! I’ll try that, and come back with the results.

“Each time I start a client, it adds more entries to system:prepareds. That does not seem to be expected?”

That doesn’t seem right to me. You should have a single entry for the query, which then gets used for all requests. Too many queries are being prepared and then not reused. Either the client or the SDK is doing something wrong.

SDK folks, anything to add?

@johan_larson, if each client uses a different prepared statement name, those are treated as different statements.

@quentin.a,

How many clients are you using? How many threads is each client using?
Does each thread prepare once and then execute many times with different values?

@johan_larson, if each client uses a different prepared statement name, those are treated as different statements.

When I run SELECT * FROM system:prepareds, every result has a different “technical” name, like “4c593a3e-a1ea-4162-aa12-9c51f31ad322”.
I don’t see a method in the SDK to set a fixed name for each request…?

I’ve also tried to reuse the QueryRequest object, but it does not work.

How many clients are you using? How many threads is each client using?

A dozen clients, each starting 1,000 async requests and waiting for the results (and then starting 1,000 new async requests).
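
The load pattern of each client looks roughly like this (a simplified sketch, not the actual test harness; nextEan is a placeholder for the per-request EAN picker):

    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using Couchbase.Core;
    using Couchbase.N1QL;

    static async Task RunLoadAsync(IBucket bucket, Func<string> nextEan)
    {
        while (true)
        {
            // Start 1000 async queries, wait for all of them, then start the next batch.
            var tasks = Enumerable.Range(0, 1000).Select(_ =>
            {
                var request = new QueryRequest()
                    .Statement("select catalog.* from catalog where type = 'Product' and ean = $1 limit 1")
                    .AddPositionalParameter(nextEan())
                    .ScanConsistency(ScanConsistency.NotBounded)
                    .AdHoc(false);
                return bucket.QueryAsync<dynamic>(request);
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }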

Does each thread prepare once and then execute many times with different values?

I suppose not, and I don’t know how to do that with the C# SDK. The requests are prepared and somehow reused (the number of system:prepareds entries grows with the number of clients, but not with the test duration, as if every client created a couple hundred different prepared requests and then reused them).

@ingenthr, could you please help here?

I think this is the perfect example of where profiling would help, although in 4.6 we don’t have as much of it as we are going to have in 5.0.
Is there any way you could try it out with the 5.0 Beta (out today)?
Essentially, you run your load as usual, and then from cbq, run:

\set -profile timings;
select catalog.* from catalog where type = 'Product' and ean = [some value] limit 1;

In the result document there will be an executionTimings section with the full execution tree and timings for each operator.
That’ll give us an idea of what’s waiting on what.
If that doesn’t work, I can guide you through collecting some statistics in 4.6 (although that is not going to be pretty).

HTH,
Marco