Low performance with N1QL queries

Hi all,

We ran some tests comparing the performance of Couchbase and MongoDB.
The results were:

  • Test 1: Getting documents by their ID. In this case Couchbase performed around 21K operations per second, and MongoDB performed around 9K operations per second.
  • Test 2: Searching documents by one of their fields, with an index on that field. In this case Couchbase 4.5.1 Enterprise Edition performed around 2K op/sec, Couchbase 4.1.0 Community Edition performed around 0.5K op/sec, and MongoDB 3.2.10 Community Server performed around 9K op/sec.
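For anyone reproducing numbers like these, one common way to get an op/sec figure is to run the operation from a fixed number of client threads for a fixed time and divide the completed count by the elapsed time. This is a minimal sketch, not necessarily the harness used in these tests; `ThroughputHarness`, `measureOpsPerSecond` and the dummy operation are hypothetical names, and the thread count and duration are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputHarness {

    /**
     * Runs {@code operation} from {@code threads} worker threads for roughly
     * {@code durationMillis} and returns the measured operations per second.
     * (Hypothetical helper, not part of any Couchbase or MongoDB API.)
     */
    public static double measureOpsPerSecond(Runnable operation, int threads, long durationMillis) {
        AtomicLong completed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Each worker issues operations back to back until the deadline passes.
                while (System.nanoTime() < deadline) {
                    operation.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMillis + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a get-by-ID or N1QL call; replace with the real operation.
        Runnable dummyLookup = () -> Math.sqrt(42.0);
        double opsPerSec = measureOpsPerSecond(dummyLookup, 4, 500);
        System.out.printf("%.0f op/sec%n", opsPerSec);
    }
}
```

Because the number of client threads strongly affects the result, the same thread count and duration should be used for both databases.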

To improve the performance of Couchbase in Test 2, I set adhoc=false (prepared statements) and increased the number of Query (N1QL) endpoints to 100.

You can find more details about the tests here: database-tests

Am I doing anything wrong? Are those numbers normal?

Thanks in advance.

@ernesto.perez can you please post your n1ql query here as well as the index definition you are using? I’m sure @geraldss can help you speed things up!

Oh, one more thing: when you ran the benchmark on 4.5.1, did you use memory-optimized indexes?

Hi @daschl, thanks for answering.

This is the index I’m using:
CREATE INDEX firstname_index ON default(firstname) USING GSI

The query I’m doing, using the Java API is:
final N1qlQuery query = N1qlQuery.parameterized("SELECT firstname FROM default WHERE firstname = $1",
        JsonArray.from("Walter " + id),
        N1qlParams.build().adhoc(false));

Regarding the index setting, I have tested both Standard Global Secondary Indexes and Memory-Optimized Global Secondary Indexes, without noticing a major difference.

The Data RAM Quota is 2048 MB, of which only 72.2 MB is in use.

Hi @ernesto.perez,

Yes, we will help you get the correct configuration and numbers. First, we should focus on 4.5.1 EE, and leave out 4.1 CE for now.

A few points about Couchbase:

  • the client and server should not be on the same hardware / node
  • the data, index, and query services should each be on separate nodes

These configurations match how Couchbase is actually deployed in production, so they make the benchmarks more meaningful. Can you make those changes and rerun the tests?

After those tests, we can test the Couchbase Server in isolation, without Java. You can write a script around the cbq shell or curl to post your queries directly.
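If scripting curl is inconvenient, the same raw REST call can be built from plain Java without the SDK, which still takes the SDK out of the measurement. A minimal sketch, assuming the query service listens on its default port 8093 at `/query/service` and accepts the form fields `statement` and `args` (positional parameters as a JSON array), and assuming the `default` bucket needs no credentials (versions with role-based access would also need a basic-auth header). `N1qlRestRequest`, `formBody` and the host name `query-node-1` are hypothetical. It only builds the request; sending it requires a reachable query node.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class N1qlRestRequest {

    /** Form-encodes the N1QL statement and its positional arguments (a JSON array). */
    public static String formBody(String statement, String argsJsonArray) {
        return "statement=" + URLEncoder.encode(statement, StandardCharsets.UTF_8)
                + "&args=" + URLEncoder.encode(argsJsonArray, StandardCharsets.UTF_8);
    }

    /** Builds, but does not send, a POST to the query service REST endpoint. */
    public static HttpRequest build(String host, String statement, String argsJsonArray) {
        return HttpRequest.newBuilder(URI.create("http://" + host + ":8093/query/service"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(statement, argsJsonArray)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("query-node-1",
                "SELECT firstname FROM default WHERE firstname = $1",
                "[\"Walter 42\"]");
        System.out.println(request.method() + " " + request.uri());
        // To execute it: java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```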

Also adding my colleagues @keshav_m and @vsr1 to this thread.


Dear Gerald,

You wrote “the data, index, and query services should each be on separate nodes”.

We’re running a few production nodes and all of them have those services on the same VMs.
How would you suggest separating those services onto different nodes?

Thank you in advance,
Alex

Hi @prasad, what is the best way to migrate this? I think you can add new VMs, each running a single service, and then rebalance the data nodes.

Note that the VMs should run on dedicated hardware.

Thanks Gerald.

  1. Let’s say we have a cluster of 10 nodes now. How many nodes should run data (X), index (Y) and query (Z) to properly balance the load?

  2. Can we keep using a single Java client, and will it somehow know to connect to the proper node depending on the type of operation we are requesting (CRUD vs. query vs. index creation)?

  3. By dedicated hardware, do you mean as opposed to shared hardware (regular AWS EC2 VMs)?

Alex

Alex,

(1) Without any data to go by, how about 4 data nodes, 3 index nodes, and 3 query nodes?

(2) One java client is fine. @daschl can advise on SDK version.

(3) I think that’s fine. @prasad can advise here.

Gerald

  1. If 10 nodes split into 4-3-3, why bother? Why not simply run the out-of-the-box configuration with all services running on all nodes? :slight_smile:

  2. OK

  3. Do you mean that it’s fine to use regular AWS EC2 (shared hardware) VMs?

Alex

hi @alex1,

  1. If 10 nodes split into 4-3-3, why bother? Why not simply run the out-of-the-box configuration with all services running on all nodes? :slight_smile:

This is all about better scaling, and about allocating the right resources to the right services. Note that (4,3,3) was recommended “without any data to go by”. Once you add more details about your specific setup, the recommendation will change. Please read through and understand Couchbase’s MDS (Multi-Dimensional Scaling) architecture at: http://developer.couchbase.com/documentation/server/4.5/architecture/services-archi-multi-dimensional-scaling.html

For example, here are a few points that highlight the importance and non-trivial nature of this problem:

  • Query is CPU- and memory-intensive, whereas index is also disk-intensive.

  • If your data is sized for, say, 4 nodes, but you want to scale your queries, then you need to plan for more query (N1QL) nodes.

  • If you carefully design your queries to use covering indexes, you can avoid putting unnecessary load on your data nodes. Then you can scale index nodes based on the number of indexes you have (and the number of replica/duplicate indexes, etc.).

  • And don’t ignore the load that an index node (and every index) adds to the whole system (and the data nodes), especially when your documents have high mutation rates.

  • Running all services on all nodes is simple, but it’s not hard to see how it hurts scaling and causes unnecessary resource contention between the services. For example, if you have 4 important indexes, what’s the point of running the index service on all 10 nodes? Note that, typically, one index node can easily serve multiple query (N1QL) nodes.
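The point about scaling query nodes independently of data nodes can be made concrete with simple arithmetic: measure what one query node sustains on your workload, then divide the target rate by it and round up. This is a naive sketch that assumes query throughput scales roughly linearly with query nodes (real clusters may not, since index and data nodes can become the bottleneck); `QueryNodeSizing`, `nodesNeeded` and all numbers are hypothetical, not measurements from this thread.

```java
public class QueryNodeSizing {

    /**
     * Naive estimate: the smallest node count whose combined throughput
     * (nodes * perNodeOpsPerSec) covers the target, plus optional spare nodes.
     * Assumes linear scaling, which should be verified by testing.
     */
    public static int nodesNeeded(double targetOpsPerSec, double perNodeOpsPerSec, int spareNodes) {
        if (perNodeOpsPerSec <= 0) {
            throw new IllegalArgumentException("perNodeOpsPerSec must be positive");
        }
        return (int) Math.ceil(targetOpsPerSec / perNodeOpsPerSec) + spareNodes;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 12K op/sec target, 5K op/sec measured per query node, 1 spare.
        int queryNodes = nodesNeeded(12_000, 5_000, 1);
        System.out.println("query nodes: " + queryNodes); // ceil(2.4) + 1 = 4
    }
}
```

The same calculation can be repeated separately for index nodes (driven by the number of indexes and their replicas) and data nodes (driven by data size and KV load), which is the main argument for not running every service on every node.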

  3. Do you mean that it’s fine to use regular AWS EC2 (shared hardware) VMs?

Following is some good documentation on this topic:

hth,
-Prasad
