Performance implications of having many collections/scopes/indexes in Couchbase 7.0 Beta

I would like to ask about the performance implications of having many scopes, collections, and indexes in Couchbase 7.0 Beta.

Our applications lazily initialize their own scope in the target bucket when the Data API is acquired for a specific JVM class and an operation (K/V or query) is called. The Data API also checks for indexes and creates new ones when required. Moreover, the Data API waits for all scopes, collections, and indexes to come online before proceeding with the first operation.
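The lazy, once-per-class initialization described above can be sketched as follows. This is a minimal illustration, not the poster's Data API: `LazyScopeInitializer`, `ensure_ready`, and the `setup` callback are hypothetical names, and `setup` stands in for whatever creates the owner's scope, collections, and indexes and then waits for them to come online.

```python
import threading
from typing import Callable, Dict, Set


class LazyScopeInitializer:
    """Runs one-time setup per owner (e.g. a JVM class name) on first use."""

    def __init__(self, setup: Callable[[str], None]) -> None:
        # Hypothetical callback: creates the owner's scope, collections and
        # indexes, then blocks until all of them report online.
        self._setup = setup
        self._guard = threading.Lock()  # protects the two fields below
        self._owner_locks: Dict[str, threading.Lock] = {}
        self._ready: Set[str] = set()

    def ensure_ready(self, owner: str) -> None:
        with self._guard:
            if owner in self._ready:
                return  # fast path: already initialized
            owner_lock = self._owner_locks.setdefault(owner, threading.Lock())
        # Only one thread per owner runs setup; the others wait here and
        # then find the owner already marked ready.
        with owner_lock:
            with self._guard:
                if owner in self._ready:
                    return
            self._setup(owner)  # slow: DDL plus waiting for online state
            with self._guard:
                self._ready.add(owner)
```

If `setup` raises, the owner is not marked ready, so the next caller retries. Note that setup for different owners still runs in parallel, which matches the situation described later in the thread where many Data API instances initialize at once.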

Due to this design, our tests create many scopes, collections, and indexes. We observed that the tests slow down and start to fail as the operations that create or get scopes, collections, and indexes become slower.

In one of our applications, a single bucket with 100 scopes, 30 collections in each scope, and 60 indexes in each scope (3,000 collections and 6,000 indexes in total) turned out to be so detrimental to performance that creating or getting scopes, collections, and indexes took minutes. When an application initializes, many instances of the Data API acquire collections and indexes for their own use in parallel.

Why does having a lot of indexes and collections (mostly passive ones) turn out to be so bad for performance? Does this degradation appear only in Community Edition, or in Enterprise Edition as well?

@zoltan.zvara,

If you are seeing longer index creation times as the number of indexes increases, it is most likely due to the index planner component (Couchbase Server - Index Planner for Global Secondary Indexes). For faster index creation, you can disable the planner using the setting below:

curl -u : http://<indexer_ip>:9102/settings -X POST -d '{"queryport.client.usePlanner":false}'

Please note that disabling the planner can lead to a sub-optimal distribution of index load across the cluster.
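For applications that prefer to apply this setting programmatically rather than with curl, a minimal Python sketch of the same request using only the standard library is shown below. The endpoint, port 9102, and the `queryport.client.usePlanner` key are taken from the curl command above; the function name `planner_setting_request` is hypothetical, and the host and credentials are placeholders you must supply.

```python
import base64
import json
import urllib.request


def planner_setting_request(indexer_host: str, user: str, password: str,
                            use_planner: bool) -> urllib.request.Request:
    """Build a POST to the indexer settings endpoint used in the curl command."""
    body = json.dumps({"queryport.client.usePlanner": use_planner}).encode("utf-8")
    # HTTP basic auth, equivalent to curl's -u user:password
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"http://{indexer_host}:9102/settings",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


# Usage (actually sends the request; needs a reachable indexer node):
# req = planner_setting_request("<indexer_ip>", "<username>", "<password>", use_planner=False)
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Building the request separately from sending it keeps the payload easy to inspect before it touches a live cluster.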

Thanks,
Varun


Thanks @varun.velamuri. We will not disable the planner; instead, we will allow a longer wait time during adaptive index creation in our applications. However, we frequently run into index creations stuck in the “building 100%” state. This usually happens when an application initializes a bucket with tens of collections and indexes at once. Sometimes two indexes are created with the same name, even though we first attempt to create an index and then check whether it comes online; if it does not, we wait or make another attempt. I suspect the index creation service was not designed for these circumstances, but with the new collection data model, applications will likely start to adaptively initialize the database with hundreds of indexes at a time, and this could become a problem.
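A create-then-poll loop of the kind described can be sketched as below. This is a minimal sketch under stated assumptions, not the poster's code or a Couchbase SDK API: `create` and `state` are hypothetical callables (the latter returning `None` when the index does not exist, or a state string such as `"online"`), and `IndexAlreadyExists` is a hypothetical error type. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class IndexAlreadyExists(Exception):
    """Hypothetical error raised when an index with this name already exists."""


def ensure_index_online(
    name: str,
    create: Callable[[str], None],
    state: Callable[[str], Optional[str]],
    timeout_s: float = 300.0,
    poll_interval_s: float = 1.0,
    max_create_attempts: int = 3,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the index is online, False if the deadline passes."""
    deadline = clock() + timeout_s
    attempts = 0
    while clock() < deadline:
        current = state(name)
        if current == "online":
            return True
        # Re-issue the create only when the index is absent,
        # never while it is still being built.
        if current is None and attempts < max_create_attempts:
            attempts += 1
            try:
                create(name)
            except IndexAlreadyExists:
                pass  # another instance created it concurrently; keep polling
        sleep(poll_interval_s)
    return False
```

Checking the state before each create attempt means a slow build is waited on rather than raced with a second create, and a concurrent "already exists" response is treated as progress rather than failure.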

You may be seeing some indexing inconsistencies due to today's eventually consistent metadata model. We will soon have strongly consistent metadata, which, as you note, is necessitated by having a large number of collections and indexes, and which should address this problem.


@zoltan.zvara,

However, we frequently stumble upon index creations stuck at “building 100%” state.

I see that you have raised a separate post for this issue. I will follow up on it there.

@zoltan.zvara,

Starting with Couchbase Server 7.0, we support multiple applications creating indexes in parallel. We achieve this by scheduling index creation in the background. Two indexes with the same name should not get created.
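To make the described behaviour concrete, here is a toy, single-process model of "accept create requests concurrently, reject duplicate names, build later in the background". It is an illustration only, not Couchbase's implementation; the class name `ToyIndexScheduler` is hypothetical, and keying uniqueness on a `(keyspace, name)` pair is an assumption.

```python
import threading
from collections import deque
from typing import Callable, Deque, Set, Tuple

Key = Tuple[str, str]  # (keyspace, index name), an assumed uniqueness key


class ToyIndexScheduler:
    """Toy model: dedupe create requests by name, build them later."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._accepted: Set[Key] = set()  # every key ever accepted
        self._pending: Deque[Key] = deque()  # accepted but not yet built

    def submit(self, keyspace: str, name: str) -> bool:
        """Return True if this request was accepted, False if it is a duplicate."""
        key = (keyspace, name)
        with self._lock:
            if key in self._accepted:
                return False
            self._accepted.add(key)
            self._pending.append(key)
            return True

    def run_pending(self, build: Callable[[str, str], None]) -> None:
        """Drain the queue; in a real system this would run in a background worker."""
        while True:
            with self._lock:
                if not self._pending:
                    return
                key = self._pending.popleft()
            build(*key)  # slow work happens outside the lock
```

Because the duplicate check and the enqueue happen under one lock, concurrent callers submitting the same name see exactly one acceptance.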

If you are seeing multiple indexes with the same name, it may be due to a bug. Please provide the indexer.log and query.log files from all nodes so that we can analyse the problem further.

There was a known issue around this, which is fixed in https://issues.couchbase.com/browse/MB-38685. Any build numbered 3711 or higher should contain the fix. If you are using a build with a higher build number, please provide the logs.
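The build-number rule above is a simple comparison; a minimal sketch follows. It assumes version strings of the form `7.0.0-3739` (optionally followed by an edition suffix), which is an assumption about the format, and the helper names are hypothetical. Build numbers are only comparable within the same release line, here the 7.0.0 pre-release builds discussed in this thread.

```python
MB_38685_FIXED_IN_BUILD = 3711  # builds numbered 3711 or higher contain the fix


def build_number(version: str) -> int:
    """Extract the build number from e.g. '7.0.0-3739' (assumed format)."""
    parts = version.split("-")
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[1])


def has_mb38685_fix(version: str) -> bool:
    # Only meaningful within the 7.0.0 release line discussed here.
    return build_number(version) >= MB_38685_FIXED_IN_BUILD
```

For example, build 3739 clears the 3711 threshold, so duplicate index names on that build would point to a new problem rather than MB-38685.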

Thanks.

Our build number is 3739.

Is it possible for me to provide the indexer and query logs privately, so that application internals are not exposed publicly and are used solely for Couchbase to investigate the issues? Do these logs contain sensitive data?

One thing I might add is that we also see the indexing service crash during our tests in CI.

We have a scope/collection set up, and the application runs 500+ tests, including transactions, on a two-node Community 7.0.0 Beta cluster with virtually unlimited resources. One of the tests checks whether the application can initialize itself: it creates a new scope and attempts to set up everything (80 collections and 200 indexes) while the remaining tests run in parallel. CPU and memory are still available on the Couchbase nodes, but sometimes the indexer service crashes.

What logs and data would you need to investigate?

Hi @zoltan.zvara,

Regarding log collection, please check the log redaction feature of Couchbase Server. Using it, you will be able to redact some of the sensitive information.

Also, if you don’t want to post a public link to the logs, you can send me the link in a personal message here on the forum.

Hope this helps.