Choosing the number and configuration of search nodes

I have the following Couchbase (CB) cluster configuration:

  1. Version - 6.0.2
  2. 5 data, 5 index and 8 query nodes on Azure
  3. Bucket has 130 million documents
  4. Configuration of each node:
    Operating system: Linux (CentOS 7.7.1908)
    Size: Standard E64-32s_v3 (32 vCPUs, 432 GB memory)
    vCPUs: 32
    RAM: 432 GB

Now I want to know how to calculate the number of FTS nodes and their machine configuration, given that out of the 130 million documents I only need to index 60 million, keyed by their document IDs. I also want to store one additional field from each document on the search nodes. The expected throughput is 100 requests per second.

What number and configuration of search nodes would you suggest for the above cluster?
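One rough way to connect the 100 requests/second target to CPU sizing (my own suggestion, not something discussed in this thread) is Little's law: the average number of queries in flight equals the arrival rate multiplied by the mean latency. The latency values below are hypothetical. Each FTS query is scatter-gathered across the index partitions, so treat the result only as a starting point for how much concurrent work the search nodes must absorb.

```python
def in_flight_queries(requests_per_second: float, mean_latency_s: float) -> float:
    """Little's law: average concurrency L = arrival rate (lambda) x mean time in system (W)."""
    if requests_per_second < 0 or mean_latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    return requests_per_second * mean_latency_s


target_rps = 100  # throughput stated in the question
# Hypothetical mean latencies; replace them with values measured during trials.
for latency_ms in (10, 50, 200):
    concurrency = in_flight_queries(target_rps, latency_ms / 1000)
    print(f"{latency_ms:>4} ms mean latency -> ~{concurrency:.0f} queries in flight")
```

At a fixed 100 req/s, any rise in latency raises the number of concurrent queries proportionally, which is why the partition count and core count are worth tuning together.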


Hi @Nitesh_Gupta,

You may start with 2 FTS nodes of the above configuration, using the default 6-partition index with 1 replica enabled for high availability (HA).
After that, depending on your latency requirements, you can adjust the partition count empirically to make better use of the available CPU cores.
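In 6.0.x the partition count is, as far as I know, not set directly. It follows from `planParams.maxPartitionsPerPIndex`, which caps how many of the bucket's 1024 vBuckets each index partition covers. Assuming that behaviour and a default of 171, the sketch below shows where the default of 6 partitions comes from, which value would give a different target count, and how many partition copies (actives plus replicas) end up spread over the search nodes. The function names are illustrative only.

```python
import math

VBUCKETS = 1024  # default number of vBuckets per bucket on Couchbase Server (Linux)


def index_partitions(max_partitions_per_pindex: int, vbuckets: int = VBUCKETS) -> int:
    """Partitions created when vBuckets are grouped at most max_partitions_per_pindex at a time."""
    if max_partitions_per_pindex <= 0:
        raise ValueError("max_partitions_per_pindex must be positive")
    return math.ceil(vbuckets / max_partitions_per_pindex)


def max_partitions_per_pindex_for(target_partitions: int, vbuckets: int = VBUCKETS) -> int:
    """Smallest group size that yields at most target_partitions partitions."""
    if target_partitions <= 0:
        raise ValueError("target_partitions must be positive")
    return math.ceil(vbuckets / target_partitions)


def partition_copies(partitions: int, replicas: int) -> int:
    """Total partition copies (actives plus replicas) distributed across the search nodes."""
    return partitions * (1 + replicas)


print(index_partitions(171))               # 6  (assumed default setting)
print(max_partitions_per_pindex_for(12))   # 86 (value for a 12-partition trial)
print(partition_copies(6, 1))              # 12 (6 partitions with 1 replica)
```

With 2 nodes, 12 copies means roughly 6 per node; whether that is the right balance for 32 vCPUs per node is what the empirical tuning is meant to answer.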

You may also set the FTS memory quota to about 70% of the RAM, i.e. roughly 300 GB (0.7 × 432 GB ≈ 302 GB).
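Memory quotas are entered in MB, so a small helper avoids arithmetic slips when converting a RAM fraction. This is a minimal sketch that treats 1 GB as 1024 MB and rounds down; 70% of 432 GB works out to about 302 GB.

```python
def fts_quota_mb(node_ram_gb: float, fraction: float = 0.70) -> int:
    """FTS memory quota in MB, treating 1 GB as 1024 MB and rounding down."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(node_ram_gb * 1024 * fraction)


quota = fts_quota_mb(432)  # 432 GB per node, from the question
print(quota, "MB =", round(quota / 1024, 1), "GB")
```

The resulting value can be entered in the memory quota settings of the web console or, I believe, passed as the `ftsMemoryQuota` parameter to the cluster's `/pools/default` REST endpoint.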

You should also account for future data growth when provisioning the nodes and partitions.

After these initial trials, and taking future growth into account, if the numbers look good,
you may also explore nodes with a smaller configuration, such as 16 cores and less RAM.

Thanks @sreeks for this starting configuration.

I tried creating an index on 600,000 (6 lakh) of the 130 million documents, using the configuration and requirements mentioned above. Creating the index took almost 1 hour, and after it completed, the doc count showed 130 million.

  1. Why did it take so much time?
  2. Why did the doc count show the complete document count?
  3. Why did CPU utilization reach 98% during index creation/update?

The index was created on one field, with only the "Index" checkbox checked. Later I added 3 more fields to the index, again with only the "Index" checkbox checked. It took 1 hour again. Why is this so time-consuming?

The doc count indicates the number of documents processed from the bucket, not the actual number of documents in the index.
This stat's label has been updated in the latest server release.

  1. Why did it take so much time?
    FTS has to process/parse all 130M documents, which is why it takes this long. Only by inspecting a document's contents can it tell whether that document is of a type the user needs to index.

  2. Why did the doc count show the complete document count?
    It's the count of documents processed so far.

  3. Why did CPU utilization reach 98% during index creation/update?
    FTS has to run text analysis on the documents and index them. There is also background compaction and a lot of other activity going on.

FTS has to parse all 130M documents again whenever you change the index definition or mapping. As mentioned in the other thread, it's a rebuild from scratch.
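The ~1 hour build also gives a usable throughput figure. Because every document in the bucket is streamed and inspected whether or not it matches the mapping, at least the scanning part of the build time tracks bucket size rather than the number of indexed documents. The back-of-the-envelope sketch below assumes linear scaling on the same hardware with a similar document mix; both are assumptions, not measurements, and the doubled bucket size is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def observed_rate(docs_processed: int, elapsed_hours: float) -> float:
    """Documents processed per second during a build."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return docs_processed / (elapsed_hours * SECONDS_PER_HOUR)


def estimated_rebuild_hours(bucket_docs: int, rate_docs_per_s: float) -> float:
    """Rough rebuild time, assuming the whole bucket is streamed at the observed rate."""
    return bucket_docs / rate_docs_per_s / SECONDS_PER_HOUR


rate = observed_rate(130_000_000, 1.0)  # from the thread: ~1 hour for 130M documents
print(f"observed rate: ~{rate:,.0f} docs/s")
# Hypothetical growth: the bucket doubles to 260M documents.
print(f"projected rebuild: ~{estimated_rebuild_hours(260_000_000, rate):.1f} h")
```

This is also a practical input for the future-growth consideration: every mapping change costs a full pass over the bucket, so a larger bucket means proportionally longer rebuilds.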

How many nodes do you have, and what is their hardware configuration? Are they hosting only the FTS service?

Copying from my initial post:

I added 2 search nodes with a memory quota of 70% of RAM, and the index was created with 6 partitions (possibly the default value; I am not sure, because I do not have an option to set it in the index definition) and 1 replica, using the scorch index type.

And yes, these 2 nodes are hosting only the search service.
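Since the 6.0 UI does not seem to expose the partition count, one option is to create the index from a JSON definition, where `planParams` can be set explicitly. The sketch below builds such a definition in Python under several assumptions: the index, bucket, type value and field names are hypothetical placeholders; the ~60M target documents are assumed to carry a `type` field that distinguishes them (type_field mode); and the JSON shape follows what I recall of 6.x FTS index definitions, so compare it with a definition exported from your own cluster before using it. Disabling the default mapping skips documents with no matching type, and the single field is both indexed and stored so its value can be returned with each hit alongside the document ID.

```python
import json

# Hypothetical names: replace with your own index, bucket, type value and field.
INDEX_NAME = "search_idx"
BUCKET = "my_bucket"
DOC_TYPE = "my_type"        # assumed value of the "type" field on the ~60M docs to index
STORED_FIELD = "my_field"   # the single extra field to keep on the search nodes


def build_index_definition(max_partitions_per_pindex: int = 171, replicas: int = 1) -> dict:
    """Assemble an FTS index definition (shape assumed from 6.x exports)."""
    field_mapping = {
        "name": STORED_FIELD,
        "type": "text",
        "index": True,
        "store": True,  # keep the value so it can be returned with each hit
    }
    return {
        "type": "fulltext-index",
        "name": INDEX_NAME,
        "sourceType": "couchbase",
        "sourceName": BUCKET,
        "planParams": {
            "maxPartitionsPerPIndex": max_partitions_per_pindex,  # 171 -> 6 partitions over 1024 vBuckets
            "numReplicas": replicas,
        },
        "params": {
            "doc_config": {"mode": "type_field", "type_field": "type"},
            "mapping": {
                "default_mapping": {"enabled": False},  # skip documents matching no type mapping
                "types": {
                    DOC_TYPE: {
                        "enabled": True,
                        "dynamic": False,  # index only the field listed below
                        "properties": {
                            STORED_FIELD: {"enabled": True, "dynamic": False, "fields": [field_mapping]}
                        },
                    }
                },
            },
            "store": {"indexType": "scorch"},
        },
        "sourceParams": {},
    }


print(json.dumps(build_index_definition(), indent=2))
```

If I recall correctly, the resulting JSON can be sent with an HTTP PUT to the search service's `/api/index/<index name>` endpoint on port 8094. Creating or changing the index this way still means FTS streams all 130M documents, as discussed above.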