Creating the GSI faster

@eldorado, for Standard GSI, the recommended memory resident percentage is 20%. Based on the size of the indexed field (and indexing overheads), that should give you an idea of how much memory would be required.
Maybe you can first index 10M items, check how many resources that needs, and extrapolate the numbers from there.
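As a back-of-the-envelope sketch in Python (all numbers here are hypothetical placeholders; substitute whatever you measure from the 10M-item sample):

# Hypothetical numbers; replace with measurements from the 10M-item sample index.
sample_items   = 10_000_000
sample_mem_gib = 3.0              # memory used by the sample index (measured)
target_items   = 180_000_000      # full data set to be indexed
resident_pct   = 0.20             # recommended minimum resident percentage

full_index_gib = sample_mem_gib * target_items / sample_items
quota_gib      = full_index_gib * resident_pct
print(f"estimated full index size: {full_index_gib:.0f} GiB; "
      f"quota for 20% resident: {quota_gib:.0f} GiB")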

Regarding CPU, you can start with something like 20 cores, and if the indexer is saturating that, allocate more. Once the workload becomes disk-bound, SSD bandwidth will determine the indexing throughput.

If resources on a single box are limited, the index can be partitioned. This can improve indexing throughput significantly by utilizing resources (memory/CPU/disk I/O) across nodes.

https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/index-partitioning.html
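For illustration, partitioning is declared with a PARTITION BY HASH clause at index creation time (the bucket and field names below are hypothetical):

CREATE INDEX idx_orders_by_date ON orders(order_date, customer_id)
PARTITION BY HASH(META().id);

The hash distributes index entries across the available index nodes, so both the build and scans can draw on all of their resources.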

Thanks, but even a simple covered or partitioned index creation takes 4-6 hrs to complete on this data volume. We have TBs of data, and I am simply asking for any method or guidelines to increase index threads during creation to make it faster. It is hours and hours of waiting just to see which indexing approach is right, and I am trying to avoid that. Our index is standard GSI, not memory optimized. We are not concerned about h/w resources right now; we need to know a FASTER approach to creating the index.
thanks

@eldorado, what is the node spec in terms of CPU, memory, and disk bandwidth? How much memory is allocated to the index service? What CPU/disk I/O usage do you observe while the index is being built? What is the memory resident percent of the index after it has been built fully?

PS: If you are already an enterprise customer, you can get in touch with the Couchbase Professional Services team to help you with sizing.

We have an on-premise Kubernetes platform with a 4-node Couchbase cluster and 80 GB RAM on each node. The index is standard GSI and NOT memory optimized; 10 CPUs per node and 150 GB storage per node. The index service is not isolated via MDS; all services run on all nodes.

cluster:
  # The amount of memory that should be allocated to the data service
  dataServiceMemoryQuota: 40960
  # The amount of memory that should be allocated to the index service
  indexServiceMemoryQuota: 10240
  # The amount of memory that should be allocated to the search service
  searchServiceMemoryQuota: 2048
  # The amount of memory that should be allocated to the eventing service
  eventingServiceMemoryQuota: 10240
  # The amount of memory that should be allocated to the analytics service
  analyticsServiceMemoryQuota: 10240
  indexStorageSetting: plasma

volumeClaimTemplates:
  - metadata:
      name: couchbase
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: pure-block
      resources:
        requests:
          storage: 150Gi

I will post the rest of the usage numbers once I can capture the statistics.
I am a potential EE customer, but I am currently doing POCs across different document stores like Mongo and CB.
If we do not get good community help to prove things out in the POC, we will need to look at other DB options. Currently, indexing is the big challenge with CB, because we need some practical guidance; creating indexes for hours and hours is not helpful. Is there any modular approach?
Do you have practical guidelines on the best way to leverage all threads to increase index creation performance? Also, is there any performance benefit in version 6.5? I know it's still in beta.
thanks

My best guess, without the statistics, would be that the index service doesn't have enough CPU to go faster. As the data service and index service are colocated, there could be CPU contention when building the index. Since MDS is the recommended setup for production, you could change the POC setup to follow it as well, so that you get more realistic numbers.

You can probably try first allocating one node to only index/query, with a 50 GB memory quota for the indexer. If there is a box with a better CPU, that would help significantly. Just to give you some numbers from our internal benchmarks of Standard GSI index build times:

  1. 500M docs, 50% memory resident - build time = 18 mins (CPU usage = 35 cores)
  2. 500M docs, 20% memory resident - build time = 38 mins (CPU usage = 25 cores)

These benchmarks are run on boxes with SSDs (500MB/sec I/O throughput).

The basic guideline is to have enough memory to keep the index at least 20% memory resident, and to have 25-30 CPU cores with good SSDs.


Hi @deepkaran.salooja - I am certainly excited to try these options, but I am currently stuck on the MDS configuration for my multi-node CB cluster installation on Kubernetes. Somehow the config in couchbase-cluster.yaml is restricting me on MDS. Once I have that, I will definitely try this.
If you don't mind, can you check whether the node allocations and the CPU/memory quotas for the pods are good or not? Here is the reference in another thread.

Also, we have pure-block and pure-file storage for data/index and everything else. Are you saying we need to put the index on SSD-based h/w? Do you think indexing will realistically work well for query performance with 150 billion documents?

Thanks, and I appreciate your help.

@eldorado, cpu: “30”, memory: 50Gi seems like a reasonable starting point for 150M documents. Once the index gets built, you can look at the memory resident percent on the UI graphs (keeping it at 20% memory resident should be good).

pure-block should be fine. What matters is the I/O bandwidth you can get. You can observe it using iostat etc.

It depends on what kind of queries you want to run. If the queries need to scan a subset of the data, then the index service provides working set management by caching the most-used index data in memory. You can also benefit from creating “partial indexes” to meet specific query needs and better manage the TCO.
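For illustration, a partial index only indexes the documents matching its WHERE clause (the names here are hypothetical):

CREATE INDEX idx_active_orders ON orders(order_date, total)
WHERE status = "active";

Only documents with status = "active" land in the index, so its size and memory footprint track the working set rather than the whole bucket.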

If the queries are more “analytical” in nature, i.e. the requirement is to scan all 150B documents before answering a query, you may need to try the “analytics” service if the query/index service cannot meet the SLA.

@deepkaran.salooja - Is there any way to change the memory resident % of the index? And where should I look for it? In the statistics section I only see index stats for the PRIMARY KEY indexes, not for the covered index created for my bucket.
Are you saying I could see that number in the “Max Index RAM Used %” section of the server stats?
Also, just for my own benefit: during index creation, does the advanced index threads configuration make any difference?
thanks

@eldorado, the memory resident percent is available per index in the UI. Look for the “cache resident percent” mini graph. More details here:

If the memory resident percent is too low, you can allocate more quota to the index service under Settings, or partition the index.

Also, just for my own benefit: during index creation, does the advanced index threads configuration make any difference?

By default, the index service sets the index threads equal to the number of CPU cores on the system. Unless you have set it to a lower number, the default (a value of 0) should work fine.
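If you want to verify the current value, here is a minimal sketch, assuming the cluster-wide index settings endpoint GET /settings/indexes on the admin port 8091 (the credentials are placeholders):

import requests

# Assumption: the admin REST API exposes index settings at /settings/indexes;
# an "indexerThreads" value of 0 means the indexer uses all available CPU cores.
resp = requests.get("http://127.0.0.1:8091/settings/indexes",
                    auth=("Administrator", "password"))
resp.raise_for_status()
print(resp.json().get("indexerThreads"))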

@deepkaran.salooja, thanks, I got it. For my 4 index nodes, I see one has 8% cache resident and the rest are all at 0, as the index was created only on that node (I guess), and the index is not a partitioned index either. As you say, we can allocate more quota per node, but that means editing the cluster yaml unless there is a faster way (when I try to do it in Settings, it always reverts to the previous number and doesn't preserve my changes).
Also, each node has a 10 GB index quota allocated in Settings, so technically across 4 nodes the index can use 40 GB in total.
In my case, for the index with 180M indexed items, the one node at 8% cache resident has 7 GB memory used, a 9 GB data size, and 31% fragmentation. Do you see any health issue here?
Note that the target is index availability on 15B records; do you think it is reasonable to make the index partitioned?

It would be a good idea to create the index as a partitioned index. For 15B items, a single node may not have enough resources. Another option you could consider is to create a partial index with a WHERE clause to reduce the number of items per index. But that depends on your usage of the index and whether it makes sense for the query you intend to run.

What you see is expected for a non-partitioned index. It can only reside on a single node.

I think you may need to edit your cluster yaml (I am not very familiar with that area). If the cluster hasn't been allocated the resources, you may not be able to increase the allocation from UI -> Settings.

thanks @deepkaran.salooja
Maybe you can help me understand one more issue, regarding data load.
When loading data through Python scripts from Oracle to CB, we see ops/sec is very small: 1K ops/sec or half that.
I'm guessing this is the speed at which each document is ingested from Oracle to CB. I see ~20-30% CPU usage on all 3 data nodes.
How can we push the ingestion rate to be blazing fast (not sure if the network is the bottleneck here)?

We have sufficient CPU and memory across all pods, so the exact bottleneck is not known. We have Pure Storage, so an I/O bottleneck should not be a problem.
Any pointer will be helpful .

@eldorado,

I am no expert in Oracle-to-Couchbase migration, but for the slow ops/sec, I think we should first identify where the bottleneck is. What is the average size of your docs, and what is the available network bandwidth? We can then roughly estimate whether the network is the bottleneck or not.

Also, I see there are three stages in the pipeline: reading data from Oracle, processing it in the Python script (if any processing is done), and uploading data to Couchbase. Can you let us know the ops/sec at the end of each stage in the pipeline?
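For example, here is a rough sketch of how each stage could be timed separately (the three helper functions are hypothetical placeholders for the corresponding parts of your existing script):

import time

def rate(label, count, start):
    elapsed = time.monotonic() - start
    print(f"{label}: {count} docs in {elapsed:.1f}s = {count / elapsed:.0f} ops/sec")

t0 = time.monotonic()
rows = fetch_rows_from_oracle()                # stage 1: Oracle read (hypothetical)
rate("oracle read", len(rows), t0)

t1 = time.monotonic()
docs = [transform_to_json(r) for r in rows]    # stage 2: Python processing (hypothetical)
rate("transform", len(docs), t1)

t2 = time.monotonic()
bulk_upsert_to_couchbase(docs)                 # stage 3: Couchbase upsert (hypothetical)
rate("cb upsert", len(docs), t2)

Whichever stage shows the lowest rate is where to dig further.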

Thanks,
Varun

thanks @varun.velamuri - Sure, we will get some data for you.
Coming back to the actual context of slow index creation, which I was discussing with @deepkaran.salooja:
On my 180M docs, creating the primary index took 7 mins, and only 1 node was 100% busy as I hadn't partitioned it:
CREATE PRIMARY INDEX TABLE_PRIMARY_INDEX ON TABLE

However, when we do the below, it took 12 mins:
CREATE PRIMARY INDEX TABLE_PRIMARY_INDEX ON TABLE PARTITION BY HASH(META().id);
Looking at “Cache Resident Percent” in the index stats, it was at 95% while index creation was in progress. I have 4 nodes (all services), and all 4 nodes were busy, at 40% Max Index RAM Used.

My idea was to split the load of index creation across the 4 nodes in parallel so that it would be faster, which is not what I am seeing here.
I am not sure why creating the partitioned index is not faster.

@eldorado,

Among the many benefits of partitioned indexes, one is horizontal scaling per index. If a single node does not have enough resources to fit the entire index, a partitioned index can help. There is additional work done to achieve this: e.g., for each document, HASH(META().id) has to be computed and the document has to be routed to the corresponding indexer node for indexing.

In this case, whether the index is in DGM or not will influence the build time of the index. For the use case you are trying, what was the resident percent when you did:
CREATE PRIMARY INDEX TABLE_PRIMARY_INDEX ON TABLE

To compare the build times, I would try the following: increase the number of documents (or) reduce the quota so that the index goes into DGM. With this new configuration, check the partitioned index build time.

Thanks,
Varun

@varun.velamuri - Understood.
I already have 200M documents on top of which I am trying to build the index.
As explained in the link below, I am getting rid of primary key indexes.


So I have tried to create a covered index (non-partitioned), and I see that the cache memory resident on one node is ~5-7% while the index build is going on.
I understand the theory that the index should be 20% memory resident for DGM and Plasma to work at their best. I have a 10 GB RAM quota on each node, so a 40 GB total RAM quota for the index service, plus a separate 20 GB per-node data service RAM quota.
How do I know upfront how much memory the index will take, so that I can plan the indexServiceMemoryQuota?

@eldorado,

I was checking the index build times on our perf cluster for both a partitioned index and a non-partitioned index. The test builds a secondary index. When indexing 100M docs, the partitioned index build was almost 3 times faster than the non-partitioned one (the non-partitioned index took 580 sec, the partitioned index took 168 sec). This is possibly due to our cluster configuration. So I think we need to understand more about the test you are running and see why there is such a difference in performance. Can you share the cbcollect logs so that we can study and understand the reason for such a build time?

For the indexServiceMemoryQuota, it depends on a lot of factors, like doc size, number of indexes, type of index, etc. I can give you a rough idea, but this may not be completely accurate. If you are indexing 10M docs, and the avg size of the indexable value is 100 bytes (say), then for close to a 100% resident ratio, the data size with overheads would come to around 10M * (100 + 56) * 2 ≈ 3 GB. I would set the quota to 4 GB to accommodate the indexer RSS. This is just a rough calculation, as there are many other factors that should be considered.
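That rough formula, written out as a sketch you can adapt (the 56-byte per-entry overhead and the 2x multiplier are the approximations from the paragraph above):

# Rough GSI quota estimate using the approximation above:
# bytes ~= docs * (avg_indexable_bytes + 56) * 2, scaled by the target resident ratio.
def index_quota_gb(num_docs, avg_indexable_bytes, resident_ratio=1.0):
    return num_docs * (avg_indexable_bytes + 56) * 2 * resident_ratio / 1e9

print(index_quota_gb(10_000_000, 100))        # ~3.1 GB for 10M docs, fully resident
print(index_quota_gb(180_000_000, 100, 0.2))  # ~11 GB for 180M docs at 20% resident

Add some headroom on top (e.g. the 4 GB suggested above for the 10M case) for the indexer RSS.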

PS: If you are already an enterprise customer, you can get in touch with the Couchbase Professional Services team to help you with sizing.

@varun.velamuri In an MDS config, a covered index took ~16 minutes on 184M docs. The covered index is on 11 data keys, where the entire document has ~50 keys. This covered index is non-partitioned.
No other activity was running while this index was being created.
I have 4 data-and-query nodes and 4 index-and-query nodes running on Pure block storage, and each node has been given 75 CPU max.
The statistics graph looked like the below while index creation was in progress:

For the same index, partitioned across 3 nodes with PARTITION BY HASH, it took ~42 mins.
The statistics look like the below for one of the index nodes.

Note that the bucket after load is uncompressed; no compaction was run and there is no fragmentation (because it is stated that for a Plasma index there is no need to run fragmentation on the bucket).

The performance benchmarks you shared are absolutely what we are looking for, because the timings have to improve; we are talking about an index on 150B docs eventually.
indexServiceMemoryQuota is 50 GB on each node for the 3 index nodes.

I am trying to find out where we have the bottleneck. Can you tell me all the information you need (index-scan, index-cap, servicers, etc.) to look into this further?


@eldorado, looking at the screenshots, I see 0 total indexed items and a 0B average item size for the non-partitioned index, while I see 32.6M total indexed items and a 45B average item size for the partitioned index.

After the build finishes, can you compare the stats for both indexes and see if there is any discrepancy?

That is the true number. As a cross-check I have started rebuilding them again, and I see 0 total indexed items and a 0B avg item size for the non-partitioned one (captured during index build time).
Sorry, there was a typo in my previous response; I meant 4 data and query nodes, NOT 74.