Create index is slow on a huge number of documents

n1ql

#1

I am creating two indexes on a Couchbase cluster.

One is a primary index, the other is a secondary index. I am creating these indexes on approximately 30 billion documents. The secondary index covers 3 fields.

I started creating them 7 hours ago. However, the current progress is 35 and 39 percent respectively.
Does it usually take this long to create indexes on such a large data set, or is something wrong with my environment?
When do you think the creation will finish?

The cluster has 4 nodes (16 cores each) and the total index RAM quota is 10GB. 2 of the nodes are index servers.

The index settings are as follows:

Indexer Threads: 8
In Memory Snapshot Interval: 200 ms
Stable Snapshot Interval: 5000 ms
Max Rollback Points: 5
Indexer Log Level: info

Thanks


#2

The index build times can be high for a few reasons:

  • retrieval of the information from the data service is slow
  • index nodes can’t save the index to disk fast enough

There are a few options:

  • use the defer_build option to build both indexes together. defer_build ensures you scan the data once and build both indexes from that single scan; see the sketch after this list.
  • you could also partition your indexes and add more nodes to parallelize the index build. For partitioning you can specify a filter (a WHERE clause in CREATE INDEX). However, I should note that some queries may not be able to take advantage of range scans on a partitioned index.
  • lastly, we have another option in 4.5 called memory-optimized indexes that can build the index much faster in memory. However, given the document count, I don’t think your index will fit into memory.
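
A minimal sketch of the defer_build approach (the bucket name `mybucket` and fields `fieldA`, `fieldB`, `fieldC` are placeholders, not taken from your post):

    /* define both indexes without building them yet */
    CREATE PRIMARY INDEX idx_primary ON `mybucket`
      WITH {"defer_build": true};

    CREATE INDEX idx_secondary ON `mybucket`(fieldA, fieldB, fieldC)
      WITH {"defer_build": true};

    /* build the deferred indexes together, so the data is scanned only once */
    BUILD INDEX ON `mybucket`(idx_primary, idx_secondary);

For the partitioning option, a filtered index would look something like the following (the WHERE predicate is just an assumed example of splitting the key space):

    CREATE INDEX idx_secondary_part1 ON `mybucket`(fieldA, fieldB, fieldC)
      WHERE fieldA < 1000
      WITH {"defer_build": true};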

What are the document key size and index key size? Just curious.
thanks
-cihan


#3

Hi cihangirb,

Thank you for your reply. The index key size is 45 bytes.
I am using 4 nodes for the cluster, and each node has 16 cores and SSD storage on AWS.
I don’t think retrieval or disk writes are slow, but what do you think?

Thanks