Index creation is very slow on CE 6.6

I am helping a friend install CB CE 6.6. He has 5 servers, each with 4 x NVMe SSDs and 512 GB of RAM. Each server runs the data, query, and index services. Per server, 200 GB of memory is allocated to data and 300 GB to index.

The cluster has 2 buckets, each approx. 5 TB (one bucket has 1.5B documents, the other 650M documents).

I am creating the indexes with defer_build: true (after all the documents were migrated from another cluster, which needs to be decommissioned).
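For reference, a deferred build is a two-step pattern in N1QL: each index is created with `defer_build` set, and then a single `BUILD INDEX` statement starts all of them together, so they can be built from one pass over the bucket. Below is a minimal sketch in Python that only generates the statements. The bucket name `bucket_a`, the index names, and the fields are hypothetical placeholders, not the actual definitions used on this cluster.

```python
import json


def create_deferred_index(bucket, index_name, fields):
    """Return a CREATE INDEX statement that registers the index without building it."""
    field_list = ", ".join(f"`{f}`" for f in fields)
    options = json.dumps({"defer_build": True})  # serializes to {"defer_build": true}
    return f"CREATE INDEX `{index_name}` ON `{bucket}`({field_list}) WITH {options};"


def build_deferred_indexes(bucket, index_names):
    """Return one BUILD INDEX statement covering all the deferred indexes."""
    names = ", ".join(f"`{n}`" for n in index_names)
    return f"BUILD INDEX ON `{bucket}`({names});"


# Hypothetical index definitions: index name -> indexed fields.
indexes = {"idx_type": ["type"], "idx_user_created": ["userId", "createdAt"]}

statements = [create_deferred_index("bucket_a", n, f) for n, f in indexes.items()]
statements.append(build_deferred_indexes("bucket_a", list(indexes)))

for s in statements:
    print(s)
```

Listing every deferred index of a bucket in the same `BUILD INDEX` call is the point of the pattern; issuing one build per index would repeat the scan of the bucket data.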

The problem I have is that index building is extremely slow (about 14% in 10 hours). The server load is below 10%. These are physical servers (not virtual machines), disk access is very fast (4 x NVMe SSD, datacenter-edition disks), and the servers have 10 Gb network cards. Network usage is around 100 Mbps, and disk usage is below 120 MB/s write and 30 MB/s read.

Basically, servers are mostly idle.
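To put the 14% in 10 hours into perspective, here is a small back-of-the-envelope extrapolation in Python. It assumes the build progresses at a constant rate, which may well not hold in practice, so the result is only a rough indication.

```python
def estimate_build_time(percent_done, hours_elapsed):
    """Linearly extrapolate total and remaining build time.

    Assumes a constant build rate, which is a simplification.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_hours = hours_elapsed * 100.0 / percent_done
    return total_hours, total_hours - hours_elapsed


total_hours, remaining_hours = estimate_build_time(14, 10)
print(f"{total_hours:.1f} h total, {remaining_hours:.1f} h remaining")
# prints "71.4 h total, 61.4 h remaining"
```

At that rate the build would need roughly three days in total, on hardware that is sitting mostly idle.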

I am trying to figure out what I can do to improve the index creation performance. I am not sure whether this is a Community Edition limitation or a problem with the configuration or the hardware.

What else can I investigate?

Thank you

CE uses ForestDB storage, whereas EE uses Plasma. cc @varun.velamuri, @sduvuru

Hi @flaviu,

As @vsr1 mentioned, EE uses Plasma storage, which scales very well with large data sets. Please try EE on the same hardware and see if the index build performance is better.

Also, please check the index memory quota (which can be found on the settings page). The default index memory quota can be too small. Please allocate an appropriate memory quota to get better performance.
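Besides the settings page, the quota and the index storage mode can also be read over the cluster REST API. The sketch below is a rough illustration in Python, assuming the `/pools/default` response carries an `indexMemoryQuota` field in MB and `/settings/indexes` carries a `storageMode` field. The host, credentials, sample payloads, and the 300 GB expectation are placeholders, and `check_index_settings` is a hypothetical helper, not part of any Couchbase SDK.

```python
import base64
import json
import urllib.request


def fetch_json(host, path, user, password):
    """GET a cluster REST endpoint (port 8091 assumed) with basic auth."""
    req = urllib.request.Request(f"http://{host}:8091{path}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def check_index_settings(pool_info, index_settings, expected_quota_mb):
    """Return a list of warnings; an empty list means nothing stood out."""
    warnings = []
    quota = pool_info.get("indexMemoryQuota")
    if quota is None:
        warnings.append("indexMemoryQuota not present in response")
    elif quota < expected_quota_mb:
        warnings.append(
            f"index quota is {quota} MB, expected at least {expected_quota_mb} MB"
        )
    if index_settings.get("storageMode") == "forestdb":
        warnings.append("storage mode is forestdb")
    return warnings


# Made-up sample payloads standing in for real responses.
sample_pool = {"memoryQuota": 204800, "indexMemoryQuota": 512}
sample_settings = {"storageMode": "forestdb"}

problems = check_index_settings(sample_pool, sample_settings, 300 * 1024)
for p in problems:
    print(p)
```

Against a live cluster, the two dictionaries would come from `fetch_json(host, "/pools/default", user, password)` and `fetch_json(host, "/settings/indexes", user, password)` instead of the samples.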

Thanks.

My friend cannot use EE at this point (it is a bootstrapped company).

The servers have more RAM than the total index size (there are 2 TB of RAM allocated just for indexes).

The indexes were still not built after 5 days… This is absurd. CE is not usable for larger data sets…

You are basically pushing customers away with all these limitations in CE (no SSL, unusable indexes on large data sets, and other limitations).

Your company should help any developer embrace your technology. But for some reason, instead of understanding that more developers (advocates in many companies) mean more sales, you think that more limitations separating CE from EE bring more sales.

In reality, it keeps your product from being used by the people who should be your spokespeople…

My friend will probably just replace Couchbase with something else. I tried to convince him that your technology is amazing, but with CE this is not the case.