Indexes size and compaction


#1

Hi,

I have one cluster with one node.

I have a 2 TB disk for data and 4 indexes.
I am indexing the full Ethereum blockchain.
The problem is that if I create the indexes at the start, they fill the disk before compaction.

I deleted all the indexes and started only the big one.
It was around 800 GB. Then today, after compaction, I see that it is at 100 GB.

What are the best practices for my case?
Will a 1 TB disk for data and a 2 TB disk for indexes be enough?
Can I save disk space in any way? After all, it’s SSD and costs a lot.

And one more question: I couldn’t find how to move the indexes directory to a new location after installation.

@raju

Thanks,
Ady,


#2

@ady.shimony, are you using GSI or view-based indexes? Also, please share your Couchbase Server version.

And one more question: I couldn’t find how to move the indexes directory to a new location after installation.

It is not possible to change it after the setup has been done. You’ll have to reinstall Couchbase on this node and change it during the initial node setup.


#3

@deepkaran.salooja

GSI
Community Edition 5.1.1 build 5723

The document key is 60 characters long, and the index key is a 42-character string plus a number (long).

CREATE INDEX index_tx_from ON `tx-history`(`from`, blockNumber) WITH { "defer_build": true };

One of the indexes contains 300 million docs; the rest contain fewer documents.


#4

What is the compaction setting for indexes? You can check it under UI->Settings.


#5

It’s on the default values; I didn’t change it.


#6

The compaction settings look fine. Is your bucket loaded with the data when you create the index or is it the other way around when you see the 800 GB usage? It is better to load the data first and then create the index.

What kind of memory quota has been assigned to the index service?


#7

I am indexing the entire Ethereum blockchain.
It’s an ongoing process: every 15 seconds there are 100 to 300 new docs.
So the index must update itself all the time, and I prefer to start the data and the index from the first block.

The goal is for the user to install Couchbase and my software, and run it with minimal DB maintenance.

In the 800 GB case, the bucket was already loaded with data: 217 million docs.

The index service memory quota is at the default, 500 MB.


#8

Thanks for sharing the details.

Please try the following things:

  1. Change the compaction settings to run full compaction only on Sunday (once a week). It is better if you can schedule it during off-peak hours. Most likely, full compaction is not able to keep up with the incoming mutations, which causes the large disk usage. By default, circular reuse keeps the fragmentation at 66%, so you may see disk usage of up to 3 times the compacted file size.

  2. Increase your memory quota to something more reasonable, e.g. 4 GB or above. This should ease some pressure on the system so that it can work better.
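
To put numbers on the fragmentation point, here is a rough sketch in Python. It assumes fragmentation means the fraction of the index file that is not live data, (file size - live size) / file size, and it treats the 100 GB post-compaction size from post #1 as the live size of the 800 GB file, which is only approximate since data kept arriving in between. The function names are made up for illustration.

```python
def fragmentation(file_gb, live_gb):
    """Fraction of the index file that is stale (assumed definition)."""
    if file_gb <= 0 or not 0 <= live_gb <= file_gb:
        raise ValueError("expected file_gb > 0 and 0 <= live_gb <= file_gb")
    return (file_gb - live_gb) / file_gb


def size_multiple(frag):
    """How many times larger than the live data the file is at this fragmentation."""
    if not 0 <= frag < 1:
        raise ValueError("fragmentation must be in [0, 1)")
    return 1.0 / (1.0 - frag)


# Sizes reported in post #1: ~800 GB before compaction, ~100 GB after.
observed = fragmentation(800, 100)
print(f"observed fragmentation: {observed:.1%}")

# Treating the 66% default as two thirds stale data.
print(f"file size multiple at 2/3 fragmentation: {size_multiple(2 / 3):.1f}x")
```

Under these assumptions the observed file was well above the 66% level, which fits the explanation that compaction was not keeping up with the incoming mutations.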


#9

Thank you.

I need to figure out the disk size for the indexes. Is there any way to calculate it?
If one index is ~100 GB after compaction and I will have 3 more indexes, let’s say 250 GB in total, will 1 TB be enough?

Also, considering your suggestion to run compaction on Sunday, will a 500 GB disk be enough for 250 GB of data after compaction?

What is the rule of thumb here, if any?


#10

The Community Edition storage engine doesn’t have accurate sizing. But as a ballpark, if you are dealing with 250 GB of fully compacted indexed data, then considering 66% fragmentation, you will need 750 GB of disk on a regular day (worst case). When the full compaction runs on Sunday, it needs to write the data to a new file for compaction. The worst case is twice the size of the file being compacted.
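
The ballpark above can be written out as a small calculation. This is only a sketch of the rule of thumb from this thread, not an official sizing formula: it treats 66% fragmentation as two thirds, and it reads "twice the size of the file being compacted" as the peak disk needed while the old and new files coexist. The function and parameter names are hypothetical.

```python
def estimate_disk_gb(compacted_gb, fragmentation=2 / 3, compaction_factor=2.0):
    """Return (regular_day_gb, compaction_peak_gb) for a given compacted index size.

    fragmentation: fraction of the file that is stale on a regular day.
    compaction_factor: assumed worst-case multiple of the file size needed
        while full compaction writes the new file.
    """
    if compacted_gb < 0 or not 0 <= fragmentation < 1:
        raise ValueError("need compacted_gb >= 0 and 0 <= fragmentation < 1")
    regular_day = compacted_gb / (1.0 - fragmentation)
    compaction_peak = regular_day * compaction_factor
    return regular_day, compaction_peak


regular, peak = estimate_disk_gb(250)
print(f"regular day (worst case): about {round(regular)} GB")
print(f"during full compaction (worst case): about {round(peak)} GB")

# The disk sizes asked about in post #9.
for disk_gb in (500, 1000):
    print(f"{disk_gb} GB disk covers a regular day: {disk_gb >= regular}")
```

With these assumptions, a 500 GB disk is below the regular-day worst case for 250 GB of compacted index data, and a 1 TB disk covers a regular day but not the worst case during full compaction.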

The Enterprise Edition storage engine has much more efficient disk usage and doesn’t need a separate full compaction.