Full index Hard Disk

Hi experts, I have 2 nodes, each with a hard disk dedicated to the index. The index hard disk on my second server is filling up. Restarting the CB server solves the problem, but only for two or three days, and then I have to restart it again. This only happens on the second server.

The configuration (not in production):

2 x server (1 disk for data / 1 disk for index)
OS: Debian
CB: 4.5.0-2601

Before restarting SERVER-2:

Filesystem Size Used Avail Use% Mounted on
/dev/sda3 198G 74G 114G 40% /
/dev/sdb1 230G 218G 8.0K 100% /mnt/disk2 (index disk)

After restarting SERVER-2:

Filesystem Size Used Avail Use% Mounted on
/dev/sda3 198G 71G 117G 38% /
/dev/sdb1 230G 137G 82G 63% /mnt/disk2 (index disk)

Do you have any information that could help solve the issue?

Thanks!

Can you provide details of how many indexes there are on the second server? Data size, disk size, and fragmentation of each index (available in the UI), plus the storage mode (Standard, Memory Optimized) and compaction mode (under Settings in the UI).

Also, can you check which files are taking up the most space when you get close to 100%?
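For example (a minimal sketch; adjust the mount point to match your index disk), this lists the largest entries under the index mount, biggest last:

# List every file and directory under the index mount and sort by
# size, so the largest consumers appear at the bottom.
du -ah /mnt/disk2/index | sort -h | tail -20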

Hi deepkaran, thanks. Here is the information:

Storage Mode: Standard GSI
Database Auto-Compaction:
DB Fragmentation trigger: 30%
Index Fragmentation: Circular Write Mode

Indexes:

(screenshots of the index list; fragmentation shows 0% for every index)

Right now the index disk on server-2 is full, and today I'm seeing that the index disk on server-1 is reaching 96%. I think it has the same symptom.

SERVER-1:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 198G 59G 129G 32% /
/dev/sdb1 230G 209G 9.0G 96% /mnt/disk2 (index disk)

SERVER-2:
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 198G 70G 118G 38% /
/dev/sdb1 230G 218G 8.0K 100% /mnt/disk2 (index disk)

# du -h /mnt/disk2
4.0K /mnt/disk2/index/.delete
24G /mnt/disk2/index/@2i/3Gsync_idx_pktcall_4489010307541555293_0.index
4.3G /mnt/disk2/index/@2i/3Gsync_idx_psiprou_15184179246179977310_0.index
47G /mnt/disk2/index/@2i/3Gsync_#primary_11464445008298211043_0.index
27G /mnt/disk2/index/@2i/3Gsync_idx_twamp_6471893295886341383_0.index
101G /mnt/disk2/index/@2i
101G /mnt/disk2/index
16K /mnt/disk2/lost+found
101G /mnt/disk2

If I sum the index disk files, they never reach the total capacity of the disk (230GB). I tried to find hidden files or any other explanation but couldn't find anything, and I ran manual compaction more than once and nothing happened. The only workaround I've found is to restart the CB server, but that doesn't solve the underlying issue.

Thanks again for any information!

As the fragmentation is 0% for all the indexes, running compaction is not going to make a difference.

The indexer doesn't use any hidden/tmp files during execution. At the time of compaction, there can be extra disk usage because data is copied over to a new file while the old file is still around. So one possibility is that you see high disk usage while compaction is in progress, and restarting the server aborts compaction and cleans up the extra copy of the data.

But from your screenshots, fragmentation is at 0%, which doesn't indicate compaction is going to trigger.

In general, “du” should be able to tell what is taking up space on the disk. Unless you are able to figure out what files are taking up the extra space, it is difficult to say what the source of the problem is.
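One pattern worth ruling out, since your du total is far below the usage df reports and a restart frees the space: files that were deleted while a process still held them open don't show up in du, but they keep consuming disk until the process releases them. A quick sketch to check for that (assuming lsof is installed):

# List open files whose on-disk link count is zero, i.e. deleted
# but still held open by a process, filtered to the index disk.
lsof +L1 | grep /mnt/disk2

If large deleted .index files show up here, that would explain why restarting the CB server frees the space.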

We are experiencing the same issues with index data. Here's our cluster: 9 nodes, 256G memory each, and on each node we have 3 disks, 1T each. My setup when configuring the server:

(screenshot of the server configuration)

After the cluster was set up, we loaded it with some data, not much compared to the capacity of the cluster:

(screenshot of the bucket contents)

Then I created a primary index for metrics-metadata, which worked fine. But when I created a GSI on that bucket, after the index was 100% ready, I got a warning about the disk being full. One of the nodes (and only one) is 95% full on the index disk. Here's the output of du:

/mnt/storage2/couchbase/data$ sudo du --max-depth=2 -h
4.0K    ./.delete
932G    ./@2i/metrics-metadata_metadata_type_5261382729763826900_0.index
932G    ./@2i
2.3M    ./@indexes/travel-sample
2.3M    ./@indexes
932G    .

While the data itself takes only 22G on the data disk, the index file is filling up the index disk. And I noticed it was still growing even after the index showed 100% ready. I had to drop the index; after a while, the index disk space was freed up.

Is it normal to have such a huge index file for such a small amount of data? And why is this only happening on one of the 9 nodes? Can we spread the index file across the cluster?

Following up on the above post, the situation on our staging cluster is even worse. We have the same setup and the same indexes. The secondary index filled up the disk of one node while it was at 95% ready. Now it's stuck there and cannot finish. And if I try to drop that index from the command line, I get an error stating the index is "not found".

Any suggestions for this?

@hclc, which Couchbase version are you using? If you are on 4.5.0, please switch to Circular Write Mode (change the compaction mode in the UI under Settings -> Auto Compaction -> Index Fragmentation -> Circular Write Mode).
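If you prefer to script it, the compaction settings can also be changed through the cluster REST API; a sketch (the /controller/setAutoCompaction endpoint is standard, but treat the indexCompactionMode parameter name as an assumption and verify it against the 4.5 REST documentation):

# Switch the global index compaction mode to Circular Write Mode.
# Host/credentials are placeholders; indexCompactionMode=circular is
# assumed from the 4.5 auto-compaction REST settings.
curl -u Administrator:password -X POST \
  http://localhost:8091/controller/setAutoCompaction \
  -d parallelDBAndViewCompaction=false \
  -d indexCompactionMode=circular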

In general, the indexer saves 5 snapshots of data (at different points in time) for recovery. This can lead to more disk usage than the data size alone. The storage engine itself also has write amplification due to its MVCC architecture. And when compaction is in progress, the data on disk gets duplicated while the new compacted file is being created.

The write amplification can be prevented to a large extent by using Circular Write Mode in 4.5.0. More details here:
http://developer.couchbase.com/documentation/server/4.5/indexes/gsi-for-n1ql.html

Thanks for the information, deepkaran. However, we are using version 4.0. Is there any workaround for this version? Also, we have 9 nodes in this cluster with plenty of storage; can we store the index file on more than one disk?

You can try a couple of things:

- Create the index with a WHERE clause so that only part of the bucket is indexed, which keeps the index smaller.
- Reduce the number of rollback points (recovery snapshots) the indexer keeps, which lowers the disk overhead (see the sketch below).
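For the rollback points, the indexer exposes a settings endpoint on its admin port; a sketch of how that is typically changed (port 9102 and the indexer.settings.recovery.max_rollbacks key are what I'd expect on 4.x, please verify for your version):

# Lower the number of recovery snapshots (rollback points) the
# indexer keeps; fewer snapshots means less disk overhead.
curl -X POST http://localhost:9102/settings \
  -d '{"indexer.settings.recovery.max_rollbacks": 2}'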

Thank you! We cannot use a WHERE clause because we want to index the whole bucket, but I set the rollback points and will give it another try.

Again, a very important question (I think): is there a way to split the index file across multiple nodes? Couchbase is a clustered system, so why does the index have to be on one node and one disk? Is there any workaround for this?

You can manually partition your index to be placed on multiple nodes, e.g.:

CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type="product" AND productName BETWEEN "A" AND "K" USING GSI WITH {"nodes":"node1:8091"};

CREATE INDEX productName_index2 ON bucket_name(productName, ProductID) WHERE type="product" AND productName BETWEEN "K" AND "Z" USING GSI WITH {"nodes":"node2:8091"};

With the indexes above, if you search for productName = "APPLE WATCH" the scan will go to productName_index1, and productName = "SAMSUNG WATCH" will end up on productName_index2.


I set the rollback points to 2 and added the WHERE clause; however, the index is still filling up my hard drive (over 850G now) and is only 95% ready. Based on my experience last time, I cannot drop the index at this point; I will get an error saying "index not found".

What options do I have? Do I have to watch the index grow and completely fill my disk? Is there a way to stop the index from building?

Then again, the total bucket size is only 150G (in memory and on disk), so why does the index have to be this big? Is there something I did wrong?

I’m using couchbase server 4.0.1.

Now I have successfully dropped all the indexes I created (manually partitioned as suggested in the above response), but I see the disk usage on one node is still growing.

It is the folder named “/mnt/storage2/couchbase/data/@2i”. It is close to 100% on the 1T disk now.

What is the content of this folder? How can we stop it from eating up the disk space?

@2i is the directory for GSI indexes. You may be able to identify which index is growing, and the log may tell you more about what the indexer is doing on that node.
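For example (assuming a default Linux install, where the Couchbase logs live under /opt/couchbase/var/lib/couchbase/logs), something like this will show what the indexer is doing with that file:

# Follow the indexer log and filter for compaction/cleanup messages.
# The log path assumes a default Linux install.
tail -f /opt/couchbase/var/lib/couchbase/logs/indexer.log | grep -i -E 'compact|delet'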

Thanks! Inside that folder there was a file (or folder) named after my index. It kept growing for about 15 minutes after I dropped the index, until my disk hit 100%, and then it finally cleaned itself up.

Any idea about why the index has to be this big?

@hclc, as I mentioned in my earlier comment, the high disk space usage is due to the storage engine's high write amplification, which comes from its MVCC architecture. And when compaction is in progress, the data on disk gets duplicated while the new compacted file is created, which takes a lot more space.

This problem has already been solved by Circular Write Mode in 4.5.0. Please try that out.
More details here:
http://developer.couchbase.com/documentation/server/4.5/indexes/gsi-for-n1ql.html

Thanks @deepkaran.salooja. I thought I could fix this issue by tuning down the number of rollback points.

Looks like I need to upgrade to 4.5.0. Will try that out.