Index files ./mainIndex/log.000000000000xx.data are filling my hard drive to 100%

n1ql
index

#1

Hi
I have a small index of only 100 elements, and suddenly it started generating 4GB files in /opt/couchbase/var/lib/couchbase/data/@2i/cache-idx-user_*_0.index/mainIndex/, filling 100% of my hard disk :frowning:

$ ls -lh /opt/couchbase/var/lib/couchbase/data/@2i/cache-idx-user_4986327002958805671_0.index/mainIndex
total 17G
-rwxr-x--- 1 couchbase couchbase   36 jul 30 12:57 config.json
-rwxr-x--- 1 couchbase couchbase 8,0K ago  4 08:06 header.data
drwxr-x--- 3 couchbase couchbase 4,0K ago  4 07:56 index
-rwxr-x--- 1 couchbase couchbase 4,0G ago  4 07:02 log.00000000000054.data  <------ !!!!
-rwxr-x--- 1 couchbase couchbase 4,0G ago  4 06:54 log.00000000000055.data  <------ !!!!
-rwxr-x--- 1 couchbase couchbase 4,0G ago  4 07:25 log.00000000000056.data  <------ !!!!
-rwxr-x--- 1 couchbase couchbase 4,0G ago  4 07:56 log.00000000000057.data  <------ !!!!
-rwxr-x--- 1 couchbase couchbase 1,4G ago  4 08:06 log.00000000000058.data  <------ !!!!
  • How is it possible to have such a large log size if I only have 100 elements in the index and there is no activity on the server? (It is a test environment; the disk filled up at night, when nobody was in the office.)

The server normally has less than 10% of its space in use; suddenly these files started being generated and filled the disk to 100% in just a few minutes :frowning:

  • Which logs should I look at to see what's going on?

  • What is the function of these log.00000000000000xx.data files?

  • In the /var/lib/couchbase/data/@2i/xxxxxxxx_index_name_xxxxxxxxx.index folder, what is the difference between /mainIndex and /docIndex?

  • After the disk fills up, the only solution I have found is to delete these files, do a “Drop Index”, and then do the “Create Index” again (it works without any problem for a few days until, suddenly, log.0000000000xx.data files totalling many GB are generated again in just a few minutes).

More info:

  • The index is very simple

CREATE INDEX `cache-idx-user` ON cache(wir_id,wir_title,wir_date,wir_sou_id,wir_status,wir_typ_id) WHERE (type = "user") WITH { "defer_build":true }

  • Other subfolder:

$ ls -lh /opt/couchbase/var/lib/couchbase/data/@2i/cache-idx-user_4986327002958805671_0.index/mainIndex/index/checkpoint.00000000000412
total 12K
-rwxr-x--- 1 couchbase couchbase 8,0K ago  4 07:56 header.data
-rwxr-x--- 1 couchbase couchbase 4,0K ago  4 07:56 log.00000000000000.data <----- small
-rwxr-x--- 1 couchbase couchbase   31 ago  4 07:56 meta.json

  • “/docIndex” of this index

$ ls -la /opt/couchbase/var/lib/couchbase/data/@2i/cache-idx-user_4986327002958805671_0.index/docIndex
total 15M
-rwxr-x--- 1 couchbase couchbase   36 jul 30 12:57 config.json
-rwxr-x--- 1 couchbase couchbase 8,0K ago  4 07:56 header.data
drwxr-x--- 3 couchbase couchbase 4,0K ago  4 07:56 index
-rwxr-x--- 1 couchbase couchbase 271M ago  4 07:56 log.00000000000000.data

$ ls -la /opt/couchbase/var/lib/couchbase/data/@2i/cache-idx-user_4986327002958805671_0.index/docIndex/index/checkpoint.00000000000416
total 12K
-rwxr-x--- 1 couchbase couchbase 8,0K ago  4 07:56 header.data
-rwxr-x--- 1 couchbase couchbase 4,0K ago  4 07:56 log.00000000000000.data
-rwxr-x--- 1 couchbase couchbase   28 ago  4 07:56 meta.json

  • I have more similar indexes, but only have problems with one:

$ du -sh /opt/couchbase/var/lib/couchbase/data/@2i/*
82M cache-idx-posts_1107430388812631205_0.index
52M cache-IDX-PRIMARY_11659241970044229763_0.index
113M cache-idx-file_16400365949326360127_0.index
17G cache-idx-user_4986327002958805671_0.index <------------- 17 GB !!!
87M cache-idx-tags_18119205706417122373_0.index
93M cache-idx-test_6138298915443446591_0.index
72M persistent-IDX-PRIMARY_2507625354504552406_0.index
620K sessions-IDX-PRIMARY_15825472269477416696_0.index
4,4M MetadataStore

Thanks


#2

The backIndex and mainIndex are internal store structures maintained per index.
Could you share the split of disk usage between backIndex and mainIndex when it reaches 17G?

Also, we could look at the following stats from the indexer stats endpoint: http://<indexer_node_address>:9102/stats/storage

lss_used_space, lss_data_size, items_count, lss_fragmentation.

Ideally, disk size should increase only if there are mutations and the item count is increasing.
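As a rough guide to reading these stats, lss_fragmentation is essentially the share of the on-disk log (lss_used_space) that no longer holds live data (lss_data_size). A minimal sketch, assuming that relationship holds; the function name and sample values are illustrative, not Couchbase code:

```python
# Illustrative sketch (not Couchbase source): lss_fragmentation is roughly
# the percentage of the log-structured store (LSS) file that is garbage,
# i.e. space not occupied by live index data.

def lss_fragmentation(lss_used_space: int, lss_data_size: int) -> int:
    """Percentage of on-disk log space holding stale records."""
    if lss_used_space == 0:
        return 0
    return round(100 * (1 - lss_data_size / lss_used_space))

# e.g. a ~1.2 MB log file holding only ~11 KB of live index data:
print(lss_fragmentation(1220608, 11540))  # -> 99
```

Values of this order appear in the stats posted later in the thread; at 99% fragmentation, almost all of the log file is reclaimable space waiting for cleaning.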

Thanks


#3

Is it possible to share the dataset, so we can attempt a full reproduction?
Also which version of Couchbase Server are you using and on which platform?
Thanks


#4

Yes, it is possible :slight_smile:
I just re-created the index (I had deleted it to prevent it from filling up my hard drive again).

CREATE INDEX `xxcache-idx-wire` ON XXCache(wir_id,wir_title,wir_date,wir_sou_id,wir_status,wir_typ_id) WHERE (entitytype = "wire") WITH { "defer_build":true }

It's not a production environment; it's a small development environment with little data, only 20GB of hard disk space, and little use.
The index has only 158 elements.

At the end of this post I include a copy of one record. (If you need the complete dataset, tell me the format you need and how I can generate it, and I'll copy it to pastebin.)

The dataset consists of sports news texts written in Spanish, with punctuation and accents typical of the language.
Maybe the problem is the special characters?

We are using version 5.0.1 Community on Red Hat 6 in production without problems (but we do not use the indexing service there).

In this small development environment (where we are having problems), we are evaluating the new features of Couchbase 5.5 GA on an Ubuntu Server 18.04 virtual machine.
Before the problems started it was RedHat 7 + CB 5.0.1; we then updated the nodes to Ubuntu 18.04 + CB 5.5 GA.
That's why we started to suspect a 5.5 bug.

Thank you for the information.
With only 157 elements in the index but 17 GB of log.00000000000000000000xx.data files generated, I thought they were simple, unimportant log files that I could delete if the disk filled to 100% (the total disk space is only 20GB). Now I've learned that I can't do that :frowning:

  • /mainIndex: 17GB
  • /backIndex: 0.3GB

I have just created the problematic index again, and if the problem reproduces I will send you the real data.

I didn't know about this URL with so much information; I will use it from now on :slight_smile:

lss_used_space: 1220608,
lss_data_size: 11540,
items_count: 158,
lss_fragmentation: 99,

It is a development environment with little use and few changes/mutations.
The disk filling occurred at night, when I was not working.

Index: "XXCache:xxcache-idx-wire",
Id: 5845518418984482000,
Stats: {
MainStore: {
memory_quota: 482240102,
punch_hole_support: true,
count: 158,
compacts: 1,
purges: 0,
splits: 0,
merges: 0,
inserts: 158,
deletes: 0,
compact_conflicts: 0,
split_conflicts: 0,
merge_conflicts: 0,
insert_conflicts: 0,
delete_conflicts: 0,
swapin_conflicts: 0,
persist_conflicts: 0,
memory_size: 29277,
memory_size_index: 0,
allocated: 29452,
freed: 175,
reclaimed: 175,
reclaim_pending: 0,
reclaim_list_size: 175,
reclaim_list_count: 1,
reclaim_threshold: 50,
allocated_index: 0,
freed_index: 0,
reclaimed_index: 0,
num_pages: 1,
items_count: 158,
num_rec_allocs: 159,
num_rec_frees: 1,
num_rec_swapout: 0,
num_rec_swapin: 0,
bytes_incoming: 22917,
bytes_written: 1220608,
write_amp: 65.14,
write_amp_avg: 53.26,
lss_fragmentation: 99,
lss_data_size: 11540,
lss_used_space: 1220608,
lss_num_reads: 0,
lss_read_bs: 0,
lss_blk_read_bs: 0,
lss_gc_num_reads: 0,
lss_gc_reads_bs: 0,
lss_blk_gc_reads_bs: 0,
lss_rdr_reads_bs: 0,
lss_blk_rdr_reads_bs: 0,
cache_hits: 168,
cache_misses: 0,
cache_hit_ratio: 1,
rlss_num_reads: 0,
rcache_hits: 10,
rcache_misses: 0,
rcache_hit_ratio: 1,
resident_ratio: 1,
mvcc_purge_ratio: 1,
currSn: 167,
gcSn: 165,
gcSnIntervals: "[0 91 138 166]",
purger_running: false,
mem_throttled: false,
lss_throttled: false,
lss_head_offset: 0,
lss_tail_offset: 1220608,
num_wctxs: 14,
num_readers: 12,
num_writers: 4,
buf_memused: 51180,
page_bytes: 103,
page_cnt: 1,
page_itemcnt: 1,
avg_item_size: 103,
avg_page_size: 103
},

BackStore: {
memory_quota: 482240102,
punch_hole_support: true,
count: 158,
compacts: 6,
purges: 0,
splits: 0,
merges: 0,
inserts: 158,
deletes: 0,
compact_conflicts: 0,
split_conflicts: 0,
merge_conflicts: 0,
insert_conflicts: 0,
delete_conflicts: 0,
swapin_conflicts: 0,
persist_conflicts: 0,
memory_size: 24981,
memory_size_index: 0,
allocated: 102976,
freed: 77995,
reclaimed: 77995,
reclaim_pending: 0,
reclaim_list_size: 77995,
reclaim_list_count: 6,
reclaim_threshold: 50,
allocated_index: 0,
freed_index: 0,
reclaimed_index: 0,
num_pages: 1,
items_count: 158,
num_rec_allocs: 623,
num_rec_frees: 465,
num_rec_swapout: 0,
num_rec_swapin: 0,
bytes_incoming: 23549,
bytes_written: 1220608,
write_amp: 63.43,
write_amp_avg: 51.83,
lss_fragmentation: 99,
lss_data_size: 11456,
lss_used_space: 1220608,
lss_num_reads: 0,
lss_read_bs: 0,
lss_blk_read_bs: 0,
lss_gc_num_reads: 0,
lss_gc_reads_bs: 0,
lss_blk_gc_reads_bs: 0,
lss_rdr_reads_bs: 0,
lss_blk_rdr_reads_bs: 0,
cache_hits: 389,
cache_misses: 0,
cache_hit_ratio: 1,
rlss_num_reads: 0,
rcache_hits: 231,
rcache_misses: 0,
rcache_hit_ratio: 1,
resident_ratio: 1,
mvcc_purge_ratio: 1,
currSn: 167,
gcSn: 165,
gcSnIntervals: "[0 91 138 166]",
purger_running: false,
mem_throttled: false,
lss_throttled: false,
lss_head_offset: 0,
lss_tail_offset: 1220608,
num_wctxs: 14,
num_readers: 0,
num_writers: 4,
buf_memused: 51738,
page_bytes: 69003,
page_cnt: 6,
page_itemcnt: 465,
avg_item_size: 148,
avg_page_size: 11500
}
}
},
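One cross-check on the dump above: lss_used_space matches the distance between the log's head and tail offsets reported in the same stats. A minimal sketch, under the assumption that used space is simply (tail - head), which holds exactly here because the head is still at 0; in later dumps, where garbage collection has advanced the head, the reported value tracks this difference only approximately. Values are copied from the MainStore stats above:

```python
# Sketch: in a log-structured store, writes are appended at the tail and
# garbage collection advances the head, so the live on-disk span of the
# log is roughly (tail - head). Values from the MainStore dump above.

lss_head_offset = 0
lss_tail_offset = 1220608

lss_used_space = lss_tail_offset - lss_head_offset
print(lss_used_space)  # -> 1220608, matching the reported lss_used_space
```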

1 EXAMPLE RECORD

{
"wir_id": 164,
"wir_title": "McLaren puso fin en Canad\u00e1 a una racha de 64 carreras puntuando",
"wir_leadin": "McLaren puso fin en Canad\u00e1 a una racha de 64 carreras puntuando",
"wir_body": "\n Madrid, 10 jun (EFE).- La escuder\u00eda McLaren puso fin este domingo en Montreal (Canad\u00e1) a una racha de 64 carreras consecutivas puntuando (Sergio P\u00e9rez fue und\u00e9cimo y Jenson Button duod\u00e9cimo), un resultado que los responsables achacan a una mala estrategia en la elecci\u00f3n de neum\u00e1ticos.\n El jefe de equipo, Martin Whitmarsh, reconoci\u00f3 que “todo el fin de semana fue dif\u00edcil”.\n “No hemos sido r\u00e1pidos. Despu\u00e9s de la clasificaci\u00f3n, en la que no nos fue bien, era muy dif\u00edcil que nos recuper\u00e1semos. Pudimos estar en los puntos, pero no lo hicimos, no fuimos lo bastante r\u00e1pidos para estar con los punteros”, admiti\u00f3.\n “Es uno de esos fines de semana”, a\u00f1adi\u00f3, “en los que, aparte de no tener un coche r\u00e1pido, ninguna cosa estuvo en su sitio”.\n Whitmarsh afirm\u00f3 que el fin de la racha puntuando no constituye una decepci\u00f3n especial. “Obviamente, es triste que haya terminado la racha, pero no nos preocupa eso. Por lo que estamos frustrados es porque no hicimos un mejor trabajo”. EFE\n ",
"wir_date": "2013-06-10 14:40:00",
"wir_sou_id": 2,
"wir_status": 1,
"wir_viewed": 0,
"wir_typ_id": 1,
"wir_facebook": 0,
"wir_google": 0,
"wir_meneame": 0,
"wir_twitter": 0,
"wir_sent": 0,
"wir_linkedin": 0,
"entitytype": "wire"
}

Thank you so much for answering.


#5

We are facing the same problem with Couchbase Enterprise Edition 5.5.0 build 2958, so it would be really helpful if you could confirm whether it is a 5.5-related bug.


#6

Hi Daniel
Our problem with excessive index file growth also occurs in Enterprise Edition 5.5.0 build 2958, in a 3-node test mini-cluster with only 10,000 items in the bucket and 100 items in the index.
In Community Edition 5.0.1 it doesn't happen to us.

Maybe it is a 5.5 bug, or a sizing problem with my mini-cluster: the server had only 2GB of total RAM, and I saw the kernel OOM (Out of Memory) killer terminating processes in the logs.

Two days ago I expanded the RAM and recreated the indexes… I'm watching to see if it happens again or not.


#7

@daniel.moritz, @treo, yes this bug only affects Couchbase Enterprise Edition 5.5.0 build 2958. We are actively working to address it.


#8

Thank you very much for your honesty.

I have just reproduced the problem today on 5.5.
I have 2 similar clusters (3 nodes, same data, same indexes, same development environment with minimal requests): one with version 5.0.1 Community (no problems) and another with 5.5 Enterprise, where the problem has just reproduced today.
I'm enclosing all the information in case it helps you solve it.
Tell me if I can help with anything or if you need the dataset.

Tonight, 3 small indexes of only 100 elements each suddenly started generating 4GB files in /opt/couchbase/var/lib/couchbase/data/@2i/*_0.index/mainIndex/log.00000000000009.data

158 index items (0 scans/sec, 142 B per item) = 22KB data size, but !!! 13GB disk size !!! and growing


Index: "XXCache:xxcache-idx-wire" stats:

{
Id: 5845518418984482000,
Stats: {
MainStore: {
memory_quota: 483183820,
punch_hole_support: true,
count: 158,   <---------------------
compacts: 1,
purges: 0,
splits: 0,
merges: 0,
inserts: 158,   <---------
deletes: 0,
compact_conflicts: 0,
split_conflicts: 0,
merge_conflicts: 0,
insert_conflicts: 0,
delete_conflicts: 0,
swapin_conflicts: 0,
persist_conflicts: 0,
memory_size: 27085,  <-----------
memory_size_index: 0,
allocated: 92710,
freed: 65625,
reclaimed: 175,
reclaim_pending: 65450,
reclaim_list_size: 65625,
reclaim_list_count: 3,
reclaim_threshold: 50,
allocated_index: 0,
freed_index: 0,
reclaimed_index: 0,
num_pages: 1,
items_count: 158,
num_rec_allocs: 475,
num_rec_frees: 317,
num_rec_swapout: 0,
num_rec_swapin: 0,
bytes_incoming: 22917,
bytes_written: 44131770368,  <---------------------
write_amp: 65.14,
write_amp_avg: 1925721.97,
lss_fragmentation: 99, <------- (too small for autodefrag.)
lss_data_size: 11171,
lss_used_space: 2003805249,  <----------
lss_num_reads: 10744495,
lss_read_bs: 395349880,
lss_blk_read_bs: 44324900864,
lss_gc_num_reads: 10744495,
lss_gc_reads_bs: 395349880,
lss_blk_gc_reads_bs: 44324900864,
lss_rdr_reads_bs: 0,
lss_blk_rdr_reads_bs: 0,
cache_hits: 179,
cache_misses: 0,
cache_hit_ratio: 1,
rlss_num_reads: 0,
rcache_hits: 21,
rcache_misses: 0,
rcache_hit_ratio: 1,
resident_ratio: 1,
mvcc_purge_ratio: 1,
currSn: 23301,
gcSn: 23299,
gcSnIntervals: "[0 23256 23290 23300]",
purger_running: false,
mem_throttled: false,
lss_throttled: true,
lss_head_offset: 42127953920,
lss_tail_offset: 44131770368,
num_wctxs: 14,
num_readers: 12,
num_writers: 4,
buf_memused: 1173520,
page_bytes: 45937,
page_cnt: 3,
page_itemcnt: 317,
avg_item_size: 144,
avg_page_size: 15312
},

BackStore: {
memory_quota: 483183820,
punch_hole_support: true,
count: 158,  <-------------------
compacts: 8,
purges: 0,
splits: 0,
merges: 0,
inserts: 158,
deletes: 0,
compact_conflicts: 0,
split_conflicts: 0,
merge_conflicts: 0,
insert_conflicts: 0,
delete_conflicts: 0,
swapin_conflicts: 0,
persist_conflicts: 0,
memory_size: 25269,  <----------
memory_size_index: 0,
allocated: 155930,
freed: 130661,
reclaimed: 130661,
reclaim_pending: 0,
reclaim_list_size: 130661,
reclaim_list_count: 8,
reclaim_threshold: 50,
allocated_index: 0,
freed_index: 0,
reclaimed_index: 0,
num_pages: 1,
items_count: 158,
num_rec_allocs: 939,
num_rec_frees: 781,
num_rec_swapout: 0,
num_rec_swapin: 0,
bytes_incoming: 23549,
bytes_written: 46556536832,  <---------------
write_amp: 63.43,
write_amp_avg: 1977006.96,
lss_fragmentation: 99,
lss_data_size: 11384, <------
lss_used_space: 12034331500, <---------------
lss_num_reads: 11340059,
lss_read_bs: 386601617,
lss_blk_read_bs: 41723101184,
lss_gc_num_reads: 11340059,
lss_gc_reads_bs: 386601617,
lss_blk_gc_reads_bs: 41723101184,
lss_rdr_reads_bs: 0,
lss_blk_rdr_reads_bs: 0,
cache_hits: 34569,
cache_misses: 0,
cache_hit_ratio: 1,
rlss_num_reads: 0,
rcache_hits: 34411,
rcache_misses: 0,
rcache_hit_ratio: 1,
resident_ratio: 1,
mvcc_purge_ratio: 1,
currSn: 23301,
gcSn: 23299,
gcSnIntervals: "[0 23256 23290 23300]",
purger_running: false,
mem_throttled: false,
lss_throttled: true,
lss_head_offset: 34522193920,
lss_tail_offset: 46556536832,
num_wctxs: 14,
num_readers: 0,
num_writers: 4,
buf_memused: 1174968,
page_bytes: 116101,
page_cnt: 8,
page_itemcnt: 781,
avg_item_size: 148,
avg_page_size: 14512
}
}
}

docIndex size > mainIndex size

--- /opt/couchbase/var/lib/couchbase/data/@2i/XXCache_xxche-idx-wire_5845518418984481875_0.index ----------------------------------------------------------------------------------------------------------------
                         /..                                                                                                                                                                                       
   11,0 GiB [##########] /docIndex
    3,1 GiB [##        ] /mainIndex

@dev-cb1:/opt/couchbase/var/lib/couchbase/data/@2i/XXCache_xxcache-idx-wire_5845518418984481875_0.index/docIndex# ls -lh
total 7,6G
-rwxr-x--- 1 couchbase couchbase   36 ago  7 21:13 config.json
-rwxr-x--- 1 couchbase couchbase 8,0K ago 10 12:54 header.data
drwxr-x--- 3 couchbase couchbase 4,0K ago 10 04:50 index
-rwxr-x--- 1 couchbase couchbase 4,0G ago 10 12:41 log.00000000000009.data  <-----
-rwxr-x--- 1 couchbase couchbase 4,0G ago 10 12:32 log.00000000000010.data  <-----
-rwxr-x--- 1 couchbase couchbase 1,5G ago 10 12:54 log.00000000000011.data   <------

dev-cb1:/opt/couchbase/var/lib/couchbase/data/@2i/XXCache_xxcache-idx-wire_5845518418984481875_0.index/mainIndex# ls -lh
total 2,5G
-rwxr-x--- 1 couchbase couchbase   36 ago  7 21:13 config.json
-rwxr-x--- 1 couchbase couchbase 8,0K ago 10 12:56 header.data
drwxr-x--- 3 couchbase couchbase 4,0K ago 10 04:50 index
-rwxr-x--- 1 couchbase couchbase 3,3G ago 10 12:56 log.00000000000010.data   <------

Note: In other problematic indexes, the size of the directories is the opposite:
docIndex size < mainIndex size

--- /opt/couchbase/var/lib/couchbase/data/@2i/XXCache_xxcache-idx-user_2089964825977465142_0.index ----------------------------------------------------------------------------------------------------------
                         /..                                                                                                                                                                                       
   10,5 GiB [##########] /mainIndex
   49,1 MiB [          ] /docIndex

Index definition:

CREATE INDEX `xxcache-idx-wire` ON XXCache(wir_id,wir_title,wir_date,wir_sou_id,wir_status,wir_typ_id) WHERE (entitytype = "wire")
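The write_amp_avg figures in the stats dumps can be recomputed from bytes_written and bytes_incoming, which makes the blow-up between the healthy dump in #4 and this one easy to see. A minimal sketch; the division is my assumption about how the stat is derived, though it reproduces the posted values exactly:

```python
# Sketch: write_amp_avg appears to be total bytes written to disk divided
# by logical bytes received from mutations. Input values are copied from
# the MainStore stats dumps earlier in this thread.

def write_amp_avg(bytes_written: int, bytes_incoming: int) -> float:
    return round(bytes_written / bytes_incoming, 2)

# Healthy dump (#4): ~1.2 MB written for ~23 KB of incoming mutations.
print(write_amp_avg(1220608, 22917))      # -> 53.26

# Runaway dump (#8): ~44 GB written for the same ~23 KB of mutations.
print(write_amp_avg(44131770368, 22917))  # -> 1925721.97
```

A write amplification of nearly two million for 158 static items is the clearest sign here that the store is rewriting its log continuously rather than responding to new mutations.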


#9

Thanks for providing the details.
We have identified the root cause of the problem and fixes have been proposed (MB-30731). The problem occurs when the number of items in the index is very low.


Thanks,
Sarath