Why is my Couchbase Indexer failing?

index
n1ql
#1

Logs show this, repeatedly.

>  Service 'indexer' exited with status 2. Restarting. Messages:
>     goproj/src/github.com/couchbase/plasma/page.go:826 +0x111
>     github.com/couchbase/plasma.(*Plasma).Persist(0xc435096400, 0x7f0498362e00, 0xc436f38600, 0xc4374c1b80, 0x0, 0x0)
>     goproj/src/github.com/couchbase/plasma/persistor.go:139 +0x174
>     github.com/couchbase/plasma.(*Plasma).PersistAll2.func1(0x7f0498362e00, 0x0, 0x0, 0xffffffffffffffff, 0x21, 0x21)
>     goproj/src/github.com/couchbase/plasma/persistor.go:182 +0x5a
>     github.com/couchbase/plasma.(*Plasma).VisitPartition(0xc435096400, 0x0, 0x0, 0xffffffffffffffff, 0xc440d28ce0, 0x0, 0x0)
>     goproj/src/github.com/couchbase/plasma/page_visitor.go:64 +0x1ef
>     github.com/couchbase/plasma.(*Plasma).PageVisitor.func1(0xc440d28cf0, 0xc440d28d00, 0x1, 0x1, 0xc435096400, 0xc440d28ce0, 0x0, 0x0, 0xffffffffffffffff)
>     goproj/src/github.com/couchbase/plasma/page_visitor.go:40 +0x89
>     created by github.com/couchbase/plasma.(*Plasma).PageVisitor
>     goproj/src/github.com/couchbase/plasma/page_visitor.go:41 +0x1ae
>     [goport(/opt/couchbase/bin/indexer)] 2019/05/07 15:00:54 child process exited with status 2

Couchbase was working fine until today.

The Java SDK always gives an error “Indexer In Warmup State. Please retry the request later.” The indexing GUI has red margins, as if the indexes are unbuilt. Dropping indexes from the GUI or the Java SDK always fails.

Restarting the Couchbase server does not help.

This thread shows the same error message on Windows. It says that this can happen if the data is corrupted on power-off. I certainly hope that that did not happen. Any machine can potentially lose power, and unrecoverable corruption of production data is a reason to strictly avoid Couchbase.

This is “Enterprise Edition 6.0.1 build 2037,” a single node on a development laptop, Ubuntu 18.10.

#2

Please share the indexer.log or share the cbcollect from UI->Logs. We’ll need to look at the full stack to understand the exact problem.

#3

UI->Logs, then “Collect Logs”, gives this:

Error: Unable to collect logs from the following nodes:
127.0.0.1
Node errors:
**127.0.0.1**
  File "/opt/couchbase/bin/cbcollect_info", line 331
    except OSError, e:
                  ^
SyntaxError: invalid syntax

My Ubuntu 18.10 uses Python 3 as the system Python, but I see that cbcollect still requires Python 2. (When installing Couchbase, I also had to wrestle with Python 2/3 errors.)
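A quick way to see which interpreter each common command name resolves to is sketched below. It only reports what is on the PATH; whether cbcollect_info is launched through the plain `python` name is an assumption here, not something confirmed in this thread. The list of names is illustrative.

```python
import shutil


def find_interpreters(names=("python", "python2", "python2.7", "python3")):
    """Return a dict mapping each command name to its resolved path, or None."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(f"{name:10s} -> {path or 'not found'}")
```

If `python` points at a Python 3 binary and no `python2` is found, a script written with Python 2 only syntax such as `except OSError, e:` would fail the way the traceback above shows.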

I did not uninstall or install Python recently, but I did attempt to install cbc. I think that the indexer started failing before that, but in any case, these instructions for installing cbc failed with the message shown below.

(Note that sudo was required for the following command.)

sudo perl couchbase-csdk-setup
...

Running apt-get -qq update..
Running: apt-get -q install libcouchbase2-core libcouchbase2-libevent libcouchbase2-bin libcouchbase-dev
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package libcouchbase2-core
E: Unable to locate package libcouchbase2-libevent
E: Unable to locate package libcouchbase2-bin
E: Unable to locate package libcouchbase-dev
Couldn't install! at couchbase-csdk-setup line 207, <STDIN> line 2.
...
#4

> Please share the indexer.log

Here it is, with index names redacted. (It is 20 MB and, as a text file, cannot be attached in the forum.)
https://drive.google.com/file/d/1US40mQBBbaeZ7zGA_B-kA9BifYRCDk_y/view?usp=sharing

#5

Further analysis shows that /opt/couchbase/var/lib/couchbase/data/@2i holds 36 GB and may have filled my 100 GB system disk. I thought I had configured Couchbase to put data on my 1 TB data disk, but apparently it did not take effect, and the full disk may be the root cause.

But in that case… why don’t the various error messages (above) say “disk space low”?
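For anyone checking the same thing, here is a minimal sketch that totals the size of a directory tree and reports how full its filesystem is. The index path is the one quoted in this post; the helper names are made up for illustration, and the script likely needs to run with sudo to read that directory.

```python
import os
import shutil


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or permission denied
    return total


def free_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    index_dir = "/opt/couchbase/var/lib/couchbase/data/@2i"  # path from the post
    print(f"{index_dir}: {dir_size_bytes(index_dir) / 1e9:.1f} GB")
    print(f"free space on /: {free_fraction('/'):.1%}")
```

A directory that cannot be read is silently counted as empty, so a surprisingly small number is a hint to rerun with elevated permissions.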

#6

Hi @Joshua_Fox,

The Couchbase Server Web UI shows an alert when a disk is getting full on any of the cluster nodes.

#7

Amit, thank you. I suggest that

  1. Couchbase should shut down cleanly when 3% of the disk remains. The user can then clean up and restart. This is better than just entering an unclear error state and potentially borking the OS.

  2. The advance warning is valuable, but when the disk fills up, Couchbase should clearly report “disk full” in its errors.
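The two suggestions above can be sketched as a simple threshold check. This is only an illustration of the idea, not how Couchbase behaves: the 3% stop level comes from suggestion 1, while the 10% warning level and all function names are hypothetical.

```python
import shutil

STOP_BELOW = 0.03  # the 3% figure from suggestion 1
WARN_BELOW = 0.10  # hypothetical early-warning level, not from the thread


def classify(free_bytes, total_bytes, stop_below=STOP_BELOW, warn_below=WARN_BELOW):
    """Map free space to 'stop', 'warn', or 'ok'."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = free_bytes / total_bytes
    if fraction < stop_below:
        return "stop"  # shut down cleanly and report "disk full"
    if fraction < warn_below:
        return "warn"  # raise the advance warning
    return "ok"


def check_path(path):
    """Classify the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return classify(usage.free, usage.total)
```

Keeping `classify` separate from the `shutil.disk_usage` call makes the thresholds easy to test without needing a nearly full disk.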

#8

Hi @Joshua_Fox,

Thanks a lot for sharing the log file. It looks like this may be a newly identified bug. I have opened MB-34153.

Please note that Couchbase indexing should not panic when the disk fills up. It should keep retrying the write operation until the error goes away. So, any panic is a potential bug. Thanks for reporting it.
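To illustrate the retry behavior described above in general terms, here is a minimal sketch that retries a write only when it fails with "no space left on device". It is not the indexer's actual implementation; the function name and parameters are hypothetical, and unlike the "retry until the error goes away" description it gives up after a fixed number of attempts.

```python
import errno
import time


def write_with_retry(write, max_attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call write(); on ENOSPC wait and retry, re-raising after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except OSError as exc:
            # Only a full disk is treated as retryable; other errors propagate.
            if exc.errno != errno.ENOSPC or attempt == max_attempts:
                raise
            sleep(delay_seconds)
```

The `sleep` parameter exists so the waiting can be replaced in tests; a real service would also want to log each failed attempt so that the operator sees "disk full" rather than silence.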

Starting with Couchbase Server 5.5, the indexing service detects on-disk corruption (if any) and ensures availability of non-corrupt indexes (MB-28139). Please note that the corruption (as mentioned in the other forum post) could have been caused by events outside Couchbase’s control (for example, misbehaving hardware).

Finally, thanks for the suggestion on better handling of the disk-full scenario. I will raise it internally.

#9

Thank you. I understand that handling resource exhaustion correctly is difficult, so my 2 cents as a user may help here.

amit.kulkarni (Couchbase), May 13:

> Hi @Joshua_Fox,
>
> Thanks a lot for sharing the log file. Looks like it may be a newly identified bug. I have opened MB-34153.
>
> Please note that the couchbase indexing should not panic in case of disk getting full. It should keep retrying the write operation until the error goes away.

Really? Shouldn’t Couchbase stop indexing when the disk is more than 97% full? After all, you don’t want to make things worse.

> So, any panic is a potential bug. Thanks for reporting it.
>
> Starting with couchbase server 5.5, couchbase indexing service detects the in disk corruption (if any) and ensures availability of non-corrupt indexes. MB-28139. Please note that the corruption (as mentioned in the other forum post) could have been caused by the events outside of the couchbase’s control (for example hardware misbehaving).

I was running Couchbase in a GCE VM but also on my laptop, which is where I experienced disk exhaustion.

This may or may not be an exceptional use case, but here goes:

  1. Couchbase did not allow me to define a data directory on my large data disk. It simply rejected such directory choices, so I left the data on my smaller system disk.

  2. The data was protected deep inside a system directory (/opt), so Baobab and similar tools didn’t even show me the cause of the disk exhaustion until I retried with sudo.

  3. Soon after I forcibly deleted this data, my OS died. (The fan spun noisily for a week. After it was cleaned, Ubuntu would not boot up and I had to reinstall Ubuntu, which did work.)

I do *not* think that Couchbase caused it, but it is true that these occurred at the same time. One can consider a hardware failure causing problems with Couchbase (though I don’t think that happened),