Disk Write Queue - no items removed?

#1

I’m running a 3-node cluster and I’m receiving lots of timeout exceptions:

java.lang.RuntimeException: java.util.concurrent.TimeoutException
        at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75)
        at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:128)
        at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:123)

While investigating the cause, I found two things that may be different from what they should be:

  1. There is one node whose disk write queue seems to only grow (currently 623K) and never decrease.

  2. On this node the projector appears to be running when I check the processes, but nothing is listening on port 9999 (“netstat -ntpl | grep 9999” returns no result):

    ps aux | grep projector
    498 18784 0.0 0.0 474172 2880 ? Sl 2015 10:51 /opt/couchbase/bin/projector -kvaddrs=127.0.0.1:11210 -adminport=:9999 127.0.0.1:8091
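
For reference, the netstat check in item 2 can also be done from Java, the language of the SDK in use. This is only a minimal sketch: the class and method names are made up for illustration, and the host, the 1-second timeout, and the use of port 9999 (taken from the -adminport flag in the ps output) are assumptions about the setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Couchbase or its SDK.
final class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing reachable on that port.
            return false;
        }
    }

    public static void main(String[] args) {
        // 9999 is the -adminport value from the projector command line.
        boolean up = isListening("127.0.0.1", 9999, 1000);
        System.out.println("projector admin port 9999 listening: " + up);
    }
}
```

Run on the affected node, this should print false for as long as netstat shows no listener on 9999.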

We are using Couchbase 4.0.0-4051 and Java SDK 2.2.4.

From what I have found so far, backoff should start once the disk write queue reaches 1M items. And if I understood correctly, removing the node or failing it over manually would lose the items in the disk write queue, since they have not been replicated/persisted yet.
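
To get a rough feel for how much headroom is left before that 1M mark, the arithmetic can be sketched as below. This assumes the queue keeps growing linearly between two readings, which may not hold in practice. The helper name, the earlier reading of 600K, and the 60-second interval are made up for illustration; only the 623K reading and the 1M threshold come from this post.

```java
// Hypothetical helper for illustration; not a Couchbase API.
final class QueueHeadroom {

    // Estimates the seconds until the queue reaches `threshold`, assuming it keeps
    // growing linearly at the rate observed between two readings.
    static double secondsUntilThreshold(long earlier, long later,
                                        double secondsBetween, long threshold) {
        if (secondsBetween <= 0) {
            throw new IllegalArgumentException("secondsBetween must be positive");
        }
        if (later >= threshold) {
            return 0.0; // already at or past the threshold
        }
        double itemsPerSecond = (later - earlier) / secondsBetween;
        if (itemsPerSecond <= 0) {
            return Double.POSITIVE_INFINITY; // queue is flat or draining
        }
        return (threshold - later) / itemsPerSecond;
    }

    public static void main(String[] args) {
        long threshold = 1_000_000L;  // backoff threshold as stated in the post
        long later = 623_000L;        // current queue size from the post
        long earlier = 600_000L;      // made-up earlier reading
        double secondsBetween = 60.0; // made-up sampling interval

        double eta = secondsUntilThreshold(earlier, later, secondsBetween, threshold);
        System.out.printf("items left before backoff: %d%n", threshold - later);
        System.out.printf("estimated seconds until backoff: %.0f%n", eta);
    }
}
```

With real readings taken a known interval apart, the same calculation gives a rough idea of how urgent the situation is.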

Does anyone have recommendations on what I could do to avoid losing the data in the disk write queue? I am considering removing the node (once I am sure no data will be lost) because I want to upgrade to 4.1 anyway.
If the problems are related to the projector not listening, is there a way to (re)start the projector without data loss and without removing the node?

Any help would be greatly appreciated.

#2

I found out from the projector logs that port 9999 was already in use when the projector was started. Now nothing is listening on that port.

Is it possible to kill the projector process and restart it from the CLI without any data loss or other side effects, such as bringing the node down?

#3

Hey @techilla,

I assume that you have reached a resolution for this issue yourself, but for reference: the Couchbase babysitter process should automatically restart any process that terminates prematurely, so it should be safe to restart the projector process.