Couchbase cluster (version 3.0.1) does not come up after server restart


#1

Hi,

We are currently using Couchbase version 3.0.1.
We have a 5-node cluster with replication factor 1 and 2 buckets configured. All 5 nodes were restarted without first stopping the Couchbase service. Since the servers came back up, the Couchbase nodes always show pending status under ‘Server Nodes’.
Currently 2 nodes show as down. We are afraid of losing the data.
All the permissions on the servers seem to be fine, and we could not make much sense of the logs. We have also tried a rebalance, but it always fails.

When the servers came up, we saw that one node was missing from the cluster.
We also see the following when we try to rebalance:
Rebalance exited with reason {not_all_nodes_are_ready_yet,

I have attached screenshots for reference.
Please let us know how to resolve this issue.


#3

First, before anything else, I’d recommend taking a filesystem level backup of the down nodes. There are ways to recover that data using CLI tools if needed.
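
As a rough illustration of such a backup, here is a minimal Python sketch that archives a node's data directory into a timestamped tarball. The path `/opt/couchbase/var/lib/couchbase/data` is an assumption (the usual Linux default; check your install), and `backup_dir` is just a hypothetical helper name, not a Couchbase tool. Stop the Couchbase service on the node first so the files are not changing while they are copied.

```python
import tarfile
import time
from pathlib import Path

# Assumed default data path on Linux; adjust if your install differs.
DATA_DIR = Path("/opt/couchbase/var/lib/couchbase/data")


def backup_dir(src, dest_dir):
    """Archive src into a timestamped .tar.gz under dest_dir; return the archive path."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(src, arcname=src.name)
    return archive


# Example (hypothetical destination, run as a user that can read the data files):
# print(backup_dir(DATA_DIR, "/backup/couchbase"))
```

Keep the archive somewhere off the affected node, and make sure the destination has enough free space before starting.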

I’d recommend having a look at the more detailed node logs. In particular, is there anything from ns_server at the ERROR level that indicates what’s happening?
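
To narrow the logs down, something like the following Python sketch can pull out only the error-level ns_server entries. It assumes the logs live under `/opt/couchbase/var/lib/couchbase/logs` and that each entry starts with a prefix of the form `[component:level,timestamp,...]`; both are assumptions about a typical install, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed default log directory on Linux; adjust if your install differs.
LOG_DIR = Path("/opt/couchbase/var/lib/couchbase/logs")

# Assumed entry prefix, e.g. "[ns_server:error,2015-01-01T10:00:00.000,...]".
ENTRY_PREFIX = re.compile(r"^\[(\w+):(\w+),")


def error_lines(lines, component="ns_server", level="error"):
    """Return the lines whose prefix matches the given component and level."""
    hits = []
    for line in lines:
        m = ENTRY_PREFIX.match(line)
        if m and m.group(1) == component and m.group(2) == level:
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path, component="ns_server", level="error"):
    """Scan one log file, tolerating stray bytes in the file."""
    with open(path, errors="replace") as fh:
        return error_lines(fh, component, level)


# Example (file name is an assumption):
# for entry in scan_log(LOG_DIR / "error.log"):
#     print(entry)
```

Only the first line of each entry is matched, so multi-line stack traces need to be read in the file itself once you know the timestamp to look for.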

You may also want to watch the processes owned by the couchbase user. If there is a problem during warmup and/or you’re getting core files, that could be preventing your startup. Your screenshot does indicate the memcached process is exiting.
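
One way to watch those processes is sketched below in Python. It assumes the service runs as a user named `couchbase` and that the processes of interest show up as `beam.smp` and `memcached`; those names are assumptions about a typical 3.x Linux install, and the helper names are hypothetical.

```python
import subprocess

# Process names assumed for a typical Couchbase 3.x node.
EXPECTED = ("beam.smp", "memcached")


def parse_ps(output):
    """Turn 'PID COMMAND' lines into a dict mapping command name to a list of PIDs."""
    procs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, comm = parts
        procs.setdefault(comm.strip(), []).append(int(pid))
    return procs


def couchbase_processes(user="couchbase"):
    """List processes owned by the given user via ps (headers suppressed with '=')."""
    result = subprocess.run(
        ["ps", "-u", user, "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=False,
    )
    return parse_ps(result.stdout)


def missing(procs):
    """Return the expected process names that are not currently running."""
    return [name for name in EXPECTED if name not in procs]


# Example: print(missing(couchbase_processes()))
```

If you run this every few seconds and the memcached PID keeps changing, the process is most likely crashing and being restarted, which would fit what the screenshot shows.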


#4

Hi Matt, thanks for the reply.
I am sharing the logs in the below location:

Could you please have a look at them?
Also, the Couchbase console page hangs when we go to the Views section.

Regards,
Rajeev.


#5

Hi Matt,

We have restored the backup on all the nodes and restarted the service on each of them.
Currently all the nodes are in ‘pend’ status (yellow state).

We see the below message in the logs:
Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 71.

Please advise on how to proceed.

Regards,
Rajeev.