I have an urgent issue with our live Couchbase cluster. We have 2 nodes configured and one node is being reported as down. The cluster has been running for a couple of months.
Windows Server 2012 R2
Couchbase was configured to use hostnames and not ip addresses.
We only have around 8000 items in total across 5 buckets.
I can see that the Couchbase Windows service is running on the failed node, but I cannot connect to the admin console. It just says “page cannot be found”. I have tried restarting the service as well as the server. The CouchbaseServer service seems to start up fine, but I still cannot connect.
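In case it helps narrow things down, here is a rough sketch of the kind of check I can run from either node to see whether the hostname resolves and whether anything is listening on the admin console port. It assumes the console is on the default port 8091; `check_endpoint` is just a helper name I made up, and the hostname in the usage note is a placeholder, not our real one.

```python
import socket


def check_endpoint(host, port=8091, timeout=3.0):
    """Return (resolved_ip, port_open) for the given host.

    resolved_ip is None when the hostname cannot be resolved.
    port=8091 is assumed to be the admin console port (the default).
    """
    try:
        # The cluster was configured with hostnames, so check resolution first.
        ip = socket.gethostbyname(host)
    except OSError:
        return None, False
    try:
        # A successful TCP connect means something is listening on the port.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False
```

Calling `check_endpoint("node2.example.local")` (placeholder hostname) returns the resolved IP and whether the port accepted a connection. If the name does not resolve, or the port is closed while the service reports as running, that would at least suggest where to look next.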
Another issue is that I would like to fail over the node that’s down, but for some reason when I view the buckets, the “Replica” counts are 0. Does this mean that the data has never been replicated? The buckets are configured to have 1 replica.
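To double-check the configured replica setting outside the UI, I could query the healthy node over REST. The sketch below assumes the bucket list is served at `/pools/default/buckets` on port 8091 and that each bucket entry carries a `replicaNumber` field; the function names, host, and credentials are placeholders of my own.

```python
import base64
import json
import urllib.request


def fetch_buckets(host, user, password, port=8091):
    """Fetch the bucket list from a node's REST API (assumed endpoint)."""
    url = f"http://{host}:{port}/pools/default/buckets"
    req = urllib.request.Request(url)
    # HTTP basic auth with the cluster administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def replica_settings(buckets):
    """Map each bucket name to its configured replica count.

    'replicaNumber' is the field I assume holds the setting; the value
    is None if a bucket entry does not include it.
    """
    return {b["name"]: b.get("replicaNumber") for b in buckets}
```

Something like `replica_settings(fetch_buckets("node1.example.local", "Administrator", "password"))` would then list the configured replica count per bucket. This only shows the configuration, not whether replica items were ever actually written.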
I have had issues with Couchbase on a single node before, and the fix was easy enough as it just required a re-install and a data restore. But now I have an additional node which seems to only contain half the data…?
I’m still able to make a backup of the data on the failed node via the cbbackup tool using the “couchbasefile://” URL. I can also make a backup of the entire Couchbase installation folder, which includes the data and index folders.
What is the best way to proceed based on the above configuration? First prize would be to get connectivity back, but failing that, would setting up a 3rd server, restoring the failed node’s data there, adding it to the cluster and rebalancing work?
Please let me know what logs / files will be useful in trying to troubleshoot the failed node!