Rolling restart of cluster

I have a three-node cluster running Couchbase 3 on Linux that I have been building as a prototype. Linux gets updates often, and I would love to update the nodes' OS more frequently, but it seems to take a whole day to update the nodes. Failovers take a long time, and rebalances fail on empty vBuckets, which means writing 1024+ dummy records to each bucket so that the rebalances work, and then removing the dummy records afterwards…

Does anyone have a standard procedure for doing a rolling restart of a cluster?

What version and OS are you using? This should be simple with 3.0, where we wouldn’t expect the kind of rebalance issues you report.

I am using 3.0.1. I found that the rebalance would get stuck at 75% or 80%. Once I followed the directions in this Stack Overflow question, http://stackoverflow.com/questions/28842349/couchbase-3-0-rebalance-stuck, the rebalance would complete. The graceful failovers would take a very long time. It would be nice to see something in the Couchbase documentation similar to this documentation for Elasticsearch:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_rolling_restarts.html

Anyone have any thoughts? Is this in the wrong forum?

Hi envitraux,
I have raised your questions with our support team and am waiting for them to come back to me. Hopefully I will have an answer shortly.

Thanks for your patience.

Failovers shouldn’t take much time unless you’re including the rebalance duration in that. As part of failover, we basically promote the already existing replica vBuckets to active.

For your specific case, I would suggest a swap rebalance strategy. Mark the X nodes that are due for an OS upgrade for removal, add another set of X servers, and then trigger a rebalance. That would certainly reduce the number of rebalances you have to perform.
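
As a rough illustration of that swap rebalance, here is a minimal C# sketch that drives it through the cluster's REST interface. It assumes the /controller/addNode and /controller/rebalance endpoints behave as described in the Couchbase REST documentation for your version, so please check them before relying on this. The class name, method names, host names and credentials are all hypothetical. knownNodes is expected to list every node in the cluster (including the ones being removed) in the ns_1@host form, and ejectedNodes lists only the nodes going away.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Couchbase SDK.
public static class SwapRebalance
{
    // Builds the form fields for POST /controller/rebalance.
    public static Dictionary<string, string> RebalanceForm(
        IEnumerable<string> knownNodes, IEnumerable<string> ejectedNodes)
    {
        return new Dictionary<string, string>
        {
            { "knownNodes", string.Join(",", knownNodes) },
            { "ejectedNodes", string.Join(",", ejectedNodes) }
        };
    }

    // Adds one replacement node, then starts a rebalance that ejects the old ones.
    public static async Task RunAsync(
        string clusterUrl, string user, string password,
        string newHost, string[] knownNodes, string[] ejectedNodes)
    {
        using (var http = new HttpClient { BaseAddress = new Uri(clusterUrl) })
        {
            var token = Convert.ToBase64String(Encoding.UTF8.GetBytes(user + ":" + password));
            http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", token);

            // Step 1: add the replacement server to the cluster.
            // Assumption: the new node accepts the same admin credentials.
            var addForm = new Dictionary<string, string>
            {
                { "hostname", newHost },
                { "user", user },
                { "password", password }
            };
            var added = await http.PostAsync("/controller/addNode", new FormUrlEncodedContent(addForm));
            added.EnsureSuccessStatusCode();

            // Step 2: a single rebalance that brings the new node in and takes the old nodes out.
            var rebalance = await http.PostAsync(
                "/controller/rebalance",
                new FormUrlEncodedContent(RebalanceForm(knownNodes, ejectedNodes)));
            rebalance.EnsureSuccessStatusCode();
        }
    }
}
```

A call might look like SwapRebalance.RunAsync("http://node1:8091", "Administrator", "password", "node4", new[] { "ns_1@node1", "ns_1@node2", "ns_1@node3", "ns_1@node4" }, new[] { "ns_1@node3" }), where every value is a placeholder. The sketch only starts the rebalance; it does not wait for it to finish.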

Yes, we’re aware of this bug. We are planning to roll out a CE version containing a fix for it too.

Thanks. Once I figured out the empty vBucket problem, the whole process went much more smoothly. I will post my code that cleans the vBuckets later.

Keith

Here is the C# code I use to work around the vBucket problem that stops rebalances and failovers:

public static void rebalanceFix(bool create = true, string suffix = "")
{
    // Buckets to seed; replace these with your own bucket names.
    string[] bucks = { "buck1", "buck2", "buck3", "default" };

    foreach (string buckname in bucks)
    {
        using (var bucket = CMDMSvc.CouchbaseManager.cluster.OpenBucket(buckname))
        {
            // Write (or remove) a run of dummy keys, intended to leave no vBucket empty.
            for (int i = 1; i < 3080; i++)
            {
                var key = "rebalance::" + i.ToString() + (!string.IsNullOrWhiteSpace(suffix) ? "::" + suffix : "");
                if (create)
                {
                    Couchbase.Document<string> doc = new Couchbase.Document<string>();
                    doc.Id = key;
                    // No Expiry is set because I could not work out how long a rebalance takes.
                    // Instead, call this routine again with create = false after the rebalance to remove the keys.
                    doc.Content = "test" + i.ToString();

                    var result = bucket.Upsert<string>(doc);
                    Console.WriteLine(doc.Id + " " + result.Message + " " + result.Success.ToString());
                }
                else
                {
                    var resultdel = bucket.Remove(key);
                    Console.WriteLine(key + " " + resultdel.Message + " " + resultdel.Success.ToString());
                }
            }
        }
    }
}
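
The loop bound above is a guess at how many sequential keys it takes to touch every vBucket. The following self-contained sketch is one way to check that offline instead of guessing. It assumes the key-to-vBucket mapping used by the client SDKs is the standard CRC-32 of the key, shifted right by 16 bits, masked with 0x7FFF and then with the vBucket count minus one (1024 vBuckets on Linux); that mapping is my understanding rather than something confirmed in this thread, so verify it against your SDK. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for estimating vBucket coverage offline.
public static class VBucketCoverage
{
    private static readonly uint[] Table = BuildTable();

    // Precomputes the lookup table for the standard (reflected) CRC-32 polynomial.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint c = n;
            for (int k = 0; k < 8; k++)
            {
                c = (c & 1) != 0 ? 0xEDB88320u ^ (c >> 1) : c >> 1;
            }
            table[n] = c;
        }
        return table;
    }

    // Standard CRC-32 as used by zlib and Ethernet.
    public static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        }
        return crc ^ 0xFFFFFFFFu;
    }

    // Assumed mapping: bits 16..30 of the CRC, masked to the vBucket count (a power of two).
    public static int VBucketFor(string key, int numVBuckets = 1024)
    {
        uint crc = Crc32(Encoding.UTF8.GetBytes(key));
        return (int)((crc >> 16) & 0x7FFF) & (numVBuckets - 1);
    }

    // Returns how many keys prefix+1, prefix+2, ... are needed before every vBucket
    // has been hit at least once, or -1 if maxKeys is reached first.
    public static int KeysNeededForFullCoverage(string prefix, int numVBuckets = 1024, int maxKeys = 1000000)
    {
        var seen = new HashSet<int>();
        for (int i = 1; i <= maxKeys; i++)
        {
            seen.Add(VBucketFor(prefix + i, numVBuckets));
            if (seen.Count == numVBuckets)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Calling Console.WriteLine(VBucketCoverage.KeysNeededForFullCoverage("rebalance::")) prints the number of keys needed for the key pattern used above when no suffix is passed; if it is larger than the loop bound, some vBuckets would stay empty under the assumed mapping.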