Readiness probe failures seen for Couchbase cluster pods when deployed using the Kubernetes operator

After deploying the Couchbase cluster, the readiness probes of the pods fail. The relevant output from /opt/couchbase/var/lib/couchbase/logs/error.log is below:

[ns_server:error,2019-06-18T06:54:45.971Z,ns_1@127.0.0.1:<0.334.0>:menelaus_web:loop:143]Server error during processing: ["web request failed",
                             {path,"/node/controller/rename"},
                             {method,'POST'},
                             {type,exit},
                             {what,
                              {{{{normal,
                                  {gen_server,call,[<0.565.0>,pause]}},
                                 {gen_server,call,
                                  [remote_monitors,
                                   {register_node_renaming_txn,
                                    <0.571.0>}]}},
                                {gen_server,call,
                                 [dist_manager,
                                  {adjust_my_address,
                                   "couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc",
                                   true,#Fun<ns_cluster.8.22381509>},
                                  infinity]}},
                               {gen_server,call,
                                [ns_cluster,
                                 {change_address,
                                  "couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc"},
                                 30000]}}},
                             {trace,
                              [{gen_server,call,3,
                                [{file,"gen_server.erl"},{line,188}]},
                               {menelaus_web_node,handle_node_rename,1,
                                [{file,"src/menelaus_web_node.erl"},
                                 {line,424}]},
                               {request_throttler,do_request,3,
                                [{file,"src/request_throttler.erl"},
                                 {line,59}]},
                               {menelaus_web,loop,2,
                                [{file,"src/menelaus_web.erl"},
                                 {line,121}]},
                               {mochiweb_http,headers,5,
                                [{file,
                                  "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
                                 {line,94}]},
                               {proc_lib,init_p_do_apply,3,
                                [{file,"proc_lib.erl"},{line,239}]}]}]
[ns_server:error,2019-06-18T06:54:52.365Z,ns_1@couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc:query_stats_collector<0.799.0>:rest_utils:get_json_local:63]Request to (n1ql) /admin/stats failed: {error,
                                        {econnrefused,
                                         [{lhttpc_client,send_request,1,
                                           [{file,
                                             "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                            {line,220}]},
                                          {lhttpc_client,execute,9,
                                           [{file,
                                             "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                            {line,169}]},
                                          {lhttpc_client,request,9,
                                           [{file,
                                             "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
                                            {line,92}]}]}}
[ns_server:error,2019-06-18T09:51:59.915Z,ns_1@couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc:<0.12837.14>:menelaus_web_alerts_srv:can_listen:530]Cannot listen due to nxdomain from inet:getaddr

[ns_server:error,2019-06-18T11:19:23.916Z,ns_1@couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc:<0.16360.21>:menelaus_web_alerts_srv:can_listen:530]Cannot listen due to nxdomain from inet:getaddr

[ns_server:error,2019-06-18T11:19:32.915Z,ns_1@couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc:<0.16752.21>:menelaus_web_alerts_srv:can_listen:530]Cannot listen due to nxdomain from inet:getaddr

[ns_server:error,2019-06-19T04:29:02.915Z,ns_1@couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc:<0.4912.105>:menelaus_web_alerts_srv:can_listen:530]Cannot listen due to nxdomain from inet:getaddr

[ns_server:error,2019-06-19T04:29:11.916Z,ns_1@couchbase-cluster-cbop-test-couchbase-cluster-0000.couchbase-cluster-cbop-test-couchbase-cluster.cbop-test.svc:<0.5317.105>:menelaus_web_alerts_srv:can_listen:530]Cannot listen due to nxdomain from inet:getaddr
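
For reference, this is roughly how I have been checking the probe status and pulling the log above (pod and namespace names are taken from my deployment):

# Show the readiness probe failure events for the first pod
kubectl describe pod couchbase-cluster-cbop-test-couchbase-cluster-0000 -n cbop-test

# Tail the ns_server error log from inside the pod
kubectl exec -n cbop-test couchbase-cluster-cbop-test-couchbase-cluster-0000 -- \
  tail -n 100 /opt/couchbase/var/lib/couchbase/logs/error.log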

I tried setting up Couchbase using the Helm charts as well as manually deploying each YAML file (the rough commands are sketched below the details).

Couchbase YAML source: Autonomous Operator 1.2 (Open Source Kubernetes), downloaded from couchbase.com
Helm repo: https://github.com/couchbase-partners/helm-charts
No changes were made to the deployment files.
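
The Helm path was along these lines; the repo URL and chart names are assumptions on my part based on the couchbase-partners repo, and the release names are illustrative:

# Add the partners chart repo (URL assumed from the GitHub Pages convention)
helm repo add couchbase https://couchbase-partners.github.io/helm-charts
helm repo update

# Install the operator and cluster charts with default values
# (Helm 2 syntax; chart and release names are illustrative)
helm install --name cb-operator couchbase/couchbase-operator
helm install --name cb-cluster couchbase/couchbase-cluster

The manual path used the stock files from the downloaded 1.2 package, roughly (exact file names may differ):

kubectl apply -f crd.yaml
kubectl apply -f operator.yaml
kubectl apply -f couchbase-cluster.yaml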

Couchbase Cluster:
baseImage: "couchbase/server"
version: "enterprise-6.0.1" (also tried "enterprise-6.0.0")

Operator Version: couchbase/operator:1.2.0

Deployed on IBM Cloud's Kubernetes Service.

That's definitely more of a question for the Couchbase Server team. The logs show the node name did change eventually, so the rename error is most likely misleading.

In general, 50-character DNS labels are a bad idea, as they will go over the 63-octet limit once we start allocating resources for persistent volume claims and services. Try using a shorter cluster name and see if that helps.
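
To make the arithmetic concrete (the suffix handling is simplified here), a quick check against the 63-octet-per-label limit from RFC 1035:

# The pod label generated for this cluster is already 50 characters:
echo -n "couchbase-cluster-cbop-test-couchbase-cluster-0000" | wc -c   # -> 50

# Suffixes appended for derived resources such as persistent volume claims
# and per-pod services can push a name past 63 octets, so a short
# metadata.name in the CouchbaseCluster spec (e.g. "cb-test") leaves headroom.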


As an aside, we set the readiness flag only once data has been balanced across the cluster, so a Kubernetes upgrade doesn't start blowing your data away (I'm guessing you don't want data loss?). This is how the system honors pod disruption budgets.
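
You can watch this gating happen; a minimal sketch, assuming the namespace from this thread:

# READY flips to 1/1 only after the rebalance completes
kubectl get pods -n cbop-test -w

# The operator-managed pod disruption budget for the cluster
kubectl get pdb -n cbop-test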

Thanks a lot. Shortening the name worked.
