Incorrect status after Couchbase restart

Hello,

When Kafka Connect is running and a Couchbase node is restarted (in a 3-node cluster), we start to see the exceptions below in the logs, mostly badmatch errors in the crash reports. I suppose these errors are related to the index service.
Kafka Connect Couchbase connector 3.4.2
Couchbase Community 6.0
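
For context, this is roughly how the source connector is set up; a minimal sketch only, with placeholder credentials and topic name, and property names taken from the 3.4.x sample config rather than copied from our exact file:

    name=couchbase-source
    connector.class=com.couchbase.connect.kafka.CouchbaseSourceConnector
    tasks.max=2
    # Couchbase seed node and bucket (hostname and bucket appear in the logs below)
    connection.cluster_address=el7323.ebc.local
    connection.bucket=shopping-basket
    connection.username=<username>
    connection.password=<password>
    # Kafka topic the document change events are published to (placeholder name)
    topic.name=shopping-basket-events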

[ns_server:error,2019-04-16T00:43:31.776+02:00,ns_1@el7323.ebc.local:service_status_keeper-index<0.439.0>:service_status_keeper:handle_cast:119]Service service_index returned incorrect status
[ns_server:error,2019-04-16T00:43:36.781+02:00,ns_1@el7323.ebc.local:service_status_keeper_worker<0.438.0>:rest_utils:get_json_local:63]Request to (indexer) getIndexStatus failed: {ok,
{{500,"Internal Server Error"},
[{"Content-Length","124"},
{"Date",
"Mon, 15 Apr 2019 22:43:36 GMT"},
{"Content-Type",
"application/json"}],
<<"{\"code\":\"error\",\"error\":\"Fail to retrieve cluster-wide metadata from index service\",\"failedNodes\":[\"el7324.ebc.local:9102\"]}">>}}
[ns_server:error,2019-04-16T00:43:36.781+02:00,ns_1@el7323.ebc.local:service_status_keeper-index<0.439.0>:service_status_keeper:handle_cast:119]Service service_index returned incorrect status
[ns_server:error,2019-04-16T00:43:41.784+02:00,ns_1@el7323.ebc.local:service_status_keeper_worker<0.438.0>:rest_utils:get_json_local:63]Request to (indexer) getIndexStatus failed: {ok,
{{500,"Internal Server Error"},
[{"Content-Length","124"},
{"Date",
"Mon, 15 Apr 2019 22:43:41 GMT"},
{"Content-Type",
"application/json"}],
<<"{\"code\":\"error\",\"error\":\"Fail to retrieve cluster-wide metadata from index service\",\"failedNodes\":[\"el7324.ebc.local:9102\"]}">>}}
[ns_server:error,2019-04-16T00:43:41.785+02:00,ns_1@el7323.ebc.local:service_status_keeper-index<0.439.0>:service_status_keeper:handle_cast:119]Service service_index returned incorrect status

[ns_server:debug,2019-04-15T23:57:22.949+02:00,ns_1@el7323.ebc.local:ns_memcached-shopping-basket<0.13178.3>:ns_memcached:init:158]Starting ns_memcached
[ns_server:debug,2019-04-15T23:57:22.949+02:00,ns_1@el7323.ebc.local:<0.12923.3>:ns_pubsub:do_subscribe_link:145]Parent process of subscription {bucket_info_cache_invalidations,<0.12922.3>} exited with reason shutdown
[ns_server:debug,2019-04-15T23:57:22.949+02:00,ns_1@el7323.ebc.local:<0.13179.3>:ns_memcached:run_connect_phase:181]Started 'connecting' phase of ns_memcached-shopping-basket. Parent is <0.13178.3>
[error_logger:error,2019-04-15T23:57:22.949+02:00,ns_1@el7323.ebc.local:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]** Generic server <0.12920.3> terminating
** Last message in was {'EXIT',<0.12925.3>,{badmatch,{error,einval}}}
** When Server state == {state,1,0,0,
{,},
{,},
{,},
connected,
{1555,365437,956520},
“shopping-basket”,#Port<0.23618>,
{interval,#Ref<0.0.2.79025>},
[{<0.12924.3>,#Ref<0.0.2.79053>},
{<0.12927.3>,#Ref<0.0.2.79052>},
{<0.12926.3>,#Ref<0.0.2.79051>}],
,undefined}
** Reason for termination ==
** {badmatch,{error,einval}}

[error_logger:error,2019-04-15T23:57:22.949+02:00,ns_1@el7323.ebc.local:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]Supervisor received unexpected message: {ack,<0.12920.3>,
{error,{badmatch,{error,einval}}}}

[error_logger:error,2019-04-15T23:57:22.950+02:00,ns_1@el7323.ebc.local:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]

Hi Serge,

Thank you for reporting this.

It's too soon to know for certain, but I suspect the root cause might be the same as CBES-127: the DCP client issues a HELO request with a user agent string that is too long.

You could confirm whether this is the issue by temporarily downgrading to version 3.4.1 of the Kafka connector.
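
In case it helps, downgrading is normally just a matter of replacing the connector plugin on the Connect worker's plugin.path and restarting the worker. A rough sketch, assuming a standalone worker with plugin.path=/opt/kafka/plugins (paths and file names are examples, not your actual layout):

    # stop the Connect worker, then remove the 3.4.2 plugin directory
    rm -rf /opt/kafka/plugins/kafka-connect-couchbase-3.4.2
    # unzip the 3.4.1 package (from the Couchbase download page) into the same plugin path
    unzip kafka-connect-couchbase-3.4.1.zip -d /opt/kafka/plugins/
    # restart the worker so it picks up the older plugin
    bin/connect-standalone.sh config/connect-standalone.properties config/couchbase-source.properties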

Thanks,
David

Hi David,

Thanks a lot for your suggestion; downgrading to 3.4.1 did indeed fix our problem.
I see similar behavior with Sync Gateway, but I guess the root cause is completely different there, since no Kafka connector is involved.

Regards,

Serge

Hi @serge,

Just wanted to let you know we’ve released version 3.4.4 of the Kafka connector with a fix for the issue you were seeing. Please let us know if you run into any more trouble.
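
If you want to double-check which version the worker actually picked up after the upgrade, the Connect REST API lists the installed plugins and their versions; for example (the port 8083 and the jq filter are just defaults I'm assuming, adjust to your setup):

    # list connector plugins known to the Connect worker, keeping only the Couchbase one
    curl -s http://localhost:8083/connector-plugins | jq '.[] | select(.class | contains("couchbase"))'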

Thanks,
David