We have had 2 servers in our cluster running for over 100 days without any issues. Today I went to add a new server to the cluster.
The server was added successfully, but when I then started a rebalance, it got to about 18% and failed with errors (which I unfortunately did not capture). One of the original servers then went into a loop of warming up, and soon the second server started doing the same thing.
In the end both servers became completely unreachable, running at 100% CPU (I couldn't even ssh in).
Now they seem to be available to the cluster again, but they can't seem to finish warming up.
I am getting a bunch of errors like this one for different buckets:
```
Compactor for view access/_design/main
(pid [{type,view},
      {name,<<"access/_design/main">>},
      {important,false},
      {fa,{#Fun<compaction_new_daemon.25.86110551>,
           [<<"access">>,<<"_design/main">>,
            {config,{30,undefined},{30,undefined},undefined,false,false,
                    {daemon_config,30,131072,20971520}},
            false,
            {[{type,bucket}]}]}}]) terminated unexpectedly (ignoring this):
{badmatch,
 {error,
  {{case_clause,
    {{error,vbucket_stream_not_found},
     {bufsocket,#Port<11670.12123>,<<>>}}},
   [{couch_dcp_client,init,1,
     [{file,"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/couch_dcp/src/couch_dcp_client.erl"},
      {line,312}]},
    {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
    {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}}}
```
Plus others like this for "projector" and "indexer":
```
Service 'projector' exited with status 134. Restarting. Messages:
github.com/couchbase/indexing/secondary/projector.(*Projector).doMutationTopic(0xc4201260a0, 0xc44cc6dc20, 0xc44cc60016, 0x0, 0x0)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/projector/projector.go:375 +0x32b fp=0xc42f0f6f18 sp=0xc42f0f6d98
github.com/couchbase/indexing/secondary/projector.(*Projector).handleRequest(0xc4201260a0, 0x11e0100, 0xc4265f5800, 0x16)
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/projector/adminport.go:94 +0x562 fp=0xc42f0f6f90 sp=0xc42f0f6f18
runtime.goexit()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.3/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42f0f6f98 sp=0xc42f0f6f90
created by github.com/couchbase/indexing/secondary/projector.(*Projector).mainAdminPort
    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/projector/adminport.go:70 +0x8dc
[goport(/opt/couchbase/bin/projector)] 2019/01/22 14:24:32 child process exited with status 134
```
What does all this mean? What is going on?
All the nodes are on the same version (5.1.1 build 5723). The nodes won't seem to come out of the warmup state.
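For reference, this is roughly how I've been checking warmup progress on each node with `cbstats` (a sketch, assuming the default data port 11210, the `access` bucket from the errors above, and placeholder credentials; adjust for your setup):

```shell
# Check warmup progress on a node (run locally on that node).
# Assumes default data port 11210 and the "access" bucket; replace the
# placeholder credentials with real ones.
/opt/couchbase/bin/cbstats localhost:11210 \
    -u Administrator -p password -b access warmup

# Fields to watch in the output:
#   ep_warmup_state   - current warmup phase
#   ep_warmup_thread  - "complete" once warmup has finished
```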
HELP!
Thanks,
Scott