Servers going down during XDCR


#1

I am getting the following errors when I try to run XDCR.
After the errors, all nodes go down (or show as pending).
Is there anything I should check before starting XDCR?

Control connection to memcached on 'ns_1@10.10.36.122' disconnected:
{{badmatch,{error,closed}},
 [{mc_client_binary,cmd_vocal_recv,5,[{file,"src/mc_client_binary.erl"},{line,151}]},
  {mc_client_binary,select_bucket,2,[{file,"src/mc_client_binary.erl"},{line,346}]},
  {ns_memcached,ensure_bucket,2,[{file,"src/ns_memcached.erl"},{line,1269}]},
  {ns_memcached,handle_info,2,[{file,"src/ns_memcached.erl"},{line,744}]},
  {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
  {ns_memcached,init,1,[{file,"src/ns_memcached.erl"},{line,171}]},
  {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
  {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

It is followed by this:

Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 137. Restarting. Messages:
Mon Mar 2 14:34:58.096133 KST 3: (default) DCP (Producer) eq_dcpq:xdcr:default-d26f8afda309eef8fbc3f77e27f654d2 - (vb 669) Backfill complete, 4636 items read from disk, last seqno read: 45965
Mon Mar 2 14:34:58.096167 KST 3: (default) Backfill task (1 to 45965) finished for vb 669 disk seqno 45965 memory seqno 45965
Mon Mar 2 14:34:58.108894 KST 3: (default) DCP (Producer) eq_dcpq:xdcr:default-32599121cf8fda7183c7e539896b07ea - (vb 972) Sending disk snapshot with start seqno 0 and end seqno 45349
Mon Mar 2 14:34:59.921341 KST 3: (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/my_view (prod/main) - (vb 657) Stream closing, 12210 items sent from disk, 0 items sent from memory, 42462 was last seqno sent
Mon Mar 2 14:35:00.037592 KST 3: (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/my_view (prod/main) - (vb 0) stream created with start seqno 20324 and end seqno 20327
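
A note on "exited with status 137": under the usual shell convention, an exit status above 128 means the process was killed by signal (status - 128). Assuming the babysitter reports the status that way, the small sketch below decodes it. The function name describe_exit_status is made up for illustration and is not part of Couchbase.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: values above 128 follow the common convention
    "killed by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(137))
```

On Linux this reports signal 9, SIGKILL. A SIGKILL that nobody sent by hand is often the kernel's out-of-memory killer, which would be plausible on an 8 GB node, but that is only a guess here; the kernel log (for example via dmesg) would confirm or rule it out.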


The hardware spec of the source cluster is:

  • 4 vCPU
  • 8 GB memory
  • Couchbase Server CE 3.0.1

The hardware spec of the destination cluster is:

  • AWS EC2
  • 2 vCPU
  • 15 GiB memory
  • Couchbase Server CE 3.0.1

XDCR settings for the source cluster:

  • Max Replications per Bucket: 8
  • Workers per Replication: 2
  • Checkpoint Interval: 1800
  • Batch Count: 500
  • Batch Size (kB): 2048
  • Failure Retry Interval: 15
  • Optimistic Replication Threshold: 256

Could these settings be too high for the hardware listed above?


#2

@Dynamicscope Looking at the hardware spec of your nodes, they look quite undersized for XDCR. The typical recommendation is 4 CPU cores for key-value operations, plus 1 extra core for each bucket replicated over XDCR and 1 extra core for each design document on the cluster.
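
To make that rule of thumb concrete, here is a rough sketch of the arithmetic. It assumes one replicated bucket ("default") and one design document ("my_view"), since those are the only ones visible in the logs above; the real counts may be higher. The function name recommended_cores is hypothetical.

```python
def recommended_cores(replicated_buckets: int,
                      design_documents: int,
                      base_kv_cores: int = 4) -> int:
    """Rule of thumb from this thread: 4 cores for key-value operations,
    plus 1 per bucket replicated over XDCR, plus 1 per design document."""
    return base_kv_cores + replicated_buckets + design_documents


# Assumed workload: 1 replicated bucket and 1 design document.
needed = recommended_cores(replicated_buckets=1, design_documents=1)

# vCPU counts taken from the original post.
clusters = {"source": 4, "destination": 2}
for name, vcpus in clusters.items():
    shortfall = max(0, needed - vcpus)
    print(f"{name}: has {vcpus} vCPU, recommended {needed}, short by {shortfall}")
```

Under those assumptions both clusters fall below the recommendation, the destination more so than the source.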