Couchbase down for "read error"


#1

Couchbase version: 2.1.0 Enterprise Edition (build-718-rel)
The Couchbase server went down abnormally; below are some of the logs.
Brief info from memcached.log:

Thu Mar 6 15:45:08.311128 CST 3: 50 Closing connection due to read error: Connection reset by peer
Thu Mar 6 15:45:08.385636 CST 3: (presence) Trying to connect to mccouch: "127.0.0.1:11213"
Thu Mar 6 15:45:08.385976 CST 3: (presence) Connected to mccouch: "127.0.0.1:11213"
Thu Mar 6 15:45:09.304519 CST 3: Extension support isn't implemented in this version of bucket_engine
Thu Mar 6 15:45:09.325151 CST 3: (presence) Failed to load mutation log, falling back to key dump
Thu Mar 6 15:45:09.611628 CST 3: (presence) Shutting down tap connections!
Thu Mar 6 15:45:09.611687 CST 3: (presence) Stopping warmup while engine is loading data from underlying storage, shutdown = yes
Thu Mar 6 15:45:09.611786 CST 3: (presence) warmup completed in 1222 ms
Thu Mar 6 15:56:02.109629 CST 3: (usercenter) Trying to connect to mccouch: "127.0.0.1:11213"
Thu Mar 6 15:56:02.109956 CST 3: (usercenter) Connected to mccouch: "127.0.0.1:11213"
Thu Mar 6 15:56:02.125553 CST 3: (usercenter) Warning: failed to load the engine session stats due to IO exception "basic_ios::clear"
Thu Mar 6 15:56:02.125620 CST 3: (usercenter) Failed to load mutation log, falling back to key dump
Thu Mar 6 15:56:02.125706 CST 3: (usercenter) metadata loaded in 13 ms
Thu Mar 6 15:56:02.125697 CST 3: Extension support isn't implemented in this version of bucket_engine
Thu Mar 6 15:56:02.125852 CST 3: (usercenter) warmup completed in 13 ms
Thu Mar 6 15:56:03.649318 CST 3: (usercenter) TAP (Consumer) eq_tapq:anon_1 - Reset vbucket 390 was completed succecssfully...
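To get a feel for how often the "Closing connection due to read error" entries occur and how long warmups take per bucket, a rough scan of memcached.log can help. A minimal sketch, assuming the message formats shown above (the script and its invocation are only illustrative):

```python
import re
import sys

# Rough scan of memcached.log: count connection read errors and list warmup times.
# Pass the path to memcached.log as the first argument.
log_path = sys.argv[1]

read_errors = 0
warmups = []  # (bucket, milliseconds)

with open(log_path, errors="replace") as f:
    for line in f:
        if "Closing connection due to read error" in line:
            read_errors += 1
        m = re.search(r"\((\w+)\) warmup completed in (\d+) ms", line)
        if m:
            warmups.append((m.group(1), int(m.group(2))))

print(f"read errors: {read_errors}")
for bucket, ms in warmups:
    print(f"warmup ({bucket}): {ms} ms")
```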
Info from error.log:

[ns_server:error,2014-03-06T14:56:29.436,ns_1@192.168.100.106:<0.26823.1>:ns_memcached:verify_report_long_call:294]call topkeys took too long: 633980 us

[error_logger:error,2014-03-06T15:32:12.320,ns_1@192.168.100.106:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server 'couch_stats_reader-presence' terminating
** Last message in was refresh_stats
** When Server state == {state,"presence",1394091122315,[]}
** Reason for termination ==
** {timeout,{gen_server,call,
   [dir_size,
    {dir_size,"/opt/couchbase/var/lib/couchbase/data/presence"}]}}

[error_logger:error,2014-03-06T15:32:12.359,ns_1@192.168.100.106:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: couch_stats_reader:init/1
pid: <0.26815.1>
registered_name: 'couch_stats_reader-presence'
exception exit: {timeout,
{gen_server,call,
[dir_size,
{dir_size,
"/opt/couchbase/var/lib/couchbase/data/presence"}]}}
in function gen_server:terminate/6
ancestors: ['single_bucket_sup-presence',<0.26794.1>]
messages: [refresh_stats]
links: [<0.26795.1>,<0.294.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 6765
stack_size: 24
reductions: 112789821687
neighbours:
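The crash above is a timeout in a gen_server:call to dir_size for /opt/couchbase/var/lib/couchbase/data/presence; the default gen_server:call timeout is 5 seconds, so it is worth checking whether simply walking that directory already takes that long (very large data files, or a slow/overloaded disk). A minimal sketch that times a plain directory walk from Python (this is only an approximation, not Couchbase's own dir_size code):

```python
import os
import time

# Time how long it takes to walk and size the bucket data directory.
# Path taken from the error report above; adjust if yours differs.
DATA_DIR = "/opt/couchbase/var/lib/couchbase/data/presence"

start = time.monotonic()
total_bytes = 0
file_count = 0
for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        try:
            total_bytes += os.path.getsize(os.path.join(root, name))
            file_count += 1
        except OSError:
            pass  # files may be compacted/removed while we walk

elapsed = time.monotonic() - start
print(f"{file_count} files, {total_bytes / 1024**2:.1f} MiB, walked in {elapsed:.2f} s")
if elapsed > 5:
    print("walking the directory alone exceeds the default 5 s gen_server:call timeout")
```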

And:

[ns_server:info,2014-03-06T15:45:06.037,ns_1@192.168.100.106:ns_config_events<0.276.0>:ns_ports_setup:memcached_force_killer_fn:159]Sent force death command to own memcached: {send_to_port,<>}
[ns_server:info,2014-03-06T15:45:06.607,ns_1@192.168.100.106:<0.1512.0>:mc_connection:run_loop:202]mccouch connection was normally closed
[ns_server:info,2014-03-06T15:45:06.609,ns_1@192.168.100.106:<0.16942.0>:mc_connection:run_loop:202]mccouch connection was normally closed
[ns_server:info,2014-03-06T15:45:06.671,ns_1@192.168.100.106:<0.26822.1>:mc_connection:run_loop:202]mccouch connection was normally closed

[error_logger:error,2014-03-06T15:45:06.926,ns_1@192.168.100.106:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.4224.0>
registered_name: []
exception error: no match of right hand side value {error,closed}
in function mc_client_binary:cmd_binary_vocal_recv/5
in call from mc_client_binary:wait_for_checkpoint_persistence/3
in call from ns_memcached:perform_wait_for_checkpoint_persistence/5
ancestors: ['ns_memcached-usercenter','single_bucket_sup-usercenter', <0.1487.0>]
messages: []
links: [<0.1501.0>,#Port<0.8699>]
dictionary: []
trap_exit: false
status: running
heap_size: 121393
stack_size: 24
reductions: 11072
neighbours:

[rebalance:warn,2014-03-06T15:45:06.928,ns_1@192.168.100.106:<0.20193.3>:ebucketmigrator_srv:do_confirm_sent_messages:744]Got error while trying to read close ack:{error,einval}

[ns_server:debug,2014-03-06T15:45:06.046,babysitter_of_ns_1@127.0.0.1:<0.81.0>:ns_port_server:handle_info:80]Sending the following to port: <<"die!\n">>
[ns_server:info,2014-03-06T15:45:06.340,babysitter_of_ns_1@127.0.0.1:<0.81.0>:ns_port_server:log:168]memcached<0.81.0>: 'die!' on stdin. Exiting super-quickly

[error_logger:error,2014-03-06T15:45:06.433,babysitter_of_ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.81.0> terminating
** Last message in was {#Port<0.2964>,{exit_status,0}}
** When Server state == {state,#Port<0.2964>,memcached,
{["‘die!’ on stdin. Exiting super-quickly",
“Thu Mar 6 15:45:05.449182 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.107 - disconnected, keep alive for 300 seconds”,
“Thu Mar 6 15:45:05.442200 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - disconnected, keep alive for 300 seconds”,
“Thu Mar 6 15:45:04.780402 CST 3: (usercenter) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.107 - disconnected, keep alive for 300 seconds”],
“Thu Sep 26 12:02:50.878582 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_184 - Sending TAP_VBUCKET_SET with vbucket 184 and state “pending””,
"Thu Sep 26 12:02:50.878775 CST 3: (presence) Clean up “eq_tapq:rebalance_183"”,
“Thu Sep 26 12:02:51.316374 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_181_‘ns_1@192.168.100.107’ - Connection is closed by force”,
“Thu Sep 26 12:02:51.372818 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_181_‘ns_1@192.168.100.107’ - Clear the tap queues by force”,
“Thu Sep 26 12:02:51.375258 CST 3: (presence) TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_181>”,
“Thu Sep 26 12:02:51.375319 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_181 - disconnected”,
“Thu Sep 26 12:02:51.380750 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_182_‘ns_1@192.168.100.107’ - Connection is closed by force”,
“Thu Sep 26 12:02:51.437432 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_182 - Sending TAP_VBUCKET_SET with vbucket 182 and state “pending””,
“Thu Sep 26 12:02:51.439579 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_182 - VBucket <182> is going dead to complete vbucket takeover”,
“Thu Sep 26 12:02:51.440348 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_182 - Sending TAP_VBUCKET_SET with vbucket 182 and state “active””,
“Thu Sep 26 12:02:52.274523 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_180_‘ns_1@192.168.100.107’ - Connection is closed by force”,
“Thu Sep 26 12:02:52.323935 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_180 - Sending TAP_VBUCKET_SET with vbucket 180 and state “pending””,
"Thu Sep 26 12:02:52.324120 CST 3: (presence) Clean up “eq_tapq:anon_11304"”,
“Thu Sep 26 12:02:52.326633 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_180 - disconnected”,
“Thu Sep 26 12:02:52.672789 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_179_‘ns_1@192.168.100.107’ - Schedule the backfill for vbucket 179”,
"Thu Sep 26 12:02:52.673061 CST 3: (presence) Clean up “eq_tapq:rebalance_180"”,
“Thu Sep 26 12:02:52.673085 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_180 - Clear the tap queues by force”,
“Thu Sep 26 12:02:52.881707 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_179_‘ns_1@192.168.100.107’ - Connection is closed by force”,
“Thu Sep 26 12:02:52.932036 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_179 - Sending TAP_VBUCKET_SET with vbucket 179 and state “pending””,
"Thu Sep 26 12:02:52.932256 CST 3: (presence) Clean up “eq_tapq:anon_11305"”,
“Thu Sep 26 12:02:52.932295 CST 3: (presence) TAP (Producer) eq_tapq:replication_building_179_‘ns_1@192.168.100.107’ - Clear the tap queues by force”,
“Thu Sep 26 12:02:52.933681 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_179 - VBucket <179> is going dead to complete vbucket takeover”,
“Thu Sep 26 12:02:52.940179 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_179 - Sending TAP_VBUCKET_SET with vbucket 179 and state “active””,
“Thu Sep 26 12:02:52.940871 CST 3: (presence) TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_179>”,
“Thu Sep 26 12:02:52.940952 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_179 - disconnected”,
"Thu Sep 26 12:02:53.941409 CST 3: (presence) Clean up “eq_tapq:rebalance_179"”,
“Thu Sep 26 12:02:53.941490 CST 3: (presence) TAP (Producer) eq_tapq:rebalance_179 - Clear the tap queues by force”,
“Sun Jan 12 01:01:18.857736 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - disconnected, keep alive for 300 seconds”,
“Sun Jan 12 01:01:20.716510 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - disconnected, keep alive for 300 seconds”,
“Sun Jan 12 01:01:21.966367 CST 3: (usercenter) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - disconnected, keep alive for 300 seconds”,
“Sun Jan 12 01:01:25.295263 CST 3: (usercenter) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Connection is closed by force”,
“Sun Jan 12 01:01:25.314835 CST 3: (usercenter) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Sending TAP_OPAQUE with command “enable_checkpoint_sync” and vbucket 0”,
"Sun Jan 12 01:01:25.323603 CST 3: (usercenter) Clean up “eq_tapq:anon_11306"”,
“Sun Jan 12 01:01:25.323791 CST 3: (usercenter) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Clear the tap queues by force”,
“Sun Jan 12 01:01:25.394223 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Connection is closed by force”,
“Sun Jan 12 01:01:25.415021 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Sending TAP_OPAQUE with command “opaque_enable_auto_nack” and vbucket 0”,
“Sun Jan 12 01:01:25.420986 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Sending TAP_OPAQUE with command “enable_checkpoint_sync” and vbucket 0”,
“Sun Jan 12 01:01:25.426694 CST 3: (contacts) Clean up “eq_tapq:anon_11307"”,
“Sun Jan 12 01:01:25.426744 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Clear the tap queues by force”,
“Sun Jan 12 01:01:25.481429 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Connection is closed by force”,
“Sun Jan 12 01:01:25.487644 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Sending TAP_OPAQUE with command “opaque_enable_auto_nack” and vbucket 0”,
“Sun Jan 12 01:01:25.487665 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Sending TAP_OPAQUE with command “enable_checkpoint_sync” and vbucket 0”,
“Sun Jan 12 01:01:25.488482 CST 3: (presence) Clean up “eq_tapq:anon_11308"”,
“Sun Jan 12 01:01:25.488695 CST 3: (presence) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - Clear the tap queues by force”,
“Thu Mar 6 15:44:26.735574 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - disconnected, keep alive for 300 seconds”,
“Thu Mar 6 15:45:04.765134 CST 3: (usercenter) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.104 - disconnected, keep alive for 300 seconds”,
“Thu Mar 6 15:45:04.779465 CST 3: (contacts) TAP (Producer) eq_tapq:replication_ns_1@192.168.100.107 - disconnected, keep alive for 300 seconds”]},
{ok,{1394091906330,#Ref<0.0.18.34453>}},
[”‘die!’ on stdin. Exiting super-quickly”],
0,true}
** Reason for termination ==
** {abnormal,0}

Any suggestions for resolving this problem? Any help would be much appreciated.


#2

Will this problem happen if I change the date using the 'ntpdate' command?
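If the system clock was stepped backwards (for example by ntpdate), one rough way to check is to scan memcached.log for timestamps that go backwards between consecutive lines. A minimal sketch, assuming the timestamp format shown in the excerpts above; note the log lines carry no year, so a legitimate December-to-January rollover would also be flagged:

```python
import re
import sys
from datetime import datetime

# Flag places where memcached.log timestamps jump backwards, which could
# indicate the system clock being stepped back (e.g. by ntpdate).
TS_RE = re.compile(r"^\w{3} (\w{3}\s+\d+ \d{2}:\d{2}:\d{2}\.\d+) ")

prev = None
with open(sys.argv[1], errors="replace") as f:
    for lineno, line in enumerate(f, 1):
        m = TS_RE.match(line)
        if not m:
            continue
        # The log has no year; pin one so the values are comparable.
        ts = datetime.strptime("2014 " + m.group(1), "%Y %b %d %H:%M:%S.%f")
        if prev is not None and ts < prev:
            print(f"line {lineno}: timestamp went backwards ({prev} -> {ts})")
        prev = ts
```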