Couchbase 2.5.0 "CRASH REPORT"s with subsequent connection timeouts

Couchbase version: 2.5.0
RAM Overview: Total Allocated (1 GB) Total in Cluster (16 GB) In Use (33.8 MB) Unused (990 MB) Unallocated (15 GB)
Disk Overview: Usable Free Space (40.6 GB) Total Cluster Storage (44.1 GB) In Use (82.5 MB) Other Data (3.45 GB) Free (40.6 GB)
Active Servers: 1 Servers Failed Over: 0 Servers Down: 0 Servers Pending Rebalance: 0
Number of buckets: 1 Item Count: 5369 Number of views: 4

Our Couchbase server is logging unexpected "CRASH REPORT"s (below), and for the next 8 to 12 hours our app server reports connection timeouts when trying to connect to it.

During the apparent crash, these logs are generated exactly every 5 seconds, for about 6 minutes. No suspicious logs precede the crash. No errors appear in the log during the downtime that follows, although other messages like “Compacting” and “data_size” continue to be logged.

What is happening and how do we prevent it? Do “timeout,” “gen_server,” and “dir_size” give any hint about the cause?

    [ns_server:info,2015-04-27T8:37:36.714,ns_1@127.0.0.1:<0.17870.785>:compaction_daemon:try_to_cleanup_indexes:650]Cleaning up indexes for bucket `mystuff`
[ns_server:info,2015-04-27T8:37:36.723,ns_1@127.0.0.1:<0.17870.785>:compaction_daemon:spawn_bucket_compactor:609]Compacting bucket mystuff with config:
[{database_fragmentation_threshold,{30,undefined}},
 {view_fragmentation_threshold,{30,undefined}}]
[error_logger:error,2015-04-27T8:37:44.971,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server 'couch_stats_reader-mystuff' terminating
** Last message in was refresh_stats
** When Server state == {state,"mystuff",1430123854959,
                               [{"17bfc8036cc49a20cbdc7d7ec83c667b",662156,
                                 517894,1336},
                                {"4d3c7fa0e7897939968321f195ca9dd7",735419,
                                 558102,49},
                                {"999107dcf8e2455fb1423ec2e4aefb44",555319,
                                 450652,833},
                                {"e1e0f0950314f2f69c52e5a5166712ce",588293,
                                 472803,2079}]}
** Reason for termination ==
** {timeout,{gen_server,call,
                        [dir_size,
                         {dir_size,"/opt/couchbase/var/lib/couchbase/data/mystuff"}]}}

[error_logger:error,2015-04-27T8:37:44.971,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
  crasher:
    initial call: couch_stats_reader:init/1
    pid: <0.7621.689>
    registered_name: 'couch_stats_reader-mystuff'
    exception exit: {timeout,
                        {gen_server,call,
                            [dir_size,
                             {dir_size,
                                 "/opt/couchbase/var/lib/couchbase/data/mystuff"}]}}
      in function  gen_server:terminate/6
    ancestors: ['single_bucket_sup-mystuff',<0.6601.0>]
    messages: [refresh_stats]
    links: [<0.6602.0>,<0.6447.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 6765
    stack_size: 24
    reductions: 3301239623
  neighbours:

[error_logger:error,2015-04-27T8:37:44.973,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
     Supervisor: {local,'single_bucket_sup-mystuff'}
     Context:    child_terminated
     Reason:     {timeout,
                     {gen_server,call,
                         [dir_size,
                          {dir_size,
                              "/opt/couchbase/var/lib/couchbase/data/mystuff"}]}}
     Offender:   [{pid,<0.7621.689>},
                  {name,{couch_stats_reader,"mystuff"}},
                  {mfargs,{couch_stats_reader,start_link,["mystuff"]}},
                  {restart_type,permanent},
                  {shutdown,1000},
                  {child_type,worker}]


[error_logger:info,2015-04-27T8:37:44.973,ns_1@127.0.0.1:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
=========================PROGRESS REPORT=========================
          supervisor: {local,'single_bucket_sup-mystuff'}
             started: [{pid,<0.17936.785>},
                       {name,{couch_stats_reader,"mystuff"}},
                       {mfargs,{couch_stats_reader,start_link,["mystuff"]}},
                       {restart_type,permanent},
                       {shutdown,1000},
                       {child_type,worker}]

Hello,

Can you tell me a bit more about the operations that are timing out? What are they trying to do?

Also, what does your disk set-up look like? Are you running from a SAN or perhaps in a cloud provider without provisioned iops?
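
The reason I ask about disk: the crash reason in your log is a timed-out gen_server call to dir_size on the bucket's data directory. In Erlang/OTP, gen_server:call defaults to a 5000 ms timeout, which would be consistent with the 5-second cadence you see, though I am assuming the default is in use here. A rough way to check whether simply walking that directory is slow is to time a size walk yourself. The sketch below is not how Couchbase computes the size; `timed_dir_size` is just an illustrative helper.

```python
import os
import time


def timed_dir_size(path):
    """Return (total_bytes, elapsed_seconds) for a recursive walk of path."""
    start = time.monotonic()
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files can disappear mid-walk (for example during compaction).
                pass
    return total, time.monotonic() - start


if __name__ == "__main__":
    # Path taken from the crash report above.
    size, secs = timed_dir_size("/opt/couchbase/var/lib/couchbase/data/mystuff")
    print(f"{size} bytes in {secs:.3f}s")
```

If this takes seconds rather than milliseconds, or hangs, the storage underneath the data directory is the first thing to look at.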

Cheers,

Matthew.

I checked our /var/log/syslog and discovered that our NFS server reported being down during the exact same 6-minute timespan. There were no external queries at that moment. I don't yet know where our Couchbase instance is pointed at an NFS mount, but, apart from the opaque error message, this is certainly not a Couchbase problem.
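
For anyone else trying to find out whether a Couchbase path sits on NFS, here is a minimal sketch, assuming a Linux host that exposes /proc/mounts. The helper names (`parse_mounts`, `mount_for`) and the set of network filesystem types are my own choices for illustration, not anything from Couchbase.

```python
import os

# Filesystem types treated as "network" here; extend as needed (assumption).
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, device) tuples.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does not
    decode those escapes.
    """
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fs_type = fields[:3]
            mounts.append((mount_point, fs_type, device))
    return mounts


def mount_for(path, mounts):
    """Return the entry whose mount point is the longest path prefix of path."""
    best = None
    for entry in mounts:
        mp = entry[0]
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best[0]):
                best = entry
    return best


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    # Resolve symlinks first so a linked data directory is attributed correctly.
    data_dir = os.path.realpath("/opt/couchbase/var/lib/couchbase/data/mystuff")
    with open("/proc/mounts") as f:
        entry = mount_for(data_dir, parse_mounts(f.read()))
    on_network = entry is not None and entry[1] in NETWORK_FS
    print(entry, "(network filesystem)" if on_network else "(local filesystem)")
```

Running it against the data directory from the crash report prints the mount point, filesystem type, and device backing that path.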

Thank you for having a look at this!