Memcached, couch_view_grou and couch_view_index_updater crashing


#1

We are running a 4-node (m1.large) Couchbase Server (3.0) cluster with a single bucket on AWS.
The bucket has 6 design docs and 24 views. 4 design docs have 1 view each and the other 2 design
docs have 10 views each (all views in a single design doc need to be updated at the same time).
21 views emit around 10000 KV pairs for each input document.
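
To put the view load in perspective, here is a rough back-of-the-envelope sketch in Python. It only multiplies the figures quoted in this post (21 heavy views, about 10000 emitted pairs per document per view, and 6 million as a lower bound on our document count). The function and constant names are made up for illustration, and the result says nothing about bytes on disk or in RAM.

```python
# Rough estimate of emitted view rows, using only figures quoted in this post.
DOCS = 6_000_000                 # lower bound on documents in the bucket
HEAVY_VIEWS = 21                 # views emitting ~10000 KV pairs per document
ROWS_PER_DOC_PER_VIEW = 10_000   # approximate emits per document per view


def emitted_rows(docs: int, views: int, rows_per_doc_per_view: int) -> int:
    """Total number of KV pairs emitted across all listed views."""
    return docs * views * rows_per_doc_per_view


rows_per_doc = HEAVY_VIEWS * ROWS_PER_DOC_PER_VIEW
total_rows = emitted_rows(DOCS, HEAVY_VIEWS, ROWS_PER_DOC_PER_VIEW)

print(f"rows emitted per document:  {rows_per_doc:,}")
print(f"rows emitted across bucket: {total_rows:,}")
```

So each document fans out into roughly 210,000 index rows, which is the scale the view updater processes have to handle.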

We have 6+ million documents in the bucket with 250 KB.
We have seen memcached and couch_view_grou crashing (with core dumps) on the server nodes.

[73053747.099756] CPU: 0 PID: 5421 Comm: couch_view_grou Not tainted 3.14.27-25.47.amzn1.x86_64 #1
[73053747.298901] [ 5421] 220 5421 1329977 821468 2310 339656 0 couch_view_grou
[73056339.110385] [ 5421] 220 5421 1674865 852643 2983 653367 0 couch_view_grou
[73056339.110406] Out of memory: Kill process 5421 (couch_view_grou) score 507 or sacrifice child
[73056339.110413] Killed process 5421 (couch_view_grou) total-vm:6699460kB, anon-rss:3410572kB, file-rss:0kB
[73063080.407038] [18935] 220 18935 1675294 853237 2986 653209 0 couch_view_grou
[73063080.407057] Out of memory: Kill process 18935 (couch_view_grou) score 507 or sacrifice child
[73063080.407069] Killed process 18935 (couch_view_grou) total-vm:6701176kB, anon-rss:3412948kB, file-rss:0kB
[73142477.159780] [23551] 220 23551 1245719 785890 2146 291063 0 couch_view_grou
[73143682.811862] [23551] 220 23551 1709495 534046 3052 1006620 0 couch_view_grou
[73143682.811894] Out of memory: Kill process 23551 (couch_view_grou) score 519 or sacrifice child
[73143682.811901] Killed process 23551 (couch_view_grou) total-vm:6837980kB, anon-rss:2136184kB, file-rss:0kB
[73147150.274977] [27851] 220 27851 1662953 919418 2961 574693 0 couch_view_grou
[73147150.274996] Out of memory: Kill process 27851 (couch_view_grou) score 503 or sacrifice child
[73147150.275010] Killed process 27851 (couch_view_grou) total-vm:6651812kB, anon-rss:3677672kB, file-rss:0kB

[72967894.981971] [ 1447] 220 1447 1748924 973626 3358 701916 0 memcached
[72967894.982139] Out of memory: Kill process 1447 (memcached) score 564 or sacrifice child
[72967894.982146] Killed process 1447 (memcached) total-vm:6995696kB, anon-rss:3894504kB, file-rss:0kB
[73053747.298879] [ 2994] 220 2994 1719039 967007 3295 677833 0 memcached
[73053747.298912] Out of memory: Kill process 2994 (memcached) score 554 or sacrifice child
[73053747.298925] Killed process 2994 (memcached) total-vm:6876156kB, anon-rss:3868028kB, file-rss:0kB
[73056339.110393] [32237] 220 32237 1342404 908600 2563 363891 0 memcached
[73063080.407029] [32237] 220 32237 1344940 910671 2567 363357 0 memcached
[73142477.158852] memcached invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[73142477.158872] memcached cpuset=/ mems_allowed=0
[73142477.158879] CPU: 0 PID: 32237 Comm: memcached Not tainted 3.14.27-25.47.amzn1.x86_64 #1
[73142477.159766] [32237] 220 32237 1766972 982807 3393 712839 0 memcached
[73142477.159790] Out of memory: Kill process 32237 (memcached) score 571 or sacrifice child
[73142477.159798] Killed process 32237 (memcached) total-vm:7067888kB, anon-rss:3931228kB, file-rss:0kB
[73143682.811869] [26827] 220 26827 1323342 1252183 2525 0 0 memcached
[73147150.274948] [26827] 220 26827 1357686 859512 2593 427674 0 memcached
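
To make the task-dump rows above easier to read, here is a minimal Python sketch that converts them to KiB. It assumes the columns are `[pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name` and that counts are in 4 KiB pages. That assumption is at least consistent with the log: for PID 5421, 1674865 pages x 4 = 6699460 kB, which matches the `total-vm` value in the "Killed process" line. The helper name `parse_row` is hypothetical.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages (x86_64 default)

# Assumed column layout of an oom-killer task-dump row:
# [timestamp] [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[\s*(?P<pid>\d+)\]\s+\d+\s+\d+\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+\d+\s+(?P<swapents>\d+)\s+"
    r"-?\d+\s+(?P<name>\S+)"
)


def parse_row(line):
    """Return per-process memory in KiB for one task-dump row, or None."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_kib": int(m.group("total_vm")) * PAGE_KIB,
        "rss_kib": int(m.group("rss")) * PAGE_KIB,
        "swap_kib": int(m.group("swapents")) * PAGE_KIB,
    }


# Two rows from the same OOM event (timestamp 73053747.2988xx) in the log above.
sample = [
    "[73053747.298901] [ 5421] 220 5421 1329977 821468 2310 339656 0 couch_view_grou",
    "[73053747.298879] [ 2994] 220 2994 1719039 967007 3295 677833 0 memcached",
]

rows = [parse_row(line) for line in sample]
combined_rss_kib = sum(r["rss_kib"] for r in rows)

for r in rows:
    print(f'{r["name"]:<16} pid={r["pid"]:<6} rss={r["rss_kib"]} KiB swap={r["swap_kib"]} KiB')
print(f"combined rss: {combined_rss_kib} KiB")
```

Read this way, the two processes together were holding about 7153900 KiB of resident memory at the moment the OOM killer fired.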

GDB output for the core dumps:

$ gdb core.19238
GNU gdb (GDB) Amazon Linux (7.6.1-51.24.amzn1)
Copyright © 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-amazon-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
[New LWP 19255]
[New LWP 19247]
[New LWP 19250]
[New LWP 19251]
[New LWP 19238]
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='debug' install /usr/lib/debug/.build-id/00/2d878a64b2e42cdba05b4dde0654f4f7664e6d
Core was generated by `/opt/couchbase/bin/couch_view_index_updater'.
Program terminated with signal 6, Aborted.
#0 0x00007f2667e1abe9 in ?? ()
"/opt/couchbase/var/lib/couchbase/core.19238" is a core file.
Please specify an executable to debug.
(gdb) bt
#0 0x00007f2667e1abe9 in ?? ()
#1 0x00007f2667e1bfe8 in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()
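
The backtrace above has no symbols because gdb was started with only the core file (hence "Please specify an executable to debug"). A minimal sketch of how the same core could be reopened together with the binary is below. The two paths are taken from the gdb output above, `--batch` and `-ex` are standard gdb options, and the helper name `gdb_backtrace_command` is made up for illustration. Without the debuginfo package that gdb asks for, some frames may still show as `??`.

```python
import shlex


def gdb_backtrace_command(executable, core):
    """Build a gdb batch command that prints backtraces for all threads."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", executable, core]


cmd = gdb_backtrace_command(
    "/opt/couchbase/bin/couch_view_index_updater",   # from "Core was generated by"
    "/opt/couchbase/var/lib/couchbase/core.19238",   # core file path reported by gdb
)

# Print a copy-pasteable shell command; on the node itself the list could be
# passed to subprocess.run(cmd) instead.
print(shlex.join(cmd))
```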

We are using following AMI:

couchbase_server_community_x86_64_3.0.1-afc4c29d-a672-4442-82ce-81aed7ea4d18-ami-70aed018.2 (ami-c398c6f3)

pre-installed Couchbase Server 3.0.1, Community Edition, 64bit


#2

@ingenthr, can you please help with this? What is going wrong here?

Is memcached causing the couch_view_index_updater crash?
I can see the following message in the Couchbase info.log:

Control connection to memcached on 'ns_1@172.31.23.153' disconnected: {{badmatch,{error,timeout}},
 [{mc_binary,quick_stats_recv,3,[{file,"src/mc_binary.erl"},{line,67}]},
  {mc_binary,quick_stats_loop,5,[{file,"src/mc_binary.erl"},{line,156}]},
  {mc_binary,quick_stats,5,[{file,"src/mc_binary.erl"},{line,141}]},
  {ns_memcached,ensure_bucket_config,4,[{file,"src/ns_memcached.erl"},{line,1307}]},
  {ns_memcached,handle_info,2,[{file,"src/ns_memcached.erl"},{line,744}]},
  {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]},
  {ns_memcached,init,1,[{file,"src/ns_memcached.erl"},{line,171}]},
  {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]}]}

Here is the core dump from the crashed process:

gdb -c core.2745

GNU gdb (GDB) Amazon Linux (7.6.1-51.24.amzn1)
Copyright © 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-amazon-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
[New LWP 2745]
[New LWP 2758]
[New LWP 2799]
[New LWP 2749]
[New LWP 2798]
[New LWP 2800]
Missing separate debuginfo for the main executable file
Try: yum --enablerepo='debug' install /usr/lib/debug/.build-id/00/2d878a64b2e42cdba05b4dde0654f4f7664e6d
Core was generated by `/opt/couchbase/bin/couch_view_index_updater'.
Program terminated with signal 6, Aborted.
#0 0x00007f5b2dd68be9 in ?? ()
(gdb) bt
#0 0x00007f5b2dd68be9 in ?? ()
#1 0x00007f5b2dd69fe8 in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb)