Rebalance failing - possibly due to primary index running out of space?



We created a primary index on a bucket with 450M documents, and the indexing volume ran out of space. This caused the node's indexing thread to restart continually. A graceful removal of that node from the cluster did not work, so we did a hard failover. Since then, rebalances have been failing. I cleaned the data off the node and added it back to the cluster, but that did not help.

There are a lot of errors in the logs:
memcached.log
2016-07-20T21:59:20.487794-05:00 WARNING (stats) Notified the timeout on checkpoint persistence for vbucket 877, id 0, cookie 0x7fe182ab9a80
2016-07-20T21:59:20.487841-05:00 WARNING 121: Slow SEQNO_PERSISTENCE operation on connection (127.0.0.1:58703 => 127.0.0.1:11209): 31000 ms
2016-07-20T21:59:20.498013-05:00 WARNING (stats) Notified the timeout on checkpoint persistence for vbucket 876, id 0, cookie 0x7fe182af4780
2016-07-20T21:59:20.498067-05:00 WARNING 122: Slow SEQNO_PERSISTENCE operation on connection (127.0.0.1:54497 => 127.0.0.1:11209): 31000 ms
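
In case it helps with triage, here is a minimal Python sketch for tallying these warnings, so it is easier to see which vBuckets time out and how slow the SEQNO_PERSISTENCE calls are. The function name `summarize` and both regular expressions are my own assumptions, based only on the two message shapes quoted above; this is not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the two WARNING shapes quoted above.
TIMEOUT_RE = re.compile(r"timeout on checkpoint persistence for vbucket (\d+)")
SLOW_RE = re.compile(
    r"Slow SEQNO_PERSISTENCE operation on connection \(.*?\): (\d+) ms"
)


def summarize(text):
    """Return (timeouts per vBucket, list of slow-operation durations in ms)."""
    vbuckets = Counter(int(vb) for vb in TIMEOUT_RE.findall(text))
    slow_ms = [int(ms) for ms in SLOW_RE.findall(text)]
    return vbuckets, slow_ms


# The four log entries from the excerpt above.
sample = (
    "2016-07-20T21:59:20.487794-05:00 WARNING (stats) Notified the timeout on "
    "checkpoint persistence for vbucket 877, id 0, cookie 0x7fe182ab9a80\n"
    "2016-07-20T21:59:20.487841-05:00 WARNING 121: Slow SEQNO_PERSISTENCE "
    "operation on connection (127.0.0.1:58703 => 127.0.0.1:11209): 31000 ms\n"
    "2016-07-20T21:59:20.498013-05:00 WARNING (stats) Notified the timeout on "
    "checkpoint persistence for vbucket 876, id 0, cookie 0x7fe182af4780\n"
    "2016-07-20T21:59:20.498067-05:00 WARNING 122: Slow SEQNO_PERSISTENCE "
    "operation on connection (127.0.0.1:54497 => 127.0.0.1:11209): 31000 ms\n"
)

vbuckets, slow_ms = summarize(sample)
for vb, count in sorted(vbuckets.items()):
    print(f"vbucket {vb}: {count} persistence timeout(s)")
if slow_ms:
    print(f"slow SEQNO_PERSISTENCE ops: {len(slow_ms)}, max {max(slow_ms)} ms")
```

To run it against a real file, the contents of memcached.log can be read into a string and passed to `summarize` in place of `sample`.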

There are many others in the various logs. Here is the collected info from the node that ran out of space while creating the primary index on the stats bucket.

The cluster is still working and it does not look like we have lost any data yet.

Any help would be appreciated!
Thanks!
Mark