Couchbase Indexing and Rebalancing for 2 weeks

Hello,

I have a 2-node cluster on XLarge instances in Amazon.
There's one big bucket of 18M documents - about 2K per document plus a 60-byte key - with 1 replica configured (+replica for indexes).
There are about 3 disk creates per sec, 3 CAS operations per sec, and 10 gets per sec.
Each node has 9GB RAM allocated for this bucket and 1TB of disk (regular EBS, not PIOPS).
There are 2 design documents, each with 4 views, and each view has 40 emits (each property is checked for existence before emitting).

We have 4 problems:

  1. When publishing a new design document, it takes 3-4 days until indexing finishes.
  2. We added a third node 2 weeks ago, and the rebalancing hasn't finished yet (it's now stuck at 70%).
  3. We can't use the views, probably because of the reindexing (and I don't understand why it suddenly started, since we didn't publish a new view).
  4. The @indexes dir on the second node has reached 920GB, which is not normal. On the first node it is only 120GB.

Can anyone help?

Thanks,
Edi Buslovich,
VP R&D,
Toonimo.

Hi Edi,

Could you provide some more information:

  • Version of Couchbase server
  • Contents of the map / reduce functions?
  • The output of vmstat 10 3, to see if the machine is I/O or CPU bound, etc.?

See “answer” below
Thanks.

  1. Couchbase Version: 2.2.0 enterprise edition (build-821)

  2. Attached is the map function of one of the 4 views (they are almost identical): https://dl.dropboxusercontent.com/u/18184796/FetchByParamAndTime.js
    The reduce function is _count.
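
    In broad strokes, each view follows the pattern below (a simplified, illustrative sketch - the field names are made up, and the real view has about 40 emits of this shape):

    function (doc, meta) {
      // Illustrative only: every property is checked for existence before
      // emitting, and the key is a compound array so it can be range-queried.
      if (doc.param && doc.user_id && doc.created_at) {
        emit([doc.param, doc.created_at], [doc.user_id, doc.param, doc.created_at]);
      }
      // ... roughly 39 more emits of the same shape for the other properties
    }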

vmstat 10 3 output for the first node:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 1 0 137260 172192 4416848 0 0 1072 1154 0 0 15 5 69 11 0
0 1 0 143452 172204 4416932 0 0 0 82 697 672 4 2 71 24 0
0 1 0 143080 172208 4417036 0 0 1 84 740 641 4 2 71 23 0

vmstat 10 3 output for the bad node (the one with 920GB @indexes dir):
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 1 0 79836 9928 4535908 0 0 1427 825 1 0 16 5 72 8 0
0 3 0 83648 9368 4526860 0 0 15971 4312 6151 5401 30 4 46 20 0
1 1 0 82892 9556 4534784 0 0 8548 22266 4121 4145 19 4 38 40 0

vmstat 10 3 output for the new third node that is being rebalanced in:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 187580 245828 5084616 0 0 1254 571 2 7 10 3 79 7 2
0 0 0 181636 245840 5084816 0 0 0 191 494 508 4 1 94 0 0
0 0 0 182496 245864 5084908 0 0 0 507 446 420 10 3 84 1 3

Thanks,
Edi.

Hi Edi,

Your view is very complex and outputs a lot of data (not just in the number of emits, but in the total size of the data output). Views are stored on disk, and the combination of large, complex views with comparatively slow EBS performance likely explains your performance issues.

Take a look at our View optimisation guide - http://docs.couchbase.com/couchbase-manual-2.2/#view-writing-best-practice

Specifically, you should try to use the view as an index (i.e. try to output just a document ID), and use the normal get() key-value accessors to actually read the bulk of the data.
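
As a rough sketch of that approach (the field names are illustrative, not taken from your actual view), the map function emits only the key needed for range queries and no value at all:

function (doc, meta) {
  // Index-only row: emitting null as the value keeps the index small.
  // Each result row still carries the source document's ID (meta.id).
  if (doc.param && doc.created_at) {
    emit([doc.param, doc.created_at], null);
  }
}

The view result rows already include each document's ID, so the application can then read the full documents with ordinary get() calls (or the SDK's bulk-get equivalent) rather than storing that data in the index itself.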

Thanks for the answer.
I looked at the guide, and even following your advice, there isn't much we can do except moving to a PIOPS disk.
We output an array of fields on purpose, because we use start_key and end_key arrays to match lots of fields when querying the view. This can't be done if we output only the document ID.
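
For example (the names and values here are made up, and the URL is left unencoded for readability), a typical query looks like:

GET http://<node>:8092/<bucket>/_design/<ddoc>/_view/FetchByParamAndTime?startkey=["paramA",1388534400]&endkey=["paramA",1391212800]&reduce=false

so the compound array key is what lets us narrow the result set on several fields at once.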