Server performance declining constantly


#1

I have a small program that initializes the data in our Couchbase server. It basically reads records from a different DB and, for each record, creates a document in Couchbase (or updates it if the key already exists).
As the number of documents grew, performance dropped within the first couple of days. I rewrote the program to be multithreaded, but that had little effect.

See image at: IMAGE (this forum won’t allow me to upload images?!)

(The “breakage” is the one discussed in the topic “Help! Server going berserk”. That has been fixed, and as the graph already shows, performance continues along the same slope afterwards.)

What can be the reason? I suppose searching for the documents is slowing me down as I have more and more of them, but I didn’t expect it to drop that hard.
Is there anything I can do to improve get/set speed?

The cluster has 4 nodes, each with 48 GB RAM, 32 logical cores, and RAIDed disks.

thanks


#2

Questions such as these are pretty hard to debug with just one screenshot of a single graph. Let me give you some pointers to check:

  • Sounds like your operations on the client are tied to both Couchbase Server and the other DB. How do you know the read requests on the other DB have not slowed down?
  • You can use cbstats to see how long operations are taking server side.
  • What is your active resident ratio? A screenshot of the whole graph page would be useful.
  • Point 2 of this blog post lists the key stats to monitor with regard to performance.
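
To make the ratios in these pointers concrete, here is a minimal Python sketch of how an active resident ratio and a cache miss ratio can be derived from raw counters. The function and parameter names are hypothetical, the formulas are the commonly used definitions rather than something confirmed for a specific Couchbase version, and the numbers are made up for illustration.

```python
def resident_ratio(total_items: int, non_resident_items: int) -> float:
    """Percentage of items whose values are currently held in RAM."""
    if total_items <= 0:
        return 100.0  # an empty bucket is trivially fully resident
    return 100.0 * (total_items - non_resident_items) / total_items


def cache_miss_ratio(disk_fetches: int, get_ops: int) -> float:
    """Percentage of get operations that had to be served from disk."""
    if get_ops <= 0:
        return 0.0  # no reads, so no misses
    return 100.0 * disk_fetches / get_ops


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real cluster.
    print(resident_ratio(1_000_000, 950_000))
    print(cache_miss_ratio(200, 1_000))
```

A low resident ratio combined with a noticeable miss ratio usually means many reads go to disk, which is worth comparing against the drop in total ops.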

#3

  1. Good question, but the answer is that I do know, because I measured. I pull a batch of records (1000 in this case) from the other DB, do the “insert/update” in Couchbase per record, and repeat. Pulling the 1000 records from the other DB into memory takes <1s. Putting the thousand into Couchbase takes up to a few minutes.
  2. I’m still not sure how to read those results. Will look into it. Any specific pointers or entries to look for?
  3. Attached
  4. “Cache miss ratio” grew at the beginning and stabilized around 20%.
    “Active resident %” started at 100% (as expected) but dropped sharply and stabilized around 5%.
    “Disk reads per sec” follows a pattern similar to the decrease in total ops (but more jagged). It seems to stabilize around 2.5 only.
    “Disk write queue” ALSO dropped with the same shape, very sharply, and is around 2.8. If it were higher, that would explain it: Couchbase would be waiting for the disk. But it’s not…

IMG_month_full_screen
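
For reference, the kind of measurement described in point 1 can be sketched roughly as below. This is only an illustration, not the actual program: `time_batches`, `fetch_batch`, and `upsert` are hypothetical names, and the stand-in functions at the bottom replace both databases so the sketch runs on its own.

```python
import time
from typing import Callable, List, Tuple

Record = Tuple[str, dict]  # (key, document) pair; this shape is an assumption


def time_batches(
    fetch_batch: Callable[[int], List[Record]],
    upsert: Callable[[str, dict], None],
    batch_size: int = 1000,
    max_batches: int = 3,
) -> List[dict]:
    """Time the source-DB read and the per-record writes separately per batch."""
    timings = []
    for _ in range(max_batches):
        t0 = time.perf_counter()
        records = fetch_batch(batch_size)  # read phase (the other DB)
        t1 = time.perf_counter()
        if not records:
            break  # source exhausted
        for key, doc in records:
            upsert(key, doc)  # write phase (one insert/update per record)
        t2 = time.perf_counter()
        write_s = t2 - t1
        timings.append({
            "records": len(records),
            "fetch_s": t1 - t0,
            "write_s": write_s,
            "writes_per_s": len(records) / write_s if write_s > 0 else float("inf"),
        })
    return timings


if __name__ == "__main__":
    # Stand-ins so the sketch runs without either database.
    source = [(f"key{i}", {"n": i}) for i in range(2500)]
    store = {}

    def fake_fetch(n: int) -> List[Record]:
        batch = source[:n]
        del source[:n]
        return batch

    def fake_upsert(key: str, doc: dict) -> None:
        store[key] = doc

    for row in time_batches(fake_fetch, fake_upsert):
        print(row)
```

Logging `fetch_s` and `write_s` per batch over time makes it easy to see which side is slowing down as the document count grows.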