PageFaults, Connections, and very high level of Network Timeouts!


#1

I have a Couchbase installation in development (OSX, PHP libcouchbase, and Couchbase 2.01) which works without any issue, but my production instance (Ubuntu, PHP libcouchbase, and Couchbase 2.1.1) is having a LOT of problems. I’m hoping I can get a little help on the following:

  1. Connections: even after restarting the machine, the number of connections reported is consistently around 80. Keeping in mind that this site is not yet live and therefore has almost no usage, is this abnormal?
  2. CPU Spikes: the CPU seems to bounce between 40-60% on the Couchbase Console although running top from the command line indicates a somewhat lower utilisation. What is also pretty odd is that just watching the CPU meter chug along (again in the Couchbase Console) it seems to be popping up to 100% utilisation here and there and this is with zero load. Can anyone explain this?
  3. Page Faults: I am getting 0 "Major Faults" but my "Minor Faults" are constantly moving up and down between 1.5k and 5.5k. Again, this is under zero load. Is this normal? Is there anywhere I can go to better understand what these variables actually mean?

#2

As a quick update, the number of minor faults was underreported above. I’m getting between 1k and 15k at any given moment.


#3

Hi,

Could you give some details on your environment - for example Server Quota, bucket size, op/s, residency ratio?

If you haven’t already, I’d look at the Best Practices in the admin guide, particularly in terms of sizing your cluster: http://docs.couchbase.com/couchbase-manual-2.2/#best-practices

One additional thing to note: as you’re using the PHP client (which typically cannot persist connections like the other SDKs), I’d look at using the config cache. See: http://docs.couchbase.com/couchbase-sdk-php-1.2/#configuration-cache This should reduce the load PHP clients place on the system.


#4

The configuration cache sounds very important. I’ll get that incorporated soon. WRT the best practices document … I have looked at that and I’m definitely below-spec. I was hoping to run a very small test instance on AWS without needing to get fancy with auto-start and stop automation, but I think that ultimately will not be possible.

I have now created an auto-shutdown-on-idle procedure for my AWS instance and upgraded from an m1.small to an m1.medium, and at least on first inspection this seems good enough for some basic testing.


#5

Further to the major / minor page faults - these are standard OS metrics and so any Linux documentation should help. You are likely memory constrained - check the amount of free memory, swap usage and the quotas and sizes of your buckets.
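
If it helps, here is a rough sketch of how the OS-side numbers could be checked on the Ubuntu box. It assumes a Linux host exposing /proc/meminfo and /proc/vmstat, and that the pgfault counter there includes major faults (so minor = pgfault - pgmajfault). The function names (parse_counters, read_counters, fault_rates, report) are just illustrative, and these are system-wide figures, so they will not necessarily match what the Couchbase Console shows.

```python
import time


def parse_counters(text):
    """Parse 'name value' or 'Name:  value kB' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_counters(path):
    """Read a /proc-style counter file and parse it."""
    with open(path) as handle:
        return parse_counters(handle.read())


def fault_rates(before, after, seconds):
    """Return (minor, major) faults per second between two vmstat samples.

    Assumes pgfault counts all faults, so minor = total - major.
    """
    total = after["pgfault"] - before["pgfault"]
    major = after["pgmajfault"] - before["pgmajfault"]
    return (total - major) / seconds, major / seconds


def report(interval=5):
    """Print free memory, swap in use, and system-wide fault rates."""
    mem = read_counters("/proc/meminfo")  # values are reported in kB
    swap_used = mem["SwapTotal"] - mem["SwapFree"]
    print(f"free memory: {mem['MemFree']} kB, swap used: {swap_used} kB")

    first = read_counters("/proc/vmstat")
    time.sleep(interval)
    second = read_counters("/proc/vmstat")
    minor, major = fault_rates(first, second, interval)
    print(f"minor faults/s: {minor:.0f}, major faults/s: {major:.0f}")
```

Calling report() on the server prints one snapshot. Low free memory together with non-zero swap usage would point toward the memory constraint mentioned above; the bucket quotas and sizes still need to be checked in the Couchbase Console.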