Hello,
I am running Couchbase 2.0.1 Community Edition (build-170) on two Ubuntu Server 12.04.2 machines, and during some of my Couchbase stress testing I've come across a problem I can't explain yet.
The scenario is as follows:
-
2 server nodes, 4 GB memory in the cluster, 1.36 TB storage space, replicas enabled (1 replica copy), a single bucket with a RAM quota of 200 MB (100 MB per node, intentionally low for testing), persistence enabled
-
Set requests are sent to the cluster in groups of <=400000 (the elements of each group are sent sequentially in a loop). The requests are sent from PHP, via both the SDK and the Memcached library (it doesn't seem to matter which), and each set request has a random INT key and a random INT value (mt_rand)
-
For each group, I am calculating two things:
a) How many set requests fail (getResultCode() is non-zero) - this is checked for each request (failure rate)
b) After the group is sent and the disk write queue is empty - how many sets cannot be 'verified', i.e. for how many elements a get( key ) request does not return the expected value (or returns no value at all)
In all cases, the values calculated in a) and b) are identical (everything that was confirmed as set is always verified)
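The measurement described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual PHP test code: `FakeBucket` and `run_group` are hypothetical names, the bucket is an in-memory stand-in that simply rejects new keys once a fixed capacity is reached, and keys are drawn without repetition to keep the counting simple (the real test uses mt_rand, so duplicates are possible there).

```python
import random


class FakeBucket:
    """Hypothetical in-memory stand-in for the cluster (not a real client)."""

    def __init__(self, capacity):
        self.capacity = capacity  # max number of distinct keys accepted
        self.data = {}

    def set(self, key, value):
        # Returns 0 on success and 10 on failure, mirroring the codes in the post.
        if key not in self.data and len(self.data) >= self.capacity:
            return 10
        self.data[key] = value
        return 0

    def get(self, key):
        # Returns None when the key is not stored.
        return self.data.get(key)


def run_group(bucket, size, rng):
    """Send one group of sets and return (failed, unverified)."""
    keys = rng.sample(range(2**31), size)  # unique random INT keys (simplification)
    sent = []
    failed = 0
    for key in keys:
        value = rng.randrange(2**31)  # random INT value
        if bucket.set(key, value) != 0:  # a) non-zero result code counts as a failure
            failed += 1
        sent.append((key, value))

    # In the real test, wait here until the disk write queue is empty.

    # b) count elements whose get() does not return the value that was sent
    unverified = sum(1 for key, value in sent if bucket.get(key) != value)
    return failed, unverified
```

With an empty stand-in bucket of capacity 870 and a group of 1000, `run_group` returns 130 for both counts, which is the "a) equals b)" pattern described above.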
-
The initial group has a 13% failure rate: once the bucket runs out of RAM, all following set requests fail with code 10 (SERVER ERROR) - yet the Couchbase log dump does not contain any out of memory errors!
-
The second group has a 50% failure rate, the third 99%, and finally 100%. Each group is executed after a delay (I intentionally wait for the disk writes to finish). RAM usage stays a few MB above 200 MB (even though 200 MB is the limit)
-
Every set attempt after that (even from the web interface) fails with code 10, no matter how long I wait before the set
-
If the cluster is restarted, I can squeeze in one more group, with a 70% failure rate
In theory, at least if I understand the documentation correctly, I can get out of memory errors under high load or when the bucket is running out of RAM. But once everything calms down and all data is flushed to disk, I should be able to perform set operations again. On top of that, there are no out of memory messages in the error log.
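To make that expectation concrete: if the failures were only temporary, a set that is retried after a pause should eventually succeed. A minimal sketch of such a retry, again in Python with hypothetical names (`set_with_retry`, and `set_fn` standing in for whatever client call returns the result code), assuming 0 means success:

```python
import time


def set_with_retry(set_fn, key, value, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call set_fn(key, value) until it returns 0 or the attempts run out.

    Returns (last_result_code, attempts_used). The delay doubles after
    each failed attempt; no sleep happens after the final attempt.
    """
    code = None
    for attempt in range(1, attempts + 1):
        code = set_fn(key, value)
        if code == 0:
            return code, attempt
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return code, attempts
```

In my case this kind of waiting does not help: the set keeps returning code 10 regardless of the delay.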
I might be doing something wrong, of course. Does anyone have any leads on what might be wrong or what I might be doing wrong? I can, of course, provide log or code fragments if needed.
Thanks in advance.