We have a 12-node Couchbase cluster (6 nodes per data center) running version 3.0.3-1716 Enterprise Edition. In each data center the bucket has a 585 GB RAM quota. The bucket uses value eviction for its cache metadata, and auto-compaction is enabled.
We are seeing inconsistent throughput: it fluctuates over time. At best, ops/sec reaches 100k, but in the worst case it drops below 10k, which is unacceptable for our use case.
Here are the details:
The Couchbase bucket holds about 1.65 billion records with a total size of 780 GB. The data is upserted by an online process at around 1.5k ops/sec in one data center. Each record has a TTL derived from a field in the record.
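One thing worth double-checking when the TTL comes from a record field: Couchbase interprets an expiry value under 30 days as a relative offset in seconds, and a value of 30 days or more as an absolute Unix timestamp. A minimal sketch of the conversion (the helper name `expiry_for_ttl` is mine, not from any SDK):

```python
import time

# Couchbase treats the expiry value as:
#   - a relative offset in seconds, if it is less than 30 days, or
#   - an absolute Unix timestamp, if it is 30 days or more.
THIRTY_DAYS = 30 * 24 * 60 * 60

def expiry_for_ttl(ttl_seconds):
    """Return the expiry value to hand to the SDK for a per-record TTL.

    TTLs of 30 days or longer must be converted to an absolute Unix
    timestamp, or the server will misinterpret them.
    """
    if ttl_seconds < THIRTY_DAYS:
        return ttl_seconds                 # short TTL: relative offset
    return int(time.time()) + ttl_seconds  # long TTL: absolute timestamp

print(expiry_for_ttl(3600))  # 1-hour TTL stays relative -> 3600
```

If some records carry TTLs over 30 days and the field is passed through unconverted, documents can expire far earlier or later than intended.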
On the read side, the app needs at least 50k gets per second to meet its SLA. Reads currently hit the other data center, to which the data is replicated through XDCR. When the system is healthy, read throughput is very good, but when it degrades to 10k the app misses the SLA. I checked the cache miss ratio; it is around 1%.
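Even a small miss ratio implies a meaningful disk-fetch load at that read rate, which is worth quantifying because those fetches compete with compaction I/O. A quick back-of-the-envelope check (plain arithmetic, no Couchbase API):

```python
# Rough estimate of the background disk-fetch rate implied by the
# observed cache miss ratio at the SLA-required read throughput.
target_ops = 50_000   # gets/sec required by the SLA
miss_ratio = 0.01     # ~1% cache miss ratio observed

disk_fetches_per_sec = target_ops * miss_ratio
print(disk_fetches_per_sec)  # -> 500.0
```

Roughly 500 random disk reads per second per data center is easy for SSDs but can contend badly with compaction's sequential I/O on spinning disks, which would be consistent with the periodic drops described above.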
I suspect auto-compaction is impacting the throughput.
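If compaction turns out to be the culprit, one mitigation is to restrict when it runs rather than disabling it. A sketch using `couchbase-cli setting-compaction` (host, credentials, and threshold values below are placeholders; verify the exact flags supported on your 3.0.x install with `couchbase-cli setting-compaction --help`):

```shell
# Raise the database fragmentation threshold and confine auto-compaction
# to an off-peak window (01:00-05:00), aborting it if the window ends.
couchbase-cli setting-compaction -c cb-node1:8091 \
  -u Administrator -p password \
  --compaction-db-percentage=50 \
  --compaction-period-from=1:00 \
  --compaction-period-to=5:00 \
  --enable-compaction-abort=1
```

You can also correlate the throughput dips with the bucket's compaction activity in the UI (or `cbstats`) before changing anything, to confirm the two actually line up.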
Does anyone have any suggestions?
Thanks in advance for your responses.