Can Couchbase work if not all documents in the DB can be stored in memory?


#1

I’ve been evaluating Couchbase to figure out whether our company can continue to use it as our database solution, and I’ve hit a serious performance snag. I’m using the Couchbase 3.1.0 Enterprise server install, since it seems more stable than Couchbase 4.x. Not all of our documents will fit in memory, though most of our active dataset can, so sometimes we will need to retrieve documents from disk.

I’ve been testing with a single-node setup, with the Couchbase data stored on 4 disks of 100GB each set up in a RAID 0 array. I get about 4ms latency reading from and writing to the RAID drive, and about 1500 IOPS. I have a bucket with about 3.6 million docs of varying sizes (from about 195 bytes to about 113KB).

If I run tests with documents of about the same size, it works fine and I can retrieve 5000 docs at once with a getMulti call from the Node.js Couchbase library. With varying sizes as described above, though, it generally fails at about 10 docs at once. By “fails” I mean it times out because it takes longer than 2500ms to retrieve them. Sometimes it takes over 600ms just to retrieve a single document from disk, even when nothing else is using the disk or the VM at all.
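
As a rough sanity check on those disk figures, here is a back-of-envelope sketch of how long the raw disk reads alone might take. It assumes exactly one disk read per document, which is a simplification of mine and not something Couchbase guarantees; `estimateDiskFetchMs` is a hypothetical helper, not part of any SDK.

```javascript
// Back-of-envelope lower bounds for fetching `numDocs` documents from disk.
// Assumption (not from the Couchbase docs): each document costs exactly one
// disk read; real reads may need extra lookups or span several blocks.
function estimateDiskFetchMs(numDocs, latencyMs, iops) {
  return {
    // One read at a time, each paying the full latency.
    serialMs: numDocs * latencyMs,
    // Fully parallel reads, limited only by the device's IOPS.
    iopsBoundMs: (numDocs / iops) * 1000,
  };
}

// Figures from the post: about 4 ms latency and about 1500 IOPS.
// 10 is the batch size that times out; 200 stands in for "a couple hundred".
for (const n of [10, 200]) {
  const { serialMs, iopsBoundMs } = estimateDiskFetchMs(n, 4, 1500);
  console.log(
    `${n} docs: serial ~${serialMs.toFixed(0)} ms, IOPS-bound ~${iopsBoundMs.toFixed(0)} ms`
  );
}
```

Under this simple model even the serial estimate for 10 documents (about 40 ms) is far below the 2500ms timeout, so the raw disk numbers alone would not seem to explain the failures.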

So can Couchbase support a use case where we sometimes need to retrieve a couple hundred docs from disk? Is this the expected performance for Couchbase when retrieving docs from disk?


#2

@alexegli,

Try a RAID 10 setup for the disks. Are you using SSDs?

What is ./cbstats -timings saying your cmd_get times are when you test from disk? They should be microseconds.

Do you have the bucket set to Full Ejection or Value Ejection?

Are your reader/writer threads set to LOW or HIGH for the bucket?


#3

We’re using standard disks, not SSDs, because of the cost, but I will test out SSDs. For us the price difference between the two is about $100/month, so we were hoping we would be OK with standard disks.

Here are some of our cbstats timings results:
get_cmd (48904 total)
2us - 4us : ( 1.81%) 883
4us - 8us : ( 29.75%) 13665 ############
8us - 16us : ( 44.88%) 7398 ######
16us - 32us : ( 59.46%) 7133 ######
32us - 64us : ( 96.54%) 18135 ###############
64us - 128us : ( 99.49%) 1440 #
128us - 256us : ( 99.85%) 179
256us - 512us : ( 99.97%) 55
512us - 1ms : ( 99.99%) 11
1ms - 2ms : (100.00%) 4
2ms - 4ms : (100.00%) 1
Avg : ( 19us)

bg_load (23458 total)
64us - 128us : ( 31.23%) 7325 #############
128us - 256us : ( 63.01%) 7455 #############
256us - 512us : ( 65.13%) 498
512us - 1ms : ( 65.36%) 54
1ms - 2ms : ( 65.40%) 9
2ms - 4ms : ( 65.40%) 1
4ms - 8ms : ( 65.66%) 61
8ms - 16ms : ( 67.35%) 396
16ms - 32ms : ( 80.68%) 3128 #####
32ms - 65ms : ( 93.41%) 2986 #####
65ms - 131ms : ( 99.02%) 1314 ##
131ms - 262ms : ( 99.88%) 204
262ms - 524ms : ( 99.98%) 22
524ms - 1s : (100.00%) 5
Avg : ( 11ms)

bg_wait (23458 total)
16us - 32us : ( 0.08%) 19
32us - 64us : ( 50.35%) 11792 #####################
64us - 128us : ( 94.71%) 10406 ###################
128us - 256us : ( 98.84%) 970 #
256us - 512us : ( 99.36%) 122
512us - 1ms : ( 99.67%) 72
1ms - 2ms : ( 99.83%) 37
2ms - 4ms : ( 99.90%) 17
4ms - 8ms : (100.00%) 22
8ms - 16ms : (100.00%) 1
Avg : ( 59us)

Most of our retrieval times are in microseconds, but about 35% of the bg_load calls take a millisecond or longer, and about 20% take more than 32ms. When I ran cbhealthchecker, it showed a warning about poor ep-engine key performance: Average item loaded time ‘16.328 ms’ is slower than ‘500 us’
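
To read the slow tail off the bg_load histogram more precisely, here is a small sketch that tallies the buckets at or above a threshold. The bucket counts are copied from the cbstats output above; `shareAtOrAbove` is a hypothetical helper written for this post.

```javascript
// bg_load buckets from the cbstats output above: [lower bound in us, count].
const bgLoad = [
  [64, 7325], [128, 7455], [256, 498], [512, 54],
  [1000, 9], [2000, 1], [4000, 61], [8000, 396],
  [16000, 3128], [32000, 2986], [65000, 1314],
  [131000, 204], [262000, 22], [524000, 5],
];

// Fraction of operations whose bucket starts at or above `thresholdUs`.
function shareAtOrAbove(buckets, thresholdUs) {
  const total = buckets.reduce((sum, [, count]) => sum + count, 0);
  const slow = buckets
    .filter(([lowerUs]) => lowerUs >= thresholdUs)
    .reduce((sum, [, count]) => sum + count, 0);
  return slow / total;
}

for (const thresholdUs of [1000, 32000]) {
  const pct = (shareAtOrAbove(bgLoad, thresholdUs) * 100).toFixed(1);
  console.log(`${pct}% of bg_load calls took ${thresholdUs / 1000} ms or longer`);
}
```

With the counts above this reports 34.6% at or above 1 ms and 19.3% at or above 32 ms.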

We have the bucket set to all the default values (Value Ejection). I tried with Disk I/O Priority set to Low and then with it set to High, and it made no difference.


#4

Is there any particular reason you recommend RAID 10 over RAID 0 for performance? I am using Azure LRS storage so I’m not worried about a hardware failure leading to data loss. My only concern right now is performance in reading from disk.


#5

If the above is true, the CB server is delivering on time, but:

  • is it getting to your APP machine fast enough?
    and/or
  • when you get it is the SDK and/or Node.js processing it fast enough?

On RAID 10 you get 2X write speed (which RAID 0 gives) and 4X read speed (which you need) over a single disk.

Source: http://searchstorage.techtarget.com/definition/RAID-10-redundant-array-of-independent-disks
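
As a rough sketch of where multipliers like these come from, the idealized model below computes read and write ceilings relative to a single disk. It assumes identical disks, perfectly balanced I/O, and no controller or network overhead; `idealRaidMultipliers` is a hypothetical helper, and real arrays usually land below these numbers.

```javascript
// Idealized throughput multipliers relative to a single disk.
// Assumptions: identical disks, perfectly balanced I/O, no controller or
// network overhead. Real arrays usually fall short of these ceilings.
function idealRaidMultipliers(level, disks) {
  switch (level) {
    case 'raid0':
      // Pure striping: every disk serves both reads and writes.
      return { read: disks, write: disks };
    case 'raid10':
      if (disks < 4 || disks % 2 !== 0) {
        throw new Error('RAID 10 needs an even number of disks, at least 4');
      }
      // Striped mirrors: reads can use every disk, but each write
      // must land on both disks of a mirror pair.
      return { read: disks, write: disks / 2 };
    default:
      throw new Error(`unsupported level: ${level}`);
  }
}

console.log(idealRaidMultipliers('raid10', 4)); // { read: 4, write: 2 }
console.log(idealRaidMultipliers('raid0', 4));  // { read: 4, write: 4 }
```

For four disks this gives 4X read and 2X write for RAID 10. The same model gives a four-disk RAID 0 4X for both, so in the ideal case the two layouts tie on reads.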


#6

Thanks, I’ll try that as well then.


#7

I tried four 128GB SSD drives with RAID 10, but RAID 0 actually performed better for me. I found that it was the size of the documents that was causing the issue. Couchbase handles many small documents easily, but a few large ones are much harder for it to read. It can read 5000 docs of under 1KB each off disk in less than 200ms, but if I raise the size to 4KB it starts failing. Even if the average size is only 8KB, it will fail retrieving 10 docs, and it gets worse as the average document size increases. Retrieving from the cache is fine; it’s only reading off disk that’s the issue. I’ll keep this in mind when designing documents in the future and try to keep them as small as possible.


#8

Sorry to hear about the larger object trouble. What do you mean by ‘failing’ in this context? That’s something we should look into.


#9

It times out during retrieval with the Node.js Couchbase bucket connection’s default timeout of 2500ms. If I increase the timeout, it retrieves more documents.

Here is the data from a few tests I ran, each trying to retrieve 10 docs at once using the Node.js Bucket.getMulti call:

-------------------------------------------------------------------------
| Avg Cache      | Num Docs  | Num Docs | Avg Read         | Total Time |
| Miss Ratio (%) | Retrieved | Failed   | Doc Size (Bytes) | (ms)       |
-------------------------------------------------------------------------
| 100            | 7         | 3        | 857              | 2524       |
| 100            | 7         | 3        | 1087             | 2545       |
| 100            | 9         | 1        | 1366             | 2532       |
-------------------------------------------------------------------------

I’m using Azure cloud services, Premium SSD LRS storage for my disks (four disks of 128 GB each, SSD in RAID 0).
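
Since raising the timeout lets more documents through, one possible workaround (not something tested in this thread) is to split a large key list into smaller batches so that no single call has to finish every disk read inside one timeout window. The sketch below is SDK-agnostic: `fetchInBatches` and `fakeFetch` are hypothetical names, and `fetchBatch` stands in for whatever promise-returning wrapper an app puts around its multi-get call.

```javascript
// Split `keys` into chunks of at most `batchSize` and fetch them one chunk at
// a time, so no single call has to finish every disk read within one timeout.
// `fetchBatch` is a hypothetical async function (keys -> { key: doc }); in a
// real app it might wrap the SDK's multi-get, which is not shown here.
async function fetchInBatches(keys, batchSize, fetchBatch) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const results = {};
  for (let i = 0; i < keys.length; i += batchSize) {
    const chunk = keys.slice(i, i + batchSize);
    Object.assign(results, await fetchBatch(chunk));
  }
  return results;
}

// Stand-in fetcher for demonstration only: returns a fake document per key.
async function fakeFetch(chunk) {
  return Object.fromEntries(chunk.map((key) => [key, { id: key }]));
}

fetchInBatches(['a', 'b', 'c', 'd', 'e'], 2, fakeFetch).then((docs) => {
  console.log(Object.keys(docs).length); // 5
});
```

Fetching batches sequentially trades total latency for a smaller load per call; whether that helps here would depend on why the disk reads are slow in the first place.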