Server becoming unavailable during bulk upload

We’re doing some tests with:

  • the Couchbase Community Edition (2.1.1 build 764)
  • the Python SDK
  • a small subset of 8 M records
  • running on a Linode with 1 GiB of memory and 8 cores

The server becomes completely unavailable every few minutes (pretty much as soon as we get to ~800 ops/sec). It is unavailable both for writes and reads, and even the web console has trouble connecting.

We did implement retry logic with delays (3-minute increments), and we’re fine with slow writes, but we are very concerned that the server goes offline and is no longer available for reads either.
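
A minimal sketch of that kind of retry-with-delay, assuming the 1.x-era Python client’s Couchbase.connect() / set() interface (the host, bucket, and retry limits below are placeholders, not our exact code):

    import time
    from couchbase import Couchbase
    from couchbase.exceptions import CouchbaseError

    cb = Couchbase.connect(host='localhost', bucket='default')  # placeholder host/bucket

    def set_with_retry(key, doc, retries=5, delay=180):
        """Store a document, backing off in 3-minute increments between attempts."""
        for attempt in range(1, retries + 1):
            try:
                return cb.set(key, doc)
            except CouchbaseError:
                if attempt == retries:
                    raise
                time.sleep(delay * attempt)  # 180 s, 360 s, 540 s, ...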

I did reduce the auto-compaction trigger to 80% fragmentation, which halved the timeouts, but now I’m afraid I won’t be able to upload the whole sample because of memory usage…
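
For reference, a threshold change of this kind can be made via the REST API with something along these lines (the /controller/setAutoCompaction endpoint, port 8091, and the credentials shown are assumptions about a default setup):

    import requests

    # Assumed admin endpoint and placeholder credentials; adjust for your cluster.
    resp = requests.post(
        'http://localhost:8091/controller/setAutoCompaction',
        auth=('Administrator', 'password'),
        data={
            'databaseFragmentationThreshold[percentage]': 80,
            'parallelDBAndViewCompaction': 'false',
        },
    )
    resp.raise_for_status()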

Is this to be expected?
Are there ways to make the SDK slow down the writes so that read operations aren’t affected?

The number of cores is not going to be your issue, because bulk uploads are bottlenecked on disk I/O. Couchbase storage is append-only, so there are really no CPU issues: the disk head never has to seek around to find items to update.

Check your write queue growth: if it keeps climbing, you will need to back off.
Also check your write fill rate (i.e. how many items are entering the queue) vs. your write drain rate (i.e. how many items are actually being written to disk). They should see-saw with each other.
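
A rough sketch of that kind of back-off, polling the bucket stats over REST (the disk_write_queue sample name, port 8091, credentials, bucket name, and threshold below are assumptions, not tested values):

    import time
    import requests
    from couchbase import Couchbase

    BASE = 'http://localhost:8091'        # cluster admin port (assumed)
    AUTH = ('Administrator', 'password')  # placeholder credentials
    BUCKET = 'default'                    # placeholder bucket name

    cb = Couchbase.connect(host='localhost', bucket=BUCKET)

    def disk_write_queue():
        """Return the most recent disk write queue sample for the bucket."""
        url = '%s/pools/default/buckets/%s/stats' % (BASE, BUCKET)
        samples = requests.get(url, auth=AUTH).json()['op']['samples']
        return samples['disk_write_queue'][-1]

    def throttled_upload(items, max_queue=100000, pause=30):
        """Write items, pausing whenever the disk write queue climbs too high."""
        for key, doc in items:
            while disk_write_queue() > max_queue:
                time.sleep(pause)  # let the drain rate catch up with the fill rate
            cb.set(key, doc)
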
Also, lots of replicas (2-3) combined with fewer than five servers means more work for your disks.