Advice for read-only use-cases

We have an application for which we are considering Couchbase. It’s somewhat different to “traditional”(!) uses of Couchbase.

We’re effectively taking a large dataset, originally in a MySQL database, filtering it heavily, and loading it into a Couchbase instance, which we subsequently query. At the moment we’re using Couchbase as a key/value store; we’re not yet exploiting views and indexes, although we plan to do so eventually.

The thing is that the loading process only happens once every few months. At all other times, the data is completely static.

Are there best practices for this sort of use-case? In a relational world I’d turn off indexing during the initial, write-heavy load phase, then turn it on when loading was complete. Is it generally recommended to do the same with Couchbase?

I’ll be interested to hear any thoughts you may have.

Regards

Glenn.

I would suggest you create a view that you are happy with, push it to production, and then start adding records, so that the index is built up gradually as the data loads. By default, view updates are governed by two settings: an interval of 5 seconds and a threshold of 5000 new or changed documents (see the link below for exactly how the two interact). So work out your rate of loading (sets per second) and adjust the view update settings so that you are not reindexing 5 times a second.

http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-views-operation-autoupdate.html
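
As a rough sketch of that tuning step: the snippet below estimates how many document changes arrive during one update interval at a given load rate, then builds (without sending) a request that would change the view update settings. The `/settings/viewUpdateDaemon` endpoint and the `updateInterval` / `updateMinChanges` parameter names are as I recall them from the 2.x manual, so please check them against your version. The host, the example load rate, and the helper names are placeholders of mine.

```python
from urllib.parse import urlencode
from urllib.request import Request


def changes_per_interval(sets_per_second, update_interval_ms):
    """Documents written during one update interval at a steady load rate."""
    return int(sets_per_second * update_interval_ms / 1000)


def build_view_update_request(host, update_interval_ms, update_min_changes):
    """Build (but do not send) a POST that changes the view update settings.

    Assumption: endpoint and parameter names match the 2.x manual.
    """
    body = urlencode({
        "updateInterval": update_interval_ms,
        "updateMinChanges": update_min_changes,
    }).encode("ascii")
    return Request(
        "http://%s:8091/settings/viewUpdateDaemon" % host,
        data=body,
        method="POST",
    )


# Example: a bulk load at 20,000 sets/second with a 30-second interval.
interval_ms = 30000
min_changes = changes_per_interval(20000, interval_ms)
req = build_view_update_request("localhost", interval_ms, min_changes)
print(req.get_method(), req.full_url, req.data.decode("ascii"))
```

Actually sending it would need admin credentials and `urllib.request.urlopen(req)`. Using one interval's worth of changes as the threshold is only a starting point; measure and adjust.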

Also remember that you need plenty of disk space for both views and documents. I usually start at 20%-30% disk usage and plan for more servers as usage approaches 40%-50%. You always need room for compaction.
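
To make that rule of thumb concrete, here is a small sketch that turns used and total disk figures into a planning hint. The 40% threshold comes from the numbers above; the function name and the wording of the hints are just my own illustration.

```python
def capacity_hint(used_gb, total_gb, plan_at=0.40):
    """Return a planning hint based on the fraction of disk in use.

    plan_at is the usage fraction at which to start planning more servers,
    leaving the remainder free for compaction.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    fraction = used_gb / total_gb
    if fraction >= plan_at:
        return "plan more servers (%.0f%% used)" % (fraction * 100)
    return "ok (%.0f%% used)" % (fraction * 100)


print(capacity_hint(250, 1000))
print(capacity_hint(450, 1000))
```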

If you plan on querying the view a lot, I would suggest creating a 200MB bucket called “cached”. Don’t make it a Couchbase bucket; make it a Memcached bucket instead. There you can cache your query results, so that disk ops stay low and speed stays high.
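
The caching idea is the usual cache-aside pattern, sketched below. To keep it runnable without a server, `DictCache` is an in-memory stand-in for the Memcached bucket client and `run_query` is whatever function actually queries the view; both are assumptions of mine, not SDK calls. The key is hashed because memcached keys are length-limited and cannot contain whitespace.

```python
import hashlib
import json


class DictCache:
    """In-memory stand-in for a client of the "cached" Memcached bucket."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cache_key(view_name, params):
    """Stable, memcached-safe key for a view name plus query parameters."""
    raw = view_name + ":" + json.dumps(params, sort_keys=True)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def cached_view_query(cache, run_query, view_name, params):
    """Return cached rows if present; otherwise query the view and cache them."""
    key = cache_key(view_name, params)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    rows = run_query(view_name, params)
    cache.set(key, json.dumps(rows))
    return rows


# Demo with a fake query function that records how often it is called.
calls = []

def fake_query(view_name, params):
    calls.append((view_name, params))
    return [{"id": "doc1", "key": params["key"]}]

cache = DictCache()
first = cached_view_query(cache, fake_query, "by_type", {"key": "user"})
second = cached_view_query(cache, fake_query, "by_type", {"key": "user"})
print(len(calls), first == second)
```

Since the data only changes at reload time, the cached entries need no expiry in between; just flush the “cached” bucket after each load.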

Thanks, that’s a really useful answer, and accords with what I was thinking.