Multiple buckets versus a single bucket: best practice/recommendation?


#1

We are still in the evaluation phase with Couchbase Server.

We have multiple buckets configured, and our application will have to create multiple connections at startup (using the Couchbase Java client SDK) in order to access the required data.

Should we consider consolidating into a single bucket and single connection?

Regards,
Kaz


#2

I appreciate the quick reply. The following link that you posted did not return anything; it may have been cut short:
http://www.couchbase.com/docs/couchbase-devguide-2.0/working-with-out-sc

Is this the one you intended to post?
http://www.couchbase.com/docs/couchbase-devguide-2.0/working-with-out-schemas.html

We will definitely consider the points mentioned in your post and move forward.

Regards,
Kaz


#3

Hello,

The short answer is:

  • fewer buckets is better.

This is mostly from a resource management point of view. On the server side, replication, I/O, views and other activities are all handled “per bucket”. As you said, you will also have to manage multiple connections in your application.

That said, having multiple buckets is quite useful for some use cases:

  • multi-tenancy: you want to be sure that each tenant's data is kept separate (with or without SASL authentication)
  • different types of data: for example, you can store all JSON documents in one bucket and use another one to store “binary” content, so that one bucket has views and the other has none.

So usually people use a single bucket and add a “type” field to their documents (see http://www.couchbase.com/docs/couchbase-devguide-2.0/working-with-out-schemas.html ).
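
To make the “type” convention concrete, here is a minimal sketch in plain Java. It does not use the Couchbase SDK: the bucket is stood in for by an in-memory map, and the names TypedDocs, keyFor, doc, keysOfType and the "type::id" key format are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a single bucket: a key -> document map.
// This is NOT the Couchbase SDK; all names here are hypothetical.
class TypedDocs {

    // Hypothetical key convention: "<type>::<id>".
    static String keyFor(String type, String id) {
        return type + "::" + id;
    }

    // Builds a document that carries its own "type" field plus one payload field.
    static Map<String, Object> doc(String type, String field, Object value) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("type", type);
        d.put(field, value);
        return d;
    }

    // Returns the keys of all documents whose "type" field matches,
    // which is the kind of check a view's map function would perform.
    static List<String> keysOfType(Map<String, Map<String, Object>> bucket, String type) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : bucket.entrySet()) {
            if (type.equals(e.getValue().get("type"))) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> bucket = new LinkedHashMap<>();
        bucket.put(keyFor("user", "1"), doc("user", "name", "Kaz"));
        bucket.put(keyFor("order", "7"), doc("order", "total", 12));
        System.out.println(keysOfType(bucket, "user"));
    }
}
```

With this layout, different kinds of documents share one bucket and one connection, and anything that needs only one kind filters on the “type” field.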

Do not hesitate to describe your use case in more detail, and we can discuss the best approach.

Regards


#4

Fewer buckets is better: no argument there. On the other hand, it can be worthwhile to split your data across different buckets if you plan to use views.

The way we understand it, all views in the same design document need to fetch from disk all the items of the bucket they belong to.

A simple demonstration: if you have a small set of items that are frequently inserted or updated, and a service-critical, frequently accessed view (stale=false) that maps/reduces those items, you are better off storing them in a dedicated bucket than mixing them with the (possibly) millions of other items that the view doesn’t need.

Moreover, you will not fill up your FS cache with unnecessary items, which makes access to on-disk data more efficient.
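
The argument above can be illustrated with a toy model in plain Java. It is not a measurement and does not reflect how Couchbase actually builds indexes; it only counts how many documents a hypothetical indexer would have to examine if it walks every item in a bucket. The class ViewScanSketch, its methods, and the document counts are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: counts documents examined by a hypothetical full-bucket scan.
// It does not use or imitate the Couchbase indexer.
class ViewScanSketch {

    // Returns {examined, emitted}: every document in the bucket is examined,
    // but only those of the wanted type are emitted into the index.
    static int[] index(List<String> bucketDocTypes, String wantedType) {
        int examined = 0;
        int emitted = 0;
        for (String type : bucketDocTypes) {
            examined++;
            if (type.equals(wantedType)) {
                emitted++;
            }
        }
        return new int[] {examined, emitted};
    }

    // Appends n documents of the given type to the list (only types are modelled).
    static void addDocs(List<String> bucket, String type, int n) {
        for (int i = 0; i < n; i++) {
            bucket.add(type);
        }
    }

    public static void main(String[] args) {
        List<String> mixed = new ArrayList<>();
        addDocs(mixed, "other", 1_000_000); // items the view does not need
        addDocs(mixed, "hot", 100);         // small, frequently updated set

        List<String> dedicated = new ArrayList<>();
        addDocs(dedicated, "hot", 100);

        int[] m = index(mixed, "hot");
        int[] d = index(dedicated, "hot");
        System.out.println("mixed bucket:     examined=" + m[0] + " emitted=" + m[1]);
        System.out.println("dedicated bucket: examined=" + d[0] + " emitted=" + d[1]);
    }
}
```

Under this assumption, both buckets produce the same 100 index entries, but the mixed bucket examines 1,000,100 documents to get there while the dedicated one examines 100.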

By the way (TGrall, if you hear us…): how do we know when we have too many buckets and would benefit from merging data into fewer buckets?

E.g., how can we trace the different beam.smp activities (stats, compaction, indexing, etc.) and calculate a per-bucket overhead?

Xavier


#5

Let’s say I have 5 different applications, all using Couchbase but with clearly very different documents. Which scenario is better: having only one bucket with a field that distinguishes documents between apps, or simply having one bucket for each app?

It’s important to say that in either scenario, I will never have to run queries across documents from two different apps.