Multiple buckets versus a single bucket: best practice/recommendation?

We are still in the evaluation phase with Couchbase Server.

We have multiple buckets configured, and our application will have to create multiple connections at startup (using the Couchbase Java client SDK) in order to access the required data.

Should we consider consolidating into a single bucket and single connection?

Regards,
Kaz

I appreciate the quick reply. The following link that you posted did not return anything; it may have been cut short:
http://www.couchbase.com/docs/couchbase-devguide-2.0/working-with-out-sc

Is this the one you intended to post?
http://www.couchbase.com/docs/couchbase-devguide-2.0/working-with-out-schemas.html

We will definitely consider the points mentioned in your post and move forward.

Regards,
Kaz

Hello,

The short answer is:

  • the fewer buckets, the better.

This is mostly from a resource management point of view. On the server side, replication, I/O, views and other activities are all “per bucket”. As you said, you will also have to manage multiple connections in your application.

But having multiple buckets is something that is quite useful for different use cases:

  • multi-tenancy: you want to be sure all data is separated (with or without SASL authentication)
  • different types of data: for example, you can store all JSON documents in one bucket and use another one to store “binary” content, so that one bucket has views and the other has none.

So people usually use a single bucket and put a “type” field in their documents (see http://www.couchbase.com/docs/couchbase-devguide-2.0/working-with-out-schemas.html ).
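The “type” convention can be sketched roughly as follows. This is a minimal illustration only: a plain Python dict stands in for the bucket, no Couchbase SDK calls are used, and the helper names (make_key, store, docs_of_type) and the "type::id" key format are hypothetical choices made for this example.

```python
def make_key(doc_type, doc_id):
    """Build a key such as 'order::2' (hypothetical naming convention)."""
    return f"{doc_type}::{doc_id}"


def store(bucket, doc_type, doc_id, body):
    """Store a document with a 'type' field; 'bucket' is just a dict here."""
    doc = {**body, "type": doc_type}
    bucket[make_key(doc_type, doc_id)] = doc
    return doc


def docs_of_type(bucket, doc_type):
    """Return every document in the bucket whose 'type' matches."""
    return [doc for doc in bucket.values() if doc.get("type") == doc_type]


# One "bucket" holding two kinds of documents side by side.
bucket = {}
store(bucket, "user", 1, {"name": "Kaz"})
store(bucket, "order", 1, {"user_id": 1, "total": 42})
store(bucket, "order", 2, {"user_id": 1, "total": 7})

orders = docs_of_type(bucket, "order")
print(len(orders), "order documents out of", len(bucket), "documents")
```

In a real deployment the filtering would normally be done by a view or an index on “type” rather than by scanning in the application.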

Do not hesitate to describe your use case in detail, and we can discuss the best approach.

Regards

Fewer buckets are better: no argument there. On the other hand, it can be worthwhile to split your data across different buckets if you plan to use views.

The way we understand it, all views in the same design document need to fetch from disk all the items of the bucket they belong to.

A simple demonstration: if you have a small set of items that are frequently inserted or updated, and a service-critical view that is frequently accessed (stale=false) and map/reduces those items, you are better off storing them in a dedicated bucket rather than mixing them with the (maybe) millions of other items that the view doesn’t need.

Moreover, you will not fill up your filesystem cache with unnecessary items, and access to data on disk will be more efficient.
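To make the argument above concrete, here is a toy model, not a description of how the Couchbase view engine is actually implemented. It assumes, as we understand it, that building a view means passing every item of the bucket through the map function. The document shapes and the names map_orders and build_view are made up for the illustration.

```python
def map_orders(doc):
    """Toy map function: emit (id, total) only for 'order' documents."""
    if doc.get("type") == "order":
        yield doc["id"], doc["total"]


def build_view(bucket, map_fn):
    """Feed every document to map_fn; return (rows, documents scanned)."""
    rows = []
    scanned = 0
    for doc in bucket.values():
        scanned += 1
        rows.extend(map_fn(doc))
    return rows, scanned


# A small, hot set of items plus many items the view does not need.
orders = {f"order::{i}": {"type": "order", "id": i, "total": 10 * i}
          for i in range(100)}
others = {f"log::{i}": {"type": "log", "id": i} for i in range(10_000)}

mixed_bucket = {**orders, **others}
dedicated_bucket = dict(orders)

mixed_rows, mixed_scanned = build_view(mixed_bucket, map_orders)
dedicated_rows, dedicated_scanned = build_view(dedicated_bucket, map_orders)

print("mixed bucket:     scanned", mixed_scanned, "for", len(mixed_rows), "rows")
print("dedicated bucket: scanned", dedicated_scanned, "for", len(dedicated_rows), "rows")
```

Both runs produce the same rows; only the number of items that had to be visited differs, which is the overhead the dedicated bucket avoids in this model.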

By the way (TGrall, if you hear us…): how do we know that we have too many buckets and would benefit from merging data into fewer buckets?

E.g., how can we trace the different beam.smp activities (stats, compaction, indexing, etc.) and calculate a per-bucket overhead?

Xavier

Let’s say I have 5 different applications, all using Couchbase but with clearly very different documents. Which scenario is better: having only one bucket with a field that distinguishes documents between apps, or simply having one bucket for each app?

It’s important to say that in either scenario, I will never have to run queries that use documents from two different apps.

I hope the explanation below answers the question “Why should one not worry about explicitly distributing (horizontally sharding) JSON documents across multiple buckets for performance reasons?”

In Couchbase, JSON documents are stored in a bucket, which is nothing but a “logical/conceptual” organization of JSON data and a logical unit for Couchbase operations. These documents are implicitly “horizontally sharded” (distributed) into 1024 “virtual buckets” (the actual physical buckets in RAM, distributed equally across multiple nodes).

Additionally, because (high-speed) indexes can also be defined on these distributed (implicitly “sharded”) JSON documents, we can keep all the different “types” (“tables”, in relational database terms) of JSON documents (records) in a single (logical) Couchbase bucket without having to worry about performance degradation. In Couchbase terms, a “set of similar records” is not identified through a “table/entity”; instead, a document (record) can be associated with a “set of records” using an indexed key in the document, like “type” (type : Orders).
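As a rough illustration of the implicit sharding into 1024 virtual buckets described above, a document key can be hashed to a virtual bucket number. This is only a sketch: the CRC32 hash and the bit selection used here are assumptions made for the example, and the function name vbucket_for is hypothetical; the real mapping lives in the server and the client SDKs.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # number of virtual buckets mentioned in the post


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Map a document key to a virtual bucket number (illustrative only)."""
    checksum = zlib.crc32(key.encode("utf-8"))
    # Assumed bit selection: use the upper half of the checksum.
    return ((checksum >> 16) & 0x7FFF) % num_vbuckets


# Documents of different "types" share the same pool of virtual buckets.
keys = [f"order::{i}" for i in range(10_000)]
keys += [f"user::{i}" for i in range(10_000)]

per_vbucket = Counter(vbucket_for(k) for k in keys)
print(len(per_vbucket), "virtual buckets used for", len(keys), "keys")
```

The point is that the key alone decides where a document lands, regardless of its “type”, so putting different kinds of documents in one bucket does not concentrate them in one place.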

The use of memory-optimized indexes (in RAM) further ensures high throughput (performance). These indexes are, in turn, implicitly distributed across multiple nodes in a cluster for high performance.

Moreover, one can take advantage of Multi-Dimensional Scaling (MDS), through which you can dedicate nodes in a cluster to “Indexing Services”, so that indexes can be optimally distributed across the RAM of those nodes for higher performance/throughput.

In short, a bucket (a logical bucket comprising multiple physical virtual buckets) is designed to serve big data with ultra-high performance.

In fact, a Couchbase bucket (logical/conceptual bucket) is not comparable to a “table” in a relational database in any way. However, it can be seen as a kind of “conceptual/logical” (schema-less) document store.

Note: Being “schema-less” is a source of flexibility and a powerful feature of NoSQL databases.