When to create a new bucket?


#1

Hello,

Is there a guideline to determine when to create a new bucket?

Let’s say I have a custom CMS that supports multiple content types.

Is it better to put all the content into one bucket, even when the documents can have different sets of values? Or should I create a new bucket for each content type, since the documents of each type have a different set of values?


#2

A bucket is typically used to control the resources allocated to each set of data. So you don’t have to put data in separate buckets unless you want distinct control over memory, the IO path, high availability, compaction, security, etc.
If you’d like to describe more of these properties of your data, I am happy to help with the decision.
thanks
-cihan


#3

Thank you @cihangirb

The only concern I have at the moment is indexing.

Let’s assume that I can have X number of document types.

When I create an index for a specific document type, how does it affect the other document types that lack the field I specified in the index? How much of a performance penalty do I get? I assume it’s really minor, but I wanted to make sure.


#4

I’ll simplify the answer, but we can dig deeper if you have stats on data size, mutation rate, and more.
There are a few indexers in the system (map/reduce, global index, and spatial). In general, each indexer supports a filter clause. With N1QL, for example, you’d say:
CREATE INDEX ind1 ON bucket1(attrib1) WHERE type="type1";
This ensures that the index applies only to the specific type, and that the projector (see the architecture guide) sends only the relevant information to the index, based on this index definition. So the impact of mixing types in a bucket is fairly small.
That said, there is still some overhead in checking the type in each case. We only recommend splitting the data into multiple buckets when a type is very sensitive to these latencies in indexing and the mutation rates of the types are drastically different.
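
As a rough illustration of the filter clause, here is a minimal Python sketch of a filtered index. It is a toy model, not Couchbase code: the class FilteredIndex, its on_mutation method, and the sample documents are all hypothetical. It only shows the idea that every mutation is checked against the filter, while only matching documents end up in the index.

```python
from collections import defaultdict


class FilteredIndex:
    """Toy model of an index with a WHERE-style type filter (hypothetical)."""

    def __init__(self, field, doc_type):
        self.field = field
        self.doc_type = doc_type
        self.entries = defaultdict(set)  # field value -> set of document keys
        self.checks = 0                  # number of mutations inspected

    def on_mutation(self, key, doc):
        # Every mutation pays for the filter check, even if it is then skipped.
        # (Updates that change an already indexed value are not handled here.)
        self.checks += 1
        if doc.get("type") != self.doc_type or self.field not in doc:
            return
        self.entries[doc[self.field]].add(key)

    def lookup(self, value):
        # Return the keys of indexed documents whose field equals `value`.
        return sorted(self.entries.get(value, ()))


# Mixed document types living in one bucket (sample data is made up).
docs = {
    "a1": {"type": "type1", "attrib1": "red"},
    "a2": {"type": "type1", "attrib1": "blue"},
    "b1": {"type": "type2", "attrib1": "red"},
    "c1": {"type": "type3", "title": "no attrib1 here"},
}

# Rough analogue of: CREATE INDEX ind1 ON bucket1(attrib1) WHERE type="type1";
ind1 = FilteredIndex(field="attrib1", doc_type="type1")
for key, doc in docs.items():
    ind1.on_mutation(key, doc)

indexed_keys = sorted(k for keys in ind1.entries.values() for k in keys)
print(indexed_keys)        # keys that made it into the index
print(ind1.lookup("red"))  # "b1" is excluded because its type differs
print(ind1.checks)         # every mutation was still inspected once
```

In this model the other types cost one cheap comparison per mutation and add nothing to the index itself, which matches the "small but not zero" overhead described above.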