Best Approach for Large Bucket(s) Design

aajordan · March 9, 2021, 8:24pm

Hi All,
Over a number of years we’ve accumulated a large volume of documents (hundreds of millions) in a single bucket, with about three dozen “types” of documents. Each type has little, if any, commonality with the others, and they vary in access frequency and size. At what point, if at all, does it make sense to create a second bucket, and along what lines should the data be split? I’ve read other posts, and we really don’t need distinct resource control over specific types (security, memory, etc.). I’m more interested in whether there’s a certain threshold, or any specific use cases besides resource control, where a second bucket makes sense, and what that might look like.

Please share your thoughts - thank you!
Ari

matthew.groves · March 12, 2021, 6:40pm

I would say that resource control (security, memory, and so on) is the primary reason you’d want to split into multiple buckets. If you don’t need or want that level of control, and you aren’t running into any problems with the way the data is modeled and stored now, then I’d say you’re fine.

I will say that Couchbase 7 is adding scopes and collections, which you may want to use, at least for organizational purposes, instead of a “type” field. I would love to get your feedback on 7.0, which is currently in beta. There’s a forum just for feedback here: Beta Support - Couchbase Forums
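To make the “type field vs. collections” idea concrete, here is a minimal, hypothetical planning sketch in plain Python (no Couchbase SDK calls). It groups existing documents by their `type` value into `bucket.scope.collection` keyspaces and shows how a query might change from filtering on `type` to reading a collection directly. The bucket name `app`, the scope name `inventory`, and the name-sanitizing rule are all assumptions for illustration; check the 7.0 documentation for the actual collection naming rules and limits before relying on anything like this.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical names for illustration only; substitute your own.
BUCKET = "app"
SCOPE = "inventory"


def collection_name(doc_type: str) -> str:
    """Turn a 'type' value into a collection name.

    Assumption: keep only letters, digits, '_', '-' and '%', and prefix
    names that are empty or would start with '_' or '%'.
    """
    name = re.sub(r"[^A-Za-z0-9_%-]", "_", doc_type)
    if not name or name[0] in "_%":
        name = "c" + name
    return name


def plan_collections(docs: Iterable[Tuple[str, dict]]) -> Dict[str, List[str]]:
    """Group (key, document) pairs into keyspaces based on their 'type' field."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for key, doc in docs:
        keyspace = f"{BUCKET}.{SCOPE}.{collection_name(doc['type'])}"
        plan[keyspace].append(key)
    return dict(plan)


def rewrite_query(doc_type: str) -> Tuple[str, str]:
    """Return (before, after) N1QL for reading all documents of one type.

    Backticks guard against reserved words such as `order`. Values are
    interpolated directly here for readability only; real code should use
    query parameters.
    """
    before = f"SELECT d.* FROM `{BUCKET}` AS d WHERE d.type = '{doc_type}'"
    after = (
        f"SELECT d.* FROM `{BUCKET}`.`{SCOPE}`."
        f"`{collection_name(doc_type)}` AS d"
    )
    return before, after
```

For example, three documents of types `order`, `user`, and `order` would be planned into two keyspaces, `app.inventory.order` and `app.inventory.user`. The design choice is simply that each former “type” becomes its own collection, so queries no longer need a `type` predicate (or an index on `type`) to isolate one kind of document.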

A second or third bucket may become useful in certain Eventing, XDCR, modeling, or remodeling scenarios.

But generally speaking, if it’s not broke, don’t fix it.

aajordan · March 15, 2021, 2:26pm

Great feedback, thank you. I’d love to pilot Couchbase 7, but we’re still on 6.6 (and only just moved to it), so unfortunately it will be a while yet. Once it’s ready for prime time, v7 is definitely something we’d like to move forward with, as I’ve been following it with great interest for several months!