Couchbase Internals

#1

Hello,

I would like to understand how Couchbase splits document keys across the nodes of a cluster. I believe Couchbase uses a hash function to distribute document keys between the cluster nodes. From a performance tuning perspective, what are the best practices for choosing document key values so that there is no extra network traffic between the cluster nodes and so that read/write operations can take advantage of parallelism?

Thanks
Wissa

#2

Hi,

The reason Couchbase uses hash partitioning is precisely to optimize sharding and allow great performance. Whatever keys you choose won't really matter on that side, because they will be hashed by Couchbase.
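To make the hashing concrete, here is a minimal Python sketch of a key-to-vBucket mapping. It assumes the CRC32-based scheme commonly described for Couchbase client libraries (take the CRC32 of the key, keep bits 16 to 30, then reduce modulo the number of vBuckets) and assumes the default of 1024 vBuckets. The round-robin vBucket-to-node map is purely hypothetical, since a real cluster publishes its own map to clients.

```python
import zlib
from collections import Counter

# Assumption: default vBucket count (1024 on Linux deployments).
NUM_VBUCKETS = 1024


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket id.

    Assumption: CRC32 of the key, keep bits 16-30, then modulo the
    vBucket count, as commonly described for Couchbase client libraries.
    """
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def node_for_vbucket(vbucket: int, num_nodes: int) -> int:
    """Hypothetical vBucket map: assign vBuckets to nodes round-robin.

    A real cluster sends clients its own vBucket map instead.
    """
    return vbucket % num_nodes


def keys_per_node(keys, num_nodes: int = 4) -> Counter:
    """Count how many of the given keys land on each node."""
    return Counter(node_for_vbucket(vbucket_for_key(k), num_nodes) for k in keys)


if __name__ == "__main__":
    sequential_keys = [f"post::{i}" for i in range(10_000)]
    # Even strictly sequential keys are spread across all 4 nodes.
    print(keys_per_node(sequential_keys))
```

This is also why picking sequential keys does not by itself give sequential reads: neighboring keys such as post::1 and post::2 land in unrelated vBuckets.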

#3

Thanks @ldoguin; is there a way to choose the keys so that reads are sequential? Does anyone know which hash function is used?

#4

Here’s a previous answer about the hash function that you might find useful: https://groups.google.com/forum/#!topic/couchbase/_RNYi2_kyNA

Basically we don’t recommend changing it, unless you really really really know what you are doing.

That being said, if you want to do something like sequential reads, you could probably buffer small docs on the client side and then write them as one doc every minute, for instance. This can work well for time series. Can you describe your use case a bit more?
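The client-side buffering idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class name MinuteBuffer and the key format ts::<metric>::<YYYYMMDDHHMM> are hypothetical, and the actual write to Couchbase (for example an SDK upsert of each returned document) is deliberately left out.

```python
import json
from collections import defaultdict
from datetime import datetime


class MinuteBuffer:
    """Client-side buffer that turns many small events into one doc per minute.

    Hypothetical helper: the key format "ts::<metric>::<YYYYMMDDHHMM>" is an
    illustrative convention, and writing the returned documents to Couchbase
    (e.g. an SDK upsert per document) is intentionally not shown.
    """

    def __init__(self, metric: str):
        self.metric = metric
        self._pending = defaultdict(list)  # minute key -> buffered events

    def _minute_key(self, ts: datetime) -> str:
        # Fixed-width timestamp, so keys for the same metric sort chronologically.
        return f"ts::{self.metric}::{ts.strftime('%Y%m%d%H%M')}"

    def add(self, ts: datetime, value) -> None:
        """Buffer one small event under the minute it belongs to."""
        self._pending[self._minute_key(ts)].append({"second": ts.second, "value": value})

    def flush(self, now: datetime) -> list:
        """Return (key, json_document) pairs for all minutes before `now`.

        Flushed minutes are removed from the buffer; the current minute keeps
        collecting events until a later flush.
        """
        current = self._minute_key(now)
        out = []
        for key in sorted(k for k in self._pending if k < current):
            events = self._pending.pop(key)
            out.append((key, json.dumps({"events": events})))
        return out


if __name__ == "__main__":
    buf = MinuteBuffer("cpu")
    buf.add(datetime(2015, 6, 1, 12, 0, 5), 0.42)
    buf.add(datetime(2015, 6, 1, 12, 0, 40), 0.57)
    buf.add(datetime(2015, 6, 1, 12, 1, 10), 0.61)
    for key, doc in buf.flush(datetime(2015, 6, 1, 12, 1, 30)):
        print(key, doc)  # only the completed 12:00 minute is emitted
```

Because each minute becomes one document, reading back a time range later fetches one document per minute instead of one per event.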

#5

Thanks @ldoguin; my use case is a forum application where users submit posts as JSON documents that are inserted into a Couchbase bucket. I am now working on the system architecture and the data model, and I would like to take read and write performance into account as well. So I am exploring the good options and best practices for getting very good performance and making the most of Couchbase. I am wondering whether sequential reads would be better than random reads for sequential documents (or not?). In particular, I am thinking of using N1QL and views to extract the posts created during the day and run some analysis on them…
Thanks

#6

If you are afraid of the scatter/gather that happens with views, you should definitely look into GSI (Global Secondary Indexes) and the new MDS (Multi-Dimensional Scaling) architecture. With GSI, an index can live on just one node, which avoids the scatter/gather. I invite you to look at the latest presentations on the topic:

GSI: https://www.youtube.com/watch?v=WvjYKO27Vdk
MDS: https://www.youtube.com/watch?v=b09peBHtITA
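The difference between scatter/gather and a single-node index can be illustrated with a toy Python model. Nothing here is Couchbase code: the "nodes" are plain lists of (day, post_key) entries, scatter_gather is a hypothetical stand-in for a view-style query that asks every node and merges the sorted partial results, and global_lookup is a hypothetical stand-in for a GSI-style query answered by the one node that holds the whole index.

```python
import heapq
from typing import List, Tuple

# Toy model only, not Couchbase code: an index entry is (day, post_key).
Entry = Tuple[str, str]


def scatter_gather(local_indexes: List[List[Entry]], day: str) -> Tuple[List[Entry], int]:
    """View-style query: every node scans its own sorted slice of the index.

    The per-node partial results are merged into one sorted list.
    Returns (results, number_of_nodes_contacted).
    """
    partials = [[e for e in index if e[0] == day] for index in local_indexes]
    return list(heapq.merge(*partials)), len(local_indexes)


def global_lookup(global_index: List[Entry], day: str) -> Tuple[List[Entry], int]:
    """GSI-style query: a single node holds the whole sorted index.

    Returns (results, number_of_nodes_contacted), which is always 1 node.
    """
    return [e for e in global_index if e[0] == day], 1


if __name__ == "__main__":
    # Hypothetical data: posts spread over three nodes by key hashing.
    nodes = [
        sorted([("2015-06-01", "post::3"), ("2015-06-02", "post::1")]),
        sorted([("2015-06-01", "post::2")]),
        sorted([("2015-06-02", "post::4")]),
    ]
    whole_index = sorted(e for index in nodes for e in index)
    print(scatter_gather(nodes, "2015-06-01"))  # same rows, 3 nodes contacted
    print(global_lookup(whole_index, "2015-06-01"))  # same rows, 1 node contacted
```

The interesting part of the model is the second value: both queries return the same rows, but the view-style query needs an answer from every node before it can finish, while the single-node index answers on its own.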
