Append seems to replicate the whole doc across nodes

zfedor · January 2, 2016, 2:45pm

Thanks for confirming, that is what I thought.

Yes, I understand that there could be some tradeoffs, although for an append-only document type that might not necessarily be the case. For example, look at how Apache Kafka uses an append-only data structure to keep appending to a “topic” in a distributed cluster without rewriting the whole topic on every node at each append, which would cause high I/O and heavy inter-node network traffic.

In either case, I am not saying that Couchbase necessarily needs to implement what Kafka does; for that we can just use Kafka. I was just looking for confirmation of the behavior I was seeing, so I can design around it (by making smaller “caching” append docs that roll up into bigger “storage” append docs, which would get appended to in bulk).
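A rough sketch of that roll-up idea, in Python. It uses a hypothetical in-memory `FakeStore` in place of a real Couchbase client (none of these names are Couchbase APIs), and it assumes, as observed in this thread, that every append costs a rewrite of the whole document on the node. `RollupAppender`, `max_cache_bytes`, and the key names are made up for illustration.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for the cluster (not a Couchbase API).

    Assumption: each append rewrites the whole document, so we count the
    full document size as "bytes rewritten" on every append.
    """

    def __init__(self):
        self.docs = {}
        self.bytes_rewritten = 0

    def append(self, key, data):
        doc = self.docs.get(key, b"") + data
        self.docs[key] = doc
        self.bytes_rewritten += len(doc)  # whole doc is written again

    def get(self, key):
        return self.docs.get(key, b"")

    def delete(self, key):
        self.docs.pop(key, None)


class RollupAppender:
    """Append to a small "caching" doc, roll it up into a big "storage" doc."""

    def __init__(self, store, cache_key, storage_key, max_cache_bytes=100):
        self.store = store
        self.cache_key = cache_key
        self.storage_key = storage_key
        self.max_cache_bytes = max_cache_bytes

    def append(self, data):
        self.store.append(self.cache_key, data)
        if len(self.store.get(self.cache_key)) >= self.max_cache_bytes:
            self.flush()

    def flush(self):
        cached = self.store.get(self.cache_key)
        if cached:
            # One bulk append to the big doc instead of many small ones.
            self.store.append(self.storage_key, cached)
            self.store.delete(self.cache_key)


def compare(n_entries=1000, entry=b"0123456789"):
    """Return (direct_cost, rollup_cost, contents_match) for n small appends."""
    direct = FakeStore()
    for _ in range(n_entries):
        direct.append("log", entry)

    rolled = FakeStore()
    appender = RollupAppender(rolled, "log::cache", "log::storage")
    for _ in range(n_entries):
        appender.append(entry)
    appender.flush()  # push any remainder into the storage doc

    same = direct.get("log") == rolled.get("log::storage")
    return direct.bytes_rewritten, rolled.bytes_rewritten, same
```

Calling `compare()` returns the modeled rewrite cost of appending directly versus going through the caching doc. In a real cluster the flush would not be atomic (the bulk append and the delete of the caching doc are two separate operations), so a real implementation would need to handle a failure between the two.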

Again, thanks for confirming it! Now I know how to work around this.

P.S.: It might help others to mention this in the “raw append” part of the documentation (http://developer.couchbase.com/documentation/server/4.0/developer-guide/raw-append-prepend.html), which even suggests using raw append to store logs in Couchbase documents and calls this “efficient”. Maybe add a note saying that this is only “efficient” for the client (no read of the whole doc followed by a write), not necessarily for the nodes (where the whole doc is read and written). There is a note about increasing document size there, but making the nodes’ I/O and network traffic penalty explicit might help others understand it better too.