Statement_plus in distributed systems

johannesjasper · June 6, 2019, 9:22am

Hey there,

I just read up on the difference between statement_plus (or at_plus) and request_plus. As far as I understand it, you have to specify the mutations your query should wait for, e.g.:

JsonDocument written = bucket.upsert(JsonDocument.create("mydoc", JsonObject.empty()));
bucket.query(
	N1qlQuery.simple(
		"select count(*) as cnt from `travel-sample`",
		N1qlParams.build().consistentWith(written)));

Does this work in a distributed microservice architecture? If I cannot be sure if another instance of a service is doing queries on the same kind of documents, how can I specify what to wait for?

On an unrelated note: can somebody tell me how spring-data-couchbase handles READ_YOUR_OWN_WRITES? It is their default consistency level but I don’t see a way to specify the MutationToken. The statement consistentWith does not even occur in their code base.

graham.pople · June 6, 2019, 4:06pm

Hey @johannesjasper

Indeed, statement_plus is used to allow you to be consistent with your own recent updates, i.e. it gives read-your-own-writes semantics.

johannesjasper · June 6, 2019, 9:32pm

Cool, so I got that right. Thank you @graham.pople!

Thus in a distributed system the choice would be between NOT_BOUNDED and REQUEST_PLUS, right? Unless, of course, I can ensure there is only one instance of my service running.

Can anybody comment on my second question about spring-data-couchbase? How do they ensure that I read my own writes?

graham.pople · June 7, 2019, 9:00am

Well, it still has some utility in distributed systems; it depends on the use-case. E.g. you may want to remove an item from a list and return the new list of items, including that removal, in the same request. This would prevent the issue where the end user removes an item, the page automatically refreshes, and their item is back again.

Your example seems to need something much more complex: I think you’re looking for the count at the exact point after you do the upsert, not including any mutations between the upsert and the select? If that’s right, then you’re looking for transactional snapshot isolation semantics. That is possible, but very expensive (in terms of performance and scalability) to achieve with distributed databases, as it usually requires timestamped operations and storing multiple versions of docs, e.g. MVCC.

If instead you only need the count to definitely include the upsert, then at_plus is fine. If you need the count to include at least the upsert and any other mutations at the point of the upsert, but you don’t care if a few additional mutations post-upsert and pre-count are also included, then request_plus is good.
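
To make the difference between the three levels concrete, here is a rough toy model in plain Java. It is only a sketch and not Couchbase code: ToyCluster, its methods, and the single global sequence number are all made up for illustration (as far as I know the real server tracks sequence numbers per vBucket, and the SDK wraps them in a MutationToken). The idea it shows is just that each level picks a different "the index must have caught up to at least here" threshold before the query runs.

```java
// Toy model, NOT Couchbase code: a data service that numbers mutations
// and an index that catches up with them asynchronously.
final class ToyCluster {
    private long dataSeqno = 0;    // seqno of the latest mutation the data service accepted
    private long indexedSeqno = 0; // highest seqno the index has processed so far

    /** Accepts a write and returns its seqno (hypothetical stand-in for a MutationToken). */
    long upsert() {
        return ++dataSeqno;
    }

    /** Lets the index process up to `steps` more mutations. */
    void indexerCatchUp(long steps) {
        indexedSeqno = Math.min(dataSeqno, indexedSeqno + steps);
    }

    /** Seqno the index must have reached before a query with this level may run. */
    long requiredSeqno(String consistency, long token) {
        switch (consistency) {
            case "not_bounded":  return 0;         // use whatever is indexed right now
            case "at_plus":      return token;     // at least the caller's own write
            case "request_plus": return dataSeqno; // everything written before the query arrived
            default: throw new IllegalArgumentException(consistency);
        }
    }

    /**
     * Simulates waiting until the index reaches the required seqno, then
     * returns how many mutations the index covers (a stand-in for count(*)).
     */
    long query(String consistency, long token) {
        long required = requiredSeqno(consistency, token);
        while (indexedSeqno < required) {
            indexerCatchUp(1);
        }
        return indexedSeqno;
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        long mine = cluster.upsert(); // this service instance writes one doc
        cluster.upsert();             // another instance writes two more docs
        cluster.upsert();

        System.out.println(cluster.query("not_bounded", mine));  // prints 0
        System.out.println(cluster.query("at_plus", mine));      // prints 1
        System.out.println(cluster.query("request_plus", mine)); // prints 3
    }
}
```

In this model at_plus only guarantees that your own upsert is counted. Writes from other service instances may or may not be included, depending on how far the index happens to have caught up. request_plus waits for everything that was written before the query arrived, whoever wrote it.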