Duplicate Document INSERT exception

@pasin @djpongh @hideki
One of the topics in issue #1591 (https://github.com/couchbase/couchbase-lite-android/issues/1591) is the proper error handling for duplicate (key) inserts.
As @hideki points out, this is currently handled as a typical conflict.

I’d like you to please consider this a far more serious problem: the dev should be alerted, as happens with most if not all databases, that the transaction has failed and the new data has not been saved. This is unrelated to a conflict, which usually means multiple simultaneous edits to an EXISTING document.

Please see your documentation at: https://developer.couchbase.com/documentation/mobile/2.0/couchbase-lite/java.html#document . Do you see try-catch surrounding the database.save()? Is it not correct to assume that the exception would be thrown for a duplicate? If a developer simply follows that documentation, how in the world would one know that documents are NOT being saved? Why should a dev assume that there needs to be attached conflict resolvers? (By the way, @hideki, Document.generation and Document.getRevID are not public.)

But at the core of the issue, the program/app/dev/USER needs to be alerted specifically that a NEW document has NOT been saved due to a DUPLICATE ID.
I can go on and on (as you know!) with many scenarios on the importance of this. But I will let COUCHBASE SERVER do the arguing on my behalf. Obviously, the server (Java) api DOES throw a duplicate exception! Here: http://docs.couchbase.com/sdk-api/couchbase-java-client-2.5.5/com/couchbase/client/java/error/DocumentAlreadyExistsException.html
(Especially in Mobile apps where many developers use the local db only. No chance of conflict with another node. Why worry about conflicts?)

Having said all this, if you disagree and keep it the way it is, ok - but at least update the documentation so that devs know how to code to properly handle duplicate id’s.
Thank you much.
-nat

We are evaluating our conflict resolution algorithms and, in light of that, the document save API as well. That should remove the uncertainty about whether or not a document save went through.

I am not sure what data you base the assumption on that many mobile apps use a local db only. While a local-only db is a use case we support, we also have to be cognizant of the large number of our users who use it in multi-writer distributed systems where conflicts are likely.
Anyway, as I mentioned above, some of the impending changes should simplify the case for local-db-only environments such as yours, where conflicts are unlikely.

And why do you need that info?

Au contraire. What I was saying was that even though this is Couchbase Lite, and therefore usually part of a distributed system, do not forget that a great percentage of Android & iPhone developers would use cbl just for their app and nothing else. I wouldn’t be surprised if more than 50% of apps using cbl are local only.

Regarding Document.generation() and getRevID: I was responding to @hideki, who posted some code in the last post of issue 1591 on github. (And I didn’t want to re-open the issue, hence this post here.)

Back to the original issue, if I may. Kicking out a duplicate record (with some error code!) is normal behavior of any db system, SQL or otherwise. Oracle, IBM DB2, MySQL, and indeed Couchbase Server!

A duplicate ID really is the same thing as an update conflict, it’s just that it’s the first update. If you don’t want this to be possible, you should use UUIDs as the docIDs.
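
A minimal sketch of the UUID suggestion in plain Java. `DocIds` and `newDocId` are hypothetical helper names used for illustration, not Couchbase Lite classes; the generated string would simply be passed as the document ID wherever the app creates a document.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative only: DocIds is a hypothetical helper, not part of Couchbase Lite.
final class DocIds {
    private DocIds() {}

    // A random (version 4) UUID rendered as a 36-character string.
    // Collisions are improbable enough to ignore in practice.
    static String newDocId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 10_000; i++) {
            seen.add(newDocId());
        }
        System.out.println("distinct ids: " + seen.size());
    }
}
```

Note that this only keeps document IDs apart; it does nothing for uniqueness of any other field.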

The server SDKs aren’t a good comparison because they have no concept of revisions, and they’re not designed for highly distributed systems that spend much of their time offline. Also, the SDKs do support a ‘retry’ mode where you pass a block/lambda to update the document body, that gets re-evaluated when there’s a conflict; this is pretty similar to our conflict handlers.

I wouldn’t be surprised if more than 50% of apps using cbl are local only.

I agree it’s a valid use case, but I haven’t heard of anyone shipping an app using client-only CBL. And zero percent of Couchbase’s customers do this.

@pasin @priya.rajagopal @jens
[Please read at least twice before responding. Thank you.]

False. But true that this is a conceptual argument. In all my database experience, since dBASE II on CP/M systems in 1985, and dBASE III on the 1st IBM PC, and tens of different databases since, including JPA on JEE, including various methodologies of multi-user and multi-tier systems, I have never encountered a system that allows this to happen without warning! (I did mention that at least the documentation should explain how to code for this scenario.)

You gotta be kidding me. If YOUR db can’t handle it, restrict it and force the use of UUIDs!
Many systems rely on this ‘error’ by design. For example there are systems that are designed with lastName-FirstName as the PK (primary key), or ID. Of course there might be duplicates with two totally unrelated people with that name, and all the fields are of course different. The system is designed with this in mind and when the user/operator enters a new dupe, it simply prompts the operator that so & so - with this and that data - already exists! Is this the same user? Well, why didn’t you know that? Is this truly a new record, append xyz (depending on company policy) to the id. Point is, it’s by design.
Of course there is the argument that this is not good design, BUT many systems eliminate the need for an expensive index using this methodology.
Another scenario: A custom log of events that uses a Long.toString() of system time for the id. (Again to save an index overhead with millions++ of records.) In our design we have a unique identifier for each mobile device as part of the id. So it is IMPOSSIBLE to have duplicates due to other nodes. So the id would be yyMMddhhmmssSSS-DEVICE_ID. (SSS is the millisecond.) Although usually we do not have multiple records in the same millisecond, it does happen (logging many mobile generated events), and when it does, we need to know!
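
To make the log-ID scheme concrete, here is a small sketch of how such an ID could be built and how a same-millisecond collision shows up. `EventIds`, `eventId` and the device ID `DEV42` are made-up names for illustration. One assumption differs from the post: the pattern uses `HH` (24-hour clock), because `hh` in a Java pattern is the 12-hour clock and would repeat every twelve hours.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: EventIds and the device id "DEV42" are hypothetical names.
final class EventIds {
    // yyMMddHHmmssSSS, formatted in UTC so the id does not depend on the device time zone.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    private EventIds() {}

    static String eventId(long epochMillis, String deviceId) {
        return STAMP.format(Instant.ofEpochMilli(epochMillis)) + "-" + deviceId;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        long now = System.currentTimeMillis();
        // Two events logged within the same millisecond produce the same id.
        String first = eventId(now, "DEV42");
        String second = eventId(now, "DEV42");
        System.out.println(first + " new? " + used.add(first));
        // Set.add returns false for the duplicate, which is the signal the app needs.
        System.out.println(second + " new? " + used.add(second));
    }
}
```

The `Set` here only stands in for whatever storage layer reports the duplicate; the point is that the ID alone cannot tell the two events apart.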

I hope you haven’t started writing a response yet because you might want to think again after the following point: With CBLite there are NO UNIQUE INDICES! So even if I were to use UUID for the id, I cannot create a unique index on the data I want unique! Simply doesn’t exist in CBL 2.

The problem is that you are speaking techie speak, while I am speaking plain English. I too speak computer lingo (including assembler) but as a consultant-developer I try to take off my computer hat and put on a ‘simple-user’ hat. So… I need not care what your technical implementation explanation is, just how logical is the end-result for any given argument.

Hopefully your marketing department doesn’t see this statement! (Ok, you are a computer whiz, but not marketing, which is ok.)
1. Although (say 50% of) the developers are for client only, the USERS of these apps likely need/have a server database. If they are happy with Couchbase lite, which db are they likely to use for the server? (Developed by a ‘server’ guy.)
2. The app developers themselves choose a local database with an eye to expansion (or a pro version of the app) to the server db!
2b. Developers usually have decided on Couchbase Server IF they are happy with CBLite.
Mobile apps are much more in demand than web-apps today. Mobile is the door to the real thing.

I hope you want new customers!

Please understand that this is constructive criticism, and I spend so much time on this because I want it to succeed. For myself I have workaround utility code for the problems I encounter. But my experience and conscience (after all, I am using your free software!) demand that I raise these issues.

Thank you much.
All the best.
-nat
ps. I will refrain from commenting for at least a day, because I want to see if this will get multiple responses and/or opinions.

This seems like one of those violent-agreement situations. I agree that two threads shouldn’t be able to create the same document with no indication anything happened. Same thing when two threads try to update an existing document. In all cases this is treated as a conflict, and a handler is invoked. The handler can easily abort the save by returning null.

Based on the types of data customers tend to store, on how often conflicts actually occur, and on how sophisticated the average developer is about conflict handling, we decided to go for the simple path and have a default conflict handler in place. In the cases you describe as examples, the developer would of course register a short conflict handler to fail the save.

We’ve actually (by coincidence) been talking about this design over the past week and are making a few last-minute API tweaks to enable the “optimistic concurrency” behavior you’re talking about and that the SDKs expose, where the Save operation returns an error if a conflict occurs.
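
For readers unfamiliar with the term, the following is a toy in-memory model of optimistic concurrency, where a save is rejected with an error when the stored revision is not the one the caller expected. It is not the Couchbase Lite API (which was still being changed at the time of this thread); `OptimisticStore`, `ConflictException` and the integer revision counter are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory model; none of these names come from Couchbase Lite.
final class OptimisticStore {
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) { super(message); }
    }

    private static final class Entry {
        final int revision;
        final String body;
        Entry(int revision, String body) { this.revision = revision; this.body = body; }
    }

    private final Map<String, Entry> docs = new HashMap<>();

    // expectedRevision is the revision the caller last read; 0 means "I believe this id is new".
    // Returns the new revision, or throws if the stored revision differs from the expected one.
    synchronized int save(String id, int expectedRevision, String body) {
        Entry current = docs.get(id);
        int currentRevision = (current == null) ? 0 : current.revision;
        if (currentRevision != expectedRevision) {
            throw new ConflictException("conflict on '" + id + "': expected revision "
                    + expectedRevision + " but found " + currentRevision);
        }
        docs.put(id, new Entry(currentRevision + 1, body));
        return currentRevision + 1;
    }

    synchronized String body(String id) {
        Entry e = docs.get(id);
        return e == null ? null : e.body;
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        try {
            store.save("smith-john", 0, "first record");
            // A second "insert" of the same id is rejected instead of silently resolved.
            store.save("smith-john", 0, "second record with the same id");
        } catch (ConflictException e) {
            System.out.println("save rejected: " + e.getMessage());
        }
        System.out.println("stored body: " + store.body("smith-john"));
    }
}
```

In this model a duplicate insert and a stale update fail the same way, which matches the point made earlier in the thread that a duplicate ID is just the first update conflict.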

With CBLite there are NO UNIQUE INDICES!

Consider the complexity of enforcing uniqueness of a property value in a distributed system. Two nodes in the network can simultaneously create a property with the same value and start propagating it. At some point, perhaps hours later if one device was offline or there’s a network partition, a database instance somewhere will receive both of those updates and the unique index will blow up. Now what? There’s no transaction to roll back (transactions are nearly impossible in a distributed system) and neither document update is illegal. There’s not even a well defined notion of which one came “first”.
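
The scenario can be simulated in a few lines. This is only a sketch under simplifying assumptions: `Replica`, `insert` and `pullFrom` are hypothetical names, each replica is a plain map, and "unique index" means a local check on a single `email` value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation; Replica and its methods are hypothetical, not a Couchbase API.
final class Replica {
    private final Map<String, String> emailByDocId = new HashMap<>();

    // Local uniqueness check: it can only see documents this replica already holds.
    boolean insert(String docId, String email) {
        if (emailByDocId.containsValue(email)) {
            return false;
        }
        emailByDocId.put(docId, email);
        return true;
    }

    // Pull the other replica's documents and report values that are now duplicated.
    List<String> pullFrom(Replica other) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, String> doc : other.emailByDocId.entrySet()) {
            boolean sameDoc = emailByDocId.containsKey(doc.getKey());
            if (!sameDoc && emailByDocId.containsValue(doc.getValue())) {
                violations.add(doc.getValue());
            }
            emailByDocId.put(doc.getKey(), doc.getValue());
        }
        return violations;
    }

    public static void main(String[] args) {
        Replica phoneA = new Replica();
        Replica phoneB = new Replica();
        // Both devices are offline; each local check passes.
        System.out.println(phoneA.insert("doc-a1", "pat@example.com"));
        System.out.println(phoneB.insert("doc-b1", "pat@example.com"));
        // Later they sync, and the "unique" value exists twice with no obvious winner.
        System.out.println(phoneA.pullFrom(phoneB));
    }
}
```

Both inserts were legal when they were made, so by the time the violation is detected there is nothing to roll back; the only options are to tolerate the duplicate or to pick a loser after the fact.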

The problem is that you are speaking techie speak, while I am speaking plain English.

Not really. You’re using database jargon like primary key, unique index, etc. It’s just more widely-known techie speak.

The language doesn’t matter; this is an engineering forum and not understanding the words isn’t a viable defense. It’s a simple fact that distributed systems are unlike non-distributed ones, and widely/loosely distributed ones are even more different. Your intuition from a monolithic system, whether dBASE or MySQL, will not serve you well.

I hope you want new customers!

Sure. I was just cutting short your argument about what our existing customers do. And we are not going to make engineering decisions that make life easier for customers not using replication, if it makes life harder for those who do, because the latter are our primary market.

CBLite already does this! Why didn’t you tell me!!
@pasin @hideki @priya.rajagopal @jens

All I need to do is to add a ConflictResolver that does nothing (return null), and it will cause exactly the results I am looking for! The new important point here is that I can have a ConflictResolver without getting into resolving anything, not Conflict.Base/mine/theirs, nothing, just return null! (When/if that local db gets connected to a server/sync gw, then the exact algorithm for your specific db requirements can be coded.)
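
The idea can be shown with a toy model. The types below (`ToyDatabase`, `Resolver`, `SaveConflictException`) are hypothetical stand-ins written for this sketch and are not the Couchbase Lite classes; the toy also treats any save to an existing ID as a conflict, which is a simplification. Note that later replies in this thread caution that custom resolvers are likely to be removed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the idea only; these types are hypothetical stand-ins, not Couchbase Lite classes.
final class ToyDatabase {
    interface Resolver {
        // Return the body to keep, or null to give up and fail the save.
        String resolve(String mine, String theirs);
    }

    static final class SaveConflictException extends RuntimeException {
        SaveConflictException(String docId) { super("document already exists: " + docId); }
    }

    private final Map<String, String> docs = new HashMap<>();
    private final Resolver resolver;

    ToyDatabase(Resolver resolver) { this.resolver = resolver; }

    void save(String docId, String body) {
        String existing = docs.get(docId);
        if (existing != null) {
            // Simplification: any save to an existing id counts as a conflict here.
            String winner = resolver.resolve(body, existing);
            if (winner == null) {
                throw new SaveConflictException(docId);
            }
            body = winner;
        }
        docs.put(docId, body);
    }

    String get(String docId) { return docs.get(docId); }

    public static void main(String[] args) {
        // The "nothing" resolver: it never resolves, so every duplicate id surfaces as an error.
        ToyDatabase db = new ToyDatabase((mine, theirs) -> null);
        db.save("smith-john", "first");
        try {
            db.save("smith-john", "second");
        } catch (SaveConflictException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(db.get("smith-john"));
    }
}
```

Swapping in a resolver such as `(mine, theirs) -> mine` turns the same store back into last-write-wins, which is roughly the trade-off between the default behavior and the "nothing" resolver discussed here.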

Maybe the default DatabaseConfiguration should set a ‘nothing’ resolver by default. This is another minor question. The main issue about this, for me, is solved.

Hope folks gain from this thread someday.
-nat

@pasin @priya.rajagopal @jens
Quote from: https://developer.couchbase.com/documentation/mobile/2.0/whatsnew.html

No Conflicts Mode
Sync Gateway 2.0 introduces a ‘no conflicts mode’. When enabled, Sync Gateway will reject any revision that would create a conflict. This mode is specifically designed for scenarios where you do not wish to use the multi version concurrency control aspect of Couchbase Mobile.

As mentioned in my earlier response, and as @jens alluded to, we are evaluating our conflict resolution algorithms. So the “custom conflict resolver”, which is what you are mentioning above, will very likely be going away in the next release or so. I would therefore recommend that you don’t adopt this solution.

Again, @jens and I have already mentioned that your particular issue should be taken care of with the impending changes: we “(by coincidence) are making a few last-minute API tweaks to enable the ‘optimistic concurrency’ behavior you’re talking about and that the SDKs expose, where the Save operation returns an error if a conflict occurs.”

A couple of points here:

No-conflicts mode means that conflicting branches (of the document revision tree) don’t exist in the database. The way that is accomplished is that the conflict resolver kicks in during document save, so we never allow conflicting documents to be saved. We have a default conflict resolver that handles this automatically for you (“Automatic conflict resolution”). Jens described this and the rationale for it in detail earlier. Once again, we are tweaking the API so you get more insight into document save.

The statement above talks about Sync Gateway 2.0. SGW rejects an attempt to push up a conflicting document. So the clients running Couchbase Lite are responsible for resolving the conflicts when documents are pulled down, and that’s done through the automatic conflict resolution process we have talked about.

(These will be documented in a blog when the GA is out)

You missed the point of my paragraph about no-conflict mode. It was a logical, non-technical point, namely that the mode is “specifically designed for scenarios where you do not wish to use the multi version concurrency control aspect of Couchbase Mobile.” In other words, it is for developers who are simply writing apps using a database and do not want to factor in the distributed database at that point in time. (This can be true not only for normal apps using a simple solid db, but also for seasoned Couchbase devs that code a mobile app in stages. Stage 1: “let’s have a functional solid app standalone”. Stage 2: a pro version of the app that connects to the server via sgw.)
What I was saying was that from that Couchbase documentation page, it seems like [some of] the original designers of cbl2 actually planned it this way. Good for them for having this simple common sense in the design of cbl2.

Yeah, we had some debates about this early in the API design process. At the time we decided in favor of ease of use, i.e. making the default behavior one that doesn’t require the developer to handle an error in this case. We are currently doing some tweaking of this, however.

This mode is specifically designed for scenarios where you do not wish to use the multi version concurrency control aspect of Couchbase Mobile

That’s a bit of a misstatement in the docs. The no-conflicts mode is still MVCC. It’s just not a form of MVCC that allows deferred resolution of conflicts (by preserving the revision tree.)

@jens
Actually it was your API docs that revealed this ‘trick’. The doc on the resolve(Conflict) method in http://docs.couchbase.com/mobile/2.0/couchbase-lite-java/db022/com/couchbase/lite/ConflictResolver.html states:
Returning a nil document means giving up the conflict resolution and will result to a conflicting error returned when saving the document

  1. ‘giving up the conflict resolution’ - Yes!
  2. ‘and will result to a conflicting error returned’ - Yes! & Yes!

To whoever wrote such a clear, concise, 2-liner method doc: Thank you!
-nat

Well - you clearly missed my point.

I recommended that you not adopt this approach because it will likely go away in the next release. Including it again for reference.

Once again, please wait for changes coming to the conflict resolution API that should handle some of your concerns.