Couchbase high durability


#1

I want to design a banking system using couchbase , and I need high durability (like RDBMS) , How can I achieve 100% durability and not bit of data loss?

I know I can use persist_to and replicate_to for durability

bucket.insert(id, data, { "persist_to": 3}, (error, result) => {
    if(error) {
        throw error;
    }
    console.log(result);
});

If I have 6 servers and I choose persist_to=3 , How can I ensure that the data is durable , for instance If the active node is server1 and my data is persisted to server1 , server2 and server3 , and we have a fail on server1 , What if server4 -that has no data in memory or disk- become active? Do I lost some data? as master (server4) dont accept data from slave (server2 and server3)


#2

Hello, how many replicas do you have?


#3

It just a question , Please assume 2 cases

1- I have 3 replicas (So persist_to is less than total data copies)
2- I have 2 replicas (So persist_to is equal to total data copies)


#4

In case #2, when persist_to is equal to total data copies, in the event of primary node failure you are guaranteed your acknowledged writes are never lost.

In case #1, when persist_to is less than total data copies, if the primary node fails, one of the nodes that the write was not persisted to could get promoted to primary. Couchbase biases towards making the vbucket(s) available as soon as possible. It is not ideal for some applications and is a known issue that we are working to address in the next release.


#5

So for a banking system , If I have 6 servers and I choose N replicas Where 1<=N<=3 , If I want to be guaranteed that writes are never lost , I must set persist_to=N+1 , i.e replicas count + primary node , So I have both high availability and durability , Right?

Another question : Just some writes for me are critical for high durability , I know that if I choose persist_to higher level , I have higher latency , But if I accept cost of latency , In high throughput writes , If I choose persist_to=4 is it affect performance (How much)? Or the cost is only latency?


#6

Correct, your summary in first para is right.

As for your question on latency and throughput impact of persist_to, you have a better alternative to do replicate_to = replicas count. In a distributed system like Couchbase, replicate_to is a more performant way to achieve durability at a high performance.


#7

@shivani_g after months, I read 70% of couch base documents and have some experience

Now I have some questions , please reply me in details

according to couch base documents

In my experience when I have 4 nodes and I want 2 replica, when I set replicate_to=2
When a node get down , As replicate_to=2 , I have error in SDK that tell me It cannot replicate to 2 other nodes (when vbucket existed in down node)
So , how can I handle error?? If I ignore error and when the node is up and it promoted to primary (in case of primary failure) , what happened? I think I lost some data

My big concern is data, As I want to create a banking system , I never ever want lost data, specially when I move money
I am worry about transactions, when user A send money to user B and , I reply with a success message to user, suddenly we have a node failure or primary promotion it is likely to lost transaction, I want have a guarantee that it is not possible


#8

Currently you need to make sure you have enough available replicas for chosen replicate_to. So if you have 2 copies (1 primary and 1 replica), and are using replicate_to =2, when you lose the primary copy you have only one copy left. And hence replicate_to =2 will fail.

You can either recover the primary or run rebalance on the cluster to restore the lost replica.

Due to these issues the current semantics will not provide the ideal platform for a banking system.


#9

@shivani_g

At first , I thing you make a mistake , according to https://docs.couchbase.com/server/4.1/developer-guide/durability.html

So I think you must not count primary in ReplicateTo

We cannot set replicate_to higher than 1 as we have 1 replica , Let me know it is not true

But , Consider the following scenario

I have 1 primary and 2 replica , and set replicate_to =1 (so it guarantee that I have data in primary + 1 replica), So when I loose a node , we have no issue for replicate_to =1

But I have 2 concern
1- When I loose 2 replica , and replicate_to =1 , In my experience , I know that primary successfully done operation , but we have exception from SDK , so how can I handle the error? What must I do exactly? it is very unclear for me

2-When I loose 1 replica , let’s call replicas as X and Y , when I loose Y and I have a write operation , I know when replicate_to =1the SDK return success , But Is it possible that Y promoted to primary while it looses the write operation?


#10

You are right, thanks for catching that. Yes, replciateTo=2 will write to primary + 2 replicas.

As for your concerns:
1- When you loose 2 replicas, you are not guaranteed durability if you lose the primary. Otherwise once the replicas comes back it will resync from the primary. You can accordingly deal with the error.

2- Yes, that is possible and this is what I mentioned in my reply to you on Aug 23.


#11

@shivani_g

In document we have

Upon receiving such a failure response, the application may retry the operation or mark it as a failure

the application may retry the operation
How it can help? when a node is lost , it may be take an hour to come back , So Can you tell me use case of “retry the operation”

mark it as a failure
How we can implement it? If I use couchbase itself and have a fail field in json document , when I have error from SDK , I can use mutateIn to set fail as true but how it can help me?? When there is no gaurantee that this mark it as a failure is durable ?
If I use another databaese for instance Redis or PostgreSQL to mark it as a failure , it is very difficult to impement it and have guarantee that data is in sync


#12

@shivani_g Reply please
@brett19 , @MikeGoldsmith , @venkat , @vsr1 Contribute please


#13

retry will only help for short lived intermittent blips, it won’t help for longer outages.

I am not sure how the mark as failure approach will work, maybe one of the SDK folks you have pinged will respond.


#14

@shivani_g Do you know what happens if :

  • We have 2 node (1 primary and 1 replica for desired vBucket) , lets call node X (Primary) and Y (Replica)
  • Assume we have a document , currently both X and Y node persisted the document with value {"status":"A"}
  • We dont use durability options for writes
  • We loose node Y (Replica)
  • We replace document with {"status":"B"}
    • So document in node X must be {"status":"B"} and in loosed node (node Y) must be {"status":"A"}
  • We loose node X too and node Y returns , So Y promoted to primary
  • Node X returns as a replica node

What happens for document in node X??

  1. document in node X remains as {"status":"B"}
  2. document in node X replicate data from node Y and set document as {"status":"A"}
  3. Random behavior

#15

@graham.pople Do you have any solution for the problem ?
I want to have a guarantee that when I transfer money from user A to user B and send a success response to end users , It is durable and impossible loosing transaction


#16

(In your question above by the way, scenario 2 happens.)

Is your request idempotent? If it’s an operation that you can simply retry on failure, then I would do that. And if ultra-high durability was a goal, I’d be aiming to make as many operations idempotent as I possibly could.


#17

Is your request idempotent?

What request?

I am creating a RESTful api , that transfer money
I have 2 methods, init that initialize transaction (insert data like from, to, amount, fee ,…)
And execute that trigger processing

I have two main document type account (has balance field) and transaction
In successful transaction, I insert a transaction and decrement balance of an account document and increment balance of another account

So I must change three document in a transaction

I implement two phase commit, and my transaction document has many states like , initial, processing, debited, deposited, completing, completed , failing, failed and many other states
My concern is , when I increment balance of payee account , it is possible that I lost this increment in rare cases, like promoting secondary nodes to primary
I have a scheduler task that handle broken transactions
I want for instance when I mark transaction as completed there is a guarantee that the transaction

  1. stay completed
    Or
  2. if for some reasons it looses the state , we have a way to detect it , and set it again as completed (like secondary promotion -it is possible that secondary has initial state, or in processing, if the transaction stay in processing for 15 minutes I mark it as failed, And it is possible I complete transaction in primary but processing in replica, So due to power outage , after 30 minutes when secondary promoted , transaction marked as failure, or after a minute if in initial state the transaction stay initial )

I hope, I could explain problem


#18

Hey @socketman2016

(I should caveat this by saying that I’m a relatively new Couchbase employee. These are my personal opinions, and not necessary the official Couchbase line.)

Quite honestly, I feel that achieving absolute 100% certainty on a write with any database is a very tough ask that you’re going to need to do a lot of work at the application layer to achieve. Forget Couchbase and transactions for a sec and pretend you’re writing a single SQL UPDATE to a classical single-node RMDBS over a local network, something of an ideal case for durability. Even here, you can hit problems. Say you send the UPDATE, but then your application immediately crashes (or the network goes down). What do you do now when the app comes back up (or the network comes back)? That update may or may not have reached the database, and you may or may not have lost it - no fault of the database, simply a reality of networks and code being fallible.

If you absolutely can’t lose an update under any perfect storm of edge cases, no matter what, then it’s going to be a lot of work. I’m not even certain if it’s possible to get there, but I’d probably start by maintaining some sort of persistent log, where I store mutations before putting them in the database. E.g before you write the mutation, you write it to the log. So if your application crashes and resumes, it can read this log, find any mutations that the database didn’t acknowledge, then read the database and check if that mutation was successfully written or not. (You may wonder why the database doesn’t maintain that log for you, but you have similar problems - what if you ask the database to write, then your app immediately crashes? You have no idea if that mutation made it to the database’s log or not.)

Of course, there’s still problems: what if your app crashes before you can write the mutation to the log? But say you can solve that, perhaps by storing all end-user interactions as soon as possible into a local persistent event source queue. Though of course, you could have the hard-drive fail just after you attempt the write, and then your app crashes…

You may be seeing where I’m coming from. This is just trying to write a single SQL UPDATE into a single-node RMDBS, and achieving the 100% durability you want across the whole system is already near impossible. Add a distributed database and multiple-document transactions into the mix, and it’s getting much harder. And to stress the point, you’re not trying to solve a Couchbase problem here, you’re trying to solve the fundamentally very hard problem of durability in the face of unreliable networks, hardware, and applications.

Ultimately, I feel a human factor needs to be considered with durability. The end-user, waiting for her account transfer on the banking website to complete, will get frustrated at the lack of confirmation, refresh the webpage, see it’s not gone through, and try it again (or call support). Your automated systems will detect two identical transactions in quick succession and flag it for human review. That kind of thing.

Maybe that’s all a bit wishy-washy and theorectical. Taking your specific question, e.g. what to do if a durable Couchbase mutation fails:

If it’s an idempotent mutation, try it again. And aim to be idempotent as much as possible. E.g. if you’ve got an amount to debit, perhaps create a key based on the event’s time plus the amount, and write that as a subdoc upsert into a map. Like this:

user: {
    	ledger: {
        	time_2018_11_07_11_33_22_234_amount_100_02: {
	        	// details
    		}
        }
    }

Now, if that fails to durably write, you can just retry that subdoc upsert.

If you can’t make it idempotent, it’s tricky. Say your write failed, your app crashed, you’ve restarted and looked at your log and found that a mutation wasn’t acknowledged. I think you’d need to do getFromReplica on all nodes to see if it was written to all. If not, you’d work out what the doc should look like (no subdoc mutations), and write the full doc to make sure all nodes are correctly set.

Anyway, that was a very long answer, apologies. But this is a very complex topic.


#19

@graham.pople, thank you very very much
I MUST read your response multi time again and again, And think about each statements

But the question that I have currently is :

You told about RDBMS , can we have such durability in couchbase too? I KNOW that if we have no replica and set persist_to=1 we can achieve it easily
But the problem is replica and asynchronous replication
So, is there any tricks can help we achieve better durability,Something like RDBMS , for now forgot other failure like network , app crashes and …


#20

So, is there any tricks can help we achieve better durability,Something like RDBMS , for now forgot other failure like network , app crashes and …

Well that seems to be ignoring most problems that can cause durability issues :slight_smile:

But, I think I get what you’re asking. Assuming your application never crashes, and you never lose your network connection to the Couchbase cluster, what specifically can you do to improve durability with Couchbase. Is that right?

So, say you’ve tried to make a durable subdoc mutation, and it’s timed out. You don’t know which nodes, if any, it was written to. How do you fix things?

I did answer this above, though I don’t blame you for missing it as my answer was rather long.

The first thing to decide is, do you want to try to complete the mutation, or roll it back, or perhaps have some mechanism where this can be logged for manual reconciliation.

The second thing to decide is, what happens if you repeatedly fail to e.g. complete or rollback the mutation.

The third thing is the actual mechanics of reapplying a mutation or rolling it back. Some operations can be retried trivially, e.g. I gave a trick above on how to convert many mutations into subdoc map upserts. For some, you’ll have to craft a fresh document containing that subdoc mutation (or without it, if you want to rollback), and write it.

The key point is that, regardless of the database in use, there is no way to get a 100% durable write. There are going to be times where hardware dies, where network switches crash… And these can happen with an RMDBS just as with a distributed database like Couchbase. Though admittedly, failures are more common-place with a distributed database, simply because you’re dealing with more nodes, more network connections, more points of failure.

So durable writes will fail, sometimes repeatedly, and you’ll have to pick the semantics that make sense for your application, which only you can decide.

Personally I’d log mission-critical data as much as possible as persistently as possible, retry failing mutations for a while, then flag it for human reconciliation. You may need to build complex tools around that - as above, achieving ultra-high durability is very hard, regardless of the database in use.

Though these continue to be my personal thoughts, rather than the official Couchbase line :slight_smile: