N1QL transactions with Go SDK and lost updates

Muneeb_Abbas · October 18, 2021, 7:43am

The docs state that N1QL transactions can be used at any place N1QL can be used. I am using the Go SDK which doesn’t have explicit support for transactions but can use the REST API to communicate with the query service.

The docs also state that CAS value will be used to make sure updates are not lost in case of concurrent transactions that touch the same document.

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			results, err := cluster.Query("START TRANSACTION;", nil)
			if err != nil {
				panic(err)
			}
			var tx txid
			err = results.One(&tx)
			if err != nil {
				panic(err)
			}

			opts := &gocb.QueryOptions{Raw: map[string]interface{}{"txid": tx.TxID}}

			results, err = cluster.Query("UPDATE customer SET balance = balance - 1 WHERE cid=1;", opts)
			if err != nil {
				panic(err)
			}
			results, err = cluster.Query("UPDATE customer SET balance = balance + 1 WHERE cid=2;", opts)
			if err != nil {
				panic(err)
			}
			_, err = cluster.Query("COMMIT;", opts)

			if err != nil {
				panic(err)
			}
			wg.Done()
		}()
	}

This code tries to run the following set of queries

START TRANSACTION;
UPDATE customer SET balance = balance - 1 WHERE cid=1;
UPDATE customer SET balance = balance + 1 WHERE cid=2;
COMMIT;

Effectively trying to make a transfer of amount 1. This is done 100 times concurrently and a bunch of updates are lost. Do transactions not provide safety against lost updates? If not, is there no way to achieve safe transfer of balance from one customer to another using the Go SDK?

graham.pople · October 18, 2021, 9:18am

Hi @Muneeb_Abbas
First thing to note is that this code will only work against a single-node cluster, as all queries in the same transaction need to go to the same query node. For this you will need to use the REST interface directly, until a Go implemention of transactions is available.

When talking about Lost Updates, as in what the documentation is referring to, there is the specific database phenomenon (described well here What Is a Lost Update in Database Systems? - DZone Database), which requires a read followed by an update, which is not going on in your transaction.

Your transaction involves straightforward single UPDATEs that should certainly be detected as write write conflicts, with one transaction being rejected.

You mention that updates are being lost - what leads you to this conclusion? I would expect from this transaction to always have a constraint that the balance of cid=1 is the negative balance of cid=2 (e.g. -50 vs 50) - assuming that all statements succeed. Is this constraint being broken? Do any statements fail? (Note that N1QL has Statement Level Atomicity, e.g. if an individual statement fails then it will be rolled back, and an error reported back to the user, but the overall transaction is permitted to continue.

Muneeb_Abbas · October 18, 2021, 12:16pm

It is a single-node cluster but I think that shouldn’t matter as the SDK should be sending the txid extra parameter which is all that should be required to make sure the query is processed by the correct node (Note that in my testing, I have a single node cluster anyways).

from this blog The N1QL Query Language Now Supports Distributed ACID Transactions

The N1QL transactions feature is supported anywhere you use a N1QL query.

The main requirement is that a txid value – which is returned from the START TRANSACTION command – is used with all subsequent N1QL DMLs if they’re a part of the transaction. The same goes for the final COMMIT or ROLLBACK.

I can confirm that the Go SDK passes along the extra parameters such as txid and tximplicit which means that it doesn’t matter if I use the REST interface directly as the SDK is just calling the REST interface I think.

How do I know updates are being lost?
Instead of 100 being transferred, it’s either 1 or 2 (And changes between runs meaning it’s a race condition). What I am missing here is the write-write conflict that is mentioned in the docs.

graham.pople · October 18, 2021, 12:39pm

Hi @Muneeb_Abbas
The Go SDK does use the REST interface under the hood, yes. But txid alone isn’t enough - it’s also necessary to send each query to the same query node. Otherwise you’ll get an error that the transaction id is not recognised.

I think what’s happening is that most of the transactions are actually failing due to write-write conflicts, leading to just one or two succeeding. A transactions library will automatically retry a failed transaction for you (where it is sensible to retry, such as with a write write conflict), but when using query statements directly in this way, it’s necessary to do that yourself.

vsr1 · October 18, 2021, 1:10pm

@Muneeb_Abbas ,

Also checkout each transaction/each statement is returned error or success. You will have only number of successful transaction begin to end are transferred (not 100).

Also note N1QL transaction uses optimistic concurrency , i.e. All mutations are performed with in the transaction and during commit time it will write with cas value of first mutation of the corresponding document. There is no retry with in the N1QL. Due to optimistic concurrency, If concurrent transactions mutating same documents there will be higher rollbacks.

If you using REST API directly, Application is responsible for sending all statements with in the transaction to same query node. After successful stat transaction, on any error you must rollback instead of abending transaction. Once SDK support transaction all these will be handled by transaction frame work.

If you still have issues, Will you able to provide CB version, GOCB version, Number of CPUs on query node, complete repro.

Muneeb_Abbas · October 18, 2021, 5:33pm

@graham.pople
I understand that txid alone isn’t enough and we have to manually send each query to the same node. However, I am only using a single node right now for testing so we can get to handling multiple nodes later.

I also understand that the retry will have to be manual. I was not under the impression that it’s something the SDK will handle. What I was trying to look for when writing this example was the error message I will get with the write write conflict so that I can handle it manually. It seems that the SDK does not report it back, at least from what I can see.

			results, err = cluster.Query("COMMIT;", opts)
			if err != nil {
				panic(err)
			}
			err = results.Close()
			if err != nil {
				panic(err)
			}

I have tried calling Close() on the QueryResult as well but to no avail. What I would like to know is, if there should be any signal or error when using N1QL with Go SDK that will let me handle the retry behavior from the application logic. I think using the REST interface might be the only option from Go though as the SDK might need some more work before it can be used with N1QL transactions. The documentation should also be clear on this in my opinion.

Muneeb_Abbas · October 18, 2021, 5:38pm

@vsr1

Couchbase enterprise 7.0.1 and community 7.0.1 (Tried with both)
Go 1.17
gocb v2.3.2

To reproduce, you can just create two customer documents and a customer bucket. The document will have the following structure.

customer-1

{
  "balance": 100,
  "cid": 1
}

customer-2

{
  "balance": 0,
  "cid": 2
}

After running the piece of code, customer-2 should have balance as 100 and customer-1 should have balance as 0.

I really thought the SDK will propagate the write-write conflict somehow so that application can handle it itself. Thanks for the help though!

vsr1 · October 18, 2021, 5:41pm

@Muneeb_Abbas, Please provide the complete repro of your code.

Muneeb_Abbas · October 18, 2021, 8:25pm

 package main

import (
	"sync"

	"github.com/couchbase/gocb/v2"
)

func main() {
	cluster, err := gocb.Connect(
		"localhost",
		gocb.ClusterOptions{
			Username: "muneebabbas",
			Password: "muneeb",
		})
	if err != nil {
		panic(err)
	}
	// Start a transaction and get the transaction ID
	type txid struct {
		TxID string `json:"txid"`
	}

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			results, err := cluster.Query("START TRANSACTION;", nil)
			if err != nil {
				panic(err)
			}
			var tx txid
			err = results.One(&tx)
			if err != nil {
				panic(err)
			}

			opts := &gocb.QueryOptions{Raw: map[string]interface{}{"txid": tx.TxID}}

			_, err = cluster.Query("UPDATE customer SET balance = balance - 1 WHERE cid=1;", opts)
			if err != nil {
				panic(err)
			}
			_, err = cluster.Query("UPDATE customer SET balance = balance + 1 WHERE cid=2;", opts)
			if err != nil {
				panic(err)
			}

			results, err = cluster.Query("COMMIT;", opts)
			if err != nil {
				panic(err)
			}
			err = results.Close()
			if err != nil {
				panic(err)
			}
			wg.Done()
		}()
	}
	wg.Wait()
}

This is the entire test file. I have explained the behavior and necessary pre conditions you would need to test this. I am not sure what more you’d need.

vsr1 · October 18, 2021, 8:52pm

I am not expertise in gocb., Do you need to iterate through results process execution errors.
I don’t think this is transaction related.

Second insert is primary key violation, N1QL returns error. Why gocb is not set error. Same thing happening in commit.

package main
import (
    "fmt"
    "github.com/couchbase/gocb/v2"
)

func main() {
    cluster, err := gocb.Connect(
        "localhost",
        gocb.ClusterOptions{
            Username: "Administrator",
            Password: "password",
        })
    if err != nil {
        panic(err)
    }
    // may succeed
    _, err = cluster.Query("INSERT INTO customer VALUES('k1',{'a':1});", nil)
    fmt.Print(err)
    if err != nil {
        panic(err)
    }
    // should fail due to duplicate document key. Why no error is raised
    _, err = cluster.Query("INSERT INTO customer VALUES('k1',{'a':1});", nil)
    fmt.Print(err)
    if err != nil {
         panic(err)
    }
}

graham.pople · October 18, 2021, 9:53pm

This is the crux of the issue I believe as the error certainly should be getting propagated back. I am not an expert on how gocb will surface this (perhaps @brett19 can help?), but at the JSON level, you will get an errors array, and if this contains a JSON blob with a cause field, then this information can be used to drive your program logic. cause will contain a rollback boolean field indicating whether you should issue a ROLLBACK statement, and a retry boolean field indicating whether you should then retry the full transaction. There is also a nested cause field that contains additional diagnostics information including a human readable description of the error.

vsr1 · October 19, 2021, 1:31am

I think after every cluster.Query() statement you need to iterate results and check error again (There might be better way)
This gives execution errors.

           results, err = cluster.Query("COMMIT;", opts)
            if err == nil {
                for results.Next() {
                }
                err = results.Err()
            }
            if err != nil {
               panic(err)
            }

Muneeb_Abbas · October 19, 2021, 8:29am

@graham.pople

I think @vsr1 cracked it. The problem is that we have to iterate over the results and then check results.Err(). Looks a bit silly with the empty for loop though but doing that does return the correct underlying error.

			results, err = cluster.Query("COMMIT;", opts)
			if err != nil {
				panic(err)
			}
			for results.Next() {
			}
			err = results.Err()
			if err != nil {
				panic(err)
			}

When I do this, I get the following error when calling results.Err()

query error | {"errors":[{"Code":17007,"Message":"Commit Transaction statement error"}]}

This should let me figure out what the error is.

Just FYI, I think the error codes are not updated in the documentation. For example, there is no such error code documented on this page. This might need updating

https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/n1ql-error-codes.html

Thanks everyone for the help.

Muneeb_Abbas · October 19, 2021, 8:37am

I have one follow up question however. Is there a roadmap for the Go SDK? For example, having built in transactions support like the Java SDK would really help and is that something on the roadmap?

graham.pople · October 19, 2021, 10:26am

Glad you got to the bottom of how to process the errors. Looking through the Go SDK query documentation, I see that general pattern of reading the results and then checking the Err field is documented there.

Just FYI, I think the error codes are not updated in the documentation.

Ah yes, good point. I’ve raised a docs ticket for this. Until this has been updated, please use this ticket as a reference for the new transactions error codes.

I have one follow up question however. Is there a roadmap for the Go SDK? For example, having built in transactions support like the Java SDK would really help and is that something on the roadmap?

I’ll discuss this with the team today and get back to you.

graham.pople · October 20, 2021, 11:49am

Hi @Muneeb_Abbas

I discussed the Go transactions library roadmap plans with the team, and we are targeting a release by the end of the year.

If you have a Couchbase representative then please feel free to discuss prioritisation with them on this or any other roadmap item.

AV25242 · February 24, 2022, 5:29am

Hello @Muneeb_Abbas Go SDK 2.4 that was released recently comes with ACID Transaction support.
Please see documentation as a reference - Distributed Transactions from the Go SDK | Couchbase Docs