N1QL DELETE DML Statement

Tags: query, n1ql

#1

Hi,

I’m trying to run a very simple N1QL Query to delete all documents that have keys with a certain prefix.

My CB version is 4.1.0-5005 Enterprise Edition (build-5005).

My query looks like:

DELETE FROM api WHERE POSITION(META().id, "cache::") = 0

I have a primary index built on the api bucket and a secondary index defined as:

CREATE INDEX id_cache ON api(POSITION(META().id, "cache::")) USING GSI

In a test I am running, the api bucket contains about 100,000 documents and about 70,000 of them have the prefix.

When I run the query from cbq, all of the documents are consistently deleted. However, if I run this as an asynchronous query call from inside a Java program, a variable number of documents are removed each time I run the query.

The rxjava code I am using to perform the query is:

public static Observable<JsonArray> executeQueryAsString(AsyncBucket bucket, String queryAsString) {
    return bucket.query(N1qlQuery.simple(queryAsString))
            .flatMap(result -> result.rows()
                    .map(row -> new JsonObject(row.value().toString()))
                    .collect(JsonArray::new, JsonArray::add));
}

It seems as though the query is interrupted before it finishes. I've also tried things like last(), toBlocking(), and Observable.zip with the finalSuccess() method on the query result observable, but nothing seems to ensure that the query finishes.
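For reference, one of the variants I tried looked roughly like the sketch below (Couchbase Java SDK 2.x and RxJava 1.x; the class and method names are just illustrative). The idea is to drain all rows first and only then emit the server's final status, so the blocking call cannot return before the query has fully completed on the server side:

```java
import com.couchbase.client.java.AsyncBucket;
import com.couchbase.client.java.query.N1qlQuery;

public class QueryHelper {
    // Sketch: consume (and discard) every row, then concatenate the server's
    // finalSuccess() signal, so toBlocking().single() only returns once the
    // query has reported completion.
    public static boolean runAndAwait(AsyncBucket bucket, String statement) {
        return bucket.query(N1qlQuery.simple(statement))
                .flatMap(result -> result.rows()
                        .ignoreElements()                   // drain all rows
                        .cast(Boolean.class)
                        .concatWith(result.finalSuccess())) // then wait for final status
                .toBlocking()
                .single();
    }
}
```

Even with this, the query appeared to terminate early, which is what prompted the question.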

Is there anything obvious that I’m doing wrong? I know that the DELETE statement (and other DML statements) are currently in Beta.

Also, the N1QL query takes between 7 and 11 seconds to run. I am running the CB cluster on the same computer as the Java program. That seems slow to me; is there anything I'm missing in terms of using secondary indexes properly?

Thanks!

Best,
Perrin


Delete statement not finishing before function return
#2

Hi Perrin,
Since you are only interested in documents with a specific prefix, creating a partial index might help performance somewhat.

CREATE INDEX id_cache ON api(POSITION(META().id, "cache::")) WHERE POSITION(META().id, "cache::") = 0 USING GSI

Will get back on the SDK issue.

-Prasad


#3

@perrin.bignoli
I cannot reproduce the interrupted-query behavior on my end: I see all of the data deleted immediately, but I don't have any other concurrent load updating the data. Is this the only statement running in the system, or do you have other concurrent load?
This could also be related to the RxJava behavior and how it traps timeouts or errors. Cross-posting to the Java SDK category.


#4

Hi Prasad,

Thanks for the quick reply!

I ran an informal performance test with my original index and with the WHERE-clause index you suggested above. The WHERE clause in the index seemed to give some improvement, but not a significant one. That makes sense, though, given that most of the documents in my bucket have an ID that starts with the prefix. It would probably make more of a difference in a more realistic setup where a smaller portion of the documents have the targeted prefix. In any case, it is good to know about that technique for building indexes.

I am still seeing the early termination behavior using the reactive Java API. Thanks for looking into that. The DELETE statement runs very well from cbq.

Best,
Perrin


#5

@cihangirb
Thanks for your quick reply!

Yes, the DELETE statement is the only one running on the cluster. I’ve run the same query many times from the CBQ prompt and it has always finished when run that way.

I just tried modifying my code to execute the DELETE statement synchronously through the API. That seems to work fine; all the expected deletions are made consistently.

Best,
Perrin


#6

@cihangirb

Hmm, actually, I take back the second part of my message above. Even with the synchronous call, I'm seeing odd results. It did appear to run correctly at first, though. I'm going to break off and write a completely separate test that uses no RxJava at all, and I'll report back with what happens.

Best,
Perrin


#7

@cihangirb
@prasad

I figured out what was happening. The test apparatus was inserting the 70k entries immediately before the query statement was executed, so I believe the indexing of those documents had not yet completed when the query ran. When I added ScanConsistency.REQUEST_PLUS to the N1qlParams, the DELETEs began working as I expected. Thanks!
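For anyone who hits the same thing, the fix looked roughly like this (Couchbase Java SDK 2.x; the bucket name and statement are the ones from my earlier posts, and the class name is just illustrative):

```java
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.query.N1qlParams;
import com.couchbase.client.java.query.N1qlQuery;
import com.couchbase.client.java.query.N1qlQueryResult;
import com.couchbase.client.java.query.consistency.ScanConsistency;

public class DeleteWithConsistency {
    // REQUEST_PLUS makes the query wait until the index has caught up with all
    // mutations made before the query was issued, so freshly inserted
    // documents are visible to the DELETE.
    public static N1qlQueryResult deleteCached(Bucket bucket) {
        N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS);
        N1qlQuery query = N1qlQuery.simple(
                "DELETE FROM api WHERE POSITION(META().id, \"cache::\") = 0", params);
        return bucket.query(query); // synchronous variant; the async call takes the same params
    }
}
```

The default scan consistency (NOT_BOUNDED) does not wait for the indexer, which is why the number of deleted documents varied from run to run.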

Best,
Perrin


#8

Glad to hear that Perrin. Thanks for the update.