Performance of mapGet vs N1QL

I was benchmarking mapGet against N1QL (adhoc=false) and was surprised that N1QL outperformed mapGet (by 25%).

The two operations were:

  1. select field1 from bucket where meta().id = xxxx
  2. mapGet(xxx, field1, JsonObject.class)

Note: field1 is indexed.

Is this the expected behaviour, or am I missing something?

Maybe you could show the code you’re using for each?

Sure,

Bucket b = couchbase.getBucket(sessionBucket);
String subField1 = "outline";

for (int i = 0; i < 100; ++i) {
    JsonObject params = JsonObject.empty();
    params.put("$v0", "1502077769850_5e394619-0219-4e5c-914a-4b699cec4575");
    Statement statement = select(subField1).from(sessionBucket).where(x("meta().id").eq("$v0"));
    // consistency: the N1qlParams under test (adhoc flag and scan consistency)
    N1qlQuery n1qlQuery = N1qlQuery.parameterized(statement, params, consistency);
    N1qlQueryResult result = b.query(n1qlQuery);
    N1qlQueryRow row = result.allRows().get(0);
    jo1 = row.value().getObject(subField1);
}

for (int i = 0; i < 100; ++i) {
    jo2 = b.mapGet("1502077769850_5e394619-0219-4e5c-914a-4b699cec4575", subField1, JsonObject.class);
}

Strangely, I am now getting 1000 ms (query) versus 220 ms (mapGet).

What are the differences in implementation between mapGet and a query?

I would always expect mapGet() to be quicker than a N1QL query, given it is a simpler (and hence less flexible, but faster) API.

The numbers you are quoting seem quite high - what exactly are you timing? Specifically, are you including the time to perform the initial bucket connect (couchbase.getBucket()) in your measurements?

Given that’s a one-off task (you should connect once at the start of the application and re-use the same Bucket object) it doesn’t make sense to include in your timings.

Yes, getBucket() is called outside the measurement. The timing is the elapsed time between two System.nanoTime() calls, one placed just before the for loop and one just after its closing bracket, e.g.:
long start = System.nanoTime();
for (xxx) {
    // operations being timed
}
long end = System.nanoTime();
long elapsed = end - start; // nanoseconds
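A minimal sketch of the same measurement with an untimed warm-up phase, so that one-off costs (for example JIT compilation, connection setup, or the first preparation of an adhoc=false statement) are not folded into the total. The class name LoopTimer and its parameters are illustrative and not part of the Couchbase SDK; the operation under test is passed in as a Runnable.

```java
/**
 * Illustrative timing harness (not part of the Couchbase SDK).
 * The operation under test is passed in as a Runnable.
 */
final class LoopTimer {
    private LoopTimer() {}

    /**
     * Runs op `warmup` times untimed, then `iterations` times between two
     * System.nanoTime() calls, and returns the mean latency per operation in ms.
     */
    static double meanMillis(Runnable op, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be > 0");
        }
        // Untimed warm-up: lets the JIT compile the call path and absorbs
        // one-off setup costs before measurement starts.
        for (int i = 0; i < warmup; i++) {
            op.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            op.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        // Convert ns to ms and divide by the number of timed calls.
        return elapsedNanos / 1_000_000.0 / iterations;
    }
}
```

With the Bucket from the snippets above, a call would look like LoopTimer.meanMillis(() -> b.mapGet(id, "outline", JsonObject.class), 20, 100). Reporting a per-operation mean also removes any doubt about whether a number covers one call or the whole loop.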

I ran the test on the same box that hosts one of the Couchbase nodes, to minimize network latency.

Next week I will probably spend more time walking through the different method calls and the query's execution plan, to make sure the timing isn't off because of an incorrect setup.

Ah, so the times you were reporting (1000ms and 220ms) were not for one operation, but for N operations?

Yes. I did that to get more meaningful numbers for discussion, since it smooths out the occasional spike caused by I/O or, in Java's case, GC.
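Summing N operations hides spikes inside the total rather than removing them. An alternative, sketched below under the assumption that each call can be timed on its own, is to record one sample per operation and report the median alongside a high percentile, so a single GC pause or I/O stall shows up in the tail without distorting the typical value. The LatencyStats name and the nearest-rank percentile method are choices made for this example.

```java
import java.util.Arrays;

/** Illustrative per-operation latency recorder (not part of the Couchbase SDK). */
final class LatencyStats {
    private LatencyStats() {}

    /** Times each call individually and returns the samples in milliseconds. */
    static double[] sampleMillis(Runnable op, int iterations) {
        double[] samples = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            op.run();
            samples[i] = (System.nanoTime() - t0) / 1_000_000.0;
        }
        return samples;
    }

    /** Nearest-rank percentile, p in (0, 100]; the input array is not modified. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

For example, after double[] s = LatencyStats.sampleMillis(() -> b.mapGet(id, "outline", JsonObject.class), 100), comparing percentile(s, 50) with percentile(s, 99) for mapGet and for the query shows whether the gap is in the typical call or only in the occasional slow one.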

I really need to spend some time on this, but I'm tied up with paperwork at work :frowning:

I will get back to you after I inspect my code and configuration. Thank you, drigby, for following up on this.

Here are my assumptions; please correct me if I am wrong:

  1. Compiling an N1QL statement has a fixed cost that depends on the complexity of the query; in this case, about 7 ms per query.
  2. The cost of REQUEST_PLUS depends on how often the index is updated (in our case, we use ForestDB).

Here are my questions:

  1. Does mapGet depend on an index? If not, am I right that mapGet performance depends on whether the document is in memory?
  2. Does mapGet always return the latest version? If not, can it be configured to?

My general use case: I fetch the subset with select outline from bucket where xx = 1 AND yy = 2 AND zz = 3, and I am deciding whether to split that into two calls: select meta().id from bucket where xx = 1 AND yy = 2 AND zz = 3, followed by fetching a subset of each document. My understanding is that the two-call approach relieves the query engine of parsing the JSON to build the result.
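The two-call approach can be sketched as follows. IdQuery and FieldReader are hypothetical interfaces standing in for the N1QL call (select meta().id ... where xx = 1 AND yy = 2 AND zz = 3) and for b.mapGet(id, field, JsonObject.class); they are not SDK types, which keeps the example runnable without a cluster.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative two-step fetch: query for ids, then read one field per id. */
final class TwoStepFetch {
    private TwoStepFetch() {}

    /** Hypothetical stand-in for the "select meta().id ..." query. */
    interface IdQuery {
        List<String> run();
    }

    /** Hypothetical stand-in for b.mapGet(id, field, JsonObject.class). */
    interface FieldReader<V> {
        V read(String id, String field);
    }

    /** Step 1 runs the id query; step 2 does one key-value read per id, in query order. */
    static <V> Map<String, V> fetch(IdQuery ids, FieldReader<V> reader, String field) {
        Map<String, V> out = new LinkedHashMap<>();
        for (String id : ids.run()) {
            out.put(id, reader.read(id, field)); // one key-value round trip per id
        }
        return out;
    }
}
```

The trade-off is that each matching id costs its own key-value round trip, so with many matches this sequential version grows linearly with the result count unless the fetches are issued concurrently.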

Here are the results from my test (100 iterations each), running the Java application on the box that hosts Couchbase, so network I/O should be minimal.

query outline with no index (adhoc = false, REQUEST_PLUS): succeed (1010 ms)
query outline with no index (adhoc = true, REQUEST_PLUS): succeed (1708 ms)
query outline with no index (adhoc = false, NOT_BOUNDED): succeed (593 ms)
query outline with no index (adhoc = true, NOT_BOUNDED): succeed (1378 ms)
mapget outline: succeed (134 ms)
query metadata with index (adhoc = false, REQUEST_PLUS): succeed (564 ms)
query metadata with index (adhoc = true, REQUEST_PLUS): succeed (1554 ms)
query metadata with index (adhoc = false, NOT_BOUNDED): succeed (399 ms)
query metadata with index (adhoc = true, NOT_BOUNDED): succeed (1309 ms)
mapget metadata: succeed (100 ms)
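Since each figure above is the total for 100 iterations, dividing by 100 gives the per-operation cost, and the difference between the adhoc=true and adhoc=false rows (same consistency) approximates the per-query cost of parsing and planning, i.e. the compile cost in assumption 1. A small helper for that arithmetic (the names are illustrative):

```java
/** Arithmetic on the 100-iteration totals reported above (illustrative helper). */
final class ResultMath {
    /** Each reported figure is the total for this many loop iterations. */
    static final int ITERATIONS = 100;

    private ResultMath() {}

    /** Total loop time in ms -> mean time per operation in ms. */
    static double perOpMillis(double totalMillis) {
        return totalMillis / ITERATIONS;
    }

    /** Extra per-query cost of adhoc=true over adhoc=false at the same consistency. */
    static double adhocOverheadMillis(double adhocTotalMillis, double preparedTotalMillis) {
        return (adhocTotalMillis - preparedTotalMillis) / ITERATIONS;
    }
}
```

Applied to the results: the adhoc overhead is (1708 - 1010) / 100 = 6.98 ms and (1378 - 593) / 100 = 7.85 ms for outline, and 9.90 ms and 9.10 ms for metadata, so roughly 7 to 10 ms per query. The fastest query case (metadata, adhoc=false, NOT_BOUNDED) works out to 3.99 ms per operation, against 1.00 ms for mapGet metadata.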

The index was created with: create index sessionBuckettest on sessionBucket(meta().id, metadata);
The query code (where y is "outline" or "metadata", and x is the N1qlParams matching the settings shown in parentheses in the results):

for (int i = 0; i < 100; ++i) {
    JsonObject params = JsonObject.empty();
    params.put("$v0", "1502077769850_5e394619-0219-4e5c-914a-4b699cec4575");
    Statement statement = select(y).from(sessionBucket).where(x("meta().id").eq("$v0"));
    N1qlQuery n1qlQuery = N1qlQuery.parameterized(statement, params, x); // x: the N1qlParams variable
    N1qlQueryResult result = b.query(n1qlQuery);
    N1qlQueryRow row = result.allRows().get(0);
    row.value().getObject(y);
}

The mapGet code:
for (int i = 0; i < 100; ++i) {
    b.mapGet("1502077769850_5e394619-0219-4e5c-914a-4b699cec4575", y, JsonObject.class);
}

Can someone help me with my questions?

Here are my questions:
Does mapGet depend on an index? If not, am I right that mapGet performance depends on whether the document is in memory?
Does mapGet always return the latest version? If not, can it be configured to?

Your assumptions are right!

  1. mapGet() uses a streaming parser over the item, internally on the server that holds the item's active vbucket. It does not depend on an index. The item does need to be in memory, and it will be fetched from disk if needed.
  2. Since the request goes to the active vbucket, and that node is responsible for the item, it will always operate against the latest version.
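To illustrate why no index is involved: the SDK locates a document purely from its key. As I understand the Java SDK's key-value locator, it takes a CRC32 of the key, keeps 15 bits of it, and masks by the number of vbuckets (1024 on a typical Linux deployment); the bucket's cluster map then lists, per vbucket, the server holding the active copy first and the replica servers after it. Treat the exact bit manipulation as an assumption; the class below is illustrative, not SDK code.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustrative key -> vbucket -> server lookup (assumed to mirror the SDK's scheme). */
final class VBucketLocator {
    private VBucketLocator() {}

    /** Maps a document key to a vbucket id; numVBuckets must be a power of two. */
    static int vbucketFor(String key, int numVBuckets) {
        if (numVBuckets <= 0 || Integer.bitCount(numVBuckets) != 1) {
            throw new IllegalArgumentException("numVBuckets must be a power of two");
        }
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        long hash = (crc.getValue() >> 16) & 0x7fff; // keep 15 bits of the CRC
        return (int) (hash & (numVBuckets - 1));     // same as modulo for a power of two
    }

    /** vbucketMap[vb][0] is the server index holding the active copy. */
    static int activeNode(int[][] vbucketMap, int vb) {
        return vbucketMap[vb][0];
    }

    /** vbucketMap[vb][n] for n >= 1 is the n-th replica's server index (-1 if none). */
    static int replicaNode(int[][] vbucketMap, int vb, int replica) {
        return vbucketMap[vb][replica];
    }
}
```

This is also why a plain mapGet only ever reaches the active copy: in the SDK 2.x API discussed here it always targets entry 0 of the map, and replica entries are only used by explicit replica reads such as getFromReplica.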

I notice there isn't any API for a mapGet from a replica. With a regular get, the typical pattern (if you have at least one replica) is:
try {
    doc = b.get(id);
} catch (Exception e1) {
    doc = b.getFromReplica(id, ReplicaMode.FIRST).get(0);
}

Is that necessary for mapGet? If so (assuming mapGet always fetches from the 'primary' node and never from a replica), does that mean I should at least try:
try {
    doc = b.mapGet(id, subfield, JsonObject.class);
} catch (Exception e1) {
    // retry: this reaches the replica only once auto-failover has promoted it to active
    doc = b.mapGet(id, subfield, JsonObject.class);
}
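One way to combine both ideas, sketched with plain Suppliers so it runs without a cluster: retry the primary read a few times (giving auto-failover a chance to promote a replica), then fall back to a replica read. The ReadWithFallback helper and its retry and delay parameters are hypothetical. Because there is no replica variant of mapGet, the fallback in a real application would have to read the whole document from a replica and extract the field client-side, e.g. () -> b.getFromReplica(id, ReplicaMode.FIRST).get(0).content().getObject(subfield).

```java
import java.util.function.Supplier;

/** Hypothetical helper: retry the active read, then fall back to a replica read. */
final class ReadWithFallback {
    private ReadWithFallback() {}

    static <T> T read(Supplier<T> primary, Supplier<T> replicaRead, int retries, long delayMillis) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        RuntimeException lastFailure = null;
        // One initial attempt plus `retries` retries against the active copy.
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return primary.get();
            } catch (RuntimeException e) {
                lastFailure = e;
                if (attempt < retries) {
                    pause(delayMillis); // give failover time to promote a replica
                }
            }
        }
        // All primary attempts failed: read from a replica instead.
        try {
            return replicaRead.get();
        } catch (RuntimeException e) {
            if (lastFailure != null && lastFailure != e) {
                e.addSuppressed(lastFailure); // keep the primary error for diagnosis
            }
            throw e;
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
    }
}
```

Keep in mind that a replica read may return an older version than the active copy, so this fallback trades the "always the latest" guarantee from the answer above for availability.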