Unexpected query execution time

JesusTheHun · September 17, 2021, 6:25pm

I have a simply query that takes 17secs to complete against 250k documents.

SELECT *
FROM mybucket
WHERE `_class` = "acme.domain.Block"
ORDER BY height DESC
LIMIT 1

I have this index :

CREATE INDEX `block_height` ON `mybucket`(`height` DESC) WHERE (`_class` = "acme.domain.Block")

I can’t see why it takes so long neither do the index optimizer.
This is what the execution plan looks like :

Most of the time is spent on fetching, the index is in memori right ? even though, the disk is an NVME so I don’t get it.
Any clue ?

vsr1 · September 17, 2021, 8:08pm

It is not using right index. Please check why?
It is doing explicit sort (fetching 215352 items, last one might be highest value) vs use index order.

If need provide USE INDEX.

JesusTheHun · September 17, 2021, 9:01pm

How do I check why ?

I tried the following, it did not change the execution time :

SELECT *
FROM mybucket USE INDEX (block_height USING GSI)
WHERE `_class` = "acme.domain.Block"
ORDER BY height DESC
LIMIT 1

vsr1 · September 17, 2021, 9:15pm

Leading key of index must be part of predicate for index selection

SELECT *
FROM mybucket USE INDEX (block_height USING GSI)
WHERE `_class` = "acme.domain.Block" AND height IS NOT MISSING
ORDER BY height DESC
LIMIT 1

OR

SELECT *
FROM mybucket
WHERE `_class` = "acme.domain.Block" AND height IS NOT MISSING
ORDER BY height DESC
LIMIT 1

JesusTheHun · September 17, 2021, 9:22pm

Including the height in the where clause makes the deal. Thank you
It’s weird that even with the USE INDEX it doesn’t work.

vsr1 · September 17, 2021, 9:27pm

Couchbase indexer will not index the document if the leading index key evaluates MISSING.
Assume you have document doc1 with height MISSING field, block_height will not have the doc1.
But query semantics tells give all documents with _class = “acme.domain.Block”. It suppose to give doc1 too.
As Index doesn’t have it will not qualify for this query even you give explicit hint via USE INDEX.
Hint is ignored if not qualified.

JesusTheHun · September 17, 2021, 9:40pm

This is weird, I would have bet that the index had the documents even if the field was missing. Initially I even started my response with that assumption. How can I have an index that includes all documents of type Block that have no height ? (in my case there are none but just for the example). I would have to write the index backward like this ?

CREATE INDEX missingheight ON mybucket (reward)
WHERE _class = "Block"
    AND height IS MISSING

vsr1 · September 17, 2021, 9:46pm

NOTE: Couchbase bucket can have any type of documents (even unrelated). If start indexing first leading key MISSING also, every index will have all the entries in the bucket. This makes index huge. Indexer will index MISSING for other keys once first key value is present.

CREATE INDEX missingheight ON mybucket (_class, height)
WHERE _class = "Block";

Leading index key must evaluate not MISSING.
7.0 has bucket/scope/collection. you can separate different documents as separate collection. MISSING restriction still there. MB-30249

There is other techniques if you want index height MISSING

CREATE INDEX ` block_height `ON` mybucket `( _class, `height` DESC) 
WHERE (`_class ` = "acme.domain.Block");

SELECT *
FROM mybucket 
WHERE `_class` = "acme.domain.Block" 
ORDER BY height DESC
LIMIT 1;

OR

-- height is number if MISSING indexed as "" as functional key

CREATE INDEX ` block_height `ON` mybucket `( IFMISING(`height`, "") DESC) 
WHERE (`_class` = "acme.domain.Block");

SELECT *
FROM mybucket
WHERE `_class` = "acme.domain.Block" AND IFMISING(`height`, "") IS NOT MISSING
ORDER BY IFMISING(`height`, "") DESC
LIMIT 1

JesusTheHun · September 17, 2021, 10:09pm

Thank you for all the information ! I find it hard to get documentation about scopes and collections, I don’t even know how to declare one, what benefit they bring, how are they different than another index, and with what drawback they come. Maybe it will come later ?

Kevin.Cherkauer · September 20, 2021, 10:55pm

@JesusTheHun

I find it hard to get documentation about scopes and collections

The Couchbase Server 7.0 documentation covers scopes and collections, as they first became available in this release. A couple of high-level pages:

JesusTheHun · September 21, 2021, 7:38am

@Kevin.Cherkauer

I understand the feature is very recent and therefore the doc will follow in the near future.
After reading the pages you still don’t know how to use it with the SDK nor how they behave performance wise or if you still need an index to fetch those data . You simply cannot start using the features after reading the current doc.

Kevin.Cherkauer · September 21, 2021, 4:35pm

@JesusTheHun Here are N1QL doc pages for creating and dropping scopes and collections:

To use these in SELECT queries you use the “Keyspace Path” instead of bucket name:

Essentially this is just

bucketName.scopeName.collectionName

E.g.

SELECT … FROM myBucket.myScope.myCollection

You can use the same syntax to create indexes on a collection:

CREATE INDEX indexName ON myBucket.myScope.myCollection(field1, field2)

Bucket, scope, and collection are analagous to RDBMS database, schema, and table.

JesusTheHun · September 21, 2021, 5:48pm

So let’s say today I have documents with a field “type”, and an index for this field exists so I can query documents. If I create a collection “A” and put all my documents of type=A in it. Can I query them without the need for an index on type A ?

SELECT * FROM bucket.typeA

vsr1 · September 21, 2021, 6:08pm

At current time , you need at least primary index. If you have secondary index (leading index key must be present in the query predicate to choose the index). MB-30249

Kevin.Cherkauer · September 21, 2021, 6:41pm

@JesusTheHun Here is the correct syntax and an example showing you do need an index. Secondary indexes are always on collections (so indexes created without specifying a collection are for the bucket._default._default collection, which is also the collection holding documents whose scopes and collections were not specified). Thus with collections, secondary indexes can perform better because building and updating an index on a collection ignores all documents that are not in that specific collection.

create scope default.myScope
create collection default.myScope.myCollection
insert into default.myScope.myCollection values ("key1", {"type":"A", "val1":"Value 1"})

select * from default.myScope.myCollection where type="A"
[
  {
    "code": 4000,
    "msg": "No index available on keyspace `default`:`default`.`myScope`.`myCollection` that matches your query. Use CREATE PRIMARY INDEX ON `default`:`default`.`myScope`.`myCollection` to create a primary index, or check that your expected index is online.",
    "query": "select * from default.myScope.myCollection where type=\"A\""
  }
]

create index myIndex on default.myScope.myCollection(type)
select * from default.myScope.myCollection where type="A"
[
  {
    "myCollection": {
      "type": "A",
      "val1": "Value 1"
    }
  }
]

JesusTheHun · September 24, 2021, 3:13pm

So in the end - organisation aside - collections are just semantic groups, that can be used to optimize indexing processes, XDCR replications also I guess, but doesn’t change the requirements around data querying. Is there any performance benefit using collections ?

Kevin.Cherkauer · September 27, 2021, 4:27pm

@JesusTheHun Collections bring performance benefits (reduction in CPU usage) for both index initial builds and ongoing index maintenance. This write-up explains how this works: