Priority in N1QL indexes


#1

I’m hoping someone can shed some light on why it does not appear that my indexes are being used by couchbase propertly. I have created the indices using the N1QL command line, I’m running Couchbase 3.0.1 on Mac OSX, and I’m running N1QL DP4.

I have documents in the “default” bucket that look something like this:

{
    "_type":"board",
    "tId":"team::1"
}

And another document that might look like this

{
    "_type":"card",
    "bId":"board::1"
}

And another like:

{
     "_type":"cxn",
     "cId: "card::1",
     "tId": "team::1",
     "cxns": ["board::1", "board::2"]
}

I have created 3 indexes to account for my 3 different types of documents.

Index 1: Index the _type fields only

cbq > CREATE INDEX typeIndex ON default (_type);

Index 2: Index the “_type” and “tId” fields

cbq > CREATE INDEX typeTeamId ON default (_type, tId);

Index 3: Index the “_type”, “tId” and “cxns” fields

cbq > CREATE INDEX typeTeamIdCxns ON default (_type, tId, cxns);

Now, I would expect that issuing a query that is only referencing the _type field, it would use the “typeIndex” just created, but instead I see the typeTeamIdCxns being referenced, with the only span checking on the _type parameter.

cbq> explain select * from default where _type = 'board';
{
"requestID": "e1a7bc84-86da-4618-b694-5acd7e296f22",
"signature": "json",
"results": [
    {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan",
                "index": "typeTeamIdCxns",
                "keyspace": "default",
                "limit": 9.223372036854776e+18,
                "namespace": "default",
                "spans": [
                    {
                        "Range": {
                            "High": [
                                "board"
                            ],
                            "Inclusion": 3,
                            "Low": [
                                "board"
                            ]
                        },
                        "Seek": null
                    }
                ]
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Fetch",
                            "keyspace": "default",
                            "namespace": "default"
                        },
                        {
                            "#operator": "Filter",
                            "condition": "((`default`.`_type`) = \"board\")"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "star": true
                                }
                            ]
                        },
                        {
                            "#operator": "FinalProject"
                        }
                    ]
                }
            }
        ]
    }
],
"status": "success",
"metrics": {
    "elapsedTime": "1.217282ms",
    "executionTime": "1.109197ms",
    "resultCount": 1,
    "resultSize": 1990
}
}

Making a slightly more complex call, including both the _type and the tId, only the “typeIndex” is invoked, which doesn’t account at all for the “tId” fields used in the query

cbq> explain select * from default where _type = 'board' AND tId = 'team::1';
{
"requestID": "101e1cc8-7c45-4395-8b85-7322e5d6a246",
"signature": "json",
"results": [
    {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan",
                "index": "typeIndex",
                "keyspace": "default",
                "limit": 9.223372036854776e+18,
                "namespace": "default",
                "spans": [
                    {
                        "Range": {
                            "High": [
                                "board"
                            ],
                            "Inclusion": 3,
                            "Low": [
                                "board"
                            ]
                        },
                        "Seek": null
                    }
                ]
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Fetch",
                            "keyspace": "default",
                            "namespace": "default"
                        },
                        {
                            "#operator": "Filter",
                            "condition": "(((`default`.`_type`) = \"board\") and ((`default`.`tId`) = \"team::1\"))"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "star": true
                                }
                            ]
                        },
                        {
                            "#operator": "FinalProject"
                        }
                    ]
                }
            }
        ]
    }
],
"status": "success",
"metrics": {
    "elapsedTime": "1.111ms",
    "executionTime": "1.048ms",
    "resultCount": 1,
    "resultSize": 2025
}
}

Finally, in a 3rd query where I am joining 2 documents together, the 2nd document getting joined based on the existence of a parameter within an array of values. When I run an Explain on it, I see the very simplist index being used again, just the “typeIndex”. This one should be using the typeTeamIdCxns index.

cbq> explain SELECT card.* FROM default cn JOIN default card ON KEYS cn.cId WHERE cn._type = 'cnet' and cn.tID = 'team::1'  AND 'board::bkHdkVa5M' IN cn.cxns;

{
"requestID": "355705f0-aebf-4e02-8925-1195d5a3b5c6",
"signature": "json",
"results": [
    {
        "#operator": "Sequence",
        "~children": [
            {
                "#operator": "IndexScan",
                "index": "typeIndex",
                "keyspace": "default",
                "limit": 9.223372036854776e+18,
                "namespace": "default",
                "spans": [
                    {
                        "Range": {
                            "High": [
                                "cnet"
                            ],
                            "Inclusion": 3,
                            "Low": [
                                "cnet"
                            ]
                        },
                        "Seek": null
                    }
                ]
            },
            {
                "#operator": "Parallel",
                "~child": {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Fetch",
                            "as": "cn",
                            "keyspace": "default",
                            "namespace": "default"
                        },
                        {
                            "#operator": "Join",
                            "as": "card",
                            "keyspace": "default",
                            "namespace": "default",
                            "on_keys": "(`cn`.`cId`)"
                        },
                        {
                            "#operator": "Filter",
                            "condition": "((((`cn`.`_type`) = \"cnet\") and ((`cn`.`tID`) = \"team::1\")) and (\"board::bkHdkVa5M\" in (`cn`.`cxns`)))"
                        },
                        {
                            "#operator": "InitialProject",
                            "result_terms": [
                                {
                                    "expr": "`card`",
                                    "star": true
                                }
                            ]
                        },
                        {
                            "#operator": "FinalProject"
                        }
                    ]
                }
            }
        ]
    }
],
"status": "success",
"metrics": {
    "elapsedTime": "2.089779ms",
    "executionTime": "1.963745ms",
    "resultCount": 1,
    "resultSize": 2489
}

}

Before I go too far down this path, I’m hoping someone could shed some light as to what I’m missing.

Thanks!
Chris


#2

Hi Chris,

You are seeing the behavior as currently implemented. For now, N1QL only offers this guarantee: if there are one or more indexes that can satisfy the query, one of those indexes will be used. For now, N1QL is not guaranteed to use the longest or most selective of those indexes.

In your example, you can stick with only one index, the 3rd one, without any loss in generality or performance.

Finally, JOINs in N1QL only scan the first (leftmost) keyspace, so only indexes on that keyspace are considered. In your example, the “cards” are fetched directly using their keys, so there is no need for a second index scan.

Please lmk if this answers your question.

Thanks,
Gerald


#3

Hi Gerald,

Thanks, that makes sense.

For the 3rd query, I’m concerned about the “cxns” not being indexed. This where I feel there will be the most opportunity for a bottleneck. Imagine if there were tens or hundreds of thousands of records that matched in the “_type” and on the “tId”, then it’s up to querying the “cxns” array using the EXISTS operator.

It’s tough to determine if this is properly indexed. Even when I remove the first two indexes, if I then run Explain on the 3rd query, it only informs we that its matching still on the _type parameter without mentioning “tId” or “cxns”.

Finally, does the order of properties to index, when initially creating the index, matter? It seems I get different results if I run:

Create index typeTeamId on default (_type, tId)

Versus

Create index typeTeamId on default (tId, _type)

Thank you for the prompt help!

Chris


#4

The cxns are indexed as the entire array, not individual elements of the array. Therefore, it is not very useful for your example, unless you were matching against whole arrays. We have received other requests to enable indexing of individual array elements, so it is something we will consider in the future.

To your second question: yes, the order of index expressions is very important. An index can only be used if the query matches some prefix of the index expressions. So an index on (_type, tId) can be used to match _type or _typeId AND tId, but not tId by itself.


#5

Hey Gerald,

Thanks for bearing with me here. I’ve ran a few additional tests to try to work through the index expression orders. The results are below. Starting with no indexes, and then adding / removing indexes.

As you can see, with no indexes, all the data is pulled fine. There should be a total of 37 items with _type = board and tId = ‘team::2’; In addition, there should be a total of 93 items where tId = 93;

I then go on to create a few indexes, starting with only one expression, and then two expressions, in different orders. In both cases, the multiple expression indexes are not as I’d expect, and based on your response earlier, not as you’d expect either.

Let me know what you think.

Thanks,
Chris

————————————————————————————————————————
NO INDEX
————————————————————————————————————————
SELECT * FROM default WHEre _type = ‘board’;
(RESULT - 37 count)

SELECT * FROM default WHERE _type = ‘board’ AND tId = ‘team::2’
(RESULT - 37 count)

SELECT * FROM default WHERE tId = ‘team::2’
(Result - 93 count)

————————————————————————————————————————
CREATE INDEX test ON default (_type);
————————————————————————————————————————
SELECT * FROM default WHERE _type = ‘board’;
(Result - 37 count)

SELECT * FROM default WHERE _type = ‘board’ AND tId = ‘team::2’
(Result - 37 count)

DROP INDEX default.test;

————————————————————————————————————————
CREATE INDEX test ON default (_type, tId);
————————————————————————————————————————
SELECT * FROM default WHERE _type = ‘board’;
(Result - 0 count)

SELECT * FROM default WHERE _type = ‘board’ AND tId = ‘team::2’
(Result - 0 count)

SELECT * from default where tId = ‘team::2’;
(Result - 93 count)

DROP INDEX default.test;

————————————————————————————————————————
CREATE INDEX test ON default (tId, _type);
————————————————————————————————————————
SELECT * FROM default WHERE _type = ‘board’;
(Result - 37 count)

SELECT * FROM default WHERE _type = ‘board’ AND tId = ‘team::2’
(Result - 0 count)

SELECT * from default where tId = ‘team::2’;
(Result - 0 count)

DROP INDEX default.test;


#6

Thanks Chris. Looks like a bug, so I filed a bug: https://issues.couchbase.com/browse/MB-13483


#7

Thanks Gerald,

You’re a great help, and very prompt. It’s very appreciated.

-Chris


#8

Thanks Chris.
-Gerald