N1ql How to use a GSI when doing a join

n1ql
query

#1

Hello Couchbase Experts.

I have created a GSI Index but I have no idea how to write a query where I can specify how to use this newly created index in a join using the keyword “USE INDEX [index_name] [index_type]”

http://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/hints.html

The documentation only tells me how to use the "USE INDEX’ key word in a simple select statement. I’d like to know if this can be done using a JOIN NEXT CONTAINS etc…

Also if you could give me some example on how to do this.

Thanks in advance.


#2

Hi @fernando_l,

Please post your CREATE INDEX statement, and then post your SELECT statement without the USE INDEX. We first need to understand what your are trying to do.


#3

Hi Fernando,

You can try USE INDEX with JOIN like this (using the beer-sample data set shipped with couchbase server).

CREATE INDEX beer_brewery_id ON `beer-sample`(brewery_id) WHERE (type = “beer”);

CREATE PRIMARY INDEX beer_primary ON `beer-sample`;

  1. explain select * from `beer-sample` beer USE INDEX (beer_primary) JOIN `beer-sample` brewery on keys beer.brewery_id LIMIT 4;
    [
    {
    “plan”: {
    "#operator": “Sequence”,
    "~children": [
    {
    "#operator": “PrimaryScan”,
    “index”: “beer_primary”,
    “keyspace”: “beer-sample”,
    “namespace”: “default”,
    “using”: “gsi”

  2. explain select * from `beer-sample` beer USE INDEX (beer_brewery_id) JOIN `beer-sample` brewery on keys beer.brewery_id
    where beer.type=“beer” and brewery.type = “brewery” and beer.brewery_id LIKE “%cafe%”;
    [
    {
    “plan”: {
    "#operator": “Sequence”,
    "~children": [
    {
    "#operator": “IndexScan”,
    “index”: “beer_brewery_id”,
    “index_id”: “b5217a9b9f96f44a”,
    “keyspace”: “beer-sample”,
    “namespace”: “default”,

Note that the SELECT must follow these two important rules (loosely described) to use the index:
a) must have a predicate/condition in the where-clause that specifies the leading index keys used in the create index. In this example (2) above, it uses beer.brewery_id LIKE “%cafe%”, where brewery_id is the index-key in the index-definition of the index beer-brewery-id.

b) if the index is a partial index, then the SELECT/where-clause must also have matching predicates. Example (2) above has beer.type=“beer”.

hth,
-Prasad
PS: provide your N1QL statements, as Gerald mentioned, in case you hit issues…


#4

@geraldss @prasad Here I give you my select query.

SELECT * FROM Session_Bucket Session
NEST Events_Bucket aryEvents ON KEYS Session.aryEvents
LEFT NEST PageRule_Bucket aryPageRules ON KEYS aryEvents[*].causeId
WHERE Session.companyId = " " AND
NOT ANY Event IN aryEvents SATISFIES
ANY rule IN aryPageRules SATISFIES Event.result='form-completed’
AND Event.causeId=rule.ruleId
AND rule.pageGroup='homedefenders.com Quote Form’
END
END AND
ANY rule IN aryPageRules SATISFIES rule.pageGroup='unknowns’
END AND Session.aryBreStatus.flagQ1 IS MISSING AND Session.lastEventTime < (NOW_MILLIS() - 300000)

and how the index was created.

CREATE INDEX idxCompanyId ON Session_Bucket(companyId) USING GSI;

will n1ql default this index if I do not specify it in the query?


#5

You can create the following index:

CREATE INDEX idx_company_time ON Session_Bucket( companyId, lastEventTime );

Then in your query, change

Session.lastEventTime < (NOW_MILLIS() - 300000)

to

Session.lastEventTime < $time_param

and pass $time_param as a named parameter.

Finally, post the output of EXPLAIN here using your new index and query.


#6

This is my explain output

[
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “fernando_idx_company_time”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”
],
“Inclusion”: 3,
“Low”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Fetch”,
“as”: “Session”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”
},
{
"#operator": “Nest”,
“as”: “aryEvents”,
“keyspace”: “Events_Bucket”,
“namespace”: “default”,
“on_keys”: “(Session.aryEvents)”
},
{
"#operator": “Nest”,
“as”: “aryPageRules”,
“keyspace”: “PageRule_Bucket”,
“namespace”: “default”,
“on_keys”: “(array_star(aryEvents).causeId)”,
“outer”: true
},
{
"#operator": “Filter”,
“condition”: “((((((Session.companyId) = “4b602472-44e6-4c22-a674-047484d588b8”) and (not any Event in aryEvents satisfies any rule in aryPageRules satisfies ((((Event.result) = “form-completed”) and ((Event.causeId) = (rule.ruleId))) and ((rule.pageGroup) = “homedefenders.com Quote Form”)) end end)) and any rule in aryPageRules satisfies ((rule.pageGroup) = “homedefenders.com Quote Form”) end) and (((Session.aryBreStatus).homedefendersQuoteBreadId) is missing)) and ((Session.lastEventTime) < (now_millis() - 300000)))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
}
]


#7

Thanks. But you did not use a named param. Please make that change, post the new EXPLAIN output, and tell us how long the new query takes.


#8

I am not using the sdk to run this query I am using the workbench. Is there a increase in performance when using a named parameter?


#9

Yes, significantly in this case. Use the cbq shell, and you can use a named parameter. I will ask my colleague @isha to help you with this.


#10

Hi @fernando_l,

When using the cbq shell, you can create a named parameter (the parameter should have a -$ prefix) using the \SET command as follows :

\SET -$time_param 30;

Then you can execute the query using the named param.


#11

We are currently using couchbase server 4.1.1 is this feature available in this version?


#12

In 4.1, you can use curl to pass parameters.


#13

okay I was able to do it using curl

{
“requestID”: “981a18ff-dbdf-4dbd-b034-67fc3e330aab”,
“signature”: “json”,
“results”: [indent preformatted text by 4 spaces
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “fernando_idx_company_time”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”,
"$time"
],
“Inclusion”: 1,
“Low”: [
"“4b602472-44e6-4c22-a674-047484d588b8"”
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Fetch”,
“as”: “Session”,
“keyspace”: “Session_Bucket”,
“namespace”: “default”
},
{
"#operator": “Nest”,
“as”: “aryEvents”,
“keyspace”: “Events_Bucket”,
“namespace”: “default”,
“on_keys”: “(Session.aryEvents)”
},
{
"#operator": “Nest”,
“as”: “aryPageRules”,
“keyspace”: “PageRule_Bucket”,
“namespace”: “default”,
“on_keys”: “(array_star(aryEvents).causeId)”,
“outer”: true
},
{
"#operator": “Filter”,
“condition”: “((((((Session.companyId) = “4b602472-44e6-4c22-a674-047484d588b8”) and (not any Event in aryEvents satisfies any rule in aryPageRules satisfies ((((Event.result) = “form-completed”) and ((Event.causeId) = (rule.ruleId))) and ((rule.pageGroup) = “homedefenders.com Quote Form”)) end end)) and any rule in aryPageRules satisfies ((rule.pageGroup) = “homedefenders.com Quote Form”) end) and (((Session.aryBreStatus).homedefendersQuoteBreadId) is missing)) and ((Session.lastEventTime) \u003c $time))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “self”,
“star”: true
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
}
],
“status”: “success”,
“metrics”: {
“elapsedTime”: “11.120239ms”,
“executionTime”: “11.068164ms”,
“resultCount”: 1,
“resultSize”: 3460
}
}


#14

Ok, great. Now if you compute NOW_MILLIS() - 30000 on the client side, and pass that as the value of $time, you should see better performance. Is there also a minimum time you should add to your query? Right now, it gets all events from the beginning of time, up to five minutes ago.