Perfomance issue with N1QL self join

singh_siddhu · February 10, 2016, 7:37am

Same result.

Query execution time : 11.83 s
Explain Query :

[{
"#operator": “Sequence”,
"~children": [{
"#operator": “IntersectScan”,
“scans”: [{
"#operator": “IndexScan”,
“index”: “idx_gle_type_balance”,
“keyspace”: “NAV”,
“namespace”: “default”,
“spans”: [{
“Range”: {
“Inclusion”: 1,
“Low”: [
“null”,
“null”
]
}
}],
“using”: “gsi”
}, {
"#operator": “IndexScan”,
“index”: “idx_GLE_Type”,
“keyspace”: “NAV”,
“namespace”: “default”,
“spans”: [{
“Range”: {
“High”: [
"“GLEntry”"
],
“Inclusion”: 3,
“Low”: [
"“GLEntry”"
]
}
}],
“using”: “gsi”
}]
}, {
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [{
"#operator": “Fetch”,
“as”: “X”,
“keyspace”: “NAV”,
“namespace”: “default”
}, {
"#operator": “Filter”,
“condition”: “((((X.Type) = “GLEntry”) and ((X.Balance) is not missing)) and ((X.No_) is not missing))”
}, {
"#operator": “InitialGroup”,
“aggregates”: [
“sum((X.Balance))”
],
“group_keys”: [
"(X.No_)"
]
}]
}
}, {
"#operator": “IntermediateGroup”,
“aggregates”: [
“sum((X.Balance))”
],
“group_keys”: [
"(X.No_)"
]
}, {
"#operator": “FinalGroup”,
“aggregates”: [
“sum((X.Balance))”
],
“group_keys”: [
"(X.No_)"
]
}, {
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [{
"#operator": “InitialProject”,
“result_terms”: [{
“as”: “No_”,
“expr”: “(X.No_)”
}, {
“as”: “Balance”,
“expr”: “ifnull(sum((X.Balance)), 0)”
}]
}, {
"#operator": “FinalProject”
}]
}
}]
}]

singh_siddhu · February 10, 2016, 3:18pm

Any thoughts here guys ? @keshav_m and @geraldss ? Same query with SQL server with ~ 3 million records is under 1 second .

keshav_m · February 11, 2016, 7:05am

Please execute the following query and you should see better performance.

select
X.No AS No,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X USE INDEX (idx_gle_type_balance)
Where X.Type = "GLEntry"
AND X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_;

keshav_m · February 11, 2016, 5:30pm

Please create the following index and try the query below it.

CREATE INDEX idx_gle_type_balance2 ON NAV(No_, Balance, Type) WHERE (Type = ‘GLEntry’) USING GSI;

select
X.No_ AS No_,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X USE INDEX (idx_gle_type_balance2)
Where X.Type = "GLEntry"
and X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_

----- Update from Siddu is, with the new index in place, query runs in about 1.7 seconds.

geraldss · February 12, 2016, 1:25am

Hi @keshav_m, can you please post the same answer on Stack Overflow so others can find it? Here is the question on Stack Overflow:

Thanks,
Gerald

singh_siddhu · February 15, 2016, 7:56am

Yes it performed really well with proper index creation. I think Explain query should also be explained well in the documentation with a use-case as an example. Thanks for the help.

singh_siddhu · February 17, 2016, 6:44am

For views as well do we get benefit by having dedicated Index and query node ?
Or all the nodes should be having all index + query + data ?

vmx · February 17, 2016, 7:06am

Views are currently always co-located with the KV nodes.

singh_siddhu · February 17, 2016, 8:32am

Even if I dedicate a server explicitly to keep only data.

vmx · February 17, 2016, 8:47am

Yes if it keeps only data it will also keep the views if you have defined any.

singh_siddhu · February 17, 2016, 10:15am

Thanks @vmx. Also suppose I want to use Views but pre-calculated data and N1QL for some adhoc queries. Then should I create my cluster with query and index separately and data node separately. Will it suffice both the need efficiently ?

vmx · February 17, 2016, 10:26am

I’ve too little experience with sizing Couchbase, I sadly can’t help here.

singh_siddhu · February 17, 2016, 10:46am

@geraldss, @keshav_m, any input on my question please.

geraldss · February 17, 2016, 1:03pm

Hi @singh_siddhu,

Are you referring to your question about views? Or a different question?

We can get you on a Skype call with @keshav_m. Let us know.

Thanks,
Gerald

keshav_m · February 17, 2016, 11:16pm

In some use cases, @singh_siddhu, wants to join, group & aggregate over > 200K rows and within sub seconds. For that, view based pre-calculation may help.
Views & KV always co-exist.

For production use, we recommend you having index and query services on separate nodes.

singh_siddhu · February 18, 2016, 6:00am

Got the answer from @keshav_m on skype. Posted it here for public view. Basically we need query / index servers separate from data nodes. But this would help in N1QL only. For views, doesn’t matter looks like because it stays close to data. Then again the question arises : N1QL will also need data from all the nodes. So does it mean that it all happens from index node and therefore, no map-reduce is required ? I think a comparison of N1QL versus View as document is required. What to use when ? From evolution of Couchbase it gives impression that we should get rid of views and start using N1QL.