Any solution guys ? Will really leave N1QL as solution if this is not going to give me sub second response.
Hi Siddhartha,
I’m not good with N1QL, so I might be wrong, but it looks like you could also use a mapreduce view to solve your problem. I also don’t know what at Date
does, so I don’t know if mapreduce views can do it. But for the rest, it would look like that:
The map function:
function(doc) {
// 1 == Debit Amount
// 2 == Credit Amount
// 3 == Net Change
// 4 == Balance
// 5 == Additional-Currency Balance
// 6 == Additional-Currency Net Change
if (doc['Debit Amount'] !== undefined) {
emit([doc.No_, 1], doc['Debit Amount']);
}
if (doc['Credit Amount'] !== undefined) {
emit([doc.No_, 2], doc['Credit Amount']);
}
...
}
Your reduce function would be _sum
.
Your query parameters would then be:
?startkey=["122537",null]&endkey=["122537",{}]&group=true
Cheers,
Volker
Hi @singh_siddhu,
Can you tell us which document has No_ as a primary key, and which one has No_ as a foreign key? I masking about N and X. That is, do the N documents have No_ as primary key, or do the X documents have No_ as a primary key?
Gerald
In N it is primary key, and X it is foreign key that is why aggregate functions are applied on X’s column because there would be multiple rows for each Key of N in X . i.e., 1 : N ratio between N and X documents.
Ok, thank you. We are taking a look.
Hi Siddhu,
You’re joining NAV X JOIN NAV N.
You have a filter on “NAV N”: WHERE N.No_ = ‘122537’
You have a primary index created on the bucket.
That means, the engine (query plan) will scan the full bucket NAV X and for each document, it looks for the field X.No_ and tries to match with “NAV N”.
The filter N.No_ = ‘122537’ is applies ONLY AFTER THE JOIN.
To improve the performance, you’ve to filter the documents early (during the scan of NAV X).
If X.No_ is the foreign key for N.No_ and the document ID has the same value,
1. You can simply apply the predicate to X.No.
2. Create a secondary index on NAV(No_)
CREATE INDEX NAVNo_ on NAV(No_);
If you have type field defined within each doc to differentiate between parent & child, you can can use that in the query and add WHERE clause to the index as well.
CREATE INDEX NAVNo_ on NAV(No_) WHERE type = ‘X’;
SELECT blah…
FROM NAV X JOIN NAV N ON KEYS [X.No_]
WHERE X.type = ‘X’ and X.No_ = 122537’
GROUP BY X.No_
I will try this as well. But didn’t understand this “and the document ID has the same value”, how can two different documents will have document ID same ? I already described earlier that keys are unique for each document.
Also how to make sure that out of 3 nodes in the cluster only 2 are data nodes and 1 is for query ?
Above mentioned help didn’t work out. It is taking 34 seconds for just 50K documents.
When you add a node to the cluster, you can choose which services you want to enable on it. There you can specify Data and/or Index and/or Query. For the rest of the performance related work I leave you in the capable hands of @keshav_m and @geraldss.
So now I have 1 data node , 1 query node and 1 index node. Still I am getting result set which is too slow.
Total 50K documents, total time it took : 12.267031445s
There is no join now. Querying using only 1 type of document
Query :
select
X.No_
AS No_
,
IFNULL(Sum(X.Balance
),0) as Balance
from NAV X
Where X.Type
= "GLEntry"
Group by X.No_
Indexes are :
a) CREATE PRIMARY INDEX NAV_No_
ON NAV
USING GSI WITH {“nodes”:“172.16.47.62:8091”};
b) CREATE INDEX idx_GLE_Balance
ON NAV
(Balance
) USING GSI WITH {“nodes”:“172.16.47.62:8091”};
c) CREATE INDEX idx_GLE_Type
ON NAV
(Type
) WHERE (Type
= “GLEntry”) USING GSI WITH {“nodes”:“172.16.47.62:8091”};
Query I am running using workbench :
./launch-cbq-gui.sh
Running: ./cbq-gui -datastore=http://172.16.47.60:8091 -localPort=:8094 -user= -pass=
Launching query web service.
Using CB Server at: http://172.16.47.60:8091
Using N1QL query service on: 172.16.47.60:8093
Using memcached service on: 172.16.47.60:11210
Using web content at: ./static
Launching UI server, to use, point browser at http://localhost:8094
Hit enter to stop server:
=====
Just wanted to know if this is the best I can get from N1QL ? Please let me know your input @geraldss @keshav_m.
Hi @singh_siddhu,
I sent you another message yesterday with the info yesterday.
my skypeid is: rkeshavmurthy
Here’s the advice for you to get the best performance.
- CREATE INDEX idx_gle_type_balance ON NAV(No_, Balance) WHERE (Type = ‘GLEntry’) USING GSI;
Then you issue:
select
X.No_ AS No_,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X
Where X.Type = "GLEntry"
AND X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_;
Remember, in this case, you’re doing the full index scan followed by grouping & aggregation.
The cost of executing this query is proportional to number of entries in the index.
Please post the EXPLAIN when you post the results next.
Same result.
Query execution time : 11.83 s
Explain Query :
[{
"#operator": “Sequence”,
"~children": [{
"#operator": “IntersectScan”,
“scans”: [{
"#operator": “IndexScan”,
“index”: “idx_gle_type_balance”,
“keyspace”: “NAV”,
“namespace”: “default”,
“spans”: [{
“Range”: {
“Inclusion”: 1,
“Low”: [
“null”,
“null”
]
}
}],
“using”: “gsi”
}, {
"#operator": “IndexScan”,
“index”: “idx_GLE_Type”,
“keyspace”: “NAV”,
“namespace”: “default”,
“spans”: [{
“Range”: {
“High”: [
"“GLEntry”"
],
“Inclusion”: 3,
“Low”: [
"“GLEntry”"
]
}
}],
“using”: “gsi”
}]
}, {
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [{
"#operator": “Fetch”,
“as”: “X”,
“keyspace”: “NAV”,
“namespace”: “default”
}, {
"#operator": “Filter”,
“condition”: “((((X
.Type
) = “GLEntry”) and ((X
.Balance
) is not missing)) and ((X
.No_
) is not missing))”
}, {
"#operator": “InitialGroup”,
“aggregates”: [
“sum((X
.Balance
))”
],
“group_keys”: [
"(X
.No_
)"
]
}]
}
}, {
"#operator": “IntermediateGroup”,
“aggregates”: [
“sum((X
.Balance
))”
],
“group_keys”: [
"(X
.No_
)"
]
}, {
"#operator": “FinalGroup”,
“aggregates”: [
“sum((X
.Balance
))”
],
“group_keys”: [
"(X
.No_
)"
]
}, {
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [{
"#operator": “InitialProject”,
“result_terms”: [{
“as”: “No_”,
“expr”: “(X
.No_
)”
}, {
“as”: “Balance”,
“expr”: “ifnull(sum((X
.Balance
)), 0)”
}]
}, {
"#operator": “FinalProject”
}]
}
}]
}]
Any thoughts here guys ? @keshav_m and @geraldss ? Same query with SQL server with ~ 3 million records is under 1 second .
Please execute the following query and you should see better performance.
select
X.No AS No,
IFNULL(Sum(X.Balance),0) as Balance
from NAV X USE INDEX (idx_gle_type_balance)
Where X.Type = "GLEntry"
AND X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_;
Please create the following index and try the query below it.
CREATE INDEX idx_gle_type_balance2 ON NAV(No_, Balance, Type) WHERE (Type = ‘GLEntry’) USING GSI;
select
X.No_
AS No_
,
IFNULL(Sum(X.Balance
),0) as Balance
from NAV X USE INDEX (idx_gle_type_balance2)
Where X.Type
= "GLEntry"
and X.Balance IS NOT MISSING
AND X.No_ IS NOT MISSING
Group by X.No_
----- Update from Siddu is, with the new index in place, query runs in about 1.7 seconds.