Query on Index fields are running slow

index

#21

@vgupta, you need to have at least 16 cores for max-parallelism set to 16.


#22

Try this.

CREATE INDEX ind11_cn_part ON QUEST_DEV_USER_PROFILE (( compacted . 63 )) PARTITION BY hash(MTEA().id) WITH { “num_partition”:16 }


#23

Hi vsr1,

So i created new index with hash(meta().id) with 16 partition and ran the below query to use this particular index . The result is coming in about 3 sec.

Query :
select count(*) from QUEST_DEV_USER_PROFILE USE INDEX (ind11_cn_part_16 using gsi) where compacted.63 = ‘US’

Plan :

{
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 1
},
“~children”: [
{
#operator”: “Authorize”,
#stats”: {
#phaseSwitches”: 3,
“servTime”: “3.0013ms”
},
“privileges”: {
“List”: [
{
“Target”: “default:QUEST_DEV_USER_PROFILE”,
“Priv”: 7
}
]
},
“~child”: {
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 1
},
“~children”: [
{
#operator”: “IndexScan3”,
#stats”: {
#itemsOut”: 4,
#phaseSwitches”: 19,
“servTime”: “3.0627703s”
},
“covers”: [
“cover (((QUEST_DEV_USER_PROFILE.compacted).63))”,
“cover ((meta(QUEST_DEV_USER_PROFILE).id))”,
“cover (count())"
],
“index”: “ind11_cn_part_16”,
“index_group_aggs”: {
“aggregates”: [
{
“aggregate”: “COUNT”,
“expr”: “1”,
“id”: 2,
“keypos”: -1
}
],
“partial”: true
},
“index_id”: “670270ebc765ac20”,
“index_projection”: {
“entry_keys”: [
2
]
},
“keyspace”: “QUEST_DEV_USER_PROFILE”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““US””,
“inclusion”: 3,
“low”: ““US””
}
]
}
],
“using”: “gsi”,
#time_normal”: “00:03.0627”,
#time_absolute”: 3.0627703
},
{
#operator”: “Parallel”,
#stats”: {
#phaseSwitches”: 1
},
“copies”: 4,
“~child”: {
#operator”: “InitialGroup”,
#stats”: {
#itemsIn”: 4,
#itemsOut”: 4,
#phaseSwitches”: 28,
“kernTime”: “12.2510812s”
},
“aggregates”: [
"sum(cover (count(
)))”
],
“group_keys”: []
}
},
{
#operator”: “IntermediateGroup”,
#stats”: {
#itemsIn”: 4,
#itemsOut”: 1,
#phaseSwitches”: 13,
“kernTime”: “3.0627703s”
},
“aggregates”: [
“sum(cover (count()))"
],
“group_keys”: []
},
{
#operator”: “FinalGroup”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 7,
“kernTime”: “3.0627703s”
},
“aggregates”: [
"sum(cover (count(
)))”
],
“group_keys”: []
},
{
#operator”: “Parallel”,
#stats”: {
#phaseSwitches”: 1
},
“copies”: 4,
“~child”: {
#operator”: “Sequence”,
#stats”: {
#phaseSwitches”: 4
},
“~children”: [
{
#operator”: “InitialProject”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 23,
“kernTime”: “12.2510812s”
},
“result_terms”: [
{
“expr”: “sum(cover (count(*)))”
}
]
},
{
#operator”: “FinalProject”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 7
}
}
]
}
}
]
},
#time_normal”: “00:00.0030”,
#time_absolute”: 0.0030013
},
{
#operator”: “Stream”,
#stats”: {
#itemsIn”: 1,
#itemsOut”: 1,
#phaseSwitches”: 7,
“kernTime”: “3.0657716s”
}
}
],
“~versions”: [
“2.0.0-N1QL”,
“5.5.1-3511-enterprise”
]
}

Is this the max optimization I can achieve?


#24

How many cores on the machine? You’ll get the best performance when you have 16 cores.


#25

Right now I have only 4 cores on the machine. I will be getting more cores today. Will test again and update you guys.

One more questions, with just 40M of data, how much cores I will be needing to have such 30 different indexes and still getting results in ms?

Thanks and appreciate your help.


#26

Hi There,
I am running Couchbase 6.0 with linux.

My index scan is still running slow even after adding more cores clustering.

Cluster config:
4 nodes
Each node with 4 GB of RAM and 4 cores.
Disk Size : 100 GB.

I have created GSI with partition based on hash of id.
CREATE INDEX index_cn_us ON quest_agg_store(63) PARTITION BY hash((meta().Id)) WHERE (63 = “US”) WITH { “nodes”:[ “10.1.75.82:8091”,“10.1.75.83:8091”,“10.1.75.84:8091”,“10.1.75.85:8091” ], “num_partition”:8 }

And running the below query and its taking 19.21s
select count(*) from quest_agg_store where 63 = ‘US’

The one thing I observed that the swap % is 63 for all the nodes. So after understanding the below artilce, I changes the swap to 0 for all the nodes.
But still i am seeing same swap percentage for all the nodes and query is still taking seconds to return just the counts.
CPU is still under 15%.

Can you please help here ? Let me know if need more information.

Thanks.


#27

@vgupta, you can try to increase the query parallelism by doing the following setting change on the query service node.

curl http://<node_ip>:8093/admin/settings -d '{"max_parallelism":8}' -u user:password

What is the final count this query returns? (just to get an idea of how many documents need to be processed).


#28

I have already set the query parallelism to 8 from the UI tool.
The final counts is 10,357,766 .

Also, how is IndexCountScan is different from IndexScan and how we can get IndexCountScan ?

Here is the plan too.
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan3”,
“covers”: [
“cover ((quest_agg_store.63))”,
“cover ((meta(quest_agg_store).id))”,
“cover (count())"
],
“filter_covers”: {
“cover ((quest_agg_store.63))”: “US”
},
“index”: “index_cn_us”,
“index_group_aggs”: {
“aggregates”: [
{
“aggregate”: “COUNT”,
“expr”: “1”,
“id”: 2,
“keypos”: -1
}
],
“partial”: true
},
“index_id”: “9c3e5aa9a66512fe”,
“index_projection”: {
“entry_keys”: [
2
]
},
“keyspace”: “quest_agg_store”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““US””,
“inclusion”: 3,
“low”: ““US””
}
]
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “InitialGroup”,
“aggregates”: [
"sum(cover (count(
)))”
],
“group_keys”:
}
]
}
},
{
#operator”: “IntermediateGroup”,
“aggregates”: [
“sum(cover (count()))"
],
“group_keys”: []
},
{
#operator”: “FinalGroup”,
“aggregates”: [
"sum(cover (count(
)))”
],
“group_keys”:
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “InitialProject”,
“result_terms”: [
{
“as”: “count”,
“expr”: “sum(cover (count()))"
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: "select COUNT(
) AS count from quest_agg_store where 63 = ‘US’”
}

Thanks.


#29

This is best plan. Checkout https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/


#30

Thanks for sharing the link. I went through
If what I am getting is the best plan , so does it means that the lateness of “19.21s” to return the count is acceptable or more can be done to make it fast?


#31

For an index with 10M entries and 100% resident(8 partitions), with our in-house tests I see the below numbers for a simple count query:

  1. Standard GSI - 8-10 seconds
  2. Memory optimized - 3-4 seconds.

You can check if your index is 100% resident.


#32

Will increasing more memory to the nodes improve the performance and how much?


#33

If the index is not 100% resident, then it should help to give more memory. You can check the resident percent in UI graphs and accordingly increase the memory quota assigned to index service in Settings.


#34

Right now only 10% of data is in resident. I cannot ask for more memory from the IT team but rather can remove 1/4 of records and test again. I will share the result once done.