Indexing not working as expected

index
n1ql

#1

Hi,

Following is the document which I get after running explain on one of the query.

[
{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan”,
“index”: “activity_pins”,
“index_id”: “2c66f03e9193a364”,
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“atlas”"
],
“Inclusion”: 3,
“Low”: [
"“atlas”"
]
}
},
{
“Range”: {
“High”: [
"“review”"
],
“Inclusion”: 3,
“Low”: [
"“review”"
]
}
}
],
“using”: “gsi”
}
},
{
"#operator": “Fetch”,
“keyspace”: “activity”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Unnest”,
“as”: “pin”,
“expr”: “(activity.pins)”
},
{
"#operator": “Filter”,
“condition”: “((((pin.pin_type) in [“review”, “atlas”]) and ((pin.place_id) = “MMI000”)) and ((pin.status) in [0, “0”]))”
},
{
"#operator": “InitialGroup”,
“aggregates”: [
“count((pin.pin_id))”,
“sum((pin.pin_rate))”
],
“group_keys”: [
"(pin.pin_type)"
]
}
]
}
},
{
"#operator": “IntermediateGroup”,
“aggregates”: [
“count((pin.pin_id))”,
“sum((pin.pin_rate))”
],
“group_keys”: [
"(pin.pin_type)"
]
},
{
"#operator": “FinalGroup”,
“aggregates”: [
“count((pin.pin_id))”,
“sum((pin.pin_rate))”
],
“group_keys”: [
"(pin.pin_type)"
]
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “InitialProject”,
“result_terms”: [
{
“as”: “total_pin”,
“expr”: “count((pin.pin_id))”
},
{
“as”: “total_rating”,
“expr”: “sum((pin.pin_rate))”
},
{
“expr”: “(pin.pin_type)”
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
},
“text”: “SELECT COUNT(pin.pin_id) AS total_pin,SUM(pin.pin_rate) AS total_rating, pin.pin_type FROM activity UNNEST activity.pins AS pin WHERE pin.pin_type IN[‘review’,‘atlas’] AND \npin.place_id=‘MMI000’ AND pin.status IN [0,‘0’] GROUP BY pin.pin_type;”
}
]
This query is taking some where aroung 200ms what should I do to optimize it, I have already done indexing on one of the field. what more should be done to optimize it.


#2

The Plan is right based on the index you have. Try the following Index.

CREATE INDEX ix1 ON activity (DISTINCT ARRAY pin.place_id FOR pin IN pins);


#3

@vsr1 I have already applied indexing on pin.pin_type as you can see, If I apply indexing on place_id as suggested by you will both indexing will be considered for fetching data.


#4

It already fetching data ("#operator": “Fetch”).

When using ARRAY indexing predicates on multiple fields you can only push one of them others will be applied post scan/fetch.

If pin.pin_type has many items indexscan produce many items and require fetch all of them.
If pin.place_id is less items it will do better because indexscan do lot of elimination

You need to try which one works for your data model and create that index and drop other one or USE INDEX


#5

@vsr1 thanks a lot that has cleared a lot things.


#6

@vsr1 as suggested by you I applied indexing on pin_type but following query is taking 500ms

SELECT pin FROM activity UNNEST activity.pins AS pin WHERE pin.pin_type IN [“atlas”,“review”,“report”] AND pin.statusIN[1,“1”]
AND META(activity).id NOT IN [“mustanish123”] ORDER BY pin.pin_create_date DESC LIMIT 10 OFFSET 0

Explain for same is below

{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IntersectScan”,
“scans”: [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan2”,
“index”: “#activity”,
“index_id”: “95fb99b21f9ad6f4”,
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““mustanish123"”,
“inclusion”: 0,
“low”: “null”
}
]
},
{
“exact”: true,
“range”: [
{
“inclusion”: 0,
“low”: ““mustanish123"”
}
]
}
],
“using”: “gsi”
}
},
{
#operator”: “DistinctScan”,
“scan”: {
"#operator": “IndexScan2”,
“index”: “activity_pins”,
“index_id”: “2c66f03e9193a364”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““atlas””,
“inclusion”: 3,
“low”: ““atlas””
}
]
},
{
“exact”: true,
“range”: [
{
“high”: ““report””,
“inclusion”: 3,
“low”: ““report””
}
]
},
{
“exact”: true,
“range”: [
{
“high”: ““review””,
“inclusion”: 3,
“low”: ““review””
}
]
}
],
“using”: “gsi”
}
}
]
},
{
"#operator": “Fetch”,
“keyspace”: “activity”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Unnest”,
“as”: “pin”,
“expr”: “(activity.pins)”
}
]
}
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “((((pin.pin_type) in [“atlas”, “review”, “report”]) and ((pin.status) in [1, “1”])) and (not ((meta(activity).id) in [“mustanish123”])))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “pin
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “10”,
“sort_terms”: [
{
“desc”: true,
“expr”: “(pin.pin_create_date)”
}
]
},
{
"#operator": “Limit”,
“expr”: “10”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT pin FROM activity UNNEST activity.pins AS pin WHERE pin.pin_type IN [“atlas”,“review”,“report”] AND pin.statusIN[1,“1”] \nAND META(activity).id NOT IN [“mustanish123”] ORDER BY pin.pin_create_date DESC LIMIT 10 OFFSET 0”
}


#7

It using IntersectScan. You should avoid it by dropping one of the index or USE INDEX.

Please go through https://blog.couchbase.com/optimize-n1ql-performance-using-request-profiling/ and checkout which one taking time and optimize accordingly.


#8

Hi,

Please see below json it does not have IntersectScan in it.

{
“plan”: {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “DistinctScan”,
“scan”: {
"#operator": “IndexScan2”,
“index”: “activity_pins”,
“index_id”: “73d99617ce9c4d01”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “activity”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““atlas””,
“inclusion”: 3,
“low”: ““atlas””
}
]
},
{
“exact”: true,
“range”: [
{
“high”: ““report””,
“inclusion”: 3,
“low”: ““report””
}
]
},
{
“exact”: true,
“range”: [
{
“high”: ““review””,
“inclusion”: 3,
“low”: ““review””
}
]
}
],
“using”: “gsi”
}
},
{
"#operator": “Fetch”,
“keyspace”: “activity”,
“namespace”: “default”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Unnest”,
“as”: “pin”,
“expr”: “(activity.pins)”
}
]
}
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Filter”,
“condition”: “(((not ((pin.user_name) = “mustanish123”)) and ((pin.pin_type) in [“atlas”, “review”, “report”])) and ((pin.status) in [1, “1”]))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“expr”: “pin
},
{
“as”: “distance”,
“expr”: “(6371 * acos((((cos(radians(“28.549056336607507”)) * cos(radians((pin.place_lat)))) * cos((radians((pin.place_long)) - radians(“77.2689280000161”)))) + (sin(radians(“28.549056336607507”)) * sin(radians((pin.place_lat)))))))”
}
]
}
]
}
}
]
},
{
"#operator": “Order”,
“limit”: “10”,
“sort_terms”: [
{
“desc”: true,
“expr”: “(pin.pin_create_date)”
}
]
},
{
"#operator": “Limit”,
“expr”: “10”
},
{
"#operator": “FinalProject”
}
]
},
“text”: “SELECT pin,(6371*acos(cos(radians(‘28.549056336607507’))*cos(radians(pin.place_lat))cos(radians(pin.place_long)-radians(‘77.2689280000161’))+sin(radians(‘28.549056336607507’))\nsin(radians(pin.place_lat)))) AS distance FROM activity UNNEST activity.pins AS pin WHERE pin.user_name!=‘mustanish123’ AND pin.pin_type IN [“atlas”,“review”,“report”] \nAND pin.statusIN[1,“1”] ORDER BY pin.pin_create_date DESC LIMIT 10 OFFSET 0”
}

I have more 158,125 documents in this bucket and this query is taking more than 500 ms


#9

When Query has order by it needs to produce all qualifying documents and do sort it takes time. You can try with out order by it will be very fast. Also follow above link if you need further optimizations


#10

@vsr1 Thanks a lot that helped me solved my problem…