After upgrading to 5.1.1 community edition,

node
n1ql
index

#1

CONFIGURATION:
We are running a 3 node cluster of 5.1.1 community. We have copies of all indexes (w/different names) on each node.

PROBLEM:
After a change to any index, some queries properly do an index DistinctScan, they find the correct index, and then run in milliseconds (as they should).

After a few hours, they begin to do IntersectScans and use another index, along with the proper index. This causes the queries to take minutes to run.

*This was never a problem when we were running a single node cluster with 4.6 community.

Any ideas as to what is going on? Any helps appreciated!

Kevin


#2

Please post the both plans and index definitions.


#3

GOOD RESULTS (works for a few hours):

SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.dateOfBirth, SUBSTR(per.ssn, -4) AS ssnl4, per.client_ids, per.address[0].address1 || ’ ’ || per.address[0].address2 || ’ ’ || CASE WHEN per.address[0].state_id = ‘’ THEN per.address[0].zip ELSE st.display || ’ ’ || per.address[0].zip END AS primaryAddress, personClientDisplay.name as per_client FROM dec_db_dec per LEFT JOIN dec_db_common st ON KEYS first x.state_id for x in per.address end LEFT JOIN dec_db_common personClientDisplay ON KEYS first x for x in per.client_ids end
WHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘fassett%’ END
ORDER BY per_client desc

CORRECT INDEX:
CREATE INDEX autocomplete_nameLastD_03 ON dec_db_dec
(type,(distinct (array array_element for array_element in suffixes(lower(nameLast)) end))) WHERE ((type = “person”) and (nameLast is not missing))

{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan2”,
“index”: “autocomplete_nameLastD_03”,
“index_id”: “e3290b5e92558914”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “dec_db_dec”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““person””,
“inclusion”: 3,
“low”: ““person””
},
{
“high”: ““fassetu””,
“inclusion”: 1,
“low”: ““fassett””
}
]
}
],
“using”: “gsi”
}
},
{
#operator”: “Fetch”,
“as”: “per”,
“keyspace”: “dec_db_dec”,
“namespace”: “default”
},
{
#operator”: “Join”,
“as”: “st”,
“keyspace”: “dec_db_common”,
“namespace”: “default”,
“on_keys”: “first (x.state_id) for x in (per.address) end”,
“outer”: true
},
{
#operator”: “Join”,
“as”: “personClientDisplay”,
“keyspace”: “dec_db_common”,
“namespace”: “default”,
“on_keys”: “first x for x in (per.client_ids) end”,
“outer”: true
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((per.type) = “person”) and ((per.nameLast) is not missing)) and any array_element in suffixes(lower((per.nameLast))) satisfies (array_element like “fassett%”) end)”
},
{
#operator”: “InitialProject”,
“distinct”: true,
“result_terms”: [
{
“as”: “id”,
“expr”: “(meta(per).id)”
},
{
“expr”: “(per.nameFirst)”
},
{
“expr”: “(per.nameLast)”
},
{
“expr”: “(per.dateOfBirth)”
},
{
“as”: “ssnl4”,
“expr”: “substr0((per.ssn), (-4))”
},
{
“expr”: “(per.client_ids)”
},
{
“as”: “primaryAddress”,
“expr”: "(((((((per.address)[0]).address1) || " ") || (((per.address)[0]).address2)) || " ") || case when ((((per.address)[0]).state_id) = “”) then (((per.address)[0]).zip) else (((st.display) || " “) || (((per.address)[0]).zip)) end)”
},
{
“as”: “per_client”,
“expr”: “(personClientDisplay.name)”
}
]
},
{
#operator”: “Distinct”
}
]
}
},
{
#operator”: “Distinct”
}
]
},
{
#operator”: “Order”,
“sort_terms”: [
{
“desc”: true,
“expr”: “per_client
}
]
},
{
#operator”: “FinalProject”
}
]
},
“text”: “SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.dateOfBirth, SUBSTR(per.ssn, -4) AS ssnl4, per.client_ids, per.address[0].address1 || ’ ’ || per.address[0].address2 || ’ ’ || CASE WHEN per.address[0].state_id = ‘’ THEN per.address[0].zip ELSE st.display || ’ ’ || per.address[0].zip END AS primaryAddress, personClientDisplay.name as per_client FROM dec_db_dec per LEFT JOIN dec_db_common st ON KEYS first x.state_id for x in per.address end LEFT JOIN dec_db_common personClientDisplay ON KEYS first x for x in per.client_ids end \nWHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘fassett%’ END \nORDER BY per_client desc”
}


#4

BAD RESULTS (same query after a few hours):

SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.dateOfBirth, SUBSTR(per.ssn, -4) AS ssnl4, per.client_ids, per.address[0].address1 || ’ ’ || per.address[0].address2 || ’ ’ || CASE WHEN per.address[0].state_id = ‘’ THEN per.address[0].zip ELSE st.display || ’ ’ || per.address[0].zip END AS primaryAddress, personClientDisplay.name as per_client FROM dec_db_dec per LEFT JOIN dec_db_common st ON KEYS first x.state_id for x in per.address end LEFT JOIN dec_db_common personClientDisplay ON KEYS first x for x in per.client_ids end
WHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘fassett%’ END
ORDER BY per_client desc

{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IntersectScan”,
“scans”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan2”,
“index”: “autocomplete_nameLastD_03”,
“index_id”: “e3290b5e92558914”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “deco_db_deco”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““person””,
“inclusion”: 3,
“low”: ““person””
}
]
}
],
“using”: “gsi”
}
},
{
#operator”: “IndexScan2”,
“index”: “deco_type_index_03”,
“index_id”: “1495e4d298ed069c”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “deco_db_deco”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““person””,
“inclusion”: 3,
“low”: ““person””
}
]
}
],
“using”: “gsi”
}
]
},
{
#operator”: “Fetch”,
“as”: “per”,
“keyspace”: “deco_db_deco”,
“namespace”: “default”
},
{
#operator”: “Join”,
“as”: “st”,
“keyspace”: “deco_db_common”,
“namespace”: “default”,
“on_keys”: “first (x.state_id) for x in (per.address) end”,
“outer”: true
},
{
#operator”: “Join”,
“as”: “personClientDisplay”,
“keyspace”: “deco_db_common”,
“namespace”: “default”,
“on_keys”: “first x for x in (per.client_ids) end”,
“outer”: true
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((per.type) = “person”) and ((per.nameLast) is not missing)) and any array_element in suffixes(lower((per.nameLast))) satisfies (array_element like “fassett%”) end)”
},
{
#operator”: “InitialProject”,
“distinct”: true,
“result_terms”: [
{
“as”: “id”,
“expr”: “(meta(per).id)”
},
{
“expr”: “(per.nameFirst)”
},
{
“expr”: “(per.nameLast)”
},
{
“expr”: “(per.dateOfBirth)”
},
{
“as”: “ssnl4”,
“expr”: “substr0((per.ssn), (-4))”
},
{
“expr”: “(per.client_ids)”
},
{
“as”: “primaryAddress”,
“expr”: "(((((((per.address)[0]).address1) || " ") || (((per.address)[0]).address2)) || " ") || case when ((((per.address)[0]).state_id) = “”) then (((per.address)[0]).zip) else (((st.display) || " “) || (((per.address)[0]).zip)) end)”
},
{
“as”: “per_client”,
“expr”: “(personClientDisplay.name)”
}
]
},
{
#operator”: “Distinct”
}
]
}
},
{
#operator”: “Distinct”
}
]
},
{
#operator”: “Order”,
“sort_terms”: [
{
“desc”: true,
“expr”: “per_client
}
]
},
{
#operator”: “FinalProject”
}
]
},
“text”: “SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.dateOfBirth, SUBSTR(per.ssn, -4) AS ssnl4, per.client_ids, per.address[0].address1 || ’ ’ || per.address[0].address2 || ’ ’ || CASE WHEN per.address[0].state_id = ‘’ THEN per.address[0].zip ELSE st.display || ’ ’ || per.address[0].zip END AS primaryAddress, personClientDisplay.name as per_client \nFROM deco_db_deco per\nLEFT JOIN deco_db_common st ON KEYS first x.state_id for x in per.address end LEFT JOIN deco_db_common personClientDisplay ON KEYS first x for x in per.client_ids end \nWHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘fassett%’ END \nORDER BY per_client desc”
}


#5

When the index scan happens, here is the definition of the other index that it should not be scanning:

CREATE INDEX dec_type_index_02 ON dec_db_dec(type)


#6

It should have detected dec_type is subset of other index and not do IntersectScan.
You can drop the dec_type_index_02.
Also you can try with following index or Latest 5.50

CREATE INDEX autocomplete_nameLastD_03 ON dec_db_dec (distinct (array array_element for array_element in suffixes(lower(nameLast)) end))
WHERE type = "person" AND nameLast IS NOT MISSING;

#7

Hi, I’m working with kivey on this issue. We can’t drop the dec_type_index_02 index without other problems. We also tried making the index modification that you recommended (removing type) and it did not do the trick. We also cannot seem to replicate the issue in our development environment. We are removing one node at a time to see if it is a specific node causing the issue. Any other ideas are welcome. Thanks.


#8

We can now reproduce the problem reliably using pillowfight to simulate a high volume of user activity.


#9

The index plan always starts out correctly using a distinct scan and only includes the array index that we want.

Then we hit the database with update and select n1ql queries to simulate load.

Then the index plan (for the same query) changes to an intersect scan and it brings in another index that takes minutes to execute.

We have simulated this issue on:
5.0.1 Community
5.1.1 Community
5.5.2 Enterprise

This does not happen on 4.6

I assume there has been a change in the way array indexes work, any advice on how to approach the problem is welcome.

--------------------DETAILS BELOW------------

QUERY:
SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.client_ids, per.deceased
FROM deco_db_deco per
WHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘smith%’ END

GOOD INDEX (this is the only index used immediately after stopping/starting the database):
CREATE INDEX autocomplete_nameLastD_03 ON deco_db_deco
(type,(distinct (array array_element for array_element in suffixes(lower(nameLast)) end)))
WHERE ((type = “person”) and (nameLast is not missing))

BAD INDEX (this is also included after some n1ql update/select load):
CREATE INDEX deco_type_index_03 ON deco_db_deco
(type)

INDEX PLANS BELOW:

INITIAL (GOOD) PLAN:
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan3”,
“as”: “per”,
“index”: “autocomplete_nameLastD_03”,
“index_id”: “8517cc45dea8af82”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “deco_db_deco”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““person””,
“inclusion”: 3,
“low”: ““person””
},
{
“high”: ““smiti””,
“inclusion”: 1,
“low”: ““smith””
}
]
}
],
“using”: “gsi”
}
},
{
#operator”: “Fetch”,
“as”: “per”,
“keyspace”: “deco_db_deco”,
“namespace”: “default”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((per.type) = “person”) and ((per.nameLast) is not missing)) and any array_element in suffixes(lower((per.nameLast))) satisfies (array_element like “smith%”) end)”
},
{
#operator”: “InitialProject”,
“distinct”: true,
“result_terms”: [
{
“as”: “id”,
“expr”: “(meta(per).id)”
},
{
“expr”: “(per.nameFirst)”
},
{
“expr”: “(per.nameLast)”
},
{
“expr”: “(per.client_ids)”
},
{
“expr”: “(per.deceased)”
}
]
},
{
#operator”: “Distinct”
},
{
#operator”: “FinalProject”
}
]
}
},
{
#operator”: “Distinct”
}
]
},
“text”: “SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.client_ids, per.deceased\nFROM deco_db_deco per \nWHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘smith%’ END”
}

BAD PLAN (AFTER UPDATE/SELECT LOAD):
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IntersectScan”,
“scans”: [
{
#operator”: “IndexScan3”,
“as”: “per”,
“index”: “deco_type_index_03”,
“index_id”: “3b9f6a819dbd212d”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “deco_db_deco”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““person””,
“inclusion”: 3,
“low”: ““person””
}
]
}
],
“using”: “gsi”
},
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan3”,
“as”: “per”,
“index”: “autocomplete_nameLastY_03”,
“index_id”: “748628961184c17f”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “deco_db_deco”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““person””,
“inclusion”: 3,
“low”: ““person””
}
]
}
],
“using”: “gsi”
}
}
]
},
{
#operator”: “Fetch”,
“as”: “per”,
“keyspace”: “deco_db_deco”,
“namespace”: “default”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((per.type) = “person”) and ((per.nameLast) is not missing)) and any array_element in suffixes(lower((per.nameLast))) satisfies (array_element like “smith%”) end)”
},
{
#operator”: “InitialProject”,
“distinct”: true,
“result_terms”: [
{
“as”: “id”,
“expr”: “(meta(per).id)”
},
{
“expr”: “(per.nameFirst)”
},
{
“expr”: “(per.nameLast)”
},
{
“expr”: “(per.client_ids)”
},
{
“expr”: “(per.deceased)”
}
]
},
{
#operator”: “Distinct”
},
{
#operator”: “FinalProject”
}
]
}
},
{
#operator”: “Distinct”
}
]
},
“text”: “SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.client_ids, per.deceased\nFROM deco_db_deco per \nWHERE per.type = ‘person’ AND (per.nameLast) is not missing AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE ‘smith%’ END”
}


#10

This should not happen. If you can provide index definitions/query and exact steps repro will take look.


#11

Hi @vsr1,

I am working with @tom1 and @kivey. @tom1’s earlier post has been updated to include the index definitions. We are able to reproduce the issue on our development servers using cbc-n1qlback to stress test using a queryfile.


#12

Can you post cluster node configurations. no of Index nodes, query nodes. Is replica enabled or do you have duplicate indexes.
Also you mentioned "Then we hit the database with update ". Can you more specific what type of updates.

BAD PLAN using “index”: “autocomplete_nameLastY_03” what is definition of this index.

In mean time you can specify explicit USE INDEX

SELECT DISTINCT META(per).id AS id, per.nameFirst, per.nameLast, per.client_ids, per.deceased
FROM default per USE INDEX(autocomplete_nameLastD_03)
WHERE per.type = "person" AND (per.nameLast) is not missing
AND ANY array_element IN SUFFIXES(LOWER(per.nameLast)) SATISFIES array_element LIKE "smith%" END;