I have a query like this, where metadata.another_field is the key for a global secondary index I have defined:
SELECT metadata.some_field FROM MY_BUCKET WHERE metadata.another_field IN ["KEY1", "KEY2"]
I confirmed from the EXPLAIN statement that the query uses the intended index, and the query works fine. When I explain this query I see this pattern:
        "spans": [
            {
                "Range": {
                    "High": [
                        "\"KEY1\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                        "\"KEY1\""
                    ]
                },
                "Seek": null
            },
            {
                "Range": {
                    "High": [
                        "\"KEY2\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                        "\"KEY2\""
                    ]
                },
                "Seek": null
            }
        ],
        "using": "gsi"
    }
]
However, if I supply more than about 17 keys, the spans
component changes to:
        "spans": [
            {
                "Range": {
                    "High": null,
                    "Inclusion": 1,
                    "Low": [
                        "null"
                    ]
                },
                "Seek": null
            }
        ],
In other words, the index is now scanned over its full range (Low is null with no High bound), and a filter is then applied to what appears to be the entire dataset.
As a result, fetching 10 items takes about 30 ms, whereas fetching 100 items takes about 30 seconds (1,000 times as long for 10 times the items).
Is there a better way to fetch items by specific keys from a global secondary index? Or is there a reason the query strategy changes (and becomes slow) when too many items are listed in the "IN" array?
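One workaround I'm considering is splitting the key list client-side into batches below the apparent threshold and issuing one parameterized query per batch, then merging the results. A minimal Python sketch follows; the batch size of 16 is a guess based on the behavior above, and `run_query` is a hypothetical stand-in for the actual SDK call (e.g. `cluster.query(...)` in the Couchbase Python SDK), not a real API:

```python
def chunk(keys, size=16):
    """Split a key list into batches small enough to keep span-based index scans."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def fetch_by_keys(keys, run_query, batch_size=16):
    """Issue one parameterized query per batch of keys and merge the results.

    `run_query(statement, params)` is a placeholder for the SDK's query call;
    it should return an iterable of result rows.
    """
    statement = (
        "SELECT metadata.some_field FROM MY_BUCKET "
        "WHERE metadata.another_field IN $keys"
    )
    results = []
    for batch in chunk(keys, batch_size):
        # Each batch stays under the threshold, so the planner should
        # keep generating one exact-match span per key.
        results.extend(run_query(statement, {"keys": batch}))
    return results
```

This trades one slow full-scan query for several fast span-based queries, at the cost of extra round trips, so I'd still prefer a server-side fix if one exists.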