I have a datastructure like this:
{
"ispublic": true,
"key": "187",
"name": "Moserne i Nikkelborg (v. Nørrekær)",
"points": [
{
"lat": 56.5084931472,
"lon": 8.9840306006
}
],
"revisioninfo": {
"created": "2019-03-28T16:31:19+0100",
"createdby": "John Dalsgaard/BA171123846CEBF1C1257CB2002DA330/Fangst",
"modifiedcount": 0
},
"type": "Lake"
}
I would like to count the geo points. I have 5 data types that contain these points arrays that could have one entry (like here) or more.
I use this query:
SELECT sum(ARRAY_COUNT(points)) AS size FROM data WHERE type IN ["Lake","Stream","CoastLocalArea","SeaLocalArea","PutTakeLake"]
I have created two index like this:
CREATE INDEX `def_count_points` ON `data`(`type`,array_count(`points`)) WHERE (`points` is valued)
I would expect the query to use that index… But it prefers this:
CREATE INDEX `def_type` ON `data`(`type`)
The Explain
looks like this:
{
"plan": {
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan3",
"index": "def_type",
"index_id": "941b5fdba8f0e7f8",
"index_projection": {
"primary_key": true
},
"keyspace": "data",
"namespace": "default",
"spans": [
{
"exact": true,
"range": [
{
"high": "\"CoastLocalArea\"",
"inclusion": 3,
"low": "\"CoastLocalArea\""
}
]
},
{
"exact": true,
"range": [
{
"high": "\"Lake\"",
"inclusion": 3,
"low": "\"Lake\""
}
]
},
{
"exact": true,
"range": [
{
"high": "\"PutTakeLake\"",
"inclusion": 3,
"low": "\"PutTakeLake\""
}
]
},
{
"exact": true,
"range": [
{
"high": "\"SeaLocalArea\"",
"inclusion": 3,
"low": "\"SeaLocalArea\""
}
]
},
{
"exact": true,
"range": [
{
"high": "\"Stream\"",
"inclusion": 3,
"low": "\"Stream\""
}
]
}
],
"using": "gsi"
},
{
"#operator": "Fetch",
"keyspace": "data",
"namespace": "default"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Filter",
"condition": "((`data`.`type`) in [\"Lake\", \"Stream\", \"CoastLocalArea\", \"SeaLocalArea\", \"PutTakeLake\"])"
},
{
"#operator": "InitialGroup",
"aggregates": [
"sum(array_count((`data`.`points`)))"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"sum(array_count((`data`.`points`)))"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"sum(array_count((`data`.`points`)))"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"as": "size",
"expr": "sum(array_count((`data`.`points`)))"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
},
"text": "SELECT sum(ARRAY_COUNT(points)) AS size FROM data WHERE type IN [\"Lake\",\"Stream\",\"CoastLocalArea\",\"SeaLocalArea\",\"PutTakeLake\"]"
}
Now, the reason I found it is that I have optimized the def_type
index to not include some activity logs that I would never query from inside the system - so I thought it was better to take them out of the index (at present there are 230K of these log documents). Therefore, I changed the def_type
index to this:
CREATE INDEX `def_type` ON `data`(`type`) WHERE (not (`type` = "ActivityLog"))
However, with the above count statement I now get:
[
{
"code": 4000,
"msg": "No index available on keyspace data that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online.",
"query_from_user": "SELECT sum(ARRAY_COUNT(points)) AS size FROM data WHERE points is valued"
}
]
I tried to play with a couple of variations:
SELECT sum(ARRAY_COUNT(points)) AS size FROM data WHERE type='Lake'
This one works - though of course it only gives me the points for lakes.
SELECT sum(ARRAY_COUNT(points)) AS size FROM data WHERE type IN ['Lake']
This one fails with the above “no index…” message.
SELECT sum(ARRAY_COUNT(points)) AS size FROM data WHERE points is valued
This one also fails with same “no index…” message.
I have also tried to create the count index like this:
CREATE INDEX `def_count_points` ON `data`(`type`,`points`) WHERE (`points` is valued)
But that isn’t used either - so the above results are the exact same
Can any one spread a little light on where my misinterpretation of the index usage lies?
I’m on a community server: Community Edition 6.0.0 build 1693