Range search inside arrays

Hello,

A subset of my document is as below, and this is used to store some file sizes that have been uploaded from the client.

{
  "id": 13400,
  "packet": {
    "previous": [{
        "size": 1000,
        "platform": "IOS",
        "time": "2016-12-02T12:03:05Z"
      }, {
        "size": 1500,
        "platform": "IOS",
        "time": "2016-12-03T18:23:15Z"
      }
    ]
  }
}

For the example I’ve just given some rounded size values, but these range in size from 0 to around 500000.

I created a covering index on size, and it is really fast to return the exact match values from millions of documents (SELECT WHERE ANY IN) However I don’t know what to do to increase performance when looking for entries where the size is being range checked (size >= or size > or size <= etc or combinations of it) This can take a minute or so to complete.

The same applies for the date searches. Looking at the couch videos there are some suggestions on storing the date as epoch or an array, but it ends up with the same poor performance on range queries as the size field.

Can post an EXPLAIN and other info as needed.

You need to use an array index.

http://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/indexing-arrays.html

http://blog.couchbase.com/2016/march/making-the-most-of-your-arrays…-with-array-indexing

This is the index I created that I didn’t put in my original post

 CREATE INDEX idx_size ON `iaps`
 ( DISTINCT ARRAY (DISTINCT ARRAY s FOR s IN p.previous END) FOR p IN packet END, id)

And here is my query:

SELECT id FROM iaps
WHERE ANY p IN packet SATISFIES (ANY s IN p.previous SATISFIES s.size = 100 END) END

The query changes with s.size and using >=, > etc. As mentioned earlier the = query is quick, but the range can take upto a minute to complete. Even if just doing a COUNT(1)

Am I going about this the right way? Or is there a more efficient way to do the query when it comes to range searching?

[
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “DistinctScan”,
“scan”: {
#operator”: “IndexScan”,
“index”: “idx_size”,
“index_id”: “9e968dd24f56958d”,
“keyspace”: “iaps”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“Inclusion”: 0,
“Low”: [
“null”
]
}
}
],
“using”: “gsi”
}
},
{
#operator”: “Fetch”,
“keyspace”: “iaps”,
“namespace”: “default”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “any p in (iaps.packet) satisfies any s in (p.previous) satisfies ((s.size) = 100) end end”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(iaps.id)”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “SELECT id FROM iaps\r\nWHERE ANY p IN packet SATISFIES (ANY s IN p.previous SATISFIES s.size = 100 END) END”
}
]

Two things.

(1) Your EXPLAIN output does not match the query you posted.

(2) We have added a fix to make query query/index combination work as a covering index. The fix is currently on nightly builds. Will be released next year.

ok, let me double check what I pasted there and update it.

In your experience, when looking through specific entries in a nested array is the double statement ANY query a better solution

WHERE ANY p IN packet SATISFIES (ANY s IN p.previous SATISFIES s.size = 100 END) END

or it is better to use another query (with an appropriate index to try to cover it).

WHERE ANY p IN packet.previous SATISFIES p.size = 100 END

Well, is packet an array or an object? Only one of those two predicates will return the correct results.