How to index this location query?

n1ql
query

#1

Hello!

I am using Couchbase 4.1 CE. I have finished converting a location query that would return me a feed of information based on some distances, supposedly minimizing the number of items it needs to run on by filtering by min/max lat and lng, ordered by a sum of weight (a pre-calculated one and another one handled in real-time).

Much probably the following query may perform much better, but since I don’t have such knowledge in Couchbase, it is being such a great challenge to work with such queries. I am all open to suggestions.

One of the issues I’ve noticed is probably the need for recalculation the same two passed values at least two times. Of course one another important one is the lack of indexes, that I’ve preferred to only create at the end.

Document feed (user_account::1::feed::26):

{
  "related_id": "item::5",
  "_type": "feed",
  "account_id": "user_account::1",
  "total_weight": 0,
  "created_at": 1474553980,
  "action_type": "EXAMPLE_ACTION",
  "_id": 26
}

Related document (item::5):

{
  "title": "Testing",+
  "type": "ITEM_TYPE",
  "location": [
    {
      "lat": 38.7628539,
      "lng": -9.1170388
    }
  ],
  "created_by": "user_account::1",
  "_id": 5,
  "_type": "item",
  "created_at": 1474543977
}

My current query:

SELECT itemEntry, relatedItemDetails, distance, weight_distance, itemEntry.total_weight  FROM (
    SELECT itemEntry, relatedItemDetails, distance, weight_distance
    FROM default itemEntry
    NEST default relatedItemDetails ON KEYS (itemEntry.related_id)
        
    /* get weight based on distance, and sum the ALIAS of it to total_weight when ordering */
    LET distance = CASE
           WHEN relatedItemDetails[0].location[0] IS NOT MISSING AND
                relatedItemDetails[0].location[0].lat BETWEEN -0.89932160591873 AND 0.89932160591873 AND
                relatedItemDetails[0].location[0].lng BETWEEN -0.89932160591873 AND 0.89932160591873
           THEN (acos(sin( 0 )*sin(radians(relatedItemDetails[0].location[0].lat)) + cos( 0 )*cos(radians(relatedItemDetails[0].location[0].lat))*cos(radians(relatedItemDetails[0].location[0].lng)-( 0 ))) * 6371)
           ELSE null
           END,
        weight_distance = CASE
           WHEN relatedItemDetails[0].location[0] IS NOT MISSING AND
                relatedItemDetails[0].location[0].lat BETWEEN -0.89932160591873 AND 0.89932160591873 AND
                relatedItemDetails[0].location[0].lng BETWEEN -0.89932160591873 AND 0.89932160591873
           THEN ((acos(sin( 0 )*sin(radians(relatedItemDetails[0].location[0].lat)) + cos( 0 )*cos(radians(relatedItemDetails[0].location[0].lat))*cos(radians(relatedItemDetails[0].location[0].lng)-( 0 ))) * 6371) / 100)
                -
                (acos(sin( 0 )*sin(radians(relatedItemDetails[0].location[0].lat)) + cos( 0 )*cos(radians(relatedItemDetails[0].location[0].lat))*cos(radians(relatedItemDetails[0].location[0].lng)-( 0 ))) * 6371)
           ELSE 0
           END

    WHERE itemEntry._type = "feed" AND
    itemEntry.account_id = user_account::5
) feedEntry 

WHERE (feedEntry.distance <= 100 OR feedEntry.distance IS NULL)

EXPLAIN of the current SELECT:

[0] => stdClass Object
(
    [#operator] => Sequence
    [~children] => Array
        (
            [0] => stdClass Object
                (
                    [#operator] => Sequence
                    [~children] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [#operator] => Sequence
                                    [~children] => Array
                                        (
                                            [0] => stdClass Object
                                                (
                                                    [#operator] => IndexScan
                                                    [index] => idx_type_accountid
                                                    [keyspace] => default
                                                    [namespace] => default
                                                    [spans] => Array
                                                        (
                                                            [0] => stdClass Object
                                                                (
                                                                    [Range] => stdClass Object
                                                                        (
                                                                            [High] => Array
                                                                                (
                                                                                    [0] => "feed"
                                                                                    [1] => $dkAccountId
                                                                                )

                                                                            [Inclusion] => 3
                                                                            [Low] => Array
                                                                                (
                                                                                    [0] => "feed"
                                                                                    [1] => $dkAccountId
                                                                                )

                                                                        )

                                                                )

                                                        )

                                                    [using] => gsi
                                                )

                                            [1] => stdClass Object
                                                (
                                                    [#operator] => Parallel
                                                    [~child] => stdClass Object
                                                        (
                                                            [#operator] => Sequence
                                                            [~children] => Array
                                                                (
                                                                    [0] => stdClass Object
                                                                        (
                                                                            [#operator] => Fetch
                                                                            [as] => itemEntry
                                                                            [keyspace] => default
                                                                            [namespace] => default
                                                                        )

                                                                    [1] => stdClass Object
                                                                        (
                                                                            [#operator] => Nest
                                                                            [as] => relatedItemDetails
                                                                            [keyspace] => default
                                                                            [namespace] => default
                                                                            [on_keys] => (`itemEntry`.`related_id`)
                                                                        )

                                                                    [2] => stdClass Object
                                                                        (
                                                                            [#operator] => Let
                                                                            [bindings] => Array
                                                                                (
                                                                                    [0] => stdClass Object
                                                                                        (
                                                                                            [expr] => case when ((((((`relatedItemDetails`[0]).`location`)[0]) is not missing) and (((((`relatedItemDetails`[0]).`location`)[0]).`lat`) between $minLat and $maxLat)) and (((((`relatedItemDetails`[0]).`location`)[0]).`lng`) between $minLng and $maxLng)) then (acos(((sin($lat) * sin(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) + ((cos($lat) * cos(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) * cos((radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) - $lng))))) * $earthRadius) else null end
                                                                                            [variable] => distance
                                                                                        )

                                                                                    [1] => stdClass Object
                                                                                        (
                                                                                            [expr] => case when ((((((`relatedItemDetails`[0]).`location`)[0]) is not missing) and (((((`relatedItemDetails`[0]).`location`)[0]).`lat`) between $minLat and $maxLat)) and (((((`relatedItemDetails`[0]).`location`)[0]).`lng`) between $minLng and $maxLng)) then (((acos(((sin($lat) * sin(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) + ((cos($lat) * cos(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) * cos((radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) - $lng))))) * $earthRadius) / 100) - (acos(((sin($lat) * sin(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) + ((cos($lat) * cos(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) * cos((radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) - $lng))))) * $earthRadius)) else 0 end
                                                                                            [variable] => weight_distance
                                                                                        )

                                                                                )

                                                                        )

                                                                    [3] => stdClass Object
                                                                        (
                                                                            [#operator] => Filter
                                                                            [condition] => (((`itemEntry`.`_type`) = "feed") and ((`itemEntry`.`account_id`) = $dkAccountId))
                                                                        )

                                                                    [4] => stdClass Object
                                                                        (
                                                                            [#operator] => InitialProject
                                                                            [result_terms] => Array
                                                                                (
                                                                                    [0] => stdClass Object
                                                                                        (
                                                                                            [expr] => `itemEntry`
                                                                                        )

                                                                                    [1] => stdClass Object
                                                                                        (
                                                                                            [expr] => `relatedItemDetails`
                                                                                        )

                                                                                    [2] => stdClass Object
                                                                                        (
                                                                                            [expr] => `distance`
                                                                                        )

                                                                                    [3] => stdClass Object
                                                                                        (
                                                                                            [expr] => `weight_distance`
                                                                                        )

                                                                                )

                                                                        )

                                                                )

                                                        )

                                                )

                                        )

                                )

                            [1] => stdClass Object
                                (
                                    [#operator] => Alias
                                    [as] => feedEntry
                                )

                            [2] => stdClass Object
                                (
                                    [#operator] => Parallel
                                    [~child] => stdClass Object
                                        (
                                            [#operator] => Sequence
                                            [~children] => Array
                                                (
                                                    [0] => stdClass Object
                                                        (
                                                            [#operator] => Filter
                                                            [condition] => (((`feedEntry`.`distance`) <= $maxRadius) or ((`feedEntry`.`distance`) is null))
                                                        )

                                                    [1] => stdClass Object
                                                        (
                                                            [#operator] => InitialProject
                                                            [result_terms] => Array
                                                                (
                                                                    [0] => stdClass Object
                                                                        (
                                                                            [expr] => (`feedEntry`.`itemEntry`)
                                                                        )

                                                                    [1] => stdClass Object
                                                                        (
                                                                            [expr] => (`feedEntry`.`relatedItemDetails`)
                                                                        )

                                                                    [2] => stdClass Object
                                                                        (
                                                                            [expr] => (`feedEntry`.`distance`)
                                                                        )

                                                                    [3] => stdClass Object
                                                                        (
                                                                            [expr] => (`feedEntry`.`weight_distance`)
                                                                        )

                                                                    [4] => stdClass Object
                                                                        (
                                                                            [expr] => ((`feedEntry`.`itemEntry`).`total_weight`)
                                                                        )

                                                                )

                                                        )

                                                )

                                        )

                                )

                        )

                )

            [1] => stdClass Object
                (
                    [#operator] => Order
                    [sort_terms] => Array
                        (
                            [0] => stdClass Object
                                (
                                    [expr] => ((`feedEntry`.`total_weight`) + (`feedEntry`.`weight_distance`))
                                )

                            [1] => stdClass Object
                                (
                                    [desc] => 1
                                    [expr] => ((`feedEntry`.`itemEntry`).`created_at`)
                                )

                        )

                )

            [2] => stdClass Object
                (
                    [#operator] => FinalProject
                )

        )

)

I feel that the topic is a bit extent, but as this may be a core feature of our system, we really need to get it done well.

I am looking for your feedback!

Thanks


#2

Hi @ivoecpereira,

It looks like you are getting complete LEFT side document. Best index will be push the predicates to indexer.

create index ix1 on default( account_id) WHERE _type = “feed”;


#3

Hi @ivoecpereira,

If you follow my previous post on first calculating the bounding box and then issuing the query,
the index creation becomes simple. The right predicates gets pushed down to index scans making it very efficient.

CREATE INDEX idxbrlocation on `beer-sample`(RADIANS(geo.lat), RADIANS(geo.lon));
explain SELECT * FROM `beer-sample` 
WHERE (RADIANS(geo.lat) >= 0.648579006484 and RADIANS(geo.lat) <= 0.664257656866) 
and (RADIANS(geo.lon) >= -2.14582984831 and RADIANS(geo.lon) <= -2.12603802887) ;
 
[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idxbrlocation",
          "index_id": "9f17ffe2b11db2e1",
          "keyspace": "beer-sample",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "0.664257656866",
                  "-2.12603802887"
                ],
                "Inclusion": 3,
                "Low": [
                  "0.648579006484",
                  "-2.14582984831"
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "keyspace": "beer-sample",
                "namespace": "default"
              },
              {
                "#operator": "Filter",
                "condition": "(((0.648579006484 <= radians(((`beer-sample`.`geo`).`lat`))) and (radians(((`beer-sample`.`geo`).`lat`)) <= 0.664257656866)) and (((-2.14582984831) <= radians(((`beer-sample`.`geo`).`lon`))) and (radians(((`beer-sample`.`geo`).`lon`)) <= (-2.12603802887))))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "self",
                    "star": true
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT * FROM `beer-sample` \nWHERE (RADIANS(geo.lat) >= 0.648579006484 and RADIANS(geo.lat) <= 0.664257656866) \nand (RADIANS(geo.lon) >= -2.14582984831 and RADIANS(geo.lon) <= -2.12603802887) ;"
  }
]

#4

Hello,

After adapting my query based on your suggestion and inserting the index based on my schema, I was not really able to understand why my index was not used.

Let me start with the INDEX I’ve created:

CREATE INDEX idx_location ON `default`(RADIANS((location[0]).lat), RADIANS((location[0]).lng));

As it is an array of objects, I’ve indexed the first object lat and lng. Even tried without the array index, but without effects either.

The query I’ve ended up with:

SELECT itemEntry, relatedItemDetails, distance, weight_distance, itemEntry.total_weight  FROM (
    SELECT itemEntry, relatedItemDetails, distance, weight_distance
    FROM default itemEntry
    NEST default relatedItemDetails ON KEYS (itemEntry.related_id)

    LET distance = CASE
           WHEN (RADIANS(relatedItemDetails[0].location[0].lat) >= -0.0078393251908876
            and RADIANS(relatedItemDetails[0].location[0].lat) <= 0.0078393251908876)
            and (RADIANS(relatedItemDetails[0].location[0].lng) >= -0.0078393251908876
            and RADIANS(relatedItemDetails[0].location[0].lng) <= 0.0078393251908876)
             OR  acos(sin(0) * sin (relatedItemDetails[0].location[0].lat) + cos(0) * cos(relatedItemDetails[0].location[0].lat) * cos (relatedItemDetails[0].location[0].lng - 0)) <= 0.0078480615288024
           THEN (acos(sin( 0 )*sin(radians(relatedItemDetails[0].location[0].lat)) + cos( 0 )*cos(radians(relatedItemDetails[0].location[0].lat))*cos(radians(relatedItemDetails[0].location[0].lng)-( 0 ))) * 6371)
           ELSE null
           END,
        weight_distance = CASE
           WHEN (RADIANS(relatedItemDetails[0].location[0].lat) >= -0.0078393251908876
            and RADIANS(relatedItemDetails[0].location[0].lat) <= 0.0078393251908876)
            and (RADIANS(relatedItemDetails[0].location[0].lng) >= -0.0078393251908876
            and RADIANS(relatedItemDetails[0].location[0].lng) <= 0.0078393251908876)
             OR  acos(sin(0) * sin (relatedItemDetails[0].location[0].lat) + cos(0) * cos(relatedItemDetails[0].location[0].lat) * cos (relatedItemDetails[0].location[0].lng - 0)) <= 0.0078480615288024
           THEN ((acos(sin( 0 )*sin(radians(relatedItemDetails[0].location[0].lat)) + cos( 0 )*cos(radians(relatedItemDetails[0].location[0].lat))*cos(radians(relatedItemDetails[0].location[0].lng)-( 0 ))) * 6371) / 100) - (acos(sin( \$lat )*sin(radians(relatedItemDetails[0].location[0].lat)) + cos( 0 )*cos(radians(relatedItemDetails[0].location[0].lat))*cos(radians(relatedItemDetails[0].location[0].lng)-( 0 ))) * 6371)
           ELSE 0
           END

    WHERE itemEntry._type = "feed" AND
    itemEntry.account_id = "user_account::5"
) feedEntry 

WHERE (feedEntry.distance <= 100 OR feedEntry.distance IS NULL)

ORDER BY (feedEntry.total_weight + feedEntry.weight_distance), itemEntry.created_at DESC

The EXPLAIN of the query above:

{
    "requestID": "be8b374d-9c18-4d94-8854-80e09445a2d2",
    "signature": "json",
    "results": [
        {
            "#operator": "Sequence",
            "~children": [
                {
                    "#operator": "Sequence",
                    "~children": [
                        {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "IndexScan",
                                    "index": "idx_type_accountid",
                                    "keyspace": "default",
                                    "namespace": "default",
                                    "spans": [
                                        {
                                            "Range": {
                                                "High": [
                                                    "\"feed\"",
                                                    "\"user_account::5\""
                                                ],
                                                "Inclusion": 3,
                                                "Low": [
                                                    "\"feed\"",
                                                    "\"user_account::5\""
                                                ]
                                            }
                                        }
                                    ],
                                    "using": "gsi"
                                },
                                {
                                    "#operator": "Parallel",
                                    "~child": {
                                        "#operator": "Sequence",
                                        "~children": [
                                            {
                                                "#operator": "Fetch",
                                                "as": "itemEntry",
                                                "keyspace": "default",
                                                "namespace": "default"
                                            },
                                            {
                                                "#operator": "Nest",
                                                "as": "relatedItemDetails",
                                                "keyspace": "default",
                                                "namespace": "default",
                                                "on_keys": "(`itemEntry`.`related_id`)"
                                            },
                                            {
                                                "#operator": "Let",
                                                "bindings": [
                                                    {
                                                        "expr": "case when (((((-0.0078393251908876) \u003c= radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`))) and (radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)) \u003c= 0.0078393251908876)) and (((-0.0078393251908876) \u003c= radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`))) and (radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) \u003c= 0.0078393251908876))) or (acos(((sin(0) * sin(((((`relatedItemDetails`[0]).`location`)[0]).`lat`))) + ((cos(0) * cos(((((`relatedItemDetails`[0]).`location`)[0]).`lat`))) * cos((((((`relatedItemDetails`[0]).`location`)[0]).`lng`) - 0))))) \u003c= 0.0078480615288024)) then (acos(((sin(0) * sin(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) + ((cos(0) * cos(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) * cos((radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) - 0))))) * 6371) else null end",
                                                        "variable": "distance"
                                                    },
                                                    {
                                                        "expr": "case when (((((-0.0078393251908876) \u003c= radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`))) and (radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)) \u003c= 0.0078393251908876)) and (((-0.0078393251908876) \u003c= radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`))) and (radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) \u003c= 0.0078393251908876))) or (acos(((sin(0) * sin(((((`relatedItemDetails`[0]).`location`)[0]).`lat`))) + ((cos(0) * cos(((((`relatedItemDetails`[0]).`location`)[0]).`lat`))) * cos((((((`relatedItemDetails`[0]).`location`)[0]).`lng`) - 0))))) \u003c= 0.0078480615288024)) then (((acos(((sin(0) * sin(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) + ((cos(0) * cos(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) * cos((radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) - 0))))) * 6371) / 100) - (acos(((sin($lat) * sin(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) + ((cos(0) * cos(radians(((((`relatedItemDetails`[0]).`location`)[0]).`lat`)))) * cos((radians(((((`relatedItemDetails`[0]).`location`)[0]).`lng`)) - 0))))) * 6371)) else 0 end",
                                                        "variable": "weight_distance"
                                                    }
                                                ]
                                            },
                                            {
                                                "#operator": "Filter",
                                                "condition": "(((`itemEntry`.`_type`) = \"feed\") and ((`itemEntry`.`account_id`) = \"user_account::5\"))"
                                            },
                                            {
                                                "#operator": "InitialProject",
                                                "result_terms": [
                                                    {
                                                        "expr": "`itemEntry`"
                                                    },
                                                    {
                                                        "expr": "`relatedItemDetails`"
                                                    },
                                                    {
                                                        "expr": "`distance`"
                                                    },
                                                    {
                                                        "expr": "`weight_distance`"
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        },
                        {
                            "#operator": "Alias",
                            "as": "feedEntry"
                        },
                        {
                            "#operator": "Parallel",
                            "~child": {
                                "#operator": "Sequence",
                                "~children": [
                                    {
                                        "#operator": "Filter",
                                        "condition": "(((`feedEntry`.`distance`) \u003c= 100) or ((`feedEntry`.`distance`) is null))"
                                    },
                                    {
                                        "#operator": "InitialProject",
                                        "result_terms": [
                                            {
                                                "expr": "(`feedEntry`.`itemEntry`)"
                                            },
                                            {
                                                "expr": "(`feedEntry`.`relatedItemDetails`)"
                                            },
                                            {
                                                "expr": "(`feedEntry`.`distance`)"
                                            },
                                            {
                                                "expr": "(`feedEntry`.`weight_distance`)"
                                            },
                                            {
                                                "expr": "((`feedEntry`.`itemEntry`).`total_weight`)"
                                            }
                                        ]
                                    }
                                ]
                            }
                        }
                    ]
                },
                {
                    "#operator": "Order",
                    "sort_terms": [
                        {
                            "expr": "((`feedEntry`.`total_weight`) + (`feedEntry`.`weight_distance`))"
                        },
                        {
                            "desc": true,
                            "expr": "((`feedEntry`.`itemEntry`).`created_at`)"
                        }
                    ]
                },
                {
                    "#operator": "FinalProject"
                }
            ]
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "51.850782ms",
        "executionTime": "51.782029ms",
        "resultCount": 1,
        "resultSize": 9503
    }
}

I don’t really understand why the INDEX is not being applied. What am I missing here?

Thanks!


#5

Hi @ivoecpereira, there is a difference between your posting and the previous by @keshav_m.

N1QL index usage is determined by the WHERE clause of your query. If you look at the WHERE clause in the posting by @keshav_m, the WHERE clause directly uses the index expressions: RADIANS(geo.lat) and RADIANS(geo.lon).

You cannot use an index if your WHERE clause does not match the index.


#6

Hi @ivoecpereira,

For this query (without taking the 180th meridian into account), indexes below will work.

SELECT geo FROM `beer-sample` 
WHERE (RADIANS(geo.lat) >= 0.648579006484 and RADIANS(geo.lat) <= 0.664257656866) 
and (RADIANS(geo.lon) >= -2.14582984831 and RADIANS(geo.lon) <= -2.12603802887) ;
CREATE INDEX idxbrlocation on `beer-sample`(RADIANS(geo.lat), RADIANS(geo.lon));

When you take the 180th meridian into account,
change the OR predicate into a UNION to exploit the index scan. the AND predicate can remain as is.


#7

Hey @geraldss,

Thanks for the feedback and sorry for the late reply. I’ve been working on other stuff meanwhile.

That explains why the INDEX is not being applied. As I need the result of the variable “distance”, having it defined or not (so it could be weighted), what would be the best approach? Another subselect using RADIANS in WHERE clause?

I feel like I am going a lot around of the main problem.

Thanks @keshav_m! I guess that once I have the query being able to read the INDEXes by being used in WHERE clause, I would be able to apply those INDEXes.

Thanks again!


#8

You can create an index on the distance. You would need to put the whole formula in the index definition.

Just remember that the WHERE clause in your query must use the expressions in your index.