Index with number functions help

Hello,

Objects in the bucket example

{
“coordinates”: {
“latitude”: “34.006780999999997”,
“longitude”: “-118.491449”
},
type: “HotelbedsHotel”
…}

My index

CREATE INDEX hotelbeds_hotels_coordinates
ON hotelbeds_hotels(coordinates, TO_NUMBER(coordinates.latitude), TO_NUMBER(coordinates.longitude))
WHERE (type = “HotelbedsHotel”)

My query

select meta().id
from hotelbeds_hotels
where
TO_NUMBER(coordinates.longitude) > -118.6681759 and TO_NUMBER(coordinates.longitude) < -118.1552891
and TO_NUMBER(coordinates.latitude) > 33.7036519 and TO_NUMBER(coordinates.latitude) < 34.3373061
and type=‘HotelbedsHotel’

Query explain

explain

{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexScan3”,
“covers”: [
“cover ((hotelbeds_hotels.coordinates))”,
“cover (to_number((hotelbeds_hotels.coordinates.latitude)))”,
“cover (to_number((hotelbeds_hotels.coordinates.longitude)))”,
“cover ((meta(hotelbeds_hotels).id))”
],
“filter_covers”: {
“cover ((hotelbeds_hotels.type))”: “HotelbedsHotel”
},
“index”: “hotelbeds_hotels_coordinates”,
“index_id”: “578587a235e31a70”,
“index_projection”: {
“entry_keys”: [
0
],
“primary_key”: true
},
“keyspace”: “hotelbeds_hotels”,
“namespace”: “default”,
“spans”: [
{
“range”: [
{
“inclusion”: 0,
“low”: “null”
}
]
}
],
“using”: “gsi”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Filter”,
“condition”: “((((((-118.6681759) < to_number((cover ((hotelbeds_hotels.coordinates)).longitude))) and (to_number((cover ((hotelbeds_hotels.coordinates)).longitude)) < (-118.1552891))) and (33.7036519 < to_number((cover ((hotelbeds_hotels.coordinates)).latitude)))) and (to_number((cover ((hotelbeds_hotels.coordinates)).latitude)) < 34.3373061)) and (cover ((hotelbeds_hotels.type)) = "HotelbedsHotel"))”
},
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “cover ((meta(hotelbeds_hotels).id))”
}
]
},
{
#operator”: “FinalProject”
}
]
}
}
]
},
“text”: “select meta().id\nfrom hotelbeds_hotels \nwhere\nTO_NUMBER(coordinates.longitude) > -118.6681759 and TO_NUMBER(coordinates.longitude) < -118.1552891\nand TO_NUMBER(coordinates.latitude) > 33.7036519 and TO_NUMBER(coordinates.latitude) < 34.3373061\nand type=‘HotelbedsHotel’”
}

Now as you see the index works fine but I thought by creating the index with TO_NUMBER, i’ll skip the overhead needed when I do the actual query which involves TO_NUMBER… but it seems that making the index with or without TO_NUMBER yields the same performance.

I tried the same scenario on a data where the numbers are stored numbers (not literals) and the performance was much quicker, we’re talking ~ 8ms to 1.5secs

Any tips how can I enhance my query performance in my scenario?

Thanks a lot

Check this out https://dzone.com/articles/speed-up-spatial-search-in-couchbase-n1ql
cc @keshav_m

Will check it out further soon but if someone can answer my question in specific I would appreciate that because I’m in a situation where I can’t change the data at the moment and the coordinates are saved as strings.

There may be an issue here. If you see the explain, the predicates are not getting correctly pushed down to the index scan and the fact that it’s choosing the index without ANY predicate on the leading key is strange. It’s possible there is some inference because all three keys use the same coordinates object.

Change the index definition to this.
CREATE INDEX hotelbeds_hotels_coordinates
ON hotelbeds_hotels(TO_NUMBER(coordinates.latitude), TO_NUMBER(coordinates.longitude))
WHERE (type = “HotelbedsHotel”);

You should see much better performance. If you do need the coordinates, put it at the end of the index definition.

CREATE INDEX hotelbeds_hotels_coordinates
ON hotelbeds_hotels(TO_NUMBER(coordinates.latitude), TO_NUMBER(coordinates.longitude), coordinates)
WHERE (type = “HotelbedsHotel”);

Your plan should look something like this.

      "spans": [
      {
        "exact": true,
        "range": [
          {
            "high": "34.3373061",
            "inclusion": 0,
            "low": "33.7036519"
          },
          {
            "high": "(-118.1552891)",
            "inclusion": 0,
            "low": "(-118.6681759)"
          }
        ]
      }
    ],
    "using": "gsi"
  },
  {
1 Like

@keshav_m

Bear with me and keep in mind that I’ve been using couchbase on daily basis since a year so this is not my first index.

I tried your suggestion to have the index only include coordinates.latitude and coordinates.longitude but the query I posted above keeps using the primary index and never uses the explicit index even if I try to use index but explain still says primary as the index is not eligible.

I tried putting coordinates as the last item in the index as you suggested (even though I don’t need it) and it still didn’t make any difference.

The only index that semi-worked is the index that I posted above where coordinates is the first item in the index (coordinates, TO_NUMBER(coordinates.latitude), TO_NUMBER(coordinates.longitude)) and without having coordinates as the first element the index will never work.

I’m pretty sure my problem is with TO_NUMBER because I have another bucket with the same data structure but the coordinates are stored as numbers (not literals) and I created an index and it works fine and the explain have a range like you suggested.

Either I’m using TO_NUMBER incorrectly or it’s a bug.

edit: I tried all of this on two couchbase versions just as a sanity check (5.0.0 build 3519 (stable) and 5.5.0 build 2473 (beta))

edit2: Here is a paste of a whole document just to make sure

json
{
  "country": {
    "description": {
      "content": "Spain"
    },
    "code": "ES",
    "isoCode": "ES"
  },
  "code": 10,
  "address": {
    "content": "Los Pechos,5  "
  },
  "coordinates": {
    "latitude": "27.7861323",
    "longitude": "-15.703138"
  },
  "type": "HotelbedsHotel",
  "name": {
    "content": "Natural Park"
  }
}

What do you think?

The following Index and query combinations works in 5.0.0-3519 (it should work on all releases).
Make sure your index built successfully and if needed provide USE INDEX hint (drop old index) (multiple indexes qualifies it can use any index). If still having issue provide screen host of the index definition and exact query.

CREATE INDEX hotelbeds_hotels_coordinates ON hotelbeds_hotels(TO_NUMBER(coordinates.latitude), TO_NUMBER(coordinates.longitude))
WHERE (type = "HotelbedsHotel");

SELECT META().id
FROM hotelbeds_hotels
WHERE TO_NUMBER(coordinates.longitude) > -118.6681759 AND TO_NUMBER(coordinates.longitude) < -118.1552891
      AND TO_NUMBER(coordinates.latitude) > 33.7036519 AND TO_NUMBER(coordinates.latitude) < 34.3373061
      AND type="HotelbedsHotel";

      "~children": [
            {
                "#operator": "IndexScan2",
                "covers": [
                    "cover (to_number(((`hotelbeds_hotels`.`coordinates`).`latitude`)))",
                    "cover (to_number(((`hotelbeds_hotels`.`coordinates`).`longitude`)))",
                    "cover ((meta(`hotelbeds_hotels`).`id`))"
                ],
                "filter_covers": {
                    "cover ((`hotelbeds_hotels`.`type`))": "HotelbedsHotel"
                },
                "index": "hotelbeds_hotels_coordinates",
                "index_id": "fad6be9953c8ddd3",
                "keyspace": "hotelbeds_hotels",
                "namespace": "default",
                "spans": [
                    {
                        "exact": true,
                        "range": [
                            {
                                "high": "34.3373061",
                                "inclusion": 0,
                                "low": "33.7036519"
                            },
                            {
                                "high": "(-118.1552891)",
                                "inclusion": 0,
                                "low": "(-118.6681759)"
                            }
                        ]
                    }
                ],
                "using": "gsi"
            },

Consider this as solved… my problem is that I was escaping the fields while creating the index

That’s the wrong thing that @keshav_m pointed out …

Thanks a lot for your help