What would be the best FTS query type for my scenario?

java

#1

Hello,

I’m learning FTS and I have to say it’s an excellent feature. In my bucket I have documents representing destinations/cities like so

{
  "caption": "San Francisco, California, USA",
  "cityId": "1136",
  "cityName": "San Francisco",
  "countryCode": "US",
  "countryName": "USA",
  "destinationId": 404,
  "destinationName": "California",
  "type": "com.proj.model.dest"
}

Let’s also assume I have a destination with cityName “Amman” and caption “Amman, Jordan”.

I have an FTS index created.

Now I’m trying to create a search API, something like the search box on hotels.com and expedia.com where you type the destination you want to go to and you get results, but I’m having a difficult time getting proper results for the user’s input.

I tried a match query like so:

SearchQuery searchQuery = new SearchQuery("idx_cities_custom_store", SearchQuery.match(keyword));
SearchQueryResult citiesResults = bucket.query(searchQuery.fields(cityNameField, captionField).highlight(captionField).limit(citiesLimit));

But this way I’m missing a lot of hits. Assume the keyword is “san fran”: I don’t get the San Francisco document from the JSON above, but I do get other destinations like “San Marino” and “San Benito”. If I input “amm” as the keyword, I don’t get the city Amman.

So I’m here seeking help, can someone enlighten me how to achieve what I’m trying to do? What kind of query would suit this case best? Should I use a conjunction query maybe?

Thanks,


#2

Hi lenix,
Wondering if you can post your index definition up for us to see? Especially, to check out the analyzer you’re using.

One thing about search systems is that, depending on the text analyzer, something like “San Francisco” or “Amman” will get tokenized into something like [“san”, “francisco”] and [“amman”], respectively.

But when you search for “san fran” and “amm”, it would be looking for [“san”, “fran”] and [“amm”], respectively, so the “fran” won’t match, and the “amm” won’t match.
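
To make the tokenization point concrete, here is a minimal plain-Java sketch. It does not call Couchbase at all: `tokenize` is only a rough stand-in for what an analyzer might do (lowercase, then split on non-alphanumeric characters), and the class and method names are made up for illustration. It compares exact term lookups with prefix lookups for the two keywords from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class TokenMatchDemo {
    // Rough stand-in for an analyzer (an assumption, not the real one):
    // lowercase the text, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Query terms that are exactly equal to some indexed term.
    static List<String> exactHits(String indexedText, String query) {
        List<String> terms = tokenize(indexedText);
        return tokenize(query).stream()
                .filter(terms::contains)
                .collect(Collectors.toList());
    }

    // Query terms that are a prefix of some indexed term.
    static List<String> prefixHits(String indexedText, String query) {
        List<String> terms = tokenize(indexedText);
        return tokenize(query).stream()
                .filter(q -> terms.stream().anyMatch(t -> t.startsWith(q)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String sf = "San Francisco, California, USA";
        String amman = "Amman, Jordan";
        System.out.println(tokenize(sf));               // [san, francisco, california, usa]
        System.out.println(exactHits(sf, "san fran"));  // [san]
        System.out.println(exactHits(amman, "amm"));    // []
        System.out.println(prefixHits(sf, "san fran")); // [san, fran]
        System.out.println(prefixHits(amman, "amm"));   // [amm]
    }
}
```

In this toy model only “san” finds an exact term, which is consistent with other “San …” destinations showing up for “san fran” while “amm” finds nothing.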

cheers,
steve


#3

My index definition is as follows:

{
  "type": "fulltext-index",
  "name": "idx_cities_custom_store",
  "uuid": "588a5bd7bd1cb2c6",
  "sourceType": "couchbase",
  "sourceName": "cities_custom",
  "sourceUUID": "671bffc3e73f9dc3f041ed404f7a087b",
  "planParams": {
    "maxPartitionsPerPIndex": 32,
    "numReplicas": 0,
    "hierarchyRules": null,
    "nodePlanParams": null,
    "pindexWeights": null,
    "planFrozen": false
  },
  "params": {
    "doc_config": {
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "display_order": "0",
        "dynamic": false,
        "enabled": true,
        "properties": {
          "caption": {
            "dynamic": false,
            "enabled": true,
            "fields": [
              {
                "analyzer": "",
                "display_order": "1",
                "include_in_all": true,
                "include_term_vectors": true,
                "index": true,
                "name": "caption",
                "store": true,
                "type": "text"
              }
            ]
          },
          "cityName": {
            "dynamic": false,
            "enabled": true,
            "fields": [
              {
                "analyzer": "",
                "display_order": "0",
                "include_in_all": true,
                "include_term_vectors": true,
                "index": true,
                "name": "cityName",
                "store": true,
                "type": "text"
              }
            ]
          }
        }
      },
      "default_type": "_default",
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "type"
    },
    "store": {
      "kvStoreName": "mossStore"
    }
  },
  "sourceParams": {
    "clusterManagerBackoffFactor": 0,
    "clusterManagerSleepInitMS": 0,
    "clusterManagerSleepMaxMS": 2000,
    "dataManagerBackoffFactor": 0,
    "dataManagerSleepInitMS": 0,
    "dataManagerSleepMaxMS": 2000,
    "feedBufferAckThreshold": 0,
    "feedBufferSizeBytes": 0
  }
}

Do I have to modify my index, decide on a different query type, or maybe both?

Thanks a lot for the explanation, Steve :)


#4

Hi, sounds like you probably want to do a Prefix query instead of Match. It is mentioned here, with a small example that might help: https://developer.couchbase.com/documentation/server/current/sdk/full-text-search-overview.html#story-h2-4

I’m not sure whether you need to use a different analyzer; I’ll try to test that today too. But I did use QueryString with a * wildcard and received good results using the same analyzer as you.


#5

But that page does not describe using a wildcard * with the QueryString, only + and -… Also it says in a note

Certain queries that are supported by FTS are not yet supported by the query string syntax. This includes wildcards, regexp, and date range queries.

Also, I’m still unable to get the results I want with any of those query types. As I described above, if I have “san fran” as the keyword, I don’t get “San Francisco” back. I tried creating my own analyzer with a whitespace tokenizer, but that didn’t help either.

If someone can give any sort of help, it would be much appreciated.


#6

Sorry, we have some updated docs coming that should help show these examples better.

There are two approaches to consider. One is using a Wildcard Query in the query string.

“The wildcard query uses a wildcard expression to search within individual terms for matches. Wildcard expressions can be any single character (?) or zero to many characters (*). Wildcard expressions can appear in the middle or end of a term but not at the beginning of the search term.”

You should be able to search: “san fran*” or “amm*”. I think you’ve tried it, did it work for you?
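
For reference, the wildcard semantics quoted above can be sketched in plain Java, with no Couchbase calls. This assumes the expression is compared against one indexed term at a time (as the quote says, it searches “within individual terms”) and that comparison is case-sensitive against lowercased terms; both are assumptions about the index, and `WildcardDemo` is a made-up name.

```java
import java.util.regex.Pattern;

class WildcardDemo {
    // Translate a wildcard expression into a regex:
    // '?' matches exactly one character, '*' matches zero or more,
    // and every other character is treated literally.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole term matches the wildcard expression.
    static boolean matchesTerm(String wildcard, String term) {
        return toPattern(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesTerm("fran*", "francisco"));     // true
        System.out.println(matchesTerm("amm*", "amman"));          // true
        System.out.println(matchesTerm("am?an", "amman"));         // true
        System.out.println(matchesTerm("san fran*", "san"));       // false
        System.out.println(matchesTerm("san fran*", "francisco")); // false
    }
}
```

Under those assumptions, a single expression containing a space, such as “san fran*”, cannot match any one term produced from “San Francisco”, so it may be worth trying one wildcard per word.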

Or you can do a Prefix query, but you will have to generate the JSON definition for the query yourself and pass it to the service. (You cannot use the simple query string method, via the search box in the web console, for this special type of query.) For example, I indexed the “country” field in the travel-sample database, then queried it for a specific prefix:

curl -u admin:admin123 -X POST -H "Content-Type: application/json" \
-d '{"size":10,"query": {"prefix": "united"}}' \
 http://localhost:8094/api/index/airlinecountry/query  

So you could also put them into a conjunction query that searches for multiple prefixes, etc.:

curl -u admin:admin123 -X POST -H "Content-Type: application/json" \
-d '{"size":10,"query": {"conjuncts": [{"prefix": "uni"}, {"prefix": "canad"}]}}' \
 http://localhost:8094/api/index/airlinecountry/query
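
For the search-box case, the same request body could be generated from whatever the user has typed so far. The following is a minimal plain-Java sketch, not SDK code: `PrefixQueryBuilder` and its methods are hypothetical names, the input is lowercased on the assumption that the analyzer lowercases indexed terms, and the JSON escaping is deliberately simplified (quotes and backslashes only).

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

class PrefixQueryBuilder {
    // Simplified JSON string escaping: handles only backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Build a conjuncts query with one prefix clause per whitespace-separated word.
    static String build(String userInput, int size) {
        String clauses = Arrays.stream(userInput.trim().toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(word -> !word.isEmpty())
                .map(word -> "{\"prefix\": \"" + escape(word) + "\"}")
                .collect(Collectors.joining(", "));
        return "{\"size\": " + size + ", \"query\": {\"conjuncts\": [" + clauses + "]}}";
    }

    public static void main(String[] args) {
        // prints {"size": 10, "query": {"conjuncts": [{"prefix": "san"}, {"prefix": "fran"}]}}
        System.out.println(build("San Fran", 10));
    }
}
```

The resulting string can be sent as the `-d` body of the curl calls above. Blank input produces an empty `conjuncts` array, so a caller would probably want to skip the request in that case. I believe the Java SDK also offers `SearchQuery.prefix(...)` and `SearchQuery.conjuncts(...)` for building the same query, but check that against your SDK version.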

If you need a little more help, please holler.


#7

Regex and Wildcard queries don’t work for me whatsoever (I get zero hits/results). Keep in mind I’m using the latest SDK and I’ve tried on both the latest Stable CB build and the 5.0 Beta. Not sure what I’m doing wrong.

See following screenshots: