Searching on value containing a - (hyphen)

Hi,

I’m having trouble with what seems like a simple FTS query.

Given the data:

{ "id": "sim_current-anthropology_1989-02_30_1" }
{ "id": "sim_current-anthropology_1989-03_30_1" }
{ "id": "sim_current-anthropology_1989-04_30_1" }
etc…

This FTS works (many results):

{"size": 10, "explain": false, "fields": ["id"], "query": {"wildcard": "sim_current*"}}

This does not (0 results):

{"size": 10, "explain": false, "fields": ["id"], "query": {"wildcard": "sim_current-*"}}

The difference is a “-” at the end of “current”.

I tried with “prefix” instead of “wildcard”.

I tried escaping the hyphen with a backslash but it errors with “invalid character ‘-’ in string escape code”.

Tried regexp:

{"size": 10, "explain": false, "fields": ["id"], "query": {"regexp": "sim_current[-].+"}}

Tried a non-FTS query with LIKE "sim_current-anthropology%", but it is slow even when indexed (16M records, of which 1.5M begin with "sim_").

Is there something I am missing? Thanks.

If you are looking for a N1QL solution: the underscore is a wildcard in LIKE, so you need to escape it as shown below.
https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/comparisonops.html

create index ix1 ON default(id);
SELECT d.*
FROM default AS d
WHERE d.id LIKE "sim\\_current-%"

Check the EXPLAIN output and confirm that the spans pass enough information to the indexer for it to return fewer items.

                {
                    "exact": true,
                    "range": [
                        {
                            "high": "\"sim_current.\"",
                            "inclusion": 1,
                            "low": "\"sim_current-\""
                        }
                    ]
                }
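
To make the escaping and the span above a bit more concrete, here is a small Python sketch. It is not how the query engine is implemented; like_to_regex and prefix_range are hypothetical helpers written only for illustration. The first shows that an unescaped underscore in LIKE matches any single character, and the second shows why the prefix "sim_current-" turns into a range whose high bound is "sim_current." (the last character bumped by one code point, with the high bound assumed to be exclusive).

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL-style LIKE pattern into a compiled regex (illustration only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character is taken literally, e.g. \_ means a real underscore.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # % matches any run of characters
        elif ch == "_":
            out.append(".")    # _ matches exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def prefix_range(prefix):
    """Return (low, high) bounds covering every string that starts with prefix."""
    # The high bound is the prefix with its last character incremented by one.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


low, high = prefix_range("sim_current-")
print(low, high)

doc_id = "sim_current-anthropology_1989-02_30_1"
# Unescaped "_" also matches other characters in that position.
print(bool(like_to_regex("sim_current-%").fullmatch("simXcurrent-a")))
# Escaped "_" only matches a literal underscore.
print(bool(like_to_regex(r"sim\_current-%").fullmatch("simXcurrent-a")))
print(bool(like_to_regex(r"sim\_current-%").fullmatch(doc_id)))
```

In the N1QL statement the pattern is written with a doubled backslash because the string literal itself consumes one level of escaping; the raw Python strings here hold the pattern after that step.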

Thank you. I’m using the API, not N1QL. I tried escaping the underscore and/or the dash with a double backslash, with no change.

Here is the full command:

curl -u priv:priv -X POST -H "Content-Type: application/json" http://localhost:8094/api/index/index_2/query -d '{"size": 10, "explain": true, "fields": ["id"], "query":{"wildcard": "sim_current-*"}}'

The output:

{"status":{"total":6,"failed":0,"successful":6},"request":{"query":{"wildcard":"sim_american-*"},"size":10,"from":0,"highlight":null,"fields":["id"],"facets":null,"explain":true,"sort":["-_score"],"includeLocations":false,"search_after":null,"search_before":null},"hits":[],"total_hits":0,"max_score":0,"took":710955,"facets":null}

As noted earlier, changing to this:

{"wildcard": "sim_current*"}

…it returns the correct results. But this does not work:

{"wildcard": "sim_current-*"}

I need to include the hyphen to narrow the results, as there are too many without it.

@stb3 ,

The problem here mostly stems from the analyzer used for the field.
The default standard analyzer splits the text at the hyphen, so the hyphen is never part of an indexed term and a pattern containing it cannot match.
You can explore that here: Bleve Text Analysis Wizard
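
As a rough illustration of the analyzer effect, the Python sketch below imitates the two behaviours. It is only an approximation and not Bleve's actual tokenizer: standard_like_terms, keyword_terms and wildcard_hits are hypothetical helpers, and the regex split is an assumption about where the standard analyzer breaks this particular id.

```python
import re
from fnmatch import fnmatchcase


def standard_like_terms(text):
    # Approximation: letters, digits and "_" stay together; anything else,
    # including "-", separates terms. Terms are lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


def keyword_terms(text):
    # A keyword-style analyzer keeps the whole value as a single term.
    return [text]


def wildcard_hits(pattern, terms):
    # A wildcard query is matched against individual indexed terms.
    return [t for t in terms if fnmatchcase(t, pattern)]


doc_id = "sim_current-anthropology_1989-02_30_1"
std = standard_like_terms(doc_id)
kw = keyword_terms(doc_id)

print(std)
print(wildcard_hits("sim_current*", std))   # matches the term before the hyphen
print(wildcard_hits("sim_current-*", std))  # no term contains a hyphen
print(wildcard_hits("sim_current-*", kw))   # the single keyword term matches
```

Under these assumptions "sim_current*" finds a term while "sim_current-*" finds none, which lines up with the behaviour reported in the question.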

Also, in the above query, since you haven’t specified a target field, the search is applied to the default _all field.
So you could either fix the default analyzer in the index definition, or
change the analyzer for the field of interest and use that field in the query as the target field.

The keyword analyzer would be one choice, but the final analyzer depends on what exactly your search requirements are.

https://docs.couchbase.com/server/current/fts/fts-using-analyzers.html
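
For the second option, a minimal sketch of a request body that names the target field could look like the Python below. wildcard_query is a hypothetical helper; the "field" key inside the query object is how a target field is specified, and the sketch assumes the id field has been indexed with an analyzer that keeps the hyphen (for example keyword).

```python
import json


def wildcard_query(pattern, field=None, size=10, return_fields=("id",)):
    """Build an FTS request body for a wildcard query (illustration only)."""
    query = {"wildcard": pattern}
    if field is not None:
        # Without "field" the query runs against the default _all field.
        query["field"] = field
    return {
        "size": size,
        "explain": False,
        "fields": list(return_fields),  # fields to return, not fields to search
        "query": query,
    }


body = json.dumps(wildcard_query("sim_current-*", field="id"))
print(body)
```

The printed JSON can be passed as the -d argument of the curl command shown earlier in the thread.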

Cheers

Hello Sreeks - yes, that worked! I had to change the default analyzer, since the index had already been built with the ‘standard’ analyzer. Thank you! I will need to learn more about analyzers.