What is the best analyzer config for a UUID "00000055-ed1a-46a9-9ab0-e766cea7e7e3"?

flaviu · March 20, 2023, 1:11am

@abhinav I need your suggestion if you have a few minutes.

I have an FTS index and I need to search for something of this format (UUID): 00000055-ed1a-46a9-9ab0-e766cea7e7e3

The search should be an exact match on the whole value, not on parts of it. What is the best analyzer to set for the index?

Right now it is set to inherit, and the default analyzer is set to “en”. My feeling is that this is incorrect and I should use “keyword”, but at the same time I am not sure that “keyword” is the correct option either.

What is your suggestion?

thejas · March 20, 2023, 7:13am

If the field you’re indexing in the json doc looks like:

{
...
"UUID": "00000055-ed1a-46a9-9ab0-e766cea7e7e3"
...
}

then using the keyword analyzer for the field “UUID” should be good enough. However, if the UUID were part of a sentence,

{
...
"description": "the unique identifier for the product is 00000055-ed1a-46a9-9ab0-e766cea7e7e3 "
...
}

then perhaps a custom analyzer with a whitespace tokenizer (along with whatever token filters you want to use) should do the trick. I’d also suggest experimenting with this tool, https://bleveanalysis.couchbase.com/analysis, on some sample text to get a better idea of which analyzer fits your use case (which looks like searching for the entire UUID value as a single string).
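
To make the difference concrete, here is a rough Python sketch of what each approach does to the text. These functions are simplified stand-ins that I wrote for illustration, not the actual analyzers: keyword_tokens, whitespace_tokens and word_tokens are hypothetical names, and word_tokens only approximates a word-boundary tokenizer (my assumption about how a language analyzer would treat the hyphens; please confirm with the analysis tool).

```python
import re

UUID = "00000055-ed1a-46a9-9ab0-e766cea7e7e3"
SENTENCE = "the unique identifier for the product is " + UUID + " "


def keyword_tokens(text):
    # Keyword-style analysis: the whole field value becomes one token.
    return [text]


def whitespace_tokens(text):
    # Whitespace-style tokenization: split on runs of whitespace only,
    # so the hyphens stay inside the UUID token.
    return text.split()


def word_tokens(text):
    # Rough approximation of a word-boundary tokenizer plus lowercasing:
    # each run of ASCII letters/digits is a token and hyphens are dropped.
    return [t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)]


print(keyword_tokens(UUID))                 # one token: the full UUID
print(UUID in whitespace_tokens(SENTENCE))  # UUID survives as a single token
print(word_tokens(UUID))                    # UUID is split into five pieces
```

With the word-style split, a search for any single piece (for example ed1a) could match, which is the partial matching you want to avoid. Keeping the UUID as one token is what makes an exact-match search behave as expected.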

flaviu · March 20, 2023, 10:46am

Thanks for the answer. The UUID is not part of a sentence, so I will use keyword as the analyzer.
