FTS search on a field that may contain diacritic symbols

Hello,
I am using FTS on a field that holds a person's name. The name can contain diacritics (é, è, ë, ñ, ø, ç, …). Which analyser should I use for that field so that, if Couchbase contains the data below:
“Hervé Villechaizé”
“Hérve Villechaize”
“Herve Villechaize”
“Hërve Villechaizè”

and a user searches with plain English characters, “Herve Villechaize”, the system returns all 4 documents?

Does such an analyser exist? If not, what customizations should I make to the existing analysers?
Thanks in advance,
Natalia

Hey @bertynat ,

Please find similar responses here,

Let me know if this doesn’t answer your query.

Cheers,

Thanks for pointing me to that. I implemented it, and it works when I search with English letters: the results include the names with diacritics as well. But when I search with a diacritic character, “hervé”, the search returns nothing.
I saw a reply about _all, but I did not understand how to implement it. Can you please advise?

Here is my index definition:

And here is my query:
image

Hey @bertynat ,

You are trying a “term” query there for the diacritic term.
Term queries are non-analytic. ref - Non-Analytic Queries | Couchbase Docs
So, it won’t apply query time text analysis with the custom analyser.

If you make this a match query it would work.

{"field": "displayNameMatch", "match": "hervé"}
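
To see why the term query misses while the match query hits, here is a minimal Python sketch. It is not Couchbase's implementation: the `analyze` function is a hypothetical stand-in for the custom analyser (lowercasing plus stripping combining accent marks), and `term_query` / `match_query` are illustrative names. The only point it makes is that a term query looks up the raw text as-is, while a match query runs the query text through the same analysis as the indexed field first.

```python
import unicodedata

def analyze(text):
    """Toy stand-in for the custom analyser (an assumption, not Couchbase code):
    lowercase, strip combining accent marks, split on whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.split()

names = [
    "Hervé Villechaizé",
    "Hérve Villechaize",
    "Herve Villechaize",
    "Hërve Villechaizè",
]

# Index time: every name is analysed, so only folded tokens are stored.
index = {}
for doc_id, name in enumerate(names):
    for token in analyze(name):
        index.setdefault(token, set()).add(doc_id)

def term_query(term):
    # Non-analytic: the term is looked up exactly as typed.
    return index.get(term, set())

def match_query(text):
    # Analytic: the query text is analysed before the lookup.
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

term_hits = term_query("hervé")    # raw "hervé" is not a stored token
match_hits = match_query("hervé")  # analysed to "herve", which is stored
print(term_hits, match_hits)
```

In this toy index the only stored tokens are the folded forms, so the accented term has nothing to match until it is folded the same way.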


Indeed, the diacritic search now returns results. But now I have another issue, with spaces: if I search for “Herve Villechaize” and the database also has other persons, “Neil Shervell” and “Emma Haize”, they are returned by this query as well.
image

How should I construct the query to get all variations of “Herve Villechaize” (with English letters and with diacritics) and no other names?
Thanks

@bertynat Here’s what you do in your situation …

{
  "query": {
    "conjuncts": [
      {
        "field": "type",
        "match": "person"
      },
      {
        "field": "displayNameMatch",
        "match": "herve Villechaize",
        "operator": "and"
      }
    ]
  }
}

The default operator for a match query is “or”, meaning the tokens generated by analysing the match text are OR-ed, so a document that contains any one of them is returned. Setting the operator to “and” forces the query to require all of the tokens in that field.
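
The difference between the two operators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `analyze` is a hypothetical stand-in for the index's analyser (lowercase, strip accent marks, split on whitespace), the three documents are made up for the example, and `match` is not a Couchbase API.

```python
import unicodedata

def analyze(text):
    """Toy analyser (an assumption): lowercase, strip accent marks, split on spaces."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.split()

# Hypothetical documents, keyed by document id.
docs = {
    1: "Hervé Villechaizé",
    2: "Herve Smith",
    3: "Anna Villechaize",
}

def match(text, operator="or"):
    """Return ids of documents whose tokens satisfy the match text."""
    query_tokens = set(analyze(text))
    hits = []
    for doc_id, name in docs.items():
        doc_tokens = set(analyze(name))
        if operator == "and":
            ok = query_tokens <= doc_tokens      # every query token must be present
        else:
            ok = bool(query_tokens & doc_tokens)  # any one query token is enough
        if ok:
            hits.append(doc_id)
    return hits

or_hits = match("herve Villechaize")                   # default "or"
and_hits = match("herve Villechaize", operator="and")  # all tokens required
print(or_hits, and_hits)
```

With “or”, a document sharing just one token with the query is returned; with “and”, only the document containing both tokens remains.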