Full Text search with ASCII Folding Filter


#1

Hi there,

We are implementing a simple full-text search in our application, which is backed by a Couchbase database.

As quite a lot of our users are French, we need ASCII folding to make the index accent-insensitive.

I found this issue pending on the Bleve repository:

Which states that:

[…] though some of the language specific analyzers include a filter which folds specific accent characters likely to appear in a particular language.

Any insight on how to get such a filter working within Couchbase? We have tried nearly all of the French-specific filters, without any success so far…

Thanks,


#2

Hi, one quick thought that comes to mind (I have not tried it myself, though!): one or more regexp character filters might be able to replace accented characters with their simpler ASCII versions. Cheers, Steve
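
To make the idea concrete, here is a minimal sketch in plain Go (standard library only) of what a chain of regexp replacements would do to the text before it is tokenized. It is not Bleve or Couchbase code: the names `foldRule`, `frenchFoldRules` and `foldAccents` are made up for illustration, and the character classes are an assumed, incomplete set for French. In an actual index you would define one regexp character filter per rule in the mapping.

```go
package main

import (
	"fmt"
	"regexp"
)

// foldRule pairs a pattern matching accented letters with the ASCII
// text that should replace every match. (Hypothetical type, for illustration.)
type foldRule struct {
	pattern *regexp.Regexp
	replace string
}

// frenchFoldRules is an assumed, incomplete rule set covering common
// lowercase French accented letters in precomposed (NFC) form.
var frenchFoldRules = []foldRule{
	{regexp.MustCompile(`[àâä]`), "a"},
	{regexp.MustCompile(`[ç]`), "c"},
	{regexp.MustCompile(`[éèêë]`), "e"},
	{regexp.MustCompile(`[îï]`), "i"},
	{regexp.MustCompile(`[ôö]`), "o"},
	{regexp.MustCompile(`[ùûü]`), "u"},
	{regexp.MustCompile(`[ÿ]`), "y"},
}

// foldAccents applies each rule in turn, the way a chain of regexp
// character filters would rewrite the input before tokenization.
func foldAccents(s string) string {
	for _, r := range frenchFoldRules {
		s = r.pattern.ReplaceAllString(s, r.replace)
	}
	return s
}

func main() {
	fmt.Println(foldAccents("élève à l'école française")) // prints "eleve a l'ecole francaise"
}
```

Note that this sketch only handles lowercase, precomposed characters. Uppercase letters, ligatures such as œ and æ, and text using combining accents would each need extra rules.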


#3

Hi Steve,

That’s what we did as a workaround, but it adds some noise to the mapping.

If I read the issue correctly, there’s a good chance that an ASCII folding filter will be merged into Bleve in the future.

Does that mean it will then be available in Couchbase FTS as well, or is there a gap between the latest Bleve and the FTS features?

Thanks


#4

Hi Anton, yes, there’s definitely a lag. Features, improvements, and fixes all go into the bleve open-source library first. Later on, a new release of Couchbase Server comes out that incorporates the latest bleve.