Handling special/latin characters like ö in searches


#1

Example documents
1 - caption: Mövenpick Airport, Nuremburg
2 - caption: Movenpick Ms Hamees, Qena

My index looks like this

{
  "name": "giata_properties_custom",
  "type": "fulltext-index",
  "params": {
    "doc_config": {
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "index_dynamic": true,
      "store_dynamic": false,
      "types": {
        "properties.CustomPropertyDetailsType": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "caption": {
              "enabled": true,
              "dynamic": false,
              "fields": [
                {
                  "analyzer": "",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "caption",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "giataId": {
              "enabled": true,
              "dynamic": false,
              "fields": [
                {
                  "analyzer": "",
                  "include_in_all": false,
                  "include_term_vectors": false,
                  "index": false,
                  "name": "giataId",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "name": {
              "enabled": true,
              "dynamic": false,
              "fields": [
                {
                  "analyzer": "",
                  "include_in_all": false,
                  "include_term_vectors": false,
                  "index": false,
                  "name": "name",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "kvStoreName": "mossStore"
    }
  },
  "sourceType": "couchbase",
  "sourceName": "giata",
  "sourceUUID": "4c0347224f55790cd9500ca2a6165102",
  "sourceParams": {},
  "planParams": {
    "maxPartitionsPerPIndex": 171,
    "numReplicas": 0
  },
  "uuid": "53392badf825285c"
}

My goal is to return both documents whether someone searches for Mövenpick or Movenpick. I tried my luck with analyzers but couldn't get it to work. I'd appreciate any input, as I'm not an expert on FTS and bleve.

Thanks,


#2

In this case if you choose the “de” analyzer, both “Mövenpick” and “Movenpick” should be indexed as the same term “movenpick”, and thus searches for either term will match both.

Also, be sure to set this as the default analyzer for the whole index if you want these searches to match when a user doesn’t explicitly specify the field.
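
To illustrate why indexing both spellings as the same term makes either query match both documents, here is a toy sketch in plain Python. It is not Couchbase, bleve, or the "de" analyzer itself: the `fold`, `tokenize`, `build_index`, and `search` helpers are hypothetical names, and the folding step (lowercasing plus stripping combining marks) is only an assumed stand-in for whatever normalization a real analyzer applies.

```python
import unicodedata


def fold(text):
    """Lowercase and strip combining marks, so 'ö' becomes 'o'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split on whitespace, trim simple punctuation, and fold each token."""
    tokens = (tok.strip(",.") for tok in text.split())
    return [fold(tok) for tok in tokens if tok]


def build_index(docs):
    """Build a toy inverted index: folded term -> set of document ids."""
    index = {}
    for doc_id, caption in docs.items():
        for term in tokenize(caption):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Fold the query the same way as the indexed text, then look it up."""
    return sorted(index.get(fold(query), set()))


# The two example captions from the first post.
docs = {
    1: "Mövenpick Airport, Nuremburg",
    2: "Movenpick Ms Hamees, Qena",
}
index = build_index(docs)

print(search(index, "Mövenpick"))
print(search(index, "Movenpick"))
```

The point is that the same normalization runs on both the indexed text and the query, which is what setting one analyzer for the field (or as the index default) is meant to achieve.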


#3

@mschoch
I can’t find the de analyzer, see here

I’m using Couchbase 5.0.0 build 3519

Any ideas?


#4

You’re right, I mistakenly thought that the ‘de’ analyzer was included in the last release. It will be available in the next version.

For now, there is no good filter to do what you want. You could create a series of regular-expression-based character filters, each of which matches one character and specifies its replacement. But it will be cumbersome and probably also quite slow.
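
The idea of a series of regular-expression character filters can be sketched as follows. This is plain Python showing the logic only, not an FTS index definition: `CHAR_FILTERS` and `apply_char_filters` are hypothetical names, the list of characters is an assumption covering only a few German letters, and the input is assumed to use precomposed characters (a single code point for "ö").

```python
import re

# One (pattern, replacement) pair per character to fold.
# Each pair plays the role of one regexp-based character filter.
CHAR_FILTERS = [
    (re.compile("[äÄ]"), "a"),
    (re.compile("[öÖ]"), "o"),
    (re.compile("[üÜ]"), "u"),
    (re.compile("ß"), "ss"),
]


def apply_char_filters(text):
    """Run every filter in order over the text before tokenization."""
    for pattern, replacement in CHAR_FILTERS:
        text = pattern.sub(replacement, text)
    return text


print(apply_char_filters("Mövenpick Airport, Nuremburg"))
print(apply_char_filters("Movenpick Ms Hamees, Qena"))
```

Every extra character needs its own entry, and every entry is another pass over the text, which is why this approach gets cumbersome and slow as the list grows.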