Behaviour changed from 6 to 7 with wildcard / regexp

Hi there.

We have a situation with an query. It was supported in 6.5.0, but with the upgrade to 7.1.3 not anymore.

I chose FTS 'couse there is some requirements to got it:

  • Differents combinations of fields in query

  • “like” query

  • language-aware

  • etc…

It’s related to “like” wildcard / regexp query type in both sides.

6.5.0

Full FTS index:

{
 "name": "xxxx-schools-fts",
 "type": "fulltext-index",
 "params": {
  "doc_config": {
   "docid_prefix_delim": "",
   "docid_regexp": "",
   "mode": "type_field",
   "type_field": "_class"
  },
  "mapping": {
   "analysis": {
    "analyzers": {
     "xxxx-es": {
      "token_filters": [
       "to_lower",
       "stop_es",
       "stop_ca",
       "stop_eu",
       "elision_ca"
      ],
      "tokenizer": "unicode",
      "type": "custom"
     }
    }
   },
   "default_analyzer": "standard",
   "default_datetime_parser": "dateTimeOptional",
   "default_field": "_all",
   "default_mapping": {
    "dynamic": true,
    "enabled": false
   },
   "default_type": "_default",
   "docvalues_dynamic": true,
   "index_dynamic": true,
   "store_dynamic": false,
   "type_field": "_type",
   "types": {
    "xx.xxxx.xxxx.schools.model.SchoolRefDocument": {
     "default_analyzer": "standard",
     "dynamic": false,
     "enabled": true,
     "properties": {
      "school_collective": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "standard",
         "index": true,
         "name": "school_collective",
         "type": "text"
        }
       ]
      },
      "school_name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "es",
         "index": true,
         "name": "school_name",
         "type": "text"
        }
       ]
      }
      }
     }
    }
   }
  },
  "store": {
   "indexType": "scorch"
  }
 },
 "sourceType": "couchbase",
 "sourceName": "search",
 "sourceUUID": "1c07e0053b581874d2efcd574867b2d1",
 "sourceParams": {},
 "planParams": {
  "maxPartitionsPerPIndex": 171,
  "indexPartitions": 6,
  "numReplicas": 0
 },
 "uuid": "1e21fec76afb8411"
}

Wildcard fields:

(...)
      "school_collective": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "standard",
         "index": true,
         "name": "school_collective",
         "type": "text"
        }
       ]
      },
      "school_name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "es",
         "index": true,
         "name": "school_name",
         "type": "text"
        }
       ]
      }
(...)

Next query works as expected:

select meta(s).id as _ID, s.school_name, s.school_collective
from `search` s
WHERE  s.`_class` = "xxx.xxx.xxxx.schools.model.SchoolRefDocument"
AND SEARCH(s,
   {"query": {
       "must": {
                   "conjuncts": [
                        {"field":"ean", "match": "5createQA"},
                        {"field":"school_name", "wildcard": "*rinida*"},
                        {"field":"school_collective", "wildcard": "*sian*"}
                    ]
       }
    },
    "fields": ["*"]
    }
)
;

7.1.3

Full FTS index:

{
 "name": "fts_schools_name_with_class",
 "type": "fulltext-index",
 "params": {
  "doc_config": {
   "docid_prefix_delim": "",
   "docid_regexp": "",
   "mode": "scope.collection.type_field",
   "type_field": "_class"
  },
  "mapping": {
   "default_analyzer": "standard",
   "default_datetime_parser": "dateTimeOptional",
   "default_field": "_all",
   "default_mapping": {
    "dynamic": true,
    "enabled": false
   },
   "default_type": "_default",
   "docvalues_dynamic": false,
   "index_dynamic": true,
   "store_dynamic": false,
   "type_field": "_type",
   "types": {
    "_default._default": {
     "dynamic": false,
     "enabled": true,
     "properties": {
      "_class": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "keyword",
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "_class",
         "store": true,
         "type": "text"
        }
       ]
      },
      "name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "standard",
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "name",
         "store": true,
         "type": "text"
        }
       ]
      }
     }
    }
   }
  },
  "store": {
   "indexType": "scorch",
   "segmentVersion": 15
  }
 },
 "sourceType": "gocbcore",
 "sourceName": "xxxx-local",
 "sourceUUID": "599da0e6c99ff70e9e698c276d8aac78",
 "sourceParams": {},
 "planParams": {
  "maxPartitionsPerPIndex": 1024,
  "indexPartitions": 1,
  "numReplicas": 0
 },
 "uuid": "298c2439fbc2b857"
}

Look closely to name attribute:

(...)
      "name": {
       "enabled": true,
       "dynamic": false,
       "fields": [
        {
         "analyzer": "standard",
         "docvalues": true,
         "include_in_all": true,
         "include_term_vectors": true,
         "index": true,
         "name": "name",
         "store": true,
         "type": "text"
        }
(...)

Next query does not work properly (name in some docs has “name”: “SAN PEDRO - TRIANA/SEVILLA”, for example):

select s.*
from `xxx-local` s
WHERE  SEARCH(s,
   {"query": {
       "must": {
                   "conjuncts": [
                        {"field":"_class", "match": "xxx.xxx.xxx.schools.model.SchoolDocument"},
                        {"field":"name", "wildcard": "*VILL*"}
                    ]
       }
    },
    "fields": ["*"],
    "score": "none"
    }
)
;

Searching from the begginig of the field does not work neither:

select s.*
from `xxx-local` s
WHERE  SEARCH(s,
   {"query": {
       "must": {
                   "conjuncts": [
                        {"field":"_class", "match": "xxx.xxx.xxx.schools.model.SchoolDocument"},
                        {"field":"name", "wildcard": "SAN*"}
                    ]
       }
    },
    "fields": ["*"],
    "score": "none"
    }
)
;

But searching by match works (but it’s not the requirement)

select s.*
from `xxx-local` s
WHERE  SEARCH(s,
   {"query": {
       "must": {
                   "conjuncts": [
                        {"field":"_class", "match": "xxx.xxx.xxx.schools.model.SchoolDocument"},
                        {"field":"name", "match": "sevilla"}
                    ]
       }
    },
    "fields": ["*"],
    "score": "none"
    }
)
;

It’s something I took from different blog (one of them is this: N1QL & SEARCH: Leverage Full-Text Search (FTS) Index in N1QL) but with new version it does not work like previously

Is there some workaround to support the same requirement without changing (almost-complete) the way to execute / create the query?

Thanks in advance!

Hi @mbracero. Yes this is an expected change in behavior from 6.5.1 to builds that came after. Here’s the ticket that captures some details on this - MB-39592. The workaround for you is to use the keyword analyzer for fields in your search index definition whose access is via a wildcard request.

With 6.6.0 and beyond, for non analytic queries such as the wildcard query you’re using from N1QL we’ve introduced a mandate to only work with the keyword analyzer while accessing FTS indexes, because of some undesirable behavior that pops up otherwise within certain kind of queries (highlighted in the ticket).

The reason the match query works as expected is because it’s an analytic query which will apply the analyzer pulled from the index definition over the search criteria as a precursor to search.

Also, worth noting that we’ve loosened this mandate in an upcoming release as noted here MB-53231 with improved behavior. This release will likely come out in fall this year, so until then your workaround as I earlier pointed out should be to use the keyword analyzer over fields that will be queried with non-analytic queries:

  • TermQuery
  • PhraseQuery
  • MultiPhraseQuery
  • FuzzyQuery
  • PrefixQuery
  • RegexpQuery
  • WildcardQuery

Thanks for your answer.

I changes to keyword analyzer and the query works as expected:

select s.*
from `xxx-local` s
WHERE  SEARCH(s,
   {"query": {
       "must": {
                   "conjuncts": [
                        {"field":"_class", "match": "xxx.xxx.xxx.schools.model.SchoolRefDocument"},
                        {"field":"name", "wildcard": "*TRINIDAD*"}
                    ]
       }
    },
    "fields": ["*"],
    "score": "none",
    "sort": ["-_id"],
    "offset": 0, "limit": 2,
    "explain": false,
    "highlight": {}
    }
)
;

According to MB-53231, and more specifically this sentence:

Additionally, it is perhaps good to drop the constraint that non-analytic queries require the keyword analyzer usage in definition and leave this up to the user to provide context correctly for the validation phase in case of non-covering indexes for queries. So that’s essentially removing the restriction put in place with MB-39592.

When you release it we could use the index / query without changes as previouslly with working properly?? That is, index with field keyword and query with wildcard plus *.

Thanks so much!

Yes, that will continue to work as expected.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.