FTS performance with async query

Hi,

I’m working on a small project that needs to run FTS queries asynchronously, since I need some parallelism.
I ran my app with a maximum parallelism of 100 (that is, at most 100 async FTS queries in flight at once), and the bucket statistics showed:

  1. 1.6GB Fts RAM used
  2. 36% Max CPU Utilization
  3. 33.6GB free RAM of 58.5GB
  4. Average 8 fts queries/sec
  5. 169GB fts disk size

Questions:

  1. I was hoping to get 2,000 FTS queries/sec. Is that possible, and how can I achieve it?
  2. During the run, I see RequestCancelledException and TimeoutException several times, mostly when a search query has been running for 75 seconds. What could cause them, and what can be done to avoid these exceptions?
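
As a rough sanity check on the 2,000 queries/sec target, Little's law (concurrency = throughput × latency) relates the figures above. The sketch below assumes all 100 slots stay busy in steady state, which the numbers in this post do not confirm; the class and method names are hypothetical.

```java
// Back-of-the-envelope check with Little's law: concurrency = throughput * latency.
// Assumption (not confirmed by the post): the 100 parallel slots are always busy.
final class FtsThroughputEstimate {

    // Mean latency in seconds implied by a given concurrency and throughput.
    static double impliedLatencySeconds(int concurrency, double queriesPerSecond) {
        return concurrency / queriesPerSecond;
    }

    // Concurrency needed to sustain a target throughput at a given mean latency.
    static double requiredConcurrency(double queriesPerSecond, double latencySeconds) {
        return queriesPerSecond * latencySeconds;
    }

    public static void main(String[] args) {
        // Observed: 100 parallel queries, about 8 queries/sec.
        double latencyNow = impliedLatencySeconds(100, 8.0);
        // Target: 2,000 queries/sec with the same 100 parallel queries.
        double latencyNeeded = impliedLatencySeconds(100, 2000.0);
        // Alternative: keep today's latency and raise concurrency instead.
        double concurrencyNeeded = requiredConcurrency(2000.0, latencyNow);

        System.out.println("implied mean latency now (s): " + latencyNow);
        System.out.println("mean latency needed at 100 parallel (s): " + latencyNeeded);
        System.out.println("parallel queries needed at today's latency: " + concurrencyNeeded);
    }
}
```

Under that assumption, each query currently averages about 12.5 seconds, and would need to average about 50 ms to reach the target at the same parallelism. That suggests the per-query cost, rather than the client-side parallelism, is the first thing to look at.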

Thank you.

Below is the FTS index definition:

{
  "type": "fulltext-index",
  "name": "customer",
  "uuid": "2a4d9a471e091cb3",
  "sourceType": "couchbase",
  "sourceName": "CDG",
  "sourceUUID": "39e9be42e56d114c2fcb30801325f17f",
  "planParams": {
    "maxPartitionsPerPIndex": 171
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "type_"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "CUSTOMER": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "contact": {
              "dynamic": false,
              "enabled": true,
              "properties": {
                "value": {
                  "dynamic": false,
                  "enabled": true,
                  "fields": [
                    {
                      "include_in_all": true,
                      "include_term_vectors": true,
                      "index": true,
                      "name": "value",
                      "store": true,
                      "type": "text"
                    }
                  ]
                }
              }
            },
            "dob": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "dob",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "idnumber": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "idnumber",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "surname": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "surname",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "kvStoreName": ""
    }
  },
  "sourceParams": {}
}

Data count:

  1. Doc count in bucket: 103,374,998
  2. Count of indexed docs (type_ = CUSTOMER): 20,566,526

Source code for searching:

		MatchQuery nameFuzzy = SearchQuery.match(searchKeywords).fuzziness(1);
		MatchQuery nameSimple = SearchQuery.match(searchKeywords);
		DisjunctionQuery ftsQueryName = SearchQuery.disjuncts(nameFuzzy, nameSimple);

		AsyncBucket async = getAsyncDataBucket(message.getBucket());
		Observable<SearchQueryRow> res = async.query(new SearchQuery("customer", ftsQueryName)
				.serverSideTimeout(3, TimeUnit.MINUTES)
				.fields("dob", "idnumber", "surname", "contact.value").limit(100))
				.flatMap(AsyncSearchQueryResult::hits);

The search keywords consist of at least 4 words.
Example of searchKeywords: "WONG MIU HIE 3202025505550101 +628123456789 19550515 "

Hi @hadi,

Two potential issues stand out at first glance:

  1. Incorrect sizing.
  2. Inefficient querying.

Did you override the FTS RAM quota?
Cluster sizing seems to be playing a role here.

Querying

A match query with a fuzziness of 1 already handles exact matches as well as misspelled terms in the query. So your additional disjunct clause for nameSimple looks redundant. (Or it is not yet clear why you need it.)
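
To illustrate why the extra exact-match clause adds no new hits: fuzziness is measured as an edit distance between terms, and an exact match is simply distance 0. The following is a minimal, self-contained sketch of that idea using plain Levenshtein distance. It is not the FTS implementation, and the class and method names are hypothetical.

```java
// Sketch only: shows that "edit distance <= 1" already includes exact matches.
final class FuzzyMatchSketch {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int substitution = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // A term matches when it is within the allowed number of edits.
    static boolean matches(String queryTerm, String indexedTerm, int fuzziness) {
        return editDistance(queryTerm, indexedTerm) <= fuzziness;
    }

    public static void main(String[] args) {
        System.out.println(matches("wong", "wong", 1));      // exact term, distance 0
        System.out.println(matches("wong", "wang", 1));      // one substitution
        System.out.println(matches("kitten", "sitting", 1)); // three edits
    }
}
```

This only speaks to which terms match. A second clause can still change how hits are scored, which may be the reason the exact-match clause was added in the first place.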

Since all your fields have _all option enabled, you could just query against the default field by omitting the field name in the query too.
More info on that part is here - Full-Text Search - 5 Tips To Improve Your Query Performance

If you are a licensed user, I highly recommend creating a support ticket for further help with the query rewrites and the sizing of the cluster.

Cheers!

Hi @sreeks,
Thank you for the reply.

> Did you override the FTS RAM quota?

I’m not sure what you mean by FTS RAM quota. Is it the memory quota set for the search service, shown below?
[image: memory quota setting for the search service]


> But you may need to configure or play around with more CPU cores/FTS RAM quota and index partitions to scale the performance of your cluster.
> It may involve MDS enablement for FTS service too.

Thanks, I’ll look into it. Do you have any links to guidelines on this?


> So your additional disjunct clause for nameSimple looks redundant. (Or it is not yet clear why you need it.)

The idea of using a disjunct clause is taken from this blog.
I was hoping it would reduce false positives during fuzzy lookup.
But thanks for your insight; I’ll check whether I can use the fuzzy lookup alone, without the disjunct.


> Since all your fields have _all option enabled, you could just query against the default field by omitting the field name in the query too.

Actually, the fields are defined there so that they can be returned in the search result.


> If you are a licensed user, I highly recommend creating a support ticket for further help with the query rewrites and the sizing of the cluster.

Unfortunately, I’m not a licensed user :grinning:

Oh, some other information:

  1. I have 2 dedicated query nodes, 2 data nodes, 2 index nodes, and 1 search node
  2. The search node has 30 CPU cores and 32 GB of RAM

One more question: besides the search node, which nodes and which statistics should I watch closely when experimenting with FTS queries?

Thank you.

Hey @hadi,

=> Yes, it looks like you have a 31.7GB FTS quota set. Try to increase the RAM available on the node and then, based on the actual size of the index, increase the RAM quota as well.

=> Fuzzy queries themselves are pretty heavy in terms of memory and CPU requirements, so reducing the number of fuzzy subclauses helps. Also consider whether relevance ranking is as important for your application as it is in that blog. If not, reducing the number of clauses helps performance.

=> If text relevancy isn’t affecting your results, you could use the “score”:“none” option to skip the default scoring and speed up queries. Details are in the blog link shared earlier.

=> The FTS service runs entirely on its own nodes, including its own indexes. So for debugging query performance, one usually needs to focus mainly on the FTS nodes.
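
For reference, here is a rough sketch of what a search request body with a single fuzzy match clause and scoring disabled might look like, built with plain Java strings. It assumes a server version that supports the “score”:“none” option; the class and method names are hypothetical, and the field names are taken from the query earlier in this thread.

```java
// Sketch only: builds the JSON body of a search request with scoring disabled.
// Assumption: the server version in use accepts "score": "none".
final class ScoreNoneRequest {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One fuzzy match clause, a result limit, the stored fields to return,
    // and "score": "none" to skip relevance scoring.
    static String buildBody(String keywords, int fuzziness, int size, String... fields) {
        StringBuilder fieldList = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                fieldList.append(',');
            }
            fieldList.append(quote(fields[i]));
        }
        return "{\"query\":{\"match\":" + quote(keywords)
                + ",\"fuzziness\":" + fuzziness + "}"
                + ",\"size\":" + size
                + ",\"fields\":[" + fieldList + "]"
                + ",\"score\":\"none\"}";
    }

    public static void main(String[] args) {
        System.out.println(buildBody("WONG MIU HIE", 1, 100,
                "dob", "idnumber", "surname", "contact.value"));
    }
}
```

With scoring disabled, hits no longer come back ranked by relevance, so this only fits if the application does not depend on that ordering.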