Performance Improvement for FTS

dishawagle · March 5, 2021, 3:04pm

I am using Python SDK 3 to run a full-text search (FTS) query.

prefix_query = PrefixQuery(search_string)
result = cluster.search_query(index_name, prefix_query, limit=limit, fields=[search_field])

The FTS index defines the type mappings, with "store": true set on the fields.
However, the result returns no fields.

Is there another way to optimize the FTS index definition, or to avoid a separate data retrieval based on the document IDs returned by the search query?

abhinav · March 5, 2021, 7:41pm

I’m not sure I follow your question here.

Do you need fields to be returned alongside the document IDs or not?
Have you verified the query does return document hits?

If you want each field's content to be included for every document ID in your search response, setting "store": true on the child fields in your index definition and asking for those fields in the search request is the way to go.

dishawagle · March 8, 2021, 10:56am

@abhinav the document IDs are being returned but the fields being asked for (drug_name in this case) are not being returned. Here is a glimpse of the index definition.

  "default_type": "_default",
  "docvalues_dynamic": true,
  "index_dynamic": true,
  "store_dynamic": false,
  "type_field": "_type",
  "types": {
    "drug": {
      "dynamic": false,
      "enabled": true,
      "properties": {
        "brand_generic": {
          "dynamic": false,
          "enabled": true,
          "fields": [
            {
              "include_in_all": true,
              "include_term_vectors": true,
              "index": true,
              "name": "brand_generic",
              "store": true,
              "type": "text"
            }
          ]
        },
        "drug_class": {
          "dynamic": false,
          "enabled": true,
          "fields": [
            {
              "include_in_all": true,
              "include_term_vectors": true,
              "index": true,
              "name": "drug_class",
              "store": true,
              "type": "text"
            }
          ]
        },
        "drug_name": {
          "dynamic": false,
          "enabled": true,
          "fields": [
            {
              "analyzer": "keyword",
              "include_in_all": true,
              "include_term_vectors": true,
              "index": true,
              "name": "drug_name",
              "store": true,
              "type": "text"
            }
          ]
        },
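A quick way to double-check which child fields are stored is to walk the type mapping programmatically. The following is a minimal sketch, assuming the "drug" type mapping from the definition above has been loaded into a Python dict; the helper name stored_fields is hypothetical, and it only looks at top-level properties, not nested ones.

```python
def stored_fields(type_mapping):
    """Return the names of child fields that have "store": true."""
    names = []
    for prop in type_mapping.get("properties", {}).values():
        for field in prop.get("fields", []):
            if field.get("store"):
                names.append(field["name"])
    return names


# Abbreviated copy of the "drug" type mapping from the post (drug_name only).
drug_mapping = {
    "dynamic": False,
    "enabled": True,
    "properties": {
        "drug_name": {
            "dynamic": False,
            "enabled": True,
            "fields": [
                {
                    "analyzer": "keyword",
                    "index": True,
                    "name": "drug_name",
                    "store": True,
                    "type": "text",
                }
            ],
        }
    },
}

print(stored_fields(drug_mapping))
```

If a requested field is missing from this list, the search response cannot include its value, whichever client sends the request.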

abhinav · March 9, 2021, 10:53pm

The definition looks good. Let’s check if the issue is with how you’re using the SDK.
Would you try this query (as a curl/HTTP request) and check how the search response looks …

curl -XPOST -H "Content-type:application/json" -u <username>:<password> \
  http://<ip>:8094/api/index/<index_name>/query \
  -d '{"query": {"query": "<string_query>"}, "fields": ["drug_name"]}'
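The same request can be built from Python with only the standard library, which helps separate index problems from SDK problems. This is a rough sketch of the curl call above: the function name build_search_request and its parameters are hypothetical, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import base64
import json
import urllib.request


def build_search_request(host, index_name, query_string, fields, username, password):
    """Build (but do not send) the same POST request as the curl command."""
    url = f"http://{host}:8094/api/index/{index_name}/query"
    body = json.dumps(
        {"query": {"query": query_string}, "fields": fields}
    ).encode("utf-8")
    # Equivalent of curl's -u <username>:<password> (HTTP basic auth).
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# To send it against a real node (placeholder values shown):
# req = build_search_request("<ip>", "<index_name>", "<string_query>",
#                            ["drug_name"], "<username>", "<password>")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```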

dishawagle · March 10, 2021, 1:33pm

I am getting the fields using a curl request as well as Python SDK 2.
However, Python SDK 3 is not returning the fields.
This is the Python SDK 3 code.

cluster = Cluster(<url>, ClusterOptions(PasswordAuthenticator(<username>, <password>)))
cb = cluster.bucket(<bucket>)
prefix_query = PrefixQuery('TYL')
result = cluster.search_query(<index_name>, prefix_query, limit=10, fields=['drug_name'])

abhinav · March 10, 2021, 6:37pm

Hey @daschl, would you be able to assist @dishawagle here?

Things don't seem to work as expected with Python SDK 3.
I’m not very familiar with the syntax to retrieve fields alongside document IDs in a search response.

jcasey · May 19, 2021, 3:11am

Hi @dishawagle, what version of the Python SDK are you using? Also, how are you trying to access the fields? You first need to kick off the iterator on the result by calling result.rows(); only then will you start to retrieve values. From there, you should have a fields dict on each row.