Search with a Disjunct Query is slow


#1

Hi all,
I have a bucket with 205,665 documents.
The document structure is as follows:

  • name --> string
  • uuid --> string
  • readers --> array of string

If I search with a simple query:

"query": {
"query": "text"
}
the “took” value is (on average) 0.005, which is perfect.

But I want to filter the results by readers, so I use this query:

{
"explain": true,
"fields": [
"*"
],
"highlight": {},
"query": {
"conjuncts": [
{
"query": "text"
},
{
"disjuncts": [
{"field": "readers", "match": "reader_1"},
{"field": "readers", "match": "reader_2"},
{"field": "readers", "match": "reader_3"},
{"field": "readers", "match": "reader_4"},
{"field": "readers", "match": "reader_5"},
{"field": "readers", "match": "reader_6"},
{"field": "readers", "match": "reader_7"},
{"field": "readers", "match": "reader_8"},
{"field": "readers", "match": "reader_9"},
{"field": "readers", "match": "reader_10"},
{"field": "readers", "match": "reader_11"},
{"field": "readers", "match": "reader_12"},
{"field": "readers", "match": "reader_13"},
{"field": "readers", "match": "reader_14"},
{"field": "readers", "match": "reader_15"},
{"field": "readers", "match": "reader_16"},
{"field": "readers", "match": "reader_17"},
{"field": "readers", "match": "reader_18"},
{"field": "readers", "match": "reader_19"},
{"field": "readers", "match": "reader_20"},
{"field": "readers", "match": "reader_21"},
{"field": "readers", "match": "reader_22"},
{"field": "readers", "match": "reader_23"},
{"field": "readers", "match": "reader_24"}
]
}
]
}
}

and the “took” value is (on average) 0.89.

I think the difference is very large, especially since the number of documents in production will grow beyond 10,000,000.

Am I missing something?
Is it possible to write a query with a lower “took” value?

Thanks
J


#2

@jempis02 Thank you for raising this concern. A few questions for you …

  • Did you create a specific index, or is it just a default index?
  • What analyzer are you using to build the index?
  • If you share your index definition here, I’ll have the answers to my previous 2 questions.

Since you are not using a wildcard query for your use case, I suppose you have already reduced the number of term searchers for your query request.

If we can fine-tune your index (that is, if you haven’t done so already), we could look at some additional savings. What I mean is: when you search for the term “text”, do you need to look across all fields, or only a specific field? If it is only a specific field, you should include that mapping in your index definition.


#3

Hi @abhinav,

thanks for the response.
I use a specific index, with no custom analyzer.
I use wildcards too, and the purpose of the disjuncts query is to filter the response depending on the user executing the query.
Let me explain:
My bucket holds the list of files (‘name’) in a folder, together with the corresponding permissions (‘readers’).
When a user searches for a file, the disjuncts query checks whether at least one of the user’s permissions is present in the readers array.

I hope that makes it clear.
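
The permission check described above can be illustrated outside of FTS with a few lines of Python. This is only a sketch of the intended semantics (a document is visible if the user's permissions and its readers array share at least one entry); it says nothing about how the search service evaluates the disjunction internally. The sample documents, the `can_read` helper, and the `reader_30` value are made up for illustration.

```python
from typing import Iterable


def can_read(doc_readers: Iterable[str], user_permissions: Iterable[str]) -> bool:
    """Return True if at least one user permission appears in the readers array."""
    return not set(doc_readers).isdisjoint(user_permissions)


# Hypothetical sample documents following the structure from the first post.
docs = [
    {"name": "report.pdf", "readers": ["reader_1", "reader_7"]},
    {"name": "notes.txt", "readers": ["reader_30"]},
]
user_permissions = ["reader_1", "reader_2"]

# Keep only the files the user is allowed to see.
visible = [d["name"] for d in docs if can_read(d["readers"], user_permissions)]
print(visible)
```

With these sample values only `report.pdf` is kept, because `reader_1` is present in both lists.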

The index is shown below:

{
 "name": "test_index",
 "type": "fulltext-index",
 "params": {
  "doc_config": {
   "docid_prefix_delim": "",
   "docid_regexp": "",
   "mode": "type_field",
   "type_field": "filename"
  },
  "mapping": {
   "default_analyzer": "standard",
   "default_datetime_parser": "dateTimeOptional",
   "default_field": "_all",
   "default_mapping": {
    "default_analyzer": "",
    "dynamic": true,
    "enabled": true,
    "properties": {
     "ext": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_in_all": true,
        "include_term_vectors": true,
        "index": true,
        "name": "ext",
        "type": "text"
       }
      ]
     },
     "name": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_in_all": true,
        "include_term_vectors": true,
        "index": true,
        "name": "name",
        "type": "text"
       }
      ]
     },
     "path": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_in_all": true,
        "include_term_vectors": true,
        "index": true,
        "name": "path",
        "type": "text"
       }
      ]
     },
     "readers": {
      "enabled": true,
      "dynamic": false,
      "fields": [
       {
        "include_term_vectors": true,
        "index": true,
        "name": "readers",
        "type": "text"
       }
      ]
     }
    }
   },
   "default_type": "_default",
   "docvalues_dynamic": true,
   "index_dynamic": true,
   "store_dynamic": false,
   "type_field": "_type"
  },
  "store": {
   "indexType": "upside_down",
   "kvStoreName": "mossStore"
  }
 },
 "sourceType": "couchbase",
 "sourceName": "files-ele",
 "sourceUUID": "161267103d57a3fd63e2ca7a4d11e4a7",
 "sourceParams": {},
 "planParams": {
  "maxPartitionsPerPIndex": 171,
  "numReplicas": 0
 },
 "uuid": "54d52e766588941c"
}

Thanks
jempis


#4

@jempis02 Cool. So I guess one thing I was asking about was whether, when you search for the file/text, you could make do with searching over a single field. For example, if the “name” field carries all the information you need, then your query could look like:

{
"query": {
"conjuncts" :[
{
"term": "text",
"field": "name"
},
{
"disjuncts":[
{"field":"readers", "match": "reader_1"},
{"field":"readers", "match": "reader_2"},
...
{"field":"readers", "match": "reader_24"}
]
}
]
}
}
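
Since the readers list depends on the user, a request of this shape would normally be generated rather than written by hand. A minimal Python sketch, assuming the permissions are available as a plain list of strings; `build_request` is a hypothetical helper name, and the body it produces simply mirrors the JSON above:

```python
import json
from typing import Dict, List


def build_request(term: str, field: str, readers: List[str]) -> Dict:
    """Build a conjunction of a single-field term query and a readers filter."""
    # One match clause per permission; any single hit satisfies the disjunction.
    reader_clauses = [{"field": "readers", "match": r} for r in readers]
    return {
        "query": {
            "conjuncts": [
                {"term": term, "field": field},
                {"disjuncts": reader_clauses},
            ]
        }
    }


# reader_1 .. reader_24, as in the example above.
readers = [f"reader_{i}" for i in range(1, 25)]
request = build_request("text", "name", readers)
print(json.dumps(request, indent=2))
```

The printed JSON can be used as the body of the search request.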

Also, what release of Couchbase are you using? I see that you’re using upside_down/moss. With 6.0 we have a new index type available, which we’ve named scorch. We’ve seen pretty good performance improvements with the new index type, especially in latency and throughput for compound queries such as the ones you’re using.


#5

@abhinav perfect.
With your new query, the request is faster than with the query string.
But I decided to use the query string query because I want a dynamic, Google-like search.
For example, I want to add new fields, such as the extension or file path, and I want to use Field Scoping (like “ext:.pdf”).
In short: I want a search that supports every kind of query (match query, Field Scoping, Required, Optional, numeric, etc.), so I suppose your specific query is not suitable for my project.
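
To make the requirement concrete, here is a rough Python sketch of how such a query string could be assembled from free text plus field-scoped clauses. It relies only on the `field:value` scoping form and the `+` prefix for required clauses; the helper names and the sample values are hypothetical, and escaping of special characters in values is not handled.

```python
from typing import Dict, List, Tuple


def scoped(field: str, value: str, required: bool = False) -> str:
    """Format one field-scoped clause, optionally marked as required with '+'."""
    prefix = "+" if required else ""
    return f"{prefix}{field}:{value}"


def build_query_string(free_text: str, clauses: List[Tuple[str, str, bool]]) -> Dict:
    """Join free text and scoped clauses into a single query string query."""
    parts = [free_text] + [scoped(f, v, req) for f, v, req in clauses]
    # Skip empty parts so an empty free-text search leaves no stray spaces.
    return {"query": " ".join(p for p in parts if p)}


q = build_query_string("report", [("ext", "pdf", True), ("path", "invoices", False)])
print(q)  # {'query': 'report +ext:pdf path:invoices'}
```

The resulting object could take the place of the first conjunct, with the readers disjunction kept as the second one.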

That’s why I want to understand how I can improve and speed up my query. Is that possible?

I am using release 5.5.2, but I am now downloading the new version (6.0).

Thanks for now
jempis