Search by using the prefix of the ID

jessyang · July 10, 2019, 6:25pm

In our application, the value of the document key has format like the following:

<development environment ID>::<document type>::<UUID>

We have tried to use query like the following to get documents for a particular environment and document type like the following:

/////////////////////
select * from travel WHERE META().id LIKE “dev1::airline::%”
////////////////////

We want to find all airline documents for a given development environment dev1. We noticed that the above query will use the primary index and perform very slow.

Another option we have is we can create additional fields on the documents which can store the value of development environment, and create index on the document type and development environment, then create query based on these two fields. However, we prefer to create as few index as possible. Since the environment and document type are already part of the document key, i am just wondering whether there are other ways to query the information and avoid the primary index scan. Thanks

vsr1 · July 10, 2019, 10:03pm

If you are doing prefix search on document key with whole document or document key only then primary index is best. Even if use secondary index on different field it will be same.

Could you elaborate perform very slow. How many documents/size you are taking . If you need whole document you can project META().id only and use direct KV get from your SDKs to avoid two hop data.

select META().id from travel WHERE META().id LIKE “dev1::airline::%”

jessyang · July 11, 2019, 6:02pm

The performance are similar for smaller tenant. However, we have large tenant with over 2 million documents. The query similar to the following takes longer then 4 min to finish, then timout:

/////////////////////
select * from travel WHERE META().id LIKE “dev1::airline::%”
////////////////////

If we use a secondary index on the new fields, the query is able to finish.

vsr1 · July 11, 2019, 6:08pm

In above use case Primary index used as secondary index. Please Post EXPLAIN and indexes definitions of both. Also you can use query profiling and see where it is taking time

jessyang · July 11, 2019, 6:13pm

correction, the tenant has 4 million documents.

With this amount of documents, do you still expect the query have similar performance in both scenarios?

vsr1 · July 11, 2019, 6:19pm

Performance defends On How many documents your query qualifies. A based on query/index total number of documents on the bucket should not matter.

jessyang · July 11, 2019, 6:21pm

Not sure exactly what you mean. Do you mean the performance depend on how many documents will be returned from the query?

vsr1 · July 11, 2019, 6:24pm

Yes. Example: Your bucket has 10million documents.
if there is only 1 document that starts with “dev1::airline::” it will be ms
If there is 1M then it may take secs/minutes based on size

jessyang · July 11, 2019, 6:42pm

regarding the following:

////////
if there is only 1 document that starts with “dev1::airline::” it will be ms
If there is 1M then it may take secs/minutes based on size
///////////

does the above apply to the query which uses secondary index as well? or when there are lots of documents to return (e.g. 0.5 M or 1M), the query uses secondary index will perform better?

vsr1 · July 11, 2019, 7:24pm

Dpeneds on what the index is. Checkout Topics on Indexing and Query in https://blog.couchbase.com/n1ql-practical-guide-second-edition/

smithclarkson01 · August 1, 2019, 11:54pm

had the same issue , I’m looking to view it on the web, but thank you.
9Apps VidMate app

barryallen1337 · September 4, 2019, 6:53pm

Each document has an _id that uniquely identifies it, which is indexed so that documents can be looked up either with the GET API or the ids query.

The value of the _id field is accessible in certain queries (term, terms, match, query_string, simple_query_string).

Example documents

curl -X PUT “localhost:9200/my_index/_doc/1?pretty” -H ‘Content-Type: application/json’ -d’
{
“text”: “Document with ID 1”
}
’
curl -X PUT “localhost:9200/my_index/_doc/2&refresh=true?pretty” -H ‘Content-Type: application/json’ -d’
{
“text”: “Document with ID 2”
}
’
curl -X GET “localhost:9200/my_index/_search?pretty” -H ‘Content-Type: application/json’ -d’
{
“query”: {
“terms”: {
“_id”: [ “1”, “2” ]
}
}
}
’