Search by using the prefix of the ID

In our application, the value of the document key has format like the following:

<development environment ID>::<document type>::<UUID>

We have tried to use query like the following to get documents for a particular environment and document type like the following:

/////////////////////
select * from travel WHERE META().id LIKE “dev1::airline::%”
////////////////////

We want to find all airline documents for a given development environment dev1. We noticed that the above query will use the primary index and perform very slow.

Another option we have is we can create additional fields on the documents which can store the value of development environment, and create index on the document type and development environment, then create query based on these two fields. However, we prefer to create as few index as possible. Since the environment and document type are already part of the document key, i am just wondering whether there are other ways to query the information and avoid the primary index scan. Thanks

If you are doing prefix search on document key with whole document or document key only then primary index is best. Even if use secondary index on different field it will be same.

Could you elaborate perform very slow. How many documents/size you are taking . If you need whole document you can project META().id only and use direct KV get from your SDKs to avoid two hop data.

select META().id from travel WHERE META().id LIKE “dev1::airline::%”

The performance are similar for smaller tenant. However, we have large tenant with over 2 million documents. The query similar to the following takes longer then 4 min to finish, then timout:

/////////////////////
select * from travel WHERE META().id LIKE “dev1::airline::%”
////////////////////

If we use a secondary index on the new fields, the query is able to finish.

In above use case Primary index used as secondary index. Please Post EXPLAIN and indexes definitions of both. Also you can use query profiling and see where it is taking time

correction, the tenant has 4 million documents.

With this amount of documents, do you still expect the query have similar performance in both scenarios?

Performance defends On How many documents your query qualifies. A based on query/index total number of documents on the bucket should not matter.

Not sure exactly what you mean. Do you mean the performance depend on how many documents will be returned from the query?

Yes. Example: Your bucket has 10million documents.
if there is only 1 document that starts with “dev1::airline::” it will be ms
If there is 1M then it may take secs/minutes based on size

regarding the following:

////////
if there is only 1 document that starts with “dev1::airline::” it will be ms
If there is 1M then it may take secs/minutes based on size
///////////

does the above apply to the query which uses secondary index as well? or when there are lots of documents to return (e.g. 0.5 M or 1M), the query uses secondary index will perform better?

Dpeneds on what the index is. Checkout Topics on Indexing and Query in https://blog.couchbase.com/n1ql-practical-guide-second-edition/

had the same issue , I’m looking to view it on the web, but thank you.
9Apps VidMate app

Each document has an _id that uniquely identifies it, which is indexed so that documents can be looked up either with the GET API or the ids query.

The value of the _id field is accessible in certain queries (term, terms, match, query_string, simple_query_string).

Example documents

curl -X PUT “localhost:9200/my_index/_doc/1?pretty” -H ‘Content-Type: application/json’ -d’
{
“text”: “Document with ID 1”
}

curl -X PUT “localhost:9200/my_index/_doc/2&refresh=true?pretty” -H ‘Content-Type: application/json’ -d’
{
“text”: “Document with ID 2”
}

curl -X GET “localhost:9200/my_index/_search?pretty” -H ‘Content-Type: application/json’ -d’
{
“query”: {
“terms”: {
“_id”: [ “1”, “2” ]
}
}
}