Query takes 75 seconds when using rest api/sdk, but 20ms in cbq

Could you please post your index definition. Also run the following command in cbq and see if it uses index. Change bucket name and index name.

     explain select * from default where form=$1;
     explain select * from default use index (ix2) where form=$1;

I have conditional indexes for form value.

create index idx_example1_form on bucket(form) where form='example1' using GSI;
create index idx_example2_form on bucket(form) where form='example2' using GSI;
create index idx_example3_form on bucket(form) where form='example3' using GSI;

I’m now suspecting that my index definition might not be correct, but it still does not explain why I see IndexScan in cbq.

Both of these take PrimaryScan

 explain select * from default where form=$1;
 explain select * from default use index (ix2) where form=$1;

Update: created a new index without where clause and everything works as expected now. Is this considered a bug or am I understanding how index works incorrectly?

Can you create the following index:

create index idx_form on bucket(form);

And then run EXPLAIN for the parametrized query.

If you have conditional index to choose that index, the exact index condition needs be present in query where clause. This is expected.

Could you please provide more details on “but it still does not explain why I see IndexScan in cbq”

@geraldss I’ve updated my thread. It works if I create an index without where clause.

@vsr1

When I run the exact query in cbq without positional parameters, it takes index without specifying index condition. It looks like to me that it is a bug where rest api endpoint does not parse the positional parameters before sending the query to the query service.

Hi @moon0326, so everything works correctly as intended. If you use a parametrized query, we cannot choose a conditional / partial index, because the query plan must work with any value of the parameter.

So you have two options:

(1) Use a conditional index and fully specify the query, without parameters for the index keys.

(2) Use a full index, and you can use a parametrized query for the index keys.

For these indexes, if you ha query like:

Select * from bucket where form=‘example1’; – this chooses the index idx_example1_form because we know the value of the predicate.

Select * from bucket where form= $1; – this cannot choose any of the index during plan creation since your value could be ANYTHING. The argument is only used to bind the predicate at runtime and not choose the index.

Separately, your actual definitions is likely different. In this example, you have both index key and the index condition as ‘form’, which is not highly useful. If you do actually have distinct keys, you can use the condition in the statement you’re preparing.

Example:
create index idx_example1_form on bucket(quantity) where form=‘example1’ using GSI;

Select * from bucket where form=‘example1’ and quantity=$1; – this will use the right index.

I don’t think I can comment on it further as I don’t fully understand how query planner works. I think this use case should be mentioned in the documentation though. It makes sense from the technical point of view, but it really does not make sense as a enduser. Thank you for the help. I appreciate it.

couldn’t you bind the parameters first then run the query plan? That makes more sense to me with my limited knowledge on the query planner.

The only reason I added where clause to my index definition was to prevent delay caused by indexer.

This is my use case.

  1. Create a document with form=‘example1’
  2. Query select * from bucket where form=‘example2’ with REQUEST_PLUS.

I thought that the query #2 has to wait for #1 as both of them share the same index. Am I understanding it correctly?

@geraldss sorry for bothering you again. Am I understanding how this index works correctly with my use case?

When you issue a query with strong consistency of REQUEST_PLUS, the indexer will ensure it catches up to the changes made to the bucket up to the request point before the results are returned.

The create indices with WHERE condition will only help you with managing indexes (like index range partitioning in Oracle) and not with speeding up REQUEST_PLUS latency.

But if I create two indexes for ‘form’ separately, aren’t they technically two indexes?

How does that affect my query on #2 when index of #1 is being used? I thought REQUEST_PLUS only waits for the index that must be synced to ensure consistency.

Hi @moon0326, all the feedback you are getting here is spot on.

One more thing in response to the question you asked me: for your case, you should only use parametrized queries together with prepared statements. If you are sending the full query statement each, there is no advantage to using parameters, and there are the disadvantages you encountered.

And so, please avoid parametrized queries, and then you can use any indexing strategy you choose.

Thank you @geraldss

Probably I don’t understand how parameterized query works in Couchbase. However, parameterized query is considered as best practice in almost all cases in a system where a query language is available. Even Couchbase’s documentation mentions about it at http://developer.couchbase.com/documentation/server/4.1/developer-guide/querying.html

“The approach above is both unwieldy and insecure (subject to SQL injection attacks).”

Isn’t this one of the points of using parameterized query in the first place?

Good points :slight_smile:

You should not take user input and directly inject it into query strings. When you use parameters, the planner does not use the parameter values in generating the query plan. We will consider that for the future.

Also, N1QL generally uses a single index per query (with some exceptions). With that in mind, one good option for you would be to not use conditional indexes, or not use parameters, and instead provide your users with a select list of inputs, so that the users are not injecting arbitrary input into your queries.

Both good options. Let us know if you run into other issues. As for this particular issue, there are no additional workarounds for now.

1 Like

Thank you for the suggestions. It’s pretty tricky in that case then…

I’m not quite sure about this yet.

  1. Create a document with form =‘example1’
  2. Query select * from bucket where form=‘example2’ with REQUEST_PLUS

Does the #2 query has to wait for the #1?
What if I have two separate queries with where clause for each form? Does #2 query still wait for #1?

Given that you know the possible values of “form” upfront (in order to create the conditional indexes), you should put those values in a selection list and then avoid parametrized queries. This solves all the issues.

That will work for us for this particular use case for now. Thank you!

So…can I assume that #2 query has to wait #1 query if I don’t use conditional indexes (sorry no one said yes to it yet…)?

Thank you! have a good one!

It’s a bit more complicated than that. In 4.1, the answer is yes. In 4.5, there is also a read-your-own-writes setting, which means query 2 would only have to wait for query 1 if they were issued from the same client session. Anyway, separate subject.

We will be on 4.1 for while until 4.5 gets out.

Thank you for all your help today.

1 Like