Performance difference with subquery versus actual data

I am writing this subquery where I would like to obtain data from one bucket and using that data get a count on other data from a different bucket. Note the query below:

SELECT count(1) from BE
WHERE sN IN (select raw (t.atten.ar)
from BR t
where t.atten.ir = “AB” );

The inner query is giving me ~15000 different values, which in turn is giving me a count of about 45000. The problem is this query is not performing well at all, it is taking way too long in running. I haven’t seen it produce an output because of the time it is taking.

SO I tried running the queries individually. I took the output from the inner query which looks something like this:

[“123”, “112”, “101”, … ] (imagine 15k elements in this list) and

I took the list above and put it in the query above in place of the subquery like this below:

SELECT count(1) from BE
WHERE sN IN [“123”, “112”, “101”, … ];

And in turn I am getting a result back in 1 second. Where as in the above I am not able to even get a result it is taking so long. Why is this the case? When I limit the data above I see that it uses the same index as well, so I dont think it is a concern of indexes? Any help?

Thanks in advance!

Query Plan is generated during prepare time. At that time no results from subquery available. The plan should work all possible values of the subquery. As they no subquery values nothing is pushed to index scan (Full Indexscan is done) and IN clause applied in the query that why it taking time.

You should run as two separate queries

  • Run first query
  • once results are there from second query and run.

OR Use JOIN query

Can you provide an example for the join?

I am trying :

SELECT count(1)
FROM BE AS cO
JOIN BR as p
ON cO.sN = p.atten.ar
WHERE p.atten.ir= “AB”;

This is still very slow, It has not finished running since before I started typing this response out.

Thanks!

Switch the JOIN. JOINs are LEFT to RIGHT, More predicates elimination on left side will perform better.

SELECT count(1)
FROM  `BE`  AS p
JOIN  `BR`  as cO
ON cO.sN = p.atten.ar
WHERE p.atten.ir= “AB”;

So this join takes about a minute, the result count should be 46128 when I run the two queries separately and cut and paste the list I get this and it takes about 2 seconds.

The join is returning 92073 as the value. Any reason why it would not match?

based on values and duplicates.

Okay thanks, So is couchbase working on a better subquery or join implementation? To mirror, or near mirror, the performance of traditional sql?