N1QL EXCEPT and NOT EXISTS Query

naveen.ecpesit · September 2, 2016, 6:02am

Hi Team,

I am having two types of document in a bucket, namely representing Vehicle and Person. I am trying to write a N1QL query, to fetch a list of Vehicles which are NOT owned by any person. Following is the document structure:

Vehicle Document:

{
type: ‘vehicle’,
id: <<>>–>vehicle number,
…
}

Person Document:

{
type: ‘person’,
id: <<>> -->person id,
vehNum:<<>> --> Vehicle Number from above document,
…

}

I was able to achieve the same using EXCEPT query as:

SELECT id FROM bucket AS vehicle WHERE type='vehicle’
EXCEPT ALL
SELECT vehNum as id FROM bucket AS person WHERE type=‘person’

Is this an efficient query? or is there an option to achieve by using NOT IN clause?

NOTE: I tried using the NOT IN clause with a subquery, but seems like i am not getting the syntax right with USE KEYS. Any help here would be good.

manusyone · September 2, 2016, 7:46am

Hi @naveen.ecpesit. This could be a possible query using NOT IN.

SELECT v.id 
FROM bucket v 
WHERE type="Vehicle" 
AND v.id NOT IN ARRAY p.vehNum FOR r IN (
  SELECT p.vehNum
  FROM bucket p 
  WHERE p.docType="Person"
) END;

Regarding performance, the Couchbase team would be better suited to talk about it, but they have kind of absent in the last days…

naveen.ecpesit · September 2, 2016, 4:54pm

thanks @manusyone for the above response.

I tried running the suggested NOT IN query. For some reason, it seems to be taking too long to respond for my data set (Around 150K documents in bucket). The EXCEPT query gave results in 22 sec, while the NOT IN query did not return any results for 100 sec.

geraldss · September 11, 2016, 6:55pm

EXCEPT ALL seems good to me. You need two indexes, one on id and one on vehNum.

Hi @manusyone, You can omit the ARRAY:

SELECT v.id 
FROM bucket v 
WHERE type="Vehicle" 
AND v.id NOT IN (
  SELECT RAW p.vehNum
  FROM bucket p 
  WHERE p.docType="Person"
);

naveen.ecpesit · September 22, 2016, 9:28pm

Thanks @geraldss for the reply.

alperkanat · February 2, 2017, 7:15pm

Hi @geraldss,

Is it possible to create an index for this query? I’ve such a query where I query few keys then do xxId NOT IN

I tried creating 2 indexes. One for the subquery and another for the outer one. So would be something like below:

create index subq_idx on bucket(docType, vehNum);
create index outer_idx on bucket(type, id);

Would this work?

geraldss · February 2, 2017, 7:23pm

Hi @alperkanat, can you post your exact query here.

alperkanat · February 2, 2017, 7:39pm

Hi,

This is an example to my query:

SELECT DISTINCT e.deviceId
FROM mapp AS e
WHERE e._class = "com.foo.bar.MobileEntity"
    AND appId = "myapp"
    AND e.deviceId NOT IN (
        SELECT RAW sub_924889182.deviceId
        FROM mapp AS sub_924889182
        WHERE sub_924889182._class = "com.foo.bar.MobileEntity"
            AND sub_924889182.appId = "myapp"
            AND sub_924889182.eventName = "opened_app"
    )
LIMIT 2500
OFFSET 0

I’ve the following index for the inner query:

CREATE INDEX `eventName_class_appId_idx` ON `mapp`(`eventName`,`_class`,`appId`)

So inner query is executed fast. However I couldn’t think of a proper way of creating an index for the outer one.

geraldss · February 2, 2017, 8:49pm

For the outer query, you can index on _class, appId, and deviceId.

alperkanat · February 3, 2017, 6:29am

The best I could get with that index is roughly around 1 sec… And I only have around 40k docs in the bucket. Is it normal?

geraldss · February 3, 2017, 2:05pm

How many device ids does the inner subquery produce?

alperkanat · February 3, 2017, 2:46pm

Actually it’s just 2

geraldss · February 3, 2017, 3:39pm

The EXCEPT might be faster for you. Try both the EXCEPT and the NOT IN. In both cases, make sure you have only covering indexes. After you try that out, let us know the results here.