Shorter intersect query possible?

Hi,

Let’s say that I’ve the following query:

SELECT DISTINCT userId
FROM user_logs
WHERE action IN ["login", "purchase"];

How can I change this query to ensure I only get userId’s who have done both actions? I think I could this with below query:

SELECT userId
FROM user_logs
WHERE action = "login"
INTERSECT
SELECT userId
FROM user_logs
WHERE action = "purchase";

However this is not ideal for my case for various reasons. Is there another way to achieve this?

How about this N1QL

SELECT a.userId
  FROM user_logs a
 WHERE a.action = "login"
   AND META(a).id IN (
          SELECT RAW META(b).id
            FROM user_logs b
           WHERE b.action = "purchase"
    )

SELECT userId FROM user_logs WHERE action = "purchase" AND userId IN (SELECT RAW userId FROM user_logs WHERE action = "login");

What if the number of actions increase? More subqueries?

If you are using CB 4.6+,may be you can use ARRAY_SYMDIFF function ,try this

SELECT a.userId
  FROM (
       SELECT userId, ARRAY_AGG(action) AS action_list  FROM user_logs GROUP BY userId  
       ) a
WHERE ARRAY_LENGTH(["login","purchase","logout"]) <= ARRAY_LENGTH(a.action_list) 
  AND "login" NOT IN ARRAY_SYMDIFF(["login","purchase","logout"],a.action_list)
  AND "purchase" NOT IN ARRAY_SYMDIFF(["login","purchase","logout"],a.action_list)
  AND "logout" NOT IN ARRAY_SYMDIFF(["login","purchase","logout"],a.action_list)
 

Unfortunetely I don’t.

maybe @vsr1 can give some suggestion.

In that situation INTERSECT will be better. If that is not ideal more subqueries will work.
Also you can try this and see which option works and performs better.

SELECT userId FROM user_logs 
LET login = (SELECT RAW userId FROM user_logs WHERE action = "login")
LET logout= (SELECT RAW userId FROM user_logs WHERE action = "logout")
WHERE action = "purchase" AND ARRAY_LENGTH(ARRAY_INTERSECT([userId],login,logout)) > 1;

SELECT userId FROM user_logs 
WHERE action = "purchase" AND ARRAY_LENGTH(ARRAY_INTERSECT([userId], (SELECT RAW userId FROM user_logs WHERE action = "login"), (SELECT RAW userId FROM user_logs WHERE action = "logout"))) > 1;
1 Like

For the sake of simplicity (since I’m generating dynamic queries using the DSL) I decided to use INTERSECT. One last question though. Please look at the following query:

SELECT userId
FROM user_logs
WHERE action = "login"
INTERSECT
SELECT userId
FROM user_logs
WHERE action = "purchase"
ORDER BY userId
LIMIT 50
OFFSET 0;

In this case to which query does ORDER BY, LIMIT and OFFSET belong to? Actually I want it to be part of the whole query so should I use parantheses?

ORDER BY, LIMIT, OFFSET apply to the final outer query, unless you use parentheses to restrict them to one subquery.

1 Like