SELECT a.userId
FROM (
SELECT userId, ARRAY_AGG(action) AS action_list FROM user_logs GROUP BY userId
) a
WHERE ARRAY_LENGTH(["login","purchase","logout"]) <= ARRAY_LENGTH(a.action_list)
AND "login" NOT IN ARRAY_SYMDIFF(["login","purchase","logout"],a.action_list)
AND "purchase" NOT IN ARRAY_SYMDIFF(["login","purchase","logout"],a.action_list)
AND "logout" NOT IN ARRAY_SYMDIFF(["login","purchase","logout"],a.action_list)
In that situation INTERSECT will be better. If that is not ideal more subqueries will work.
Also you can try this and see which option works and performs better.
SELECT userId FROM user_logs
LET login = (SELECT RAW userId FROM user_logs WHERE action = "login")
LET logout= (SELECT RAW userId FROM user_logs WHERE action = "logout")
WHERE action = "purchase" AND ARRAY_LENGTH(ARRAY_INTERSECT([userId],login,logout)) > 1;
SELECT userId FROM user_logs
WHERE action = "purchase" AND ARRAY_LENGTH(ARRAY_INTERSECT([userId], (SELECT RAW userId FROM user_logs WHERE action = "login"), (SELECT RAW userId FROM user_logs WHERE action = "logout"))) > 1;
For the sake of simplicity (since I’m generating dynamic queries using the DSL) I decided to use INTERSECT. One last question though. Please look at the following query:
SELECT userId
FROM user_logs
WHERE action = "login"
INTERSECT
SELECT userId
FROM user_logs
WHERE action = "purchase"
ORDER BY userId
LIMIT 50
OFFSET 0;
In this case to which query does ORDER BY, LIMIT and OFFSET belong to? Actually I want it to be part of the whole query so should I use parantheses?