Counting how many elements in an array in a document match the elements in a returned query

Hello everyone,
we are currently working on a project for university and are stuck at one query because we just can’t wrap our heads around how to do it as only a query without the help of python.

What we have is one collection of users that has one document for every user and looks like this:

{
  "user_id": "100004373",
  "followers_id": [
    "88314865",
    "139872488",
  ],
  "following_id": [
    "56932547",
    "246319857",
  ],
}

To get the 100 Accounts with the most followers we query like that:

select user_id from Tweets._default.new_accounts order by ARRAY_LENGTH(followers_id) desc limit 100

This gives us what we want, the next step now is to get the users that follow the most accounts found by the query before. This means we have to check the array of the following_id of every user in our collection and count how many of them match the ones returned by the query mentioned above.

Is this possible with only N1QL? Or do we have to use python to achieve this? We know how to do it with python but that would defeat the purpose of the database since we would have to save all the following_ids of all users in a list…

It would be great if someone could give us some hints or tell us how to get it done!

Thanks in advance
Flatric

Try:

CREATE INDEX ix_following ON Tweets._default.new_accounts(DISTINCT following_id);

WITH popular AS (
  SELECT RAW user_id FROM Tweets._default.new_accounts
  ORDER BY ARRAY_LENGTH(followers_id) desc limit 100)
SELECT user_id, ARRAY_LENGTH(pop_following) AS num_following
FROM Tweets._default.new_accounts
LET pop_following = ARRAY v FOR v IN following_id WHEN v IN popular END
WHERE ANY v IN following_id SATISFIES v IN popular END;

The WITH clause is your original query except for the RAW projection, since we want it to be an array of just user_ids (not an array of object). The WHERE clause selects the users that follows any of the most popular ids. The LET clause should give you the following_ids that falls in the top 100 group from each such user, and you can do various manipulations on that as you wish. The query above only does an array_length in the projection. If all you need is the count, you can also put the ARRAY_LENGTH part in the LET clause itself:

LET num_following = ARRAY_LENGTH(ARRAY v FOR v IN following_id WHEN v IN popular END)

and project num_following.

In addition to @bingjie.miao suggestion,

Also create the following index
CREATE INDEX ix1 ON Tweets._default.new_accounts(ARRAY_LENGTH(followers_id) DESC, user_id);
Add WHERE ARRAY_LENGTH(followers_id) IS NOT NULL to WITH clause subquery

Another option is to use ARRAY_INTERSECT in the LET clause:

WITH popular AS (
  SELECT RAW user_id FROM Tweets._default.new_accounts
  ORDER BY ARRAY_LENGTH(followers_id) desc limit 100)
SELECT user_id, ARRAY_LENGTH(pop_following) AS num_following
FROM Tweets._default.new_accounts
LET pop_following = ARRAY_INTERSECT(following_id, popular)
WHERE ANY v IN following_id SATISFIES v IN popular END;

Thanks a lot for your answers! The queries are way more complex than I imagined them to be and it took me a while to understand them but it showed me how much is possible with N1QL