N1QL query times out for large data array from two buckets

I already posted a question here, but I still have some issues. I have two buckets, “temp” and “temp1”. In “temp1” I have thousands of documents, and each document has a structure like this:

{
   "data":["ID","ID1","ID2",...."ID10000"]
},
{
   "data":["ID","ID1","ID2",...."ID10000"]
},
{
   "data":["ID","ID1","ID2",...."ID10000"]
}

This “data” array stores ten thousand IDs, each as a string. In the same way, the documents in the “temp” bucket look like this:

{
  "user_ID":"A1",
   "data":["ID","ID1","ID2",...."ID10000"]
},
{
  "user_ID":"A2",
   "data":["ID","ID1","ID2",...."ID10000"]
},
{
  "user_ID":"A3",
   "data":["ID","ID1","ID2",...."ID10000"]
}

I want to check whether any “ID” from “temp1.data” is present in “temp.data”, and if so return the matching documents’ “user_ID” from the “temp” bucket, so my expected output is:

["A1","A2",...] etc.

These two buckets don’t have any relationship between them. When I run the query mentioned in the previous question, like this:

SELECT ARRAY_INTERSECT(t.data, t1.data) FROM temp t UNNEST (SELECT RAW temp1 FROM temp1) AS t1;

it takes too much time to execute, and after a few minutes it throws a query timeout exception.

Hi @yogesh_0586, try this in the cbq shell:

SELECT t1.user_ID FROM temp1 t UNNEST (SELECT RAW temp FROM temp) AS t1
WHERE ANY v IN t1.data SATISFIES v IN t.data END;

I used the given query in the cbq shell and it still takes too much time to execute (more than 20 minutes) with no output; it was still running when I manually stopped the Couchbase server. My cbq shell looked like this:

cbq> SELECT t1.user_ID FROM temp1 t UNNEST (SELECT RAW temp FROM temp) AS t1 WHERE ANY v IN t1.data SATISFIES v IN t.data END;

On both buckets, temp and temp1, there is only a primary index, and each bucket’s RAM quota is 300MB.
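For reference, the indexes currently defined on the two buckets can be listed from the system:indexes catalog, for example:

SELECT keyspace_id, name, index_key, state
FROM system:indexes
WHERE keyspace_id IN ["temp", "temp1"];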

This query has a lot of loops and takes time. You mentioned you have 10,000 documents in each bucket and 10,000 IDs in each array. That results in a 10,000 * 10,000 Cartesian join (100,000,000 document pairs), and for each pair the match predicate has to perform up to 10,000 * 10,000 array iterations.

Maybe you should split the query into multiple queries and run them in a loop:
SELECT META().id FROM temp1;
Then repeat the following query for each META().id:

SELECT t1.user_ID FROM temp1 t USE KEYS [...] UNNEST (SELECT RAW temp FROM temp) AS t1
WHERE ANY v IN t1.data SATISFIES v IN t.data END;

If you time one META().id, you will know the approximate per-document cost and get an idea of how long the whole job will take.
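For example, if one of the keys returned by SELECT META().id FROM temp1 were “temp1::001” (a made-up key, purely for illustration), a single iteration would look like this:

SELECT t1.user_ID
FROM temp1 t USE KEYS ["temp1::001"]
UNNEST (SELECT RAW temp FROM temp) AS t1
WHERE ANY v IN t1.data SATISFIES v IN t.data END;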

Instead of pushing all the IDs into “temp1.data”, what if the documents were inserted like this:

{
   "data": "id1"
},
{
   "data": "id2"
},
{
   "data": "id3"
}

and I create an index on “data” in temp1 and use the following query:

SELECT t.user_ID FROM temp t WHERE ANY v IN (SELECT RAW temp1.data FROM temp1) SATISFIES v IN t.data END;

Would this query perform better than with the previous bucket structure? I am open to modifying the “temp1” bucket structure.
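If temp1 is restructured that way, the index mentioned above could be created along these lines (the index name is just an example):

CREATE INDEX idx_temp1_data ON temp1(data);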

In 5.5.0 (currently beta) we have ANSI JOIN support: https://blog.couchbase.com/ansi-join-support-n1ql/. It should perform better. You can try it out.
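As a rough sketch of what that could look like here (keeping the original array form of temp1 and assuming an array index on temp1.data; the index name and exact query shape are illustrative, not a tuned solution):

/* array index on the inner side of the join (name is illustrative) */
CREATE INDEX idx_temp1_data_arr ON temp1(DISTINCT ARRAY x FOR x IN data END);

/* unnest temp.data and ANSI-join each element against temp1.data */
SELECT DISTINCT t.user_ID
FROM temp t
UNNEST t.data AS td
JOIN temp1 t1 ON ANY x IN t1.data SATISFIES x = td END;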