I want to translate query 22 of the TPC-H benchmark to N1QL. The query involves two buckets named “customer” and “orders”; the former containing 150,000 documents and the latter 1,500,000. My first approach to write it is shown below:
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substr(c1.c_phone, 0, 2) as cntrycode, c1.c_acctbal from customer c1 where substr(c1.c_phone, 0, 2) in ['30', '17', '25', '10', '22', '15', '21'] and c1.c_acctbal > ( select raw avg(c2.c_acctbal) from customer c2 where c2.c_acctbal > 0.00 and substr(c2.c_phone, 0, 2) in ['30', '17', '25', '10', '22', '15', '21'] ) and not exists ( select * from orders o use keys meta(c1).id where o.o_custkey = c1.id ) ) as custsale group by cntrycode order by cntrycode;
The problem is that the correlated sub-query in “exists” always returns an empty set. I have tried to isolate the problem by:
- Trying to count the number of customers without orders with the query below, but just like in the main query it results in an empty set.
select count(*) from customer c1 where not exists ( select o.o_custkey from orders o use keys meta(c1).id where c1.id = o.o_custkey )
- Writing the query above in a different way , but the length is always 0, which (again) causes the query to count all customers
select count(*) from customer c1 let corders = ( select o.o_custkey from orders o use keys meta(c1).id where o.o_custkey = c1.id ) where ARRAY_LENGTH(corders) = 0
- I checked to see if something about the data was off by finding the customers with at least 1 order (see the query below), but the result is 99996; which is correct as I know beforehand there are 50004 customers without orders: 150,000 - 99,996 = 50,004
select count(distinct c.id) from customer c join orders o on c.id = o.o_custkey where c.id = o.o_custkey
I have seen the “id” field in couchbase to be usually defined as a string, but mine is an integer (in TPC-H, the primary key of “customer” and “orders” are the fields “c_custkey” and “o_orderkey” respectively, but I simply reference them as “id” in their respective couchbase bucket ). May this be affecting USE KEYS somehow?
Could you help me understand what the problem is?
(NOTE: I am using Couchbase Server 6.0.0 Community)