Incorect count number

flaviu · January 24, 2022, 5:24pm

Hi,

I am running two queries one looking for a key in a json document and the other one looking for the document with same key missing

So, basically I am covering both options having this key or not having it.

Now, If I do another query without looking the presence of this key in the json the result should be the sum of the previous two queries, but unfortunatelly I have a different number.

These are the queries:

select count(*) from `bucket` where sub_type = "app-extra" and i_genre_id is  missing or null

number of results: 172520

select count(*) from `bucket` where sub_type = "app-extra" and i_genre_id is not missing or null

number of results: 210806

select count(*) from `bucket` where sub_type = "app-extra"

number of results: 383326

The sum of the first two is: 374326 and the result of the latest query is 383326

What I don’t understand is where is the difference. In what state could be the i_genre_id other than “not missing” or “missing”

dh · January 24, 2022, 5:34pm

I believe your where clause is incorrect for your intention - the “or null” is not checked against i_genere_id; suspect you should be running:

select count(*) from `bucket` where sub_type = "app-extra" and (i_genre_id is missing or i_genre_id is null)

select count(*) from `bucket` where sub_type = "app-extra" and i_genre_id is not missing and i_genre_id is not null

HTH.

vsr1 · January 24, 2022, 5:44pm

Alternative is

select count(1) 
from `bucket`
 where sub_type = "app-extra" 
and i_genre_id IS NOT VALUED;

flaviu · January 24, 2022, 5:44pm

thanks for the fast answer.

Yes you are right, the pasted query was incorect. But I realized that what was happening was something else. There was a script which was populating data to the db, and the difference in time between the execution of the queries was the reason the sum of two was incorect

flaviu · January 24, 2022, 5:45pm

thanks @vsr1 didn’t know about NOT VALUED

vsr1 · January 24, 2022, 5:53pm

That might be happening because different indexes used. Index is eventual consistency.

VALUED and KNOWN are synonyms (exclude NULL, MISSING)

select count(1)  AS total,
       count(i_genre_id) AS total_valued,
       SUM(CASE WHEN i_genre_id IS VALUED THEN 0 ELSE 1 END) AS total_nonvalued
from `bucket`
 where sub_type = "app-extra" ;

select count(1)  AS total, genre_valued
from `bucket`
LET genre_valued = (i_genre_id IS VALUED)
 where sub_type = "app-extra" 
GROUP BY genre_valued;