COUNT condition

Good evening,

I have these expressions:

1: SELECT did, date, COUNT(*) AS zero_entries
FROM sync_gateway
LET date = SUBSTR(t, 0, 10)
WHERE type='bcn_scan' AND date BETWEEN '2016-11-01' AND '2016-12-15' AND ARRAY_LENGTH(bcn)==0
GROUP BY did, date ORDER BY did ASC, date ASC;

and

2: SELECT did, date, COUNT(ARRAY_LENGTH(bcn)==0) AS zero_entries
FROM sync_gateway
LET date = SUBSTR(t, 0, 10)
WHERE type='bcn_scan' AND date BETWEEN '2016-11-01' AND '2016-12-15'
GROUP BY did, date ORDER BY did ASC, date ASC;

To my eyes, I’d expect them to return the same result; however

My plan, obviously is to include both == and !! in the same query, probably creating an array by did instead of having split lists

I also checked out the execution plan; I still don’t see what the problem could possibly be

Can you add

AND IS_ARRAY(bcn)

to both queries.

I suppose at the WHERE clause?

both statements return that

Fixed my previous comment, sorry.

They seem … reverse, for some reason

N.B.: foull is actually full; I just wanted aligned fields

cbq> SELECT did, {`string` : [{
   >		"empty" : ARRAY_SUM(ARRAY_AGG(TONUMBER(ARRAY_LENGTH(bcn)==0))),
   >		"foull" : ARRAY_SUM(ARRAY_AGG(TONUMBER(ARRAY_LENGTH(bcn)<>0)))
   > }]} AS data
   > FROM playground
   > WHERE type="easy-agg" AND IS_ARRAY(bcn)
   > GROUP BY did, `string`
   > ORDER BY did, `string`;
{
	"requestID": "5e366b85-9b7b-49a4-a4b0-e169626e2be0",
	"signature": {
		"data": "object",
		"did": "json"
	},
	"results": [
		{
			"data": {
				"NEW: Vaasa 1": [
					{
						"empty": 1,
						"foull": 1
					}
				]
			},
			"did": "d46cb7bd3c6001e00ba3fd43eca39428"
		},
		{
			"data": {
				"Vaasa 1": [
					{
						"empty": 0,
						"foull": 2
					}
				]
			},
			"did": "d46cb7bd3c6001e00ba3fd43eca39428"
		},
		{
			"data": {
				"NEW: Vaasa 2": [
					{
						"empty": 0,
						"foull": 1
					}
				]
			},
			"did": "ee703bc099ffa963901e7e36551a8e9f"
		},
		{
			"data": {
				"Vaasa 2": [
					{
						"empty": 1,
						"foull": 0
					}
				]
			},
			"did": "ee703bc099ffa963901e7e36551a8e9f"
		}
	],
	"status": "success",
	"metrics": {
		"elapsedTime": "23.5736ms",
		"executionTime": "23.533412ms",
		"resultCount": 4,
		"resultSize": 1066,
		"sortCount": 4
	}
}
cbq>

(Data: testTable.zip (627 Bytes) - the mockup playground bucket)

They look quite alike to what I’d want to achive in the end, except that:
I’d like to “merge” did value (d46cb7bd3c6001e00ba3fd43eca39428) with the attribute name data, ie

"d46cb7bd3c6001e00ba3fd43eca39428": {
	"NEW: Vaasa 1": [
		{
			"empty": 1,
			"foull": 1
		}
	]
}

and then the same did’s to merge with themselves (with their children in ASC order), ie

"d46cb7bd3c6001e00ba3fd43eca39428": {
	"NEW: Vaasa 1": [
		{
			"empty": 1,
			"foull": 1
		}
	],
	"Vaasa 1": [
		{
			"empty": 0,
			"foull": 2
		}
	]
}

Ok, found one cause. COUNT(FALSE) is 1, while COUNT(NULL) is 0. so you want:

COUNT( NULLIF( ARRAY_LENGTH(bcn), 0 ) )

Tried 3 approaches.

  1. Since NULLIF has no counterpart expression, I tried to negate it:

    cbq> SELECT did, date, COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS zero_entries, COUNT(NOT NULLIF(ARRAY_LENGTH(bcn), 0)) AS entries
    FROM sync_gateway
    > > LET date = SUBSTR(t, 0, 10)
    > WHERE type=‘bcn_scan’ AND date BETWEEN ‘2016-11-01’ AND ‘2016-12-15’
    GROUP BY did, date ORDER BY did ASC, date ASC; >
    {
    “requestID”: “38b60186-d5c8-440a-b148-f06876724ae5”,
    “signature”: {
    “date”: “json”,
    “did”: “json”,
    “entries”: “number”,
    “zero_entries”: “number”
    },
    “results”: [
    {
    “date”: “2016-11-15”,
    “did”: “1a939b4b17ea46af16592a5be2bb7ab6”,
    “entries”: 1523,
    “zero_entries”: 1523
    },
    {
    “date”: “2016-11-16”,
    “did”: “1a939b4b17ea46af16592a5be2bb7ab6”,
    “entries”: 367,
    “zero_entries”: 367
    },
    […]

Doesn’t look right

  1. Then I tried the not zero value

    cbq> SELECT did, date, COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS zero_entries, COUNT(NULLIF(ARRAY_LENGTH(bcn), 1)) AS entries
    FROM sync_gateway
    > > LET date = SUBSTR(t, 0, 10)
    > WHERE type=‘bcn_scan’ AND date BETWEEN ‘2016-11-01’ AND ‘2016-12-15’
    > GROUP BY did, date ORDER BY did ASC, date ASC;
    {
    “requestID”: “a0df6abe-ff68-4f27-bb67-ba6ff25139d8”,
    “signature”: {
    “date”: “json”,
    “did”: “json”,
    “entries”: “number”,
    “zero_entries”: “number”
    },
    “results”: [
    {
    “date”: “2016-11-15”,
    “did”: “1a939b4b17ea46af16592a5be2bb7ab6”,
    “entries”: 1583,
    “zero_entries”: 1523
    },
    {
    “date”: “2016-11-16”,
    “did”: “1a939b4b17ea46af16592a5be2bb7ab6”,
    “entries”: 1798,
    “zero_entries”: 367
    },
    […]

Could be, but I cannot parse 1,5M entries to verify

  1. Then tried to mix the expressions:

    cbq> SELECT did, date, COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS zero_entries, COUNT(ARRAY_LENGTH(bcn)<>0) AS entries
    FROM sync_gateway
    LET date = SUBSTR(t, 0, 10)
    > > > WHERE type=‘bcn_scan’ AND date BETWEEN ‘2016-11-01’ AND ‘2016-12-15’
    > GROUP BY did, date ORDER BY did ASC, date ASC;
    {
    “requestID”: “00f7b619-f1e0-455c-8e61-04ff264faf42”,
    “signature”: {
    “date”: “json”,
    “did”: “json”,
    “entries”: “number”,
    “zero_entries”: “number”
    },
    “results”: [
    {
    “date”: “2016-11-15”,
    “did”: “1a939b4b17ea46af16592a5be2bb7ab6”,
    “entries”: 1583,
    “zero_entries”: 1523
    },
    {
    “date”: “2016-11-16”,
    “did”: “1a939b4b17ea46af16592a5be2bb7ab6”,
    “entries”: 1803,
    “zero_entries”: 367
    },
    […]

They look alike (2)


I also tried to look at one day (Still the file is 5 MB, so search regex to the rescue!)

SELECT t, bcn, IS_ARRAY(bcn) AS bcn_test,
	COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS `NULLIF0`,
	COUNT(NULLIF(ARRAY_LENGTH(bcn), 1)) AS `NULLIF1`,
	COUNT(ARRAY_LENGTH(bcn)<>0) AS diffThanZero,
	COUNT(ARRAY_LENGTH(bcn)==0) AS eqZero
FROM sync_gateway LET date = SUBSTR(t, 0, 10) WHERE type="bcn_scan" AND date="2016-11-15" AND did="1a939b4b17ea46af16592a5be2bb7ab6" GROUP BY t, bcn ORDER BY t, bcn;


 Connected to : http://localhost:8091/. Type Ctrl-D or \QUIT to exit.

 Path to history file for the shell : /root/.cbq_history 
{
	"requestID": "e15c844b-e720-4aa1-9cdc-b82c7a13f11f",
	"signature": {
		"NULLIF0": "number",
		"NULLIF1": "number",
		"bcn": "json",
		"bcn_test": "boolean",
		"diffThanZero": "number",
		"eqZero": "number",
		"t": "json"
	},
	"results": [
		{
			"NULLIF0": 1,
			"NULLIF1": 1,
			"bcn": [
				{
					"d": 7.590711939039835,
					"id": [
						"0x5475726b75204f626f416b6164656d69",
						"0x0117",
						"0x0001"
					],
					"rssi": -87,
					"tx": -58
				},
				[...]
			],
			"bcn_test": true,
			"diffThanZero": 1,
			"eqZero": 1,
			"t": "2016-11-15T07:27:50.241Z"
		},
		[...],
		{
			"NULLIF0": 0,
			"NULLIF1": 1,
			"bcn": [],
			"bcn_test": true,
			"diffThanZero": 1,
			"eqZero": 1,
			"t": "2016-11-15T11:57:49.997Z"
		}
		[...],

(Perl) Regex Scan returns 60 results for ("(NULLIF0|NULLIF1|diffThanZero|eqZero)": 0|"bcn_test": false) and only for the "NULLIF0": 0 term.

I also ran:

Connected to : http://localhost:8091/. Type Ctrl-D or \QUIT to exit.

 Path to history file for the shell : /root/.cbq_history 
SELECT did, date,
	COUNT(*) AS star,
	COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS `NULLIF0`,
	COUNT(NOT NULLIF(ARRAY_LENGTH(bcn), 0)) AS `NOTNULLIF0`,
	COUNT(NULLIF(ARRAY_LENGTH(bcn), 1)) AS `NULLIF1`,
	COUNT(ARRAY_LENGTH(bcn)<>0) AS difZero,
	COUNT(ARRAY_LENGTH(bcn)==0) AS eqZero
FROM sync_gateway
LET date = SUBSTR(t, 0, 10)
WHERE type="bcn_scan" AND date BETWEEN "2016-11-01" AND "2016-12-15"
GROUP BY did, date
ORDER BY did ASC;


 Connected to : http://localhost:8091/. Type Ctrl-D or \QUIT to exit.

 Path to history file for the shell : /root/.cbq_history 
{
	"requestID": "609cd125-0470-42c5-8620-bec95a996448",
	"signature": {
		"NOTNULLIF0": "number",
		"NULLIF0": "number",
		"NULLIF1": "number",
		"date": "json",
		"did": "json",
		"difZero": "number",
		"eqZero": "number",
		"star": "number"
	},
	"results": [
		[...],
		{
		    "NOTNULLIF0": 1523,
		    "NULLIF0": 1523,
		    "NULLIF1": 1583,
		    "date": "2016-11-15",
		    "did": "1a939b4b17ea46af16592a5be2bb7ab6",
		    "difZero": 1583,
		    "eqZero": 1583,
		    "star": 1583
		},
		[...]

I assume that 1583 - 1523 = 60 indeed equals 60 (what the scan showed us above), so I guess I have to subtract COUNT (*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) to get my number.

I also “derived” a different way on computing (again on post #6), which also produces same results.

I don’t know if this helps your facilitate your answer functionally/language-related.


Also, I’d warmly welcome your input on the #6 post (how to ‘merge’ the documents)


Sidenote: Moving some (index) speedup improvements in this topic

I can only process one thought at a time, so let’s start with this. Did NULLIF give you the same result as COUNT(*). That is, did NULLIF address the issue with COUNT(ARRAY_LENGTH…)).

So COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS NULLIF0 removes the bcn: [] entries that are counted in the COUNT(*) AS star.

I tried to find the opposite of the expression.
In my testing journeys however, I saw that COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)) AS NULLIF0 is also possible (as I haven’t yet decided what I want the data to look like).

What would be the binary different expression of NULLIF(ARRAY_LENGTH(bcn), 0)? (that’s where the #8 post starts from)

You are correct, my logic was wrong. Try:

CASE WHEN ARRAY_LENGTH( bcn ) = 0 THEN 1
ELSE NULL
END

Yes - indeed

	"nulld_entries" : COUNT(*) - COUNT(NULLIF(ARRAY_LENGTH(bcn), 0))

is the same; but slower than

	"nulld_entries" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END)

What about

Try the OBJECT construct.

You can also try:

{ some_expr : some_value }

SELECT OBJECT_ADD(self, SUBSTR(t, 0, 10), {
	did : {
		"foull_entries" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
		"total_entries" : COUNT(*),
		"nulld_entries" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END)
	}
})
FROM sync_gateway
WHERE type="bcn_scan" AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
GROUP BY SUBSTR(t, 0, 10), did
ORDER BY SUBSTR(t, 0, 10) ASC, did ASC;

(I guess the ‘parent’ object is self?)
OBJECT_ADD(…) must be a group key or aggregate.

SELECT SUBSTR(t, 0, 10) : {
	did : {
		"foull_entries" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
		"total_entries" : COUNT(*),
		"nulld_entries" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END)
	}
}
FROM sync_gateway
WHERE type="bcn_scan" AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
GROUP BY SUBSTR(t, 0, 10), did
ORDER BY SUBSTR(t, 0, 10) ASC, did ASC;

syntax error - at :

If I try

cbq> SELECT {SUBSTR(t, 0, 10) : {
   >         did : {
   >                 "foull_entries" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
   >                 "total_entries" : COUNT(*),
   >                 "nulld_entries" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END)
   >         }
   > }}
   > FROM sync_gateway
   > WHERE type="bcn_scan" AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
   > GROUP BY SUBSTR(t, 0, 10), did
   > ORDER BY SUBSTR(t, 0, 10) ASC, did ASC;

then, it does that; but it breaks grouping, I have yet another nesting level and that annoying ‘$1’:

    {...},
    {
        "$1": {
            "2016-12-15": {
                "8f2d11c831dc6eaa1144c9db5409437e": {
                    "foull_entries": 0,
                    "nulld_entries": 1485,
                    "total_entries": 1485
                }
            }
        }
    },
    {
        "$1": {
            "2016-12-15": {
                "abc7ff93c3f3c4bcde0baa28f2fa688c": {
                    "foull_entries": 924,
                    "nulld_entries": 0,
                    "total_entries": 924
                }
            }
        }
    },
    {...},

(Of course I can rename it with AS "string" but group-breaking is what’s really undesired here)

Your last approach might work with SELECT RAW …

SELECT RAW does get rid of the ‘$1’, but does not merge the various dids under the same day. So I thought it was ARRAY_AGG time. However:

SELECT RAW {SUBSTR(t, 0, 10) : ARRAY_AGG({
	did : {
		"foull_entries" : COUNT(NULLIF(ARRAY_LENGTH(bcn), 0)),
		"total_entries" : COUNT(*),
		"nulld_entries" : COUNT(CASE WHEN ARRAY_LENGTH(bcn)==0 THEN 1 ELSE NULL END)
	}
})}
FROM sync_gateway
WHERE type="bcn_scan" AND SUBSTR(t, 0, 10) BETWEEN "2016-11-01" AND "2016-12-15"
GROUP BY SUBSTR(t, 0, 10)
ORDER BY SUBSTR(t, 0, 10) ASC;

5020: "Error updating initial GROUP value. - cause: Aggregate count(nullif(array_length((`sync_gateway`.`bcn`)), 0)) not found."

Ok. Please start a new topic just on this. Just post three things, no more than 5 lines each:

(1) The query you are using

(2) The current output

(3) The desired output

If the question is long, or if there are extraneous discussions, I will not be able to understand it.

1 Like

You are right, I got sidetracked: