Left outer join + count include empty cases


#1

Hi, I have this problem: I need to show all stores with the amount of items they have, including stores with 0 items. I’m using a left join but i only getting stores with at least 1 item. Please help.

This is my query:

SELECT store.id, count(tem.id) as count, store.main
FROM CONTENT store
LEFT JOIN CONTENT item ON KEY item.main.owner FOR store
WHERE (store.id = item.main.owner OR item IS MISSING)
AND store.main.type = ‘electronic-store’
AND store.main.status = “OPEN”
AND item.main.type = ‘toys’
AND item.main.status = “AVALAIBLE”
GROUP BY store.id, store.main
ORDER BY count DESC


#2

Predicate on right side rejects the LEFT JOIN MISSING projected items.

You can use ANSI JOINS in 5.5

OR try this

SELECT store.id, count(item.id) as count, store.main
FROM CONTENT store
LEFT JOIN CONTENT item ON KEY item.main.owner FOR store
WHERE store.main.type = ‘electronic-store’
AND store.main.status = “OPEN”
AND (item IS MISSING OR  (store.id = item.main.owner AND item.main.type = ‘toys’ AND item.main.status = “AVALAIBLE”))
GROUP BY store.id, store.main
ORDER BY count DESC

Depends on what you want you may want to take care of item.main or its sub fields are MISSING


#3

I´m using version 5.1.
What index do you propouse for this query??


#4

CREATE INDEX ix1 ON CONTENT(main.status, id, main) WHERE main.type = “electronic-store”;
CREATE INDEX ix2 ON CONTENT(main.owner,main.type,main.status,id);


#5

I create this two indexes but I´m getting this error: “No index available for join term item”


#6

The following should work

CREATE INDEX ix51 ON CONTENT(main.status, id, main) WHERE main.type = "electronic-store";
CREATE INDEX ix52 ON CONTENT(main.owner,main.type,main.status,id);
SELECT store.id, count(item.id) as count, store.main
FROM CONTENT store
LEFT JOIN CONTENT item ON KEY item.main.owner FOR store
WHERE store.main.type = "electronic-store"
AND store.main.status = "OPEN"
AND (item IS MISSING OR  (store.id = item.main.owner AND item.main.type = "toys" AND item.main.status = "AVALAIBLE"))
GROUP BY store.id, store.main
ORDER BY count DESC

#7

Ok, it works but it takes more than 50 seg with only 80K documents. Any suggestion? because we gonna be working with at least 10 millions documents

Here que explain:
{
“plan”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “Sequence”,
“~children”: [
{
#operator”: “UnionScan”,
“scans”: [
{
#operator”: “IndexScan2”,
“index”: “ix51”,
“index_id”: “c03a19f316e3fd4b”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “CONTENT”,
“namespace”: “default”,
“spans”: [
{
“exact”: true,
“range”: [
{
“high”: ““OPEN””,
“inclusion”: 3,
“low”: ““OPEN””
}
]
}
],
“using”: “gsi”
},
{
#operator”: “IndexScan2”,
“index”: “ix51”,
“index_id”: “c03a19f316e3fd4b”,
“index_projection”: {
“primary_key”: true
},
“keyspace”: “CONTENT”,
“namespace”: “default”,
“spans”: [
{
“range”: [
{
“high”: ““OPEN””,
“inclusion”: 3,
“low”: ““OPEN””
},
{
“inclusion”: 0,
“low”: “null”
},
{
“inclusion”: 0,
“low”: “null”
}
]
}
],
“using”: “gsi”
}
]
},
{
#operator”: “Fetch”,
“as”: “store”,
“keyspace”: “CONTENT”,
“namespace”: “default”
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “IndexJoin”,
“as”: “item”,
“for”: “store”,
“keyspace”: “CONTENT”,
“namespace”: “default”,
“on_key”: “((item.main).owner)”,
“outer”: true,
“scan”: {
“index”: “ix52”,
“index_id”: “6f8d2d01eec17dac”,
“using”: “gsi”
}
},
{
#operator”: “Filter”,
“condition”: “(((((store.main).type) = “electronic-store”) and (((store.main).status) = “OPEN”)) and ((item is missing) or ((((store.id) = ((item.main).owner)) and (((item.main).type) = “real-estate”)) and (((item.main).status) = “AVAILABLE”))))”
},
{
#operator”: “InitialGroup”,
“aggregates”: [
“count((item.id))”
],
“group_keys”: [
“(store.id)”,
“(store.main)”
]
}
]
}
},
{
#operator”: “IntermediateGroup”,
“aggregates”: [
“count((item.id))”
],
“group_keys”: [
“(store.id)”,
“(store.main)”
]
},
{
#operator”: “FinalGroup”,
“aggregates”: [
“count((item.id))”
],
“group_keys”: [
“(store.id)”,
“(store.main)”
]
},
{
#operator”: “Parallel”,
“~child”: {
#operator”: “Sequence”,
“~children”: [
{
#operator”: “InitialProject”,
“result_terms”: [
{
“expr”: “(store.id)”
},
{
“as”: “count”,
“expr”: “count((item.id))”
},
{
“expr”: “(store.main)”
}
]
}
]
}
}
]
},
{
#operator”: “Order”,
“sort_terms”: [
{
“desc”: true,
“expr”: “count
}
]
},
{
#operator”: “FinalProject”
}
]
},
“text”: “SELECT store.id, count(item.id) as count, store.main\r\nFROM CONTENT store\r\nLEFT JOIN CONTENT item ON KEY item.main.owner FOR store\r\nWHERE store.main.type = “electronic-store”\r\nAND store.main.status = “OPEN”\r\nAND (item IS MISSING OR (store.id = item.main.owner AND item.main.type = “real-estate” AND item.main.status = “AVAILABLE”))\r\nGROUP BY store.id, store.main\r\nORDER BY count DESC”
}


#8

You are doing aggregate query with Joins. It take time. You can use max_parallelism


#9

looking the explain, i don´t understand why UnionScan over the same index happens?


#10

How can I set or configure max_parallelism on this query?


#11

UnionScan happening due to OR clause. Set through query parameter. query parameter name is max_parallelism