I have the below query, which gives me the results I need. The problem is that the query runs for 6+ secs. My intention is to try and re-write the query to make it more efficient and use indexes where possible.
WITH available_cb_docs AS ( WITH available_daily_docs AS (
SELECT RAW td.DocNum
FROM tranData td
WHERE td.`$MdfdTmstmp` BETWEEN "{report_date}" and "{report_end_date}"
AND td.RteId = "{report_route}"
AND td.`$Type`="DailyReport"
UNION ALL
SELECT RAW mch.DocNum
FROM tranData td
UNNEST td.ActvCustOrdr AS mch
WHERE td.`$MdfdTmstmp` BETWEEN "{report_date}" and "{report_end_date}"
AND td.RteId = "{report_route}"
AND td.`$Type`="DailyReport"
)
SELECT DISTINCT raw DocNum
FROM tranData
WHERE DocNum IN (available_daily_docs)and `$MdfdTmstmp` between "{previous_date}" and "{report_end_date}") ,
available_daily_report_docs AS (
SELECT RAW td.DocNum
FROM tranData td
WHERE td.`$MdfdTmstmp` BETWEEN "{report_date}" AND "{report_end_date}"
AND td.RteId = "{report_route}"
AND td.`$Type`="DailyReport"
UNION ALL
SELECT RAW mch.DocNum
FROM tranData td
UNNEST td.ActvCustOrdr AS mch
WHERE td.`$MdfdTmstmp` BETWEEN "{report_date}" AND "{report_end_date}"
AND td.RteId = "{report_route}"
AND td.`$Type`="DailyReport"
)
SELECT array_symdiff(available_daily_report_docs, available_cb_docs) AS cust_order_missing_docs;
One of the things I want to do is replace the Union query with a Variable so it can be executed only one. The problem I run into is with the DocNum IN (available_daily_docs) part. For whatever reason, whenever there is an IN clause, Analytics does not want to pick the index defined on DocNum. If I do DocNum = "123", the index is used and response is in milliseconds. But if I do DocNum in ["123", "456"], it takes 6 secs and does not use the index.
Can you rewrite as follows without repeatedly scan of the data?
WITH available_report_docs AS (
SELECT td.DocNum, td.`$MdfdTmstmp` AS timestamp
FROM tranData td
WHERE td.`$MdfdTmstmp` BETWEEN "{report_date}" and "{report_end_date}"
AND td.RteId = "{report_route}"
AND td.`$Type`="DailyReport"
UNION ALL
SELECT mch.DocNum, td.`$MdfdTmstmp` AS timestamp
FROM tranData td
UNNEST td.ActvCustOrdr AS mch
WHERE td.`$MdfdTmstmp` BETWEEN "{report_date}" and "{report_end_date}"
AND td.RteId = "{report_route}"
AND td.`$Type`="DailyReport"
),
available_cb_docs AS ( SELECT DISTINCT RAW c.DocNum
FROM available_report_docs AS c
WHERE c.timestamp BETWEEN "{previous_date}" and "{report_end_date}"),
available_daily_report_docs AS ( SELECT DISTINCT RAW c.DocNum FROM available_report_docs AS c)
SELECT ARRAY_SYMDIFF(available_daily_report_docs, available_cb_docs) AS cust_order_missing_docs;
adjust based on which date is earliest {previous_date} or “{report_date}”
If Analytic supports EXCEPT see if works for you
...............
SELECT RAW c1.DocNum FROM available_report_docs AS c1
EXCEPT
SELECT RAW c.DocNum FROM available_report_docs AS c WHERE c.timestamp BETWEEN "{previous_date}" and "{report_end_date}"
Now, if I run the same query again with multiple items in the IN clause. I just copied the same route again in the IN
Select `$MdfdTmstmp`,meta().id from tranData where `$Type` = "DailyReport"
and `$MdfdTmstmp` > "2021-06-07" and RteId in ["91940","91940"]
limit 5;
Query ran for 10.15s and still got the 2 docs back. No index used. Happy to log a support ticket if that will help with the investigation. Bottom line for us is that I have never ever been able to get Analytics indexes to work with an IN clause.