Cumulative Distribution Question

BitRunner2084 · April 19, 2020, 4:16am

Using an analytic query, I am trying to display the cumulative distribution of item quantities for all orders in a dataset named “orders”, which I created on a bucket named dtOrders; that bucket contains the data from the sample Orders bucket. Here’s my code so far.
CODE:

SELECT q.qty, q.custid, CUME_DIST() OVER (
  PARTITION BY q.custid
  ORDER BY q.qty
) AS `Quantity`
FROM orders AS q;

When I run this query, I get the following output.
OUTPUT:

[
  {
    "Quantity": 1,
    "custid": "C41"
  },
  {
    "Quantity": 1,
    "custid": "C41"
  },
  {
    "Quantity": 1,
    "custid": "C13"
  },
  {
    "Quantity": 1,
    "custid": "C13"
  },
  {
    "Quantity": 1,
    "custid": "C13"
  },
  {
    "Quantity": 1,
    "custid": "C31"
  },
  {
    "Quantity": 1,
    "custid": "C35"
  },
  {
    "Quantity": 1,
    "custid": "C37"
  }
]

As you can see, my current code returns 1 as the “Quantity” value for every row in the dataset, which isn’t what I want. Would someone be willing to help me out here? I would appreciate it. Thank you.

vsr1 · April 19, 2020, 2:25pm

Check your data.

If there is no window order clause, all objects in the partition are considered peers (i.e., ties). When the window order clause produces ties, CUME_DIST returns the same value for every tied object; if all objects in the partition are tied, that value is 1.
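The tie rule can be modeled in a few lines of Python. This is a sketch of the standard CUME_DIST definition (the number of rows whose sort key is less than or equal to the current row's key, divided by the partition size), not Couchbase's implementation; the function name `cume_dist` is hypothetical.

```python
from bisect import bisect_right


def cume_dist(values):
    """Return CUME_DIST for each value in one partition: the fraction of
    rows whose sort key is <= this row's key (peers are counted together)."""
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right counts every row that sorts at or before this value,
    # so tied rows (peers) all receive the same result.
    return [bisect_right(ordered, v) / n for v in values]


# Distinct and tied sort keys within one partition.
print(cume_dist([10, 20, 20, 30]))  # [0.25, 0.75, 0.75, 1.0]

# Every row is a peer (the same as having no ORDER BY): all results are 1.
print(cume_dist([5, 5, 5]))  # [1.0, 1.0, 1.0]
```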

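q.qty is MISSING in your data (that is also why no qty field appears in your output: a MISSING value is omitted from the result object).
Since every q.qty is MISSING, all objects tie in the window ORDER BY q.qty, so the ordering has no effect.
Your window function therefore behaves like CUME_DIST() OVER (PARTITION BY q.custid) and returns the same value, 1, for every object in each partition.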
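To see why the fix belongs in the data access rather than in the window function, here is a small Python model (not Couchbase code). It assumes, purely for illustration, that quantities live in a nested `items` array inside each order, which is one common reason a top-level `q.qty` would be MISSING. The documents, field names, and helpers (`cume_dist_by`, `sort_key_or_missing`) are hypothetical; absent fields are mapped to one shared sentinel key so they tie with each other, as MISSING values do in the query above.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical order documents: quantities sit inside each order's "items"
# array, so there is no top-level "qty" field (it would be MISSING).
orders = [
    {"orderno": 1, "custid": "C13", "items": [{"qty": 2}, {"qty": 5}]},
    {"orderno": 2, "custid": "C13", "items": [{"qty": 1}, {"qty": 2}]},
    {"orderno": 3, "custid": "C41", "items": [{"qty": 4}, {"qty": 4}]},
]


def sort_key_or_missing(doc, field):
    # Stand-in for MISSING: every absent field maps to the same key,
    # so all such rows become peers in the window ordering.
    return (0,) if field not in doc else (1, doc[field])


def cume_dist_by(rows, part_key, sort_key):
    """CUME_DIST per row, partitioned by part_key and ordered by sort_key."""
    parts = defaultdict(list)
    for r in rows:
        parts[part_key(r)].append(sort_key(r))
    sorted_parts = {k: sorted(v) for k, v in parts.items()}
    result = []
    for r in rows:
        keys = sorted_parts[part_key(r)]
        result.append(bisect_right(keys, sort_key(r)) / len(keys))
    return result


# 1) Shape of the original query: order by a top-level qty that is absent.
per_order = cume_dist_by(
    orders, lambda o: o["custid"], lambda o: sort_key_or_missing(o, "qty")
)
print(per_order)  # [1.0, 1.0, 1.0]

# 2) Flatten the items first (analogous to UNNEST), then order by item qty.
items = [{"custid": o["custid"], **it} for o in orders for it in o["items"]]
per_item = cume_dist_by(
    items, lambda i: i["custid"], lambda i: sort_key_or_missing(i, "qty")
)
print(per_item)  # [0.75, 1.0, 0.25, 0.75, 1.0, 1.0]
```

If your documents really have this nested shape, the corresponding SQL++ change would be to unnest the array (for example `FROM orders AS q UNNEST q.items AS i`) and use `i.qty` in both the SELECT list and the window ORDER BY; check the actual field names in your data first.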