Join using joined bucket as key

spruzzi23 · June 3, 2018, 4:34pm

Hi,
suppose to have this document representing a trip:

{
    "_id": "trip_id",
    "type":  "Trip",
    ...
}

and these documents representing the tolls during a trip:

{
    "_id":  "toll_id",
    "trip_id": "trip_id",
    ...
}

so it’s a one-to-many relationship.
I’d like to execute a join like this, and i saw in this topic

to use the for key:

SELECT tolls
FROM bucket_name AS b 
USE KEYS 'trip_id' 
JOIN bucket_name AS tolls
ON KEYS tolls.trip_id FOR b

but console shows me:

syntax error - at FOR

If i remove the for, i end up with:

Ambiguous reference to field tolls.

What am i doing wrong?
The final result should be an array of tolls having the selected trip_id

Thanks in advance

vsr1 · June 4, 2018, 12:30am

This is Index JOIN. Syntax: ON KEY <expr> FOR <bucket>

There is no S

https://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/from.html

CREATE INDEX ix1 ON bucket_name(trip_id);
SELECT tolls
FROM bucket_name AS b 
USE KEYS 'trip_id' 
NEST bucket_name AS tolls
ON KEY tolls.trip_id FOR b;

spruzzi23 · June 4, 2018, 6:25am

Hi,
i tried and it works correctly.
The problem now is that the query select every document that has trip_id="trip_id", so if there are other documents with different type field, the query select those too.
I tried to add a where clause like this, following this topic

SELECT tolls
FROM bucket_name AS b 
USE KEYS 'trip_id'  
LEFT OUTER NEST bucket_name tolls ON KEY tolls.trip_id FOR b 
WHERE  b.type = 'Trip' AND (tolls IS MISSING OR (tolls.type = "Toll" OR tolls.type IS MISSING))

but the query still selects all the documents. If i add only tolls.type = "Toll" nothing is returned

Another consideration: my final goal is to search the trip by its id, and then execute a join with 4-5 different types of documents (for example tolls, refuellings, routes…), ended up with a similar query:

SELECT tolls, refuellings
FROM bucket_name AS b 
USE KEYS 'trip_id'  
LEFT OUTER NEST bucket_name tolls ON KEY tolls.trip_id FOR b 
LEFT OUTER NEST bucket_name refuellings ON KEY refuellings.trip_id FOR b 
WHERE  b.type = 'Trip' 
   AND (tolls IS MISSING OR (tolls.type = "Toll" OR tolls.type IS MISSING)) 
   AND (refuellings IS MISSING OR (refuellings.type = "Refuelling" OR refuellings.type IS MISSING))

Do you think would be more efficient to separate the query, taking first the trip and then execute different queries for different documents?

vsr1 · June 4, 2018, 12:40pm

You can’t use post JOIN filters on NEST because by that time right side become ARRAY.
LEFT JOIN with GROUP BY on LEFT gives the NEST.

Check this if this works for you

SELECT  at AS tolls , ar AS refuellings
         FROM bucket_name AS t
         WHERE t.trip_id = "trip_id" AND t.type IN ["Toll", "Refuelling"]
         LETTING at = ARRAY avt FOR avt IN (ARRAY_AGG(CASE WHEN t.`type` = 'Toll' THEN t END)) WHEN avt IS VALUED END,
                 ar = ARRAY avr FOR avr IN (ARRAY_AGG(CASE WHEN t.`type` = 'Refuelling' THEN t END)) WHEN avr IS VALUED END

spruzzi23 · June 4, 2018, 2:05pm

Yes, it works great.
If i’d like to select also the trip document, how could i do?
I tried like this:

SELECT  at AS tolls , ar AS refuellings, tr AS trip
         FROM bucket_name AS t
         WHERE t.trip_id = "trip_id" AND t.type IN ["Toll", "Refuelling"] OR META().id = 'trip_id'
         LETTING at = ARRAY avt FOR avt IN (ARRAY_AGG(CASE WHEN t.`type` = 'Toll' THEN t END)) WHEN avt IS VALUED END,
                 ar = ARRAY avr FOR avr IN (ARRAY_AGG(CASE WHEN t.`type` = 'Refuelling' THEN t END)) WHEN avr IS VALUED END,
                 tr = ARRAY avr FOR avr IN (ARRAY_AGG(CASE WHEN t.`type` = 'Trip' THEN t END)) WHEN avr IS VALUED END

It works, but trip is an array(as expected), but i cannot find a way to retrive only the document.
I tried with:

...
     tr = CASE WHEN t.type = 'Trip' THEN t END

but it returns always null.

Also, i was still wondering if there would be a big difference in terms of resources doing three difference query selecting each type of document with no operation of aggregation or execute this query with group by and letting.

vsr1 · June 4, 2018, 2:14pm

You need WHERE (t.trip_id = “trip_id” AND t.type IN [“Toll”, “Refuelling”]) OR META().id = ‘trip_id’

You should try following so that you can create right index, simple and generic

SELECT tr AS trip, (OBJECT ov.type:ov.v FOR ov IN a END).* FROM bucket_name AS tr USE KEYS "trip_id"
LET a = (SELECT t.type, ARRAY_AGG(t) AS v
         FROM bucket_name AS t
         WHERE t.trip_id = "trip_id" AND t.type IN ["Toll", "Refuelling"]
         GROUP BY t.type
         )

GROUP BY needs to accumulate results, If result set are huge it may need memory. In that case individual queries can perform better for streaming

spruzzi23 · June 5, 2018, 6:39pm

Hi,
your query works like a charm.
I was wondering if it’s possible to extend it in a “multi-document” query.
What i’d like to do is to search for example every trip that has a specific date (for example a field name date).
So, in this case, i would have a where clause like this:

...
WHERE t.date = '2018-06-05'
...

and add to the retrieved documents the fields with the type name.
So, for example, if i have three trips:

{
   "_id": "trip_1",
   "type": "Trip",
   "date": "2018-06-05"
},
{
   "_id": "trip_2",
   "type": "Trip",
   "date": "2018-06-05"
},
{
   "_id": "trip_3",
   "type": "Trip",
   "date": "2018-06-06"
}

and the following tolls:

{
   "_id": "toll_1",
   "type": "Toll",
   "trip_id": "trip_1"
},
{
   "_id": "toll_2",
   "type": "Toll",
   "trip_id": "trip_1"
},
{
   "_id": "toll_3",
   "type": "Toll",
   "trip_id": "trip_2"
},
{
   "_id": "toll_4",
   "type": "Toll",
   "trip_id": "trip_3"
}

the result of the query should be:

[
   {
       "_id": "trip_1",
       "type": "Trip",
       "date": "2018-06-05",
       "Toll": [
           {
               "_id": "toll_1",
               "type": "Toll",
               "trip_id": "trip_1"
           }, 
           {
               "_id": "toll_2",
               "type": "Toll",
               "trip_id": "trip_1"
            }
         ]
     },
     {
         "_id": "trip_2",
         "type": "Trip",
         "date": "2018-06-05",
         "Toll": [
            {
                "_id": "toll_3",
                "type": "Toll",
                "trip_id": "trip_2"
            }
         ]
    }
]

The same would be for the type refuellings, so the trip should has a Refuelling field.

Thanks in advance for the patient

vsr1 · June 5, 2018, 7:38pm

INSERT INTO default VALUES ("trip_1",{ "_id": "trip_1", "type": "Trip", "date": "2018-06-05" }),
                           ("trip_2",{ "_id": "trip_2", "type": "Trip", "date": "2018-06-05" }),
                           ("trip_3",{ "_id": "trip_3", "type": "Trip", "date": "2018-06-06" }),
                           ("toll_1",{ "_id": "toll_1", "type": "Toll", "trip_id": "trip_1" }),
                           ("toll_2",{ "_id": "toll_2", "type": "Toll", "trip_id": "trip_1" }),
                           ("toll_3",{ "_id": "toll_3", "type": "Toll", "trip_id": "trip_2" }),
                           ("toll_4",{ "_id": "toll_4", "type": "Toll", "trip_id": "trip_3" });
CREATE INDEX ix1 ON default(trip_id);
CREATE INDEX ix2 ON default(date) WHERE type = "Trip";

SELECT t.tr, (OBJECT v[0].type:v FOR v IN ov END).* FROM (
     SELECT tr, ARRAY_AGG(t) AS a FROM default AS tr JOIN default AS t  ON KEY t.trip_id FOR tr
     WHERE tr.type = "Trip" AND tr.date = "2018-06-05" AND
          t.type IN ["Toll", "Refuelling"]
     GROUP BY tr, t.type) AS t
GROUP BY t.tr
LETTING ov = ARRAY_AGG(t.a);

spruzzi23 · June 6, 2018, 6:09pm

Hi,
i arrange your solution to my case, and my final query would be like that ( i change the alias name, but the query is the same as yours):

SELECT t.document, creator, updater, car, (OBJECT type[0].type:type FOR type IN types END).* FROM (
     SELECT document, ARRAY_AGG(t) AS documents, car, creator, updater
     FROM omnitrace AS document
     LEFT OUTER JOIN omnitrace AS t
     ON KEY t.duty_id FOR document
     LEFT OUTER JOIN omnitrace AS car
     ON KEYS document.car_id
     LEFT OUTER JOIN omnitrace AS creator
     ON KEYS document.created.user_id
     LEFT OUTER JOIN omnitrace AS updater
     ON KEYS document.updated.user_id
     WHERE (document.type = "Duty") OR (t.type IN ["CarReport", "Transport"])
     GROUP BY document, t.type, car, creator, updater) AS t
     
GROUP BY t.document
LETTING types = ARRAY_AGG(t.documents),
        car = t.car,
        creator = t.creator,
        updater = t.updater

I follow your guide to get the document that has a one-to-many relationship, and i thought about add a join with specific KEYS in the case of a one-to-one relationship.
My final goal is to add dynamically the relationship that i need, and i think this way could be a good deal. So, for example if a wanted another field that has a one-to-one rel i just simply add a join using the corresponding field.

Now, my first thought is about performance.
You told me that:

and with a single join (the case that i’ve presented to you) would be much faster compare to select all the documents, and then re-execute a bunch of queries for every one of them to get the various relationship.

But with all those joins, which i think can be max of 10-15, would be still much faster?
Or every join would slower a lot the query?