How to join several data types where I know the key?

query

#1

I am migrating from an existing system where I had all data sitting in Java objects in memory (maps with direct keys). I have run into a little challenge as to how to change some of my code to use N1QL.

I am using Community server 5.1.1 and have all data in a data bucket.

I have a User that can have several FishingTrip docs (and there is a userkey on the trip) and each trip can have zero to many Catchdocs (which actually also has a userkey). Each catch doc. references bait, species etc. that I have a key to do the direct lookup for.

I need to find the latest 10 catches for a particular user and show it with info from the trip as well as bait and species.

I have tried to start building a query like this:

SELECT t.*, c.*FROM data AS c JOIN data AS t 
ON KEYS 'FishingTrip:' || c.fishingtripkey 
WHERE c.type='Catch' AND t.type='FishingTrip' AND c.userkey='BA171123846CEBF1C1257CB2002DA330' 
ORDER BY t.date DESC 
LIMIT 10

I have also succeeded in getting the bait name with this query:

SELECT t.*, c.*,b.name as baitname FROM data AS c 
JOIN data AS t ON KEYS 'FishingTrip:' || c.fishingtripkey
JOIN data AS b ON KEYS 'Bait:' || c.bait
WHERE c.type='Catch' AND t.type='FishingTrip' AND c.userkey='BA171123846CEBF1C1257CB2002DA330' 
ORDER BY t.date DESC 
LIMIT 10

However, if I try to add the species name and user name following the same approach they just don’t appear in the result.

SELECT t.*, c.*,b.name as baitname, u.name as username FROM data AS c 
JOIN data AS t ON KEYS 'FishingTrip:' || c.fishingtripkey
JOIN data AS b ON KEYS 'Bait:' || c.bait
JOIN data AS u ON KEYS 'User:' || c.userkey
WHERE c.type='Catch' AND t.type='FishingTrip' AND c.userkey='BA171123846CEBF1C1257CB2002DA330' 
ORDER BY t.date DESC 
LIMIT 10

… and obviously we have not talked performance yet :wink:

I have a feeling that there must be an easier way to do what I am trying to obtain. Any input appreciated :+1:


#2

If right side of JOIN is not present it will return no results.
Example: JOIN data AS u ON KEYS ‘User:’ || c.userkey if ‘User:’ || c.userkey document is JOIN is not present no results.
Are you looking LEFT JOIN.

Also checkout ANSI JOIN IN 5.5

If your Join Keys are not arrays you can do that. If arrays and willing to change data output change projection t, b, c.

SELECT t[0].*, c.*,b[0] AS baitname, u[0] AS username
FROM data AS c
LET t = (SELECT RAW t FROM data AS t USE KEYS 'FishingTrip:' || c.fishingtripkey WHERE t.type="FishingTrip"),
    b = (SELECT RAW b.name FROM data AS b USE KEYS 'Bait:' || c.bait),
    u = (SELECT RAW u.name FROM data AS u USE KEYS 'User:' || c.userkey)
WHERE c.type="Catch" AND c.userkey='BA171123846CEBF1C1257CB2002DA330'
ORDER BY t[0].date DESC
LIMIT 10;

#3

Very interesting approach. I had considered using Let... after I had read about it - but just couldn’t get my head around how to build it :wink:

I need to adjust a couple of things but I’m getting close. And then I need to consider what indexes to use for it to perform nicely.

But thanks a lot for your help! I’ll give it a run tomorrow as it is passed midnight here now :sleeping:


#4

Hi @vsr1

This is my current query that works fine!

SELECT c as catch,t[0] as trip,b[0] AS baitname, u[0] AS username,s[0] as speciesname
FROM data AS c
LET t = (SELECT RAW t FROM data AS t USE KEYS 'FishingTrip:' || c.fishingtripkey),
    s = (SELECT RAW s.name FROM data AS s USE KEYS 'Species:' || c.species),
    b = (SELECT RAW b.name FROM data AS b USE KEYS 'Bait:' || c.bait),
    u = (SELECT RAW u.name FROM data AS u USE KEYS 'User:Private:' || c.userkey)
WHERE c.type="Catch" AND c.userkey='BA171123846CEBF1C1257CB2002DA330'
ORDER BY t[0].date DESC
LIMIT 10

I have created several indexes to try and optimize the query time:

By type, date and user key

CREATE INDEX `def_type_date_user` ON `data`(`type`,`date`,`userkey`) WHERE (`date` is valued)

By type and user key

CREATE INDEX `def_type_user` ON `data`(`type`,`userkey`) WHERE (`userkey` is valued)

**By type, user key and date **

CREATE INDEX `def_type_user_date` ON `data`(`type`,`userkey`,`date`) WHERE (`userkey` is valued)

The response time is around 500ms which I hope can be improved (this is just a test database with less data than our production environment)…

The Explain looks like this.

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan2",
            "index": "def_type_user",
            "index_id": "48654e7ddc27cf7f",
            "index_projection": {
              "primary_key": true
            },
            "keyspace": "data",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "high": "\"Catch\"",
                    "inclusion": 3,
                    "low": "\"Catch\""
                  },
                  {
                    "high": "\"BA171123846CEBF1C1257CB2002DA330\"",
                    "inclusion": 3,
                    "low": "\"BA171123846CEBF1C1257CB2002DA330\""
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Fetch",
            "as": "c",
            "keyspace": "data",
            "namespace": "default"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "Filter",
                  "condition": "(((`c`.`type`) = \"Catch\") and ((`c`.`userkey`) = \"BA171123846CEBF1C1257CB2002DA330\"))"
                },
                {
                  "#operator": "Let",
                  "bindings": [
                    {
                      "expr": "(select raw `t` from `data` as `t` use keys (\"FishingTrip:\" || (`c`.`fishingtripkey`)))",
                      "var": "t"
                    },
                    {
                      "expr": "(select raw (`s`.`name`) from `data` as `s` use keys (\"Species:\" || (`c`.`species`)))",
                      "var": "s"
                    },
                    {
                      "expr": "(select raw (`b`.`name`) from `data` as `b` use keys (\"Bait:\" || (`c`.`bait`)))",
                      "var": "b"
                    },
                    {
                      "expr": "(select raw (`u`.`name`) from `data` as `u` use keys (\"User:Private:\" || (`c`.`userkey`)))",
                      "var": "u"
                    }
                  ]
                },
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "as": "catch",
                      "expr": "`c`"
                    },
                    {
                      "as": "trip",
                      "expr": "(`t`[0])"
                    },
                    {
                      "as": "baitname",
                      "expr": "(`b`[0])"
                    },
                    {
                      "as": "username",
                      "expr": "(`u`[0])"
                    },
                    {
                      "as": "speciesname",
                      "expr": "(`s`[0])"
                    }
                  ]
                }
              ]
            }
          }
        ]
      },
      {
        "#operator": "Order",
        "limit": "10",
        "sort_terms": [
          {
            "desc": true,
            "expr": "((`t`[0]).`date`)"
          }
        ]
      },
      {
        "#operator": "Limit",
        "expr": "10"
      },
      {
        "#operator": "FinalProject"
      }
    ]
  },
  "text": "SELECT c as catch,t[0] as trip,b[0] AS baitname, u[0] AS username,s[0] as speciesname\nFROM data AS c\nLET t = (SELECT RAW t FROM data AS t USE KEYS 'FishingTrip:' || c.fishingtripkey),\n    s = (SELECT RAW s.name FROM data AS s USE KEYS 'Species:' || c.species),\n    b = (SELECT RAW b.name FROM data AS b USE KEYS 'Bait:' || c.bait),\n    u = (SELECT RAW u.name FROM data AS u USE KEYS 'User:Private:' || c.userkey)\nWHERE c.type=\"Catch\" AND c.userkey='BA171123846CEBF1C1257CB2002DA330'\nORDER BY t[0].date DESC\nLIMIT 10"
}

Any suggestions to how to improve this? Or is this as good as it gets :wink:


#5

As you have seen this query only uses def_type_user.
All subqueries does direct Fetch due to USE KEYS.

Most of time taken due to ORDER BY t[0].date DESC As this is on subquery. This is as best you can get.
You can try experiment by removing ORDER BY

Also check outProfiling blog see where the time taken.


#6

Thanks - I’ll check the article on profiling :+1:

Removing the ORDER BY... hardly makes any difference…

I was trying to extend the example by getting the remaining data into the query (there are more “labels”). However, this one causes me a problem:

SELECT c as catch,t[0] as trip,ts as targetspeciesnames, b[0] AS baitname, u[0] AS username,s[0] as speciesname
FROM data AS c
LET t = (SELECT RAW t FROM data AS t USE KEYS 'FishingTrip:' || c.fishingtripkey),
    s = (SELECT RAW s.name FROM data AS s USE KEYS 'Species:' || c.species),
    b = (SELECT RAW b.name FROM data AS b USE KEYS 'Bait:' || c.bait),
    ts = (SELECT RAW ts.name FROM data AS ts USE KEYS 'Species:' || t[0].targetspecies),
    u = (SELECT RAW u.name FROM data AS u USE KEYS 'User:Private:' || c.userkey)
WHERE c.type="Catch" AND c.userkey='BA171123846CEBF1C1257CB2002DA330'

I get an error on the ts... definition: "Ambiguous reference to field t."

I can have a list of ids as target species where I also want to look up the names (as for the s... def.

Can I do that?

… I suppose one approach could be to try and build an array of all the species names - and have the first being the species and the remaining being targets.

Or else I’ll have to handle it in code - though that is not ideal either…


#7

CASE CADE LET is not supported. So in t value is not available in ts

SELECT c as catch,t[0].trip,t[0].targetspeciesnames, b[0] AS baitname, u[0] AS username,s[0] as speciesname
FROM data AS c
LET t = (SELECT t AS trip, ts[0] AS targetspeciesnames FROM data AS t USE KEYS 'FishingTrip:' || c.fishingtripkey
             LET ts = (SELECT RAW ts.name FROM data AS ts USE KEYS 'Species:' || t[0].targetspecies)
        ),
    s = (SELECT RAW s.name FROM data AS s USE KEYS 'Species:' || c.species),
    b = (SELECT RAW b.name FROM data AS b USE KEYS 'Bait:' || c.bait),
    u = (SELECT RAW u.name FROM data AS u USE KEYS 'User:Private:' || c.userkey)
WHERE c.type="Catch" AND c.userkey='BA171123846CEBF1C1257CB2002DA330';

#8

Cool idea - but it does not seem to return any targetspeciesnames.

targetspecies is an array - I guess that could be the reason. I suppose the USE KEYS... can use an array so I’ll just need to create an array of the “Species:…” keys?


#9
SELECT c as catch,t[0].trip,t[0].targetspeciesnames, b[0] AS baitname, u[0] AS username,s[0] as speciesname
FROM data AS c
LET t = (SELECT t AS trip, ts AS targetspeciesnames FROM data AS t USE KEYS 'FishingTrip:' || c.fishingtripkey
             LET ts = (SELECT RAW ts.name FROM data AS ts USE KEYS (ARRAY 'Species:' || v FOR v IN  t.targetspecies END))
        ),
    s = (SELECT RAW s.name FROM data AS s USE KEYS 'Species:' || c.species),
    b = (SELECT RAW b.name FROM data AS b USE KEYS 'Bait:' || c.bait),
    u = (SELECT RAW u.name FROM data AS u USE KEYS 'User:Private:' || c.userkey)
WHERE c.type="Catch" AND c.userkey='BA171123846CEBF1C1257CB2002DA330';

#10

Hmmm… doesn’t work - but I don’t really understand why? I fully understand what you’re doing and I guess that should work…

I’ll play with the concept and see if I can find out why.

Thanks for your input!

… and then looking at my challenges here I may adopt my data structure slightly to fit better into CB. A simple thing would be to add a list of keys for the catches and then use the trips as the “entry” to getting the data. I also have photos that point back to the trip (in the same manner as the catches) - so “joining” them into the same query .

As I understand it then direct lookups are really fast so I could just from code do all of the subsequent lookups as direct lookups. Perhaps that would be a better approach :wink:


#11

Updated above query, try one more time.
If you have complex model and want to join on fields you should consider 5.5 with ANSI JOIN