LEFT OUTER JOIN + WHERE clause

Hello! I am trying to perform a LEFT OUTER JOIN while filtering on the right part of the join.

I have created the following index to achieve this:

CREATE INDEX `idx_store_order` ON `myBucket`(("Store::" || `storeId`)) 
WHERE ((`docType` = "Order") or (`docType` is missing))

and I am trying to execute the following query:

SELECT store.status, order.userId, store.docId 
FROM myBucket store
LEFT OUTER JOIN myBucket order ON KEY ("Store::" || order.storeId) FOR store
WHERE store.docType="Store"
AND (order.docType="Order" OR order.docType IS MISSING)
AND order.clientId="9281ae36-a418-4ea3-93f0-bfd7b1a38248"

I have 30 documents with docType="Store", but when I perform this query I don’t get the 30 results. If I remove the last clause and group by store, then I get the 30 results, so it’s the last clause that affects the final results.

I have also tried the following statement (unsucessfully) as the last clause:

(AND order.clientId="9281ae36-a418-4ea3-93f0-bfd7b1a38248" OR order.docType IS MISSING)

Here is the EXPLAIN clause for the latter:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idx_docType",
          "index_id": "e498d0c0ee2f0d9d",
          "keyspace": "myBucket",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"Store\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"Store\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "store",
                "keyspace": "myBucket",
                "namespace": "default"
              },
              {
                "#operator": "IndexJoin",
                "as": "order",
                "for": "store",
                "keyspace": "myBucket",
                "namespace": "default",
                "on_key": "(\"Store::\" || (`order`.`storeId`))",
                "outer": true,
                "scan": {
                  "index": "idx_store_order",
                  "index_id": "a97fce5158e6e573",
                  "using": "gsi"
                }
              },
              {
                "#operator": "Filter",
                "condition": "((((`store`.`docType`) = \"Store\") and (((`order`.`docType`) = \"Order\") or ((`order`.`docType`) is missing))) and (((`order`.`clientId`) = \"9281ae36-a418-4ea3-93f0-bfd7b1a138248\") or (`order` is missing)))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`store`.`status`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT store.status\nFROM myBucket store\nLEFT OUTER JOIN myBucket order ON KEY (\"Store::\" || order.storeId) FOR store\nWHERE store.docType=\"Store\"\nAND (order.docType=\"Order\" OR order.docType IS MISSING)\nAND (order.clientId=\"9281ae36-a418-4ea3-93f0-bfd7b1a138248\" OR order IS MISSING)"
  }
]

Am I missing something? Why am I not getting the expected results? Let me know if you need more info. Thank you!

Our WHERE clause is not considered part of the JOIN predicate, so you have to do the following. You need to escape order throughout, or use a different alias.

SELECT store.status, order.userId, store.docId 
FROM myBucket store
LEFT OUTER JOIN myBucket order ON KEY ("Store::" || order.storeId) FOR store
WHERE store.docType="Store"
AND (
(order IS MISSING)
OR
 ((order.docType="Order" OR order.docType IS MISSING)
AND order.clientId="9281ae36-a418-4ea3-93f0-bfd7b1a38248")
2 Likes

Hi @geraldss, thank you for your help! I’ve changed the actual docType to make it more clear, I understand it is a reserved word :slight_smile:

I’ve tried to run this query, but now I get an error stating: "No index available for join term order", meaning I’ll have to do some adjustments to the index - can you help me with that?

I was hoping that I didn’t have to add the fields I am trying to filter to the index, since I’ll be filtering by other fields in the Order document, and some of them are optional. Will this possible with Couchbase? Let me know if you need more info or a more specific use case.

PS: I have noticed you answered in SO as well, where do you prefer that we continue this discussion?

Either SO or here is fine. No need to duplicate.

CREATE INDEX idx_storeId ON myBucket( "Store::" ||storeId);

Creating this index now gives me some results, but it still doesn’t give me the 30 results as expected - it gives fewer, and only for the existing orders. Here’s the EXPLAIN query:


[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idx_docType",
          "index_id": "e498d0c0ee2f0d9d",
          "keyspace": "myBucket",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"Store\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"Store\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "store",
                "keyspace": "myBucket",
                "namespace": "default"
              },
              {
                "#operator": "IndexJoin",
                "as": "order",
                "for": "store",
                "keyspace": "myBucket",
                "namespace": "default",
                "on_key": "(\"Store::\" || (`order`.`storeId`))",
                "outer": true,
                "scan": {
                  "index": "idx_orderId",
                  "index_id": "73f485c7a11b59cd",
                  "using": "gsi"
                }
              },
              {
                "#operator": "Filter",
                "condition": "((((`store`.`docType`) = \"Store\") and ((`order` is missing) or ((((`order`.`docType`) = \"Order\") or ((`order`.`docType`) is missing)) and ((`order`.`clientId`) = \"9281ae36-a418-4ea3-93f0-bfd7b1a38248\")))) and ((`store`.`status`) = \"published\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`store`.`status`)"
                  },
                  {
                    "expr": "(`order`.`clientId`)"
                  },
                  {
                    "as": "orderStatus",
                    "expr": "(`order`.`status`)"
                  },
                  {
                    "expr": "(`store`.`docId`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT store.status, order.clientId, order.status as orderStatus, store.docId \nFROM myBucket store\nLEFT OUTER JOIN myBucket order ON KEY (\"Store::\" || order.storeId) FOR store\nWHERE store.docType=\"Store\"\nAND (\n(order IS MISSING)\nOR\n ((order.docType=\"Order\" OR order.docType IS MISSING)\nAND order.clientId=\"9281ae36-a418-4ea3-93f0-bfd7b1a38248\"))\nAND store.status=\"published\""
  }
]

Am I missing something still?

Can you try to drop index orderId?

You should have only two indexes.

CREATE INDEX idx_storeType ON myBucket( docType ) WHERE docType = "Store";

CREATE INDEX idx_orderStoreId ON myBucket( "Store::" || storeId );

I am still getting the same results with these two indices. Here’s the EXPLAIN result:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idx_storeType",
          "index_id": "c10063a8028a9e42",
          "keyspace": "myBucket",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"Store\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"Store\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "store",
                "keyspace": "myBucket",
                "namespace": "default"
              },
              {
                "#operator": "IndexJoin",
                "as": "order",
                "for": "store",
                "keyspace": "myBucket",
                "namespace": "default",
                "on_key": "(\"Store::\" || (`order`.`storeId`))",
                "outer": true,
                "scan": {
                  "index": "idx_storeId",
                  "index_id": "c5f55d924c583174",
                  "using": "gsi"
                }
              },
              {
                "#operator": "Filter",
                "condition": "((((`store`.`docType`) = \"Store\") and ((`order` is missing) or ((((`order`.`docType`) = \"Order\") or ((`order`.`docType`) is missing)) and ((`order`.`clientId`) = \"9281ae36-a418-4ea3-93f0-bfd7b1a38248\")))) and ((`store`.`status`) = \"published\"))"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`store`.`status`)"
                  },
                  {
                    "expr": "(`order`.`clientId`)"
                  },
                  {
                    "as": "subStatus",
                    "expr": "(`order`.`status`)"
                  },
                  {
                    "expr": "(`store`.`docId`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT store.status, order.clientId, order.status as subStatus, store.docId \nFROM myBucket store\nLEFT OUTER JOIN myBucket order ON KEY (\"Store::\" || order.storeId) FOR store\nWHERE store.docType=\"Store\"\nAND (\n(order IS MISSING)\nOR\n ((order.docType=\"Order\" OR order.docType IS MISSING)\nAND order.clientId=\"9281ae36-a418-4ea3-93f0-bfd7b1a38248\"))\nAND store.status=\"published\""
  }
]

It seems like only the second index is being used - i think it’s the same as the one you provided before, right?

Both indexes are being used, and those are the correct indexes.

Which condition is causing it to discard documents? Can you post this result:

SELECT store.status, order.userId, store.docId 
FROM myBucket store
LEFT OUTER JOIN myBucket order ON KEY ("Store::" || order.storeId) FOR store
WHERE store.docType="Store"

Sorry, misread the top of the explain query. Here’s the result of the EXPLAIN:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "idx_storeType",
          "index_id": "c10063a8028a9e42",
          "keyspace": "myBucket",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"Store\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"Store\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Fetch",
                "as": "store",
                "keyspace": "myBucket",
                "namespace": "default"
              },
              {
                "#operator": "IndexJoin",
                "as": "order",
                "for": "store",
                "keyspace": "myBucket",
                "namespace": "default",
                "on_key": "(\"Store::\" || (`order`.`storeId`))",
                "outer": true,
                "scan": {
                  "index": "idx_storeId",
                  "index_id": "e900dc0b099e1df0",
                  "using": "gsi"
                }
              },
              {
                "#operator": "Filter",
                "condition": "((`store`.`docType`) = \"Store\")"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(`store`.`status`)"
                  },
                  {
                    "expr": "(`order`.`clientId`)"
                  },
                  {
                    "expr": "(`store`.`docId`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT store.status, order.clientId, store.docId \nFROM myBucket store\nLEFT OUTER JOIN myBucket order ON KEY (\"Store::\" || order.storeId) FOR store\nWHERE store.docType=\"Store\""
  }
]

Please post the result of the query, not the EXPLAIN.

It gives 1108 results… For the sake of brevity I’ll post the first results if that’s okay with you!

{
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "2a4dedb0-f6b7-4232-bb55-3933b4e33976"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "e1a85700-43f0-420a-a3cc-bcc672ce2bf6"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "32dcc414-f177-41d3-a29a-d7ec96d7dc56"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "838ac572-8169-43de-a61b-fbe98a4f251f"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "3beaf519-70d3-4a48-b006-829e0cb095c0"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "810874b7-0a56-4a2d-bfae-32b9e5e9d727"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "c1381ab4-2c0f-4bf0-aa6d-956eaa05223d"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "34fd0009-b15d-4849-8c98-05121e574e43"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "899aeb81-5e22-4ff8-9d48-099cbb810052"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "1c7e17c4-b9f1-4768-bcc0-d61264096814"
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "clientId": "486d65e9-37d7-4251-95c7-a03b293886cf"
  },

How about these results:

SELECT store.status, order.userId, store.docId 
FROM myBucket store
LEFT OUTER JOIN myBucket order ON KEY ("Store::" || order.storeId) FOR store
WHERE store.docType="Store"
AND order IS MISSING
LIMIT 10;

Here it is (total of 5 results):

[
  {
    "docId": "1bd5187b-0369-4d23-86c3-f0b50eede54c",
    "status": "draft"
  },
  {
    "docId": "1c9e498b-a175-4e1a-b5cc-7428de03a88c",
    "status": "draft"
  },
  {
    "docId": "9621b964-cbe7-456e-99c9-f4927e1dab60",
    "status": "draft"
  },
  {
    "docId": "9c038136-0ca2-4191-b61c-d6e6ce034a04",
    "status": "published"
  },
  {
    "docId": "ed7dd680-e354-4edb-9237-001a415f69bf",
    "status": "published"
  }
]

Great, so the LEFT OUTER JOIN is working. Now let’s look at some orders. Please post the results.

SELECT store.status, order, store.docId 
FROM myBucket store
LEFT OUTER JOIN myBucket order ON KEY ("Store::" || order.storeId) FOR store
WHERE store.docType="Store"
AND order IS NOT MISSING
LIMIT 5;

Here it is (omitted some fields for privacy):

[
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "order": {
      "docId": "06238910-410c-449a-ae35-b9edadf74426",
      "docType": "DiscountCode",
      "storeId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
      "clientId": "2a4dedb0-f6b7-4232-bb55-3933b4e33976"
    }
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "order": {
      "docId": "11e3fabf-fb7f-4b29-8675-c97e41e40509",
      "docType": "DiscountCode",
      "storeId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
      "clientId": "e1a85700-43f0-420a-a3cc-bcc672ce2bf6"
    }
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "order": {
      "docId": "222582db-7daa-473a-8dde-044ec2d76874",
      "docType": "DiscountCode",
      "storeId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
      "clientId": "32dcc414-f177-41d3-a29a-d7ec96d7dc56"
    }
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "order": {
      "docId": "28ca4339-bd24-4294-8e75-d1782adfa2bf",
      "docType": "DiscountCode",
      "storeId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
      "clientId": "838ac572-8169-43de-a61b-fbe98a4f251f"
    }
  },
  {
    "docId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
    "status": "published",
    "order": {
      "docId": "306284c4-b6c7-4adf-89e7-a6716a7ec95c",
      "docType": "DiscountCode",
      "storeId": "0af38eef-9d68-4852-a9c5-baed11f0162f",
      "clientId": "3beaf519-70d3-4a48-b006-829e0cb095c0"
    }
  }
]

Since we don’t specified the docType it appears it finds all documents with a relationship with a Store… Here’s the EXPLAIN if it helps:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "index": "idx_storeType",
              "index_id": "c10063a8028a9e42",
              "keyspace": "myBucket",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"Store\""
                    ],
                    "Inclusion": 3,
                    "Low": [
                      "\"Store\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Fetch",
                    "as": "store",
                    "keyspace": "myBucket",
                    "namespace": "default"
                  },
                  {
                    "#operator": "IndexJoin",
                    "as": "order",
                    "for": "store",
                    "keyspace": "myBucket",
                    "namespace": "default",
                    "on_key": "(\"Store::\" || (`order`.`storeId`))",
                    "outer": true,
                    "scan": {
                      "index": "idx_storeId",
                      "index_id": "e900dc0b099e1df0",
                      "using": "gsi"
                    }
                  },
                  {
                    "#operator": "Filter",
                    "condition": "(((`store`.`docType`) = \"Store\") and (`order` is not missing))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "(`store`.`status`)"
                      },
                      {
                        "expr": "`order`"
                      },
                      {
                        "expr": "(`store`.`docId`)"
                      }
                    ]
                  },
                  {
                    "#operator": "FinalProject"
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Limit",
          "expr": "5"
        }
      ]
    },
    "text": "SELECT store.status, order, store.docId \nFROM myBucket store\nLEFT OUTER JOIN myBucket order ON KEY (\"Store::\" || order.storeId) FOR store\nWHERE store.docType=\"Store\"\nAND order IS NOT MISSING\nLIMIT 5;"
  }
]

Looks like it is working correctly.

Try querying only for orders, without the JOIN, and verify that the matching orders are there for that clientId and docType.

Yes, I can confirm that I have matching results (I had to add docType=Order to the previous query in order to get just orders.)

Where do we go from here? Just to give you a bit more insight, my end query will likely be more complex with ORDER BY on the Order document, and with filters applied to both the Order and Store documents… Let me know if you need more info

The ORDER BY will not affect correctness. There is something we are both missing in that predicate…

Should I try to provide some data in order to have a similar environment to solve this problem? Let me know if I can help in anyway

What version of Couchbase are you using? @vsr1 any suggestions?