Document Relationships using Arrays and Views [passing through graph theory]


#1

State of the art

Relationships between documents are lost in NoSQL databases, and the most common approach I know of is to embed documents inside others. For example, a group may contain many users:

group::1
{
  "name": "group 1",
  "users": [
    {
      "name": "user 1",
      ...
    },
    {
      "name": "user 2",
      ...
    },
    ...
  ]
}

This structure brings unwanted problems: for example, updating a user means updating the group that contains it, and consequently rewriting all the other users embedded in that group. IMHO this is a bad technique for use cases like this one, because each entity is independent of the others, so there should be one document per entity:

group::1
{
  "name": "group 1"
}

user::1
{
  "name": "user 1"
}

user::2
{
  "name": "user 2"
}

Problem

Relationships are lost in document-oriented databases. How can we make up for what is missing?

Is there a way to map relations in NoSQL?

Graph Theory vs Document Oriented Database (DOD)

In a graph there are two kinds of entities: vertices and edges, as shown in this picture (thanks to Wikipedia):

Vertices (nodes) are the points of a graph, and edges (relations) are the connections between points. I think a DOD is very similar to a graph: it contains documents, which are the vertices. However, there are no edges in a DOD. I think this is a real gap, because without edges a DOD can't use graph properties; for example, you can't run Dijkstra's or Prim's algorithm on a DOD.

Solution

By creating relation documents we can map a graph inside a DOD, and by using Couchbase views we can have a fully working graph of documents! Here are the relation documents containing the keys from the previous example:

relation::uuid
["group::1", "user::1"]
relation::uuid
["group::1", "user::2"]

Here is the view needed to manage binary undirected relations:

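function (doc, meta) {
  // Relation documents are two-element arrays of document keys.
  // The array check is only needed when relations share a bucket with other documents.
  if (Array.isArray(doc) && doc.length === 2) {
    // Emit both directions so the relation is undirected.
    emit(doc[0], doc[1]);
    emit(doc[1], doc[0]);
  }
}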

And finally all relations are inside the view:

"user::1"   ->  "group::1"
"group::1"  ->  "user::1"
"user::2"   ->  "group::1"
"group::1"  ->  "user::2"

So we can get all the documents related to a user and, with little effort, all the documents of a given type related to another document.
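
As a rough sketch of the two lookups just described, the view rows can be simulated in plain JavaScript outside Couchbase. The helper names `buildRows`, `related`, and `relatedOfType` are hypothetical, and the type filter assumes keys follow the `type::id` convention used in this post. In Couchbase itself the first lookup would correspond to querying the view by key.

```javascript
// Same map logic as the view above, with emit() collected into an array
// so it can run outside Couchbase (a simulation, not the Couchbase API).
function buildRows(relations) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of relations) {
    if (Array.isArray(doc) && doc.length === 2) {
      emit(doc[0], doc[1]);
      emit(doc[1], doc[0]);
    }
  }
  return rows;
}

// All document keys related to the given key.
function related(rows, key) {
  return rows.filter((row) => row.key === key).map((row) => row.value);
}

// Only the related keys of one type, assuming the "type::id" key convention.
function relatedOfType(rows, key, type) {
  return related(rows, key).filter((value) => value.startsWith(type + "::"));
}

const rows = buildRows([
  ["group::1", "user::1"],
  ["group::1", "user::2"],
]);

console.log(related(rows, "group::1"));               // users of group::1
console.log(relatedOfType(rows, "user::1", "group")); // groups of user::1
```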

We can also extend the concept of traditional relations. For example, this situation:

which is mapped with these documents:

// Documents
group::1    { ... }
group::2    { ... }
user::1     { ... }
user::2     { ... }
role::admin { ... }
role::user  { ... }

// Relations
["group::1", "user::1"]
["group::2", "user::1"]
["group::2", "user::2"]
["role::admin", "user::1"]
["role::user", "user::2"]

doesn’t let you manage a user who has different roles in different groups. Imagine that user::2 needs to administer group::1 but must not have the global role role::admin. In a relational database you would probably have to change the schema. With this approach you can simply add a relation of another kind:

["group::1", "role::admin", "user::2"]

Of course, you have to change your application logic, but that is unavoidable! And you must manage referential integrity yourself.
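
The view shown earlier only handles two-element relations, so a three-element relation like this one would need a different map function. One possible generalization, as a sketch only: emit each member as the key and the remaining members as the value. The name `mapRelation` and the local `emit` stand-in are assumptions for running it outside Couchbase; note that with this variant the value is always an array, unlike the binary view above.

```javascript
const rows = [];
// Stand-in for the emit() that Couchbase provides inside a view.
function emit(key, value) {
  rows.push({ key, value });
}

// Hypothetical n-ary variant: each member is emitted as the key,
// with all the other members of the relation as the value.
function mapRelation(doc, meta) {
  if (Array.isArray(doc) && doc.length >= 2) {
    doc.forEach(function (key, i) {
      var others = doc.filter(function (_, j) { return j !== i; });
      emit(key, others);
    });
  }
}

[
  ["group::1", "user::1"],
  ["group::1", "role::admin", "user::2"],
].forEach((doc) => mapRelation(doc, {}));

// Roles that user::2 holds inside group::1 (ternary relations only).
const scopedRoles = rows
  .filter((r) => r.key === "user::2" && r.value.length === 2 && r.value.includes("group::1"))
  .map((r) => r.value.find((k) => k.startsWith("role::")));

console.log(scopedRoles);
```

The application still has to decide what each position or combination of types means; the view only makes the relation reachable from every member.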

Conclusion

From the tests I’ve done, this approach works well for managing document relations and can open the door to graph theory operations. Relations are independent of documents, so they can be created between any pair of documents. Relations aren’t tied to document type and can contain a different number of documents (for example three documents, which extends the concept of a graph, as shown above).
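
To illustrate what a graph operation over relation documents could look like, here is a minimal sketch of a breadth-first search that counts the hops between two documents. It runs on an in-memory list of binary relations rather than on Couchbase, and `buildAdjacency` and `hops` are hypothetical helper names; Dijkstra or Prim would additionally need a weight on each relation.

```javascript
// Build an undirected adjacency list from binary relation documents.
function buildAdjacency(relations) {
  const adjacency = new Map();
  const link = (a, b) => {
    if (!adjacency.has(a)) adjacency.set(a, []);
    adjacency.get(a).push(b);
  };
  for (const [a, b] of relations) {
    link(a, b);
    link(b, a);
  }
  return adjacency;
}

// Breadth-first search: number of hops from start to goal, or -1 if unreachable.
function hops(adjacency, start, goal) {
  const distance = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === goal) return distance.get(current);
    for (const next of adjacency.get(current) || []) {
      if (!distance.has(next)) {
        distance.set(next, distance.get(current) + 1);
        queue.push(next);
      }
    }
  }
  return -1;
}

const adjacency = buildAdjacency([
  ["group::1", "user::1"],
  ["group::2", "user::1"],
  ["group::2", "user::2"],
]);

// Path: group::1 -> user::1 -> group::2 -> user::2
console.log(hops(adjacency, "group::1", "user::2"));
```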

Last but not least, with a bit of programming effort we can build joins between entities by combining view queries with multiple MultiGet calls, so joins could be almost entirely in-RAM operations (IMHO, but this should be tested).
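
The shape of such a join might look like the sketch below. Everything here is an in-memory stand-in: `bucket`, `viewRows`, `queryView`, and `multiGet` are hypothetical names, and a real application would replace them with a view query and a bulk get from its Couchbase SDK.

```javascript
// In-memory stand-ins for the bucket and for the rows of the relation view.
const bucket = new Map([
  ["group::1", { name: "group 1" }],
  ["user::1", { name: "user 1" }],
  ["user::2", { name: "user 2" }],
]);
const viewRows = [
  { key: "group::1", value: "user::1" },
  { key: "user::1", value: "group::1" },
  { key: "group::1", value: "user::2" },
  { key: "user::2", value: "group::1" },
];

// Hypothetical view lookup: keys related to the given key.
function queryView(key) {
  return viewRows.filter((row) => row.key === key).map((row) => row.value);
}

// Hypothetical bulk fetch: one call for many keys.
function multiGet(keys) {
  return keys.map((key) => ({ key, doc: bucket.get(key) }));
}

// "Join": one view lookup for the related keys, then one bulk fetch.
function join(key) {
  const [root] = multiGet([key]);
  return { ...root, related: multiGet(queryView(key)) };
}

console.log(JSON.stringify(join("group::1"), null, 2));
```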

At the moment I only need it (and so have only tested it) for binary relations, but it can also be used to manage n-ary or directed relationships.
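
For directed relationships, one untested possibility is to treat the array order as the direction and use a composite key that records it. In this sketch the convention that `["a", "b"]` means an edge from a to b, the `"out"`/`"in"` markers, and the name `mapDirected` are all assumptions; `emit` is a local stand-in for the one Couchbase provides.

```javascript
const rows = [];
// Stand-in for the emit() that Couchbase provides inside a view.
function emit(key, value) {
  rows.push({ key, value });
}

// Assumed convention: ["a", "b"] is an edge from a to b.
function mapDirected(doc, meta) {
  if (Array.isArray(doc) && doc.length === 2) {
    emit([doc[0], "out"], doc[1]); // edges leaving doc[0]
    emit([doc[1], "in"], doc[0]);  // edges arriving at doc[1]
  }
}

mapDirected(["group::1", "user::1"], {});
mapDirected(["group::1", "user::2"], {});

// Documents that group::1 points to.
const outgoing = rows
  .filter((r) => r.key[0] === "group::1" && r.key[1] === "out")
  .map((r) => r.value);

console.log(outgoing);
```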

  • What do you think about this vision of Couchbase?
  • Have you tried something similar in your work? If yes, how does it work?
  • Why is this a bad solution? (I’m more interested in the possible problems than in the benefits, because data integrity is the primary objective)

#2

Interesting post.

I did something similar with RDF graphs, I wrote a blog post here:

The biggest issue I encountered was complex query performance at scale.

Andy


#3

Thank you for linking your work! It’s a similar approach, but the way relationships are mapped differs:

  • With your approach, JSON-LD relationships are mapped inside the documents themselves
  • With my approach, relationships are separate entities

I can’t guess how this would affect query performance.

Do you know if there is any plan to integrate relationships into Couchbase?

Thank you.
Matt


#4

I tested this solution and it works well. I like the flexibility of the ‘always possible’ N-N relationships, because you can simply add a relationship document when you need it without changing the application logic. There is a drawback: you need to implement your own constraints in the application logic to prevent relationships from being abused.

I noticed that arrays don’t offer a great advantage over JSON objects, and sometimes it is useful to store other data about the relationship, for example its weight (or cost). So I suggest using a relationship document that has its own type:

{
  "type": "relationship",
  "documents": ["key1", "key2"],
  "all-the-data-you-need": { ... }
}
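
A view for this document shape might look like the following sketch. The `weight` field is a hypothetical example of the extra relationship data, `mapRelationship` is an assumed name, and `emit` is a local stand-in for the one Couchbase provides inside a view.

```javascript
const rows = [];
// Stand-in for the emit() that Couchbase provides inside a view.
function emit(key, value) {
  rows.push({ key, value });
}

// Index typed relationship documents in both directions,
// carrying the (hypothetical) weight along with the related key.
function mapRelationship(doc, meta) {
  if (doc.type === "relationship" &&
      Array.isArray(doc.documents) && doc.documents.length === 2) {
    emit(doc.documents[0], { id: doc.documents[1], weight: doc.weight });
    emit(doc.documents[1], { id: doc.documents[0], weight: doc.weight });
  }
}

mapRelationship(
  { type: "relationship", documents: ["group::1", "user::1"], weight: 5 },
  {}
);
mapRelationship({ type: "user", name: "user 1" }, {}); // ignored: not a relationship

console.log(rows);
```

Checking `doc.type` replaces the `Array.isArray(doc)` test of the original view as the way to tell relationships apart from other documents in the same bucket.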

As for performance, there isn’t much difference between using objects and using arrays.

Hope this helps someone! :wink: