Data modeling a one-to-many relationship

In a one-to-many relationship, is it better to embed or reference a document if consistency is important but the child document is always read with the parent document?

For example, suppose an event is always read with its address, but you also want to get a list of all addresses and keep them consistent. Would it be better to embed the address in the event document with an address id and a type property, and then run a script that updates every embedded object of type address with that id whenever the address is mutated? Or would a better approach be to only reference the address from the event document? I noticed in the documentation examples that an object embedded in another document does not have an id property. I’m not sure whether including an identifier for an embedded object is the right approach to ensure consistency.

// embed address in event document including addressId
{
  "type": "event",
  "id": "event1",
  "address": {
    "type": "address",
    "id": "address1",
    "address": "1234 Main St"
  }
}

{
  "type": "event",
  "id": "event2",
  "address": {
    "type": "address",
    "id": "address1",
    "address": "1234 Main St"
  }
}

Is this the correct way to model the event document: embed an address object, and whenever an address is updated, run a script that updates every embedded object of type address with that address id? Or would it be better to model the event document so that it references an address document?

// reference address in event document
{
  "type": "event",
  "id": "event1",
  "addressId": "address1"
}

{
  "type": "event",
  "id": "event2",
  "addressId": "address1"
}

{
  "type": "address",
  "id": "address1",
  "address": "1234 Main St"
}

There’s really no such thing as an embedded document. It’s just a nested object (map, dictionary) in the JSON and has no special meaning to Couchbase Lite or Sync Gateway.

You don’t need to extract these into separate documents and try to keep those in sync; you can set up a view to do that automatically. The map function looks for the nested objects representing addresses and emits each one. The key would be whatever you want to search addresses by, for example the email, and the value is the address object. (Or if your addresses have unique IDs, you could use those as the keys so you can look them up by ID.)

Now you can query or enumerate addresses as though they were documents using the Query API.
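
To make the indexing idea above concrete, here is a minimal sketch in plain Python that simulates what such a map function does. It is not the Couchbase Lite view API: `map_addresses`, `build_index`, `distinct_addresses`, and the in-memory `docs` list are hypothetical names used only for illustration, and the key chosen here is the nested address id.

```python
# Plain-Python simulation of a view index; NOT the Couchbase Lite API.
# All function names below are hypothetical.

def map_addresses(doc):
    """Yield (key, value) pairs for the address nested in an event document."""
    address = doc.get("address")
    if doc.get("type") == "event" and isinstance(address, dict):
        # Key by the nested object's id so addresses can be looked up by id.
        yield address["id"], address


def build_index(docs):
    """Run the map function over every document and sort the rows by key."""
    rows = [row for doc in docs for row in map_addresses(doc)]
    return sorted(rows, key=lambda row: row[0])


def distinct_addresses(index):
    """Collapse rows that share a key, keeping the first value seen."""
    result = {}
    for key, value in index:
        result.setdefault(key, value)
    return result


docs = [
    {"type": "event", "id": "event1",
     "address": {"type": "address", "id": "address1", "address": "1234 Main St"}},
    {"type": "event", "id": "event2",
     "address": {"type": "address", "id": "address1", "address": "1234 Main St"}},
    {"type": "event", "id": "event3",
     "address": {"type": "address", "id": "address2", "address": "4321 Pine St"}},
]

index = build_index(docs)
addresses = distinct_addresses(index)
```

In this sketch the index holds one row per event, so an address shared by several events appears more than once. The last helper collapses those rows by key to produce the list of all addresses.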

Ok thanks.

I’ll add a view to update a nested object using a key that’s unique to that object.

It seems that Couchbase Lite could add significant complexity to the client to ensure data consistency if nested objects are used heavily, especially when maintaining a large project with many developers. I think it’s great for projects with one or two developers, but I could foresee issues with knowledge transfer on large projects with many developers. Are there any large-scale projects currently using Couchbase Lite, and how are they handling these issues besides good documentation?

Can you explain why? It sounds like you’re implying that data would have to be copied between documents, but I don’t think that’s necessary except in special cases.

"Are there any large scale projects that are currently using Couchbase Lite?"

We have some big customers like GE and RyanAir; I don’t know the size of their deployments, though.

But this JSON data model is identical to the one used by Couchbase Server, which is regularly used at huge scale by large customers like PayPal and LinkedIn.

I don’t think you need to. As I said, if you want to query addresses, you write a view that indexes them, then query that view. There’s no need to copy the addresses into new documents.

Let’s say an event has an address, the address is usually retrieved along with the event, and you’d also like to display a view that lists all addresses.

Since an event’s address is usually retrieved whenever an event page is displayed, I’ve nested the address object in the event document. Since an address can belong to more than one event, the client would need to ensure that all address objects with that address key are updated whenever the address changes.

// display event1 including the event address on a page
{
  "type": "event",
  "id": "event1",
  "address": {
    "type": "address",
    "id": "address1",
    "address": "1234 Main St"
  }
}

// display event2 including the event address on a page
{
  "type": "event",
  "id": "event2",
  "address": {
    "type": "address",
    "id": "address1",
    "address": "1234 Main St"
  }
}

// display event3 including the event address on a page
{
  "type": "event",
  "id": "event3",
  "address": {
    "type": "address",
    "id": "address2",
    "address": "4321 Pine St"
  }
}

// display all addresses on a page
[
  {
    "type": "address",
    "id": "address1",
    "address": "1234 Main St"
  },
  {
    "type": "address",
    "id": "address2",
    "address": "4321 Pine St"
  }
]

// update address1
{
  "type": "address",
  "id": "address1",
  "address": "9876 Maple St"
}

// create a view to update all nested objects/documents of type address with id: address1

This would add complexity to the client to ensure data consistency. Clients that use web services or normalized databases usually don’t have to worry about this, because they never need to update multiple nested copies of the same object. My concern is that large projects could become hard to maintain, since client-side developers would need to know when to create views that keep embedded objects consistent when an object is updated. I’m trying to understand the best way to model a case like this.

If an address can belong to more than one event, and addresses can be updated, you probably shouldn’t embed the addresses within the events. As you said, it creates a lot of effort to manually update all events with an address any time that address changes. (This is especially bad when you take into account replication, since it multiplies the amount of data that has to be transmitted when an address changes.)
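
As a rough illustration of that manual effort, the following plain-Python sketch rewrites every event that embeds a given address. The helper name `update_embedded_address` and the in-memory `events` list are assumptions made for illustration, not part of Couchbase Lite. In a real app, each modified event would presumably also have to be saved and replicated.

```python
def update_embedded_address(events, address_id, new_fields):
    """Apply new_fields to every embedded address whose id is address_id.

    Returns the ids of the event documents that had to be rewritten.
    """
    touched = []
    for event in events:
        address = event.get("address")
        if isinstance(address, dict) and address.get("id") == address_id:
            address.update(new_fields)
            touched.append(event["id"])
    return touched


events = [
    {"type": "event", "id": "event1",
     "address": {"type": "address", "id": "address1", "address": "1234 Main St"}},
    {"type": "event", "id": "event2",
     "address": {"type": "address", "id": "address1", "address": "1234 Main St"}},
    {"type": "event", "id": "event3",
     "address": {"type": "address", "id": "address2", "address": "4321 Pine St"}},
]

# One logical change to address1 rewrites every event that embeds it.
touched = update_embedded_address(events, "address1", {"address": "9876 Maple St"})
```

Here a single address change modifies two event documents, and the number of rewritten documents grows with the number of events that share the address.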

Instead I’d use relations: make the addresses be separate documents, and store the docID of the address in the event doc.
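
A minimal sketch of this relational layout, in plain Python with a dictionary standing in for the database. The `db` dictionary, the `addressId` property name (borrowed from the question earlier in the thread), and the `load_event_with_address` helper are illustrative assumptions rather than Couchbase Lite API calls.

```python
# A dict keyed by docID stands in for the database (hypothetical layout).
db = {
    "event1": {"type": "event", "addressId": "address1"},
    "event2": {"type": "event", "addressId": "address1"},
    "address1": {"type": "address", "address": "1234 Main St"},
}


def load_event_with_address(db, event_id):
    """Fetch an event and resolve its address with a second lookup by docID."""
    event = dict(db[event_id])  # shallow copy so the stored doc is not modified
    event["address"] = db.get(event["addressId"])
    return event


# Updating the address touches exactly one document.
db["address1"]["address"] = "9876 Maple St"

event1 = load_event_with_address(db, "event1")
event2 = load_event_with_address(db, "event2")
```

Both events see the new address after a single write, at the cost of one extra lookup each time an event is displayed.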

Ok thanks. As a general rule with Couchbase Mobile, I’ll use relations for one-to-many and many-to-many relationships when the object can be updated.

It seems that embedding nested objects could cause issues as requirements change. For instance, suppose an address is implemented as a nested object when the event model is created, because there is no requirement to update an address from an all-addresses view. If the project requirements later change to allow an address to be updated, the client would need a lot of extra work to update every address object with that address key.

I’m concerned about potential maintenance issues with evolving project requirements, especially when working on a team with junior developers. It seems that a nested object shouldn’t be used in a one-to-many or many-to-many relationship unless you’re nearly certain the project requirements won’t change to allow the object to be mutated.

There doesn’t seem to be much material available on best practices for document data modeling beyond general guidelines on when to use a relation versus an embedded object, though I understand that a lot depends on individual use cases.