Design Patterns for ElasticSearch plugin and Couchbase


#1

I am trying to determine the best way to model sub-objects in a doc for searching in ElasticSearch. According to ElasticSearch there are three patterns:

  1. Inner Objects
  2. Nested Objects
  3. Parent-Child Docs

The following will make a nested object:

curl -XPUT 'localhost:9200/crunchbase' -d '{
  "mappings" : {
    "company" : {
      "properties" : {
        "milestones" : {
          "type" : "nested"
        },
        "acquisitions" : {
          "type" : "nested",
          "properties" : {
            "description" : { "type" : "string" }
          }
        }
      }
    }
  }
}'

According to ElasticSearch, you should not use inner objects because Lucene flattens them in such a way that you cannot query on two properties of the same inner object with an AND constraint. It will always return OR results. It seems that by default the plugin will use inner objects.
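
To make the flattening problem concrete, here is a minimal Python sketch that only simulates the two behaviours. It does not call ElasticSearch, and it compares exact values rather than analyzed tokens. The "year" field and the sample values are hypothetical; only "acquisitions" and "description" come from the mapping above.

```python
# Hypothetical company document. "year" and the values are made up for
# illustration; only "acquisitions"/"description" appear in the mapping.
company = {
    "acquisitions": [
        {"description": "mobile startup", "year": 2010},
        {"description": "search engine", "year": 2012},
    ]
}


def flatten_inner(objects):
    """Mimic inner-object indexing: one multi-valued field per property,
    so the link between values of the same object is lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, set()).add(value)
    return flat


def match_inner(objects, **criteria):
    """AND over flattened fields: each value may come from a different object."""
    flat = flatten_inner(objects)
    return all(value in flat.get(key, set()) for key, value in criteria.items())


def match_nested(objects, **criteria):
    """AND evaluated per object, which is what nested objects preserve."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )


# No single acquisition is both "mobile startup" and from 2012.
inner_hit = match_inner(company["acquisitions"], description="mobile startup", year=2012)
nested_hit = match_nested(company["acquisitions"], description="mobile startup", year=2012)
print(inner_hit, nested_hit)
```

The flattened version reports a match even though the two values belong to different acquisitions, while the per-object version does not.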

So my question is how to do Nested Objects and how to override the mappings for the index. If this is not possible, am I forced to pre-flatten my data or to use Parent-Child docs?

In my design, sometimes Parent-Child makes sense and sometimes nested makes sense, and I would rather not have limitations of the plugin force me into a bad design pattern overall.


#2

Hi,

Indeed, the indexing/search part of your system should not dictate how you model your data for your database.

I believe ElasticSearch provides various ways of processing documents before indexing them. Can you use the transformation scripts in your context?
If not, you will probably have to extend the plugin.


#3

I agree that the ElasticSearch index should not dictate exactly how I model the documents, but sometimes I find the documentation on Couchbase very slim, especially when trying to determine how to migrate a very sophisticated relational model into a denormalized document model. The samples in the documentation tend to be very simple. The views in Couchbase can be very powerful too, but they are limited when trying to index against a variety of variable fields. I tried building hashes to add more flexibility but it really exploded my views in size. I have looked at the three books on Amazon on Couchbase and they seem to not be deep discussions either.

As far as ElasticSearch goes, I am getting it to work successfully. I had to modify the mapping and rebuild the index to get the advantages of nested objects. At the moment, it provides the best DSL for looking at your data.
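
For reference, a query against a nested field generally wraps its clauses in a `nested` query that names the path. The sketch below only builds the request body in Python and prints it as JSON; the helper name `nested_query` and the search text "mobile" are hypothetical, and the field names come from the mapping in post #1.

```python
import json


def nested_query(path, field, text):
    """Build a search body that matches `text` inside one nested object.

    `path` is the nested field in the mapping; `field` is a property of
    the objects stored under that path.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # Every clause listed here must match within the
                        # SAME nested object.
                        "must": [
                            {"match": {f"{path}.{field}": text}}
                        ]
                    }
                },
            }
        }
    }


# "mobile" is a made-up search term for illustration.
body = nested_query("acquisitions", "description", "mobile")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of a search request against the crunchbase index. Adding more clauses to the `must` list is what gives the AND behaviour that inner objects cannot provide.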


#4

I think @davido may know the best approach here. @davido?


#5

@envitraux Generally speaking, both Couchbase and the plugin don’t care how your data is structured. Couchbase will store any document you give it, and the plugin only proxies the data from Couchbase to ElasticSearch. So from a technical perspective, you can use any of the 3 sub-document patterns you mentioned in the original post: you can use inner, nested, or parent/child documents. For the first two, you just need to set up the right mappings in ElasticSearch, and for the third the plugin lets you configure parent/child mappings based on specific fields in the document.

So the best way to reason about the modeling is in terms of effort: how much effort it takes to store the data, and how much effort it takes to search/retrieve the data. If the retrieval part will take the most effort, then build the data model to simplify that, which actually means letting ElasticSearch dictate your model. I don’t see anything inherently wrong with that, if data retrieval is what you do most.

Now, regarding your original question - are you asking how to technically write the ElasticSearch mappings to index your nested objects correctly? If so, we’ll need some more information about your data and how you intend to query it to answer that.


#6

One of the ideas I am puzzling over is how to model a many-to-many relationship where the relationship itself carries additional field data, separate from the two parent entities. I have two thoughts:

  1. Roll the relationship into one of the parents, but which one?
  2. Create a separate entity for the relationship with some extra denormalized data from the parents
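
As a rough sketch of option 2, the relationship could be stored as its own document that carries the relationship fields plus a few denormalized fields copied from each parent. Everything in this Python example is hypothetical (the `relationship_doc` helper, the key format, the person/company entities, and the "role" and "since" fields); it only illustrates the shape of such a document.

```python
def relationship_doc(left, right, **attributes):
    """Build a standalone document for one many-to-many link.

    Copies the parents' ids and names (denormalized) so the link can be
    searched and displayed without fetching either parent.
    """
    doc_id = f"rel::{left['id']}::{right['id']}"  # hypothetical key format
    doc = {
        "type": "relationship",
        "left_id": left["id"],
        "left_name": left["name"],
        "right_id": right["id"],
        "right_name": right["name"],
    }
    # Fields that belong to the relationship itself, not to either parent.
    doc.update(attributes)
    return doc_id, doc


# Made-up parent entities for illustration.
person = {"id": "person::42", "name": "Jane Doe"}
company = {"id": "company::7", "name": "Acme"}

rel_id, rel = relationship_doc(person, company, role="advisor", since=2011)
print(rel_id, rel)
```

The trade-off is that the copied names must be refreshed whenever a parent is renamed, in exchange for being able to query the relationship on its own.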

Here was an interesting blog post on the matter:

I thought the following quote interesting.

NoSQL data modeling often starts from the application-specific queries as opposed to relational modeling:
Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?”
NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?”

Any thoughts?