Best way to design data structure


Hi experts,

I am planning to migrant existing ms sql database to couchbase db and would like your suggestion on what’s the best way to design the data structure for my housing inspection with the following requirements

My housing inspection application allows field agents to inspect different type of objects in different level.

For example, property agent can inspect an town house, unit or apartment which contains master bedroom, living room or car park.

Master bedroom consists of windows, door , toilet, sink. Those objects may have different attribute depend on type e.g. windows may have color, photo,material (wood, plastic or steel) and its condition e.g. damage, good, need to replace etc.

In short the business object is a hierarchy tree contains unpredictable nodes. Each node contains number of attributes with different types. Each attribute with particular type has its own reference data dictionary. Each of attribute may have its own history for past inspection

In ms sql database, I need to have many small tables for main business objects and data dictionary. For example,

Property : main table for property object
Property Type: type of property (house, unit, apartment)
Property Condition: condition of property (damage, poor,good)
Inspection: inspection list
Property Inspection: list of property in an inspection (number of properties to be inspected on datetime)
Property Nodes: hierarchy tree of property nodes and attributes (bedroom, bedroom->toilet, bedroom ->sink)
Property Dictionary: dictionary of the property nodes and attributes ( colours: red, white, blue or material: steel, wooden, plastic)
Property Template: template of node and attribute for property to be inspected.

What’s the best way to do achieve both flexibility, scalability and better performance? Should I used embedded or reference data model? My understanding, embedded data model is best suit for this since I need to load a single property and update the entire property relate data frequently e.g. inspector login to mobile app and select property to inspect. The system will load all property information including all property dynamic fields. However, if I used this data model, it may impact on the performance since I need to load a big chunk of data to mobile device and update info back to the server. Any better ideas for this?

What type of tree data model I need to use to traverse and update all of the property field information? Materialized Paths is best for this case because it gives best performance to query node and subnode but I am still not quite understand how to insert/update each nodes using this algorithm.

Thank you.


Excellent Questions.

The flexibility of NOSQL and a document database provides lots of options for modeling. Each strategy carries different requirements:

[1] Embedded Fields - Each document includes all of the application relevant information in the included fields. This is optimized for retrieval and “answers”. It allows the application to be largely agnostic. Data bloom and increased storage are a consideration. This generally provides maximum performance for retrieval, using a singular lookup. The tradeoff is as documents grow, retrieving the entire document can result in slowdowns.

[2] Referenced Fields - Using referential documents (carrying the schema within a document) requires a large degree of application specific knowledge of the data model, and managing the data model from the application tier. This type of model is optimized for a higher degree of selectivity and “questions”. As far as organizing the data into a specific tree data structure, as long as the the application applies the specific algorithm to storage and retrieval, its possible to utilize any tree structure of choice. It’s a balancing act determining the Order and Height for optimal storage.



For my situation, which one should I choose?