It is great that NoSQL/Couchbase does not require schema definition before writing a document. But it has the drawback that the database does not enforce consistency across docs. So I devised the following method for adding this control to our architecture.
The idea is that each bucket, besides its regular docs, also stores the schema of those docs. Then the application can use those schemas as it sees fit. It could check every doc against its schema before writing it to CB, or it could randomly check a subset of docs each night. Or only check important docs that have the potential to break something bad.
I am looking for feedback/suggestions on this. Others may copy as they want.
Standards
-
All regular (persisted to disk) documents in Couchbase store their value in JSON format (www.json.org).
-
Every document contains two text fields that state the schema for that document. The fields are docSchema and docSchemaVersion.
-
All documents contain a docType field that is used to distinguish logical document sets within a bucket.
-
Each Couchbase bucket contains a set of special documents that hold the data schemas for all the other documents in that bucket. The names (keys) of these documents are docSchemas and docSchemasNNNN (where NNNN is 0001, 0002… for previous versions).
-
Each of the schema documents contains a set of JSON schemas (http://json-schema.org), one for each schema in that bucket. The schemas define the list of fields permitted in a document and optionally other constraints such as required fields, non-null fields, numeric ranges for a field, keywords allowed, etc.
-
When a schema changes, the change is backward compatible so new documents match the previous schemas.
-
More than one docType can share a schema. So “worker” and “manager” documents can use the same “person” schema.
Not Addressed in this Standard
- Memory-only (memcached) buckets that are not written to disk. This data may be more free-form and not governed by these standards.
Example
Bucket: HR_RECORDS
Document: key = jsmith, value = { "docSchema":"utd-hr-person", "docSchemaVersion":"1.0.4", "docType":"worker", "lName":"smith", "fName":"john", "city":"waltham"}
Document: key = bjones, value = { "docSchema":"utd-hr-person", "docSchemaVersion":"1.0.4", "docType":"manager", "lName":"jones", "fName":"bob", "city":"newton"}
Document: key = devops, value = { "docSchema":"utd-hr-dept", "docSchemaVersion":"2.2.2", "docType":"department", "name":"Development Operations", "location":"230 Third Ave, Waltham"}
Document: key = docSchemas, value = {
"utd-hr-person" : {
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "utd-hr-person | 1.0.4",
"description": "The JSON schema for a person who works at UpToDate",
"type": "object",
"properties": {"docSchema": {"type": "string"},
"docSchemaVersion": {"type": "string"},
"docType": {"type": "string"},
"lName": {"type": "string"},
"fName": {"type": "string"},
"city": {"type": "string"}
},
"required": ["docSchema","docSchemaVersion","docType","lName"]
},
"utd-hr-dept": {
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "utd-hr-dept | 2.2.2",
"description": "The JSON schema for a company department at UpToDate",
"type": "object",
"properties": {"docSchema": {"type": "string"},
"docSchemaVersion": {"type": "string"},
"docType": {"type": "string"},
"name": {"type": "string"},
"location": {"type": "string"}
},
"required": ["docSchema","docSchemaVersion","docType","name"]
}
}
Document: key = doc-schemas-0001, value = similar to above schema, but perhaps with fewer fields