Enriching documents with sync function or another document transformer


#1

Documentation around the sync function often recommends using properties of the document itself for access control purposes. The example sync function, for instance, uses an owner property to prevent unauthorized access by other parties. However, this requires that the owner property be known and set at the time the document is created. That in turn means the business logic (creation of the document data) and a cross-cutting technical concern (maintaining authorization meta-data) are interconnected.

If I want to have the authentication and access part of my client completely separated from the specific components creating a given document, I will still have to include a hook to access the username and insert it into the document. This only grows more complex the more meta-data I want to include.

It would be great to enforce these kinds of invariants (newly created documents have an owner property) at the sync gateway. After a little digging it seems that making modifications to the document data as part of the sync function isn’t supported. That makes sense, as that isn’t the purpose of the sync function. Is there any other endorsed pattern I should be following to make these kinds of transformations? Is there something I haven’t found in the CB Lite API that would allow me to make that kind of transformation on the client side before committing the document?
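To make the invariant concrete, here is a minimal sketch of the kind of client-side hook being asked about. It is not Couchbase Lite code: `OwnerStampingStore` and its in-memory dict are hypothetical stand-ins for a local database, and `current_user` is an assumed callback supplied by the authentication layer.

```python
from typing import Any, Callable, Dict


class OwnerStampingStore:
    """Hypothetical wrapper around a local document store (a plain dict here)."""

    def __init__(self, current_user: Callable[[], str]) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}
        self._current_user = current_user  # assumed hook into the auth layer

    def save(self, doc_id: str, properties: Dict[str, Any]) -> Dict[str, Any]:
        doc = dict(properties)  # copy so the caller's dict is left untouched
        existing = self._docs.get(doc_id)
        if existing is None:
            # New document: stamp the owner once, at creation time.
            doc["owner"] = self._current_user()
        else:
            # Update: keep the original owner, ignoring any supplied value.
            doc["owner"] = existing["owner"]
        self._docs[doc_id] = doc
        return doc


store = OwnerStampingStore(current_user=lambda: "alice")
saved = store.save("order-1", {"type": "order", "total": 42})
```

The component that builds the order never sees the username; only the wrapper does.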


#2

As you say, that sort of operation is not supported by design in the sync function - primarily because there are various scenarios that require the sync function to be run without user context (resync and sg-replicate come to mind). In most cases it isn’t ideal for application data associated with the document (like owner) to be lost when the document is reprocessed or replicated between clusters.

I agree with you about the need to maintain a hook to access the username. I’m less sure that there’s significantly more auth-specific metadata you want to include than username (or variations on the username). Potentially “role” I suppose, but roles are more commonly used to manage channel access or to apply system-wide write restrictions.

@jens - do you know if there are any client helper functions to assist with username retrieval/management?


#3

At the end of the day, I don’t think wrapping the client-side database accesses in some kind of hook is going to be an extreme burden; it’s more that storing this in the document has a funny smell. In the best-case scenario the documents would be structurally identical regardless of how they were created (client, server, another platform, etc.).

Another alternative would be to perform this sort of modification in an app-server. The CB mobile docs say that for simple applications the sync-gateway alone might suffice, but I’m fairly confident I’m going to need additional server-side components, and these will probably know more about the authentication context than the client. For example, a likely scenario would be Kerberos authentication with additional user details queried over LDAP. In this case the server has access to more accurate user and role data than the client. Are there any patterns to transparently insert a proxy between the client and the gateway? For example, how challenging would it be to point the client push replication at an endpoint in the app server (rather than sync gateway), transform the documents, then pass the transformed document stream on to the gateway? Does that erase the simplicity benefits offered by using CB gateway in the first place?


#4

The design of the sync function was inspired by map/reduce. In particular, it tries to limit dependencies on external state. The intention is that external authorization is accomplished by manually adding users to channels via the admin API.

The design is also strongly oriented toward making documents self-contained. This falls out of the design of a multi-master distributed system. Remember that documents are generally created and edited by clients.

The sync function doesn’t allow the document to be mutated because it’s approving a specific revision of a document, and if it mutated the document that would create a new, different revision (the revision ID is based on a digest of the document contents). This would leave the revision history on SG different from, and incompatible with, the revision history on the client that pushed the document to SG.
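A small sketch of why a content-derived revision ID rules out silent mutation. The `revision_id` function below is an illustrative assumption (SHA-1 over a canonical JSON encoding), not the actual digest scheme used by Couchbase Lite or Sync Gateway.

```python
import hashlib
import json


def revision_id(generation: int, parent_rev: str, body: dict) -> str:
    """Illustrative only: derive a revision ID from the document contents."""
    # Canonical encoding so that equal bodies always hash to the same value.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha1((parent_rev + canonical).encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"


pushed = {"type": "order", "total": 42}
client_rev = revision_id(1, "", pushed)

# If the server added a property while accepting the push, the same
# generation would end up with a different ID than the one the client holds.
mutated = dict(pushed, owner="alice")
server_rev = revision_id(1, "", mutated)
```

Here `client_rev` and `server_rev` differ, so the two sides would no longer agree on which revision they share.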

“Is there something I haven’t found in the CB Lite API that would allow me to make that kind of transformation on the client side before committing the document?”

The Document class doesn’t have anything specific for that; it’s lower level. You can of course wrap your own logic around it that interposes such changes.

If you’re on iOS, the CBLModel class (a higher level abstraction around documents) has a propertiesToSave method that can be overridden to add/change properties that get saved to the database. We haven’t added data modeling features to Java or C# yet, though.


#5

Still making some tech choices, but Xamarin is a likely choice. I think something like the DocumentUpdater that’s available in the Java API should work well. Since that’s a functional API it would be fairly straightforward to create one DocumentUpdater instance to attach needed meta-data, a second DocumentUpdater to construct or modify the business data, and then aggregate the two together.
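Roughly what that aggregation could look like, sketched in Python rather than against the real Java interface. It assumes an updater is just a function that edits a mutable property map and returns whether the save should proceed; `compose`, `stamp_owner`, and `set_total` are hypothetical names.

```python
from typing import Any, Callable, Dict

# Assumed shape: an updater edits the properties in place and returns
# True to proceed with the save or False to cancel it.
Updater = Callable[[Dict[str, Any]], bool]


def compose(*updaters: Updater) -> Updater:
    """Run updaters in order; stop as soon as one of them cancels."""
    def combined(properties: Dict[str, Any]) -> bool:
        # all() short-circuits, so later updaters are skipped after a False.
        return all(update(properties) for update in updaters)
    return combined


def stamp_owner(username: str) -> Updater:
    def updater(properties: Dict[str, Any]) -> bool:
        properties.setdefault("owner", username)  # never overwrite an owner
        return True
    return updater


def set_total(total: int) -> Updater:
    def updater(properties: Dict[str, Any]) -> bool:
        properties["total"] = total
        return True
    return updater


props: Dict[str, Any] = {"type": "order"}
ok = compose(stamp_owner("alice"), set_total(42))(props)
```

The meta-data updater and the business updater stay independent and are only combined at the call site.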

I hadn’t considered that making mutations on the server side would cause a skew between the client-side copy of the data and what is in the database. My scenarios were mostly centered around adding app-server relevant data, so I wasn’t thinking about how that looks to the client. Since the revision is a hash of the document (I think?), I guess it’s not possible for the server and client to have a different view of the document.


#6

Could you elaborate on this? I’m familiar with the API endpoints for managing users and roles, and that the channels property can be used to manage channel assignments. Is there another component of the API I’m not familiar with that would allow fine-grained management of channel assignments without modification of the documents themselves?


#7

Think of it like Git. If the client pushes a revision to the server, and then the server modifies the document, it creates a new revision on top of the client’s, which then has to be synced back to the client. And if in the meantime the client has made another change to the document locally, that revision will conflict with the server’s update, requiring a merge. It’s possible, but it gets messy.
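The scenario can be sketched with a toy revision tree. `RevisionTree` and the revision names are hypothetical and much simpler than the real revision bookkeeping; the point is only that two children of the same parent leave two leaves, which is a conflict.

```python
from typing import Dict, List, Optional


class RevisionTree:
    """Toy model: each revision records its parent; leaves are current heads."""

    def __init__(self) -> None:
        self.parents: Dict[str, Optional[str]] = {}

    def add(self, rev: str, parent: Optional[str] = None) -> None:
        self.parents[rev] = parent

    def leaves(self) -> List[str]:
        # A leaf is any revision that no other revision names as its parent.
        used = {p for p in self.parents.values() if p is not None}
        return sorted(rev for rev in self.parents if rev not in used)

    def in_conflict(self) -> bool:
        return len(self.leaves()) > 1


tree = RevisionTree()
tree.add("1-client")                     # revision pushed by the client
tree.add("2-server", parent="1-client")  # server-side modification on top
tree.add("2-client", parent="1-client")  # local edit made in the meantime
```

After the third `add`, the tree has two leaves and `in_conflict()` is true, which is the merge situation described above.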

“Is there another component of the API I’m not familiar with that would allow fine-grained management of channel assignments without modification of the documents themselves?”

The API allows you to modify user access to channels, but not document assignment to channels.
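As a rough illustration of the user side of that, the sketch below builds (but does not send) a request that grants a user access to channels through the admin REST API. It assumes the admin interface is reachable at `http://localhost:4985`, that the user resource lives at `/{db}/_user/{name}`, and that channels are granted through an `admin_channels` property; the database name, username, and channel are made up, and the exact PUT semantics should be checked against the Sync Gateway docs for your version.

```python
import json
from typing import List
from urllib.request import Request


def build_channel_grant(admin_url: str, db: str, username: str,
                        channels: List[str]) -> Request:
    """Build (without sending) an admin request granting channel access."""
    # Assumption: admin-granted channels are set via "admin_channels".
    body = json.dumps({"admin_channels": channels}).encode("utf-8")
    return Request(
        url=f"{admin_url}/{db}/_user/{username}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values; nothing is sent over the network here.
req = build_channel_grant("http://localhost:4985", "mydb", "alice", ["orders"])
```

Passing `req` to `urllib.request.urlopen` would perform the call. Note that nothing here changes which channels a document belongs to; that still comes from the sync function.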