Surprising transformations on ISO-8601 values unique to .NET SDK?


#1

I have the following document with id in the form namespace.GUID, so for example:

RAMSOFT.AUDIT.STUDYACCESS.00485ff354cd491a9044aa6143608fa5

The document content as shown on the screen when I fetch it via couchbase admin console contains a field accesstime, which is in this format:

  "accesstime": "2014-11-28T11:01:18",

When I read the data via the .NET sdk, I see a transformation occurring where the T is changed to a space:

 "2014-11-28 11:01:18"

When I read the same data via the Python client (using my n1ql query runner tool), the T has not been transformed to a space:

select studydatetime from `default` where authtag = '783E12AE07ECED7E199490C1CF268799' 
        and _document_type = 'STUDYACCESS' 

{'studydatetime': '2014-11-28T11:30:00'}

So why does the .NET bucket.Get(id).Value (which is a Newtonsoft.Json.Linq.JContainer) get transformed?

What other magic transforms or JSON rewrite rules should I be aware of?

Warren


#2

@wpostma -

It looks like a bug; the JSON parser (JSON.NET) appears to format the date/time differently as a string. There should be consistency across the SDK’s with respect to formatting.

Since that is a string representation of a date, if converted to a DateTime is there an actual difference?

-Jeff


#3

I think it is a bug. But I also think there are ambiguities in the real world data formats.

For example, in ISO-8601 as writ, I believe omission of timezone is ok right, so ‘2014-11-28T11:30:00’ is just as valid as ‘2014-11-28T11:30:00Z’. I am not sure if omitting the Z means implicit UTC, or implicit NON-UTC though.

But perhaps in certain implementations, including perhaps newtonsoft JSON, this kind of thing is biting me?

It does NOT occur when you use N1QL query in .net:

 const string query = "SELECT accesstime FROM default where studyinstanceuid = '1.2.124.113532.19.29269.37783.20120128.72021.7570795' ";

                QueryRequest request = new QueryRequest(query);

                result = bucket.Query<dynamic>(request);
                foreach (var row in result.Rows)
                {
                    Console.WriteLine(row);
                    Count++;
                }

It DOES occur when you get a document via bucket.Get(“documentid”), which returns Result, and Result.Value, which is of type Newtonsoft.Json.Linq.JContainer contains a value which has been transformed.

Here is my entire Document:

{
  "studyinstanceuid": "1.2.124.113532.19.29269.37783.20120128.72021.7570795",
  "accesstime": "2014-11-13T18:15:23",
  "username": "ADMIN",
  "accesstype": "UPDATE OBJECT PT",
  "statusorder": 210,
  "studydatetime": "2012-01-28T07:19:00",
  "accessionnumber": "000408703",
  "institutionname": "RAMSOFT IMAGING",
  "patientid": "098234",
  "issuerofpatientid": "NNA",
  "patientfullname": "SANCHEZ^VIVIANA",
  "status": "PRIOR",
  "_fbidentity": 3156,
  "authtag": "6D417FA9C6A430CD3ED34B69C57361DC",
  "_document_type": "STUDYACCESS",
  "_document_rev": "2",
  "_document_origin": "FBIMPORT"
}

W


#4

According to wikipedia’s ISO-8601 page, it is omit-table in certain circumstances:

““ISO 8601:2004(E)”. ISO. 2004-12-01. 4.3.2
NOTE: By mutual agreement of the partners in information interchange,
the character [T] may be omitted in applications where there is no risk
of confusing a date and time of day representation with others defined
in this International Standard.”

Ref: https://en.wikipedia.org/wiki/ISO_8601#cite_note-17

-Jeff


#5

My problem with it being omittable is that I can’t do a round-trip verification that what I wrote, and what is IN the document will MATCH the string value that I get back when I fetch the document.

Can whatever is doing this “normalization” be told to STOP IT?

W


#6

Hmm, it’s using the same JSON serializer for both K/V and N1QL, so likely it’s related to the type being returned by bucket.Get(key)…can you try explicitly defining the Type T as dynamic and see the result?

var result = bucket.Get<dynamic>(key);

For example?

-Jeff


#7

Using string in the type T area the accesstime field remains untransformed.

var result = _bucket.Get<string>(id) 
-->       "2014-11-28T11:01:18\"

But with T of dynamic:

 var result = _bucket.Get<dynamic>(id);
 result = ["accesstime"]    {28-Nov-14 11:01:18}    
       dynamic {Newtonsoft.Json.Linq.JValue}

So this appears to be perhaps a behaviour of Json.Linq.JValue or whatever in Newtonsoft.Json is responsible for datetime parsing?

Since version 4.5, WRITING using ISO-8601 format has been the default for this library. So why isn’t it the default for reading? What a confusing choice. Source:

http://www.newtonsoft.com/json/help/html/DatesInJSON.htm

Is there any way that the Couchbase library can be made to invoke this library in a way that forces that the default (ISO 8601) date-time format is respected?

I am not sure why any transformation is required at all, and I am not sure why _bucket.Get(id) does NOT cause this bug, but _bucket.Get(id) does?

I suspect that this is either a bug in the couchbase .Net client code, or at least, something that requires a workaround from the couchbase .net client code. Perhaps some JsonSerializerSettings should be in use for the Get codepath? (Though that’s not usually what the Serializer is used for), something like this:

          static JsonSerializerSettings jsonSerializerSettings =  new JsonSerializerSettings 
          {  DateFormatHandling = DateFormatHandling.IsoDateFormat  // instead of MicrosoftDateFormat!                  };

W


#8

I am examining the source:

// CouchbaseBucket.cs ~ line 522


          public IOperationResult<T> Get<T>(string key)
          {  var operation = new Get<T>(key, null, _transcoder, _operationLifespanTimeout);
             return _requestExecuter.SendWithRetry(operation);
          }

I suspect that the only element that could be causing this is the _transcoder?
There’s not a lot of documentation on the options or configuration or what this Transcoder does. I see references to the Transcoder in the Java and Python SDK docs, but I was unable to see any user-configurable element to the Transcoder element of the .NET SDK. Am I just guessing based onthe name that it has some role to play in the transformation of data coming from Couchbase client read operations?

Footnote: I decided to try to extend the Unit Tests for the couchbase-net-client sources, but I was unable to get the Unit Tests to run in visual Studio 2015 or 2013, until I added the NUnit Test Adaptor via Tools -> Extensions and Updates.

W


#9

@wpostma -

The transcoder simply handles conversions of primitives and enforces byte order. More likely the issue is with the DefaultSerializer since that is what converts the body of the memcached packet to the type of T.

The differences between string and dynamic can be attributed to this as well: strings are treated as strings, but dynamic and anything other than a string or byte array will be converted to JSON. So, this is most likely the default serializer settings for JSON.NET.

-Jeff


#10

Ok. So far it makes some sense. Perhaps I simply need to enforce an ISO Serializer setting, somehow when I access the JSON.NET’s containers.

Another element of confusion for me, is why inserting bucket.Upsert(string id,string jsonstring) is different than inserting bucket.Upsert(Couchbase.Document), when the jsonstring I use is a valid json, why can’t I do a response = bucket.Get(id) and get back a response.Value that is a JSON.Net typed Container?

In other words, if I upsert AS a string, and I use dynamic Bucket.Get(string id), I get back a value of type of string, but if I upsert AS a Couchbase.Document, I get back a JSON.NET container. In my mental view of things, the ONLY things being persisted are the ID and the document (which in the end are both strings).

But it appears that a third “metadata” field, a document data type field, is being persisted somewhere. Yet it’s not shown to me, when I edit via the web console. This is a bit of a big omission, in my view.

Warren


#11

@wpostma -

You are somewhat correct here; there is a “flags” field included with the memcached packet and stored as metadata in couchbase. The flags includes information about the type of the body so that different SDK’s can read/write the same data and have a protocol for translation of the body.

This is how it works:

  1. Strings are stored as strings
  2. byte arrays are stored as binary blobs
  3. Objects and all primitives (int, object, etc) are encoded and stored as JSON

Along with the flags, other data is stored in the metadata as well: CAS, Expiry, etc.

-Jeff


#12

@wpostma -

Very important point: a JSON string is not the same as a JSON object!

var result = bucket.Upsert<object>(key, "{'poo':'bar'}");
//stored as "{'poo':'bar'}"

var result = bucket.Upsert<object>(key, new {poo="bar"});
// stored as  { "poo": "bar" }

-Jeff


#13

You’re so right. I was getting myself a bit confused.

Also that “flags” field is probably essential for anyone working with couchbase documents to understand. I will see if I can write some demo code that explains the difference in the resulting Server document metadata, and how it looks to the user.

I also think that it’s going to be insane for me or any other developer to do STRING based JSON DateTime comparison operations, or to do round-trip document “non-change” logic. That’s just fighting the grain of the system too much.

Just to make sure that the unit-tests for the Couchbase .NET SDK have a few sanity test cases for dates, I wrote a few, and I have posted the changes on my own github:

I’m not suggesting you pull my change but it’s okay if you did as long as you drop a few stupid extraneous hunks from it, I just wanted to write an example of the kind of date invariance checking I wanted to see myself.

I also think if anything changes in the JSON.NET Serializer or Deserializer that caused a breaking change in code, we’d want to know about it, and have it wrapped up under test, in these unit tests.

Thanks so much for looking into this. Things are clear to me now.

W