Limit total storage per user

Hi All,

I'm working on an app that runs on iOS and Mac.
I'm storing data locally with Couchbase Lite, and syncing to the Couchbase servers using Sync Gateway.

In the free version - I plan to allow unlimited local storage.
In the subscribed version, the user will have limited storage (but synced across devices).
I’m thinking about the best way to implement that limitation.

Obviously I can calculate used space on the device periodically and stop the creation of new docs once they are over the limit for their subscription.
Is there any option to perform this calculation and limitation on the server side in the sync function or in the sync gateway somewhere?

Thanks for any ideas and advice you can give.
Cheers.
Paul

I would suggest you read the following answer I gave about sizing.

I don't think sizing is available on the server side for a particular user.

@paulr have you seen this blog post?

https://blog.couchbase.com/database-sizes-and-conflict-resolution/

Also, which version of Couchbase Lite / SG are you planning on using?

This is currently on Couchbase Lite 1.4.1 and Sync Gateway 1.5.1.
(I haven't switched to your shiny new 2.x line yet.)

That post you referenced was interesting - thanks… but my focus isn't on space
used and compaction; it's on the size of the user's documents.

I appreciate that there will be overheads (e.g. if a user's documents total
10MB, then it may cost me 20MB to store that data with revisions/history etc.).

Restricting the total size of documents seems fairer to me (i.e. easy for the user to understand)
than restricting storage used, which involves complications the user doesn't know about or understand.

I have calculated the DB directory size on the mobile device, but that includes
all revisions etc., and doesn't go down when the user deletes documents.

Does that make sense?
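
For reference, a directory-size calculation like the one described above is plain .NET rather than anything Couchbase-specific. A minimal sketch (dbDirectoryPath is a hypothetical path to the local database folder):

        // Requires: using System.IO; using System.Linq;
        long dirBytes = new DirectoryInfo(dbDirectoryPath)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(f => f.Length);    // total on-disk bytes, including revision history and indexes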

@pankaj.sharma - thanks for the comment - but I am focused on document size, not including revisions etc.

@jens
I was wondering if you have a thought on this?
I need to figure out how much actual data is stored in the docs (including attachments), not just the amount of disk space used (which includes revisions and deleted docs).
Thx.
Paul.

I assume you’d measure the data size as the total byte count of the JSON of all documents, plus attachments?

To get a document’s data size via the public API, I think you’d have to get the document, serialize it to JSON yourself, and take the size of the output data. Of course you’d need to iterate over all documents to compute the total.
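
A rough sketch of that brute-force pass, assuming Json.NET is available for the serialization and using UTF-8 byte count as the measure of size:

        // Requires: using System.Text; using Newtonsoft.Json;
        long totalBytes = 0;
        foreach (var row in database.CreateAllDocumentsQuery().Run())
        {
            // Serialize each document's current properties and count the bytes
            var json = JsonConvert.SerializeObject(row.Document.Properties);
            totalBytes += Encoding.UTF8.GetByteCount(json);
            // (Attachment sizes would be added separately, e.g. from the
            // "length" fields of the _attachments stubs, as discussed below.)
        }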

If you want to track this over time, the most efficient way would be to define a view whose map function computes the JSON size and emits that as the value. Then you can do a simple reduce with the “sum” function to get the total.
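
A minimal sketch of such a view, again leaning on Json.NET; the view name and version string here are placeholders:

        var sizeView = database.GetView("JsonSize");
        sizeView.SetMapReduce(
            (doc, emit) =>
            {
                // Emit each document's serialized JSON size in bytes
                var json = Newtonsoft.Json.JsonConvert.SerializeObject(doc);
                emit(doc["_id"], System.Text.Encoding.UTF8.GetByteCount(json));
            },
            (keys, values, rereduce) =>
            {
                // Sum the emitted sizes; on rereduce the values are partial sums
                long total = 0;
                foreach (var v in values)
                    total += Convert.ToInt64(v);
                return total;
            },
            "1");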

Thanks @jens - that makes sense.
I don't like having to do this because of the compute cost;
I was just hoping that you guys would have a trick.

I have a search function that pulls and searches documents (without loading attachments), and it's pretty slow for large (30k docs) data sets, and especially slow getting started (I use RunAsync to run it in a background thread, and it takes 5-15 seconds to get started).

The view/map makes sense - I’ll give that a try.

Another way is to capture all inserts/deletes/updates and keep a running total. I'm nervous about it becoming inaccurate as local operations and sync activity happen.
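
For completeness, a hedged sketch of that running-total idea, assuming the 1.x .NET API's Database.Changed event (which, as far as I know, fires for replicated changes as well as local writes). Recomputing each changed document's size from scratch sidesteps revision diffing:

        // Requires: using System.Collections.Generic; using System.Text;
        //           using Newtonsoft.Json;
        var perDocSize = new Dictionary<string, long>();
        long runningTotal = 0;

        database.Changed += (sender, e) =>
        {
            foreach (var change in e.Changes)
            {
                // Recompute this doc's size from scratch rather than diffing revisions
                var doc = database.GetExistingDocument(change.DocumentId);
                long newSize = 0;
                if (doc != null && !doc.Deleted)
                {
                    var json = JsonConvert.SerializeObject(doc.Properties);
                    newSize = Encoding.UTF8.GetByteCount(json);
                }

                long oldSize;
                perDocSize.TryGetValue(change.DocumentId, out oldSize);
                perDocSize[change.DocumentId] = newSize;
                runningTotal += newSize - oldSize;
            }
        };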

@jens So here’s what I’ve got to - just wondering if you could comment (plus I have a couple of questions below). My goal is to find all docs of type “Note”, and add up the length of the text field, plus all attachments.

        view = database.GetView("Size");
        view.SetMapReduce(
            (doc, emit) =>
            {
                //    Only count documents of type "Note"
                string thetype = doc.ContainsKey("type") ? (string)doc["type"] : "";
                if (thetype == "Note")
                {
                    long len = 0;

                    //    Length of the note text (guard against a missing field)
                    var text = doc.ContainsKey("text") ? doc["text"] as string : null;
                    if (text != null)
                        len += text.Length;

                    //    Add the declared length of each attachment stub
                    if (doc.ContainsKey("_attachments"))
                    {
                        var atts = doc["_attachments"] as Newtonsoft.Json.Linq.JObject;
                        if (atts != null)
                        {
                            foreach (Newtonsoft.Json.Linq.JProperty att in atts.Children())
                            {
                                len += (long)att.Value["length"];
                            }
                        }
                    }

                    //    Emit only for Notes, so other docs don't bloat the index
                    emit((string)doc["_id"], len);
                }
            },
            (keys, values, rereduce) =>
            {
                //    Sum per-document sizes; on rereduce the values are partial sums
                long len = 0;
                foreach (var value in values)
                {
                    len += Convert.ToInt64(value);
                }
                return len;
            },
            "4");    //    Bump this version string whenever the map function changes

I call it as follows:

        var query = database.GetView("Size").CreateQuery();

        var rows = query.Run();
        Assert.AreEqual(1, rows.Count);
        var row = rows[0];
        var total_size = row.Value;

Questions:

  • Walking attachments is awkward. The “doc._attachments” item is double nested (here’s what I see for “atts” in the debugger):

      {{   "test1.jpg": {     "content_type": "image/jpeg",    
       "digest": "sha1-HfekMJeUJp32UzuuA5uetG7FAog=",     "length": 1024,     
      "revpos": 1,     "stub": true   } }}
    

Hence my having to call .Children() before the loop. Am I doing something wrong there? (See the sketch after this list.)

  • The documentation mentions some built-in reduce functions (_count, _sum, etc.) which are preferred for performance. How do I use those for reduce? I just need _sum.
  • The older (pre-2.x) documentation is getting a little hard to find now, and it looks like the Guides are gone - the archive forwards to here, which is minimal. I really liked those guides.
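
On the first question: as far as I can tell, the doubled braces are just the debugger's display format for a JObject, not genuine extra nesting, so .Children() isn't wrong. JObject.Properties() (standard Json.NET) reads slightly more directly:

        var atts = doc["_attachments"] as Newtonsoft.Json.Linq.JObject;
        if (atts != null)
        {
            foreach (var att in atts.Properties())    // one JProperty per attachment name
            {
                len += (long)att.Value["length"];     // declared attachment size in bytes
            }
        }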

Thank you for any advice you can provide.

I’ve been looking into this same thing recently as well. The biggest limitation that I’ve found is that, as the number of indexes increases, the time between when you save a document and when that document can be found in a query increases.

Example - if you save a document and then immediately run a query where the result set should contain the document you just saved, the probability that your new document will appear in the result set goes down as the number of indexes increases.
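
One knob that may help with the just-saved-document case, assuming I'm remembering the 1.x API correctly, is the query's IndexUpdateMode (Before / Never / After), which trades query latency for index freshness:

        var query = database.GetView("Size").CreateQuery();

        // Bring the index fully up to date before returning rows, so a
        // just-saved document is guaranteed to appear (slower queries):
        query.IndexUpdateMode = IndexUpdateMode.Before;

        // Or query whatever is already indexed (fast, possibly stale) and
        // let the index catch up afterwards:
        // query.IndexUpdateMode = IndexUpdateMode.After;

        var rows = query.Run();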