How to get average document size in bucket

When sizing it’s helpful to know what the average document size is. The below query will output that.

select AVG(length(ENCODE_JSON(t))) as size
from `travel-sample` as t
2 Likes

Thanks for posting this tip, @melboulos :smiley:

Thanks. Certainly useful.

But curious to ask the Couchbase experts on this forum, will it get the answer by scanning every document (if yes, that can be very expensive and result in a lot of CPU utilization and IO) OR will it get the answer fro some metadata?

Update: tried it myself by executing the above query on a small bucket with just 200k docs.

Observations:
(1) Since it is a n1ql, it needs an index. Since we want to query all docs, a primary index is needed which is not so good. (2) It took pretty long.

Conclusion: For a bucket with just 200k docs and a resident ratio of 100% since it took pretty long, it leads me to believe that it actual scans every do. Which means it a very expensive operation to try on a production cluster.

Does somebody have another way? A more faster/efficient way to get the avg size of docs in a bucket? If a precise value cannot be arrived at, a somewhat-accurate value will also do.

Thanks

@couchbase team ,do we have any other way apart from N1ql to get average bucket size as N1ql gets expensive when we have humungous data.