Embed documents to reduce the number of buckets

I have a dataset of 16 CSV files, and I have loaded them into Couchbase with every file mapped to its own bucket,

with the following sizes:
Allergies 24MB

Careplans 40.1MB

Conditions 68MB

Devices 23.3MB

Encounters 258MB

imaging_studies 24.8MB

Immunizations 28.9MB

Medications 314MB

Observations 420MB

Organizations 24.5MB

Patients 30.5MB

payer_transitions 36.6MB

Payers 19.7MB

Procedures 68MB

Providers 37.8MB

Supplies 81.7MB

I have read that it is better to reduce the number of buckets by embedding the documents in a single bucket and adding a type attribute.

How can I do that?

You could add a leading type column to your CSV data.

  • This could be as simple as (Linux/Mac) something like:
    awk 'NR==1{print "type,"$0} NR>1{print "allergy,"$0}' allergies.csv > allergies_type.csv
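
With 16 files, the same awk one-liner can be wrapped in a small helper so the type value is derived from the file name. A minimal sketch (Linux/Mac); the `FILE_type.csv` output naming is just a convention for this example:

```shell
# Prepend a "type" column to a CSV, deriving the type value from the
# lowercased file name (e.g. Allergies.csv -> type "allergies").
# Writes the result to FILE_type.csv alongside the original.
add_type() {
  f=$1
  t=$(basename "$f" .csv | tr '[:upper:]' '[:lower:]')
  awk -v t="$t" 'NR==1{print "type,"$0} NR>1{print t","$0}' "$f" > "${f%.csv}_type.csv"
}

# e.g. run it over all 16 files in one go:
# for f in *.csv; do add_type "$f"; done
```

You may want to singularise the derived type (e.g. "allergy" rather than "allergies") to match whatever naming your queries expect.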

You could import with a known range of key values and subsequently update that range of keys, adding a type attribute.

  • cbimport csv -c http://localhost:8091 -u Administrator -p password -b default -d file://allergies.csv -g 'allergy_#UUID#'
    update default set type = 'allergy' where meta().id like 'allergy_%'
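
The same two-step pattern (import, then stamp the type) scales across all 16 files with a small loop. A sketch only: the host, the Administrator/password credentials and the bucket name `default` are placeholders, and the commands are echoed rather than executed so you can review them first (feed the output to a shell / cbq to actually run them):

```shell
# Emit one cbimport command and one N1QL UPDATE per dataset, using
# the dataset/type name as the document key prefix.
import_with_type() {
  t=$1
  echo "cbimport csv -c http://localhost:8091 -u Administrator -p password -b default -d file://${t}.csv -g '${t}_#UUID#'"
  echo "UPDATE default SET type = '${t}' WHERE META().id LIKE '${t}_%';"
}

for t in allergies careplans conditions; do   # ...extend to all 16 datasets
  import_with_type "$t"
done
```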

HTH.

Do you mean import every CSV file into the same bucket, and update the data type after each import?

I understood your question to be “how to load multiple CSVs each with different data into one bucket using a type attribute to differentiate the source”, which is what I attempted to demonstrate. I could have misinterpreted this (if so, sorry!).

You would only update either the CSV prior to loading or the documents loaded with the matching key, assuming you needed an explicit ‘type’ attribute and couldn’t just make use of the meta().id to differentiate between the types. (Having an explicit type field expands indexing options.)

You could have a variation where you load with a known unique key range (e.g. a prefix) and update only those documents, rather than all documents with a common prefix. e.g. if you load with a date in the key (-g '20210810_allergy_#UUID#') then you could update only those matching a specific load. (Obviously customise to suit the frequency of your data loading.)

(If you’re loading often it may well be simplest to just update the CSV beforehand.)

HTH.