Data Modeling for Couchbase


#1

I have a crawler that tests web sites/pages. Below is the model I'd use with an RDBMS:

class Site {
   Uri Uri { get; set; }
   Collection<Test> Tests { get; set; }
}

class Test {
   Collection<Page> Pages { get; set; }
}

class Page {
   // Page info: URL, status code, load time, etc.
}

My queries would be things like: how many pages failed to load, how many returned 404, etc., per site and overall.
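In SQL terms, such an aggregate might look like the following sketch; the table and column names (`sites`, `status_code`, etc.) are illustrative, not taken from the model above:

```sql
-- Per-site count of pages that returned 404 (illustrative schema)
SELECT s.uri, COUNT(*) AS not_found
FROM sites s
JOIN tests t ON t.site_id = s.id
JOIN pages p ON p.test_id = t.id
WHERE p.status_code = 404
GROUP BY s.uri;
```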

My concern with Couchbase is the 20 MB document size limit: some sites I crawl have 10K pages. If I crawl a site several times, say 10, the Site document will eventually exceed that limit.

What is the correct way to model this?


#2

Hi. In any case, we don't recommend large documents. It depends on the use case, but big documents will clutter your network, and you don't want that to happen. Here you need to normalize your documents: reference the ids of the page documents in the appropriate test document, and the ids of the test documents in the appropriate site document.
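As a sketch, the normalized documents could look like the three separate documents below. The field names, the `type` discriminator, and the numeric ids are all illustrative, not Couchbase requirements; the `siteId` back-reference on the page document is an addition beyond what is described above, included because it makes per-site queries easier:

```json
{ "type": "site", "id": 1, "uri": "https://example.com", "testIds": [10, 11] }

{ "type": "test", "id": 10, "siteId": 1, "pageIds": [100, 101] }

{ "type": "page", "id": 100, "testId": 10, "siteId": 1, "uri": "https://example.com/about", "statusCode": 404 }
```

With this shape, each document stays small no matter how many pages a site has or how many times it is crawled; a new crawl only adds new test and page documents.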


#3

Cool, thanks for the reply.

Then can you give me some pointers on how to query those documents? For example:

class Page {
   int SiteId { get; set; }
}

How do I get the pages with site id = 5, i.e. the equivalent of a WHERE clause in SQL?

Thank you.
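For later readers: a lookup of this shape can be written in N1QL, Couchbase's SQL-like query language. The bucket name `crawler` and the `type`/`siteId` field names below are assumptions carried over from the model above, not actual names from this thread:

```sql
-- A secondary index so the WHERE clause can be served efficiently
CREATE INDEX idx_page_site ON `crawler`(type, siteId);

-- All pages for site id 5 -- the equivalent of
-- SELECT * FROM pages WHERE site_id = 5 in an RDBMS
SELECT p.*
FROM `crawler` p
WHERE p.type = "page" AND p.siteId = 5;
```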