I have a crawler that tests web sites and pages. Below is the model I would use with an RDBMS:
    class Site {
        public Uri Uri { get; set; }
        public ICollection<Test> Tests { get; set; }
    }

    class Test {
        public ICollection<Page> Pages { get; set; }
    }

    class Page {
        // Page info: URL, HTTP status code, whether it loaded, etc.
    }
My queries would be things like: how many pages failed to load, how many returned a 404, and so on, both per site and overall.
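For illustration, here is roughly what I would compute, written as LINQ over the model above (the Loaded and StatusCode properties on Page are hypothetical placeholders, and sites is an in-memory IEnumerable&lt;Site&gt;):

    // Sketch of the aggregates I need (requires using System.Linq).
    // Loaded and StatusCode are assumed properties on Page, not shown above.
    var perSite = sites.Select(s => new {
        s.Uri,
        FailedLoads = s.Tests.SelectMany(t => t.Pages).Count(p => !p.Loaded),
        NotFound404 = s.Tests.SelectMany(t => t.Pages).Count(p => p.StatusCode == 404)
    });

    // Overall totals across all sites.
    var allPages = sites.SelectMany(s => s.Tests).SelectMany(t => t.Pages);
    var overallNotFound = allPages.Count(p => p.StatusCode == 404);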
My concern with Couchbase is the 20 MB document size limit. Some sites I crawl have around 10K pages, so if I crawl a site a number of times, say 10, that is 100K Page entries in a single Site document; at even a couple hundred bytes of JSON per page, the document will eventually exceed the limit.
What is the correct way to model this in Couchbase?
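One idea I have considered (I am not sure whether it is idiomatic Couchbase, hence the question) is to stop embedding entirely and store each page result as its own small document that references the site by key, e.g. page::{siteId}::{crawlNumber}::{n}. A minimal sketch, with all names hypothetical:

    // Hypothetical flattened model: one small document per page result,
    // holding a reference to its site instead of being embedded in it.
    class PageResult {
        public string SiteId { get; set; }      // e.g. "site::example.com"
        public int CrawlNumber { get; set; }    // which test run this belongs to
        public Uri PageUri { get; set; }
        public int StatusCode { get; set; }     // 404, 500, ...
        public bool Loaded { get; set; }
    }

Each document stays tiny no matter how many crawls accumulate, but I am unsure whether this is the right trade-off for the per-site and overall aggregations described above.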