Couchbase for historical data storage for 10 years

Hi All,

I want to store 10 years of data in a Couchbase cluster and also need to query that data with both ad hoc and fixed queries. Here are the details of what I want to do with the historical data:

  1. I would like to run ad hoc queries, and I do not want Couchbase to search the whole 10 years of data if I am searching for data that is 1 month old.
  2. Some BI needs to be run on the stored data.

For this I want to create views for some fixed queries so that those queries will be faster, but what about the ad hoc queries?
Also, please explain the data modeling for storing 10 years of data.

Thanks
Arihant

Hi Arihant,
Could you give a bit more detail about the data you would like to store in Couchbase and the expected amount/size?

With Couchbase Server 4.0 you will get extended query options with N1QL (a SQL-like language for documents), meaning that you can freely search all your data ad hoc. The performance of the search, however, can be greatly affected by the way you model your data and build up your queries.
So this is not a trivial question, and it needs a bit more background before I can give any advice.

Compared to relational databases, NoSQL has not had as many years to build up a set of proven best practices, and therefore it’s a bit harder to give general advice.

If you have not already seen these resources, perhaps they can help you get more insight into data modelling and how Couchbase works:


http://martinfowler.com/nosql.html

I hope this helps

Thanks for the info, Martin. The data will be converted to JSON, and I am not sure about the size of each document.
There will be more than 10 million documents, and all documents are of the same type.
Would it be better to have one bucket with some 10-15 views, or should I have more than one bucket to segregate the documents by year, for example one bucket per 2 years of data?

Also, I have read that Couchbase is better suited to interactive applications than to historical data.
I would like to know whether Couchbase is a good fit for BI on historical data.

I do not see any issues in storing all 10 million documents in the same bucket; in fact, I would recommend you save all documents in the same bucket. The reason is that “cross bucket” queries using views are not supported. Storing documents with different content in the same bucket is also possible and a very normal use case.
You can think of buckets as a logical separation of views and data: a view is bound to a single bucket and is therefore only executed for documents added to the bucket where that view lives. Separate buckets can therefore help with scaling and performance in some cases.
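
To illustrate why one bucket does not have to mean scanning all 10 years: a view index is sorted by the key that the map function emits, so a date-based key lets a range query read only the matching slice. Here is a minimal Python sketch that simulates that idea. It is not Couchbase SDK code, and the `recorded_on` field, the document ids, and the helper names are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def date_key(doc):
    """Mimic a view map function that emits (year, month, day) as the key."""
    year, month, day = (int(part) for part in doc["recorded_on"].split("-"))
    return (year, month, day)

def build_index(docs):
    """Build a list of (key, doc_id) pairs sorted by key, like a view index."""
    return sorted((date_key(doc), doc_id) for doc_id, doc in docs.items())

def range_query(index, start_key, end_key):
    """Return doc ids with start_key <= key <= end_key using binary search.

    Assumes doc ids sort between "" and "\uffff", which holds for the
    hypothetical ids used here.
    """
    lo = bisect_left(index, (start_key, ""))
    hi = bisect_right(index, (end_key, "\uffff"))
    return [doc_id for _, doc_id in index[lo:hi]]

# Hypothetical sample documents, all of the same type.
docs = {
    "reading::1": {"recorded_on": "2006-03-14", "value": 10},
    "reading::2": {"recorded_on": "2015-05-02", "value": 20},
    "reading::3": {"recorded_on": "2015-05-30", "value": 30},
    "reading::4": {"recorded_on": "2015-06-01", "value": 40},
}
index = build_index(docs)
may_2015 = range_query(index, (2015, 5, 1), (2015, 5, 31))
print(may_2015)
```

The binary search only touches the slice of the sorted index that falls inside the requested month, which is roughly what a key-range query against a view with a date key would do.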

One hard limit that you should be aware of is the maximum document size of 20 MB. Couchbase does not support documents larger than 20 MB; if your documents are larger, you would need to split them up into logical pieces/parts. The video I referred to explains ways to do that.
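
As a rough illustration of splitting (not the approach from the video, just one simple way to do it), here is a Python sketch that groups a list of records into parts whose JSON encoding stays under a size limit. The key pattern `base::part::N` and the function name are made up for this example.

```python
import json

MAX_DOC_BYTES = 20 * 1024 * 1024  # the 20 MB limit mentioned above

def split_records(base_key, records, max_bytes=MAX_DOC_BYTES):
    """Group records into parts whose JSON encoding stays within max_bytes.

    Returns a dict mapping hypothetical keys such as "base::part::0" to
    lists of records. A single record that is itself larger than max_bytes
    still ends up alone in its own part and would need further splitting.
    """
    parts = {}
    current = []
    for record in records:
        candidate = current + [record]
        too_big = len(json.dumps(candidate).encode("utf-8")) > max_bytes
        if current and too_big:
            # Close the current part and start a new one with this record.
            parts[f"{base_key}::part::{len(parts)}"] = current
            current = [record]
        else:
            current = candidate
    if current:
        parts[f"{base_key}::part::{len(parts)}"] = current
    return parts

# Small limit so the splitting is visible with tiny sample data.
records = [{"day": d, "value": "x" * 50} for d in range(10)]
parts = split_records("sensor42::2015", records, max_bytes=200)
for key, chunk in parts.items():
    print(key, len(chunk))
```

Each part would then be stored as its own document, and the application reads the parts back by key and concatenates them.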

Regarding historical data storage… In the early days (I think 2 major versions back) Couchbase Server required all keys to be available in memory, meaning that if you stored huge amounts of historical data you could run out of memory because the document keys would not all fit in memory. That is no longer the case, and therefore storing a huge number of documents does not require the same amount of RAM.
Couchbase has a lot of customers that use Couchbase for both live data and historical data analysis, with even more than 10 million documents.

Perhaps this presentation about Elasticsearch could give some inspiration as to what is possible:
http://www.couchbase.com/connect/agenda/integrating-elasticsearch-real-time-kibana/

I almost forgot the most important topic… N1QL - SQL for Documents.

Starting with the next version of Couchbase Server, we support ad hoc queries in a SQL-like language.
N1QL supports JOIN, NEST, SELECT, COUNT and a lot of other operations.

You can read more about N1QL and try it out at: http://docs.couchbase.com/developer/n1ql-dp4/n1ql-intro.html

Depending on your analytics requirements, N1QL could be the only thing you need in your case.
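
To give a feel for what an ad hoc query over one month might look like, here is a small Python sketch that only builds the N1QL statement text and its parameters; it does not call any SDK. The bucket name `history`, the field `recorded_on` (assumed to hold ISO date strings), and the parameter names are assumptions for illustration, and the sketch assumes the released N1QL syntax accepts named parameters of the form `$name`.

```python
from datetime import date

def month_range(year, month):
    """Return ISO date strings (start, end) for one month, end exclusive."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()

def build_month_query(bucket, year, month):
    """Build an N1QL statement and named parameters for one month of data."""
    start_ts, end_ts = month_range(year, month)
    statement = (
        f"SELECT COUNT(*) AS total FROM `{bucket}` "
        "WHERE recorded_on >= $start_ts AND recorded_on < $end_ts"
    )
    return statement, {"start_ts": start_ts, "end_ts": end_ts}

statement, params = build_month_query("history", 2015, 5)
print(statement)
print(params)
```

ISO date strings sort in chronological order, so a plain string comparison is enough for the range. For the query to avoid reading all 10 years, the field would presumably also need a secondary index.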