Can Couchbase be used to efficiently store time series data and query it quickly?


#1

Situation

Right now we’re using a real-time database from AspenTech called InfoPlus.21: http://www.aspentech.com/products/aspen-infoplus21/

We collect data from different sources in various ways.

  • Reading values directly from the PLC (the control system of our production machines). This way
    we save measured values such as pressures, currents, temperatures and so on. Most of these signals are sampled at a fixed interval of a few seconds, between 1 and 10 seconds depending on the signal.
  • Some of those could even be sampled at a slower rate, since the temperature of a melting oven with about 50 tons of almost liquid copper in it does not change significantly within a second :wink:
  • We collect this kind of information with OPC servers that have a direct network link to the communication processors of the mostly Siemens S7-based automation systems.
  • For faster processes such as cold rolling and hot rolling we use a special system called IBA PDA
    http://www.iba-america.com/en/products/software/ibapda/ There we sample at intervals as short as 5 ms. This system creates binary files, and we have an API at hand for analysing them. The users in the operating department also work with those binary files using a special visualisation tool called IbaAnalyzer.
  • So each data point has a signal name (or ID)
  • Timestamp (down to ms, or rather even µs)
  • Status
  • Value
  • We are talking about approx. 1000 signals when stored in compressed form (100 ms between two values).
    Some data sources may deliver data every 1 ms, so we are talking about roughly 1.0E12 values (a rule-of-thumb calculation), but the order of magnitude should be OK until proven wrong.

  • Our test machine has 64 GB RAM and a RAID array with 1.8 TB of disk space.
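
As a rough cross-check of the figures above, the value count follows directly from the number of signals and the sampling interval. The post does not state a retention period, so the three years used below is purely an assumption, picked because it lands close to 1.0E12 values. The 16 bytes per raw value (an 8-byte timestamp plus an 8-byte float, status ignored) is likewise a hypothetical figure, not a Couchbase storage number.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def values_per_day(num_signals: int, interval_ms: int) -> int:
    """Samples collected per day when every signal is sampled at a fixed interval."""
    return num_signals * MS_PER_DAY // interval_ms


def total_values(num_signals: int, interval_ms: int, retention_days: int) -> int:
    """Samples accumulated over the whole retention period."""
    return values_per_day(num_signals, interval_ms) * retention_days


# From the post: about 1000 signals, one value every 100 ms.
per_day = values_per_day(1000, 100)

# ASSUMPTION: 3 years of retention (not stated in the post).
three_years = total_values(1000, 100, 3 * 365)

# ASSUMPTION: 16 bytes per raw value, before any index, replica or JSON overhead.
raw_bytes = three_years * 16

print(f"{per_day:,} values per day")
print(f"{three_years:,} values in 3 years")
print(f"{raw_bytes / 1e12:.1f} TB raw")
```

Under these assumptions the result is 864 million values per day and about 9.5E11 values after three years, which is consistent with the 1.0E12 estimate. The raw size works out to roughly 15 TB, so a shorter retention period or coarser sampling would be needed to stay within 1.8 TB.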

Questions:

  • Does this whole idea make sense?
  • How big are the database files for such a setup?
  • We want to query data by signal name and time range.
  • Calculating maximum, minimum and average values should also be possible.

I’d be happy to get some thoughts on this idea.


#2

It is hard to estimate what the size of the data will be in Couchbase without modelling your data. You can model your data in binary as key/value as well as JSON docs. At the end of the day, what we store isn’t all that much larger than the raw data, except that we maintain a number of indexes to make access much faster. The things that increase the storage size are things like replicas, compaction frequency, mutation rate, etc. The best option would be to install Couchbase and upload a little bit of your data to see how much space it consumes.
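
To make the modelling question more concrete, here is a minimal sketch of one possible JSON layout: one document per signal per hour, with the samples held in an array. The hourly bucket, the key format, the field names and the signal name `FURNACE1.TEMP` are all assumptions for illustration, not a recommended or official Couchbase schema. Uploading the documents is not shown.

```python
import json
from datetime import datetime, timezone


def bucket_key(signal: str, ts_ms: int) -> str:
    """Document key: signal name plus the UTC hour the sample falls into."""
    hour = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H")
    return f"{signal}::{hour}"


def build_documents(points):
    """Group (signal, ts_ms, status, value) tuples into hourly bucket documents."""
    docs = {}
    for signal, ts_ms, status, value in points:
        key = bucket_key(signal, ts_ms)
        doc = docs.setdefault(key, {"type": "ts_bucket", "signal": signal, "samples": []})
        # Short field names keep the JSON small: t = timestamp, s = status, v = value.
        doc["samples"].append({"t": ts_ms, "s": status, "v": value})
    return docs


# Hypothetical sample data: timestamps in ms since the Unix epoch, status 0 = good.
points = [
    ("FURNACE1.TEMP", 0, 0, 1080.5),
    ("FURNACE1.TEMP", 1_000, 0, 1080.6),
    ("FURNACE1.TEMP", 3_600_000, 0, 1081.0),
]
docs = build_documents(points)
for key, doc in docs.items():
    print(key, len(doc["samples"]), "samples,", len(json.dumps(doc)), "bytes as JSON")
```

Measuring the serialised size of a few such documents gives a first feel for the JSON overhead before uploading a real sample and checking the actual disk usage.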

However, all the aggregations you mentioned can be done fairly easily with N1QL. N1QL is a superset of SQL and supports aggregates, joins and more.
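
As an illustration, a query for minimum, maximum and average over a signal and time range might look like the statement below. It assumes a hypothetical bucket named `timeseries` holding one document per data point with the fields `signal_id`, `ts_ms` and `val`; none of these names come from the original post. The statement is kept as a plain string because running it requires a Couchbase SDK or the query workbench, which is not shown here. The `aggregate` function computes the same three numbers locally, so results from a small test upload can be cross-checked.

```python
# N1QL statement with named parameters ($signal_id, $start_ms, $end_ms).
# ASSUMPTION: bucket `timeseries`, one document per data point.
N1QL_AGGREGATE = """
SELECT MIN(d.val) AS min_val,
       MAX(d.val) AS max_val,
       AVG(d.val) AS avg_val
FROM `timeseries` AS d
WHERE d.signal_id = $signal_id
  AND d.ts_ms >= $start_ms
  AND d.ts_ms <  $end_ms
"""


def aggregate(points, signal_id, start_ms, end_ms):
    """Local reference: min/max/avg of one signal in [start_ms, end_ms)."""
    vals = [
        p["val"]
        for p in points
        if p["signal_id"] == signal_id and start_ms <= p["ts_ms"] < end_ms
    ]
    if not vals:
        return None  # no samples in the requested range
    return {"min_val": min(vals), "max_val": max(vals), "avg_val": sum(vals) / len(vals)}


# Hypothetical sample data.
points = [
    {"signal_id": "FURNACE1.TEMP", "ts_ms": 0, "val": 1080.0},
    {"signal_id": "FURNACE1.TEMP", "ts_ms": 1_000, "val": 1082.0},
    {"signal_id": "FURNACE1.TEMP", "ts_ms": 2_000, "val": 1084.0},
    {"signal_id": "MILL2.CURRENT", "ts_ms": 1_000, "val": 12.5},
]
print(aggregate(points, "FURNACE1.TEMP", 0, 3_000))
```

For a query like this to stay fast on a large data set, a secondary index covering the filtered fields would be needed; the right index definition depends on the document model finally chosen.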


#3

We had a very similar case. We have built an industrial energy management product that has an embedded data historian, much like Aspentech’s IP21. Recently we started migrating from a Cassandra-based architecture to Couchbase. The numbers are similar to what you described: we store data from roughly 2000 meters, about 10 measurements per meter, most of them sampled once per minute, with a total storage period of 10 years. This database today takes around 5 TB, which we handle in a 4-node environment. Here is the company website for reference on what the product does: http://viridis.energy/.

After spending quite some time looking for alternatives to replace our old architecture, we found Couchbase to be the best way to go. The trick to getting it right, though, is finding the best design for documents and views; otherwise things may get slow with this much data. Using time series data with CB’s custom map/reduce views was an excellent solution for us. We worked on several design alternatives before committing to Couchbase, but today we’re far happier with the solution than with what we had before with Cassandra. If you need more pointers, we used a consultancy company that specialises in industrial Big Data to help us. Please let me know and I’ll share their contact details.
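
The post does not describe the view design itself, so the following is only a sketch of the general idea and not the design used in that product. A map function emits a compound key made of the signal and the date parts of the timestamp, and a reduce function folds the values into count, sum, minimum and maximum. Querying with a shorter key prefix (what Couchbase views call `group_level`) then returns coarser aggregates. Real Couchbase views are written in JavaScript; the Python below merely simulates the mechanism, and the field names `signal_id`, `ts_ms` and `val` are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def map_point(doc):
    """Map step: emit ((signal, year, month, day, hour), value) for one document."""
    t = datetime.fromtimestamp(doc["ts_ms"] // 1000, tz=timezone.utc)
    return (doc["signal_id"], t.year, t.month, t.day, t.hour), doc["val"]


def reduce_stats(values):
    """Reduce step: summary statistics for all values sharing a key prefix."""
    return {"count": len(values), "sum": sum(values), "min": min(values), "max": max(values)}


def query_view(docs, group_level):
    """Group emitted rows by the first `group_level` key parts, then reduce each group.

    A real view index reduces incrementally and stores partial results;
    this simulation simply recomputes everything on each call.
    """
    groups = defaultdict(list)
    for doc in docs:
        key, value = map_point(doc)
        groups[key[:group_level]].append(value)
    return {key: reduce_stats(vals) for key, vals in groups.items()}


# Hypothetical sample data: two samples in hour 0, one in hour 1 (UTC, 1970-01-01).
docs = [
    {"signal_id": "S1", "ts_ms": 0, "val": 1.0},
    {"signal_id": "S1", "ts_ms": 1_800_000, "val": 3.0},
    {"signal_id": "S1", "ts_ms": 3_600_000, "val": 5.0},
]
hourly = query_view(docs, group_level=5)  # one row per signal and hour
daily = query_view(docs, group_level=4)   # one row per signal and day
print(hourly)
print(daily)
```

The average for any group is `sum / count`, so a single view of this shape can serve hourly, daily or monthly summaries just by changing the group level.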