I am pretty new to NoSQL and Big Data. I was wondering whether we really need Hadoop, given that Couchbase supports map/reduce functions, or vice versa.
Since Couchbase clusters can already distribute data across nodes, what benefit does Hadoop bring to Couchbase users?
Couchbase does provide map/reduce for creating views, and we process each view incrementally as changes come in, which gives low-latency access to the map/reduce results. Many folks use Hadoop with different objectives. The top few are complex statistical models that don't lend themselves easily to low-latency analytics, data archival, and so on.
Couchbase and the Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) work well hand in hand: you can easily exchange data between the two environments and get the best of both worlds if need be.
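To make the "incremental view" idea above a bit more concrete, here is a toy Python sketch. It is not Couchbase code: real Couchbase views are defined with JavaScript map and reduce functions, and the class and function names below (`IncrementalView`, `orders_by_country`) as well as the document fields are made up for illustration. The sketch only shows the general principle: when one document changes, only the rows emitted by that document are re-mapped and the reduced totals are adjusted, instead of rescanning the whole data set the way a batch job would.

```python
from collections import defaultdict


class IncrementalView:
    """Toy model of an incrementally maintained map/reduce view.

    Hypothetical illustration only; it does not reflect how Couchbase
    stores or indexes views internally.
    """

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.emitted = {}               # doc_id -> rows emitted by that document
        self.totals = defaultdict(int)  # key -> reduced value (a running sum)

    def upsert(self, doc_id, doc):
        # Undo the old contribution of this document, then apply the new one.
        self.delete(doc_id)
        rows = list(self.map_fn(doc))
        self.emitted[doc_id] = rows
        for key, value in rows:
            self.totals[key] += value

    def delete(self, doc_id):
        # Subtract whatever this document contributed earlier, if anything.
        for key, value in self.emitted.pop(doc_id, []):
            self.totals[key] -= value

    def query(self, key):
        # Reads are cheap because the reduce result is already maintained.
        return self.totals.get(key, 0)


def orders_by_country(doc):
    """Map function: emit (country, amount) for order documents only."""
    if doc.get("type") == "order":
        yield doc["country"], doc["amount"]


view = IncrementalView(orders_by_country)
view.upsert("o1", {"type": "order", "country": "US", "amount": 30})
view.upsert("o2", {"type": "order", "country": "US", "amount": 20})
view.upsert("o3", {"type": "order", "country": "DE", "amount": 5})
view.upsert("o1", {"type": "order", "country": "US", "amount": 10})  # update o1

print(view.query("US"))  # 30
```

A summing reduce is used here because it can be undone by subtraction; a reduce such as max or median would need more bookkeeping, which hints at why heavier statistical work is often pushed to a batch system like Hadoop.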
One more question: does that mean we need to maintain two separate data stores, one for each system?
Yes. Couchbase stores its files in a specific format that it needs for its own access, and concurrent access to those files from Hadoop is not possible. However, we are working on facilities that would let you use Hadoop query languages such as Hive or Pig, or even ODBC or JDBC, to access Couchbase directly. That could mean that Hadoop does not store the data itself but processes queries by communicating directly with Couchbase.
Would that be interesting to you?
Yes, that makes perfect sense for us. It would reduce the storage requirements and also the development work if we need to run Hadoop analytics.