The Couchbase Sqoop connector retrieves only ~22 million records into HDFS, whereas the bucket has ~40 million records, and the count differs on each run.
Is there any other way to import data from Couchbase to HDFS? We want to dump the complete bucket.
You can use our Kafka connector, which allows you to dump data into HDFS.
You can read this blog about how to dump a bucket into HDFS.
Right now we have the fourth developer preview, and we expect a GA release next month.
Thanks for your response. We want to import the data once daily at a specific time. Is there any tool available like Sqoop?
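For a fixed daily run, one common approach is simply to trigger the Sqoop import from cron. A minimal sketch, assuming the Couchbase Hadoop connector's usual invocation; the host, port, and target directory are illustrative placeholders, not from this thread:

```shell
# Hypothetical crontab entry: run the Couchbase Sqoop import daily at 02:00.
# "DUMP" asks the Couchbase connector for the whole bucket; the connect URL
# and output path are placeholders. Note the escaped % signs, which cron
# would otherwise treat as line terminators.
0 2 * * * /opt/sqoop/bin/sqoop import \
    --connect http://couchbase-host:8091/pools \
    --table DUMP \
    --target-dir /data/couchbase/$(date +\%Y-\%m-\%d)
```

Writing each run into a dated directory keeps imports from clobbering each other and makes it easy to compare record counts across days.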
I’m experiencing similar issues. My bucket has about 223 million records; the Hadoop connector runs daily, but sometimes it retrieves a few million fewer.
Any update on this issue? We are running Hadoop with speculative execution enabled. Could that cause any problems?
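To rule speculative execution out, it can be disabled for the import job alone rather than cluster-wide. A sketch, assuming MRv2 property names (the MRv1 equivalents are mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution); the connect URL and paths are placeholders:

```shell
# With speculation on, Hadoop launches duplicate attempts of slow tasks and
# kills the loser; disabling it for this job removes one source of
# nondeterminism when record counts vary between runs.
sqoop import \
    -D mapreduce.map.speculative=false \
    -D mapreduce.reduce.speculative=false \
    --connect http://couchbase-host:8091/pools \
    --table DUMP \
    --target-dir /data/couchbase/full-dump
```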
We are not able to find the root cause. After changing the number of mappers to 1, the missing-record count decreased to roughly 100 to 1000 records.
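For reference, the mapper count is set with Sqoop's -m (or --num-mappers) flag; a single mapper avoids parallel splits entirely, at the cost of throughput. A sketch with placeholder connect URL and paths:

```shell
# Single-mapper import: one task reads the whole bucket, so no records can
# fall between split boundaries, but the job runs correspondingly slower.
sqoop import -m 1 \
    --connect http://couchbase-host:8091/pools \
    --table DUMP \
    --target-dir /data/couchbase/full-dump-single-mapper
```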