How to bulk read data from Couchbase in Spark?

stackoverflow1112 · December 3, 2018, 12:54pm

I have a Couchbase bucket with 4 million records, and I have written a Spark program (run from IntelliJ) that reads the bucket data with a N1QL query:

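import com.couchbase.client.java.query.N1qlQuery
import com.couchbase.spark._   // adds couchbaseQuery to SparkContext
import spark.implicits._       // needed for toDS()
val querydset = N1qlQuery.simple("select * from dset")
val data_dset = spark.sparkContext.couchbaseQuery(querydset).map(_.value.toString())
val rd_dset = spark.read.json(data_dset.toDS())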

But I get the following error:

ERROR connection.QueryAccessor: Couchbase N1QL Query List(SimpleN1qlQuery{statement=select * from dset}) failed with {"msg":"Error performing bulk get operation - cause: {7 errors, starting with read tcp 127.0.0.1:55885->127.0.0.1:11210: i/o timeout}","code":12008}

If I use the query N1qlQuery.simple("select * from dset limit 250000"), it works fine, but with any larger limit it throws the error above.

ingenthr · December 10, 2018, 2:39pm

If your actual use case is getting everything from a bucket, look at the streaming interface. That’s a lot more efficient than a “select *” kind of query. See the spark samples on github.com/couchbaselabs.
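
As a rough illustration of the streaming approach, here is a minimal sketch. It assumes the 2.x Couchbase Spark connector with its DStream-based `couchbaseStream` API, a local cluster, and a bucket named `dset` with an empty password. The object name `StreamDset`, the 5-second batch interval, and the connection settings are placeholders rather than anything from this thread, and a cluster with role-based access control would likely need additional credentials. Check the samples mentioned above for the form that matches your connector version.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.couchbase.spark.streaming._

// Hypothetical application name; not taken from the original post.
object StreamDset {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("StreamDset")
      // Assumption: bucket "dset" with an empty password on a local node.
      .set("com.couchbase.bucket.dset", "")

    // Assumption: 5-second micro-batches; tune for your workload.
    val ssc = new StreamingContext(conf, Seconds(5))

    ssc
      // Stream every document from the start of the bucket's change feed
      // instead of pulling all of them through one N1QL query.
      .couchbaseStream(from = FromBeginning, to = ToInfinity)
      // Keep document mutations only; drop deletions and other messages.
      .filter(_.isInstanceOf[Mutation])
      // Decode the raw document body into a JSON string.
      .map(m => new String(m.asInstanceOf[Mutation].content, "UTF-8"))
      // Report how many documents arrived in each batch.
      .foreachRDD(rdd => println(s"received ${rdd.count()} documents in this batch"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each batch yields an RDD of JSON strings, so the same `spark.read.json` step from the question could be applied per batch if a DataFrame is needed.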