Couchbase spark bulk insert

yogesh_0586 · April 20, 2018, 9:53am

I am using Couchbase Server 5.1 and Spark connector 2.2.0.
Is there any way to do a bulk insert using the Couchbase Spark connector?

graham.pople · April 20, 2018, 2:54pm

Hi there. Indeed there is. Please see "Persisting spark dataframe to couchbase" for the recommended way to insert a large Spark DataFrame into Couchbase.

yogesh_0586 · April 20, 2018, 6:21pm

I tried the method mentioned there, but it does not work. I have an RDD[String], which I converted into a DataFrame as below:

val dfSchema = Seq("data")
val a = data.toDF(dfSchema: _*)
a.write.mode(saveMode).couchbase("temp") // temp is the bucket name

but it shows the error "cannot resolve symbol saveMode". I am using Spark connector 2.2.0; its documentation mentions a method saveToCouchbase(options), but I cannot find a method like couchbase(options).

graham.pople · April 20, 2018, 6:33pm

saveMode is a variable referencing a Spark SaveMode, e.g.

import org.apache.spark.sql.SaveMode

val saveMode = SaveMode.Overwrite

graham.pople · April 20, 2018, 6:35pm

The saveToCouchbase method is documented here https://developer.couchbase.com/documentation/server/current/connectors/spark-2.2/spark-sql.html.

Hope this helps
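
Since the original data is an RDD[String], the RDD route can be sketched as below. This is a minimal sketch, not a definitive implementation: it assumes the SparkContext is already configured for the Couchbase cluster and bucket, and the `RddBulkInsert` object, the `docId` helper, and the "row::" ID prefix are hypothetical names chosen for illustration.

```scala
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.json.JsonObject
import com.couchbase.spark._
import org.apache.spark.rdd.RDD

object RddBulkInsert {
  // Hypothetical ID scheme (an assumption): a fixed prefix plus the row's index in the RDD.
  def docId(index: Long): String = s"row::$index"

  // Wraps each string in a JSON document {"data": <value>} and saves the whole RDD.
  // Assumes the Couchbase connection settings were supplied when the SparkContext was built.
  def save(data: RDD[String]): Unit =
    data.zipWithIndex()
      .map { case (value, index) =>
        JsonDocument.create(docId(index), JsonObject.create().put("data", value))
      }
      .saveToCouchbase()
}
```

Index-based IDs are only unique within a single RDD; if the job is rerun on different input, the same IDs will be reused, so a key taken from the data itself is usually a better choice when one exists.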

yogesh_0586 · April 21, 2018, 1:24pm

Thanks for the reply. I'm pretty close to uploading in bulk, but I'm having some issues. I tried the following:

val saveMode = SaveMode.Overwrite
dataframe.write.mode(saveMode).couchbase(Map("bucket" -> "temp"))

but it shows the following error:

com.couchbase.client.core.CouchbaseException: Could not find ID field META_ID in {"data":"test"}

graham.pople · April 23, 2018, 11:07am

Yep, this is as expected. All documents in Couchbase need a document ID that is unique within the bucket. When you insert a Spark DataFrame directly into Couchbase, as here, you need to make sure that each row in the DataFrame has such a unique document ID. By default the Spark connector will look for it in a JSON field called 'META_ID', but you can change this. Search for 'idField' in these docs for more details: https://developer.couchbase.com/documentation/server/current/connectors/spark-2.2/spark-sql.html
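
As a rough illustration of the two options, here is a minimal sketch. It assumes a DataFrame with a single "data" column, as in the posts above, and a SparkSession already configured for the cluster. The `DataFrameBulkInsert` object, the "docId" column name, and the "row::" prefix are hypothetical choices, not part of the connector.

```scala
import com.couchbase.spark.sql._
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.{concat, lit, monotonically_increasing_id}

object DataFrameBulkInsert {
  // Adds a string ID column under the given name.
  // The "row::" prefix is an illustrative assumption; the IDs are unique within one DataFrame only.
  def withId(df: DataFrame, idColumn: String): DataFrame =
    df.withColumn(idColumn, concat(lit("row::"), monotonically_increasing_id().cast("string")))

  // Option 1: use the default field name the connector looks for.
  def saveWithDefaultId(df: DataFrame): Unit =
    withId(df, "META_ID").write
      .mode(SaveMode.Overwrite)
      .couchbase(Map("bucket" -> "temp"))

  // Option 2: keep a custom column name and point the connector at it via 'idField'.
  def saveWithCustomId(df: DataFrame): Unit =
    withId(df, "docId").write
      .mode(SaveMode.Overwrite)
      .couchbase(Map("bucket" -> "temp", "idField" -> "docId"))
}
```

The generated IDs can differ between runs, so if the rows already carry a natural key, passing that column as the 'idField' is likely the more robust choice.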