Spark Connector from Python

#1

Hi,
Can the Spark Connector (http://developer.couchbase.com/documentation/server/4.0/connectors/spark-1.0/spark-intro.html) be used from Python (pyspark)?

Thanks


#2

I think it is possible to use Scala libraries from Python. Right now, the best integration is probably through DataFrames, if you are using Spark SQL.

I maintain the Spark connector, but I don’t have much experience with Python. Which APIs would you want to use? In any case, we don’t currently provide a dedicated Python integration, so you’d need to find a way to reuse what we have on the JVM, if that makes sense.
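Because the connector runs on the JVM, one way to follow the DataFrame suggestion from PySpark is Spark SQL’s generic data source API, which PySpark forwards to the JVM. This is only a sketch. The data source class name, the `schemaFilter` option and the `spark.couchbase.*` settings are assumptions based on the 1.x connector and should be checked against its documentation. `read_couchbase_df` is a hypothetical helper.

```python
# Minimal PySpark sketch (assumption): read Couchbase data as a DataFrame through
# the JVM connector's Spark SQL data source, so Python never talks to Couchbase
# directly. Assumed, check against your connector version's docs:
#   - the data source name below,
#   - the "schemaFilter" option,
#   - connection settings passed at submit time, e.g. for connector 1.x:
#     --conf spark.couchbase.nodes=<host> --conf spark.couchbase.bucket.<name>=<password>
COUCHBASE_SOURCE = "com.couchbase.spark.sql.DefaultSource"  # assumed name


def read_couchbase_df(spark, schema_filter):
    """Return a DataFrame backed by the (JVM) Couchbase connector.

    `spark` is a pyspark.sql.SparkSession or SQLContext; both expose `.read`.
    `schema_filter` is a predicate such as "type = 'airline'" that the
    connector is assumed to use for schema inference.
    """
    return (
        spark.read.format(COUCHBASE_SOURCE)
        .option("schemaFilter", schema_filter)
        .load()
    )


# Usage (hypothetical app submitted with the connector jar on the classpath):
#   df = read_couchbase_df(spark, "type = 'airline'")
#   df.select("name", "country").show()
```

All Couchbase I/O here happens inside the JVM connector. Python only describes the query, which is why no Python-specific connector is needed for this route.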


#3

Thanks.
I am building a Spark Streaming app that takes a real-time feed from Kafka, aggregates it, and pushes the results into Couchbase.

Now I am thinking that if I run Spark on a cluster with a local driver (client deploy mode), the driver runs locally. The function passed to foreachRDD(func) executes on the driver, so inside it the aggregated results can be collected to the driver (e.g. with rdd.collect()), and from there I should be able to use the standard Couchbase Python library to load them into Couchbase.
So I might not need the Spark Connector in Python after all… but obviously this isn’t the best setup (a local driver is not considered production-ready).
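The driver-side approach described above can be sketched as a function to pass to `foreachRDD`. This is a minimal sketch that assumes the DStream holds aggregated `(key, value)` pairs. `make_write_batch`, the `agg::` key prefix and the document shape are hypothetical. `upsert` is any callable taking `(doc_id, doc)`, for example `Bucket.upsert` from the Couchbase Python SDK 2.x (assumed).

```python
def make_write_batch(upsert, key_prefix="agg::"):
    """Build a callback for DStream.foreachRDD(func).

    foreachRDD runs the callback in the driver process. rdd.collect() pulls
    the aggregated (key, value) pairs from the executors into the driver,
    where each one is written with `upsert(doc_id, doc)`.
    `key_prefix` and the document layout are hypothetical choices.
    """
    def write_batch(rdd):
        written = 0
        for key, value in rdd.collect():  # whole batch lands in driver memory
            upsert(key_prefix + str(key), {"key": key, "value": value})
            written += 1
        return written  # foreachRDD ignores this; handy for testing/logging
    return write_batch


# Usage (hypothetical; Couchbase Python SDK 2.x API assumed):
#   from couchbase.bucket import Bucket
#   bucket = Bucket("couchbase://localhost/default")
#   aggregated.foreachRDD(make_write_batch(bucket.upsert))
```

Because `rdd.collect()` pulls every batch into the driver, the driver becomes the bottleneck, which matches the concern above. The Spark Streaming programming guide’s foreachRDD design patterns instead write from the executors with `rdd.foreachPartition`, opening one connection per partition.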