Create Spark dataset using N1QL Query in Java

Can we create a Spark Dataset using a N1QL query?

It's a rather dirty way to do this in Java; I actually had to go through the connector code to figure out this hack. Also, a secondary index should exist on the key against which the query is made.

We specify options while loading the data from Couchbase into Spark as follows:

couchbaseReader(sparkSession.sqlContext().read()).couchbase(options);
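For reference, couchbaseReader is a static helper from the connector's Java API. A minimal sketch of how it is brought into scope, assuming the connector 2.x package layout:

    // Static helper from the Spark Connector's Java API (assumed 2.x layout)
    import static com.couchbase.spark.japi.CouchbaseDataFrameReader.couchbaseReader;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Wraps the standard DataFrameReader so .couchbase(options) becomes available
    Dataset<Row> rows = couchbaseReader(sparkSession.sqlContext().read()).couchbase(options);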

where options is a Java Map. Add a key named schemaFilter whose value contains the conditions for the WHERE clause, as follows:

Map<String, String> options = new TreeMap<String, String>();
options.put("bucket", "basedata");
options.put("schemaFilter", "table_name = 'Employee' AND (Employee_ID in [1000004,1000030])");

The above options result in the connector internally firing the following query:

SELECT META(basedata).id as META_ID, basedata.* FROM basedata WHERE table_name = 'Employee' AND (Employee_ID in [1000004,1000030]) LIMIT 1000
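Putting it all together, here is a minimal end-to-end sketch. The host address and the config keys (spark.couchbase.nodes, spark.couchbase.bucket.basedata) are assumptions based on connector 2.x conventions; adjust them for your own cluster:

    import static com.couchbase.spark.japi.CouchbaseDataFrameReader.couchbaseReader;

    import java.util.Map;
    import java.util.TreeMap;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SchemaFilterExample {
        public static void main(String[] args) {
            // Connection settings are illustrative; point them at your own cluster.
            SparkSession spark = SparkSession.builder()
                .appName("SchemaFilterExample")
                .master("local[*]")
                .config("spark.couchbase.nodes", "127.0.0.1")
                .config("spark.couchbase.bucket.basedata", "") // bucket name -> password
                .getOrCreate();

            Map<String, String> options = new TreeMap<String, String>();
            options.put("bucket", "basedata");
            options.put("schemaFilter",
                "table_name = 'Employee' AND (Employee_ID in [1000004,1000030])");

            // schemaFilter ends up in the WHERE clause of the N1QL query the
            // connector fires while inferring the schema and reading the data.
            Dataset<Row> employees = couchbaseReader(spark.sqlContext().read())
                .couchbase(options);

            employees.show();
        }
    }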

Column filters can also be applied in a similar way.
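For example, continuing from the sketch above, columns can be pruned and further filtered with the standard Spark Dataset API (the column names Employee_ID and Employee_Name are purely illustrative):

    import static org.apache.spark.sql.functions.col;

    // Keep only the columns of interest and add a further predicate;
    // the connector can push such filters down where supported.
    Dataset<Row> filtered = employees
        .select(col("META_ID"), col("Employee_ID"), col("Employee_Name"))
        .filter(col("Employee_ID").gt(1000000));
    filtered.show();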

There seems to be very little help for the Java API.

@keshav_m, more Spark questions.

@neeleshkumar_mannur the docs are quite extensive on this: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/spark-sql.html

For the Java API: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/java-api.html


@daschl Indeed, the docs are quite extensive for Scala.

The Java API docs do not give a clear picture of what should be done. In fact, even specifying the name of the bucket from Java was clarified in a blog post rather than in the docs.


@neeleshkumar_mannur can you please let me know exactly which parts you are missing from the Java-based docs? Then I'll add them :)