Can we create a Spark Dataset using a N1QL query?
It's rather a dirty way to do this in Java; I actually had to go through the connector code to discover this hack. Also, a secondary index must exist on the field against which the query is made.
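For the filter used below, a composite index along these lines should work (the index name here is just illustrative):

CREATE INDEX idx_employee_filter ON `basedata`(table_name, Employee_ID);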
We specify options while loading the data from Couchbase into Spark as follows:
couchbaseReader(sparkSession.sqlContext().read()).couchbase(options);
where options is a Java Map. Add a key named schemaFilter whose value holds the conditions for the WHERE clause, as follows:
Map<String, String> options = new TreeMap<String, String>();
options.put("bucket", "basedata");
options.put("schemaFilter", "table_name = 'Employee' AND (Employee_ID in [1000004,1000030])");
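Putting it all together, here is a minimal end-to-end sketch. The connection settings, class name, and column values are assumptions for a local single-node cluster with connector 2.x; adjust them for your deployment:

import static com.couchbase.spark.japi.CouchbaseDataFrameReader.couchbaseReader;

import java.util.Map;
import java.util.TreeMap;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class N1qlFilterExample {
    public static void main(String[] args) {
        // Assumed connection settings for a local cluster; adjust the
        // node address and bucket name for your environment.
        SparkSession spark = SparkSession.builder()
                .appName("N1qlFilterExample")
                .master("local[*]")
                .config("spark.couchbase.nodes", "127.0.0.1")
                .config("spark.couchbase.bucket.basedata", "")
                .getOrCreate();

        Map<String, String> options = new TreeMap<String, String>();
        options.put("bucket", "basedata");
        // Undocumented "schemaFilter" option: the connector appends this
        // predicate to the WHERE clause of the N1QL query it generates.
        options.put("schemaFilter",
                "table_name = 'Employee' AND (Employee_ID in [1000004,1000030])");

        Dataset<Row> employees =
                couchbaseReader(spark.sqlContext().read()).couchbase(options);
        employees.show();
    }
}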
With these options, the connector internally fires the following query:
SELECT META(`basedata`).id as META_ID, `basedata`.* FROM `basedata` WHERE table_name = 'Employee' AND (Employee_ID in [1000004,1000030]) LIMIT 1000
Column filters can also be applied in a similar way.
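For instance, assuming the employees Dataset from the sketch above (the column names here are hypothetical):

// Standard Spark projection; the connector can push this column
// pruning down into the SELECT list of the generated N1QL query.
employees.select("META_ID", "Employee_ID", "Employee_Name").show();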
There seems to be very little help for the Java API.
@neeleshkumar_mannur the docs are quite extensive on this: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/spark-sql.html
For the Java API: https://developer.couchbase.com/documentation/server/4.5/connectors/spark-2.0/java-api.html
@daschl Indeed the docs are quite extensive for Scala.
The Java API docs don't give a clear picture of what should be done. In fact, how to specify the bucket name from Java was clarified in a blog post, not in the docs.
@neeleshkumar_mannur can you please let me know which parts exactly are missing from the Java-based docs? Then I'll add them.