How to add Json array into dataframe

abhi_sadana · May 27, 2020, 8:17pm

[{
"Name" :"Test"
"Salary":2000
"Department":"Maths"
},
{
"Name" :"Test"
"Salary":2000
"Department":"Maths"
}
]

I want to convert this JSON array into a DataFrame with the Couchbase connector. The structure of the array is not fixed, so I cannot add the schema manually, and I also cannot add a key to the array.
I tried the code below after creating an index on the bucket and setting all the mandatory configuration (username, password, server IP, etc.).

val jsonArray = sql.read.couchbase

But the resulting DataFrame contains only META_ID. The connector does not seem to pick up the schema of the JSON array.

What should I do?


daschl · May 29, 2020, 6:34am

@abhi_sadana it’s not clear to me what you want to achieve exactly. Are you loading data through the connector and then want to turn it into a dataframe? Or do you have an in-memory representation of a JsonArray and want to convert this one into a dataframe?

abhi_sadana · June 2, 2020, 7:02pm

@daschl Thank you for your reply.
Corrected JSON:

[
  {
    "Name": "Test",
    "Salary": 2000,
    "Department": "Maths"
  },
  {
    "Name": "Test",
    "Salary": 2000,
    "Department": "Maths"
  }
]

Note: there is only one JSON document in the bucket.

Yes, I am loading the data through the connector and reading this JSON document directly into a DataFrame.

Is there any way to get all of the elements into the DataFrame instead of only META_ID?

graham.pople · June 3, 2020, 4:50pm

@abhi_sadana I think that should work; I wonder if you’re hitting a limit of the automatic schema inference there (perhaps it cannot infer a schema from a small set of docs, such as the single doc you have here).

But you can specify a schema manually, which is easy enough. Please see the Spark connector docs for more: https://docs.couchbase.com/spark-connector/current/spark-sql.html

abhi_sadana · June 4, 2020, 7:12am

@graham.pople Thank you for your reply.
I want to mention two points here:

I have edited the document by giving the JSON array a key:

{
  "test": [
    {
      "Name": "Test",
      "Salary": 2000,
      "Department": "Maths"
    },
    {
      "Name": "Test",
      "Salary": 2000,
      "Department": "Maths"
    }
  ]
}

Resulting DataFrame:

+-------+--------------------+
|META_ID|                test|
+-------+--------------------+
|    122|[[Maths, Test, 20...|
+-------+--------------------+

Schema:

root
 |-- META_ID: string (nullable = true)
 |-- test: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Department: string (nullable = true)
 |    |    |-- Name: string (nullable = true)
 |    |    |-- Salary: long (nullable = true)
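
With a schema like the one above, the nested `test` array can be turned into one row per element using plain Spark SQL functions. This is only a minimal sketch: it assumes a local SparkSession and uses in-memory sample data standing in for the DataFrame returned by the connector. The names `FlattenSketch`, `Employee`, `Doc`, and the alias `row` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object FlattenSketch {
  // Hypothetical case classes mirroring the schema printed in this thread.
  case class Employee(Department: String, Name: String, Salary: Long)
  case class Doc(META_ID: String, test: Seq[Employee])

  // In-memory stand-in for the DataFrame the connector produced.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(Doc("122", Seq(Employee("Maths", "Test", 2000L), Employee("Maths", "Test", 2000L)))).toDF()
  }

  // explode() emits one row per array element; "row.*" expands the struct fields into columns.
  def flatten(df: DataFrame): DataFrame =
    df.select(col("META_ID"), explode(col("test")).as("row"))
      .select("META_ID", "row.*")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("flatten-sketch").getOrCreate()
    flatten(sample(spark)).show()
    spark.stop()
  }
}
```

Because `row.*` is resolved against the DataFrame's schema at runtime, the field names do not have to be known in advance. Only the name of the array column (`test` here) has to be known.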

So, is it possible that the connector cannot load the data into a DataFrame properly when the JSON array has no key?

I cannot add the schema manually, because in my scenario I do not know what data is in the bucket. I only learn the schema at runtime, after the DataFrame has been created.
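
One possible workaround when the schema is only known at runtime is to hand the raw JSON text to Spark's own JSON reader and let it infer the schema. This is a sketch under assumptions, not a connector feature: how the raw document text is fetched from the bucket is left out, and the string below stands in for it. The object name `InferFromRawJson` is hypothetical. As far as I know, Spark's JSON reader treats a top-level array of objects as one row per element.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InferFromRawJson {
  // Lets Spark infer the schema from raw JSON strings; no schema is declared by hand.
  def infer(spark: SparkSession, rawDocs: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read.json(rawDocs.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("infer-sketch").getOrCreate()

    // Stand-in for the document body fetched from the bucket (assumption).
    val raw =
      """[{"Name":"Test","Salary":2000,"Department":"Maths"},{"Name":"Test","Salary":2000,"Department":"Maths"}]"""

    val df = infer(spark, Seq(raw))
    df.printSchema()
    df.show()
    spark.stop()
  }
}
```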

abhi_sadana · June 9, 2020, 5:32am

@graham.pople @daschl Any thoughts on this?