Querying for all fields of each document (without the sub-fields)

Consider the following 2 docs in the same bucket:

{
  "name" : "andrew",
  "sex" : "male",
  "id" : 123456789,
  "data" : {
    "hobby" : "basketball",
    "religion" : "atheist"
  }
}

{
  "name" : "laura",
  "sex" : "female",
  "foreigner_id" : 987654321,
  "data" : {
    "origin" : "france",
    "date_of_arrival" : "26Jan"
  }
}

I am looking for a N1QL query that gives me all the fields at the top level; the result should be { name, sex, id, foreigner_id, data },

and another query that gives me all the fields in data; the result should be { hobby, religion, origin, date_of_arrival }.

First and second queries.

SELECT b.* FROM my_bucket b;

SELECT b.data.* FROM my_bucket b;
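
For the two sample documents, the second query returns the values inside data (not just the field names); a sketch of the result, using the my_bucket name from the queries above:

[
  {
    "hobby": "basketball",
    "religion": "atheist"
  },
  {
    "origin": "france",
    "date_of_arrival": "26Jan"
  }
]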

Thank you geraldss. I am thinking of getting just the field names but NOT the values, so the returned result should not include e.g. andrew, male, …

See the OBJECT_NAMES() function.

Using OBJECT_NAMES(), I can only get the fields with a query like this:

SELECT OBJECT_NAMES(b) FROM my_bucket b;

The result looks like the following:

[
  {
    "$1": [
      "data",
      "id",
      "name",
      "sex"
    ]
  },
  {
    "$1": [
      "data",
      "foreigner_id",
      "name",
      "sex"
    ]
  }
]

but I think what he wants is:

[
  {
    "$1": [
      "data",
      "id",
      "name",
      "sex",
      "foreigner_id",
    ]
  }
]

Is there some way to get the data to look like that?

Just wondering. Why are the fields needed in order?

I don’t know the writer’s real intention.
I am just wondering how to write N1QL to get the unique fields (they don't need to be in order).

Yes, there are a few ways to do that. GROUP BY, ARRAY_AGG, ARRAY_DISTINCT, …
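
For example, a minimal GROUP BY sketch, assuming a bucket named default and reusing the UNNEST-over-OBJECT_NAMES pattern shown later in this thread:

SELECT v AS field_name
FROM default b
UNNEST OBJECT_NAMES(b) AS v
GROUP BY v;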

I can only get the fields by hard-coding the keys for these two documents:

SELECT array_distinct(array_concat(array_agg(a.f)[0], array_agg(a.f)[1]))
  FROM (
        SELECT object_names(b) AS f
          FROM default b USE KEYS ["123456789","a123456789"]
       ) a

Is there a better way?

Hi @atom_yang,

Ignore the fact these values are coming from field names. If you have nested arrays, you can get distinct values.

SELECT DISTINCT v
FROM my_data UNNEST my_array AS v;

SELECT ARRAY_DISTINCT( ARRAY_AGG( v ) )
FROM my_data UNNEST my_array AS v;

Thank you, it works. Here is the N1QL I ended up with:

SELECT ARRAY_DISTINCT( ARRAY_AGG( v ) )
  FROM default b 
   USE KEYS ["123456789","a123456789"]
UNNEST object_names(b.data) AS v

And the result:

[
  {
    "$1": [
      "origin",
      "date_of_arrival",
      "hobby",
      "religion"
    ]
  }
]

Thank you geraldss and atom_yang

So far, the following works:
select DISTINCT OBJECT_NAMES(b.data) from db b;
select DISTINCT OBJECT_NAMES(b) from db b;

I wonder if there is any way to create an index to improve the above query?

Try adding this index:

CREATE INDEX `idx_obj_name_db` ON `db`((object_names(db)))  USING GSI;

The index creation went well; however, when querying against the object names, it doesn't seem to be using the index created.

E.g.
create index test on repo (object_names(repo.tag)) USING GSI;

explain select DISTINCT OBJECT_NAMES(repo.tag) from repo;
It is doing a PrimaryScan.

Try this N1QL:

SELECT ARRAY_DISTINCT( ARRAY_AGG( v ) )
  FROM repo b 
UNNEST object_names(b.tag) AS v

How about this.

CREATE INDEX idx_names ON repo( OBJECT_NAMES( repo ) );

SELECT DISTINCT OBJECT_NAMES( repo )
FROM repo
WHERE OBJECT_NAMES( repo ) IS NOT NULL;

With the condition, it does look like it is hitting the index. But if I may ask, I don't see the need for the condition, as NULL should not be part of the consideration of OBJECT_NAMES() to begin with.

The only way to use an index is to have a WHERE clause. This is a global requirement, not tied to OBJECT_NAMES or this specific query.

Without a WHERE clause, you are by definition doing a full scan.
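
A quick way to confirm is to run EXPLAIN against the query (a sketch reusing the idx_names index above); if the index is picked up, the plan should show an IndexScan on idx_names rather than a PrimaryScan:

EXPLAIN SELECT DISTINCT OBJECT_NAMES( repo )
FROM repo
WHERE OBJECT_NAMES( repo ) IS NOT NULL;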
