Indexing and querying documents

#1

I’m evaluating Couchbase vs. MongoDB for storing JSON documents.

  1. Is there a way to index documents real-time? For example, if a user writes data and a query is then issued against an index, I want the query to return the freshly written data.

  2. Here is a scenario. I have millions of documents like below:

    { userid:1234, schemaid:87777, recordid:7797, field1:"fff", field2:"dddd",…}

The fields vary by schemaid, but userid, schemaid, and recordid are always present. In most cases, I want to query for a specific userid and schemaid and fetch all matching documents with all of their fields. I assume that an index is needed for performance. If I have to create an index, can I return the entire doc in the value field of the map() function? What is the performance and storage impact of this approach?

Alternatively, I could move the userid, schemaid, and recordid properties into the key. In that case, can I still query on a key prefix with a wildcard, something like {userid:1234,schemaid:87777,recordid:%, and get the matching records? I’m aware of the memory footprint of the key.

Thanks.

#2

I assume this is the right DL monitored by Couchbase engineers. It will be helpful to know the answer before moving forward.

#3

I don’t fully understand your questions, but I can address your first question:

Is there a way to index documents real-time?

No, indexes are defined beforehand and queries are made in real time.

With Couchbase views, you can either define separate indexes for each document field or define a grouped index, like in the case of time:

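// grouped key: emit [year, month, day] so results can be grouped at any level
function (doc, meta) {
  if (doc.date) {
    var date = new Date(doc.date);
    // note: getMonth() is zero-based (0 = January)
    emit([date.getFullYear(), date.getMonth(), date.getDate()], null);
  }
}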

See link below on how to query.
http://www.couchbase.com/docs//couchbase-manual-2.0/couchbase-views-writing-querying-grouping.html
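
As a rough local illustration of what grouping on such a key does, the sketch below runs the same map logic over a few made-up documents and then counts rows per key prefix. This is only a simulation: the emit stand-in, the sample documents, and the countByGroupLevel helper are all assumptions for the example, and it presumes the real view would pair the map with a _count reduce and be queried with group_level. It is not how Couchbase implements grouping internally.

```javascript
// Local simulation only: Couchbase runs map functions server-side.
// `emit` here is a stand-in that collects rows into an array.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same grouped-key map logic as in the snippet above.
function map(doc) {
  if (doc.date) {
    const date = new Date(doc.date);
    emit([date.getFullYear(), date.getMonth(), date.getDate()], null);
  }
}

// Hypothetical sample documents (not from this thread).
const docs = [
  { date: "2014-01-15T10:00:00" },
  { date: "2014-01-20T12:30:00" },
  { date: "2014-02-03T08:15:00" },
  { name: "no date field" } // skipped by the map function
];
docs.forEach(map);

// Mimic a _count reduce with group_level=N: truncate each key to its
// first N elements and count the rows that share the truncated key.
function countByGroupLevel(allRows, level) {
  const counts = new Map();
  for (const row of allRows) {
    const prefix = JSON.stringify(row.key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return Array.from(counts, ([key, value]) => ({ key: JSON.parse(key), value: value }));
}

const byYear = countByGroupLevel(rows, 1);  // like group_level=1
const byMonth = countByGroupLevel(rows, 2); // like group_level=2
console.log(JSON.stringify(byYear));
console.log(JSON.stringify(byMonth));
```

With group_level=1 the three dated documents collapse into one row for the year; with group_level=2 they split into one row per month (remember that the month element is zero-based).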

Also, N1QL is a new feature that is currently in developer preview. It is a SQL-like query language for Couchbase. It uses real-time algorithms in addition to predefined indexes to run dynamically generated, complex queries.
http://www.couchbase.com/communities/n1ql

UPDATE (for comments below)

  1. There is a stale parameter for queries, which has three options:
    stale=ok - Stale views are OK.
    stale=false - Waits for view to be updated before returning results.
    stale=update_after - (default) Returns immediately available results, but triggers an update to occur after results are returned.
    http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-views-writing-stale

  2. N1QL DP3 is supposed to come out in March. I don’t know when a production ready build will be released, but the answer at the link below gives good insight.
    http://www.couchbase.com/communities/q-and-a/couchbase-commitment-n1ql
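
For the stale parameter in point 1, here is a minimal sketch of how a view query URL with stale=false might be put together. The buildViewUrl helper, the host, the bucket name "default", the design document "records", and the view "by_user_schema" are all placeholders I made up for illustration; the port (8092) and the /_design/.../_view/... path follow the view REST API as I understand it for 2.x, so check them against the manual linked above for your version.

```javascript
// Hypothetical helper: builds a view query URL for the REST API.
// Host, bucket, design document, and view names are placeholders.
function buildViewUrl(base, bucket, designDoc, view, params) {
  const query = Object.keys(params)
    .map(function (name) {
      const raw = params[name];
      // Key-like parameters are sent as JSON; stale is a bare word.
      const value = name === "stale" ? raw : JSON.stringify(raw);
      return encodeURIComponent(name) + "=" + encodeURIComponent(value);
    })
    .join("&");
  return base + "/" + bucket + "/_design/" + designDoc + "/_view/" + view + "?" + query;
}

// Ask for rows whose key is exactly [userid, schemaid] and make the
// query wait for the view to be updated before returning results.
const freshUrl = buildViewUrl("http://localhost:8092", "default", "records", "by_user_schema", {
  key: [1234, 87777],
  stale: "false"
});
console.log(freshUrl);
```

This assumes a view whose map function emits [doc.userid, doc.schemaid] as the key. Swapping stale to "ok" or "update_after" gives the other two behaviours listed in point 1.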

#4

Thanks, here are my follow-up questions:

  1. Is there a way to index documents real-time? I should not see any stale data from the index.
  2. Any ETA on when N1QL will be released?