Understanding the changes feed


#1

Hi,

I am trying to understand how the changes feed works. In our app I am seeing a replicator requesting the changes feed via GET /{db}/_changes and I am seeing a response from the sync gateway which looks like:

{
    "seq": 4186,
    "id": "some-weird-id",
    "removed": [
        "another-doc-id"
    ],
    "changes": [
        {
            "rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0"
        },
        {
            "rev": "7875-8288a642182a313b56cb8c62bffe1e1a"
        }
    ]
}

I’d like to know what this response means. Is it documented anywhere in the developer portal? How should the client react to the response?


#2

Hey Cliff.

You should see something like this.
last_seq = the sequence number of the latest change; here the database has had 7 changes since it was created.

results = an array of documents, each WITH:
id = the document key (id)
seq = the sequence number at which this document was created, changed, or deleted in the database
changes = the revision id(s). A revision id (version) is a counter that goes up every time the document is updated,
+ the MD5 hash of the JSON contents

NOTE: I put since=4 in the URL string, which means "give me all the changes after seq 4", so up to 7 here.
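
To make the field descriptions above concrete, here is a minimal Python sketch that walks a `_changes` response. The sample payload is invented for illustration (the document ids and digests are hypothetical) and only mirrors the last_seq / results shape described above; it is not output from a real Sync Gateway.

```python
import json

# Hypothetical sample of a /{db}/_changes?since=4 response; all values are made up.
SAMPLE = """
{
  "results": [
    {"seq": 5, "id": "doc-a", "changes": [{"rev": "2-aaa111"}]},
    {"seq": 6, "id": "doc-b", "changes": [{"rev": "1-bbb222"}]},
    {"seq": 7, "id": "doc-c", "changes": [{"rev": "3-ccc333"}]}
  ],
  "last_seq": 7
}
"""

def split_rev(rev):
    """Split a revision id such as '3-ccc333' into (generation, digest)."""
    generation, digest = rev.split("-", 1)
    return int(generation), digest

def summarize(feed):
    """Return one (seq, doc id, generation) tuple per revision listed in the feed."""
    rows = []
    for entry in feed["results"]:
        for change in entry["changes"]:
            generation, _ = split_rev(change["rev"])
            rows.append((entry["seq"], entry["id"], generation))
    return rows

feed = json.loads(SAMPLE)
rows = summarize(feed)
for row in rows:
    print(row)
print("last_seq:", feed["last_seq"])
```

The generation number is the part before the dash, so "3-ccc333" is the third revision of doc-c.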


#3

"But househippo I don’t just want the changes but I want to get the documents too"

Just add &include_docs=true to the URL and you’ll also get the full documents.

FOR MORE INFO GO HERE: http://developer.couchbase.com/documentation/mobile/1.2/develop/references/sync-gateway/rest-api/database-public/index.html
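
As a rough sketch of how the since and include_docs parameters combine in the URL, assuming the localhost:4984 host used elsewhere in this thread and a hypothetical database name `mydb`:

```python
from urllib.parse import urlencode

def changes_url(base, db, since=None, include_docs=False):
    """Build a /{db}/_changes URL with the optional query parameters discussed above."""
    params = {}
    if since is not None:
        params["since"] = since
    if include_docs:
        params["include_docs"] = "true"
    url = f"{base}/{db}/_changes"
    query = urlencode(params)
    return f"{url}?{query}" if query else url

# "mydb" is a placeholder database name, not one taken from this thread.
print(changes_url("http://localhost:4984", "mydb", since=4, include_docs=True))
```

The helper only builds the string; sending the request (and any authentication) is left to whatever HTTP client you already use.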


#4

“Now I’m getting too many documents that I don’t need, househippo. I only need the XYZ and ABC documents from the _changes feed”

Simple
STEP 1. Do the changes feed request first, GET /{db}/_changes, to find the documents you want.
STEP 2. Use /{db}/_bulk_get like below

POST /{db}/_bulk_get
Host: localhost:4984
{
    "docs": [
        {
            "id": "PeachCobbler"
        },
        {
            "id": "LemonChicken"
        },
        {
            "id": "CinnamonCookies"
        }
    ]
}

MORE INFO HERE: http://developer.couchbase.com/documentation/mobile/1.2/develop/references/sync-gateway/rest-api/database-public/post-bulk-get/index.html
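
The two steps can be sketched in Python as below. The feed contents are made up (BeefStew and the revision ids are hypothetical), and the helper names `pick_ids` and `bulk_get_body` are illustrative, not part of any Couchbase library.

```python
import json

def pick_ids(feed, wanted):
    """STEP 1: keep the ids from the changes feed that we care about, in feed order."""
    picked = []
    for entry in feed["results"]:
        doc_id = entry["id"]
        if doc_id in wanted and doc_id not in picked:
            picked.append(doc_id)
    return picked

def bulk_get_body(doc_ids):
    """STEP 2: build the JSON body for POST /{db}/_bulk_get."""
    return {"docs": [{"id": doc_id} for doc_id in doc_ids]}

# Hypothetical changes feed result; only the "id" field matters for this sketch.
feed = {
    "results": [
        {"seq": 1, "id": "PeachCobbler", "changes": [{"rev": "1-aaa"}]},
        {"seq": 2, "id": "BeefStew", "changes": [{"rev": "1-bbb"}]},
        {"seq": 3, "id": "LemonChicken", "changes": [{"rev": "2-ccc"}]},
    ]
}

body = bulk_get_body(pick_ids(feed, {"PeachCobbler", "LemonChicken"}))
print(json.dumps(body, indent=4))
```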


#5

Thanks a ton Hippo! What about my specific example? I’m trying to understand what it means when I get an array of changes in the response, as in:

"changes": [
    {
        "rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0"
    },
    {
        "rev": "7875-8288a642182a313b56cb8c62bffe1e1a"
    }
]

Also, what does the “removed” attribute mean? It appears this one particular response is triggering a massive bulk_docs update in my client app, which is running the Java version of Couchbase Lite. The initial replication examines the changes feed where there is an entry like the above. Then on a subsequent launch the push replicator attempts a bulk_docs POST that looks like this:

{
    "docs": [{
        "_rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0",
        "_revisions": {
            "start": 2993,
            "ids": ["b5bcdf0462fc00e2719ffe7bb1a751d0",
                // a ton of revs listed here
                "5917b3607ed4d51669624080d5cfd1f8"]
        },
        "_deleted": true,
        "_id": "some-weird-id"
    }],
    "new_edits": false
}

I’m trying to understand why this happens. I believe the changes feed is identifying conflicts but I’m not certain how to interpret the feed or how to determine where these IDs that the PUSH replicator tries to include in the bulk_doc delete are coming from.


#6

@househippo I’m trying to understand how to interpret what I am reading from the HTTP activity. Does this particular response in the _changes feed imply that all revs between 2993 and 7875 should be considered “in-conflict”?

"changes": [
    {
        "rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0"
    },
    {
        "rev": "7875-8288a642182a313b56cb8c62bffe1e1a"
    }
]

Are the ids included in the following bulk_docs POST revisions, not doc ids?

{
    "docs": [{
        "_rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0",
        "_revisions": {
            "start": 2993,
            "ids": ["b5bcdf0462fc00e2719ffe7bb1a751d0",
                // a ton of revs listed here
                "5917b3607ed4d51669624080d5cfd1f8"]
        },
        "_deleted": true,
        "_id": "some-weird-id"
    }],
    "new_edits": false
}

Are these derived by querying somehow for revisions in the range implied by the changes feed? Some of this touches on code within the Java Couchbase Lite replicator repo, but I’m specifically interested in what the changes feed is actually telling me. I’m trying to understand what the changes feed is actually saying, what the code is trying to do, and what the code should actually do.


#7

@cliff76 Here’s some additional information, to supplement what @househippo has already provided:

  1. The fact that you’re seeing multiple entries for a given doc in your changes response implies that your original changes request was sent with style=all_docs, which will include revision information for all leaf revisions, including conflicts and deleted former conflicts. So in your particular response, the document has two conflicting leaf revisions (2993-b5bcdf0462fc00e2719ffe7bb1a751d0 and 7875-8288a642182a313b56cb8c62bffe1e1a).

  2. The removed attribute indicates that the document has been removed from the specified channel in this revision. So the “removed”: [“another-doc-id”] in your original post is actually “removed”:“some-channel-name”.

  3. The bulk_docs post being made by the client suggests that the client has a large number of revisions in the 2993-b5bcdf0462fc00e2719ffe7bb1a751d0 branch that the server doesn’t know about. Given the high generation count for the revisions (2993 for one branch and 7875 for the other), one possibility is that the client hasn’t synchronized with the server in some time, and so the server revision tree doesn’t go back far enough to identify that the client’s revisions are actually ancestors of its current revision (7875). By default Sync Gateway maintains a revision tree depth of 1000, so that seems like a possible scenario here.
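
The three points above can be read off a single changes entry with a small Python helper. This is only a sketch: `describe_change` is a hypothetical name, and comparing the generation gap against the depth of 1000 is a rough heuristic for the scenario described in point 3, not something Sync Gateway itself computes this way.

```python
# Default revision tree depth quoted in point 3 above.
DEFAULT_REVS_LIMIT = 1000

def generation(rev):
    """Return the numeric generation prefix of a revision id."""
    return int(rev.split("-", 1)[0])

def describe_change(entry, revs_limit=DEFAULT_REVS_LIMIT):
    """Summarize the leaf revisions and channel removals in one _changes entry."""
    leaves = [change["rev"] for change in entry.get("changes", [])]
    generations = sorted(generation(rev) for rev in leaves)
    gap = generations[-1] - generations[0] if generations else 0
    return {
        "id": entry["id"],
        "leaf_revisions": leaves,
        # More than one leaf means conflicts or deleted former conflicts.
        "multiple_leaves": len(leaves) > 1,
        # Channels the document was removed from in this revision.
        "removed_from": list(entry.get("removed", [])),
        "generation_gap": gap,
        # Heuristic only: a gap wider than the tree depth fits the scenario in point 3.
        "gap_exceeds_revs_limit": gap > revs_limit,
    }

# The entry from the original post.
ENTRY = {
    "seq": 4186,
    "id": "some-weird-id",
    "removed": ["another-doc-id"],
    "changes": [
        {"rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0"},
        {"rev": "7875-8288a642182a313b56cb8c62bffe1e1a"},
    ],
}

summary = describe_change(ENTRY)
for key, value in summary.items():
    print(key, "=", value)
```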


#8

Thank you @adamf! In my case the client has been started fresh with no database. The initial launch starts the replicators and creates the DB. The replicators fire and start downloading data. The document "id": "some-weird-id" is never seen on the client in my testing. (I inspect the DB after the 1st replication and again after the 2nd replication.) I have been using Charles proxy to examine the communication between the replicators and the sync gateway. The problem I am trying to resolve is tricky. After the 1st replication, where I see the above entry in the changes feed, document some-weird-id does not exist on the client and my client app runs without issue. After stopping and relaunching the client app, the push replicator attempts to send this huge bulk_doc request (which I inspect with Charles proxy) to the sync gateway which responds with a 404. I do not see any of the revisions in the bulk_doc request from the push replicator in the initial changes feed on the 1st launch. Nor do I see any of the revisions in the initial changes feed in the bulk_docs request.


#9

@cliff76 I would expect that the client must be generating changes to the document to cause anything to be pushed to the server, but there might be something unexpected going on here.

What client/build are you using?


#10

@adamf I am using the latest Couchbase Lite Java release here:


#11

I don’t think there’s a direct connection there. The _changes feed is requested by the pull replicator, while _bulk_docs is sent by the push replicator. The two types of replicators don’t directly talk to each other at all.

The _bulk_docs excerpt you showed just indicates that the document has a very long revision history. It also seems to imply that the latest revision the server has is generation 2993, and that the local db has a ton of newer revisions.

Couchbase Lite in its current form is not very efficient at handling documents that are updated large numbers of times or very rapidly. Can you coalesce the client-side updates so there aren’t so many of them?
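
For reading the _bulk_docs excerpt: in the replication format, `_revisions.start` is the generation of the newest revision and `ids` lists the digests newest first, so each full revision id can be rebuilt by counting down from `start`. A minimal Python sketch, using only the first three digests that appear in this thread (the real list is much longer):

```python
def expand_revisions(revisions):
    """Rebuild full revision ids (newest first) from a _revisions object."""
    start = revisions["start"]
    return [f"{start - offset}-{digest}"
            for offset, digest in enumerate(revisions["ids"])]

# Truncated on purpose: the posted history contains many more digests.
revisions = {
    "start": 2993,
    "ids": [
        "b5bcdf0462fc00e2719ffe7bb1a751d0",
        "16b8070854a560d762548ad4f04e3024",
        "8cbe27ead30f613632e323e3fc15f22b",
    ],
}

history = expand_revisions(revisions)
for rev in history:
    print(rev)
```

The first entry always equals the document's `_rev`, which is a quick sanity check when inspecting a captured request.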


#12

Hi @cliff76

It seems the client app updated the same document many times: 3000 or even 7000+ times. Could you please review how the app updates the doc with the id some-weird-id?


#13

I hadn’t noticed that you said the database starts out empty. Is the client making any changes to documents or creating documents? If not, something’s very wrong, because the push replicator shouldn’t be trying to send anything back to the server … the next step would be to file an issue against couchbase-lite-java-core.


#14

All: Thank you for all the prompt replies. I’m going to try to diagnose what/why the 2nd run decides to POST the huge bulk delete. I’ll try to trace any/all delete/write requests in the client. Indeed, the document had been updated several times by an external server-side process which also has access to the sync gateway. From my understanding of the client (I have inherited much of the code) there is no reason for it to modify this document on the 1st or 2nd launch. Somehow, the client is deciding to send this request after examining the changes feed. I’ve tried combing through the Couchbase Java source and got confused after reading several call levels in. I read up to the point where the replicator accumulates all of the conflicts in a queue and dispatches them to be processed. There appear to be differences from iOS noted in the Java source. We are running a similar client app on iOS using the same Sync Gateway and not seeing the issue.


#15

Sorry about that. I thought the JSON you posted was a fake example, not actual system-generated output.

@adamf is right: it is pointing to a high write load.


#16

Hey all! I’m still stuck investigating this issue. Here’s what I’ve determined so far using Charles Proxy. On the initial run the (pull?) replicator inspects the changes feed via GET /{db}/_changes and sees something like the following:

{
    "seq": 4186,
    "id": "some-weird-id",
    "removed": [
        "another-doc-id"
    ],
    "changes": [
        {
            "rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0"
        },
        {
            "rev": "7875-8288a642182a313b56cb8c62bffe1e1a"
        }
    ]
}

It then sends a _bulk_get?revs=true&attachment=true POST request with this included in the body (along w/ a bunch of other docs):

{
	"atts_since": null,
	"rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0",
	"id": "some-weird-id"
}, {
	"atts_since": null,
	"rev": "7875-8288a642182a313b56cb8c62bffe1e1a",
	"id": "some-weird-id"
}
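
The mapping from the changes entry to these _bulk_get items is one item per listed revision. A minimal Python sketch of that mapping, assuming the `atts_since: null` shape shown above (`bulk_get_entries` is a hypothetical helper, not the replicator's actual code):

```python
import json

def bulk_get_entries(change_entry):
    """Produce one _bulk_get item per revision listed in a _changes entry."""
    return [
        {"atts_since": None, "rev": change["rev"], "id": change_entry["id"]}
        for change in change_entry["changes"]
    ]

change_entry = {
    "seq": 4186,
    "id": "some-weird-id",
    "removed": ["another-doc-id"],
    "changes": [
        {"rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0"},
        {"rev": "7875-8288a642182a313b56cb8c62bffe1e1a"},
    ],
}

entries = bulk_get_entries(change_entry)
# json.dumps renders Python's None as JSON null.
print(json.dumps(entries, indent=4))
```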

It receives a response which does not contain any info on “some-weird-id”. Everything else in my app appears to function as normal. I stop and restart it and the client sends a _revs_diff POST request with this in the body (along with several other items):

"some-weird-id": ["2993-b5bcdf0462fc00e2719ffe7bb1a751d0", "7875-8288a642182a313b56cb8c62bffe1e1a"],

The sync gateway server responds with:

{
	"some-weird-id": {
		"missing": ["2993-b5bcdf0462fc00e2719ffe7bb1a751d0"]
	}
}
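
A toy Python model of what this exchange means: the client lists revision ids per document, and the server answers with the ones it does not have. The assumption that the server knows only the 7875 revision is inferred from the response above; this sketch is not Sync Gateway's implementation and it ignores optional fields a real response may carry.

```python
def revs_diff(server_revs, request):
    """Return, per document, the requested revision ids the server does not know."""
    response = {}
    for doc_id, revs in request.items():
        known = server_revs.get(doc_id, set())
        missing = [rev for rev in revs if rev not in known]
        if missing:
            response[doc_id] = {"missing": missing}
    return response

# Assumed server state, inferred from the response shown above.
server_revs = {
    "some-weird-id": {"7875-8288a642182a313b56cb8c62bffe1e1a"},
}

request = {
    "some-weird-id": [
        "2993-b5bcdf0462fc00e2719ffe7bb1a751d0",
        "7875-8288a642182a313b56cb8c62bffe1e1a",
    ],
}

response = revs_diff(server_revs, request)
print(response)
```

Anything reported as missing is what the push replicator will then try to upload.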

Finally there are a couple more revs_diff requests and a changes feed request, none of which include the document in question. The client push replicator then sends a massive _bulk_docs POST request which looks like this:

{
    "docs": [{
        "_rev": "2993-b5bcdf0462fc00e2719ffe7bb1a751d0",
        "_revisions": {
            "start": 2993,
            "ids": ["b5bcdf0462fc00e2719ffe7bb1a751d0", "16b8070854a560d762548ad4f04e3024", "8cbe27ead30f613632e323e3fc15f22b",
                // several more ids which I removed for brevity
                "5917b3607ed4d51669624080d5cfd1f8"]
        },
        "_deleted": true,
        "_id": "some-weird-id"
    }],
    "new_edits": false
}

This fails with a 404 leaving me puzzled. The client does not delete anything locally and this all seems to be server-side driven. Can anyone shed light on this behavior?


#17

It’s a bit tricky to piece it together based on the information you’ve provided, but it still looks like the client database had a copy of doc some-weird-id - either existing prior to replication, or generated/modified on the fly. Here’s what the set of events looks like, based on the information you provided:

Pull replication

  1. During pull replication, Sync Gateway is sending a removal notification for doc some-weird-id, indicating that it’s been removed from channel another-doc-id.
  2. The bulk_get doesn’t return any data for some-weird-id, because the doc has been deleted.
  3. At this point I don’t know what we’d expect to be written to the client DB - @hideki, can you help out?

Push replication

  1. The client has doc some-weird-id, revision 2993-b5bcdf0462fc00e2719ffe7bb1a751d0 locally as a non-deleted document. It sends a revs_diff to the server, and the server says it doesn’t have that document.
  2. The client then sends a bulk_docs to push some-weird-id, with a full history of revisions for that document (the long list of ids)

It’s that last piece of information - that the client has a long revision history for 2993 - that indicates that the client already has (or is generating) revisions for that document - I don’t see anywhere in the pull replication it would have been obtaining those from the server.
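
To illustrate why the long id list implies a locally stored history, here is a minimal Python sketch of how a _bulk_docs body like the one captured could be assembled from a list of full revision ids. `build_bulk_docs` is a hypothetical helper, the history is truncated to three revisions from this thread, and it assumes the history is ordered newest first with consecutive generations.

```python
import json

def build_bulk_docs(doc_id, history, deleted=False):
    """Assemble a _bulk_docs body from a newest-first list of full revision ids."""
    newest = history[0]
    start = int(newest.split("-", 1)[0])
    doc = {
        "_rev": newest,
        "_revisions": {
            "start": start,
            # Only the digest part of each revision id is sent.
            "ids": [rev.split("-", 1)[1] for rev in history],
        },
        "_id": doc_id,
    }
    if deleted:
        doc["_deleted"] = True
    # new_edits=false asks the server to store the revisions as given.
    return {"docs": [doc], "new_edits": False}

# Truncated history; the captured request carried many more revisions.
history = [
    "2993-b5bcdf0462fc00e2719ffe7bb1a751d0",
    "2992-16b8070854a560d762548ad4f04e3024",
    "2991-8cbe27ead30f613632e323e3fc15f22b",
]

payload = build_bulk_docs("some-weird-id", history, deleted=True)
print(json.dumps(payload, indent=4))
```

The sender needs every digest in the chain to fill in `ids`, which is why the list points at history already present in the client database.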


#18

Thanx @adamf! I’m puzzled as I look inside the client’s DB for some-weird-id and it doesn’t exist after both the 1st and 2nd launch. I should mention that the local data store does not exist when I launch the 1st time. I allow Couchbase to create it on the fly. Also I corrected a typo in my post above. The client is sending this in its _revs_diff POST to the server:
"some-weird-id": ["2993-b5bcdf0462fc00e2719ffe7bb1a751d0", "7875-8288a642182a313b56cb8c62bffe1e1a"],

I’m really puzzled as to why/how the client would get the long list of revisions since it starts off completely empty. Any docs would have to originate from the server. I am debugging through the logic and not seeing any local deletes happening. My understanding is that the client is somehow persisting (in some meta data somewhere maybe?) the conflict response it receives in the changes feed from the initial replication. I’m thinking this behavior is implemented in the Java API for couchbase lite. I read through the replicator source code but I got lost in a few places where it copies the conflict docs to different queues to process them. There’s one spot that gave me pause:

if (responseOK) {
    // TODO: this logic is questionable, there's lots
    // TODO: of differences in the iOS changetracker code,
    client.changeTrackerCaughtUp();
    Log.v(Log.TAG_CHANGE_TRACKER, "%s: Starting new longpoll", this);
    backoff.resetBackoff();
    continue;
}

I’m not implying that this section has anything to do with my particular problem. Instead I’m questioning this because we have a similar iOS client app that runs against the same sync gateway but does not have trouble with the changes feed or replication. I’m wondering if there are other differences in replicator behavior that may explain what I’m seeing.


#19

One more update. I was assuming that our iOS client was different, but I ran the test last night and noticed that the iOS client behaves the exact same way. There is something with this particular document, which has a lot of revisions, that causes the client-side push replicator to die. It gets this huge list of revisions from seemingly nowhere and tries a bulk_docs POST to delete them. The server always responds with 404. I am trying to isolate this in an easily repeatable way.