Conversation
Pull request overview
Refactors sync-function invocation preparation in resync-related code paths to reuse prepareSyncFn instead of older revision-body helpers.
Changes:
- Updated resync processing to call `prepareSyncFn` before `getChannelsAndAccess`.
- Updated active-rev sync-function recalculation to use `prepareSyncFn` instead of `getAvailable1xRev` + unmarshal.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| db/database.go | Switches per-leaf resync preparation to prepareSyncFn prior to running sync/channel/access calculation. |
| db/crud.go | Switches active-rev recalculation body preparation to prepareSyncFn. |
```diff
 changed := 0
 doc.History.forEachLeaf(func(rev *RevInfo) {
-	bodyBytes, _, err := db.get1xRevFromDoc(ctx, doc, rev.ID, false)
+	body, metaMap, _, err := db.prepareSyncFn(doc, doc)
```
prepareSyncFn(doc, doc) inside forEachLeaf always prepares the current document body/rev (newRevID = doc.RevID), but the subsequent getChannelsAndAccess(..., rev.ID) is meant to evaluate each leaf revision. This mismatch means conflicting leaf revisions will all be re-synced using the same body/_rev, producing incorrect channel/access results. Consider loading the body for rev.ID (as the previous code did) or introducing a helper that prepares a sync-fn body for an arbitrary revID so BodyRev matches rev.ID.
```diff
 metaMap, err := doc.GetMetaMap(db.UserXattrKey())
 if err != nil {
-	return
+	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
```
If prepareSyncFn returns an error, the callback currently logs but continues into getChannelsAndAccess with potentially nil/invalid body/metaMap, which can lead to incorrect results or runtime errors in the sync function invocation. Return early from the leaf callback when preparation fails.
```diff
-	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
+	// Skip this leaf when sync function preparation fails so the sync function
+	// is not invoked with invalid or incomplete inputs.
+	base.WarnfCtx(ctx, "Error preparing sync function for document %q: %v", base.UD(docid), err)
+	return
```
```go
curBody, _, _, err := db.prepareSyncFn(doc, doc)
if err != nil {
	return
}
```
prepareSyncFn unmarshals user xattrs via doc.GetMetaMap(...) and validates the body, but this call site discards the returned metaMap and only needs the current body. This adds extra work and can introduce a new failure path (e.g., invalid rawUserXattr) even though metaMap is already provided to this function. Consider using a lighter-weight helper to get a mutable body (or refactor prepareSyncFn/add a variant) that doesn't re-fetch meta when the caller already has it.
```diff
 metaMap, err := doc.GetMetaMap(db.UserXattrKey())
 if err != nil {
-	return
+	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
```
This warning logs the document ID without redaction (docid). Since document IDs are user data, wrap it with base.UD(...) (and preferably use a %q-style format consistent with nearby logs) to avoid leaking sensitive data in logs.
```diff
-	base.WarnfCtx(ctx, "Error preparing sync function for document '%s': %v", docid, err)
+	base.WarnfCtx(ctx, "Error preparing sync function for document %q: %v", base.UD(docid), err)
```
@bbrks I've reverted the use of `prepareSyncFn` in both functions for the following reasons:
So I just decided to remove the keys that are not required for the Sync Function after the
```go
// removing the following fields as these fields are not required for sync function
if _, ok := body[BodyAttachments]; ok {
	delete(body, BodyAttachments)
}
```
In this resync path we first call get1xRevFromDoc, which injects _attachments into the JSON bytes when attachments exist, and then immediately unmarshal and delete _attachments. This adds avoidable CPU/memory overhead during resync. Consider fetching the revision body without stamping _attachments (e.g., via getRevision/getAvailableRev and then only injecting _id/_rev/_deleted as needed, or by introducing a helper that builds the sync-function input body without attachments).
```go
if _, ok := body[BodyRevisions]; ok {
	delete(body, BodyRevisions)
}
```
This block removes _revisions before calling the sync function, but get1xRevFromDoc is called with listRevisions=false in this function, so _revisions should not be injected in the first place. If you’re guarding against legacy bodies containing _revisions, please add a brief comment explaining that; otherwise this can be removed to reduce noise and keep behavior consistent with the rest of the code.
```go
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
	delete(curBody, BodyAttachments)
}
```
Similar to resync: getAvailable1xRev stamps _attachments into the JSON bytes, then this code unmarshals and deletes _attachments before running the sync function. If attachments are intentionally excluded from sync, consider avoiding injecting them earlier (e.g., a variant of getAvailable1xRev that doesn’t include attachments, or a helper to build the sync-function input body).
```go
// removing the following fields as these fields are not required for sync function
if _, ok := body[BodyAttachments]; ok {
	delete(body, BodyAttachments)
}
if _, ok := body[BodyRevisions]; ok {
	delete(body, BodyRevisions)
}
```
These changes alter the document body passed into the sync function during resync / active-rev recalculation (specifically excluding _attachments, and potentially _revisions). There are existing tests for getResyncedDocument in db/database_test.go; please add coverage asserting the sync function does not see these fields in these code paths to prevent regressions.
```go
// removing _attachments, as attachments are not required to be passed
// into sync function
if _, ok := curBody[BodyAttachments]; ok {
	delete(curBody, BodyAttachments)
}
```
Please add/extend tests in db/crud_test.go to cover recalculateSyncFnForActiveRev ensuring the sync function input body does not include _attachments (and documenting/covering whether _revisions should be present or not). This is a behavior change that’s easy to regress.
```diff
-	// removing _attachments, as attachments are not required to be passed
-	// into sync function
-	if _, ok := curBody[BodyAttachments]; ok {
-		delete(curBody, BodyAttachments)
-	}
+	// Recalculating channel/access for an older active revision must not expose
+	// attachment metadata to the sync function. `_attachments` is stripped here
+	// to keep this path aligned with the body shape expected by sync-function
+	// evaluation and to make the behavior explicit in this regression-prone area.
+	//
+	// Note that `_revisions` is intentionally not removed in this step; whether it
+	// is present is determined by the revision body returned from storage and any
+	// downstream body normalization, not by this attachment-specific safeguard.
+	delete(curBody, BodyAttachments)
```
```go
if err != nil {
	return
}
```
The PR description says resync paths were refactored to use prepareSyncFn instead of getAvailable1xRev/get1xRevFromDoc, but in getResyncedDocument we still build the sync-function body via get1xRevFromDoc (then strip fields). Either update this resync path to use prepareSyncFn (or an equivalent helper) or adjust the PR description to match the actual change.
Redocly previews
Pull request overview
Copilot reviewed 69 out of 71 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
db/crud.go:2269
- recalculateSyncFnForActiveRev still contains commented-out code paths (old getAvailable1xRev flow and the commented delete(_attachments)). Please remove these commented blocks or implement the intended behavior in a single place (ideally inside prepareDocForSyncFn) to keep the sync-fn preparation logic consistent.
```go
		access = nil
		roles = nil
	}
	return
}

func (db *DatabaseCollectionWithUser) addAttachments(ctx context.Context, newAttachments updatedAttachments) error {
	// Need to check and add attachments here to ensure the attachment is within size constraints
	err := db.setAttachments(ctx, newAttachments)
	if err != nil {
		if errors.Is(err, ErrAttachmentTooLarge) || err.Error() == "document value was too large" {
			err = base.HTTPErrorf(http.StatusRequestEntityTooLarge, "Attachment too large")
		} else {
			err = errors.Wrap(err, "Error adding attachment")
		}
	}
```
```go
//bodyBytes, _, err := db.get1xRevFromDoc(ctx, doc, rev.ID, false)
//if err != nil {
//	base.WarnfCtx(ctx, "Error getting rev from doc %s/%s %s", base.UD(docid), rev.ID, err)
//}
//var body Body
//if err := body.Unmarshal(bodyBytes); err != nil {
//	base.WarnfCtx(ctx, "Error unmarshalling body %s/%s for sync function %s", base.UD(docid), rev.ID, err)
//	return
//}
//metaMap, err := doc.GetMetaMap(db.UserXattrKey())
//if err != nil {
//	return
//}
//
//// removing the following fields as these fields are not required for sync function
//delete(body, BodyAttachments)
//delete(body, BodyRevisions)
```
```diff
 func (db *DatabaseCollectionWithUser) recalculateSyncFnForActiveRev(ctx context.Context, doc *Document, metaMap map[string]any, newRevID string) (channelSet base.Set, access, roles channels.AccessMap, syncExpiry *uint32, oldBodyJSON string, err error) {
 	// In some cases an older revision might become the current one. If so, get its
 	// channels & access, for purposes of updating the doc:
-	curBodyBytes, err := db.getAvailable1xRev(ctx, doc, doc.GetRevTreeID())
-	if err != nil {
-		return
-	}
-
-	var curBody Body
-	err = curBody.Unmarshal(curBodyBytes)
+	//curBodyBytes, err := db.getAvailable1xRev(ctx, doc, doc.GetRevTreeID())
+	//if err != nil {
+	//	return
+	//}
+	//
+	//var curBody Body
+	//err = curBody.Unmarshal(curBodyBytes)
+	curBody, _, _, err := db.prepareDocForSyncFn(ctx, doc, nil, doc.GetRevTreeID(), true, false)
 	if err != nil {
 		return
 	}
```

```go
// If using xattrs and this isn't an SG write, we shouldn't attempt to cache.
if syncData == nil {
```
```go
tbpOptions.TeardownFuncs = append(tbpOptions.TeardownFuncs, func() {
	if len(globalBlipTesterClients.m) != 0 {
		// must panic to bubble up through test harness
		panic(fmt.Sprintf("%v active blip tester clients should be 0 at end of tests", globalBlipTesterClients.m))
	}
})
db.BypassReleasedSequenceWait.Store(true)
serverContextGlobalsInitialized.Store(true)
db.TestBucketPoolWithIndexes(ctx, m, tbpOptions)
```
```diff
 		response = rt.Send(RequestByUser("GET", url, "", username))
 	}
 	assert.NoError(c, base.JSONUnmarshal(response.Body.Bytes(), &changes))
-	assert.Len(c, changes.Results, numChangesExpected, "Expected %d changes, got %d changes", numChangesExpected, len(changes.Results))
+	assert.Len(c, changes.Results, numChangesExpected, "Expected %d changes, got %s changes", numChangesExpected, changes.Summary())
 }, waitTime, 10*time.Millisecond)
```
```diff
 if s.RevAndVersion.CurrentVersion != "" || s.RevAndVersion.CurrentSource != "" {
 	extractedCV, err := cv.ExtractCV()
 	if !errors.Is(err, base.ErrNotFound) {
 		if err != nil {
-			base.InfofCtx(ctx, base.KeyImport, "Unable to extract cv during IsSGWrite write check - skipping cv match check: %v", err)
-			return true, true, false
+			base.InfofCtx(ctx, base.KeyImport, "Unable to extract cv during IsSGWrite check, document will not be processed: %v", err)
+			return false, false, false
 		}
```
Force-pushed 5b832b6 to 32a67c3 (Compare)
```go
body, metaMap, _, err := db.prepareDocForSyncFn(ctx, doc, nil, rev.ID, true, false)
if err != nil {
	base.WarnfCtx(ctx, "Unable to prepare doc for rev %d: %v", rev.ID, err)
```
```go
var ancestorRevID string
bodyBytes, ancestorRevID, _, err = db.getAvailableRev(ctx, doc, revID)
if err != nil {
	return
```
```go
newRevID = revID

mutableBody[BodyId] = doc.ID
mutableBody[BodyRev] = newRevID
```
bbrks
left a comment
This looks pretty good now - just a few small things
```go
{
	Method:   "POST",
	Endpoint: "/db/_offline",
	Users:    []string{syncGatewayConfigurator},
},
{
	Method:   "POST",
	Endpoint: "/db/_online",
	Users:    []string{syncGatewayConfigurator},
},
```
This change is undoing part of the change made on main in #8154
I think we've already covered how to do rebases to resolve conflicts with force pushes, but this squash merge either doesn't seem to be doing the right thing or is error-prone.
bbrks
left a comment
The last set of changes is not safe - the mutable body copy is no longer being done in all cases. See inline comments for more detail and suggestions.
Please let me know if you are struggling with the refactor/test failures and need a hand. The test failures are a suggestion that something is wrong and it may not be in the area of the code you're touching.
```go
body, metaMap, _, err := db.prepareDocForSyncFn(ctx, doc, nil, rev.ID, true, false)
if err != nil {
	base.WarnfCtx(ctx, "Unable to prepare doc %s for rev %s: %v", doc.ID, rev.ID, err)
```
All document IDs need tagging as User Data for log redaction purposes.
This is in our PR templates as a self-review checklist item, and also the Copilot instructions - which will probably catch this if it were re-reviewed.
```go
if err != nil {
	return
}
mutableBody = body
```
mutableBody is not a mutable-safe copy at this point!
If someone (the sync function) were to change the document in some way, the body passed by reference would change too. This is the main point of deep-copying doc bodies before passing into the sync function.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
db/crud.go:2935
- In `postWriteUpdateHLV`, `newBodyWithAtts` is loaded via `doc.BodyBytes(ctx)`, but when injecting attachments the code still passes `doc._rawBody` to `base.InjectJSONProperties`. This can reintroduce nil/incorrect bodies after the lazy-marshalling refactor and ignores the `newBodyWithAtts` value. Use `newBodyWithAtts` as the input to `InjectJSONProperties`, and if `BodyBytes` returns an error, return early (or otherwise avoid calling `setOldRevisionJSON` with a nil/partial body).
```go
newBodyWithAtts, err := doc.BodyBytes(ctx)
if err != nil {
	base.WarnfCtx(ctx, "Unable to marshal new revision body during backupRevisionJSON: doc=%q rev=%q cv=%q err=%v ", base.UD(doc.ID), doc.GetRevTreeID(), doc.HLV.GetCurrentVersionString(), err)
}
if len(doc.Attachments()) > 0 {
	var err error
	newBodyWithAtts, err = base.InjectJSONProperties(doc._rawBody, base.KVPair{
		Key: BodyAttachments,
		Val: doc.Attachments(),
	})
```
```go
bucket, err := rosmar.OpenBucketIn(serverURL, bucketName, rosmar.ReOpenExisting)
if err != nil {
	return err
}
defer bucket.Close(h.ctx())

dsNames, err := bucket.ListDataStores()
if err != nil {
	return err
}

result[bucketName] = base.NewCollectionNames(dsNames...)
```
```diff
 // validateImportBody validates incoming import bodies
 func validateImportBody(body Body) error {
-	if isPurged, ok := body[BodyPurged].(bool); ok && isPurged {
+	if _, ok := body[BodyPurged].(bool); ok {
```
```
Deletes the specified bucket, scope, or collection in the connected Rosmar server.
The resource must follow one of these formats: 'bucket', 'bucket.scope', or 'bucket.scope.collection'.

Deleting a non-existing bucket or collection will 404 but deleting a non-existing collection returns 200.
```
```go
// Rosmar bucket management API
if sc.Config.Unsupported.RosmarBucketManagement != nil && *sc.Config.Unsupported.RosmarBucketManagement {
	r.Handle("/_rosmar/", makeHandler(sc, adminPrivs, []Permission{PermDevOps}, nil, (*handler).handleRosmarGet)).Methods("GET")
	r.Handle("/_rosmar/{bucketkeyspace}", makeHandler(sc, adminPrivs, []Permission{PermDevOps}, nil, (*handler).handleRosmarDelete)).Methods("DELETE")
}
```
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
db/crud.go:2231
- `recalculateSyncFnForActiveRev` calls `getAvailableRev` but then ignores the returned `ancestorRevID` (and attachments). If `getAvailableRev` falls back to an ancestor, `prepareDocForSyncFn`/`getChannelsAndAccess` will use the wrong rev ID and will drop any extracted attachments. Use the `ancestorRevID` consistently (and re-inject attachments into the body as `getAvailable1xRev` used to) to preserve prior behavior.
```go
ancestorBody, ancestorRevID, _, err := db.getAvailableRev(ctx, doc, doc.GetRevTreeID())
if err != nil {
	return
}
var mutableBody Body
err = mutableBody.Unmarshal(ancestorBody)
if err != nil {
	return
}
var isTombstone bool
if ancestorRev, ok := doc.History[ancestorRevID]; ok && ancestorRev != nil && ancestorRev.Deleted {
	isTombstone = true
}
curBody, _, _, err := db.prepareDocForSyncFn(ctx, doc, mutableBody, doc.GetRevTreeID(), isTombstone)
if err != nil {
	return
}
if curBody != nil {
	base.DebugfCtx(ctx, base.KeyCRUD, "updateDoc(%q): Rev %q causes %q to become current again",
		base.UD(doc.ID), newRevID, doc.GetRevTreeID())
	channelSet, access, roles, syncExpiry, oldBodyJSON, err = db.getChannelsAndAccess(ctx, doc, curBody, metaMap, doc.GetRevTreeID())
	if err != nil {
```
db/crud.go:2935
- `postWriteUpdateHLV` builds `newBodyWithAtts` via `doc.BodyBytes(ctx)` but then injects attachments into `doc._rawBody` instead of `newBodyWithAtts`. This can reintroduce the lazy-marshalling bug (and can be nil when `_rawBody` isn't set). Use `newBodyWithAtts` for `InjectJSONProperties`, and consider returning early when `BodyBytes` errors to avoid writing a nil/partial backup.
```go
newBodyWithAtts, err := doc.BodyBytes(ctx)
if err != nil {
	base.WarnfCtx(ctx, "Unable to marshal new revision body during backupRevisionJSON: doc=%q rev=%q cv=%q err=%v ", base.UD(doc.ID), doc.GetRevTreeID(), doc.HLV.GetCurrentVersionString(), err)
}
if len(doc.Attachments()) > 0 {
	var err error
	newBodyWithAtts, err = base.InjectJSONProperties(doc._rawBody, base.KVPair{
		Key: BodyAttachments,
		Val: doc.Attachments(),
	})
```
bbrks
left a comment
This feels really close for the sync function cleanup task - but I still have questions around some of the test fixes being done.
I think we are papering over broken code with some of these BodyBytes changes being inside generic helper methods, rather than fixing where the doc._rawBody is being accessed directly in more specific areas of code.
```go
newBodyWithAtts, err := doc.BodyBytes(ctx)
if err != nil {
	base.WarnfCtx(ctx, "Unable to marshal new revision body during backupRevisionJSON: doc=%q rev=%q cv=%q err=%v ", base.UD(doc.ID), base.UD(doc.GetRevTreeID()), base.UD(doc.HLV.GetCurrentVersionString()), err)
}
```
This is unnecessary marshalling work when there are no attachments on a doc. Move inside the attachment if block.
This is on the replicator codepath, and thus sensitive to throughput/performance concerns.
```go
}
if len(doc.Attachments()) > 0 {
	var err error
	newBodyWithAtts, err = base.InjectJSONProperties(doc._rawBody, base.KVPair{
```
accessing doc._rawBody here reads weirdly since it's still just relying on the side effect of BodyBytes marshalling and setting _rawBody on doc. Use the byte slice returned by the BodyBytes function.
```diff
-func (db *DatabaseCollectionWithUser) prepareSyncFn(doc *Document, newDoc *Document) (mutableBody Body, metaMap map[string]any, newRevID string, err error) {
+// Marshal raw user xattrs for use in Sync Fn. If this fails we can bail out so we should do early as possible.
+func (db *DatabaseCollectionWithUser) prepareDocForSyncFn(ctx context.Context, doc *Document, body Body, revID string, tombstone bool) (mutableBody Body, metaMap map[string]any, newRevID string, err error) {
```
Taking a revID parameter and also returning the same rev ID tells us there's something wrong here.
There's no need to return a rev ID any more after this refactoring work, since the callers always know it. Another clue is most callers ignore the revID parameter.
```diff
 if docBytes != nil {
 	if !doc.IsDeleted() {
-		data = doc._rawBody
+		data = docBytes
```
I am curious if this change is just papering over a test or codepath that was erroneously relying on the presence of doc._rawBody without calling through the BodyBytes() method to do the lazy marshalling? Which test failed without this change?
This function is already doing the work to marshal doc._body when needed, so maybe a better fix would be to set doc._rawBody once marshalled? (line ~1426) instead of always calling BodyBytes() to rely on that side-effect?
CBG-5061
Describe your PR here...
- Replaced `prepareSyncFn` with new function signature `prepareDocForSyncFn`
- Updated `_rawBody` to perform lazy marshalling correctly

Pre-review checklist
- Removed debug logging (`fmt.Print`, `log.Print`, ...)
- Tagged log output for redaction (`base.UD(docID)`, `base.MD(dbName)`)
- Updated API docs in `docs/api`

Dependencies (if applicable)
Integration Tests