Skip to content

fix(summarization): Fix Rename ref regression covariate to ref_covariate for SRM experiments#191

Draft
tonywu1999 wants to merge 2 commits intodevelfrom
fix-covariates
Draft

fix(summarization): Fix Rename ref regression covariate to ref_covariate for SRM experiments#191
tonywu1999 wants to merge 2 commits intodevelfrom
fix-covariates

Conversation

@tonywu1999
Copy link
Copy Markdown
Contributor

@tonywu1999 tonywu1999 commented Apr 13, 2026

User description

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

PR Type

Bug fix, Tests


Description

  • Fix SRM covariate reference column usage

  • Update imputation merge key names

  • Add regression test for saved output


Diagram Walkthrough

flowchart LR
  srm["SRM summarized input"] 
  cov["Use ref_covariate column"] 
  merge["Merge imputed output correctly"] 
  test["Regression test against saved SRM output"]
  srm -- "reference lookup" --> cov
  cov -- "feeds" --> merge
  merge -- "verified by" --> test
Loading

File Walkthrough

Relevant files
Bug fix
linear_summary.cpp
Fix reference covariate lookup in C++                                       

src/linear_summary.cpp

  • Read reference values from ref_covariate
  • Align SRM covariate lookup with input schema
+1/-1     
utils_output.R
Update R output merge covariate column                                     

R/utils_output.R

  • Use ref_covariate in imputation merge keys
  • Keep merged summarized output consistent
+1/-1     
Tests
test_dataProcess.R
Add SRM regression coverage for dataProcess                           

inst/tinytest/test_dataProcess.R

  • Load saved SRM processed output fixture
  • Assert current output matches saved result
  • Preserve heavy-label rows validation
+8/-0     

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 594e67c8-f76b-4be4-b93f-ea9c465d8ca4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-covariates

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Wrong length

ref_covariate appears to be a per-row column, but the new code still uses refs.length() to decide how many entries to pull from contrast_matrix. On SRM inputs where many rows share the same reference covariate, this makes seq(0, n_refs - 2) much larger than the number of actual reference levels and can return incorrect or out-of-bounds contrast values. unique_refs is already computed here and seems to be the intended count.

CharacterVector refs = input["ref_covariate"];
CharacterVector unique_refs = unique(refs);
int n_refs = refs.length();
ref = contrast_matrix[seq(0, n_refs - 2)];

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Align merge keys safely

Building the merge keys from input alone can fail when predicted_survival still
carries the legacy ref column or omits ref_covariate. Derive the reference key from
columns shared by both tables so imputation does not break on mixed-schema data.

R/utils_output.R [131-137]

-cols = intersect(colnames(input), c("newABUNDANCE", 
-                                    "cen", "RUN",
-                                    "FEATURE", "ref_covariate"))
+merge_keys = c("cen", "RUN", "FEATURE")
+if ("ref_covariate" %in% colnames(input) &&
+    "ref_covariate" %in% colnames(predicted_survival)) {
+    merge_keys = c(merge_keys, "ref_covariate")
+} else if ("ref" %in% colnames(input) &&
+           "ref" %in% colnames(predicted_survival)) {
+    merge_keys = c(merge_keys, "ref")
+}
+cols = intersect(colnames(input), c("newABUNDANCE", merge_keys))
 input = merge(input[, colnames(input) != "newABUNDANCE", with = FALSE], 
               predicted_survival,
-              by = setdiff(cols, "newABUNDANCE"),
+              by = merge_keys,
               all.x = TRUE)
Suggestion importance[1-10]: 6

__

Why: Building merge keys from input alone can fail if predicted_survival does not expose the same reference column, so checking for a shared ref_covariate or ref is a sensible safeguard. The suggestion is relevant and plausible, but its importance depends on whether mixed-schema tables are actually expected after this PR.

Low
Preserve legacy reference support

Accessing input["ref_covariate"] unconditionally can throw at runtime for callers
that still pass the legacy ref column. Choose the available reference column
dynamically so this path works with both old and new input schemas.

src/linear_summary.cpp [69-70]

 CharacterVector temp_ref = coef_names[find_ref];
-CharacterVector refs = input["ref_covariate"];
+CharacterVector input_names = input.names();
+bool has_ref_covariate = false;
+for (int i = 0; i < input_names.size(); ++i) {
+    if (input_names[i] == "ref_covariate") {
+        has_ref_covariate = true;
+        break;
+    }
+}
+CharacterVector refs = has_ref_covariate
+    ? CharacterVector(input["ref_covariate"])
+    : CharacterVector(input["ref"]);
Suggestion importance[1-10]: 5

__

Why: Accessing input["ref_covariate"] directly can raise a runtime error if some callers still provide only ref, so the fallback is technically sound. Still, the PR appears to be intentionally renaming the schema to ref_covariate, which makes this more of a backward-compatibility improvement than a clear defect in the change.

Low

@tonywu1999 tonywu1999 changed the title Fix covariates fix(summarization): Fix Rename ref regression covariate to ref_covariate for SRM experiments Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant