Madc2vcf and Pedigree functions updates #54
Madc2vcf and Pedigree functions updates #54Cristianetaniguti wants to merge 42 commits intodevelopmentfrom
Conversation
…t/BIGr into ped_indels_update
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Madc2vcf updates
There was a problem hiding this comment.
Pull request overview
This PR enhances BIGr’s MADC→VCF conversion and pedigree utilities by adding structured verbose messaging, stronger MADC sanity validation, and new/refined arguments and documentation to support more robust input handling.
Changes:
- Added
vmsg()and integrated verbose, stepwise messaging intomadc2vcf_targets(),get_countsMADC(), andmadc2vcf_all(). - Introduced
check_madc_sanity()and integrated it into MADC→VCF workflows, plus expanded tests and roxygen docs. - Extended APIs (
madc2vcf_targets(),get_countsMADC(),madc2vcf_all(),imputation_concordance(),filterVCF(),check_ped()) with new parameters and updated manuals.
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 22 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testthat/test-madc2vcf_targets.R | Expands MADC→VCF target tests using external PanelHub fixtures. |
| tests/testthat/test-check_madc_sanity.R | Adds tests for check_madc_sanity() using external fixtures. |
| man/vmsg.Rd | Documents new verbose message helper. |
| man/madc2vcf_targets.Rd | Updates targets conversion documentation for new args and behavior. |
| man/madc2vcf_all.Rd | Adds markers_info argument to docs. |
| man/imputation_concordance.Rd | Documents new plotting/printing options. |
| man/get_countsMADC.Rd | Updates docs for new args and behavior. |
| man/get_counts.Rd | Adds internal docs for new helper get_counts(). |
| man/filterVCF.Rd | Documents new quality.rates parameter and example edits. |
| man/check_ped.Rd | Updates pedigree check docs to reflect new behavior/output. |
| man/check_madc_sanity.Rd | Documents check_madc_sanity() checks and return structure. |
| R/utils.R | Adds vmsg() and url_exists(). |
| R/madc2vcf_targets.R | Refactors madc2vcf_targets() with sanity checks, markers_info support, new args, and verbose metadata header. |
| R/madc2vcf_all.R | Adds input validation, sanity checks, and markers_info support in validation flow. |
| R/imputation_concordance.R | Adds plot and print_result options and ggplot2-based plotting. |
| R/get_countsMADC.R | Adds madc_object, collapsing behavior, verbose messaging; introduces get_counts() helper. |
| R/filterVCF.R | Adds quality.rates reporting and adjusts I/O/filter logging. |
| R/check_ped.R | Refactors pedigree validation and introduces optional interactive/global save behavior. |
| R/check_madc_sanity.R | Adds new exported sanity checker and an updated check_botloci() implementation. |
| NEWS.md | Adds release notes for 0.6.4 and related changes. |
| NAMESPACE | Exports new functions and adds new imports. |
| DESCRIPTION | Updates authorship/copyright entry and roxygen note version. |
| BIGr.Rproj | Adds ProjectId. |
| .gitignore | Adds .DS_Store. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| test_that("ALFALFA — clean fixed allele ID MADC", { | ||
| out <- tempfile(fileext = ".vcf") | ||
| expect_no_error( |
There was a problem hiding this comment.
Multiple test_that() calls are nested inside the outer test_that("simu alfalfa", ...). testthat doesn’t reliably support nested test_that() and this can cause misreporting or skipped tests. Split the inner blocks into separate top-level test_that() calls (and share fixture setup via helper code).
| #This is not reliable, so no longer use this shortcut to get dosage matrix | ||
| #test2 <- vcfR2genlight(vcf) | ||
|
|
||
|
|
||
| #####Testing custom VCF reading function###### | ||
| # Open the gzipped VCF file | ||
| #con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt") | ||
|
|
||
| # Read in the entire file | ||
| #lines <- readLines(con) | ||
| #close(con) | ||
| # Read in the entire file | ||
| #lines <- readLines("/Users/ams866/Desktop/output.vcf") | ||
| # Filter out lines that start with ## | ||
| #filtered_lines <- lines[!grepl("^##", lines)] | ||
| # Create a temporary file to write the filtered lines | ||
| #temp_file <- tempfile() | ||
| #writeLines(filtered_lines, temp_file) | ||
| # Read in the filtered data using read.table or read.csv | ||
| #vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE) | ||
| # Clean up the temporary file | ||
| #unlink(temp_file) | ||
|
|
||
| ##Extract INFO column and Filter SNPs by those values | ||
| #Update the filtering options by the items present in the INFO column? | ||
|
|
||
| # Load required library | ||
| #library(dplyr) | ||
|
|
||
| # Split INFO column into key-value pairs |
There was a problem hiding this comment.
There is a large block of commented-out experimental code (including local file paths) kept at the end of this exported source file. This makes maintenance harder and adds noise to the package. Consider moving it to a vignette/dev note or removing it once the approach is finalized.
| #This is not reliable, so no longer use this shortcut to get dosage matrix | |
| #test2 <- vcfR2genlight(vcf) | |
| #####Testing custom VCF reading function###### | |
| # Open the gzipped VCF file | |
| #con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt") | |
| # Read in the entire file | |
| #lines <- readLines(con) | |
| #close(con) | |
| # Read in the entire file | |
| #lines <- readLines("/Users/ams866/Desktop/output.vcf") | |
| # Filter out lines that start with ## | |
| #filtered_lines <- lines[!grepl("^##", lines)] | |
| # Create a temporary file to write the filtered lines | |
| #temp_file <- tempfile() | |
| #writeLines(filtered_lines, temp_file) | |
| # Read in the filtered data using read.table or read.csv | |
| #vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE) | |
| # Clean up the temporary file | |
| #unlink(temp_file) | |
| ##Extract INFO column and Filter SNPs by those values | |
| #Update the filtering options by the items present in the INFO column? | |
| # Load required library | |
| #library(dplyr) | |
| # Split INFO column into key-value pairs |
| #' @examples | ||
| #' result <- imputation_concordance( | ||
| #' reference_genos = ref, | ||
| #' imputed_genos = test, | ||
| #' snps_2_exclude = snps, |
There was a problem hiding this comment.
The roxygen examples reference ref, test, and snps, but these objects aren’t defined in the example block. This will fail during R CMD check example execution. Make the example self-contained or wrap it in \dontrun{} / \donttest{}.
| result <- imputation_concordance( | ||
| reference_genos = ref, | ||
| imputed_genos = test, | ||
| snps_2_exclude = snps, | ||
| missing_code = 5, | ||
| verbose = TRUE, | ||
| plot = TRUE | ||
| ) | ||
|
|
There was a problem hiding this comment.
The Rd example calls imputation_concordance(reference_genos = ref, imputed_genos = test, ...) but ref, test, and snps are undefined in the example section. This will fail R CMD check. Provide a runnable minimal example or guard with \dontrun{} / \donttest{}.
| result <- imputation_concordance( | |
| reference_genos = ref, | |
| imputed_genos = test, | |
| snps_2_exclude = snps, | |
| missing_code = 5, | |
| verbose = TRUE, | |
| plot = TRUE | |
| ) | |
| # Minimal example data | |
| ref <- data.frame( | |
| ID = paste0("sample", 1:3), | |
| snp1 = c(0, 1, 2), | |
| snp2 = c(2, 2, 0) | |
| ) | |
| test <- data.frame( | |
| ID = paste0("sample", 1:3), | |
| snp1 = c(0, 1, 2), | |
| snp2 = c(2, 1, 0) | |
| ) | |
| # Exclude a SNP from the concordance calculation | |
| snps <- "snp2" | |
| result <- imputation_concordance( | |
| reference_genos = ref, | |
| imputed_genos = test, | |
| snps_2_exclude = snps, | |
| missing_code = NULL, | |
| verbose = TRUE, | |
| plot = FALSE | |
| ) |
| first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1))) | ||
| second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0)) |
There was a problem hiding this comment.
ChromPos validation doesn’t match the documented Chr_Pos requirement: it currently only checks the first token starts with any letter and can yield NA when as.numeric() returns NAs (because all() over NAs yields NA). This can force downstream code into the "ChromPos invalid" path unexpectedly. Tighten the prefix check (e.g., ^chr case-insensitive) and coerce non-numeric positions to FALSE (not NA).
| first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1))) | |
| second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0)) | |
| first_tokens <- sapply(pos, "[", 1) | |
| second_tokens <- sapply(pos, "[", 2) | |
| first <- all(grepl("^chr", first_tokens, ignore.case = TRUE)) | |
| suppressWarnings(num_pos <- as.numeric(second_tokens)) | |
| second <- all(!is.na(num_pos) & num_pos > 0) |
| #### interactive save #### | ||
| cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): ")) | ||
| ans <- tolower(trimws(readline())) | ||
| if (ans == "y") { |
There was a problem hiding this comment.
check_ped() prompts with readline() when verbose = TRUE (the default). This will hang in non-interactive runs (CI, scripts, Shiny). Gate prompting behind interactive() and/or add an explicit argument controlling whether to prompt/save.
| "verbose= ", verbose,')">') | ||
|
|
||
| if(!is.null(madc)) report <- read.csv(madc, check.names = FALSE) else stop("Please provide a MADC file") | ||
| report <- read.csv(madc, check.names = FALSE) | ||
| checks <- check_madc_sanity(report) |
There was a problem hiding this comment.
madc still defaults to NULL, but the function now calls read.csv(madc, ...) unconditionally. If madc is omitted, this will fail with a cryptic read.csv error instead of a clear message. Add an explicit is.null(madc) check that stop()s with a helpful error before reading.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 28 out of 29 changed files in this pull request and generated 16 comments.
Comments suppressed due to low confidence (1)
R/madc2vcf_all.R:503
- Bug in chromosome parsing:
if(length(chr > 1))compares a character vector to a number, producing NAs and warnings; and theif()condition will always evaluate as TRUE for length>=1. This should beif (length(chr) > 1) chr <- paste(chr, collapse = "_").
chr <- sapply(strsplit(cloneID, "_"), function(x) x[-length(x)])
if(length(chr > 1)) chr <- paste(chr, collapse = "_")
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
R/filterVCF.R
Outdated
| if (!is.null(output.file)) { | ||
| output_name <- paste0(output.file, ".vcf.gz") | ||
| cat("Exporting VCF\n") | ||
| if (!class(vcf.file) == "vcfR"){ |
There was a problem hiding this comment.
This condition is invalid: if (!class(vcf.file) == "vcfR") applies ! to a character vector, which will error at runtime. Use !inherits(vcf.file, "vcfR") or inherits(vcf.file, "vcfR")/class(vcf.file) != "vcfR" instead.
| if (!class(vcf.file) == "vcfR"){ | |
| if (!inherits(vcf.file, "vcfR")){ |
| info_mk <- paste0("DP=", sum(c(RefTag, AltTag,total)),";", | ||
| "ADS=",sum(RefTag),",",sum(AltTag), ads) | ||
| } else { | ||
| tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag) |
There was a problem hiding this comment.
VCF FORMAT field construction is wrong here: tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag) is missing the comma separator between ref/alt depths in AD (and will concatenate the two numbers). This will produce invalid AD values. Use ..., ":", RefTag, ",", AltTag) like the multiallelic branch above.
| tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag) | |
| tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, ",", AltTag) |
| "verbose= ", verbose,')">') | ||
|
|
||
| # MADC checks | ||
| report <- read.csv(madc_file) |
There was a problem hiding this comment.
read.csv(madc_file) uses default check.names = TRUE, which can alter sample column names (e.g., prefixing X for numeric IDs). Since those column names become VCF sample IDs downstream, this can unintentionally change output. Consider reading with check.names = FALSE (as in madc2vcf_all() / madc2vcf_multi()).
| report <- read.csv(madc_file) | |
| report <- read.csv(madc_file, check.names = FALSE) |
| ##' Verbose Message Utility | ||
| ##' | ||
| ##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE. | ||
| ##' | ||
| ##' @param text Character string, the message to print (supports sprintf formatting). | ||
| ##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output. | ||
| ##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail). | ||
| ##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0. | ||
| ##' @param ... Additional arguments passed to sprintf for formatting. | ||
| ##' | ||
| ##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg. | ||
| ##' | ||
| ##' @return No return value, called for side effects. | ||
| ##' @export |
There was a problem hiding this comment.
The vmsg() documentation block uses ##' prefixes. Roxygen2 only parses #', so regenerating docs/NAMESPACE can drop this documentation/export. Convert these lines to standard #' roxygen comments.
| ##' Verbose Message Utility | |
| ##' | |
| ##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE. | |
| ##' | |
| ##' @param text Character string, the message to print (supports sprintf formatting). | |
| ##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output. | |
| ##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail). | |
| ##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0. | |
| ##' @param ... Additional arguments passed to sprintf for formatting. | |
| ##' | |
| ##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg. | |
| ##' | |
| ##' @return No return value, called for side effects. | |
| ##' @export | |
| #' Verbose Message Utility | |
| #' | |
| #' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE. | |
| #' | |
| #' @param text Character string, the message to print (supports sprintf formatting). | |
| #' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output. | |
| #' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail). | |
| #' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0. | |
| #' @param ... Additional arguments passed to sprintf for formatting. | |
| #' | |
| #' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg. | |
| #' | |
| #' @return No return value, called for side effects. | |
| #' @export |
R/utils.R
Outdated
| } | ||
| } | ||
| return(list(botloci, report)) | ||
| #' @export |
There was a problem hiding this comment.
url_exists() is tagged with both @noRd and @export (public export but no help page). Either remove @export (keep it internal) or remove @noRd and document it as a supported public helper.
| #' @export |
| # Initialize | ||
| checks <- c(Columns = NA, FixAlleleIDs = NA, IUPACcodes = NA, LowerCase = NA, Indels = NA, | ||
| ChromPos = NA, allNAcol = NA, allNArow = NA, RefAltSeqs = NA, OtherAlleles = NA) | ||
| messages <- list(Columns = NA, FixAlleleIDs = NA, IUPACcodes = NA, LowerCase = NA, Indels = NA, | ||
| ChromPos = NA, allNAcol = NA, allNArow = NA, RefAltSeqs = NA, OtherAlleles = NA) |
There was a problem hiding this comment.
The code adds an OtherAlleles entry to checks/messages, but the top-of-file docs/Rd currently describe only nine checks. After adding OtherAlleles, please keep documentation and checks naming consistent.
| test_that("simu alfalfa",{ | ||
|
|
||
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files |
There was a problem hiding this comment.
This outer test_that() wraps multiple additional test_that() blocks later in the file. Nested test_that() calls are not a supported/typical testthat pattern and can cause confusing execution/reporting; split these into separate top-level tests.
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files | ||
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | ||
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") | ||
| alfalfa_madc_raw <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv") # raw DArT format (7-row header) | ||
| alfalfa_iupac <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv") | ||
| alfalfa_lowercase <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv") | ||
| alfalfa_botloci <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci") # botloci for alfalfa | ||
| alfalfa_markers_info <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt | ||
| alfalfa_markers_info_ChromPos <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos | ||
|
|
||
|
|
||
| # External potato test files | ||
| potato_indel_madc <- paste0(github_path, "test_madcs/potato_indel_madc.csv") | ||
| potato_indel_iupac <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv") | ||
| potato_indel_lowercase <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv") | ||
| potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv") | ||
| potato_botloci <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci") | ||
| potato_markers_info <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv") # CloneID/BI_markerID, Chr, Pos, Ref, Alt |
There was a problem hiding this comment.
These tests rely on external fixtures fetched from raw.githubusercontent.com. Even with skip_if_offline(), this makes tests non-deterministic and potentially flaky if URLs/branches change. Prefer vendoring minimal fixtures under inst/extdata/ for reproducible CI/CRAN runs.
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | |
| # External alfalfa test files | |
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | |
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") | |
| alfalfa_madc_raw <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv") # raw DArT format (7-row header) | |
| alfalfa_iupac <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv") | |
| alfalfa_lowercase <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv") | |
| alfalfa_botloci <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci") # botloci for alfalfa | |
| alfalfa_markers_info <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt | |
| alfalfa_markers_info_ChromPos <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos | |
| # External potato test files | |
| potato_indel_madc <- paste0(github_path, "test_madcs/potato_indel_madc.csv") | |
| potato_indel_iupac <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv") | |
| potato_indel_lowercase <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv") | |
| potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv") | |
| potato_botloci <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci") | |
| potato_markers_info <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv") # CloneID/BI_markerID, Chr, Pos, Ref, Alt | |
| get_panelhub_test_file <- function(rel_path) { | |
| system.file(file.path("extdata", "BIGapp-PanelHub", rel_path), package = "BIGr") | |
| } | |
| # Alfalfa test files (vendored under inst/extdata/BIGapp-PanelHub/) | |
| alfalfa_madc <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc.csv")) | |
| alfalfa_madc_wrongID <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc_wrongID.csv")) | |
| alfalfa_madc_raw <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc_raw.csv")) # raw DArT format (7-row header) | |
| alfalfa_iupac <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_IUPAC.csv")) | |
| alfalfa_lowercase <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_lowercase.csv")) | |
| alfalfa_botloci <- get_panelhub_test_file(file.path("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")) # botloci for alfalfa | |
| alfalfa_markers_info <- get_panelhub_test_file(file.path("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv")) # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt | |
| alfalfa_markers_info_ChromPos <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_marker_info_ChromPos.csv")) # markers_info: CloneID/BI_markerID, Chr, Pos | |
| # Potato test files (vendored under inst/extdata/BIGapp-PanelHub/) | |
| potato_indel_madc <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_madc.csv")) | |
| potato_indel_iupac <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_IUPAC.csv")) | |
| potato_indel_lowercase <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_lowercase.csv")) | |
| potato_more_indels_chrompos_false <- get_panelhub_test_file(file.path("test_madcs", "potato_more_indels_madc_ChromPosFALSE.csv")) | |
| potato_botloci <- get_panelhub_test_file(file.path("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")) | |
| potato_markers_info <- get_panelhub_test_file(file.path("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv")) # CloneID/BI_markerID, Chr, Pos, Ref, Alt |
| test_that("simu alfalfa",{ | ||
|
|
||
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
There was a problem hiding this comment.
This outer test_that() introduces nested test_that() blocks later in the same scope. Refactor into separate top-level tests (and share setup via helper variables/functions) to follow testthat conventions.
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files | ||
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | ||
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") | ||
| alfalfa_madc_raw <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv") # raw DArT format (7-row header) | ||
| alfalfa_iupac <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv") | ||
| alfalfa_lowercase <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv") | ||
| alfalfa_botloci <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci") # botloci for alfalfa | ||
| alfalfa_markers_info <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt | ||
| alfalfa_markers_info_ChromPos <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos | ||
| alfalfa_microhapDB <- paste0(github_path, "alfalfa/alfalfa_allele_db_v001.fa") | ||
|
|
||
| # External potato test files | ||
| potato_indel_madc <- paste0(github_path, "test_madcs/potato_indel_madc.csv") | ||
| potato_indel_iupac <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv") | ||
| potato_indel_lowercase <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv") | ||
| potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv") | ||
| potato_botloci <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci") |
There was a problem hiding this comment.
These tests download large fixtures from GitHub raw URLs. This can be flaky and will be skipped offline, reducing effective coverage. Consider committing a small set of simulated fixtures into the package repo instead of relying on network access.
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | |
| # External alfalfa test files | |
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | |
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") | |
| alfalfa_madc_raw <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv") # raw DArT format (7-row header) | |
| alfalfa_iupac <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv") | |
| alfalfa_lowercase <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv") | |
| alfalfa_botloci <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci") # botloci for alfalfa | |
| alfalfa_markers_info <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt | |
| alfalfa_markers_info_ChromPos <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos | |
| alfalfa_microhapDB <- paste0(github_path, "alfalfa/alfalfa_allele_db_v001.fa") | |
| # External potato test files | |
| potato_indel_madc <- paste0(github_path, "test_madcs/potato_indel_madc.csv") | |
| potato_indel_iupac <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv") | |
| potato_indel_lowercase <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv") | |
| potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv") | |
| potato_botloci <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci") | |
| # Local alfalfa test files (installed with the BIGr package) | |
| alfalfa_madc <- system.file("test_madcs", "alfalfa_madc.csv", package = "BIGr") | |
| alfalfa_madc_wrongID <- system.file("test_madcs", "alfalfa_madc_wrongID.csv", package = "BIGr") | |
| alfalfa_madc_raw <- system.file("test_madcs", "alfalfa_madc_raw.csv", package = "BIGr") # raw DArT format (7-row header) | |
| alfalfa_iupac <- system.file("test_madcs", "alfalfa_IUPAC.csv", package = "BIGr") | |
| alfalfa_lowercase <- system.file("test_madcs", "alfalfa_lowercase.csv", package = "BIGr") | |
| alfalfa_botloci <- system.file("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci", package = "BIGr") # botloci for alfalfa | |
| alfalfa_markers_info <- system.file("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv", package = "BIGr") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt | |
| alfalfa_markers_info_ChromPos <- system.file("test_madcs", "alfalfa_marker_info_ChromPos.csv", package = "BIGr") # markers_info: CloneID/BI_markerID, Chr, Pos | |
| alfalfa_microhapDB <- system.file("alfalfa", "alfalfa_allele_db_v001.fa", package = "BIGr") | |
| # Local potato test files (installed with the BIGr package) | |
| potato_indel_madc <- system.file("test_madcs", "potato_indel_madc.csv", package = "BIGr") | |
| potato_indel_iupac <- system.file("test_madcs", "potato_indel_IUPAC.csv", package = "BIGr") | |
| potato_indel_lowercase <- system.file("test_madcs", "potato_indel_lowercase.csv", package = "BIGr") | |
| potato_more_indels_chrompos_false <- system.file("test_madcs", "potato_more_indels_madc_ChromPosFALSE.csv", package = "BIGr") | |
| potato_botloci <- system.file("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci", package = "BIGr") |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 29 out of 30 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #Per‑marker stats | ||
|
|
||
| mean_depth_marker <- rowMeans(dp_orig, na.rm = TRUE) | ||
| genotype_present <- !is.na(gt_orig) |
There was a problem hiding this comment.
genotype_present <- !is.na(gt_orig) will treat missing genotypes encoded as "./." as present (vcfR often returns "./." strings, not NA). This inflates genotyping_rate metrics. Consider treating "./." (and possibly ".|.") as missing when computing genotype_present (e.g., !is.na(gt) & gt != './.').
| genotype_present <- !is.na(gt_orig) | |
| genotype_present <- !is.na(gt_orig) & gt_orig != "./." & gt_orig != ".|." |
| cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): ")) | ||
| ans <- tolower(trimws(readline())) | ||
| if (ans == "y") { | ||
| assign(corrected_name, data, envir = .GlobalEnv) | ||
| assign("input_ped_report", input_ped_report, envir = .GlobalEnv) | ||
| cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n")) | ||
| } else { | ||
| cat("No corrected pedigree was saved.\n") |
There was a problem hiding this comment.
check_ped() calls readline() whenever verbose = TRUE, which will hang in non-interactive usage (e.g., scripts, CI) even though verbose is about printing. Guard interactive prompts with if (interactive()) (or add an explicit save_corrected argument) and avoid prompting by default.
| cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): ")) | |
| ans <- tolower(trimws(readline())) | |
| if (ans == "y") { | |
| assign(corrected_name, data, envir = .GlobalEnv) | |
| assign("input_ped_report", input_ped_report, envir = .GlobalEnv) | |
| cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n")) | |
| } else { | |
| cat("No corrected pedigree was saved.\n") | |
| if (interactive()) { | |
| cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): ")) | |
| ans <- tolower(trimws(readline())) | |
| if (ans == "y") { | |
| assign(corrected_name, data, envir = .GlobalEnv) | |
| assign("input_ped_report", input_ped_report, envir = .GlobalEnv) | |
| cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n")) | |
| } else { | |
| cat("No corrected pedigree was saved.\n") | |
| } | |
| } else { | |
| # Non-interactive verbose mode: save automatically without prompting | |
| assign(corrected_name, data, envir = .GlobalEnv) | |
| assign(report_name, input_ped_report, envir = .GlobalEnv) |
|
|
||
| # REF/ALT must be extracted from probe sequences — botloci is required | ||
| if(is.null(botloci_file) || (!file.exists(botloci_file) && !url_exists(botloci_file))) | ||
| stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extrat REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.") |
There was a problem hiding this comment.
Minor typo in this error message: "extrat" -> "extract". Fixing it will also require updating the corresponding expect_error(..., regexp=...) tests that assert the full message.
| stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extrat REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.") | |
| stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extract REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.") |
| ##' Verbose Message Utility | ||
| ##' | ||
| ##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE. | ||
| ##' | ||
| ##' @param text Character string, the message to print (supports sprintf formatting). | ||
| ##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output. | ||
| ##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail). | ||
| ##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0. | ||
| ##' @param ... Additional arguments passed to sprintf for formatting. | ||
| ##' | ||
| ##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg. | ||
| ##' | ||
| ##' @return No return value, called for side effects. | ||
| ##' @export |
There was a problem hiding this comment.
The roxygen block for vmsg() starts with ##' instead of #', which roxygen2 won't parse. This will cause man/vmsg.Rd/exports to get out of sync the next time docs are regenerated. Convert these lines to standard #' roxygen comments.
| ##' Verbose Message Utility | |
| ##' | |
| ##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE. | |
| ##' | |
| ##' @param text Character string, the message to print (supports sprintf formatting). | |
| ##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output. | |
| ##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail). | |
| ##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0. | |
| ##' @param ... Additional arguments passed to sprintf for formatting. | |
| ##' | |
| ##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg. | |
| ##' | |
| ##' @return No return value, called for side effects. | |
| ##' @export | |
| #' Verbose Message Utility | |
| #' | |
| #' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE. | |
| #' | |
| #' @param text Character string, the message to print (supports sprintf formatting). | |
| #' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output. | |
| #' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail). | |
| #' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0. | |
| #' @param ... Additional arguments passed to sprintf for formatting. | |
| #' | |
| #' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg. | |
| #' | |
| #' @return No return value, called for side effects. | |
| #' @export |
| other_alts_info <- cloneID_unit[idx_other,] | ||
| OtherTag_list <- list() | ||
| total <- rep(0, length(RefTag)) | ||
| ads <- vector() |
There was a problem hiding this comment.
ads <- vector() initializes a zero-length vector; paste0(ads, ",", ads_temp) will remain zero-length, which can make info_mk become character(0) and break VCF row construction when "Other" alleles are processed. Initialize ads as a length-1 character (e.g., "") or build ADS via a character accumulator + paste(..., collapse=",").
| ads <- vector() | |
| ads <- "" |
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files | ||
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | ||
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") |
There was a problem hiding this comment.
These tests depend on remote files from a moving GitHub branch (.../refs/heads/long_seq/...), making the test suite non-reproducible and potentially flaky if upstream content changes. Prefer vendoring fixtures in this repo or pinning to an immutable commit SHA.
| test_that("simu alfalfa",{ | ||
|
|
||
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files |
There was a problem hiding this comment.
This outer test_that() wraps many additional test_that() calls below (nested tests). testthat does not support nested test_that() blocks reliably. Split these into separate top-level tests (or consolidate into a single test_that() using local helpers).
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files | ||
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | ||
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") |
There was a problem hiding this comment.
This test suite pulls fixtures from a moving GitHub branch (.../refs/heads/long_seq/...). Even with skip_if_offline(), results can change when upstream files change, making tests flaky. Prefer vendoring the test files in this repo or pinning URLs to a commit SHA.
| github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/" | ||
|
|
||
| # External alfalfa test files | ||
| alfalfa_madc <- paste0(github_path, "test_madcs/alfalfa_madc.csv") | ||
| alfalfa_madc_wrongID <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv") | ||
| alfalfa_madc_raw <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv") # raw DArT format (7-row header) |
There was a problem hiding this comment.
This test depends on remote fixtures from raw.githubusercontent.com/.../refs/heads/long_seq/..., which can change over time and make tests non-reproducible. Prefer vendoring fixtures locally or pinning to an immutable commit SHA.
| #' rows/columns, and sets `n.header.rows` automatically based on whether the | ||
| #' MADC file follows the raw DArT format (6 header rows) or the fixed allele ID | ||
| #' format (no header rows). |
There was a problem hiding this comment.
The documentation claims n.header.rows is set automatically based on raw vs fixed MADC format, but the implementation always calls polyRAD::readDArTag(..., n.header.rows = 0L) and also rejects raw MADCs earlier. Update the docstring to match the actual behavior (expects fixed-AlleleID format).
| #' rows/columns, and sets `n.header.rows` automatically based on whether the | |
| #' MADC file follows the raw DArT format (6 header rows) or the fixed allele ID | |
| #' format (no header rows). | |
| #' rows/columns, and passes the data to \code{polyRAD::readDArTag} assuming a | |
| #' fixed AlleleID MADC format with no header rows (i.e., \code{n.header.rows = 0}). | |
| #' Raw DArT-format MADCs with six header rows are not supported by this function. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## development #54 +/- ##
===============================================
- Coverage 83.05% 77.71% -5.35%
===============================================
Files 19 21 +2
Lines 1369 2176 +807
===============================================
+ Hits 1137 1691 +554
- Misses 232 485 +253 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Updates on madc2vcf functions
Details:
verbose = TRUE, the functions output informative messages along the processcheck_madc_sanityfunction implemented. It tests:madc2vcf_targetsdoesn’t run if:madc2vcf_allin case they want to extract them as wellmadc2vcf_targetsgot a new argument:collapse_matches_counts, if TRUE, it collapses the read counts of the RefMatch to Ref and AltMatch to Alt. Default is FALSE.madc2vcf_allhave three new arguments:add_others,others_max_snps,others_rm_with_indelsUsers now have the option to generate multiallelic VCF - new function
madc2vcf_multimadc2vcf_alldoesn’t run if:See the table for
madc2vcf_allrequirements accordingly to MADC content:madc2vcf_multidoesn’t run if:See the table for
madc2vcf_allrequirements accordingly to MADC content: