Madc2vcf and Pedigree functions updates by Cristianetaniguti · Pull Request #54 · Breeding-Insight/BIGr

Cristianetaniguti · 2026-03-27T13:47:15Z

Updates on madc2vcf functions

Details:

If verbose = TRUE, the functions print informative messages as they run.
Both functions, madc2vcf_targets (target markers) and madc2vcf_all (target + off-target markers), now run check_madc_sanity. It tests:
- [Columns] If MADC has the expected columns
- [allNArow | allNAcol] Presence of columns and rows that are entirely NA (this often happens when the MADC is opened in Excel before being loaded into R)
- [IUPACcodes] Presence of IUPAC codes on AlleleSequence
- [LowerCase] Presence of lower case bases on AlleleSequence
- [Indels] Presence of Indels
- [ChromPos] If CloneID follows the format Chr_Pos
- [RefAltSeqs] If every Ref allele has a corresponding Alt allele and vice versa
- [OtherAlleles] If "Other" exists in the MADC AlleleID
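
A few of the checks listed above can be sketched on a toy data frame. This is only an illustration, not BIGr's check_madc_sanity(): the helper name toy_madc_checks and the toy table are hypothetical, and the set of IUPAC ambiguity letters used here (R, Y, S, W, K, M, B, D, H, V, N) is an assumption about what the real check flags.

```r
# Hypothetical mini-checker; BIGr's check_madc_sanity() may differ in every detail.
toy_madc_checks <- function(report) {
  seqs <- report$AlleleSequence
  na_mask <- is.na(report)
  c(
    # any IUPAC ambiguity letter, upper or lower case
    IUPACcodes = any(grepl("[RYSWKMBDHVN]", seqs, ignore.case = TRUE)),
    # any lower-case base
    LowerCase  = any(grepl("[acgt]", seqs)),
    # rows / columns made only of NA
    allNArow   = any(rowSums(na_mask) == ncol(report)),
    allNAcol   = any(colSums(na_mask) == nrow(report))
  )
}

# Toy table: one ambiguity code (R), one lower-case base (t), one all-NA row
toy <- data.frame(
  AlleleID       = c("chr1_100|Ref_0001", "chr1_100|Alt_0002", NA),
  AlleleSequence = c("ACGTAC", "ACRtAC", NA),
  S1             = c(10, 4, NA),
  stringsAsFactors = FALSE
)
toy_madc_checks(toy)
```

Returning a named logical vector keeps each check addressable by name, which is the shape the checks vector in R/check_madc_sanity.R also appears to use.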
madc2vcf_targets doesn’t run if:
- MADC column names are not correct
It ignores Other alleles, but informs the user whether they exist and directs them to madc2vcf_all in case they want to extract them as well.
See the table for madc2vcf_targets requirements according to MADC content:

Check	Check status	get_REF_ALT	Requires
IUPAC	TRUE	TRUE	markers_info REF/ALT
	TRUE	FALSE	-
	FALSE	TRUE	botloci or markers_info REF/ALT
	FALSE	FALSE	-
Indels	TRUE	TRUE	markers_info REF/ALT
	TRUE	FALSE	-
	FALSE	TRUE	botloci or markers_info REF/ALT
	FALSE	FALSE	-
ChromPos	TRUE	TRUE	botloci or markers_info REF/ALT
	TRUE	FALSE	-
	FALSE	TRUE	markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci
	FALSE	FALSE	markers_info CHR/POS
FixAlleleIDs	TRUE	TRUE	botloci or markers_info REF/ALT
	TRUE	FALSE	-
	FALSE	TRUE	markers_info REF/ALT
	FALSE	FALSE	-
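
The table above can also be read as a lookup keyed by check name, check status, and get_REF_ALT. The sketch below only transcribes the table into a data frame; the names targets_requirements and lookup_requirement are hypothetical and are not part of BIGr.

```r
# Transcription of the madc2vcf_targets requirements table (hypothetical helper names).
targets_requirements <- data.frame(
  check       = rep(c("IUPAC", "Indels", "ChromPos", "FixAlleleIDs"), each = 4),
  status      = rep(c(TRUE, TRUE, FALSE, FALSE), times = 4),
  get_REF_ALT = rep(c(TRUE, FALSE), times = 8),
  requires    = c(
    # IUPAC
    "markers_info REF/ALT", "-", "botloci or markers_info REF/ALT", "-",
    # Indels
    "markers_info REF/ALT", "-", "botloci or markers_info REF/ALT", "-",
    # ChromPos
    "botloci or markers_info REF/ALT", "-",
    "markers_info CHR/POS/REF/ALT or markers_info CHR/POS + botloci",
    "markers_info CHR/POS",
    # FixAlleleIDs
    "botloci or markers_info REF/ALT", "-", "markers_info REF/ALT", "-"
  ),
  stringsAsFactors = FALSE
)

# Return the "Requires" cell for one combination of inputs.
lookup_requirement <- function(check, status, get_REF_ALT) {
  hit <- targets_requirements$check == check &
    targets_requirements$status == status &
    targets_requirements$get_REF_ALT == get_REF_ALT
  targets_requirements$requires[hit]
}

lookup_requirement("ChromPos", status = FALSE, get_REF_ALT = FALSE)
```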

madc2vcf_targets has a new argument, collapse_matches_counts. If TRUE, it collapses the RefMatch read counts into Ref and the AltMatch read counts into Alt. Default is FALSE.
madc2vcf_all has three new arguments: add_others, others_max_snps, others_rm_with_indels
Users can now generate a multiallelic VCF with the new function madc2vcf_multi
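
The collapsing behavior described for collapse_matches_counts can be illustrated on toy counts. This is a sketch under stated assumptions, not the package code: the AlleleID layout "<CloneID>|<Tag>_<n>", the sample columns S1 and S2, and the helper collapse_matches are all hypothetical.

```r
# Toy MADC-like counts; the AlleleID layout is an assumption for illustration.
counts <- data.frame(
  AlleleID = c("chr1_100|Ref_0001", "chr1_100|Alt_0002",
               "chr1_100|RefMatch_0001", "chr1_100|AltMatch_0001"),
  CloneID  = "chr1_100",
  S1       = c(10, 4, 2, 1),
  S2       = c(0, 7, 3, 0),
  stringsAsFactors = FALSE
)

# Sum RefMatch rows into Ref and AltMatch rows into Alt, per CloneID.
collapse_matches <- function(df, sample_cols) {
  # Reduce "Ref", "RefMatch", "Alt", "AltMatch" to "Ref" or "Alt"
  tag <- sub("^.*\\|(Ref|Alt)(Match)?_.*$", "\\1", df$AlleleID)
  rowsum(as.matrix(df[, sample_cols, drop = FALSE]),
         group = paste(df$CloneID, tag, sep = "|"))
}

collapsed <- collapse_matches(counts, c("S1", "S2"))
collapsed
```

With the toy data, the Ref row for S1 is 10 + 2 and the Alt row is 4 + 1, which is the kind of pooling the argument description implies.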
madc2vcf_all doesn’t run if:
- MADC column names are not correct
- The MADC is in raw format
- The MADC has IUPAC codes
See the table for madc2vcf_all requirements according to MADC content:

Check	Check status	Requires
Indels	TRUE	markers_info REF/ALT/IndelPos/IndelLenght + botloci
	FALSE	botloci
ChromPos	TRUE	botloci
	FALSE	markers_info CHR/POS + botloci
RefAltSeqs	TRUE	botloci
	FALSE	botloci + microhapdb

madc2vcf_multi doesn’t run if:
- MADC column names are not correct
- The MADC is in raw format
- The MADC has IUPAC codes
- Ref or Alt tags are absent
See the table for madc2vcf_multi requirements according to MADC content:

Check	Check status	Requires
Indels	TRUE	botloci
	FALSE	botloci
ChromPos	TRUE	botloci
	FALSE	markers_info CHR/POS + botloci

Test scripts were added and simulated MADC files were created to cover all scenarios in all madc2vcf functions. Test files are located at: https://github.com/Breeding-Insight/BIGapp-PanelHub/tree/long_seq/test_madcs
Updated the version, the package Imports in DESCRIPTION, and the function documentation.

…t/BIGr into ped_indels_update

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Madc2vcf updates

Copilot

Pull request overview

This PR enhances BIGr’s MADC→VCF conversion and pedigree utilities by adding structured verbose messaging, stronger MADC sanity validation, and new/refined arguments and documentation to support more robust input handling.

Changes:

Added vmsg() and integrated verbose, stepwise messaging into madc2vcf_targets(), get_countsMADC(), and madc2vcf_all().
Introduced check_madc_sanity() and integrated it into MADC→VCF workflows, plus expanded tests and roxygen docs.
Extended APIs (madc2vcf_targets(), get_countsMADC(), madc2vcf_all(), imputation_concordance(), filterVCF(), check_ped()) with new parameters and updated manuals.

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 22 comments.

File	Description
tests/testthat/test-madc2vcf_targets.R	Expands MADC→VCF target tests using external PanelHub fixtures.
tests/testthat/test-check_madc_sanity.R	Adds tests for `check_madc_sanity()` using external fixtures.
man/vmsg.Rd	Documents new verbose message helper.
man/madc2vcf_targets.Rd	Updates targets conversion documentation for new args and behavior.
man/madc2vcf_all.Rd	Adds `markers_info` argument to docs.
man/imputation_concordance.Rd	Documents new plotting/printing options.
man/get_countsMADC.Rd	Updates docs for new args and behavior.
man/get_counts.Rd	Adds internal docs for new helper `get_counts()`.
man/filterVCF.Rd	Documents new `quality.rates` parameter and example edits.
man/check_ped.Rd	Updates pedigree check docs to reflect new behavior/output.
man/check_madc_sanity.Rd	Documents `check_madc_sanity()` checks and return structure.
R/utils.R	Adds `vmsg()` and `url_exists()`.
R/madc2vcf_targets.R	Refactors `madc2vcf_targets()` with sanity checks, markers_info support, new args, and verbose metadata header.
R/madc2vcf_all.R	Adds input validation, sanity checks, and `markers_info` support in validation flow.
R/imputation_concordance.R	Adds `plot` and `print_result` options and ggplot2-based plotting.
R/get_countsMADC.R	Adds `madc_object`, collapsing behavior, verbose messaging; introduces `get_counts()` helper.
R/filterVCF.R	Adds `quality.rates` reporting and adjusts I/O/filter logging.
R/check_ped.R	Refactors pedigree validation and introduces optional interactive/global save behavior.
R/check_madc_sanity.R	Adds new exported sanity checker and an updated `check_botloci()` implementation.
NEWS.md	Adds release notes for 0.6.4 and related changes.
NAMESPACE	Exports new functions and adds new imports.
DESCRIPTION	Updates authorship/copyright entry and roxygen note version.
BIGr.Rproj	Adds `ProjectId`.
.gitignore	Adds `.DS_Store`.

R/madc2vcf_targets.R

Copilot · 2026-03-27T13:57:06Z

tests/testthat/test-madc2vcf_targets.R

+  test_that("ALFALFA — clean fixed allele ID MADC", {
+    out <- tempfile(fileext = ".vcf")
+    expect_no_error(


Multiple test_that() calls are nested inside the outer test_that("simu alfalfa", ...). testthat doesn’t reliably support nested test_that() and this can cause misreporting or skipped tests. Split the inner blocks into separate top-level test_that() calls (and share fixture setup via helper code).
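
The flat layout recommended above might look like the following sketch. It assumes the testthat package is installed; the helper make_toy_madc and the toy table are hypothetical stand-ins for the real fixtures.

```r
library(testthat)

# Hypothetical fixture helper shared by every top-level test in the file.
make_toy_madc <- function() {
  data.frame(
    AlleleID = c("chr1_100|Ref_0001", "chr1_100|Alt_0002"),
    CloneID  = "chr1_100",
    S1       = c(5, 3),
    stringsAsFactors = FALSE
  )
}

# Each scenario is its own top-level test_that() block; nothing is nested.
test_that("toy MADC has one row per allele", {
  madc <- make_toy_madc()
  expect_equal(nrow(madc), 2)
})

test_that("toy MADC keeps the CloneID column", {
  madc <- make_toy_madc()
  expect_true("CloneID" %in% names(madc))
})
```

Keeping setup in a plain function means a failure is reported against the specific scenario rather than against one large outer block.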

Copilot · 2026-03-27T13:57:06Z

R/filterVCF.R

+#This is not reliable, so no longer use this shortcut to get dosage matrix
+#test2 <- vcfR2genlight(vcf)
+
+
+#####Testing custom VCF reading function######
+# Open the gzipped VCF file
+#con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt")
+
+# Read in the entire file
+#lines <- readLines(con)
+#close(con)
+# Read in the entire file
+#lines <- readLines("/Users/ams866/Desktop/output.vcf")
+# Filter out lines that start with ##
+#filtered_lines <- lines[!grepl("^##", lines)]
+# Create a temporary file to write the filtered lines
+#temp_file <- tempfile()
+#writeLines(filtered_lines, temp_file)
+# Read in the filtered data using read.table or read.csv
+#vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE)
+# Clean up the temporary file
+#unlink(temp_file)
+
+##Extract INFO column and Filter SNPs by those values
+#Update the filtering options by the items present in the INFO column?
+
+# Load required library
+#library(dplyr)
+
+# Split INFO column into key-value pairs


There is a large block of commented-out experimental code (including local file paths) kept at the end of this exported source file. This makes maintenance harder and adds noise to the package. Consider moving it to a vignette/dev note or removing it once the approach is finalized.

Suggested change

-#This is not reliable, so no longer use this shortcut to get dosage matrix
-#test2 <- vcfR2genlight(vcf)
-#####Testing custom VCF reading function######
-# Open the gzipped VCF file
-#con <- gzfile("/Users/ams866/Desktop/output.vcf", "rt")
-# Read in the entire file
-#lines <- readLines(con)
-#close(con)
-# Read in the entire file
-#lines <- readLines("/Users/ams866/Desktop/output.vcf")
-# Filter out lines that start with ##
-#filtered_lines <- lines[!grepl("^##", lines)]
-# Create a temporary file to write the filtered lines
-#temp_file <- tempfile()
-#writeLines(filtered_lines, temp_file)
-# Read in the filtered data using read.table or read.csv
-#vcf_data <- read.table(temp_file, header = TRUE, sep = "\t", comment.char = "", check.names = FALSE)
-# Clean up the temporary file
-#unlink(temp_file)
-##Extract INFO column and Filter SNPs by those values
-#Update the filtering options by the items present in the INFO column?
-# Load required library
-#library(dplyr)
-# Split INFO column into key-value pairs

Copilot · 2026-03-27T13:57:06Z

R/imputation_concordance.R

+#' @examples
+#' result <- imputation_concordance(
+#'   reference_genos = ref,
+#'   imputed_genos = test,
+#'   snps_2_exclude = snps,


The roxygen examples reference ref, test, and snps, but these objects aren’t defined in the example block. This will fail during R CMD check example execution. Make the example self-contained or wrap it in \dontrun{} / \donttest{}.

Copilot · 2026-03-27T13:57:06Z

man/imputation_concordance.Rd

+result <- imputation_concordance(
+  reference_genos = ref,
+  imputed_genos = test,
+  snps_2_exclude = snps,
+  missing_code = 5,
+  verbose = TRUE,
+  plot = TRUE
+)



The Rd example calls imputation_concordance(reference_genos = ref, imputed_genos = test, ...) but ref, test, and snps are undefined in the example section. This will fail R CMD check. Provide a runnable minimal example or guard with \dontrun{} / \donttest{}.

Suggested change

-result <- imputation_concordance(
-  reference_genos = ref,
-  imputed_genos = test,
-  snps_2_exclude = snps,
-  missing_code = 5,
-  verbose = TRUE,
-  plot = TRUE
-)
+# Minimal example data
+ref <- data.frame(
+  ID = paste0("sample", 1:3),
+  snp1 = c(0, 1, 2),
+  snp2 = c(2, 2, 0)
+)
+test <- data.frame(
+  ID = paste0("sample", 1:3),
+  snp1 = c(0, 1, 2),
+  snp2 = c(2, 1, 0)
+)
+# Exclude a SNP from the concordance calculation
+snps <- "snp2"
+result <- imputation_concordance(
+  reference_genos = ref,
+  imputed_genos = test,
+  snps_2_exclude = snps,
+  missing_code = NULL,
+  verbose = TRUE,
+  plot = FALSE
+)

R/check_madc_sanity.R

Copilot · 2026-03-27T13:57:10Z

R/check_madc_sanity.R

+      first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
+      second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0))


ChromPos validation doesn’t match the documented Chr_Pos requirement: it currently only checks the first token starts with any letter and can yield NA when as.numeric() returns NAs (because all() over NAs yields NA). This can force downstream code into the "ChromPos invalid" path unexpectedly. Tighten the prefix check (e.g., ^chr case-insensitive) and coerce non-numeric positions to FALSE (not NA).

Suggested change

-      first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
-      second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0))
+      first_tokens <- sapply(pos, "[", 1)
+      second_tokens <- sapply(pos, "[", 2)
+      first <- all(grepl("^chr", first_tokens, ignore.case = TRUE))
+      suppressWarnings(num_pos <- as.numeric(second_tokens))
+      second <- all(!is.na(num_pos) & num_pos > 0)
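
The NA pitfall and a stricter Chr_Pos test can be shown in isolation. The validator below is a hypothetical sketch, not the package's implementation; it assumes the position is whatever follows the last underscore, so chromosome names that themselves contain underscores are tolerated.

```r
# Hypothetical validator: chromosome = text before the last "_", position = the rest.
is_chrom_pos <- function(clone_ids) {
  chr <- sub("_[^_]*$", "", clone_ids)
  # Non-numeric positions become NA here and are treated as invalid below
  pos <- suppressWarnings(as.numeric(sub("^.*_", "", clone_ids)))
  all(grepl("^chr", chr, ignore.case = TRUE)) && all(!is.na(pos) & pos > 0)
}

all(c(TRUE, NA))                           # NA: the pitfall described above
is_chrom_pos(c("chr1_000123", "Chr2_45"))  # well-formed IDs
is_chrom_pos(c("chr1_abc"))                # non-numeric position
```

Because the NA test is folded into the logical vector before all() is applied, the function always returns TRUE or FALSE, never NA.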

R/get_countsMADC.R

Copilot · 2026-03-27T13:57:10Z

R/check_ped.R

+    #### interactive save ####
+    cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): "))
+    ans <- tolower(trimws(readline()))
+    if (ans == "y") {


check_ped() prompts with readline() when verbose = TRUE (the default). This will hang in non-interactive runs (CI, scripts, Shiny). Gate prompting behind interactive() and/or add an explicit argument controlling whether to prompt/save.
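
One way to gate the prompt, sketched under assumptions: the function name ask_to_save and its arguments prompt and ask are hypothetical and do not exist in check_ped(). Passing the reader as an argument also makes the branch testable without a terminal.

```r
# Hypothetical helper: only prompts when explicitly allowed (default: interactive sessions).
ask_to_save <- function(prompt = interactive(), ask = readline) {
  if (!prompt) return(FALSE)   # never block scripts, CI, or Shiny
  ans <- tolower(trimws(ask("Save corrected pedigree? (y/n): ")))
  identical(ans, "y")
}

ask_to_save(prompt = FALSE)                           # no prompt is issued
ask_to_save(prompt = TRUE, ask = function(p) " Y ")   # simulated user answer
```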

Copilot · 2026-03-27T13:57:10Z

R/madc2vcf_all.R

                      "verbose= ", verbose,')">')

-  if(!is.null(madc)) report <- read.csv(madc, check.names = FALSE) else stop("Please provide a MADC file")
+  report <- read.csv(madc, check.names = FALSE)
+  checks <- check_madc_sanity(report)


madc still defaults to NULL, but the function now calls read.csv(madc, ...) unconditionally. If madc is omitted, this will fail with a cryptic read.csv error instead of a clear message. Add an explicit is.null(madc) check that stop()s with a helpful error before reading.
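
A minimal sketch of the guard described above. The wrapper name read_madc_checked is hypothetical; the error text is taken from the removed line in the diff.

```r
# Hypothetical wrapper: fail early with a clear message, then read without renaming columns.
read_madc_checked <- function(madc = NULL) {
  if (is.null(madc)) stop("Please provide a MADC file")
  read.csv(madc, check.names = FALSE)
}

# Toy file to exercise the happy path
tmp <- tempfile(fileext = ".csv")
writeLines(c("AlleleID,CloneID,1001", "chr1_100|Ref_0001,chr1_100,5"), tmp)
report <- read_madc_checked(tmp)

# Omitting the argument now gives the explicit message instead of a read.csv error
msg <- tryCatch(read_madc_checked(), error = function(e) conditionMessage(e))
msg
```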

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…ht/BIGr into ped_indels_update

Copilot

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 16 comments.

Comments suppressed due to low confidence (1)

R/madc2vcf_all.R:503

Bug in chromosome parsing: if(length(chr > 1)) compares the character vector chr to 1 and then takes the length of the resulting logical vector, so the condition is TRUE whenever chr is non-empty rather than only when it has more than one element. This should be if (length(chr) > 1) chr <- paste(chr, collapse = "_").

  chr <- sapply(strsplit(cloneID, "_"), function(x) x[-length(x)])
  if(length(chr > 1)) chr <- paste(chr, collapse = "_")
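
The difference between the two conditions, and one way to extract the chromosome from a single CloneID, can be sketched as follows. The helper clone_chr is hypothetical and handles one ID at a time.

```r
chr <- c("chr", "1A")
length(chr > 1)   # length of the comparison result, not a test on the length
length(chr) > 1   # the intended test

# Hypothetical helper: everything before the last "_" is the chromosome.
clone_chr <- function(clone_id) {
  parts <- strsplit(clone_id, "_", fixed = TRUE)[[1]]
  paste(parts[-length(parts)], collapse = "_")
}

clone_chr("chr_1A_000123")
clone_chr("chr1_5")
```

Since paste() with collapse on a single element returns that element unchanged, the helper needs no length check at all.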

Copilot · 2026-04-01T18:47:42Z

R/filterVCF.R

-    if (!is.null(output.file)) {
-      output_name <- paste0(output.file, ".vcf.gz")
+  cat("Exporting VCF\n")
+  if (!class(vcf.file) == "vcfR"){


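This condition is fragile. In R, ! binds more loosely than ==, so if (!class(vcf.file) == "vcfR") parses as !(class(vcf.file) == "vcfR"). It works only while class() returns a single string; for an object with more than one class the condition has length greater than one, which is an error in if() from R 4.2.0 onward (a warning in earlier versions). Use !inherits(vcf.file, "vcfR") instead.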

Suggested change

-  if (!class(vcf.file) == "vcfR"){
+  if (!inherits(vcf.file, "vcfR")){
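
A short sketch of why inherits() is the safer test. The objects below are bare stand-ins built with structure(), not real vcfR objects, and the class name "myVcf" is made up.

```r
# Stand-in objects for illustration only
single <- structure(list(), class = "vcfR")
multi  <- structure(list(), class = c("myVcf", "vcfR"))

# inherits() always returns a single TRUE/FALSE
not_vcf_single <- !inherits(single, "vcfR")
not_vcf_multi  <- !inherits(multi, "vcfR")

# The class() comparison returns one value per class, which is unsafe inside if()
cmp_multi <- !class(multi) == "vcfR"
length(cmp_multi)
```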

Copilot · 2026-04-01T18:47:43Z

R/madc2vcf_all.R

+    info_mk <- paste0("DP=", sum(c(RefTag, AltTag,total)),";",
+                        "ADS=",sum(RefTag),",",sum(AltTag), ads)
+  } else {
+    tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag)


VCF FORMAT field construction is wrong here: tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag) is missing the comma separator between ref/alt depths in AD (and will concatenate the two numbers). This will produce invalid AD values. Use ..., ":", RefTag, ",", AltTag) like the multiallelic branch above.

Suggested change

-    tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag)
+    tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, ",", AltTag)
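
The effect of the missing separator is easy to see on toy counts. The values below are made up; only the two paste0() calls come from the diff.

```r
RefTag <- c(10, 0)
AltTag <- c(5, 7)

# Without the comma the two depths run together
broken <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag)

# With the comma the last field holds ref,alt depths
fixed <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, ",", AltTag)

broken  # "15:10:105" "7:0:07"
fixed   # "15:10:10,5" "7:0:0,7"
```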

Copilot · 2026-04-01T18:47:43Z

R/madc2vcf_targets.R

+                      "verbose= ", verbose,')">')
+
+  # MADC checks
+  report <- read.csv(madc_file)


read.csv(madc_file) uses default check.names = TRUE, which can alter sample column names (e.g., prefixing X for numeric IDs). Since those column names become VCF sample IDs downstream, this can unintentionally change output. Consider reading with check.names = FALSE (as in madc2vcf_all() / madc2vcf_multi()).

Suggested change

-  report <- read.csv(madc_file)
+  report <- read.csv(madc_file, check.names = FALSE)
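
The renaming can be reproduced with an in-memory CSV. The sample IDs below are invented for illustration.

```r
csv <- "CloneID,1001,sample-2\nchr1_100,3,4\n"

# Default: names are passed through make.names()
default_names <- names(read.csv(text = csv))

# check.names = FALSE keeps the header as written
kept_names <- names(read.csv(text = csv, check.names = FALSE))

default_names  # "CloneID" "X1001" "sample.2"
kept_names     # "CloneID" "1001" "sample-2"
```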

Copilot · 2026-04-01T18:47:43Z

R/utils.R

+##' Verbose Message Utility
+##'
+##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE.
+##'
+##' @param text Character string, the message to print (supports sprintf formatting).
+##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output.
+##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail).
+##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0.
+##' @param ... Additional arguments passed to sprintf for formatting.
+##'
+##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg.
+##'
+##' @return No return value, called for side effects.
+##' @export


The vmsg() documentation block uses ##' prefixes. roxygen2 accepts one or more # characters before the apostrophe, so the block is still parsed, but other roxygen blocks in this PR use #'. Convert these lines to standard #' roxygen comments for consistency.

Suggested change

-##' Verbose Message Utility
-##'
-##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE.
-##'
-##' @param text Character string, the message to print (supports sprintf formatting).
-##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output.
-##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail).
-##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0.
-##' @param ... Additional arguments passed to sprintf for formatting.
-##'
-##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg.
-##'
-##' @return No return value, called for side effects.
-##' @export
+#' Verbose Message Utility
+#'
+#' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE.
+#'
+#' @param text Character string, the message to print (supports sprintf formatting).
+#' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output.
+#' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail).
+#' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0.
+#' @param ... Additional arguments passed to sprintf for formatting.
+#'
+#' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg.
+#'
+#' @return No return value, called for side effects.
+#' @export
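
A helper matching the documented parameters might look like the sketch below. It is not the body of vmsg() in R/utils.R: the name vmsg_sketch, the argument order (with ... placed right after text so that format values can be passed positionally), the timestamp format, and the two-space indentation are all assumptions.

```r
# Hypothetical re-implementation based only on the roxygen description above.
vmsg_sketch <- function(text, ..., verbose = TRUE, level = 1, type = "INFO") {
  if (!verbose) return(invisible(NULL))           # silent when verbose is FALSE
  body   <- sprintf(text, ...)                    # sprintf-style formatting
  indent <- strrep("  ", level)                   # deeper levels are indented more
  label  <- if (level == 0) paste0("[", type, "] ") else ""  # type shown at level 0 only
  message(format(Sys.time(), "%H:%M:%S"), " ", indent, label, body)
  invisible(NULL)
}

vmsg_sketch("Converting %d markers", 120L, level = 0)
vmsg_sketch("reading MADC", level = 1)
vmsg_sketch("this is never shown", verbose = FALSE)
```

Using message() rather than cat() lets callers silence the output with suppressMessages(), which is one common reason to route verbose text through a single helper.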

Copilot · 2026-04-01T18:47:44Z

R/utils.R

-    }
-  }
-  return(list(botloci, report))
+#' @export 


url_exists() is tagged with both @noRd and @export (public export but no help page). Either remove @export (keep it internal) or remove @noRd and document it as a supported public helper.

Suggested change

#' @export

Copilot · 2026-04-01T18:47:45Z

R/check_madc_sanity.R

+  # Initialize
+  checks <- c(Columns = NA, FixAlleleIDs = NA, IUPACcodes = NA, LowerCase = NA, Indels = NA,
+              ChromPos = NA, allNAcol = NA, allNArow = NA, RefAltSeqs = NA, OtherAlleles = NA)
+  messages <-  list(Columns = NA, FixAlleleIDs = NA, IUPACcodes = NA, LowerCase = NA, Indels = NA,
+                    ChromPos = NA, allNAcol = NA, allNArow = NA, RefAltSeqs = NA, OtherAlleles = NA)


The code adds an OtherAlleles entry to checks/messages, but the top-of-file docs/Rd currently describe only nine checks. After adding OtherAlleles, please keep documentation and checks naming consistent.

Copilot · 2026-04-01T18:47:46Z

tests/testthat/test-madc2vcf_targets.R

+test_that("simu alfalfa",{
+
+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files


This outer test_that() wraps multiple additional test_that() blocks later in the file. Nested test_that() calls are not a supported/typical testthat pattern and can cause confusing execution/reporting; split these into separate top-level tests.

Copilot · 2026-04-01T18:47:46Z

tests/testthat/test-madc2vcf_targets.R

+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
+  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
+  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
+  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
+  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
+  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
+
+
+  # External potato test files
+  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
+  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
+  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
+  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
+  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")
+  potato_markers_info               <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv") # CloneID/BI_markerID, Chr, Pos, Ref, Alt


These tests rely on external fixtures fetched from raw.githubusercontent.com. Even with skip_if_offline(), this makes tests non-deterministic and potentially flaky if URLs/branches change. Prefer vendoring minimal fixtures under inst/extdata/ for reproducible CI/CRAN runs.

Suggested change

-  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
-
-  # External alfalfa test files
-  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
-  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
-  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
-  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
-  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
-  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
-  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
-  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
-
-  # External potato test files
-  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
-  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
-  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
-  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
-  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")
-  potato_markers_info               <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv") # CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  get_panelhub_test_file <- function(rel_path) {
+    system.file(file.path("extdata", "BIGapp-PanelHub", rel_path), package = "BIGr")
+  }
+
+  # Alfalfa test files (vendored under inst/extdata/BIGapp-PanelHub/)
+  alfalfa_madc <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc.csv"))
+  alfalfa_madc_wrongID <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc_wrongID.csv"))
+  alfalfa_madc_raw <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc_raw.csv")) # raw DArT format (7-row header)
+  alfalfa_iupac <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_IUPAC.csv"))
+  alfalfa_lowercase <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_lowercase.csv"))
+  alfalfa_botloci <- get_panelhub_test_file(file.path("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")) # botloci for alfalfa
+  alfalfa_markers_info <- get_panelhub_test_file(file.path("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv")) # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_marker_info_ChromPos.csv")) # markers_info: CloneID/BI_markerID, Chr, Pos
+
+  # Potato test files (vendored under inst/extdata/BIGapp-PanelHub/)
+  potato_indel_madc <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_madc.csv"))
+  potato_indel_iupac <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_IUPAC.csv"))
+  potato_indel_lowercase <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_lowercase.csv"))
+  potato_more_indels_chrompos_false <- get_panelhub_test_file(file.path("test_madcs", "potato_more_indels_madc_ChromPosFALSE.csv"))
+  potato_botloci <- get_panelhub_test_file(file.path("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci"))
+  potato_markers_info <- get_panelhub_test_file(file.path("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv")) # CloneID/BI_markerID, Chr, Pos, Ref, Alt

Copilot · 2026-04-01T18:47:46Z

tests/testthat/test-madc2vcf_all.R

+test_that("simu alfalfa",{
+
+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+


This outer test_that() introduces nested test_that() blocks later in the same scope. Refactor into separate top-level tests (and share setup via helper variables/functions) to follow testthat conventions.

Copilot · 2026-04-01T18:47:46Z

tests/testthat/test-madc2vcf_all.R

+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
+  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
+  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
+  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
+  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
+  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
+  alfalfa_microhapDB <- paste0(github_path, "alfalfa/alfalfa_allele_db_v001.fa")
+
+  # External potato test files
+  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
+  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
+  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
+  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
+  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")


These tests download large fixtures from GitHub raw URLs. This can be flaky and will be skipped offline, reducing effective coverage. Consider committing a small set of simulated fixtures into the package repo instead of relying on network access.

Suggested change

-  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
-
-  # External alfalfa test files
-  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
-  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
-  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
-  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
-  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
-  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
-  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
-  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
-  alfalfa_microhapDB <- paste0(github_path, "alfalfa/alfalfa_allele_db_v001.fa")
-
-  # External potato test files
-  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
-  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
-  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
-  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
-  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")
+  # Local alfalfa test files (installed with the BIGr package)
+  alfalfa_madc <- system.file("test_madcs", "alfalfa_madc.csv", package = "BIGr")
+  alfalfa_madc_wrongID <- system.file("test_madcs", "alfalfa_madc_wrongID.csv", package = "BIGr")
+  alfalfa_madc_raw <- system.file("test_madcs", "alfalfa_madc_raw.csv", package = "BIGr") # raw DArT format (7-row header)
+  alfalfa_iupac <- system.file("test_madcs", "alfalfa_IUPAC.csv", package = "BIGr")
+  alfalfa_lowercase <- system.file("test_madcs", "alfalfa_lowercase.csv", package = "BIGr")
+  alfalfa_botloci <- system.file("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci", package = "BIGr") # botloci for alfalfa
+  alfalfa_markers_info <- system.file("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv", package = "BIGr") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos <- system.file("test_madcs", "alfalfa_marker_info_ChromPos.csv", package = "BIGr") # markers_info: CloneID/BI_markerID, Chr, Pos
+  alfalfa_microhapDB <- system.file("alfalfa", "alfalfa_allele_db_v001.fa", package = "BIGr")
+
+  # Local potato test files (installed with the BIGr package)
+  potato_indel_madc <- system.file("test_madcs", "potato_indel_madc.csv", package = "BIGr")
+  potato_indel_iupac <- system.file("test_madcs", "potato_indel_IUPAC.csv", package = "BIGr")
+  potato_indel_lowercase <- system.file("test_madcs", "potato_indel_lowercase.csv", package = "BIGr")
+  potato_more_indels_chrompos_false <- system.file("test_madcs", "potato_more_indels_madc_ChromPosFALSE.csv", package = "BIGr")
+  potato_botloci <- system.file("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci", package = "BIGr")

Copilot

Pull request overview

Copilot reviewed 29 out of 30 changed files in this pull request and generated 11 comments.

Copilot · 2026-04-01T19:16:41Z

R/filterVCF.R

+    #Per‑marker stats
+
+    mean_depth_marker   <- rowMeans(dp_orig, na.rm = TRUE)
+    genotype_present    <- !is.na(gt_orig)


genotype_present <- !is.na(gt_orig) will treat missing genotypes encoded as "./." as present (vcfR often returns "./." strings, not NA). This inflates genotyping_rate metrics. Consider treating "./." (and possibly ".|.") as missing when computing genotype_present (e.g., !is.na(gt) & gt != './.').

Suggested change

-    genotype_present    <- !is.na(gt_orig)
+    genotype_present    <- !is.na(gt_orig) & gt_orig != "./." & gt_orig != ".|."
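
The effect on per-marker genotyping rate can be checked on a toy genotype matrix. The matrix below is invented (rows are markers, columns are samples); only the two genotype_present expressions mirror the diff.

```r
# Toy GT matrix: marker m2 has one "./." call, marker m1 has one NA
gt <- matrix(c("0/1", "./.", NA, "1/1"), nrow = 2,
             dimnames = list(c("m1", "m2"), c("S1", "S2")))

# Original test: "./." counts as a called genotype
naive_present  <- !is.na(gt)

# Suggested test: "./." and ".|." count as missing
strict_present <- !is.na(gt) & gt != "./." & gt != ".|."

naive_rate  <- rowMeans(naive_present)
strict_rate <- rowMeans(strict_present)
naive_rate
strict_rate
```

For NA cells the comparison gt != "./." is NA, but FALSE & NA is FALSE in R, so the combined mask stays strictly logical.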

Copilot · 2026-04-01T19:16:42Z

R/check_ped.R

+    cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): "))
+    ans <- tolower(trimws(readline()))
+    if (ans == "y") {
+      assign(corrected_name, data, envir = .GlobalEnv)
+      assign("input_ped_report", input_ped_report, envir = .GlobalEnv)
+      cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n"))
+    } else {
+      cat("No corrected pedigree was saved.\n")


check_ped() calls readline() whenever verbose = TRUE, which will hang in non-interactive usage (e.g., scripts, CI) even though verbose is about printing. Guard interactive prompts with if (interactive()) (or add an explicit save_corrected argument) and avoid prompting by default.

Suggested change

-    cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): "))
-    ans <- tolower(trimws(readline()))
-    if (ans == "y") {
-      assign(corrected_name, data, envir = .GlobalEnv)
-      assign("input_ped_report", input_ped_report, envir = .GlobalEnv)
-      cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n"))
-    } else {
-      cat("No corrected pedigree was saved.\n")
+    if (interactive()) {
+      cat(paste0("\nDo you want to save the corrected pedigree as dataframe `", corrected_name, "`? (y/n): "))
+      ans <- tolower(trimws(readline()))
+      if (ans == "y") {
+        assign(corrected_name, data, envir = .GlobalEnv)
+        assign("input_ped_report", input_ped_report, envir = .GlobalEnv)
+        cat(paste0("Saved corrected pedigree as `", corrected_name, "` and report as `input_ped_report`.\n"))
+      } else {
+        cat("No corrected pedigree was saved.\n")
+      }
+    } else {
+      # Non-interactive verbose mode: save automatically without prompting
+      assign(corrected_name, data, envir = .GlobalEnv)
+      assign(report_name, input_ped_report, envir = .GlobalEnv)
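The alternative mentioned in the comment, an explicit argument instead of a prompt, could look roughly like the sketch below. Everything here is hypothetical: `resolve_save_choice` and its `save_corrected` argument are illustration names, not part of `check_ped()`. The sketch only decides whether to save; it does not write into `.GlobalEnv`.

```r
# Hypothetical helper: decide whether the corrected pedigree should be kept.
#   save_corrected = TRUE/FALSE : caller decides, never prompts
#   save_corrected = NULL       : prompt only in an interactive session
resolve_save_choice <- function(save_corrected = NULL,
                                prompt = "Save corrected pedigree? (y/n): ") {
  if (!is.null(save_corrected)) {
    return(isTRUE(save_corrected))
  }
  if (!interactive()) {
    # Scripts and CI: never block on readline(); default to not saving.
    return(FALSE)
  }
  ans <- tolower(trimws(readline(prompt)))
  identical(ans, "y")
}

resolve_save_choice(save_corrected = TRUE)   # explicit opt-in
resolve_save_choice(save_corrected = FALSE)  # explicit opt-out
```

Unlike the suggested change above, this sketch defaults to not saving in non-interactive sessions. That is a design choice for the example, not something the review settled.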

Copilot · 2026-04-01T19:16:42Z

R/madc2vcf_targets.R

+
+      # REF/ALT must be extracted from probe sequences — botloci is required
+      if(is.null(botloci_file) || (!file.exists(botloci_file) && !url_exists(botloci_file)))
+        stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extrat REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.")


Minor typo in this error message: "extrat" -> "extract". Fixing it will also require updating the corresponding expect_error(..., regexp=...) tests that assert the full message.

Suggested change

-        stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extrat REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.")
+        stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extract REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.")
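One way to make tests less sensitive to wording fixes like this is to match a short, stable fragment of the message rather than the full text. The sketch below uses base R only. `fail_missing_ref_alt` is a hypothetical stand-in for the failing branch with a shortened message, not the actual `madc2vcf_targets()` code.

```r
# Hypothetical stand-in for the branch that raises the error.
fail_missing_ref_alt <- function() {
  stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided.")
}

# Capture the error message instead of letting the error propagate.
msg <- tryCatch(fail_missing_ref_alt(), error = function(e) conditionMessage(e))

# Match a stable fragment literally (fixed = TRUE disables regex parsing),
# so later typo fixes elsewhere in the message do not break the check.
matches_stable_fragment <- grepl("get_REF_ALT = TRUE but no markers_info", msg, fixed = TRUE)
print(matches_stable_fragment)
```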

Copilot · 2026-04-01T19:16:42Z

R/utils.R

+##' Verbose Message Utility
+##'
+##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE.
+##'
+##' @param text Character string, the message to print (supports sprintf formatting).
+##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output.
+##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail).
+##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0.
+##' @param ... Additional arguments passed to sprintf for formatting.
+##'
+##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg.
+##'
+##' @return No return value, called for side effects.
+##' @export


The roxygen block for vmsg() starts with ##' instead of #', which roxygen2 won't parse. This will cause man/vmsg.Rd/exports to get out of sync the next time docs are regenerated. Convert these lines to standard #' roxygen comments.

Suggested change

-##' Verbose Message Utility
-##'
-##' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE.
-##'
-##' @param text Character string, the message to print (supports sprintf formatting).
-##' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output.
-##' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail).
-##' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0.
-##' @param ... Additional arguments passed to sprintf for formatting.
-##'
-##' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg.
-##'
-##' @return No return value, called for side effects.
-##' @export
+#' Verbose Message Utility
+#'
+#' Prints a formatted verbose message with timestamp, indentation, and type label, if verbose is TRUE.
+#'
+#' @param text Character string, the message to print (supports sprintf formatting).
+#' @param verbose Logical. If TRUE, prints the message; if FALSE, suppresses output.
+#' @param level Integer, indentation level (0=header, 1=main step, 2=detail, 3=sub-detail).
+#' @param type Character string, message type (e.g., "INFO", "WARN", "ERROR"). Only shown for level 0.
+#' @param ... Additional arguments passed to sprintf for formatting.
+#'
+#' @details Use the verbose argument to control message output. Typically, pass the function's verbose parameter to vmsg.
+#'
+#' @return No return value, called for side effects.
+#' @export
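For readers unfamiliar with the helper, the documented behaviour (timestamp, indentation by `level`, a type label only at level 0, `sprintf`-style formatting) can be illustrated as follows. This is a sketch built from the roxygen text alone. `format_vmsg` and `vmsg_sketch` are hypothetical names, and the timestamp format, indent width, and use of `message()` are assumptions. The real `vmsg()` in `R/utils.R` may differ.

```r
# Hypothetical formatter: builds the line that would be printed.
format_vmsg <- function(text, level = 1, type = "INFO", ...) {
  # Apply sprintf only when extra arguments were supplied.
  body   <- if (...length() > 0) sprintf(text, ...) else text
  indent <- strrep("  ", level)                       # two spaces per level (assumed)
  label  <- if (level == 0) paste0("[", type, "] ") else ""
  paste0(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " ", indent, label, body)
}

# Hypothetical wrapper: prints only when verbose is TRUE.
vmsg_sketch <- function(text, verbose = TRUE, level = 1, type = "INFO", ...) {
  if (isTRUE(verbose)) {
    message(format_vmsg(text, level = level, type = type, ...))
  }
  invisible(NULL)
}

# Name level and type so that the trailing value is routed to `...`.
vmsg_sketch("Read %d markers", verbose = TRUE, level = 2, type = "INFO", 10L)
vmsg_sketch("This is never shown", verbose = FALSE)
```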

Copilot · 2026-04-01T19:16:43Z

R/madc2vcf_all.R

+    other_alts_info <- cloneID_unit[idx_other,]
+    OtherTag_list <- list()
+    total <- rep(0, length(RefTag))
+    ads <- vector()


ads <- vector() initializes a zero-length vector; paste0(ads, ",", ads_temp) will remain zero-length, which can make info_mk become character(0) and break VCF row construction when "Other" alleles are processed. Initialize ads as a length-1 character (e.g., "") or build ADS via a character accumulator + paste(..., collapse=",").

Suggested change

-    ads <- vector()
+    ads <- ""
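The accumulator alternative mentioned in the comment can be sketched as below: collect the per-allele depths in a character vector and join them once. The variable names other than `ads` and `ads_temp` are hypothetical and the depth values are made up. Note that in base R `paste0()` treats a zero-length argument as `""` when the other arguments are non-empty, so the exact failure mode is worth confirming on real data. The last line shows that behaviour.

```r
# Accumulate allele depths, then join once: no leading or trailing comma.
ads_parts <- character(0)
for (ads_temp in c(12, 0, 7)) {          # made-up read counts for three alleles
  ads_parts <- c(ads_parts, as.character(ads_temp))
}
ads <- paste(ads_parts, collapse = ",")
print(ads)                               # "12,0,7"

# Incremental pasting onto a zero-length vector still returns one string,
# but with a leading comma.
incremental <- paste0(vector(), ",", "12")
print(incremental)                       # ",12"
```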

Copilot · 2026-04-01T19:16:43Z

tests/testthat/test-madc2vcf_targets.R

+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")


These tests depend on remote files from a moving GitHub branch (.../refs/heads/long_seq/...), making the test suite non-reproducible and potentially flaky if upstream content changes. Prefer vendoring fixtures in this repo or pinning to an immutable commit SHA.
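Pinning to a commit can be as simple as building the raw URL from a full commit SHA instead of a branch ref. The sketch below is illustrative only: `pinned_raw_url` is a hypothetical helper, and the SHA is an all-zero placeholder that must be replaced with a real commit of the fixture repository.

```r
# Hypothetical helper: build a raw.githubusercontent.com URL pinned to a commit.
pinned_raw_url <- function(repo, commit_sha, path) {
  # Reject branch names and short SHAs so the URL cannot silently move.
  if (!grepl("^[0-9a-f]{40}$", commit_sha)) {
    stop("commit_sha must be a full 40-character hexadecimal commit SHA")
  }
  paste("https://raw.githubusercontent.com", repo, commit_sha, path, sep = "/")
}

placeholder_sha <- strrep("0", 40)  # placeholder, NOT a real commit
alfalfa_madc <- pinned_raw_url(
  "Breeding-Insight/BIGapp-PanelHub",
  placeholder_sha,
  "test_madcs/alfalfa_madc.csv"
)
print(alfalfa_madc)
```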

Copilot · 2026-04-01T19:16:43Z

tests/testthat/test-madc2vcf_all.R

+test_that("simu alfalfa",{
+
+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files


This outer test_that() wraps many additional test_that() calls below (nested tests). testthat does not support nested test_that() blocks reliably. Split these into separate top-level tests (or consolidate into a single test_that() using local helpers).

Copilot · 2026-04-01T19:16:44Z

tests/testthat/test-madc2vcf_all.R

+  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")


This test suite pulls fixtures from a moving GitHub branch (.../refs/heads/long_seq/...). Even with skip_if_offline(), results can change when upstream files change, making tests flaky. Prefer vendoring the test files in this repo or pinning URLs to a commit SHA.

Copilot · 2026-04-01T19:16:44Z

tests/testthat/test-madc2vcf_multi.R

+ github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
+
+  # External alfalfa test files
+  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
+  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
+  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)


This test depends on remote fixtures from raw.githubusercontent.com/.../refs/heads/long_seq/..., which can change over time and make tests non-reproducible. Prefer vendoring fixtures locally or pinning to an immutable commit SHA.

Copilot · 2026-04-01T19:16:44Z

R/madc2vcf_multi.R

+#' rows/columns, and sets `n.header.rows` automatically based on whether the
+#' MADC file follows the raw DArT format (6 header rows) or the fixed allele ID
+#' format (no header rows).


The documentation claims n.header.rows is set automatically based on raw vs fixed MADC format, but the implementation always calls polyRAD::readDArTag(..., n.header.rows = 0L) and also rejects raw MADCs earlier. Update the docstring to match the actual behavior (expects fixed-AlleleID format).

Suggested change

-#' rows/columns, and sets `n.header.rows` automatically based on whether the
-#' MADC file follows the raw DArT format (6 header rows) or the fixed allele ID
-#' format (no header rows).
+#' rows/columns, and passes the data to \code{polyRAD::readDArTag} assuming a
+#' fixed AlleleID MADC format with no header rows (i.e., \code{n.header.rows = 0}).
+#' Raw DArT-format MADCs with six header rows are not supported by this function.

codecov · 2026-04-01T20:54:49Z

Codecov Report

❌ Patch coverage is 68.15166% with 336 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.71%. Comparing base (4f30e52) to head (b01c12b).

Files with missing lines	Patch %	Lines
R/madc2vcf_all.R	65.65%	113 Missing ⚠️
R/madc2vcf_targets.R	74.49%	63 Missing ⚠️
R/filterVCF.R	21.66%	47 Missing ⚠️
R/check_ped.R	56.09%	36 Missing ⚠️
R/check_madc_sanity.R	81.15%	26 Missing ⚠️
R/madc2vcf_multi.R	72.22%	25 Missing ⚠️
R/imputation_concordance.R	50.00%	18 Missing ⚠️
R/get_countsMADC.R	84.61%	8 Missing ⚠️

Additional details and impacted files

@@               Coverage Diff               @@
##           development      #54      +/-   ##
===============================================
- Coverage        83.05%   77.71%   -5.35%     
===============================================
  Files               19       21       +2     
  Lines             1369     2176     +807     
===============================================
+ Hits              1137     1691     +554     
- Misses             232      485     +253


Cristianetaniguti and others added 22 commits October 3, 2025 15:11

indels support for madc2vcf_targets

778aefa

updated check_ped to save corrected dataframe and report

1b761b9

reorganized report and fixed language

743043a

bugfix - if hapDB padding is not matching report

0b97b46

added option to print plot or list to imputation_concordance

82279af

ignore DS_STore

6b81982

added option to print pre-filtering depth and genotyping rate

8205e4e

added calculation for Ho

31248e3

up version

757b01c

Merge branch 'check_ped_update' of https://github.com/Breeding-Insight/BIGr into ped_indels_update

0934210

merge dev branches

e18b2c6

Merge branch 'development' into ped_indels_update

768ab93

opt messages

5c0b590

messages ok

9afb265

targets okay

c31118d

targets ok

5d54f0d

Potential fix for pull request finding

ee50981

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

d3a4061

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

f765c7c

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Potential fix for pull request finding

87bb1fc

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Merge branch 'ped_indels_update' into madc2vcf_all_updates

7c12d49

Merge pull request #53 from Breeding-Insight/madc2vcf_all_updates

df6fe92

Madc2vcf updates

Cristianetaniguti requested a review from Copilot March 27, 2026 13:47

Copilot started reviewing on behalf of Cristianetaniguti March 27, 2026 13:47

Copilot AI reviewed Mar 27, 2026

Cristianetaniguti and others added 5 commits March 27, 2026 10:33

Update R/madc2vcf_targets.R

6059c10

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/check_madc_sanity.R

b09b0c1

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/get_countsMADC.R

409dbd3

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/get_countsMADC.R

e6fce19

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update R/check_madc_sanity.R

669ac4e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Cristianetaniguti added 7 commits March 27, 2026 10:50

fix tests

bbfbee2

Merge branch 'ped_indels_update' of https://github.com/Breeding-Insight/BIGr into ped_indels_update

38c3564

madc2vcf_all indels support okay

55ee61a

madc2vcf_all support indel

bf5ff4c

add support for Others

291ae8e

up version

84852da

add madc2vcf_multi

96a4ed1

Cristianetaniguti requested a review from Copilot April 1, 2026 18:39

Copilot started reviewing on behalf of Cristianetaniguti April 1, 2026 18:40

Copilot AI reviewed Apr 1, 2026

fix checks

cec168d

Cristianetaniguti requested a review from Copilot April 1, 2026 19:09

Copilot started reviewing on behalf of Cristianetaniguti April 1, 2026 19:10

Copilot AI reviewed Apr 1, 2026

Cristianetaniguti added 3 commits April 1, 2026 15:30

fix checks 2

0be2e0f

add VariantAnnotation to test env

33fc87c

ignore madc2vcf_multi tests in actions

77107ba

Cristianetaniguti added 3 commits April 2, 2026 09:00

more messages and tests

ccf9e77

bugfix

8a00c9e

update man

f2013e3

Cristianetaniguti requested a review from alex-sandercock April 2, 2026 17:19

minor version up

b01c12b

-result <- imputation_concordance(
-  reference_genos = ref,
-  imputed_genos = test,
-  snps_2_exclude = snps,
-  missing_code = 5,
-  verbose = TRUE,
-  plot = TRUE
-)
+# Minimal example data
+ref <- data.frame(
+  ID   = paste0("sample", 1:3),
+  snp1 = c(0, 1, 2),
+  snp2 = c(2, 2, 0)
+)
+test <- data.frame(
+  ID   = paste0("sample", 1:3),
+  snp1 = c(0, 1, 2),
+  snp2 = c(2, 1, 0)
+)
+# Exclude a SNP from the concordance calculation
+snps <- "snp2"
+result <- imputation_concordance(
+  reference_genos  = ref,
+  imputed_genos    = test,
+  snps_2_exclude   = snps,
+  missing_code     = NULL,
+  verbose          = TRUE,
+  plot             = FALSE
+)


-      first <- all(grepl("^[A-Za-z]", sapply(pos, "[", 1)))
-      second <- suppressWarnings(all(sapply(pos, function(x) as.numeric(x[2])) > 0))
+      first_tokens <- sapply(pos, "[", 1)
+      second_tokens <- sapply(pos, "[", 2)
+      first <- all(grepl("^chr", first_tokens, ignore.case = TRUE))
+      suppressWarnings(num_pos <- as.numeric(second_tokens))
+      second <- all(!is.na(num_pos) & num_pos > 0)
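The revised check in the hunk above can be exercised in isolation. The sketch below wraps the same logic in a hypothetical function `follows_chrom_pos`. How `pos` is built is not shown in the diff, so splitting on `_` is an assumption made for this example, and the IDs are invented.

```r
# Hypothetical wrapper around the revised Chr_Pos check.
follows_chrom_pos <- function(clone_ids) {
  # Assumption: IDs are split on "_" into chromosome and position tokens.
  pos <- strsplit(clone_ids, "_", fixed = TRUE)
  first_tokens  <- sapply(pos, "[", 1)
  second_tokens <- sapply(pos, "[", 2)   # NA when there is no second token

  # Chromosome token must start with "chr" (case-insensitive).
  first <- all(grepl("^chr", first_tokens, ignore.case = TRUE))

  # Position token must parse to a positive number; non-numeric text becomes NA.
  num_pos <- suppressWarnings(as.numeric(second_tokens))
  second  <- all(!is.na(num_pos) & num_pos > 0)

  first && second
}

follows_chrom_pos(c("chr1_1050", "Chr2_300"))   # TRUE
follows_chrom_pos(c("chr1_abc"))                # FALSE: position is not numeric
follows_chrom_pos(c("marker7_100"))             # FALSE: no "chr" prefix
```

Compared with the earlier version, a non-numeric position now yields `FALSE` instead of an `NA` that could propagate through `all()`.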

-    if (!class(vcf.file) == "vcfR"){
+    if (!inherits(vcf.file, "vcfR")){

-    tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, AltTag)
+    tab_counts <- paste0(RefTag + AltTag, ":", RefTag, ":", RefTag, ",", AltTag)

-    report <- read.csv(madc_file)
+    report <- read.csv(madc_file, check.names = FALSE)

-  github_path <- "https://raw.githubusercontent.com/Breeding-Insight/BIGapp-PanelHub/refs/heads/long_seq/"
-  # External alfalfa test files
-  alfalfa_madc           <- paste0(github_path, "test_madcs/alfalfa_madc.csv")
-  alfalfa_madc_wrongID   <- paste0(github_path, "test_madcs/alfalfa_madc_wrongID.csv")
-  alfalfa_madc_raw       <- paste0(github_path, "test_madcs/alfalfa_madc_raw.csv")       # raw DArT format (7-row header)
-  alfalfa_iupac          <- paste0(github_path, "test_madcs/alfalfa_IUPAC.csv")
-  alfalfa_lowercase      <- paste0(github_path, "test_madcs/alfalfa_lowercase.csv")
-  alfalfa_botloci        <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci")          # botloci for alfalfa
-  alfalfa_markers_info   <- paste0(github_path, "alfalfa/20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv") # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
-  alfalfa_markers_info_ChromPos   <- paste0(github_path, "test_madcs/alfalfa_marker_info_ChromPos.csv") # markers_info: CloneID/BI_markerID, Chr, Pos
-  # External potato test files
-  potato_indel_madc                 <- paste0(github_path, "test_madcs/potato_indel_madc.csv")
-  potato_indel_iupac                <- paste0(github_path, "test_madcs/potato_indel_IUPAC.csv")
-  potato_indel_lowercase            <- paste0(github_path, "test_madcs/potato_indel_lowercase.csv")
-  potato_more_indels_chrompos_false <- paste0(github_path, "test_madcs/potato_more_indels_madc_ChromPosFALSE.csv")
-  potato_botloci                    <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci")
-  potato_markers_info               <- paste0(github_path, "potato/potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv") # CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  get_panelhub_test_file <- function(rel_path) {
+    system.file(file.path("extdata", "BIGapp-PanelHub", rel_path), package = "BIGr")
+  }
+  # Alfalfa test files (vendored under inst/extdata/BIGapp-PanelHub/)
+  alfalfa_madc                 <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc.csv"))
+  alfalfa_madc_wrongID         <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc_wrongID.csv"))
+  alfalfa_madc_raw             <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_madc_raw.csv"))       # raw DArT format (7-row header)
+  alfalfa_iupac                <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_IUPAC.csv"))
+  alfalfa_lowercase            <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_lowercase.csv"))
+  alfalfa_botloci              <- get_panelhub_test_file(file.path("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_f180bp.botloci"))          # botloci for alfalfa
+  alfalfa_markers_info         <- get_panelhub_test_file(file.path("alfalfa", "20201030-BI-Alfalfa_SNPs_DArTag-probe-design_snpID_lut.csv")) # markers_info: CloneID/BI_markerID, Chr, Pos, Ref, Alt
+  alfalfa_markers_info_ChromPos <- get_panelhub_test_file(file.path("test_madcs", "alfalfa_marker_info_ChromPos.csv")) # markers_info: CloneID/BI_markerID, Chr, Pos
+  # Potato test files (vendored under inst/extdata/BIGapp-PanelHub/)
+  potato_indel_madc                 <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_madc.csv"))
+  potato_indel_iupac                <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_IUPAC.csv"))
+  potato_indel_lowercase            <- get_panelhub_test_file(file.path("test_madcs", "potato_indel_lowercase.csv"))
+  potato_more_indels_chrompos_false <- get_panelhub_test_file(file.path("test_madcs", "potato_more_indels_madc_ChromPosFALSE.csv"))
+  potato_botloci                    <- get_panelhub_test_file(file.path("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_f150bp_ref_alt.botloci"))
+  potato_markers_info               <- get_panelhub_test_file(file.path("potato", "potato_dartag_v2_3915markers_rm7dupTags_6traitMarkers_rm1dup_snpID_lut.csv")) # CloneID/BI_markerID, Chr, Pos, Ref, Alt


-    genotype_present <- !is.na(gt_orig)
+    genotype_present <- !is.na(gt_orig) & gt_orig != "./." & gt_orig != ".|."

-    stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extrat REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.")
+    stop("get_REF_ALT = TRUE but no markers_info file with Ref and Alt columns was provided neither a botloci_file to extract REF/ALT from probe sequences. Please provide one of the these files or set get_REF_ALT to FALSE.")
