Add --confident-only flag to exclude partial extractions from output #3

@ayobi

Problem

When using --region full, ITSxRust outputs both full-chain and partial extractions together. Users targeting only high-confidence full-length ITS sequences have no way to exclude partials at extraction time.

Currently, the only way to identify partial reads is to parse the QC JSON or the FASTA header coordinates and post-filter, which is cumbersome for large datasets.

Requested in #2

Proposed solution

Add a --confident-only flag that excludes reads classified as partial from the output. Only reads with confident classification (full-chain with all four ribosomal anchors detected) would be written.

This should be a lightweight change, since the classification already happens internally: the writer only needs to skip partial reads when the flag is set.

Example usage

itsxrust extract \
    --input reads.fastq \
    --hmm F.hmm \
    --region full \
    --confident-only \
    --output extracted

Expected behavior

  • Without flag: current behavior (all reads with any detected region, including partials)
  • With flag: only confident full-chain reads written to output
  • QC JSON still reports all classifications for transparency
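
To make the three behaviors above concrete, here is a rough sketch of the filtering logic. It is not taken from the ITSxRust source: the type and function names (Classification, Extraction, QcCounts, select_for_output) are hypothetical, and it only models reads that already have a classification, ignoring reads with no detected region. The point is that QC is tallied before the filter is applied, so the report stays the same with or without the flag.

```rust
// Hypothetical sketch: these types and names are illustrative only and are
// not taken from the ITSxRust source.

/// Outcome of anchor detection for one read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Classification {
    /// Full chain: all four ribosomal anchors detected.
    Confident,
    /// Some region detected, but the anchor chain is incomplete.
    Partial,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Extraction {
    read_id: String,
    classification: Classification,
}

/// Per-run counts, standing in for the QC JSON report.
#[derive(Debug, Default, PartialEq, Eq)]
struct QcCounts {
    confident: usize,
    partial: usize,
}

/// Returns the extractions to write, plus QC counts over all extractions.
fn select_for_output(
    extractions: &[Extraction],
    confident_only: bool,
) -> (Vec<&Extraction>, QcCounts) {
    let mut qc = QcCounts::default();
    let mut to_write = Vec::new();
    for e in extractions {
        // Tally QC before filtering so the report covers every read.
        match e.classification {
            Classification::Confident => qc.confident += 1,
            Classification::Partial => qc.partial += 1,
        }
        // Without the flag everything is written; with it, only confident reads.
        if !confident_only || e.classification == Classification::Confident {
            to_write.push(e);
        }
    }
    (to_write, qc)
}
```

Calling select_for_output(&reads, false) keeps the current behavior, while passing true drops partial reads from the written set only; the returned QcCounts are identical in both cases.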

Labels

enhancement (New feature or request)
