Problem
When using --region full, ITSxRust outputs both full-chain and partial extractions together. Users targeting only high-confidence full-length ITS sequences have no way to exclude partials at extraction time.
Currently, the only way to identify partial reads is to parse the QC JSON or the FASTA header coordinates and post-filter, which is cumbersome for large datasets.
Requested in #2
Proposed solution
Add a --confident-only flag that excludes reads classified as partial from the output. Only reads with a confident classification (full-chain, with all four ribosomal anchors detected) would be written.
This should be a lightweight change, since the classification already happens internally: when the flag is set, the writer simply skips partial reads.
Example usage
itsxrust extract \
--input reads.fastq \
--hmm F.hmm \
--region full \
--confident-only \
--output extracted
Expected behavior
- Without flag: current behavior (all reads with any detected region, including partials)
- With flag: only confident full-chain reads written to output
- QC JSON still reports all classifications for transparency
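The behavior above could be sketched roughly as follows. This is only an illustration of the intended semantics, not ITSxRust's actual code: the names `Classification`, `Extraction`, `QcCounts`, `should_write` and `select_for_output` are hypothetical, and the real internal types and writer loop may look quite different. The point is that every read is tallied for QC before the flag decides whether it gets written.

```rust
/// Hypothetical classification labels; the real ITSxRust names may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Classification {
    /// Full chain, all four ribosomal anchors detected.
    Confident,
    /// Some region detected, but the chain is incomplete.
    Partial,
}

/// Hypothetical per-read extraction result.
struct Extraction {
    read_id: String,
    classification: Classification,
}

/// Hypothetical QC tallies; counts cover all reads, not only written ones.
#[derive(Debug, Default, PartialEq, Eq)]
struct QcCounts {
    confident: usize,
    partial: usize,
    written: usize,
}

/// Decide whether a read goes to the output.
/// Without the flag, everything is written (current behavior).
fn should_write(class: Classification, confident_only: bool) -> bool {
    !confident_only || class == Classification::Confident
}

/// Tally every classification for QC, then keep only the reads
/// that pass the `confident_only` filter.
fn select_for_output<'a>(
    reads: &'a [Extraction],
    confident_only: bool,
) -> (Vec<&'a Extraction>, QcCounts) {
    let mut qc = QcCounts::default();
    let mut out = Vec::new();
    for read in reads {
        // QC is updated regardless of the flag, for transparency.
        match read.classification {
            Classification::Confident => qc.confident += 1,
            Classification::Partial => qc.partial += 1,
        }
        if should_write(read.classification, confident_only) {
            qc.written += 1;
            out.push(read);
        }
    }
    (out, qc)
}
```

Keeping the decision in one small predicate would mean the existing write path only needs a single extra check, while the QC report stays unchanged apart from an optional count of written reads.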