Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -43,7 +43,7 @@ jobs:
conda run -n bit pip install -e . # the editable pip install is so test coverages make sense

- name: Check installed version
run: conda run -n bit bit-cov-analyzer -v
run: conda run -n bit bit -v

- name: Set env var so subprocesses get coverage during pytest
run: echo "COVERAGE_PROCESS_START=.coveragerc" >> $GITHUB_ENV
93 changes: 47 additions & 46 deletions CHANGELOG.md
@@ -15,13 +15,13 @@
-->


## v2.0.0 (NOT YET RELEASED)
## v2.0.0 (27-May-2026)

A lot of changes have been made recently to group and reorganize bit commands (alongside a hefty python revamp). The changelog over several past versions should be able to help you find anything you might be looking for that's been moved. But if you're having trouble finding something you used to use, please reach out and let me know! You can post an issue on this repo or reach out to me however :)
A lot of changes have been made recently to group and reorganize bit commands (alongside a hefty python revamp). I've followed suit with the rest of the world and everything is a subcommand available under `bit`. Running `bit` by itself will print out an overview of all programs/subcommands grouped by general utility. If you're having trouble finding something you used to use, please reach out and let me know! You can post an issue on this repo or reach out to me however :)

### Added
- `bit` by itself will print out an overview of available programs
- `bit-data`
- `bit` by itself will print out an overview of available programs, and it is the sole entry point into everything now
- `bit data`
- this replaces `bit-data-locations` and all the database download programs with the following subcommands:
- `locations`
- `check`
@@ -32,60 +32,61 @@ A lot of changes have been made recently to group and reorganize bit commands (a
- `go-dbs`
- `gtdb-data`
- `test-data`
- `bit-fasta` which holds subcommands as listed below
- `bit-lineage` which holds subcommands as listed below
- `bit-kraken2` which holds subcommands as listed below
- `bit-table` which holds subcommands as listed below
- `bit-go` which holds subcommands as listed below
- `bit fasta` which holds subcommands as listed below
- `bit lineage` which holds subcommands as listed below
- `bit kraken2` which holds subcommands as listed below
- `bit table` which holds subcommands as listed below
- `bit go` which holds subcommands as listed below


### Changed
- several fasta-related programs have been placed as subcommands under `bit-fasta`
- `bit-calc-gc-per-seq` and `bit-calc-gc-sliding-window` -> `bit-fasta calc-gc`
- `bit-calc-variation-in-msa` -> `bit-fasta calc-var-in-msa`
- `bit-count-bases` -> `bit-fasta count`
- `bit-extract-seqs by-coords` -> `bit-fasta extract-by-coords`
- `bit-extract-seqs by-headers` -> `bit-fasta extract-by-headers`
- `bit-extract-seqs by-primers` -> `bit-fasta extract-by-primers`
- `bit-fasta-to-bed` -> `bit-fasta to-bed`
- `bit-fasta-to-genbank` -> `bit-fasta to-genbank`
- `bit-filter-fasta-by-length` -> `bit-fasta filter-by-length`
- `bit-rename-fasta-headers` -> `bit-fasta modify-headers`
- `bit-remove-wraps` -> `bit-fasta remove-wraps`
- several fasta-related programs have been placed as subcommands under `bit fasta`
- `bit-calc-gc-per-seq` and `bit-calc-gc-sliding-window` -> `bit fasta calc-gc`
- `bit-calc-variation-in-msa` -> `bit fasta calc-var-in-msa`
- `bit-count-bases` -> `bit fasta count`
- `bit-extract-seqs by-coords` -> `bit fasta extract-by-coords`
- `bit-extract-seqs by-headers` -> `bit fasta extract-by-headers`
- `bit-extract-seqs by-primers` -> `bit fasta extract-by-primers`
- `bit-fasta-to-bed` -> `bit fasta to-bed`
- `bit-fasta-to-genbank` -> `bit fasta to-genbank`
- `bit-filter-fasta-by-length` -> `bit fasta filter-by-length`
- `bit-rename-fasta-headers` -> `bit fasta modify-headers`
- `bit-remove-wraps` -> `bit fasta remove-wraps`
- this is moderately slower now since I took it out of shell and put it into python
- if wanted, you can add the shell way as a function as found in this gist: https://gist.github.com/AstrobioMike/4054ce9ed84162f31c830bac03beda68
- kraken2/bracken-related programs have been placed as subcommands under `bit-kraken2`
- `bit-kraken2-tax-summary` -> `bit-kraken2 tax-summary`
- `bit-kraken2-tax-plots` -> `bit-kraken2 tax-plots`
- several table-related commands have been combined as subcommands under `bit-table`
- `bit-colnames` -> `bit-table colnames`
- `bit-filter-table` -> `bit-table filter`
- `bit-normalize-table` -> `bit-table normalize`
- `bit-summarize-column` -> `bit-table summarize-column`
- GO-related commands have been placed as subcommands under `bit-go`
- `bit-get-go-term-info` -> `bit-go get-term-info`
- `bit-go-summarize-annotations` -> `bit-go summarize-annotations`
- `bit-combine-go-summaries` -> `bit-go combine-summaries`
- `bit-slim-down-go-terms` -> `bit-go slim-terms`
- `bit-update-GO-dbs` -> `get-go-dbs`
- kraken2/bracken-related programs have been placed as subcommands under `bit kraken2`
- `bit-kraken2-tax-summary` -> `bit kraken2 tax-summary`
- `bit-kraken2-tax-plots` -> `bit kraken2 tax-plots`
- several table-related commands have been combined as subcommands under `bit table`
- `bit-colnames` -> `bit table colnames`
- `bit-filter-table` -> `bit table filter`
- `bit-normalize-table` -> `bit table normalize`
- `bit-summarize-column` -> `bit table summarize-column`
- GO-related commands have been placed as subcommands under `bit go`
- `bit-get-go-term-info` -> `bit go get-term-info`
- `bit-go-summarize-annotations` -> `bit go summarize-annotations`
- `bit-combine-go-summaries` -> `bit go combine-summaries`
- `bit-slim-down-go-terms` -> `bit go slim-terms`
- database helpers for setting/checking locations have been reorganized
- `bit-data-locations` -> `bit-data locations`
- `bit-data-locations` -> `bit data locations`
- database helpers for downloading/updating them have been reorganized
- `get-ncbi-assembly-data` -> `bit-data get ncbi-assembly-data`
- `get-ncbi-tax-data` -> `bit-data get ncbi-tax-data`
- `get-go-dbs` -> `bit-data get go-dbs`
- `get-gtdb-data` -> `bit-data get gtdb-data`
- `bit-get-test-data` -> `bit-data get test-data`
- `get-ncbi-assembly-data` -> `bit data get ncbi-assembly-data`
- `get-ncbi-tax-data` -> `bit data get ncbi-tax-data`
- `get-go-dbs` -> `bit data get go-dbs`
- `get-gtdb-data` -> `bit data get gtdb-data`
- `bit-update-GO-dbs` -> `bit data get-go-dbs`
- `bit-get-test-data` -> `bit data get test-data`
- lineage-related helpers have been reorganized
- `bit-get-lineage-from-taxids` -> `bit-lineage from-taxids`
- `bit-lineage-to-tsv` -> `bit-lineage to-tsv`
- `bit-filter-kofamscan-results` -> `bit-filter-ko-results`
- `bit-get-accessions-from-gtdb` -> `bit-get-accs-from-gtdb`
- `bit-get-lineage-from-taxids` -> `bit lineage from-taxids`
- `bit-lineage-to-tsv` -> `bit lineage to-tsv`
- `bit-filter-kofamscan-results` -> `bit filter-ko-results`
- `bit-get-accessions-from-gtdb` -> `bit get-accs-from-gtdb`
- there are more programs that used to be `bit-` something, but now are in subcommands under `bit`. Run `bit` by itself to find them


### Removed
- `bit-version` has been removed, each program has its own `-v|--version` flag now
- `bit-dedupe-fasta-headers` has been removed entirely, as its purpose can be achieved with `bit-fasta modify-headers`
- `bit-dedupe-fasta-headers` has been removed entirely, as its purpose can be achieved with `bit fasta modify-headers`
- `bit-check-fastq-for-dup-headers` removed due to only super-niche utility
- `bit-parse-fastq-by-headers` removed due to only niche utility, gist is here: https://gist.github.com/AstrobioMike/785265b43847e7cb10089d102573b575
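
For anyone updating old scripts, the renames listed above can be applied mechanically. Below is a minimal sketch, not part of this PR: the `RENAMED` table holds a handful of entries copied from the changelog, while the `translate` helper and the example arguments (`table.tsv`) are hypothetical and only for illustration.

```python
# A few old-program -> new-subcommand pairs copied from the changelog entries.
# This table is illustrative and deliberately incomplete.
RENAMED = {
    "bit-count-bases": "bit fasta count",
    "bit-fasta-to-bed": "bit fasta to-bed",
    "bit-remove-wraps": "bit fasta remove-wraps",
    "bit-colnames": "bit table colnames",
    "bit-kraken2-tax-summary": "bit kraken2 tax-summary",
    "bit-lineage-to-tsv": "bit lineage to-tsv",
    "bit-data-locations": "bit data locations",
}


def translate(command_line: str) -> str:
    """Swap a legacy program name at the start of a command line for its new form.

    Hypothetical helper: programs that are not in RENAMED pass through unchanged.
    """
    program, _, rest = command_line.partition(" ")
    new_program = RENAMED.get(program, program)
    return f"{new_program} {rest}".strip()


# The argument here is made up; only the program name is rewritten.
print(translate("bit-colnames table.tsv"))
```

Only the leading program name is touched, so any flags that changed between versions would still need to be checked by hand.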

44 changes: 22 additions & 22 deletions README.md
@@ -10,7 +10,7 @@

* [**Overview**](#overview)
* [**Programs**](#programs)
* [GTDB/NCBI-related](#gtdbncbi-related)
* [NCBI/GTDB-related](#ncbigtdb-related)
* [Coverage/mapping-related](#coveragemapping-related)
* [Sequence manipulation / read generation](#sequence-manipulation--read-generation)
* [Sequence searching](#sequence-searching)
@@ -53,38 +53,38 @@ conda activate bit

You can see an overview of available programs like below by running `bit` by itself at the command line once it's installed.

#### GTDB/NCBI-related
#### NCBI/GTDB-related

| Program | Purpose |
| ------- | ------- |
| `bit-dl-ncbi-assemblies` | download NCBI assemblies in different formats given input accessions |
| `bit-get-accs-from-gtdb` | search the [GTDB](https://gtdb.ecogenomic.org/) by taxonomy and retrieve NCBI accessions |
| `bit dl-ncbi-assemblies` | download NCBI assemblies in different formats given input accessions |
| `bit get-accs-from-gtdb` | search the [GTDB](https://gtdb.ecogenomic.org/) by taxonomy and retrieve NCBI accessions |

---

#### Coverage/mapping-related

| Program | Purpose |
| ------- | ------- |
| `bit-cov-analyzer` | analyze coverage patterns from a bam + reference fasta to identify regions of relatively higher or lower coverage |
| `bit-cov-stats` | get detection, coverage, and mean percent ID for single or multiple references given fasta(s) and a bam file |
| `bit-mapped-reads-pid` | get percent ID information for mapped reads in a bam file |
| `bit cov-analyzer` | analyze coverage patterns from a bam + reference fasta to identify regions of relatively higher or lower coverage |
| `bit cov-stats` | get detection, coverage, and mean percent ID for single or multiple references given fasta(s) and a bam file |
| `bit mapped-reads-pid` | get percent ID information for mapped reads in a bam file |

---

#### Sequence manipulation / read generation

| Program | Purpose |
| ------- | ------- |
| `bit-gen-reads` | generate reads from fasta files |
| `bit-mutate-seqs` | introduce point mutations (substitutions/indels) into nucleotide or amino-acid fasta files |
| `bit-add-insertion` | add insertions into nucleotide or amino-acid fasta sequences |
| `bit gen-reads` | generate reads from fasta files |
| `bit mutate-seqs` | introduce point mutations (substitutions/indels) into nucleotide or amino-acid fasta files |
| `bit add-insertion` | add insertions into nucleotide or amino-acid fasta sequences |

---

#### Sequence searching

##### Program: `bit-ez-screen`
##### Program: `bit ez-screen`
| Subcommand | Purpose |
| ------- | ------- |
| `assembly` | runs blast-based screening of targets in assemblies |
@@ -94,7 +94,7 @@ You can see an overview of available programs like below by running `bit` by its

#### Fasta utilities

##### Program: `bit-fasta`
##### Program: `bit fasta`
| Subcommand | Purpose |
| ---------- | ------- |
| `calc-gc` | calculate GC content per sequence or for the full file |
@@ -115,15 +115,15 @@ You can see an overview of available programs like below by running `bit` by its

| Program | Purpose |
| ------- | ------- |
| `bit-assemble` | simple wrapper for assembly with optional quality trimming and normalization |
| `bit-summarize-assembly` | quickly summarize nucleotide assemblies |
| `bit assemble` | simple wrapper for assembly with optional quality trimming and normalization |
| `bit summarize-assembly` | quickly summarize nucleotide assemblies |

---


#### GenBank-format utilities

##### Program: `bit-genbank`
##### Program: `bit genbank`
| Subcommand | Purpose |
| ---------- | ------- |
| `to-AA-seqs` | extract amino acid sequences |
@@ -135,13 +135,13 @@ You can see an overview of available programs like below by running `bit` by its

#### Taxonomy and lineage helpers

##### Program: `bit-kraken2`
##### Program: `bit kraken2`
| Subcommand | Purpose |
| ---------- | ------- |
| `tax-plots` | generate standard taxonomy barplots from kraken2/bracken outputs |
| `tax-summary` | generate summary tables from kraken2/bracken outputs |

##### Program: `bit-lineage`
##### Program: `bit lineage`
| Subcommand | Purpose |
| ---------- | ------- |
| `from-taxids` | get full lineage info from a list of NCBI taxon IDs |
@@ -152,7 +152,7 @@ You can see an overview of available programs like below by running `bit` by its

#### Table utilities

##### Program: `bit-table`
##### Program: `bit table`
| Subcommand | Purpose |
| ---------- | ------- |
| `colnames` | print column names with numbers (handy for `cut`/`awk`) |
@@ -166,10 +166,10 @@ You can see an overview of available programs like below by running `bit` by its

| Program | Purpose |
| ------- | ------- |
| `bit-filter-ko-results` | filter [KOFamScan](https://github.com/takaram/kofam_scan) results |
| `bit filter-ko-results` | filter [KOFamScan](https://github.com/takaram/kofam_scan) results |


##### Program: `bit-go`
##### Program: `bit go`
Subcommand | Purpose |
| ---------- | ------- |
| `get-term-info` | look up GO term info |
@@ -181,7 +181,7 @@ Subcommand | Purpose |

#### iTOL-helpers

##### Program: `bit-itol`
##### Program: `bit itol`
| Subcommand | Purpose |
| ---------- | ------- |
| `binary-dataset` | generate a binary dataset annotation file |
@@ -193,7 +193,7 @@ Subcommand | Purpose |

#### bit-data management

##### Program: `bit-data`
##### Program: `bit data`
| Subcommand | Purpose |
| ---------- | ------- |
| `get` | download or update bit-utilized reference databases, or grab test data |
22 changes: 15 additions & 7 deletions bit/cli/add_insertion.py
@@ -4,18 +4,26 @@
from bit.modules.general import check_files_are_found, report_message, notify_premature_exit


def build_parser():
def build_parser(parent_subparsers=None):

desc = """
This script is for adding an insertion sequence to an input fasta.
"""

parser = argparse.ArgumentParser(
description=desc,
epilog="Ex. usage: `bit-add-insertion -i input.fasta -I insertion-sequence.fasta -o output.fasta`",
formatter_class=CustomRichHelpFormatter,
add_help=False
)
if parent_subparsers is not None:
parser = parent_subparsers.add_parser(
"add-insertion",
description=desc,
formatter_class=CustomRichHelpFormatter,
add_help=False,
)
else:
parser = argparse.ArgumentParser(
description=desc,
epilog="Ex. usage: `bit add-insertion -i input.fasta -I insertion-sequence.fasta -o output.fasta`",
formatter_class=CustomRichHelpFormatter,
add_help=False
)

required = parser.add_argument_group('Required Parameters')
optional = parser.add_argument_group('Optional Parameters')
22 changes: 15 additions & 7 deletions bit/cli/assemble.py
@@ -13,7 +13,7 @@
RawTextRichHelpFormatter.group_name_formatter = lambda name: "Usage" if name.lower() == "usage" else name


def build_parser():
def build_parser(parent_subparsers=None):

raw_desc = (
"This program runs an assembly workflow with optional QC and digital normalization "
@@ -22,12 +22,20 @@ def build_parser():

desc = wrap_help(raw_desc, 4)

parser = argparse.ArgumentParser(
description=desc,
epilog="Ex. usage: `bit-assemble -1 R1.fastq.gz -2 R2.fastq.gz` or `bit-assemble -r reads-dir/`",
formatter_class=RawTextRichHelpFormatter,
add_help=False
)
if parent_subparsers is not None:
parser = parent_subparsers.add_parser(
"assemble",
description=desc,
formatter_class=RawTextRichHelpFormatter,
add_help=False,
)
else:
parser = argparse.ArgumentParser(
description=desc,
epilog="Ex. usage: `bit assemble -1 R1.fastq.gz -2 R2.fastq.gz` or `bit assemble -r reads-dir/`",
formatter_class=RawTextRichHelpFormatter,
add_help=False
)

required = parser.add_argument_group("Required Parameters (choose one input method)")
general = parser.add_argument_group("General Parameters")
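
The `parent_subparsers` pattern in the two `build_parser` changes above can be sketched end to end with plain `argparse`. This is a minimal illustration under stated assumptions, not the code in this PR: `build_add_insertion_parser` is a simplified stand-in for `bit/cli/add_insertion.py`, `build_top_level_parser` is a hypothetical top-level `bit` entry point, and the `dest` names are invented. Only the `-i`, `-I`, and `-o` flags and the `add-insertion` name come from the diff, and the custom help formatter is left out.

```python
import argparse


def build_add_insertion_parser(parent_subparsers=None):
    """Simplified stand-in for a per-program build_parser (assumption, not the PR's code)."""
    desc = "This script is for adding an insertion sequence to an input fasta."

    if parent_subparsers is not None:
        # Called by the top-level parser: register as the `add-insertion` subcommand.
        parser = parent_subparsers.add_parser("add-insertion", description=desc)
    else:
        # Called on its own: build an independent parser, as before the change.
        parser = argparse.ArgumentParser(description=desc)

    required = parser.add_argument_group("Required Parameters")
    # Flags match the usage string in the diff; the dest names are assumptions.
    required.add_argument("-i", dest="input_fasta", required=True)
    required.add_argument("-I", dest="insertion_fasta", required=True)
    required.add_argument("-o", dest="output_fasta", required=True)
    return parser


def build_top_level_parser():
    """Hypothetical top-level `bit` parser that collects program parsers as subcommands."""
    parser = argparse.ArgumentParser(prog="bit")
    subparsers = parser.add_subparsers(dest="command")
    build_add_insertion_parser(subparsers)
    return parser


args = build_top_level_parser().parse_args(
    ["add-insertion", "-i", "input.fasta", "-I", "insertion-sequence.fasta", "-o", "output.fasta"]
)
print(args.command, args.input_fasta, args.insertion_fasta, args.output_fasta)
```

Keeping the `else` branch means each module can still build a standalone parser, which is presumably useful for testing a program's arguments without going through the top-level `bit` dispatcher.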