diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml index f233a54..e2ccf4c 100644 --- a/.github/workflows/ci.yaml +++ b/.github/workflows/ci.yaml @@ -43,7 +43,7 @@ jobs: conda run -n bit pip install -e . # the editable pip install is so test coverages make sense - name: Check installed version - run: conda run -n bit bit-cov-analyzer -v + run: conda run -n bit bit -v - name: Set env var so subprocesses get coverage during pytest run: echo "COVERAGE_PROCESS_START=.coveragerc" >> $GITHUB_ENV diff --git a/CHANGELOG.md b/CHANGELOG.md index 1e55c5e..eacc8eb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,13 +15,13 @@ --> -## v2.0.0 (NOT YET RELEASED) +## v2.0.0 (27-May-2026) -A lot of changes have been made recently to group and reorganize bit commands (alongside a hefty python revamp). The changelog over several past versions should be able to help you find anything you might be looking for that's been moved. But if you're having trouble finding something you used to use, please reach out and let me know! You can post an issue on this repo or reach out to me however :) +A lot of changes have been made recently to group and reorganize bit commands (alongside a hefty python revamp). I've followed suit with the rest of the world and everything is a subcommand available under `bit`. Running `bit` by itself will print out an overview of all programs/subcommands grouped by general utility. If you're having trouble finding something you used to use, please reach out and let me know! 
You can post an issue on this repo or reach out to me however :) ### Added -- `bit` by itself will print out an overview of available programs -- `bit-data` +- `bit` by itself will print out an overview of available programs, and it is the sole entry point into everything now +- `bit data` - this replaces `bit-data-locations` and all the database download programs with the following subcommands: - `locations` - `check` @@ -32,60 +32,61 @@ A lot of changes have been made recently to group and reorganize bit commands (a - `go-dbs` - `gtdb-data` - `test-data` -- `bit-fasta` which holds subcommands as listed below -- `bit-lineage` which holds subcommands as listed below -- `bit-kraken2` which holds subcommands as listed below -- `bit-table` which holds subcommands as listed below -- `bit-go` which holds subcommands as listed below +- `bit fasta` which holds subcommands as listed below +- `bit lineage` which holds subcommands as listed below +- `bit kraken2` which holds subcommands as listed below +- `bit table` which holds subcommands as listed below +- `bit go` which holds subcommands as listed below ### Changed -- several fasta-related programs have been placed as subcommands under `bit-fasta` - - `bit-calc-gc-per-seq` and `bit-calc-gc-sliding-window` -> `bit-fasta calc-gc` - - `bit-calc-variation-in-msa` -> `bit-fasta calc-var-in-msa` - - `bit-count-bases` -> `bit-fasta count` - - `bit-extract-seqs by-coords` -> `bit-fasta extract-by-coords` - - `bit-extract-seqs by-headers` -> `bit-fasta extract-by-headers` - - `bit-extract-seqs by-primers` -> `bit-fasta extract-by-primers` - - `bit-fasta-to-bed` -> `bit-fasta to-bed` - - `bit-fasta-to-genbank` -> `bit-fasta to-genbank` - - `bit-filter-fasta-by-length` -> `bit-fasta filter-by-length` - - `bit-rename-fasta-headers` -> `bit-fasta modify-headers` - - `bit-remove-wraps` -> `bit-fasta remove-wraps` +- several fasta-related programs have been placed as subcommands under `bit fasta` + - `bit-calc-gc-per-seq` and 
`bit-calc-gc-sliding-window` -> `bit fasta calc-gc` + - `bit-calc-variation-in-msa` -> `bit fasta calc-var-in-msa` + - `bit-count-bases` -> `bit fasta count` + - `bit-extract-seqs by-coords` -> `bit fasta extract-by-coords` + - `bit-extract-seqs by-headers` -> `bit fasta extract-by-headers` + - `bit-extract-seqs by-primers` -> `bit fasta extract-by-primers` + - `bit-fasta-to-bed` -> `bit fasta to-bed` + - `bit-fasta-to-genbank` -> `bit fasta to-genbank` + - `bit-filter-fasta-by-length` -> `bit fasta filter-by-length` + - `bit-rename-fasta-headers` -> `bit fasta modify-headers` + - `bit-remove-wraps` -> `bit fasta remove-wraps` - this is moderately slower now since i took it out of shell and put in into python - if wanted, you can add the shell way as a function as found in this gist: https://gist.github.com/AstrobioMike/4054ce9ed84162f31c830bac03beda68 -- kraken2/bracken-related programs have been placed as subcommands under `bit-kraken2` - - `bit-kraken2-tax-summary` -> `bit-kraken2 tax-summary` - - `bit-kraken2-tax-plots` -> `bit-kraken2 tax-plots` -- several table-related commands have been combined as subcommands under `bit-table` - - `bit-colnames` -> `bit-table colnames` - - `bit-filter-table` -> `bit-table filter` - - `bit-normalize-table` -> `bit-table normalize` - - `bit-summarize-column` -> `bit-table summarize-column` -- GO-related commands have been placed as subcommands under `bit-go` - - `bit-get-go-term-info` -> `bit-go get-term-info` - - `bit-go-summarize-annotations` -> `bit-go summarize-annotations` - - `bit-combine-go-summaries` -> `bit-go combine-summaries` - - `bit-slim-down-go-terms` -> `bit-go slim-terms` - - `bit-update-GO-dbs` -> `get-go-dbs` +- kraken2/bracken-related programs have been placed as subcommands under `bit kraken2` + - `bit-kraken2-tax-summary` -> `bit kraken2 tax-summary` + - `bit-kraken2-tax-plots` -> `bit kraken2 tax-plots` +- several table-related commands have been combined as subcommands under `bit table` + - 
`bit-colnames` -> `bit table colnames` + - `bit-filter-table` -> `bit table filter` + - `bit-normalize-table` -> `bit table normalize` + - `bit-summarize-column` -> `bit table summarize-column` +- GO-related commands have been placed as subcommands under `bit go` + - `bit-get-go-term-info` -> `bit go get-term-info` + - `bit-go-summarize-annotations` -> `bit go summarize-annotations` + - `bit-combine-go-summaries` -> `bit go combine-summaries` + - `bit-slim-down-go-terms` -> `bit go slim-terms` - database helpers for setting/checking locations has been reorganized - - `bit-data-locations` -> `bit-data locations` + - `bit-data-locations` -> `bit data locations` - database helpers for downloading/updating them have been reorganized - - `get-ncbi-assembly-data` -> `bit-data get ncbi-assembly-data` - - `get-ncbi-tax-data` -> `bit-data get ncbi-tax-data` - - `get-go-dbs` -> `bit-data get go-dbs` - - `get-gtdb-data` -> `bit-data get gtdb-data` - - `bit-get-test-data` -> `bit-data get test-data` + - `get-ncbi-assembly-data` -> `bit data get ncbi-assembly-data` + - `get-ncbi-tax-data` -> `bit data get ncbi-tax-data` + - `get-go-dbs` -> `bit data get go-dbs` + - `get-gtdb-data` -> `bit data get gtdb-data` + - `bit-update-GO-dbs` -> `bit data get go-dbs` + - `bit-get-test-data` -> `bit data get test-data` - lineage-related helpers have been reorganized - - `bit-get-lineage-from-taxids` -> `bit-lineage from-taxids` - - `bit-lineage-to-tsv` -> `bit-lineage to-tsv` + - `bit-get-lineage-from-taxids` -> `bit lineage from-taxids` + - `bit-lineage-to-tsv` -> `bit lineage to-tsv` -- `bit-filter-kofamscan-results` -> `bit-filter-ko-results` -- `bit-get-accessions-from-gtdb` -> `bit-get-accs-from-gtdb` +- `bit-filter-kofamscan-results` -> `bit filter-ko-results` +- `bit-get-accessions-from-gtdb` -> `bit get-accs-from-gtdb` +- there are more programs that used to be `bit-` something, but now are in subcommands under `bit`. 
Run `bit` by itself to find them ### Removed - `bit-version` has been removed, each program has its own `-v|--version` flag now -- `bit-dedupe-fasta-headers` has been removed entirely, as it's purpose can be achieved with `bit-fasta modify-headers` +- `bit-dedupe-fasta-headers` has been removed entirely, as its purpose can be achieved with `bit fasta modify-headers` - `bit-check-fastq-for-dup-headers` removed due to only super-niche utility - `bit-parse-fastq-by-headers` removed due to only niche utility, gist is here: https://gist.github.com/AstrobioMike/785265b43847e7cb10089d102573b575 diff --git a/README.md b/README.md index 894c2dd..6f642c7 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ * [**Overview**](#overview) * [**Programs**](#programs) - * [GTDB/NCBI-related](#gtdbncbi-related) + * [NCBI/GTDB-related](#ncbigtdb-related) * [Coverage/mapping-related](#coveragemapping-related) * [Sequence manipulation / read generation](#sequence-manipulation--read-generation) * [Sequence searching](#sequence-searching) @@ -53,12 +53,12 @@ conda activate bit You can see an overview of available programs like below by running `bit` by itself at the command line once it's installed. 
-#### GTDB/NCBI-related +#### NCBI/GTDB-related | Program | Purpose | | ------- | ------- | -| `bit-dl-ncbi-assemblies` | download NCBI assemblies in different formats given input accessions | -| `bit-get-accs-from-gtdb` | search the [GTDB](https://gtdb.ecogenomic.org/) by taxonomy and retrieve NCBI accessions | +| `bit dl-ncbi-assemblies` | download NCBI assemblies in different formats given input accessions | +| `bit get-accs-from-gtdb` | search the [GTDB](https://gtdb.ecogenomic.org/) by taxonomy and retrieve NCBI accessions | --- @@ -66,9 +66,9 @@ You can see an overview of available programs like below by running `bit` by its | Program | Purpose | | ------- | ------- | -| `bit-cov-analyzer` | analyze coverage patterns from a bam + reference fasta to identify regions of relatively higher or lower coverage | -| `bit-cov-stats` | get detection, coverage, and mean percent ID for single or multiple references given fasta(s) and a bam file | -| `bit-mapped-reads-pid` | get percent ID information for mapped reads in a bam file | +| `bit cov-analyzer` | analyze coverage patterns from a bam + reference fasta to identify regions of relatively higher or lower coverage | +| `bit cov-stats` | get detection, coverage, and mean percent ID for single or multiple references given fasta(s) and a bam file | +| `bit mapped-reads-pid` | get percent ID information for mapped reads in a bam file | --- @@ -76,15 +76,15 @@ You can see an overview of available programs like below by running `bit` by its | Program | Purpose | | ------- | ------- | -| `bit-gen-reads` | generate reads from fasta files | -| `bit-mutate-seqs` | introduce point mutations (substitutions/indels) into nucleotide or amino-acid fasta files | -| `bit-add-insertion` | add insertions into nucleotide or amino-acid fasta sequences | +| `bit gen-reads` | generate reads from fasta files | +| `bit mutate-seqs` | introduce point mutations (substitutions/indels) into nucleotide or amino-acid fasta files | +| `bit 
add-insertion` | add insertions into nucleotide or amino-acid fasta sequences | --- #### Sequence searching -##### Program: `bit-ez-screen` +##### Program: `bit ez-screen` | Subcommand | Purpose | | ------- | ------- | | `assembly` | runs blast-based screening of targets in assemblies | @@ -94,7 +94,7 @@ You can see an overview of available programs like below by running `bit` by its #### Fasta utilities -##### Program: `bit-fasta` +##### Program: `bit fasta` | Subcommand | Purpose | | ---------- | ------- | | `calc-gc` | calculate GC content per sequence or for the full file | @@ -115,15 +115,15 @@ You can see an overview of available programs like below by running `bit` by its | Program | Purpose | | ------- | ------- | -| `bit-assemble` | simple wrapper for assembly with optional quality trimming and normalization | -| `bit-summarize-assembly` | quickly summarize nucleotide assemblies | +| `bit assemble` | simple wrapper for assembly with optional quality trimming and normalization | +| `bit summarize-assembly` | quickly summarize nucleotide assemblies | --- #### GenBank-format utilities -##### Program: `bit-genbank` +##### Program: `bit genbank` | Subcommand | Purpose | | ---------- | ------- | | `to-AA-seqs` | extract amino acid sequences | @@ -135,13 +135,13 @@ You can see an overview of available programs like below by running `bit` by its #### Taxonomy and lineage helpers -##### Program: `bit-kraken2` +##### Program: `bit kraken2` | Subcommand | Purpose | | ---------- | ------- | | `tax-plots` | generate standard taxonomy barplots from kraken2/bracken outputs | | `tax-summary` | generate summary tables from kraken2/bracken outputs | -##### Program: `bit-lineage` +##### Program: `bit lineage` | Subcommand | Purpose | | ---------- | ------- | | `from-taxids` | get full lineage info from a list of NCBI taxon IDs | @@ -152,7 +152,7 @@ You can see an overview of available programs like below by running `bit` by its #### Table utilities -##### Program: 
`bit-table` +##### Program: `bit table` | Subcommand | Purpose | | ---------- | ------- | | `colnames` | print column names with numbers (handy for `cut`/`awk`) | @@ -166,10 +166,10 @@ You can see an overview of available programs like below by running `bit` by its | Program | Purpose | | ------- | ------- | -| `bit-filter-ko-results` | filter [KOFamScan](https://github.com/takaram/kofam_scan) results | +| `bit filter-ko-results` | filter [KOFamScan](https://github.com/takaram/kofam_scan) results | -##### Program: `bit-go` +##### Program: `bit go` Subcommand | Purpose | | ---------- | ------- | | `get-term-info` | look up GO term info | @@ -181,7 +181,7 @@ Subcommand | Purpose | #### iTOL-helpers -##### Program: `bit-itol` +##### Program: `bit itol` | Subcommand | Purpose | | ---------- | ------- | | `binary-dataset` | generate a binary dataset annotation file | @@ -193,7 +193,7 @@ Subcommand | Purpose | #### bit-data management -##### Program: `bit-data` +##### Program: `bit data` | Subcommand | Purpose | | ---------- | ------- | | `get` | download or update bit-utilized reference databases, or grab test data | diff --git a/bit/cli/add_insertion.py b/bit/cli/add_insertion.py index f5be42d..8ecea28 100755 --- a/bit/cli/add_insertion.py +++ b/bit/cli/add_insertion.py @@ -4,18 +4,26 @@ from bit.modules.general import check_files_are_found, report_message, notify_premature_exit -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This script is for adding an insertion sequence to an input fasta. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. 
usage: `bit-add-insertion -i input.fasta -I insertion-sequence.fasta -o output.fasta`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "add-insertion", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit add-insertion -i input.fasta -I insertion-sequence.fasta -o output.fasta`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group('Required Parameters') optional = parser.add_argument_group('Optional Parameters') diff --git a/bit/cli/assemble.py b/bit/cli/assemble.py index 04a2153..da54b62 100644 --- a/bit/cli/assemble.py +++ b/bit/cli/assemble.py @@ -13,7 +13,7 @@ RawTextRichHelpFormatter.group_name_formatter = lambda name: "Usage" if name.lower() == "usage" else name -def build_parser(): +def build_parser(parent_subparsers=None): raw_desc = ( "This program runs an assembly workflow with optional QC and digital normalization " @@ -22,12 +22,20 @@ def build_parser(): desc = wrap_help(raw_desc, 4) - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-assemble -1 R1.fastq.gz -2 R2.fastq.gz` or `bit-assemble -r reads-dir/`", - formatter_class=RawTextRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "assemble", + description=desc, + formatter_class=RawTextRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. 
usage: `bit assemble -1 R1.fastq.gz -2 R2.fastq.gz` or `bit assemble -r reads-dir/`", + formatter_class=RawTextRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters (choose one input method)") general = parser.add_argument_group("General Parameters") diff --git a/bit/cli/bit.py b/bit/cli/bit.py index 5416e7d..f36eb8b 100644 --- a/bit/cli/bit.py +++ b/bit/cli/bit.py @@ -1,18 +1,44 @@ import sys import argparse +import importlib from bit.cli.common import add_help, add_version_arg, CustomRichHelpFormatter +SUBCOMMAND_MAP = { + "dl-ncbi-assemblies": "bit.cli.dl_ncbi_assemblies", + "get-accs-from-gtdb": "bit.cli.get_accessions_from_gtdb", + "cov-analyzer": "bit.cli.cov_analyzer", + "cov-stats": "bit.cli.cov_stats", + "mapped-reads-pid": "bit.cli.mapped_reads_pid", + "gen-reads": "bit.cli.gen_reads", + "mutate-seqs": "bit.cli.mutate_seqs", + "add-insertion": "bit.cli.add_insertion", + "ez-screen": "bit.cli.ez_screen", + "fasta": "bit.cli.fasta", + "assemble": "bit.cli.assemble", + "summarize-assembly": "bit.cli.summarize_assembly", + "genbank": "bit.cli.genbank", + "kraken2": "bit.cli.kraken2", + "lineage": "bit.cli.lineage", + "table": "bit.cli.table", + "filter-ko-results": "bit.cli.filter_kofamscan_results", + "go": "bit.cli.go", + "itol": "bit.cli.itol", + "get-workflow": "bit.cli.get_workflow", + "data": "bit.cli.data", +} + + PROGRAM_GROUPS = [ { - "title": "GTDB/NCBI-related", + "title": "NCBI/GTDB-related", "programs": [ { - "name": "bit-dl-ncbi-assemblies", + "name": "dl-ncbi-assemblies", "desc": "download NCBI assemblies in different formats given input accessions", }, { - "name": "bit-get-accs-from-gtdb", + "name": "get-accs-from-gtdb", "desc": "search the GTDB by taxonomy and retrieve NCBI accessions", }, ], @@ -21,15 +47,15 @@ "title": "Coverage/mapping-related", "programs": [ { - "name": "bit-cov-analyzer", + "name": "cov-analyzer", "desc": "analyze coverage patterns from a bam + reference fasta to identify 
regions of relatively higher or lower coverage", }, { - "name": "bit-cov-stats", + "name": "cov-stats", "desc": "get detection, coverage, and mean percent ID for single or multiple references given fasta(s) and a bam file", }, { - "name": "bit-mapped-reads-pid", + "name": "mapped-reads-pid", "desc": "get percent ID information for mapped reads in a bam file", }, ], @@ -38,15 +64,15 @@ "title": "Sequence manipulation / read generation", "programs": [ { - "name": "bit-gen-reads", + "name": "gen-reads", "desc": "generate reads from fasta files", }, { - "name": "bit-mutate-seqs", + "name": "mutate-seqs", "desc": "introduce point mutations (substitutions/indels) into nucleotide or amino-acid fasta files", }, { - "name": "bit-add-insertion", + "name": "add-insertion", "desc": "add insertions into nucleotide or amino-acid fasta sequences", }, ], @@ -55,7 +81,7 @@ "title": "Sequence searching", "programs": [ { - "name": "bit-ez-screen", + "name": "ez-screen", "desc": "", "subcommands": [ ("assembly", "run blast-based screening of targets in assemblies"), @@ -68,7 +94,7 @@ "title": "Fasta utilities", "programs": [ { - "name": "bit-fasta", + "name": "fasta", "desc": "", "subcommands": [ ("calc-gc", "calculate GC content per sequence or for the full file"), @@ -90,11 +116,11 @@ "title": "Assembly-related", "programs": [ { - "name": "bit-assemble", + "name": "assemble", "desc": "simple wrapper for assembly with optional quality trimming and normalization", }, { - "name": "bit-summarize-assembly", + "name": "summarize-assembly", "desc": "quickly summarize nucleotide assemblies", }, ], @@ -103,13 +129,13 @@ "title": "GenBank-format utilities", "programs": [ { - "name": "bit-genbank", + "name": "genbank", "desc": "", "subcommands": [ - ("to-fasta", "extract nucleotide sequences"), ("to-AA-seqs", "extract amino acid sequences"), ("to-cds-tsv", "extract CDS info to a TSV"), ("to-cds-seqs","extract CDS nucleotide sequences"), + ("to-fasta", "extract nucleotide sequences"), ], }, ], 
@@ -118,7 +144,7 @@ "title": "Taxonomy and lineage helpers", "programs": [ { - "name": "bit-kraken2", + "name": "kraken2", "desc": "summarize and visualize kraken2/bracken outputs", "subcommands": [ ("tax-summary","generate summary tables from kraken2/bracken outputs"), @@ -126,7 +152,7 @@ ], }, { - "name": "bit-lineage", + "name": "lineage", "desc": "", "subcommands": [ ("from-taxids","get full lineage info from a list of NCBI taxon IDs"), @@ -139,7 +165,7 @@ "title": "Table utilities", "programs": [ { - "name": "bit-table", + "name": "table", "desc": "", "subcommands": [ ("colnames", "print column names with numbers (handy for cut/awk)"), @@ -154,11 +180,11 @@ "title": "Functional-annotation helpers", "programs": [ { - "name": "bit-filter-ko-results", + "name": "filter-ko-results", "desc": "filter KOFamScan results", }, { - "name": "bit-go", + "name": "go", "desc": "", "subcommands": [ ("get-term-info", "look up GO term info"), @@ -173,7 +199,7 @@ "title": "iTOL helpers", "programs": [ { - "name": "bit-itol", + "name": "itol", "desc": "", "subcommands": [ ("binary-dataset","generate a binary dataset annotation file"), @@ -188,7 +214,7 @@ "title": "Workflows", "programs": [ { - "name": "bit-get-workflow", + "name": "get-workflow", "desc": "download a bit-packaged workflow (available: sra-download, genome-summarize, metagenomics)", }, ], @@ -197,7 +223,7 @@ "title": "bit-data management", "programs": [ { - "name": "bit-data", + "name": "data", "desc": "", "subcommands": [ ("get", "download/update bit-utilized databases or get test data"), @@ -210,6 +236,7 @@ def print_overview(): + from rich.console import Console # type: ignore from rich.table import Table # type: ignore from importlib.metadata import version @@ -222,7 +249,7 @@ def print_overview(): console.print(f"{'':>33}bit [green]{ver}[/green]") console.print(f"{'':>25}github.com/AstrobioMike/bit") console.print() - console.print(f"{'':>28}OVERVIEW OF PROGRAMS") + console.print(f"{'':>27}OVERVIEW OF 
SUBCOMMANDS") console.print() # global name-column width — keeps description column aligned across all groups @@ -279,6 +306,26 @@ def print_overview(): +def _suppress_help_version_on_group_parsers(parser): + """ + Suppress -h/--help/-v/--version from argcomplete on parsers that have + subparsers, so TAB after a group shows only subcommand names. + Leaf-level parsers are left untouched so their flags still appear + without requiring a '-' prefix. + """ + for action in parser._actions: + if isinstance(action, argparse._SubParsersAction): + for a in parser._actions: + if hasattr(a, 'option_strings') and any( + s in ('-h', '--help', '-v', '--version') + for s in a.option_strings + ): + a.help = argparse.SUPPRESS + for sub_parser in action.choices.values(): + _suppress_help_version_on_group_parsers(sub_parser) + break + + def build_parser(): desc = """ @@ -294,6 +341,32 @@ def build_parser(): add_help(parser) add_version_arg(parser) + subparsers = parser.add_subparsers(dest='subcommand') + + # Lazy-load modules during tab completion so only the one relevant module + # is imported per keypress instead of all 21 at once. + import os + comp_line = os.environ.get('COMP_LINE', '') + if comp_line: + words = comp_line.split() + if len(words) >= 2 and words[1] in SUBCOMMAND_MAP: + # User is completing args/flags for a specific subcommand — only + # import that one module. + module = importlib.import_module(SUBCOMMAND_MAP[words[1]]) + module.build_parser(parent_subparsers=subparsers) + else: + # User is still completing the subcommand name itself — lightweight + # stubs are all argcomplete needs to offer the names. + for name in SUBCOMMAND_MAP: + subparsers.add_parser(name, add_help=False) + else: + # Normal (non-completion) invocation — build the full tree. 
+ for module_path in SUBCOMMAND_MAP.values(): + module = importlib.import_module(module_path) + module.build_parser(parent_subparsers=subparsers) + + _suppress_help_version_on_group_parsers(parser) + return parser @@ -301,9 +374,32 @@ def main(): parser = build_parser() + try: + import argcomplete # type: ignore + argcomplete.autocomplete(parser) + except ImportError: + pass + # print the overview when called with no arguments or with -h/--help if len(sys.argv) == 1 or sys.argv[1] in ("-h", "--help"): print_overview() sys.exit(0) - args = parser.parse_args() + subcommand = sys.argv[1] + + # handle -v/--version before subcommand dispatch + if subcommand in ("-v", "--version"): + parser.parse_args() + return + + if subcommand not in SUBCOMMAND_MAP: + from rich.console import Console # type: ignore + Console(stderr=True).print(f"\n [yellow]Unknown subcommand:[/yellow] [cyan]{subcommand}[/cyan]\n") + Console(stderr=True).print(f" Run [cyan]bit[/cyan] by itself to see available subcommands.\n") + # print_overview() + sys.exit(1) + + # rewrite argv so the target module's parser sees the right program name + sys.argv = [f"bit {subcommand}"] + sys.argv[2:] + module = importlib.import_module(SUBCOMMAND_MAP[subcommand]) + module.main() diff --git a/bit/cli/cov_analyzer.py b/bit/cli/cov_analyzer.py index 294de20..adc9618 100644 --- a/bit/cli/cov_analyzer.py +++ b/bit/cli/cov_analyzer.py @@ -3,7 +3,8 @@ from bit.cli.common import (CustomRichHelpFormatter, reconstruct_invocation, add_help, add_version_arg, add_force) -def main(): + +def build_parser(parent_subparsers=None): desc = """ This program analyzes coverage patterns given a reference fasta and a bam file as inputs. @@ -18,12 +19,21 @@ def main(): specifically want to investigate them too, you should probably use `--per-contig` mode). """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. 
usage: `bit-cov-analyzer -r reference.fasta -b mapping.bam`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "cov-analyzer", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit cov-analyzer -r reference.fasta -b mapping.bam`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) + required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") @@ -132,6 +142,13 @@ def main(): add_help(optional) add_version_arg(optional) + return parser + + +def main(): + + parser = build_parser() + if len(sys.argv) == 1: # pragma: no cover parser.print_help(sys.stderr) sys.exit(0) diff --git a/bit/cli/cov_stats.py b/bit/cli/cov_stats.py index 15399ba..771d4c9 100644 --- a/bit/cli/cov_stats.py +++ b/bit/cli/cov_stats.py @@ -5,7 +5,7 @@ from bit.modules.general import report_message, notify_premature_exit, is_gzipped -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This script generates whole-reference and contig-level detection and coverage info @@ -14,12 +14,20 @@ def build_parser(): each input reference and contig. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-cov-stats -r reference.fasta -b mapping.bam` or `bit-cov-stats -r reference.fasta --bed per-base.bed.gz`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "cov-stats", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. 
usage: `bit cov-stats -r reference.fasta -b mapping.bam` or `bit cov-stats -r reference.fasta --bed per-base.bed.gz`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters (choose one of `--reference-fastas` or `--reference-list` then `--bam` and/or `--bed`)") optional = parser.add_argument_group("Optional Parameters") diff --git a/bit/cli/data.py b/bit/cli/data.py index 2376764..26ed01c 100644 --- a/bit/cli/data.py +++ b/bit/cli/data.py @@ -4,18 +4,26 @@ from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program manages bit-utilized databases and their location settings. See subcommand-specific help menus for more info. """ - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "data", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -38,7 +46,7 @@ def build_parser(): "get", help="Download/update bit-utilized databases, or get test data", description=get_desc, - epilog="Ex. usage: `bit-data get ncbi-assembly-data`", + epilog="Ex. usage: `bit data get ncbi-assembly-data`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -74,7 +82,7 @@ def add_get_common_args(group): "go-dbs", help="Download or update GO databases", description=get_go_dbs_desc, - epilog="Ex. usage: `bit-data get go-dbs`", + epilog="Ex. usage: `bit data get go-dbs`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -98,7 +106,7 @@ def add_get_common_args(group): "gtdb-data", help="Download or update GTDB metadata", description=get_gtdb_data_desc, - epilog="Ex. 
usage: `bit-data get gtdb-data`", + epilog="Ex. usage: `bit data get gtdb-data`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -122,7 +130,7 @@ def add_get_common_args(group): "ncbi-assembly-data", help="Download or update NCBI assembly-summary tables", description=get_ncbi_assembly_desc, - epilog="Ex. usage: `bit-data get ncbi-assembly-data`", + epilog="Ex. usage: `bit data get ncbi-assembly-data`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -146,7 +154,7 @@ def add_get_common_args(group): "ncbi-tax-data", help="Download or update NCBI taxonomy data", description=get_ncbi_tax_desc, - epilog="Ex. usage: `bit-data get ncbi-tax-data`", + epilog="Ex. usage: `bit data get ncbi-tax-data`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -170,7 +178,7 @@ def add_get_common_args(group): "test-data", help="Download test data", description=get_test_data_desc, - epilog="Ex. usage: `bit-data get test-data genome`", + epilog="Ex. usage: `bit data get test-data genome`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -203,7 +211,7 @@ def add_get_common_args(group): "locations", help="Check or set data-location environment variables", description=locations_desc, - epilog="Ex. usage: `bit-data locations check`", + epilog="Ex. usage: `bit data locations check`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -227,7 +235,7 @@ def add_get_common_args(group): "check", help="Report current data-location environment variables", description=locations_check_desc, - epilog="Ex. usage: `bit-data locations check`", + epilog="Ex. usage: `bit data locations check`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -249,7 +257,7 @@ def add_get_common_args(group): "set", help="Interactively set data-location environment variables", description=locations_set_desc, - epilog="Ex. usage: `bit-data locations set`", + epilog="Ex. 
usage: `bit data locations set`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/dl_ncbi_assemblies.py b/bit/cli/dl_ncbi_assemblies.py index 04151e0..c08cf20 100644 --- a/bit/cli/dl_ncbi_assemblies.py +++ b/bit/cli/dl_ncbi_assemblies.py @@ -3,7 +3,8 @@ from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg from bit.modules.dl_ncbi_assemblies import dl_ncbi_assemblies -def main(): + +def build_parser(parent_subparsers=None): desc = """ This program downloads assembly files for NCBI genomes. It takes as input @@ -11,12 +12,20 @@ def main(): which format to download. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-dl-ncbi-assemblies -w wanted-accessions.txt`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "dl-ncbi-assemblies", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit dl-ncbi-assemblies -w wanted-accessions.txt`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") @@ -58,6 +67,13 @@ def main(): add_help(optional) add_version_arg(optional) + return parser + + +def main(): + + parser = build_parser() + if len(sys.argv) == 1: # pragma: no cover parser.print_help(sys.stderr) sys.exit(0) diff --git a/bit/cli/extract_seqs.py b/bit/cli/extract_seqs.py index cb15881..00a7f0f 100644 --- a/bit/cli/extract_seqs.py +++ b/bit/cli/extract_seqs.py @@ -54,7 +54,7 @@ def add_common_optional_arguments(group): "by-coords", help="Extract sequences based on coordinates provided in a bed file", description=by_coords_desc, - epilog="Ex. usage: `bit-extract-seqs by-coords -i input.fasta -b targets.bed`", + epilog="Ex. 
usage: `bit extract-seqs by-coords -i input.fasta -b targets.bed`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -89,7 +89,7 @@ def add_common_optional_arguments(group): "by-headers", help="Extract sequences based on specified headers", description=by_headers_desc, - epilog="Ex. usage: `bit-extract-seqs by-headers -i input.fasta -h contig-1 contig-2`", + epilog="Ex. usage: `bit extract-seqs by-headers -i input.fasta -h contig-1 contig-2`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -139,7 +139,7 @@ def add_common_optional_arguments(group): "by-primers", help="Extract sequences based on forward and reverse primer sequences", description=by_primers_desc, - epilog="Ex. usage: `bit-extract-seqs by-primers -i input.fasta -f ForwardPrimerSeq -r ReversePrimerSeq`", + epilog="Ex. usage: `bit extract-seqs by-primers -i input.fasta -f ForwardPrimerSeq -r ReversePrimerSeq`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/ez_screen.py b/bit/cli/ez_screen.py index 3ff1923..04a1a3b 100644 --- a/bit/cli/ez_screen.py +++ b/bit/cli/ez_screen.py @@ -8,18 +8,26 @@ add_version_arg) -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program helps detect target-genes/regions present in assemblies or reads. See subcommand-specific help menus for more info. """ - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "ez-screen", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) add_version_arg(parser) @@ -73,7 +81,7 @@ def add_common_optional_arguments(group): "assembly", help="Run BLAST-based screening of targets in assemblies", description=assembly_description, - epilog="Ex. 
usage: `bit-ez-screen assembly -a assembly.fasta -t targets.fasta`", + epilog="Ex. usage: `bit ez-screen assembly -a assembly.fasta -t targets.fasta`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -111,7 +119,7 @@ def add_common_optional_arguments(group): "reads", help="Run mapping-based screening of reads against targets", description=reads_description, - epilog="Ex. usage: `bit-ez-screen reads -t targets.fasta`", + epilog="Ex. usage: `bit ez-screen reads -t targets.fasta`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/fasta.py b/bit/cli/fasta.py index 64461c6..63ededd 100644 --- a/bit/cli/fasta.py +++ b/bit/cli/fasta.py @@ -3,18 +3,26 @@ import argcomplete # type: ignore from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program performs various operations on fasta files. See subcommand-specific help menus for more info. """ - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "fasta", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -45,7 +53,7 @@ def add_common_required_arguments(group): "calc-gc", help="Calculate GC content (per seq or per sliding window)", description=calc_gc_desc, - epilog="Ex. usage: `bit-fasta calc-gc input.fasta`", + epilog="Ex. usage: `bit fasta calc-gc input.fasta`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -99,7 +107,7 @@ def add_common_required_arguments(group): "calc-var-in-msa", help="Calculate variation in a multiple-sequence alignment", description=calc_var_desc, - epilog="Ex. 
usage: `bit-fasta calc-var-in-msa alignment.fasta -o variation.tsv`", + epilog="Ex. usage: `bit fasta calc-var-in-msa alignment.fasta -o variation.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -155,7 +163,7 @@ def add_common_required_arguments(group): "count", help="Count number of seqs and characters", description=count_desc, - epilog="Ex. usage: `bit-fasta count input.fasta`", + epilog="Ex. usage: `bit fasta count input.fasta`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -189,7 +197,7 @@ def add_common_required_arguments(group): "extract-by-coords", help="Extract sequences based on coordinates provided in a bed file", description=extract_by_coords_desc, - epilog="Ex. usage: `bit-fasta extract-by-coords -i input.fasta -b targets.bed`", + epilog="Ex. usage: `bit fasta extract-by-coords -i input.fasta -b targets.bed`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -238,7 +246,7 @@ def add_common_required_arguments(group): "extract-by-headers", help="Extract sequences based on specified headers", description=extract_by_headers_desc, - epilog="Ex. usage: `bit-fasta extract-by-headers -i input.fasta -H contig-1 contig-2`", + epilog="Ex. usage: `bit fasta extract-by-headers -i input.fasta -H contig-1 contig-2`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -300,7 +308,7 @@ def add_common_required_arguments(group): "extract-by-primers", help="Extract sequences based on forward and reverse primer sequences", description=extract_by_primers_desc, - epilog="Ex. usage: `bit-fasta extract-by-primers -i input.fasta -f ForwardPrimerSeq -r ReversePrimerSeq`", + epilog="Ex. usage: `bit fasta extract-by-primers -i input.fasta -f ForwardPrimerSeq -r ReversePrimerSeq`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -362,7 +370,7 @@ def add_common_required_arguments(group): "filter-by-length", help="Filter sequences based on minimum/maximum length", description=filter_by_length_desc, - epilog="Ex. 
usage: `bit-fasta filter-by-length -i input.fasta -m 1000 -o filtered.fasta`", + epilog="Ex. usage: `bit fasta filter-by-length -i input.fasta -m 1000 -o filtered.fasta`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -419,7 +427,7 @@ def add_common_required_arguments(group): "modify-headers", help="Modify or rename fasta headers", description=modify_headers_desc, - epilog="Ex. usage: `bit-fasta modify-headers -i input.fasta -w contig`", + epilog="Ex. usage: `bit fasta modify-headers -i input.fasta -w contig`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -484,7 +492,7 @@ def add_common_required_arguments(group): "remove-wraps", help="Remove line wraps from a fasta file", description=remove_wraps_desc, - epilog="Ex. usage: `bit-fasta remove-wraps input.fasta > unwrapped.fasta`", + epilog="Ex. usage: `bit fasta remove-wraps input.fasta > unwrapped.fasta`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -519,7 +527,7 @@ def add_common_required_arguments(group): "to-bed", help="Generate a bed file from a fasta", description=to_bed_desc, - epilog="Ex. usage: `bit-fasta to-bed input.fasta -o output.bed`", + epilog="Ex. usage: `bit fasta to-bed input.fasta -o output.bed`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -554,7 +562,7 @@ def add_common_required_arguments(group): "to-genbank", help="Generate a genbank file from a nucleotide fasta", description=to_genbank_desc, - epilog="Ex. usage: `bit-fasta to-genbank input.fasta -o output.gb`", + epilog="Ex. 
usage: `bit fasta to-genbank input.fasta -o output.gb`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/filter_kofamscan_results.py b/bit/cli/filter_kofamscan_results.py index bd3490f..c7a233d 100644 --- a/bit/cli/filter_kofamscan_results.py +++ b/bit/cli/filter_kofamscan_results.py @@ -5,7 +5,7 @@ from bit.modules.general import check_files_are_found -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This script filters the "detail-tsv"-formatted output file from KOFamScan to retain @@ -15,12 +15,20 @@ def build_parser(): with: gene_ID, KO_ID, and KO_annotation. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-filter-kofamscan-results -i initial-KOFamScan-results.txt -o KO-annotations.tsv`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "filter-ko-results", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit filter-kofamscan-results -i initial-KOFamScan-results.txt -o KO-annotations.tsv`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") diff --git a/bit/cli/gen_reads.py b/bit/cli/gen_reads.py index 4237324..415cd69 100644 --- a/bit/cli/gen_reads.py +++ b/bit/cli/gen_reads.py @@ -7,20 +7,29 @@ add_version_arg) -def main(): +def build_parser(parent_subparsers=None): desc = """ This script generates perfect (no error model) reads in FASTQ format from one or - multiple input FASTA files. See `bit-mutate-seqs` if wanting to introduce variation + multiple input FASTA files. See `bit mutate-seqs` if wanting to introduce variation to a fasta prior to read-generation. 
""" - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-gen-reads -i genome.fasta`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "gen-reads", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit gen-reads -i genome.fasta`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) + required = parser.add_argument_group("Required Parameters") general = parser.add_argument_group("General Parameters") paired = parser.add_argument_group("Paired-end Parameters") @@ -100,9 +109,7 @@ def main(): ) add_seed(general) - add_help(general) - add_version_arg(general) paired.add_argument( @@ -134,6 +141,13 @@ def main(): default=50, ) + return parser + + +def main(): + + parser = build_parser() + if len(sys.argv) == 1: # pragma: no cover parser.print_help(sys.stderr) sys.exit(0) diff --git a/bit/cli/genbank.py b/bit/cli/genbank.py index d68aaa5..0e0eae1 100644 --- a/bit/cli/genbank.py +++ b/bit/cli/genbank.py @@ -4,17 +4,25 @@ from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program extracts different types of information and sequences from GenBank files. 
""" - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "genbank", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -57,7 +65,7 @@ def add_common_optional_arguments(group): "to-AA-seqs", help="Extract amino-acid sequences for complete coding sequences", description=to_AA_seqs_desc, - epilog="Ex. usage: `bit-genbank to-AA-seqs -i input.gbff`", + epilog="Ex. usage: `bit genbank to-AA-seqs -i input.gbff`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -85,7 +93,7 @@ def add_common_optional_arguments(group): "to-cds-tsv", help="Extract CDS info to a tab-delimited file", description=to_cds_tsv_desc, - epilog="Ex. usage: `bit-genbank to-cds-tsv -i input.gbff`", + epilog="Ex. usage: `bit genbank to-cds-tsv -i input.gbff`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -120,7 +128,7 @@ def add_common_optional_arguments(group): "to-cds-seqs", help="Extract nucleotide sequences for CDS features", description=to_cds_seqs_desc, - epilog="Ex. usage: `bit-genbank to-cds-seqs -i input.gbff`", + epilog="Ex. usage: `bit genbank to-cds-seqs -i input.gbff`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -146,7 +154,7 @@ def add_common_optional_arguments(group): "to-fasta", help="Extract the full nucleotide fasta sequence", description=to_fasta_desc, - epilog="Ex. usage: `bit-genbank to-fasta -i input.gbff`", + epilog="Ex. 
usage: `bit genbank to-fasta -i input.gbff`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/get_accessions_from_gtdb.py b/bit/cli/get_accessions_from_gtdb.py index 51c9377..8159a7d 100644 --- a/bit/cli/get_accessions_from_gtdb.py +++ b/bit/cli/get_accessions_from_gtdb.py @@ -4,25 +4,33 @@ from bit.modules.gtdb.get_accessions_from_gtdb import get_accessions_from_gtdb -def main(): +def build_parser(parent_subparsers=None): desc = """ This is a helper program to facilitate using taxonomy and genomes from the Genome Taxonomy Database (gtdb.ecogenomic.org). It primarily returns NCBI accessions and GTDB summary tables based on GTDB-taxonomy searches, - which could then be passed to, e.g., `bit-dl-ncbi-assemblies`. It also + which could then be passed to, e.g., `bit dl-ncbi-assemblies`. It also currently has filtering capabilities built-in for specifying only GTDB representative species or RefSeq reference genomes (see help menu and links therein for explanations of what these are). It will cache the GTDB - metadata tables, if you want to update them, run `bit-data get gtdb-data -f`. + metadata tables, if you want to update them, run `bit data get gtdb-data -f`. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: bit-get-accessions-from-gtdb -t Archaea --gtdb-representatives-only", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "get-accs-from-gtdb", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. 
usage: bit get-accs-from-gtdb -t Archaea --gtdb-representatives-only", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") @@ -85,9 +93,15 @@ def main(): ) add_help(optional) - add_version_arg(optional) + return parser + + +def main(): + + parser = build_parser() + if len(sys.argv) == 1: # pragma: no cover parser.print_help(sys.stderr) sys.exit(0) diff --git a/bit/cli/get_workflow.py b/bit/cli/get_workflow.py index a758d97..1d1a2b0 100644 --- a/bit/cli/get_workflow.py +++ b/bit/cli/get_workflow.py @@ -4,19 +4,27 @@ from bit.modules.get_workflow import dl_wf -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This is a helper program for downloading bit workflows. Workflow version is included with the downloaded workflow. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-get-workflow metagenomics`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "get-workflow", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit get-workflow metagenomics`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") diff --git a/bit/cli/go.py b/bit/cli/go.py index 2a4407a..b07fb4f 100644 --- a/bit/cli/go.py +++ b/bit/cli/go.py @@ -4,18 +4,26 @@ from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program has helpers for working Gene Ontology (GO) annotations. See subcommand-specific help menus for more info. 
""" - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "go", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -57,7 +65,7 @@ def add_common_optional_arguments(group): "get-term-info", help="Get information on individual GO terms", description=get_term_info_desc, - epilog="Ex. usage: `bit-go get-term-info GO:0004386`", + epilog="Ex. usage: `bit go get-term-info GO:0004386`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -97,7 +105,7 @@ def add_common_optional_arguments(group): "summarize-annotations", help="Summarize GO annotations", description=summarize_annotations_desc, - epilog="Ex. usage: `bit-go summarize-annotations -i GO-annotations.tsv`", + epilog="Ex. usage: `bit go summarize-annotations -i GO-annotations.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -137,14 +145,14 @@ def add_common_optional_arguments(group): ### subcommand cli for combining GO-annotation summaries ### combine_summaries_desc = """ - This subcommand takes multiple GO summary tables produced by `bit-go summarize-annotations` and combines them into a single table. + This subcommand takes multiple GO summary tables produced by `bit go summarize-annotations` and combines them into a single table. """ combine_summaries_parser = subparsers.add_parser( "combine-summaries", help="Combine GO summary tables", description=combine_summaries_desc, - epilog="Ex. usage: `bit-go combine-summaries -i sample-1-GO-summary.tsv sample-2-GO-summary.tsv`", + epilog="Ex. 
usage: `bit go combine-summaries -i sample-1-GO-summary.tsv sample-2-GO-summary.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -158,7 +166,7 @@ def add_common_optional_arguments(group): metavar="", nargs="+", type=str, - help="Space-delimited list of `bit-go summarize-annotations` output files", + help="Space-delimited list of `bit go summarize-annotations` output files", required=True ) @@ -191,14 +199,14 @@ def add_common_optional_arguments(group): This subcommand wraps the goatools `map_to_slim.py` program (github.com/tanghaibao/Goatools#map-go-terms-to-goslim-terms). See there for more details, and if you use it in your work, be sure to properly cite them :) https://www.nature.com/articles/s41598-018-28948-z. It is included here to streamline integration with - with the GO databases stored with `bit` and programs like `bit-go summarize-annotations`. + with the GO databases stored with `bit` and programs like `bit go summarize-annotations`. """ slim_terms_parser = subparsers.add_parser( "slim-terms", help="Slim down GO annotations to a specified slim obo", description=slim_terms_desc, - epilog="Ex. usage: `bit-go slim-terms -i GO-annotations.tsv`", + epilog="Ex. usage: `bit go slim-terms -i GO-annotations.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/itol.py b/bit/cli/itol.py index b4b6088..177f806 100644 --- a/bit/cli/itol.py +++ b/bit/cli/itol.py @@ -3,18 +3,26 @@ import argcomplete # type: ignore from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program helps generate various Interacitve Tree of Life (iToL) files that can be dropped onto a tree on the website for visualization/annotation. See itol.embl.de/help.cgi for information on the different types. 
""" - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "itol", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -61,7 +69,7 @@ def add_common_optional_arguments(group): "binary-dataset", help="Create an iToL binary-dataset file", description=binary_desc, - epilog="Ex. usage: `bit-itol binary-dataset -i genomes.txt`", + epilog="Ex. usage: `bit itol binary-dataset -i genomes.txt`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -113,7 +121,7 @@ def add_common_optional_arguments(group): "colorstrip", help="Create an iToL colorstrip file", description=colorstrip_desc, - epilog="Ex. usage: `bit-itol colorstrip -i genomes.txt`", + epilog="Ex. usage: `bit itol colorstrip -i genomes.txt`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -162,7 +170,7 @@ def add_common_optional_arguments(group): "map", help="Create an iToL map file for coloring labels and/or branches", description=map_desc, - epilog="Ex. usage: `bit-itol map -i genomes.txt`", + epilog="Ex. usage: `bit itol map -i genomes.txt`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -205,7 +213,7 @@ def add_common_optional_arguments(group): "text-dataset", help="Create an iToL text-dataset file", description=text_desc, - epilog="Ex. usage: `bit-itol text-dataset -i genomes.txt`", + epilog="Ex. 
usage: `bit itol text-dataset -i genomes.txt`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/kraken2.py b/bit/cli/kraken2.py index 81a8640..3426c91 100644 --- a/bit/cli/kraken2.py +++ b/bit/cli/kraken2.py @@ -6,18 +6,26 @@ from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program provides subcommands for working with kraken2 (or bracken) output. See subcommand-specific help menus for more info. """ - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "kraken2", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -38,7 +46,7 @@ def build_parser(): "tax-plots", help="Generate taxonomy bar plots from a kraken2 report", description=tax_plots_desc, - epilog="Ex. usage: `bit-kraken2 tax-plots -i kraken2.report`", + epilog="Ex. usage: `bit kraken2 tax-plots -i kraken2.report`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -112,7 +120,7 @@ def build_parser(): "tax-summary", help="Generate a taxonomy summary table from kraken2/bracken report(s)", description=tax_summary_desc, - epilog="Ex. usage: `bit-kraken2 tax-summary -i kraken2.report`", + epilog="Ex. 
usage: `bit kraken2 tax-summary -i kraken2.report`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -228,7 +236,7 @@ def check_and_setup_file_to_sample_map(input_reports, sample_names): if sample_names is not None and len(sample_names) != len(input_reports): report_message("\n It seems the number of provided sample names doesn't match the number of provided input files :(") - report_message("\n Check usage with `bit-kraken2-tax-summary -h`.\n") + report_message("\n Check usage with `bit kraken2 tax-summary -h`.\n") notify_premature_exit() if sample_names is not None: diff --git a/bit/cli/lineage.py b/bit/cli/lineage.py index 3b7cb4a..8361387 100644 --- a/bit/cli/lineage.py +++ b/bit/cli/lineage.py @@ -5,18 +5,26 @@ from bit.modules.general import check_files_are_found -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program has helpers for getting lineages from taxids and working with lineages. See subcommand-specific help menus for more info. """ - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "lineage", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) @@ -38,7 +46,7 @@ def build_parser(): "from-taxids", help="Get NCBI lineage info from taxids", description=from_taxids_desc, - epilog="Ex. usage: `bit-lineage from-taxids -i taxids.txt -o lineages.tsv`", + epilog="Ex. usage: `bit lineage from-taxids -i taxids.txt -o lineages.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -88,7 +96,7 @@ def build_parser(): "to-tsv", help="Convert condensed lineages to TSV format", description=to_tsv_desc, - epilog="Ex. 
usage: `bit-lineage to-tsv -i input-lineages.tsv -o formatted-tax.tsv`", + epilog="Ex. usage: `bit lineage to-tsv -i input-lineages.tsv -o formatted-tax.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/cli/mapped_reads_pid.py b/bit/cli/mapped_reads_pid.py index 560b0d1..327dd06 100644 --- a/bit/cli/mapped_reads_pid.py +++ b/bit/cli/mapped_reads_pid.py @@ -5,22 +5,30 @@ get_summary_stats) -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This script takes an input bam file and generates percent-identity information for mapped reads based on edit distance (using the NM field) and total alignment length. By default, it just prints out some summary stats. Specify an output file if you also want it to write out the percent identities for each mapped read. [bold]TO ALSO GET[/bold] coverage and detection information, use - `bit-cov-stats` instead. + `bit cov-stats` instead. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-get-mapped-reads-pid input.bam`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "mapped-reads-pid", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. 
usage: `bit mapped-reads-pid input.bam`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") diff --git a/bit/cli/mutate_seqs.py b/bit/cli/mutate_seqs.py index 65dd554..cc6cf28 100644 --- a/bit/cli/mutate_seqs.py +++ b/bit/cli/mutate_seqs.py @@ -6,19 +6,27 @@ from bit.modules.general import check_files_are_found, report_message -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This script will mutate all sequences of a nucleotide or amino-acid multifasta with the specified mutation rate. By default it only swaps bases, but it can optionally introduce indels also. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. usage: `bit-mutate-seqs -i input.fasta`", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "mutate-seqs", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit mutate-seqs -i input.fasta`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") diff --git a/bit/cli/summarize_assembly.py b/bit/cli/summarize_assembly.py index 29e90b2..e02d9c3 100644 --- a/bit/cli/summarize_assembly.py +++ b/bit/cli/summarize_assembly.py @@ -6,7 +6,7 @@ add_version_arg) -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This script outputs general summary stats for assemblies provided in fasta @@ -15,12 +15,20 @@ def build_parser(): total counts of any letter that is not "A", "T", "C", or "G".. """ - parser = argparse.ArgumentParser( - description=desc, - epilog="Ex. 
usage: bit-summarize-assembly assembly.fasta", - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "summarize-assembly", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + epilog="Ex. usage: `bit summarize-assembly assembly.fasta`", + formatter_class=CustomRichHelpFormatter, + add_help=False + ) required = parser.add_argument_group("Required Parameters") optional = parser.add_argument_group("Optional Parameters") diff --git a/bit/cli/table.py b/bit/cli/table.py index 530587f..fe49bbf 100644 --- a/bit/cli/table.py +++ b/bit/cli/table.py @@ -5,18 +5,26 @@ from bit.cli.common import CustomRichHelpFormatter, add_help, add_version_arg -def build_parser(): +def build_parser(parent_subparsers=None): desc = """ This program has utilities for working with tabular data. See subcommand-specific help menus for more info.. """ - parser = argparse.ArgumentParser( - description=desc, - formatter_class=CustomRichHelpFormatter, - add_help=False - ) + if parent_subparsers is not None: + parser = parent_subparsers.add_parser( + "table", + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False, + ) + else: + parser = argparse.ArgumentParser( + description=desc, + formatter_class=CustomRichHelpFormatter, + add_help=False + ) add_help(parser) add_version_arg(parser) @@ -36,7 +44,7 @@ def build_parser(): "colnames", help="List column names of a delimited file", description=colnames_desc, - epilog="Ex. usage: `bit-table colnames input.tsv`", + epilog="Ex. usage: `bit table colnames input.tsv`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -68,7 +76,7 @@ def build_parser(): "filter", help="Filter a table by values (strings) in a column", description=filter_desc, - epilog="Ex. usage: `bit-table filter -i input.tsv -w wanted-values.txt`", + epilog="Ex. 
usage: `bit table filter -i input.tsv -w wanted-values.txt`", formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -147,7 +155,7 @@ def build_parser(): "normalize", help="Normalize a table by CPM or median-ratio", description=normalize_desc, - epilog='Ex. usage: `bit-table normalize -i input-table.tsv -n CPM`', + epilog='Ex. usage: `bit table normalize -i input-table.tsv -n CPM`', formatter_class=CustomRichHelpFormatter, add_help=False ) @@ -196,7 +204,7 @@ def build_parser(): "summarize-column", help="Summarize stats of a numeric column", description=summarize_col_desc, - epilog="Ex. usage: `bit-table summarize-column data.tsv -c 2`", + epilog="Ex. usage: `bit table summarize-column data.tsv -c 2`", formatter_class=CustomRichHelpFormatter, add_help=False ) diff --git a/bit/modules/data_locations.py b/bit/modules/data_locations.py index bcbcf08..c78278c 100644 --- a/bit/modules/data_locations.py +++ b/bit/modules/data_locations.py @@ -50,7 +50,7 @@ def check_and_report_env_variables(): if not writable_dict[entry]: print() - wprint(color_text("The path set for the '" + str(entry) + "' variable is not writable. This may cause problems, so you might want to put it somewhere else (with `bit-data-locations set`).", "red")) + wprint(color_text("The path set for the '" + str(entry) + "' variable is not writable. 
This may cause problems, so you might want to put it somewhere else (with `bit data locations set`).", "red")) print() @@ -65,7 +65,7 @@ def check_location_var_is_set_and_writable(variable): except: print() wprint(color_text("The environment variable '" + str(variable) + "' does not seem to be set :(", "yellow")) - wprint("Try to set it with `bit-data-locations set`.") + wprint("Try to set it with `bit data locations set`.") print("") sys.exit(1) @@ -277,6 +277,6 @@ def notify_to_reactivate_conda(): print(color_text(" --------------------------------------------------------------------------------", "green")) wprint(color_text("Environment variables have been updated. But for the changes to take effect, be sure to reactivate the conda environment, e.g.:", "yellow")) print(f"\n `conda activate {curr_conda_name}`\n") - wprint(color_text("Then you can double-check with `bit-data-locations check`.", "yellow")) + wprint(color_text("Then you can double-check with `bit data locations check`.", "yellow")) print(color_text(" --------------------------------------------------------------------------------", "green")) print(color_text(" --------------------------------------------------------------------------------\n", "green")) diff --git a/bit/modules/go/get_go_dbs.py b/bit/modules/go/get_go_dbs.py index 54ee5b9..ec9b35c 100644 --- a/bit/modules/go/get_go_dbs.py +++ b/bit/modules/go/get_go_dbs.py @@ -11,7 +11,7 @@ def check_go_data_location_var_is_set(): go_data_dir = os.environ['GO_DB_DIR'] except: wprint(color_text("The environment variable 'GO_DB_DIR' does not seem to be set :(", "yellow")) - wprint("This shouldn't happen, check on things with `bit-data locations check`.") + wprint("This shouldn't happen, check on things with `bit data locations check`.") print("") sys.exit(0) @@ -65,7 +65,7 @@ def get_go_data(force_update=False, quiet=False): if not quiet: report_message(f"GO data already present at:") print(f" {go_db_dir}") - report_message(f"Run `bit-data get go-dbs
-f` if you want to re-download/update it.") + report_message(f"Run `bit data get go-dbs -f` if you want to re-download/update it.") print() return return diff --git a/bit/modules/go/go.py b/bit/modules/go/go.py index d26e257..ba5c6d3 100644 --- a/bit/modules/go/go.py +++ b/bit/modules/go/go.py @@ -198,7 +198,7 @@ def combine_summaries(args): # checking if sample names provided the length equals the number of input files if len(args.sample_names) != len(args.input_files): print("\n It seems the number of provided sample names doesn't match the number of provided input files :(") - print("\n Check usage with `bit-go combine-summaries -h`.\n") + print("\n Check usage with `bit go combine-summaries -h`.\n") sys.exit(0) for i, curr_sample in enumerate(args.sample_names): diff --git a/bit/modules/gtdb/get_accessions_from_gtdb.py b/bit/modules/gtdb/get_accessions_from_gtdb.py index f9b3cf9..21a322f 100644 --- a/bit/modules/gtdb/get_accessions_from_gtdb.py +++ b/bit/modules/gtdb/get_accessions_from_gtdb.py @@ -19,7 +19,7 @@ def get_accessions_from_gtdb(args): print("") wprint(color_text("A specific taxon needs to also be provided to the `-t` flag in order to use `--get-taxon-counts`.", "yellow")) print("") - wprint(" E.g.: bit-get-accessions-from-gtdb --get-taxon-counts -t Alteromonas") + wprint(" E.g.: bit get-accs-from-gtdb --get-taxon-counts -t Alteromonas") print("") sys.exit(0) @@ -195,7 +195,7 @@ def get_accessions(taxon, gtdb_tab, gtdb_rep_tab=None, rank=None, representative # report results print("") - + wprint(f"Wrote {len(target_accs):,} accession(s) to:") wprint(" " + color_text(acc_out_filename)) print("") diff --git a/bit/modules/gtdb/get_gtdb_data.py b/bit/modules/gtdb/get_gtdb_data.py index 9d0b48d..4d4bcda 100644 --- a/bit/modules/gtdb/get_gtdb_data.py +++ b/bit/modules/gtdb/get_gtdb_data.py @@ -15,7 +15,7 @@ def get_gtdb_data(force_update=False, quiet=False): if not quiet: report_message("GTDB data already present at:") print(f" {GTDB_dir}") - 
report_message("Run `bit-data get gtdb-data -f` if you want to re-download/update it.") + report_message("Run `bit data get gtdb-data -f` if you want to re-download/update it.") print("") return GTDB_dir return GTDB_dir @@ -29,7 +29,7 @@ def check_gtdb_location_var_is_set(): gtdb_data_dir = os.environ['GTDB_DIR'] except KeyError: wprint(color_text("The environment variable 'GTDB_DIR' does not seem to be set :(", "red")) - wprint("This shouldn't happen, check on things with `bit-data-locations check`.") + wprint("This shouldn't happen, check on things with `bit data-locations check`.") sys.exit(1) return gtdb_data_dir diff --git a/bit/modules/ncbi/get_ncbi_assembly_data.py b/bit/modules/ncbi/get_ncbi_assembly_data.py index ea266cd..df3e7f7 100644 --- a/bit/modules/ncbi/get_ncbi_assembly_data.py +++ b/bit/modules/ncbi/get_ncbi_assembly_data.py @@ -13,7 +13,7 @@ def check_ncbi_assembly_info_location_var_is_set(): ncbi_assembly_data_dir = os.environ['NCBI_assembly_data_dir'] except: wprint(color_text("The environment variable 'NCBI_assembly_data_dir' does not seem to be set :(", "yellow")) - wprint("This shouldn't happen, check on things with `bit-data locations check`.") + wprint("This shouldn't happen, check on things with `bit data locations check`.") print("") sys.exit(0) @@ -86,7 +86,7 @@ def get_ncbi_assembly_data(force_update=False, quiet=False): if not quiet: report_message(f"Assembly data already present at:") print(f" {ncbi_dir}") - report_message(f"Run `bit-data get ncbi-assembly-data -f` if you want to re-download/update it.") + report_message(f"Run `bit data get ncbi-assembly-data -f` if you want to re-download/update it.") print() return return diff --git a/bit/modules/ncbi/get_ncbi_tax_data.py b/bit/modules/ncbi/get_ncbi_tax_data.py index 3230fc4..10d426b 100644 --- a/bit/modules/ncbi/get_ncbi_tax_data.py +++ b/bit/modules/ncbi/get_ncbi_tax_data.py @@ -10,7 +10,7 @@ def check_tax_location_var_is_set(): ncbi_tax_data_dir = os.environ['TAXONKIT_DB'] except 
KeyError: wprint(color_text("The environment variable 'TAXONKIT_DB' does not seem to be set :(", "red")) - wprint("This shouldn't happen, check on things with `bit-data locations check`.") + wprint("This shouldn't happen, check on things with `bit data locations check`.") print("") sys.exit(1) @@ -65,7 +65,7 @@ def get_ncbi_tax_data(force_update=False, quiet = False): if not quiet: report_message(f"Tax data already present at:") print(f" {ncbi_data_dir}") - report_message(f"Run `bit-data get ncbi-tax-data -f` if you want to re-download/update it.") + report_message(f"Run `bit data get ncbi-tax-data -f` if you want to re-download/update it.") print() return return diff --git a/bit/smk/__init__.py b/bit/smk/__init__.py index 0ccaa16..8479f1e 100644 --- a/bit/smk/__init__.py +++ b/bit/smk/__init__.py @@ -1 +1 @@ -# this dir holds snakemake workflows utilized by bit scripts (*not* workflows pulled by `bit-get-workflow`) +# this dir holds snakemake workflows utilized by bit scripts (*not* workflows pulled by `bit get-workflow`) diff --git a/bit/smk/assemble.smk b/bit/smk/assemble.smk index 570f590..cd86a2c 100644 --- a/bit/smk/assemble.smk +++ b/bit/smk/assemble.smk @@ -165,7 +165,7 @@ else: """ printf "\nFiltering spades contigs for minimum length {params.min_contig_len}\n\n" >> {log} - bit-filter-fasta-by-length -i {input} --min-length {params.min_contig_len} -o {output} >> {log} + bit fasta filter-by-length -i {input} --min-length {params.min_contig_len} -o {output} >> {log} """ @@ -176,5 +176,5 @@ rule summarize_assemblies: output_dir + "/assembly-summaries.tsv" shell: """ - bit-summarize-assembly {input} -o {output} + bit summarize-assembly {input} -o {output} """ diff --git a/bit/tests/test_add_insertion.py b/bit/tests/test_add_insertion.py index 7cc62a3..b499124 100644 --- a/bit/tests/test_add_insertion.py +++ b/bit/tests/test_add_insertion.py @@ -347,7 +347,7 @@ def test_cli_basic_run(self, tmp_path): _write_fasta(ins, [("ins", "TTTT")]) cmd = [ - 
"bit-add-insertion", + "bit", "add-insertion", "-i", str(fa), "-I", str(ins), "-o", str(out), @@ -364,7 +364,7 @@ def test_cli_basic_run(self, tmp_path): def test_cli_help_exits_zero(self): import subprocess result = subprocess.run( - ["bit-add-insertion", "-h"], + ["bit", "add-insertion", "-h"], capture_output=True, text=True, ) assert result.returncode == 0 @@ -380,7 +380,7 @@ def test_cli_log_file(self, tmp_path): _write_fasta(ins, [("ins", "TTTT")]) cmd = [ - "bit-add-insertion", + "bit", "add-insertion", "-i", str(fa), "-I", str(ins), "-o", str(out), diff --git a/bit/tests/test_assemble.py b/bit/tests/test_assemble.py index cc210d2..2396737 100644 --- a/bit/tests/test_assemble.py +++ b/bit/tests/test_assemble.py @@ -17,7 +17,7 @@ def test_assemble(tmp_path): out_dir = tmp_path / "out" cmd = [ - "bit-assemble", + "bit", "assemble", "-r", str(reads_dir), "-o", str(out_dir), "--run-fastp", diff --git a/bit/tests/test_cov_stats.py b/bit/tests/test_cov_stats.py index 1662147..cadc868 100644 --- a/bit/tests/test_cov_stats.py +++ b/bit/tests/test_cov_stats.py @@ -12,7 +12,7 @@ def test_get_cov_stats(tmp_path): out_prefix = tmp_path / "cov-stats" cmd = [ - "bit-cov-stats", + "bit", "cov-stats", "-r", str(test_fasta_path), "-b", str(test_bam_path), "-o", str(out_prefix), diff --git a/bit/tests/test_ez_screen.py b/bit/tests/test_ez_screen.py index a4dfd3c..c1be366 100644 --- a/bit/tests/test_ez_screen.py +++ b/bit/tests/test_ez_screen.py @@ -12,7 +12,7 @@ def test_ez_screen_assembly(tmp_path): out_prefix = tmp_path / "ez-screen" cmd = [ - "bit-ez-screen", + "bit", "ez-screen", "assembly", "-a", str(test_assembly_fasta), "-t", str(test_targets_fasta), @@ -43,7 +43,7 @@ def test_ez_screen_reads(tmp_path): shutil.copy(R2, reads_dir) cmd = [ - "bit-ez-screen", + "bit", "ez-screen", "reads", "-t", str(test_targets_fasta), "-r", str(reads_dir), diff --git a/bit/tests/test_gen_reads.py b/bit/tests/test_gen_reads.py index 4330bff..a7a8bf8 100644 --- 
a/bit/tests/test_gen_reads.py +++ b/bit/tests/test_gen_reads.py @@ -20,7 +20,7 @@ def test_gen_reads(tmp_path): shutil.copy(test_fasta, tmp_path / "input.fasta") cmd = [ - "bit-gen-reads", + "bit", "gen-reads", "-i", str(tmp_path / "input.fasta"), "-o", str(tmp_path / "perfect-reads"), "-n", "2", @@ -137,7 +137,7 @@ def test_gen_reads_single_end(tmp_path): shutil.copy(test_fasta, tmp_path / "input.fasta") cmd = [ - "bit-gen-reads", + "bit", "gen-reads", "-i", str(tmp_path / "input.fasta"), "-o", str(tmp_path / "se-reads"), "-n", "1", @@ -258,7 +258,7 @@ def test_long_reads_cli(tmp_path): shutil.copy(test_fasta, tmp_path / "input.fasta") cmd = [ - "bit-gen-reads", + "bit", "gen-reads", "-i", str(tmp_path / "input.fasta"), "-o", str(tmp_path / "long-reads"), "-n", "10", @@ -466,7 +466,7 @@ def test_coverage_cli_paired_end(tmp_path): shutil.copy(test_fasta, tmp_path / "input.fasta") cmd = [ - "bit-gen-reads", + "bit", "gen-reads", "-i", str(tmp_path / "input.fasta"), "-o", str(tmp_path / "cov-reads"), "-c", "10", @@ -488,7 +488,7 @@ def test_coverage_cli_single_end(tmp_path): shutil.copy(test_fasta, tmp_path / "input.fasta") cmd = [ - "bit-gen-reads", + "bit", "gen-reads", "-i", str(tmp_path / "input.fasta"), "-o", str(tmp_path / "cov-se-reads"), "-c", "10", @@ -511,7 +511,7 @@ def test_coverage_tsv_cli(tmp_path): cov_file.write_text(f"{str(tmp_path / 'input.fasta')}\t5\n") cmd = [ - "bit-gen-reads", + "bit", "gen-reads", "-i", str(tmp_path / "input.fasta"), "-o", str(tmp_path / "cov-tsv-reads"), "-c", str(cov_file), diff --git a/bit/tests/test_get_workflow.py b/bit/tests/test_get_workflow.py index 6a66e0d..4619825 100644 --- a/bit/tests/test_get_workflow.py +++ b/bit/tests/test_get_workflow.py @@ -115,7 +115,7 @@ def test_dl_wf_full_flow(mock_download, mock_check_dir, mock_get_versions): {"1.0.0": "metagenomics-wf-v1.0.0", "1.1.0": "metagenomics-wf-v1.1.0"} ) - # simulating input args (e.g., bit-get-wf metagenomics) + # simulating input args (e.g., bit get-workflow 
metagenomics) args = Namespace( workflow="metagenomics", wanted_version=None, diff --git a/bit/tests/test_summarize_assembly.py b/bit/tests/test_summarize_assembly.py index 0b83300..15e13df 100644 --- a/bit/tests/test_summarize_assembly.py +++ b/bit/tests/test_summarize_assembly.py @@ -9,7 +9,7 @@ def test_summarize_assembly(tmp_path): out_dir.mkdir() cmd = [ - "bit-summarize-assembly", + "bit", "summarize-assembly", str(test_assembly), "-o", f"{out_dir}/test.tsv" ] diff --git a/bit/tests/test_table.py b/bit/tests/test_table.py index 675df92..d0e70d9 100644 --- a/bit/tests/test_table.py +++ b/bit/tests/test_table.py @@ -123,7 +123,7 @@ def test_no_matches_removes_output(filter_table_tsv, tmp_path): def test_cli_filter_basic(filter_table_tsv, wanted_col1, tmp_path): out = tmp_path / "out.tsv" - run_cli(["bit-table", "filter", "-i", str(filter_table_tsv), "-w", str(wanted_col1), "-o", str(out)]) + run_cli(["bit", "table", "filter", "-i", str(filter_table_tsv), "-w", str(wanted_col1), "-o", str(out)]) lines = out.read_text().splitlines() assert lines[0] == "id\tvalue\tcategory" assert len(lines) == 3 @@ -132,7 +132,7 @@ def test_cli_filter_basic(filter_table_tsv, wanted_col1, tmp_path): def test_cli_filter_column_flag(filter_table_tsv, wanted_col3, tmp_path): out = tmp_path / "out.tsv" - run_cli(["bit-table", "filter", "-i", str(filter_table_tsv), "-w", str(wanted_col3), "-o", str(out), "-c", "3"]) + run_cli(["bit", "table", "filter", "-i", str(filter_table_tsv), "-w", str(wanted_col3), "-o", str(out), "-c", "3"]) lines = out.read_text().splitlines() assert len(lines) == 3 assert all(l.split("\t")[2] == "alpha" for l in lines[1:]) @@ -204,7 +204,7 @@ def test_restore_zero_columns(): def test_cli_cpm(norm_table, tmp_path): out = tmp_path / "out_cpm.tsv" - run_cli(["bit-table", "normalize", "-i", str(norm_table), "-n", "CPM", "-o", str(out)]) + run_cli(["bit", "table", "normalize", "-i", str(norm_table), "-n", "CPM", "-o", str(out)]) result = pd.read_csv(out, sep="\t", 
index_col=0) expected = {"g1": 166666.66666666666, "g2": 333333.3333333333, "g3": 500000.0} for gene, val in expected.items(): @@ -214,7 +214,7 @@ def test_cli_cpm(norm_table, tmp_path): def test_cli_mr(norm_table, tmp_path): out = tmp_path / "out_mr.tsv" - run_cli(["bit-table", "normalize", "-i", str(norm_table), "-n", "MR", "-o", str(out)]) + run_cli(["bit", "table", "normalize", "-i", str(norm_table), "-n", "MR", "-o", str(out)]) result = pd.read_csv(out, sep="\t", index_col=0) expected = {"g1": 13.572088082974535, "g2": 27.14417616594907, "g3": 40.71626424892361} for gene, val in expected.items(): @@ -264,7 +264,7 @@ def test_detect_header_no_header_named_column_fails(): # ── summarize-column cli tests ──────────────────────────────────────────────── def test_summarize_first_column_from_stdin(summarize_table): - result = run_cli(["bit-table", "summarize-column", "-c", "1"], input=summarize_table.read_text()) + result = run_cli(["bit", "table", "summarize-column", "-c", "1"], input=summarize_table.read_text()) lines = parse_summary_lines(result.stdout) assert lines[0] == " Column '1' summary" stats = {"N:": "3", "Min:": "2", "Max:": "5", "Sum:": "10", "Mean:": "3", "Median:": "3", "StDev:": "1.25"} @@ -275,7 +275,7 @@ def test_summarize_first_column_from_stdin(summarize_table): def test_summarize_second_column_by_name(summarize_table): - result = run_cli(["bit-table", "summarize-column", str(summarize_table), "-c", "lee"]) + result = run_cli(["bit", "table", "summarize-column", str(summarize_table), "-c", "lee"]) lines = parse_summary_lines(result.stdout) assert lines[0] == " Column 'lee' summary" stats = {"N:": "3", "Min:": "20", "Max:": "40", "Sum:": "90", "Mean:": "30", "Median:": "30", "StDev:": "8.16"} diff --git a/conda-recipe/bit-subcommands-tab-completion.sh b/conda-recipe/bit-subcommands-tab-completion.sh index 927e073..d3fc730 100644 --- a/conda-recipe/bit-subcommands-tab-completion.sh +++ b/conda-recipe/bit-subcommands-tab-completion.sh @@ -1,32 +1,13 
@@ #!/usr/bin/env bash -### handling setting up of argcomplete for bit commands with subcommands ### +### tab completion for the `bit` command via argcomplete ### -ARGCOMPLETE_COMMANDS=( - bit-data - bit-ez-screen - bit-fasta - bit-genbank - bit-go - bit-itol - bit-kraken2 - bit-lineage - bit-table -) - -# checking interactive shell +# only activate in interactive shells case "$-" in *i*) ;; *) return 0 2>/dev/null || exit 0 ;; esac -# checking for argcomplete helper -if ! command -v register-python-argcomplete >/dev/null 2>&1; then - return 0 2>/dev/null || exit 0 +if command -v register-python-argcomplete >/dev/null 2>&1; then + eval "$(register-python-argcomplete bit)" fi - -for cmd in "${ARGCOMPLETE_COMMANDS[@]}"; do - if command -v "$cmd" >/dev/null 2>&1; then - eval "$(register-python-argcomplete "$cmd")" - fi -done diff --git a/conda-recipe/meta.yaml b/conda-recipe/meta.yaml index 31ef157..8338dd4 100644 --- a/conda-recipe/meta.yaml +++ b/conda-recipe/meta.yaml @@ -53,7 +53,7 @@ requirements: test: commands: - bit-cov-analyzer -v + bit -v about: home: https://github.com/AstrobioMike/bit diff --git a/dev-setup.sh b/dev-setup.sh index ee6a862..002737c 100755 --- a/dev-setup.sh +++ b/dev-setup.sh @@ -22,12 +22,12 @@ BIN_DIR=$(dirname $(which python)) # removing stale symlinks that point into bit/scripts/ # (these conflict with pyproject.toml entry points) -for f in ${BIN_DIR}/*; do - target=$(readlink "$f" 2>/dev/null) - if [[ "$target" == *"bit/scripts"* ]]; then - rm -f "$f" - fi -done +# for f in ${BIN_DIR}/*; do +# target=$(readlink "$f" 2>/dev/null) +# if [[ "$target" == *"bit/scripts"* ]]; then +# rm -f "$f" +# fi +# done pip install --no-build-isolation -e . @@ -43,25 +43,28 @@ pip install --no-build-isolation -e . 
# done # setting up tab-completion for the bit commands with subcommands -ARGCOMPLETE_COMMANDS=( - bit-data - bit-ez-screen - bit-fasta - bit-genbank - bit-go - bit-itol - bit-kraken2 - bit-lineage - bit-table -) - -for cmd in "${ARGCOMPLETE_COMMANDS[@]}"; do - if command -v "$cmd" >/dev/null 2>&1; then - eval "$(register-python-argcomplete "$cmd")" - fi -done +# ARGCOMPLETE_COMMANDS=( +# bit +# bit-data +# bit-ez-screen +# bit-fasta +# bit-genbank +# bit-go +# bit-itol +# bit-kraken2 +# bit-lineage +# bit-table +# ) +# for cmd in "${ARGCOMPLETE_COMMANDS[@]}"; do +# if command -v "$cmd" >/dev/null 2>&1; then +# eval "$(register-python-argcomplete "$cmd")" +# fi +# done +if command -v register-python-argcomplete >/dev/null 2>&1; then + eval "$(register-python-argcomplete bit)" +fi ## if changing conda versions and wanting to install locally entirely (rather than using a prior official conda install of bit) # conda build -c conda-forge -c bioconda conda-recipe/ diff --git a/pyproject.toml b/pyproject.toml index 90e0afd..fa6e174 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -20,121 +20,4 @@ include-package-data = true bit = ["tests/data/*", "smk/*.smk", "smk/envs/*"] [project.scripts] - bit = "bit.cli.bit:main" - - -## ncbi/gtdb-related ## -bit-dl-ncbi-assemblies = "bit.cli.dl_ncbi_assemblies:main" -bit-get-accs-from-gtdb = "bit.cli.get_accessions_from_gtdb:main" - - -## coverage / mapping ## -bit-cov-analyzer = "bit.cli.cov_analyzer:main" -bit-cov-stats = "bit.cli.cov_stats:main" -bit-mapped-reads-pid = "bit.cli.mapped_reads_pid:main" - # think about combining this with bit-cov-stats and have a flag for writting out all read pids (or more info per read) - - -## sequence manipulation/read generation ## -bit-gen-reads = "bit.cli.gen_reads:main" -bit-mutate-seqs = "bit.cli.mutate_seqs:main" -bit-add-insertion = "bit.cli.add_insertion:main" - # maybe put these under bit-seq or bit-simulate with subcommands: - # gen-reads - # mutate-seqs - # add-insertion - - -## 
sequence searching ## -bit-ez-screen = "bit.cli.ez_screen:main" - - -## fasta utilities ## -bit-fasta = "bit.cli.fasta:main" - # subcommands: - # calc-gc - # calc-var-in-msa - # count - # extract-by-coords - # extract-by-headers - # extract-by-primers - # filter-by-length - # modify-headers - # remove-wraps - # to-bed - # to-genbank - - -## assembly ## -bit-assemble = "bit.cli.assemble:main" -bit-summarize-assembly = "bit.cli.summarize_assembly:main" - - -## taxonomy and lineage ## -bit-kraken2 = "bit.cli.kraken2:main" - # subcommands: - # tax-summary - # tax-plots -bit-lineage = "bit.cli.lineage:main" - # subcommands: - # from-taxids - # to-tsv - - -## table utilities ## -bit-table = "bit.cli.table:main" - # subcommands: - # colnames - # filter - # normalize - # summarize-column - - -## workflows ## -bit-get-workflow = "bit.cli.get_workflow:main" - # genome-summarize - # metagenomics - # SRA download - - -## functional-annotation helpers ## -bit-filter-ko-results = "bit.cli.filter_kofamscan_results:main" -bit-go = "bit.cli.go:main" - # subcommands: - # get-term-info - # summarize-annotations - # combine-summaries - # slim-terms - - -## genbank manipulations ## -bit-genbank = "bit.cli.genbank:main" - # subcommands: - # to-fasta - # to-AA-seqs - # to-cds-tsv - # to-cds-seqs - - -## iToL helpers ## -bit-itol = "bit.cli.itol:main" - # subcommands: - # binary-dataset - # colorstrip - # map - # text-dataset - - -## bit-database related ## -bit-data = "bit.cli.data:main" - # subcommands: - # locations - # check - # set - # get - # ncbi-assembly-data - # ncbi-tax-data - # go-dbs - # gtdb-data - # test-data diff --git a/workflows/genome-summarize-wf/README.md b/workflows/genome-summarize-wf/README.md index ee1bb23..e6a7224 100644 --- a/workflows/genome-summarize-wf/README.md +++ b/workflows/genome-summarize-wf/README.md @@ -40,7 +40,7 @@ _bit_ should be installed via conda as described [here](https://github.com/Astro ### Retrieving the worklfow ```bash -bit-get-workflow 
genome-summarize +bit get-workflow genome-summarize ``` ### Modifying the config.yaml @@ -67,6 +67,6 @@ See `snakemake -h` for more options and details. --- ## Version info -Note that the workflows are version independently of the _bit_ package. When you pull one with `bit-get-workflow`, the directory name will have the version, and it is also listed at the top of the Snakefile. +Note that the workflows are version independently of the _bit_ package. When you pull one with `bit get-workflow`, the directory name will have the version, and it is also listed at the top of the Snakefile. All versions of programs used can be found in their corresponding conda yaml file in the envs/ directory. diff --git a/workflows/genome-summarize-wf/Snakefile b/workflows/genome-summarize-wf/Snakefile index b7ef132..fee65fc 100644 --- a/workflows/genome-summarize-wf/Snakefile +++ b/workflows/genome-summarize-wf/Snakefile @@ -119,7 +119,7 @@ if config["is_euk"]: assembly_extension = config["assembly_extension"] resources: cpus = config["threads"], - mem_mb = config["CAT_memory_resources"] + mem_mb = config["CAT_memory_resources"] log: config["logs_dir"] + "{genome_ID}-CAT.log" output: @@ -291,7 +291,7 @@ rule setup_checkm2_db: # so will be set when the conda environment is started from now on mkdir -p ${{CONDA_PREFIX}}/etc/conda/activate.d/ echo 'export CHECKM2DB={params.checkm2_db_dir}/{params.checkm2_db_filename}' >> ${{CONDA_PREFIX}}/etc/conda/activate.d/set_env_vars.sh - + # downloading ref db (will move to where we want it next) checkm2 database --download > {log} 2>&1 diff --git a/workflows/metagenomics-wf/README.md b/workflows/metagenomics-wf/README.md index 7d7fb2c..bdc2c05 100644 --- a/workflows/metagenomics-wf/README.md +++ b/workflows/metagenomics-wf/README.md @@ -43,7 +43,7 @@ _bit_ should be installed via conda as described [here](https://github.com/Astro ### Retrieving the worklfow ```bash -bit-get-workflow metagenomics +bit get-workflow metagenomics ``` ### Creating the 
input file and modifying the config.yaml @@ -109,6 +109,6 @@ In the directory the workflow was executed from: a `logs/` directory will hold l --- ## Version info -Note that the workflows are version independently of the _bit_ package. When you pull one with `bit-get-workflow`, the directory name will have the version, and it is also listed at the top of the Snakefile. +Note that the workflows are version independently of the _bit_ package. When you pull one with `bit get-workflow`, the directory name will have the version, and it is also listed at the top of the Snakefile. All versions of programs used can be found in their corresponding conda yaml file in the envs/ directory. diff --git a/workflows/metagenomics-wf/Snakefile b/workflows/metagenomics-wf/Snakefile index f879217..6de8611 100644 --- a/workflows/metagenomics-wf/Snakefile +++ b/workflows/metagenomics-wf/Snakefile @@ -80,7 +80,7 @@ else: dirs_to_create = [config["outputs_dir"], fastqc_out_dir, filtered_reads_dir, genes_dir, annotations_and_tax_dir, mapping_dir, - combined_outputs_dir, logs_dir, config["REF_DB_ROOT_DIR"], + combined_outputs_dir, logs_dir, config["REF_DB_ROOT_DIR"], benchmarks_dir] @@ -153,7 +153,7 @@ rule summarize_MAG_KO_annots_with_KEGG_Decoder: # KEGGDecoder splits on the first underscore to identify unique genome/MAG IDs # this can be problematic with how things are named, so we are swapping them all to not have - # any "_" first, then afterwards we are changing the output table back to the original names so + # any "_" first, then afterwards we are changing the output table back to the original names so # they match elsewhere (they will still be slightly different in the html output, but that is # only manually explored anyway) @@ -403,7 +403,7 @@ rule gtdbtk_on_MAGs: rule filter_checkm_results_and_copy_MAGs: - """ + """ Filters checkm results based on est. 
completion, redundancy, and strain heterogeneity set in 'config.yaml' Defaults are conservatively 90, 10, and 50 """ @@ -421,7 +421,7 @@ rule filter_checkm_results_and_copy_MAGs: benchmark: benchmarks_dir + "filtering_checkm_results_and_copying_MAGs-benchmarks.tsv" shell: - """ + """ # only running if there were bins recovered if [ $(find {params.bins_dir} -name "*.fasta" | wc -l | sed 's/^ *//') -gt 0 ]; then @@ -597,7 +597,7 @@ rule metabat_binning: jgi_summarize_bam_contig_depths --outputDepth {output.depth_file} --percentIdentity 97 --minContigLength 1000 --minContigDepth 1.0 --referenceFasta {input.assembly} {input.bam} > {log} 2>&1 # only running if there are contigs with coverage information in the coverage file we just generated - if [ $(wc -l {output.depth_file} | sed 's/^ *//' | cut -f 1 -d " ") -gt 1 ]; then + if [ $(wc -l {output.depth_file} | sed 's/^ *//' | cut -f 1 -d " ") -gt 1 ]; then metabat2 --inFile {input.assembly} --outFile {params.prefix} --abdFile {output.depth_file} -t {resources.cpus} >> {log} 2>&1 else printf "\n\nThere was no coverage info generated in {output.depth_file}, so no binning with metabat was performed.\n\n" >> {log} @@ -605,7 +605,7 @@ rule metabat_binning: # changing extensions from .fa to .fasta to match nt fasta extension elsewhere in GeneLab find {params.bins_dir} -name {wildcards.ID}*.fa > {params.tmp_bins_file} - + if [ -s {params.tmp_bins_file} ]; then paste -d " " <( sed 's/^/mv /' {params.tmp_bins_file} ) <( sed 's/.fa/.fasta/' {params.tmp_bins_file} ) > {params.tmp_rename_script} bash {params.tmp_rename_script} @@ -662,7 +662,7 @@ rule make_combined_gene_level_tables: rule combine_contig_tax_and_coverage: """ - This rule combines the contig-level taxonomic and coverage information for each individual sample. + This rule combines the contig-level taxonomic and coverage information for each individual sample. 
""" input: cov = mapping_dir + "{ID}-contig-coverages.tsv", @@ -1058,7 +1058,7 @@ rule summarize_assemblies: conda: "envs/bit.yaml" input: - expand(assemblies_dir + "{ID}-assembly.fasta", ID = sample_ID_list) + expand(assemblies_dir + "{ID}-assembly.fasta", ID = sample_ID_list) output: assemblies_dir + config["additional_filename_prefix"] + f"assembly-summaries.tsv" benchmark: @@ -1284,14 +1284,14 @@ if config["single_end_data"] != "TRUE": shell: """ multiqc -q -n {params.out_filename_prefix} --force --cl-config 'max_table_rows: 99999999' --interactive --config {params.config_file} {input} > /dev/null 2>&1 - + # removing the individual fastqc files rm -rf {params.reads_dir}*fastqc* # making an output report directory and moving things into it mkdir -p {params.int_out_dir} mv {params.int_html_file} {params.int_out_data_dir} {params.int_out_dir} - + # zipping and removing unzipped dir zip -q -r {params.int_zip} {params.int_out_dir} && rm -rf {params.int_out_dir} @@ -1377,14 +1377,14 @@ else: shell: """ multiqc -q -n {params.out_filename_prefix} --force --cl-config 'max_table_rows: 99999999' --interactive --config {params.config_file} {input} > /dev/null 2>&1 - + # removing the individual fastqc files rm -rf {params.reads_dir}*fastqc* # making an output report directory and moving things into it mkdir -p {params.int_out_dir} mv {params.int_html_file} {params.int_out_data_dir} {params.int_out_dir} - + # zipping and removing unzipped dir zip -q -r {params.int_zip} {params.int_out_dir} && rm -rf {params.int_out_dir} @@ -1460,7 +1460,7 @@ rule setup_checkm2_db: # so will be set when the conda environment is started from now on mkdir -p ${{CONDA_PREFIX}}/etc/conda/activate.d/ echo 'export CHECKM2DB={params.checkm2_db_dir}/{params.checkm2_db_filename}' >> ${{CONDA_PREFIX}}/etc/conda/activate.d/set_env_vars.sh - + # downloading ref db (will move to where we want it next) checkm2 database --download > {log} 2>&1 @@ -1499,7 +1499,7 @@ rule setup_KOFamScan_db: mkdir -p 
{params.ko_db_dir} printf "### Setting up KOFamScan reference database ###\n\n" > {log} 2>&1 - + # using https instead of ftp for those whose systems that don't have access to the ftp servers printf "\n Downloading ko_list file:\n\n" >> {log} 2>&1 diff --git a/workflows/sra-download-wf/README.md b/workflows/sra-download-wf/README.md index 12660d4..ebe4d45 100644 --- a/workflows/sra-download-wf/README.md +++ b/workflows/sra-download-wf/README.md @@ -25,7 +25,7 @@ _bit_ should be installed via conda as described [here](https://github.com/Astro ### Retrieving the worklfow ```bash -bit-get-workflow sra-download +bit get-workflow sra-download ``` ### Creating the input file and modifying the config.yaml @@ -81,6 +81,6 @@ Note that by default the original files will be removed after they are combined --- ## Version info -Note that the workflows are versioned independently of the _bit_ package. When you pull one with `bit-get-workflow`, the directory name will have the version, and it is also listed at the top of the Snakefile. +Note that the workflows are versioned independently of the _bit_ package. When you pull one with `bit get-workflow`, the directory name will have the version, and it is also listed at the top of the Snakefile. All versions of programs used can be found in their corresponding conda yaml file in the envs/ directory.