MOSTAR is a comprehensive bioinformatics pipeline for downstream analysis of whole-genome Oxford Nanopore sequencing data (ONT reads). The pipeline constructs highly polished genomes (using hybrid or non-hybrid assembly) and performs functional annotation, AMR profiling, ICE detection, and taxonomic classification, with built-in quality controls and an interactive HTML report. The name MOSTAR is inspired by the historic Stari Most (Old Bridge) of Mostar, a symbol of connection and cultural resilience.
MOSTAR has been developed and tested on S. aureus, B. fragilis, and H. influenzae strains, but will work with any bacterial species as long as the correct genome size and ONT model are specified. The pipeline bundles some of the most widely used tools in bioinformatics and is designed to be a "one-stop shop" for most bacterial analyses. Finally, the pipeline provides results and log files from every included tool.
When Illumina short reads are provided, MOSTAR leverages them during the long-read pre-processing phase to guide Filtlong in selecting ONT reads with the highest k-mer consistency relative to the high-accuracy short-read data. This ensures the assembly begins with the most reliable and representative long-read subset, directly improving contiguity and reducing the introduction of systematic errors before assembly begins. In ONT-only mode, quality filtering proceeds using read length and quality score thresholds without short-read guidance.
MOSTAR supports both ONT-only and hybrid assembly modes through a tiered polishing strategy. All assemblies undergo ONT-based polishing with Medaka to correct homopolymer errors and indels characteristic of nanopore sequencing. In hybrid mode, a second polishing pass with Polypolish uses per-read, multi-alignment short-read mappings to further resolve errors in repeat regions that single-mapping approaches cannot correct.
MOSTAR provides an end-to-end biological characterisation of each assembled genome within a single run. It automatically identifies the organism to species or subspecies level via Kraken2, profiles the complete resistome with NCBI AMRFinder+, and evaluates genomic plasticity through geNomad plasmid and provirus detection. The pipeline optionally performs full functional annotation via Bakta, enabling downstream comparative genomics and submission-ready genome records.
By tightly integrating NCBI AMRFinder+ with MacSyFinder CONJScan, MOSTAR cross-references the physical location of resistance genes with conjugative transfer machinery. This allows the pipeline to distinguish between fixed chromosomal resistance — which poses a contained clinical risk — and resistance elements embedded within active Integrative and Conjugative Elements (ICEs), which are capable of horizontal transfer to naive recipient strains. This distinction provides a substantially more accurate assessment of horizontal gene transfer potential and the true epidemiological threat posed by the isolate.
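The cross-referencing idea can be illustrated with a small interval check. The sketch below is not MOSTAR's implementation. It assumes two hypothetical, simplified tab-separated inputs (ICE regions and AMR genes, each with contig, start, end, and name columns) rather than the real AMRFinder+ or MacSyFinder output formats, and it labels a gene as mobile only when it lies entirely inside an ICE on the same contig.

```shell
# Hypothetical helper: report, for each AMR gene, the ICE that contains it.
#   $1 = ICE regions TSV: contig <TAB> start <TAB> end <TAB> ice_id
#   $2 = AMR genes TSV:   contig <TAB> start <TAB> end <TAB> gene
flag_mobile_amr() {
    awk -F'\t' '
        # First file: store every ICE interval.
        FILENAME == ARGV[1] {
            n++
            contig[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; ice[n] = $4
            next
        }
        # Second file: look for an ICE on the same contig that fully
        # contains the gene; otherwise report "none".
        {
            label = "none"
            for (i = 1; i <= n; i++) {
                if ($1 == contig[i] && ($2 + 0) >= lo[i] && ($3 + 0) <= hi[i]) {
                    label = ice[i]
                    break
                }
            }
            print $4 "\t" label
        }
    ' "$1" "$2"
}
```

Called as `flag_mobile_amr ice_regions.tsv amr_genes.tsv`, it prints one line per gene with either the enclosing ICE identifier or `none`. Requiring full containment is a deliberately strict choice; a partial-overlap rule would flag genes that straddle an ICE boundary as well.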
MOSTAR includes dedicated detection of Integrative and Conjugative Elements using MacSyFinder with the CONJScan model database. ICE boundaries are resolved to exact genomic coordinates and cross-referenced against the Bakta functional annotation to identify flanking genes, attachment sites, and associated cargo. Detected ICEs are rendered directly on the circular genome map with strand-aware orientation arrows, providing immediate visual context for their genomic position relative to resistance and virulence loci.
Every MOSTAR run produces a single, portable HTML report requiring no server or external dependencies. The report includes circular genome maps with zoomable, cursor-tracked pan functionality, a colour-coded complete resistome table, prophage annotations, mobile resistome cross-reference, and a full software version manifest for reproducibility. All genome maps are embedded as base64-encoded images, ensuring the report remains fully self-contained for sharing and archiving.
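To show why base64 embedding makes a report self-contained, here is a minimal sketch of turning a PNG into an inline HTML image tag. It illustrates the general technique only and is not taken from MOSTAR's report generator; the function name `embed_png` is hypothetical.

```shell
# Hypothetical helper: print an <img> tag with the PNG inlined as a data URI.
embed_png() {
    # Strip newlines because some base64 implementations wrap their output.
    local data
    data=$(base64 < "$1" | tr -d '\n')
    printf '<img src="data:image/png;base64,%s">\n' "$data"
}
```

Because the image bytes travel inside the HTML itself, the file can be moved or e-mailed without an accompanying image folder, at the cost of a roughly one-third increase in size over the raw PNG.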
Core workflow (ONT-only and hybrid):
* Long-read quality trimming (Filtlong)
* De novo assembly (Flye)
* ONT consensus polishing (Medaka)
* AMR profiling (AMRFinder+)
* Interactive HTML report

Additional steps in hybrid mode:
* Short-read quality trimming (Fastp)
* Short-read alignment to ONT consensus (BWA-MEM)
* Short-read polishing (Polypolish)

Optional modules:
* Taxonomic classification (Kraken2 / EMU)
* Functional annotation (Bakta)
* ICE detection: Integrative and Conjugative Elements (MacSyFinder / CONJScan)
* Plasmid-borne AMR cross-referencing (geNomad + AMRFinder+)
* Prophage detection and localisation (geNomad)
A successful run produces the following output, including the final polished FASTA, the HTML report, and the individual output files and logs from all included tools.
Output_folder
|- amr_results
|  |- maps/ (Contains high-res .png circular genome maps)
|  |- AMR_Report.tsv
|- annotation
|- flye
|- ice_detection
|- intermediate
|- logs
|- medaka
|- taxonomy
|- amr_summary.html
|- MOSTAR_Final_Report.html
|- MOSTAR_Assembly.fasta
The installation has been designed to be as simple as possible. The included YML creates a separate environment with all required dependencies. The only manual step is downloading and configuring the databases. On some systems geNomad can cause dependency conflicts; if that happens, follow the alternative steps below.
# Download the repository
git clone https://github.com/nermze/mostar.git
# Change to MOSTAR dir
cd mostar
# Create mostar_env using supplied YML
conda env create -f environment.yml -v
conda activate mostar_env
# Install MOSTAR
python -m pip install .
# Test the install
mostar --help
# If you encounter installation problems, first remove geNomad from the YML, then run
conda env create -f environment.yml -v
conda activate mostar_env
conda install -c conda-forge -c bioconda genomad
# Use micromamba to install (much faster)
conda install micromamba
micromamba env create -f environment.yml -v
micromamba activate mostar_env
pip install .
# Activate env (if not activated)
conda activate mostar_env
# Download AMRFinder+ database:
amrfinder -u
# Download bakta database (Specify light or full)
bakta_db download --output <output-path> --type [light|full]
# Download Kraken2 database
# To download the small pre-built db (any Kraken2 compatible DB will also work)
mkdir -p ~/kraken2_db && cd ~/kraken2_db
wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_08gb_20240904.tar.gz
tar -xvzf k2_pluspf_08gb_20240904.tar.gz
# Download geNomad database in current directory, approx 1.5Gb
genomad download-database .
# Download standard EMU database
# The pipeline will auto-download the EMU-db if --emu-db is specified.
# If the automatic download fails, use the steps below
pip install osfclient
export EMU_DATABASE_DIR=<path_to_database>
cd ${EMU_DATABASE_DIR}
osf -p 56uf7 fetch osfstorage/emu-prebuilt/emu.tar
tar -xvf emu.tar
# Required:
* ONT-reads
* Genome size
* Model
* Output
# Run MOSTAR in ONT-only mode:
mostar --ont ont.fq.gz --genome-size [size] --output [dir] --model [model]
# Run MOSTAR in Hybrid mode:
mostar --ont ont.fq.gz --genome-size [size] --output [dir] --model [model] --r1 R1.fq --r2 R2.fq
# The "Everything" Run (Taxonomy, Annotation, ICE, and Plasticity/Prophages):
mostar --ont ont_read.fastq.gz --r1 read1.fastq.gz --r2 read2.fastq.gz \
--genome-size 1.9m --output Output \
--kraken2-db kraken2_db_path \
--bakta-db db-light_path --ice \
--genomad-db genomad_db_path --plasticity

| Required | Tool/Name | Description |
|---|---|---|
| --ont | ONT Reads | Nanopore long-reads (.fastq.gz) |
| --genome-size | Genome Size | Estimated size (e.g., 2.1m, 500k) |
| --output | Output | Directory name for output files |
| --model | Model | Default: r1041_e82_400bps_sup_v5.2.0 |
| Options | | |
| --r1/--r2 | Illumina | Forward & Reverse short-reads (.fastq.gz) |
| --organism | AMRFinder+ | Organism (e.g., Escherichia, Staphylococcus) |
| --meta | Flye | Enable Meta-Genome mode, omit --genome-size [Default: disabled] |
| Annotation | | |
| --bakta-db | Bakta | Path to Bakta database |
| --bakta-ref | Bakta | Annotation reference sequence (.gff) |
| --complete | Bakta | Enable if sequence is complete (circular) [Default: disabled] |
| Mobile element Detection | | |
| --ice | MacSyFinder | Use with --bakta-db [Default: disabled] |
| --plasticity | geNomad | Plasticity and prophage tracker [Default: disabled] |
| Classification | | |
| --kraken2-db | Kraken2 | Requires path to pre-built Kraken2 database |
| --confidence | Kraken2 | Kraken2 confidence threshold [Default: 0.1] |
| --emu-db | EMU | Requires EMU database path, auto-download [16S Amplicon classifier] |
| Other | | |
| --cleanup | Cleanup | Delete intermediate files |
| --threads | Threads | Select number of threads |
| --help/-h | Help | Show help menu |
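Since --genome-size takes shorthand values such as 2.1m or 500k, a quick pre-flight estimate of sequencing depth can be derived from it. The sketch below is an illustration under assumptions: the helper names are hypothetical, the suffixes k, m, and g are taken to mean thousand, million, and billion bases, and the FASTQ is assumed to use four lines per read.

```shell
# Hypothetical helper: convert a size string such as 2.1m or 500k to base pairs.
size_to_bp() {
    awk -v s="$1" 'BEGIN {
        unit = tolower(substr(s, length(s)))
        mult = 1
        if (unit == "k") mult = 1e3
        else if (unit == "m") mult = 1e6
        else if (unit == "g") mult = 1e9
        if (mult > 1) s = substr(s, 1, length(s) - 1)
        printf "%.0f\n", s * mult
    }'
}

# Hypothetical helper: estimated depth = total sequenced bases / genome size.
#   $1 = gzipped FASTQ, $2 = genome size string
estimate_depth() {
    local bp total
    bp=$(size_to_bp "$2")
    # Sequence lines are every 4th line starting from line 2.
    total=$(gzip -dc "$1" | awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }')
    awk -v t="$total" -v g="$bp" 'BEGIN { printf "%.1f\n", t / g }'
}
```

For example, `estimate_depth ont.fq.gz 2.1m` prints the approximate fold coverage of the ONT reads before any filtering, which gives a rough sense of whether the dataset is deep enough to assemble.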
The report features key run-metrics, including assembly statistics and number of contigs. The report is dynamic and will adapt to user input, as some of the tools like taxonomy and short-read polishing are optional.
The report will also draw interactive genome maps, with visualization of AMR-gene locations, direction, detected ICE, and GC-content.
If ICE detection has been enabled, the pipeline will extract coordinates from the annotation file, and display the results.
Finally, the report also features a detailed AMR table.
1. Fastp
2. Flye
3. Medaka
4. BWA
5. AMRFinder+
6. Bakta
7. Polypolish
8. Filtlong
9. Samtools
10. Minimap2
11. Kraken2
12. EMU
13. MacSyFinder
14. geNomad
15. Python3 (
If --model is not specified, MOSTAR defaults to r1041_e82_400bps_sup_v5.2.0, which corresponds to R10.4.1 flowcells basecalled with the Super Accuracy model at 400 bps. This default is appropriate for most modern ONT runs but must be changed if your data was generated on a different flowcell or basecalling configuration; using the wrong model is one of the most common causes of poor polishing outcomes. To list all models available in your Medaka installation, run:
medaka tools list_models
The --ice module depends on protein sequences produced by Bakta to query the MacSyFinder CONJScan database. If --bakta-db is not provided, Bakta annotation is skipped and no .faa file will be produced, causing ICE detection to be silently bypassed. Always pair --ice with --bakta-db to ensure this module runs. If you see the warning No protein file found — skipping ICE detection, this is the cause.
When --kraken2-db is provided, MOSTAR uses the top-confidence Kraken2 hit to identify the organism and passes it to AMRFinder+ as the --organism flag, enabling species-specific point mutation screening in addition to gene-based resistance detection. Point mutation models are only available for a subset of clinically relevant organisms. If your organism is not supported, AMRFinder+ will still run in gene-detection mode without point mutations. To see all supported organisms, run:
amrfinder --list_organisms
If you know your organism and want to override automatic detection, or if you are running without a Kraken2 database, use:
--organism Klebsiella
Leave --organism unset if the organism is unknown; AMRFinder+ will still provide a complete gene-level resistome profile.
If your assembly is fragmented, missing expected genomic features, or producing an unusually high contig count, your sample may have uneven read depth, which is common in direct clinical extractions, environmental samples, mixed cultures, and plasmid-enriched preps. Re-run with the --meta flag to enable Flye's uneven-coverage assembly mode, which does not assume uniform depth across the genome:
mostar --ont reads.fq.gz --genome-size 5m --output outdir --meta
Note that --meta mode disables some of Flye's coverage-based error correction, so it should only be used when standard assembly fails or produces poor results.
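To judge whether an assembly is fragmented, it helps to look at the contig count and lengths directly. The following is a minimal sketch using standard awk on a FASTA file such as MOSTAR_Assembly.fasta; the function name `assembly_stats` is hypothetical and the output format is chosen only for illustration.

```shell
# Hypothetical helper: summarise contig count, total length, and longest contig.
assembly_stats() {
    awk '
        # Header line: close the previous contig and start a new one.
        /^>/ { if (cur > max) max = cur; cur = 0; n++; next }
        # Sequence line: add its length to the current contig and the total.
        { cur += length($0); total += length($0) }
        END {
            if (cur > max) max = cur
            printf "contigs=%d total_bp=%d longest_bp=%d\n", n, total, max
        }
    ' "$1"
}
```

Running `assembly_stats Output/MOSTAR_Assembly.fasta` before and after a --meta re-run gives a simple side-by-side comparison. A total length far from the expected genome size, or dozens of short contigs, suggests that the assembly is fragmented.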
If the hybrid polishing step reports mean read depth: 0.0x across all contigs, your Illumina reads are likely incomplete, truncated, or mismatched to the assembly. Verify that your R1/R2 files are complete and correctly paired before re-running. MOSTAR validates at startup that these files exist and are non-empty, but it cannot detect partially downloaded or corrupted files. As a first check, confirm that R1 and R2 contain the same number of reads.
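One way to check the paired files is sketched below. It is not part of MOSTAR, and the function names are hypothetical. It assumes gzipped FASTQ files with four lines per read, tests gzip integrity with `gzip -t`, and then compares the read counts.

```shell
# Count records in a gzipped FASTQ file, assuming four lines per read.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Verify both files decompress cleanly and hold the same, non-zero read count.
check_pairs() {
    local f n1 n2
    for f in "$1" "$2"; do
        gzip -t "$f" 2>/dev/null || { echo "CORRUPT: $f"; return 1; }
    done
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ] && [ "$n1" -gt 0 ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: R1=$n1 R2=$n2"
        return 1
    fi
}
```

For example, `check_pairs R1.fastq.gz R2.fastq.gz` prints either the number of read pairs or a message naming the problem. A truncated download usually fails the `gzip -t` step, while files from different samples tend to show up as a count mismatch.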
If the specified --output directory already exists from a previous run, MOSTAR will write into it and overwrite existing files without warning. If you want to preserve a previous result, rename the output directory before re-running or specify a new output path.
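A small guard can move an existing output directory aside before a re-run. This is a sketch of one possible convention, not a MOSTAR feature; the function name `backup_outdir` and the timestamp suffix are assumptions made for illustration.

```shell
# Hypothetical helper: rename an existing output directory with a timestamp
# suffix and print the new name. Does nothing if the directory is absent.
backup_outdir() {
    local outdir backup
    outdir="${1%/}"   # drop a trailing slash, if any
    if [ -d "$outdir" ]; then
        backup="${outdir}.$(date +%Y%m%d-%H%M%S)"
        mv "$outdir" "$backup"
        echo "$backup"
    fi
}
```

Calling `backup_outdir Output` immediately before the `mostar` command leaves the previous results under a name such as `Output.20250101-120000`, so the new run starts with a fresh directory.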
In hybrid mode, if the Medaka and Final assembly statistics are identical, Polypolish ran but made no changes. This is expected when short-read coverage is very low (typically below 5×) and does not indicate an error. Check logs/polypolish.log to confirm — the mean read depth per contig will be reported there.
Developed and maintained by Nermin Zecic (@nermze). For questions, bugs, or feature requests, please open an Issue.





