
make_lastz_chains


portable solution for generating pairwise genome alignment chains
The Hiller Lab at the Senckenberg Research Institute




Important

  • Softmask both genomes (repeats in lowercase; do NOT hardmask). Running RepeatModeler 2 on each genome is recommended; add WindowMasker if you see runaway LASTZ runtimes.
  • Scaffold names: no spaces; avoid dots (e.g. rename NC_00000.1 to NC_00000).
  • Inputs accepted: .fasta or .2bit.
  • Container image: A pre-built container image is available for the whole pipeline as well as for individual modules. By default, the pipeline runs with ghcr.io/hillerlab/make_lastz_chains:latest. Additional images are available in containers, and Nextflow modules in core.
  • UCSC replacement: Since v3.1.0, the pipeline uses chaintools, a Rust library for working with .chain files, in place of the UCSC tools.
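Before launching, a quick pre-flight check can catch the most common input problems listed above. The Python sketch below is not part of the pipeline: the function names (`scan_fasta`, `clean_name`, `rename_map`) are hypothetical, and it assumes plain, uncompressed FASTA. It flags headers containing whitespace or dots, reports the share of lowercase (softmasked) bases, and maps names such as NC_00000.1 to NC_00000 while refusing renames that would collide. Replacing any remaining dots with underscores is an assumption, not a pipeline rule.

```python
import re
from typing import Any, Dict, Iterable, List

VERSION_SUFFIX = re.compile(r"\.\d+$")  # trailing accession version such as ".1"

def scan_fasta(lines: Iterable[str]) -> Dict[str, Any]:
    """Flag problematic headers and estimate soft-masking from FASTA lines."""
    bad_headers: List[str] = []
    lower = upper = n_count = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            name = line[1:]
            if any(ch in name for ch in " \t."):  # spaces, tabs or dots in the name
                bad_headers.append(name)
            continue
        for ch in line:
            if ch in "Nn":
                n_count += 1
            elif ch.islower():
                lower += 1
            elif ch.isupper():
                upper += 1
    called = lower + upper
    return {
        "bad_headers": bad_headers,
        # lowercase share of non-N bases; 0.0 suggests the genome was never softmasked
        "softmasked_fraction": lower / called if called else 0.0,
        # N share of all bases; high values may mean hardmasking (gaps are N too)
        "n_fraction": n_count / (called + n_count) if called + n_count else 0.0,
    }

def clean_name(header: str) -> str:
    """NC_00000.1 -> NC_00000; drops any description after whitespace (assumed acceptable)."""
    name = header.split()[0]
    name = VERSION_SUFFIX.sub("", name)
    return name.replace(".", "_")  # remaining dots -> underscores (an assumption)

def rename_map(headers: Iterable[str]) -> Dict[str, str]:
    """Map old header -> new name, refusing renames that would collide."""
    mapping: Dict[str, str] = {}
    owner: Dict[str, str] = {}
    for header in headers:
        new = clean_name(header)
        if new in owner and owner[new] != header:
            raise ValueError(f"{header!r} and {owner[new]!r} would both become {new!r}")
        owner[new] = header
        mapping[header] = new
    return mapping
```

For example, `scan_fasta(open("genomes/hg38.fasta"))` (hypothetical path) returns the offending headers and the two fractions; the `rename_map` result can then drive whatever renaming tool you prefer.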

Usage

Note

Requirements: Nextflow ≥ 25.04.6, Docker or Apptainer, Java.
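The requirements can be checked with a small helper before the first run. This is a sketch: it assumes the Nextflow version appears as the first `X.Y.Z` triple in the output of `nextflow -version`, and the hypothetical `check_tools` only tests whether executables are on `PATH`, not their versions.

```python
import re
import shutil
from typing import List, Tuple

MIN_NEXTFLOW: Tuple[int, int, int] = (25, 4, 6)  # "Nextflow >= 25.04.6"

def parse_version(text: str) -> Tuple[int, int, int]:
    """Return the first X.Y.Z triple found in text as integers."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)

def nextflow_ok(version_text: str) -> bool:
    # Tuple comparison orders (year, month, patch) correctly, e.g. (25, 10, 0) > (25, 4, 6).
    return parse_version(version_text) >= MIN_NEXTFLOW

def check_tools() -> List[str]:
    """Return missing requirements, judged only by whether executables are on PATH."""
    missing = [tool for tool in ("nextflow", "java") if shutil.which(tool) is None]
    if not any(shutil.which(rt) for rt in ("docker", "apptainer", "singularity")):
        missing.append("docker or apptainer")
    return missing
```

Usage: pass the captured output of `subprocess.run(["nextflow", "-version"], capture_output=True, text=True).stdout` to `nextflow_ok`.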

git clone https://github.com/hillerlab/make_lastz_chains.git
cd make_lastz_chains

Edit params.json (set reference_name, query_name, reference_genome, query_genome), then:

# Docker
nextflow run main.nf -params-file params.json -profile docker

# Apptainer / Singularity
nextflow run main.nf -params-file params.json -profile apptainer
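params.json can also be generated from a script, which is handy when setting up many genome pairs. A minimal sketch, assuming only the four keys named above are required; `write_params` is a hypothetical helper, and any extra keyword arguments are written through unchanged as additional parameters.

```python
import json
from pathlib import Path
from typing import Any, Dict

def write_params(path: str, reference_name: str, query_name: str,
                 reference_genome: str, query_genome: str, **extra: Any) -> Dict[str, Any]:
    """Write a params.json for one reference/query pair and return its contents."""
    params: Dict[str, Any] = {
        "reference_name": reference_name,
        "query_name": query_name,
        "reference_genome": str(reference_genome),  # .fasta or .2bit
        "query_genome": str(query_genome),
    }
    params.update(extra)  # optional further parameters, passed through as-is
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params
```

For example, `write_params("params.json", "hg38", "mm39", "genomes/hg38.2bit", "genomes/mm39.fasta")`, where the genome paths are placeholders.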

Smoke test:

nextflow run main.nf -profile test,apptainer

Resume a run from one of two checkpoints (fill_chains or clean_chains):

# Restart after alignment but before filling chains [ 04_axtchain/merged_chains ]
nextflow run main.nf -profile <PROFILE> -params-file params.json \
    --from fill_chains \
    --merged_chain_path results/04_axtchain/merged_chains/<CHAIN>

# Restart after filling chains but before cleaning them [ 05_filled_chains ]
nextflow run main.nf -profile <PROFILE> -params-file params.json \
    --from clean_chains \
    --filled_chain_path results/05_filled_chains/hg38.mm39.filled.chain.gz
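When scripting restarts, it helps to keep each checkpoint paired with the path parameter it needs. The sketch below encodes the two pairings from the commands above and locates the checkpoint file using the patterns from the Output tree. The helper names are hypothetical, and it assumes exactly one matching chain file per results directory.

```python
from pathlib import Path
from typing import List

# Checkpoint -> (required path parameter, glob pattern relative to results/)
CHECKPOINTS = {
    "fill_chains": ("merged_chain_path", "04_axtchain/merged_chains/*.all.chain.gz"),
    "clean_chains": ("filled_chain_path", "05_filled_chains/*.filled.chain.gz"),
}

def find_checkpoint(results_dir: str, stage: str) -> str:
    """Return the single checkpoint chain file for stage; raise if missing or ambiguous."""
    _, pattern = CHECKPOINTS[stage]
    hits = sorted(Path(results_dir).glob(pattern))
    if len(hits) != 1:
        raise FileNotFoundError(
            f"expected one match for {pattern!r} in {results_dir}, found {len(hits)}")
    return str(hits[0])

def resume_command(profile: str, stage: str, chain_path: str,
                   params_file: str = "params.json") -> List[str]:
    """Build the argv for restarting the pipeline at stage."""
    if stage not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint {stage!r}; expected one of {sorted(CHECKPOINTS)}")
    param, _ = CHECKPOINTS[stage]
    return ["nextflow", "run", "main.nf", "-profile", profile,
            "-params-file", params_file, "--from", stage, f"--{param}", chain_path]
```

Usage: `subprocess.run(resume_command("docker", "clean_chains", find_checkpoint("results", "clean_chains")), check=True)`.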

Note

You can also specify these options directly in params.json.
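Setting the checkpoint in params.json instead of on the command line amounts to adding two keys. This sketch assumes the JSON keys mirror the flag names (`from`, `merged_chain_path`, `filled_chain_path`), which is how Nextflow maps `-params-file` entries onto `params`; `set_checkpoint` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Which path parameter accompanies each checkpoint (mirrors the CLI examples)
PATH_KEY = {"fill_chains": "merged_chain_path", "clean_chains": "filled_chain_path"}

def set_checkpoint(params_path: str, stage: str, chain_path: str) -> Dict[str, Any]:
    """Add or overwrite the resume keys in an existing params.json and return the result."""
    key = PATH_KEY[stage]  # KeyError for unknown stages
    path = Path(params_path)
    params = json.loads(path.read_text())
    for other in PATH_KEY.values():
        params.pop(other, None)  # drop a stale path from a previous checkpoint
    params["from"] = stage
    params[key] = chain_path
    path.write_text(json.dumps(params, indent=2) + "\n")
    return params
```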

A helper shell script is provided for running the pipeline on a SLURM cluster. See details below.


Edit the path variables at the top of assets/scripts/run_nf_slurm_example.sh (cache dir, container image, manifest path), then submit:

sbatch --array=1-<N> run_nf_slurm_example.sh

Each array task spawns one Nextflow head job that submits all compute as child SLURM jobs.

LASTZ, AXT_CHAIN, and REPEAT_FILLER run as SLURM job arrays. Partition routing, array sizes, and resource tiers are documented inline in nextflow.config; edit them there to match your cluster.
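How run_nf_slurm_example.sh reads its manifest is not described here, so the following only illustrates the array pattern under an assumed format: one genome pair per non-empty line, with `#` lines ignored. Each array task reads SLURM's `SLURM_ARRAY_TASK_ID` (1-based when submitted with `--array=1-<N>`) and picks the matching line, and `array_size` gives the `<N>` to pass to `sbatch`. All function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

def manifest_rows(manifest_path: str) -> List[str]:
    """Non-empty, non-comment lines of the manifest (assumed format: one pair per line)."""
    rows = []
    for line in Path(manifest_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rows.append(line)
    return rows

def array_size(manifest_path: str) -> int:
    """Value to use as <N> in sbatch --array=1-<N>."""
    return len(manifest_rows(manifest_path))

def manifest_entry(manifest_path: str, task_id: Optional[int] = None) -> str:
    """Row handled by this array task; task IDs start at 1."""
    if task_id is None:
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    rows = manifest_rows(manifest_path)
    if not 1 <= task_id <= len(rows):
        raise IndexError(f"task {task_id} outside 1..{len(rows)}")
    return rows[task_id - 1]
```

With this layout, adding a genome pair means appending one line and resubmitting with the new `array_size`.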


Output

results/
├── 00_genome_prep/            reference.2bit, query.2bit, *.chrom.sizes
├── 01_partition/              *_partitions.txt
├── 02_lastz_psl/              *.psl
├── 03_concat_lastz_output/    *.psl.gz
├── 04_axtchain/               *.chain
│   ├── chain_antirepeat/      *.chain.gz
│   └── merged_chains/         *.all.chain.gz     ← checkpoint for --from fill_chains
├── 05_filled_chains/          *.filled.chain.gz  ← checkpoint for --from clean_chains
├── 06_cleaned_chains/         *.final.chain.gz
├── 07_final/                  *.final.chain.gz   ← final output
└── pipeline_info/             timeline, trace, DAG
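The .chain.gz outputs follow the UCSC chain format: a `chain` header line (score, target name/size/strand/start/end, query name/size/strand/start/end, id), followed by alignment lines whose first column is the size of an ungapped block. A small standard-library reader is enough for sanity checks such as counting chains. This is a sketch, not chaintools, and `iter_chains`/`summarize` are hypothetical helpers.

```python
import gzip
from typing import Any, Dict, Iterator, TextIO

def iter_chains(handle: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one dict per chain with header fields and total ungapped (aligned) bases."""
    current = None
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator or comment line
        if fields[0] == "chain":
            if current is not None:
                yield current
            current = {"score": float(fields[1]), "t_name": fields[2],
                       "q_name": fields[7], "q_strand": fields[9],
                       "id": fields[12], "aligned": 0}
        elif current is not None:
            current["aligned"] += int(fields[0])  # "size [dt dq]": first column is block size
    if current is not None:
        yield current

def summarize(path: str) -> Dict[str, Any]:
    """Count chains, aligned bases and the best score in a (optionally gzipped) chain file."""
    opener = gzip.open if path.endswith(".gz") else open
    n_chains, aligned, best = 0, 0, 0.0
    with opener(path, "rt") as fh:
        for chain in iter_chains(fh):
            n_chains += 1
            aligned += chain["aligned"]
            best = max(best, chain["score"])
    return {"chains": n_chains, "aligned_bases": aligned, "best_score": best}
```

For example, `summarize("results/07_final/hg38.mm39.final.chain.gz")` (file name illustrative) gives a quick overview of the final output.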

Where to edit

File              What
params.json       Genome paths, alignment settings, checkpoints (edited per run)
nextflow.config   Compute resources, profiles, container, SLURM (edited rarely)

Citation
