exonBlocks

Map cell barcode + UMI-level read blocks to overlapping exons. This repository provides utilities to extract read blocks from BAMs and assign each block (expanded from semicolon-delimited fields) to annotated exons.

Requirements

R (>= 4.0)
R Packages: data.table, dplyr, tidyr, testthat
Access to an indexed BAM file
htslib
- Mac: brew install htslib
- Linux: check details how to compile from src at https://github.com/samtools/htslib

Installation

Install required packages in R:

if(!requireNamespace("data.table", quietly = TRUE)) install.packages("data.table")
if(!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if(!requireNamespace("tidyr", quietly = TRUE)) install.packages("tidyr")
devtools::install_github("https://github.com/yuw444/exonBlocks")

Common Error

Couldn't find the installed htslib
- scan_core.c:3:10: fatal error: htslib/hts.h: No such file or directory 3 | #include <htslib/hts.h>
- Could not find htslib headers. Set HTSLIB_DIR or install via pkg-config/Homebrew/Conda
Solution:
- Reinstall htslib, make sure the following command works in the terminal $ which htsfile

Example usage

The following example is adapted for the manuscript analyzing Tnfrsf9 (CD137) exons in single-cell RNA-seq data. It:

extracts read blocks overlapping a genomic interval into a TSV (extract_exon_reads_hts),
expands semicolon-delimited block fields and maps blocks to exons (cb_umi_exons),
summarizes exon assignments per (CB, UMI).

library(exonBlocks)
library(dplyr)

chrom <- "4"
exon_start <- 150920155
exon_end <- 150946102
bam <- "/scratch/g/chlin/Yu/CD137/data/AI/possorted_genome_bam.bam"
block_tsv <- "AI.tsv"

# extract read blocks overlapping the region -> writes AI.tsv
temp <- extract_exon_reads_hts(
  bam = bam,
  chr = chrom,
  start = exon_start,
  end = exon_end,
  out_bam = NULL,
  tsv = block_tsv
)

# map expanded blocks to annotated exons
df <- cb_umi_exons(
  tsv = block_tsv,
  meta_exons = "/scratch/g/chlin/Yu/exonBlocks/Tnfrsf9_exons_annotated.csv"
)

# summarize exon ids per (CB, UMI)
df_sum <- df %>%
  filter(!is.na(exon)) %>%
  group_by(CB, UMI, .drop = TRUE) %>%
  summarise(exon_ids = paste(sort(unique(exon)), collapse = ";"))

table(df_sum$exon_ids)

Notes

cb_umi_exons expects block_start, block_end, block_seq (semicolon-delimited allowed) and CB, UMI columns in the blocks TSV.
meta exons file must contain columns: chr, start, end, exon.
The implementation uses tidyr::separate_rows to expand semicolon lists and data.table::foverlaps to detect overlaps.
Installation has been tested on Macos/Linux successfully, any feedback on Windows installation would be appreciated

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
R		R
man		man
src		src
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
CRAN-SUBMISSION		CRAN-SUBMISSION
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.md		README.md
cran-comments.md		cran-comments.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

exonBlocks

Requirements

Installation

Common Error

Example usage

Notes

About

Licenses found

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

exonBlocks

Requirements

Installation

Common Error

Example usage

Notes

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages