Statistics PhD by training, tool builder by compulsion. My papers propose scalable algorithms and prove theorems, and my side projects are tools I build to help other people get their work done faster. I find it hard to leave a solvable problem alone, and whenever I run into repetitive work, I'd rather automate it than let it eat into my time.
latex2arxiv: Submit to arXiv without the headache. One command cleans your LaTeX project, catches rejection-causing errors, and walks you through the upload.
Takes any LaTeX project (zip, directory, or git URL) and outputs a submission-ready zip. Prunes unreachable files, strips draft markup and revision commands, normalizes BibTeX, and runs pre-flight checks that surface errors arXiv silently fails on. Pass --guide and it writes a step-by-step upload walkthrough with copy-paste title/authors/abstract. Gate your paper repo on compliance with --dry-run in CI. Also ships as a VS Code extension and an MCP server so AI agents can run the full pipeline without leaving the chat.
Python · CLI · PyPI · Homebrew · VS Code · GitHub Actions · pre-commit · MCP
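To make "prunes unreachable files" concrete, here is a minimal sketch of the idea, not latex2arxiv's actual implementation. It assumes the project is already loaded as a dict of path to text, and the names `REF` and `reachable_files` are hypothetical. It follows only `\input`, `\include`, and `\includegraphics`, and it ignores comments, bibliographies, and custom macros.

```python
import re
from collections import deque

# Hypothetical, deliberately incomplete pattern: matches \input{...},
# \include{...}, and \includegraphics[...]{...}. It does not strip comments.
REF = re.compile(r"\\(?:input|include|includegraphics)(?:\[[^\]]*\])?\{([^}]+)\}")


def reachable_files(files, root):
    """Return the set of paths reachable from `root`.

    `files` maps each project path to its text content (empty string for
    binary assets such as figures).
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in REF.findall(files.get(current, "")):
            # LaTeX lets \input and \include omit the .tex extension.
            for candidate in (target, target + ".tex"):
                if candidate in files and candidate not in seen:
                    seen.add(candidate)
                    queue.append(candidate)
    return seen


project = {
    "main.tex": r"\input{sections/intro} \includegraphics[width=1in]{fig.pdf}",
    "sections/intro.tex": r"\include{appendix}",
    "appendix.tex": "",
    "fig.pdf": "",
    "old_draft.tex": "",
    "unused.png": "",
}
keep = reachable_files(project, "main.tex")
prune = set(project) - keep
```

Everything left in `prune` is never referenced from the root file, so it is a candidate for removal from the submission zip.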
academic-application-tracker: Local Streamlit dashboard that answers "what do I do today?" for academics juggling dozens of applications, deadlines, and recommendation letters.
Academic job searching is chaos: overlapping deadlines, multiple recommenders per position, materials checklists that differ by institution. I built a Streamlit dashboard that cuts through it: urgency-banded deadlines, per-position recommender state, a materials readiness panel, an interview log, and auto-computed daily action items. Try the live demo: no install, and each session gets its own sandbox. 1000+ tests at 95% coverage, because I actually use it on my own applications.
Python · Streamlit · SQLite · pytest · Plotly
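For a sense of what "urgency-banded deadlines" could mean in code, here is a small sketch. The band names, the 3-day and 14-day thresholds, and the function names are assumptions for illustration, not the tracker's real rules.

```python
from datetime import date


def urgency_band(deadline, today):
    """Classify a deadline by days remaining (hypothetical thresholds)."""
    days_left = (deadline - today).days
    if days_left < 0:
        return "overdue"
    if days_left <= 3:
        return "urgent"
    if days_left <= 14:
        return "soon"
    return "later"


def todays_actions(positions, today):
    """Return (position, band) pairs sorted by deadline, skipping 'later'."""
    ordered = sorted(positions.items(), key=lambda item: item[1])
    banded = [(name, urgency_band(deadline, today)) for name, deadline in ordered]
    return [(name, band) for name, band in banded if band != "later"]


today = date(2025, 11, 1)
positions = {
    "University A": date(2025, 11, 2),
    "University B": date(2025, 11, 10),
    "University C": date(2026, 1, 15),
    "University D": date(2025, 10, 30),
}
actions = todays_actions(positions, today)
```

Sorting by deadline before banding keeps the most pressing items at the top of the list, which is the order a "what do I do today?" view would want.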
python-project-scaffold: Skip the 30-minute setup ritual and start at your first feature commit.
Every new Python project starts with the same 30-minute ritual: wire up ruff, pyright, pytest, CI matrix, coverage gate, pre-commit, Dependabot, ADRs... I automated all of it. One click on "Use this template" plus one run of python3 scripts/init-project.py, and you have a green-CI repo ready for your first feature. Ships with a /new-project Claude Code skill that creates the GitHub repo and sets up branch protection, because even the setup should be one command.
Python · GitHub Actions · Claude Code · pre-commit
I've always enjoyed working on statistical problems that are mathematically challenging and scientifically motivated. I am particularly intrigued by problems where the data or the quantity of interest is combinatorial: some or all coordinates of the observation take values in a discrete, structured set rather than in Euclidean space. The loss of Euclidean geometry brings simultaneous challenges in probabilistic modeling, mathematical theory, and scalable computation. The central question organizing my research is: How can we develop Bayesian methodology with theoretical guarantees for problems that are combinatorial in structure, and how do those methods behave on real scientific data?
Three first-author papers:
- JCGS 2025 (published): blocked Gibbs sampler with anti-correlation Gaussian data augmentation; 23 to 67 times faster than NUTS (the industry-standard sampler) with a geometric ergodicity proof. Code: Anti-correlation-Gaussian.
- JASA (revision submitted): Bayesian regression over combinatorial response data via integer programming duality. Code: combinatorial-regression, a multi-language reproducibility pipeline (R + Rcpp inner loops, JAX/NumPyro baselines, Makefile-orchestrated).
- Bernoulli (revision submitted): first consistency guarantee for graph-based clustering under model misspecification.
More research code: VAE-fMRI-Alzheimer, a 3D-convolutional VAE for Alzheimer's fMRI. CUDA training on HiPerGator, 36 unit tests, 18 tutorial notebooks.
📫 hugh.stats@gmail.com · Google Scholar · ORCID · LinkedIn · Website



