by CeMPA (Centre for Microsimulation and Policy Analysis).
The entire SimPaths documentation is available on its website, which includes: a detailed description of its building blocks; instructions on how to set up and run the model; and information about contributing to the model's development. The documentation/ directory contains supplementary materials that complement this README (model specifications, variable references, etc.).
SimPaths is a family of models for individual and household life course events, all sharing common components. The framework is designed to project life histories through time, building up a detailed picture of career paths, family (inter)relations, health, and financial circumstances. The framework builds upon standardised assumptions and data sources, which facilitates adaptation to alternative countries. This repository, SimPathsEU, covers Greece (EL), Hungary (HU), Italy (IT), and Poland (PL), and integrates with EUROMOD for tax and benefit policy simulation. Careful attention is paid to model validation, and sensitivity of projections to key assumptions. The modular nature of the SimPaths framework is designed to facilitate analysis of alternative assumptions concerning the tax and benefit system, sensitivity to parameter estimates and alternative approaches for projecting labour/leisure and consumption/savings decisions.
Released under the terms in license.txt.
SimPathsEU/
├── src/ # Java source (main + tests)
├── input/ # H2 DB + per-country starting populations and EUROMOD outputs
│ ├── <CC>/InitialPopulations/{,training/}
│ └── <CC>/EUROMODoutput/{,training/}
├── input_processing/ # Stata do-files that prepare regression estimates and inputs
├── config/ # YAML configs (default.yml, alignment_*.yml, test_*.yml)
├── scripts/ # Bash wrappers for batch multi-run scenarios
├── validation/ # Stata validation against EU-SILC / EUROMOD targets
├── documentation/ # Supplementary documentation
├── output/ # Simulation outputs (created at runtime)
├── pom.xml
└── README.md
To contribute to this project, you need to fork the repository and set up your development environment.
We are committed to maintaining transparency and open-source principles in this project. All the code, documentation, and resources related to our project are available on GitHub for you to explore, use, and contribute to.
The data used by this project is not freely shareable. If you are interested in accessing the data necessary to run the simulation, get in touch with the repository maintainers for further instructions.
However, please note that training data is provided. It allows the simulation to be run and developed, but results obtained on the basis of the training dataset should not be interpreted, except for the purpose of training and development.
- Click the "Fork" button at the top-right corner of this repository.
- Untick the "Copy only the main branch" box.
- This will create a copy of the repository in your own GitHub account.
- Follow instructions here to periodically synchronize your fork with the most recent version of this ("upstream") repository. This will ensure you use an up-to-date version of the model.
- Java Development Kit (JDK): the project targets Java 19 or later (see pom.xml, which pins source/target to 19). Install a compatible JDK, e.g. OpenJDK 19+ from Adoptium.
- Maven: required to build from the command line. See installation instructions. (Not required if you only build via the IDE.)
- Download an IDE (integrated development environment) of your choice - we recommend IntelliJ IDEA; download the Community (free) or Ultimate (paid) edition, depending on your needs.
- Clone your forked repository to your local machine. Import the cloned repository into IntelliJ as a Maven project.
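Before building, it can help to confirm that the JDK on your PATH meets the Java 19 requirement. The following is a minimal sketch and not part of the repository: the helper name java_major is hypothetical, and the usage lines assume the usual java -version banner, in which the version string appears in double quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SimPathsEU): extract the major version
# from a Java version string such as "19.0.2", "21" or the legacy "1.8.0_292".
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;           # legacy scheme: 1.8.x means Java 8
  esac
  printf '%s\n' "${v%%[!0-9]*}"   # keep only the leading digits
}

# Usage (requires a JDK on the PATH):
#   version="$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
#   [ "$(java_major "$version")" -ge 19 ] || echo "JDK 19+ required, found $version"
```

The check only inspects whichever java comes first on the PATH; an IDE may be configured to use a different JDK.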
SimPaths can also be compiled with Maven (installation instructions here) and run from the command line without an IDE. After cloning the repository and setting up the JDK, in the root directory you can run:
$ mvn clean package
... to create two runnable jars for single- and multi-run SimPaths:
.
SimPaths/
...
|-- multirun.jar
|-- singlerun.jar
`-- src
To build without running the unit test suite (faster turnaround during development):
$ mvn clean package -DskipTests
Unit tests run by default as part of the test / package phases. Integration tests (which exercise a full end-to-end simulation) are bound to the verify phase:
$ mvn verify # run unit + integration tests
$ mvn verify -Dit.test=RunSimPathsIntegrationTest # run just the integration test
singlerun.jar runs a single SimPaths simulation and is also the entry point used to build/rebuild the input population database. It takes the following options:
- -c Country (e.g. EL, HU, IT, PL); required for the initial setup
- -s Start year
- -e End year
- -p Simulated population size
- -g [true/false] show/hide gui
- -r Re-write policy schedule from detected policy files
- -Setup perform the setup phase only (build the input population database, then exit)
- --rebuild-db Force a rebuild of input/input.mv.db instead of reusing it (headless mode)
- --reuse-existing-db Reuse input/input.mv.db if present, otherwise build it (headless mode)
- -t [true/false] use training data subset. When true, reads from input/<COUNTRY>/InitialPopulations/training/ and input/<COUNTRY>/EUROMODoutput/training/. When false (default), reads from InitialPopulations/ and EUROMODoutput/ directly. If -t is omitted, an auto-detect kicks in: if InitialPopulations/<country>/*.csv is empty, the simulator falls back to training data and prints a console message.
Important: the country (-c) and start year (-s) must be specified when creating or rebuilding the input population database — the resulting input/input.mv.db is country- and year-specific.
Important — switching between training and actual data: reuseExistingDatabase = true (the default for headless singlerun) reuses input/input.mv.db without checking which mode it was built in. If you switch the -t flag (e.g. from false to true), you must also pass --rebuild-db (or -Setup) — otherwise the simulation silently runs against a stale DB built from the other dataset.
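The auto-detect rule described for -t can be mirrored in a wrapper script, for example to log up front which dataset a run will use. This is a sketch under assumptions: the function name detect_training_flag is hypothetical, the real check lives in the Java code, and the directory layout is taken from the repository tree above (input/<CC>/InitialPopulations/).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the auto-detect rule (the real logic is in the simulator).
# Prints "true" (use training data) when the country's InitialPopulations
# directory holds no top-level CSV files, and "false" otherwise.
detect_training_flag() {
  local input_root="$1" country="$2"
  local dir="$input_root/$country/InitialPopulations"
  local f
  for f in "$dir"/*.csv; do
    # An unmatched glob stays literal, so test that the path really exists.
    if [ -e "$f" ]; then
      echo "false"
      return 0
    fi
  done
  echo "No CSV files in $dir, falling back to training data" >&2
  echo "true"
}

# Usage:
#   training="$(detect_training_flag input PL)"
```

Files under training/ are deliberately ignored by the glob, so a checkout that ships only the training subset resolves to true.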
Typical workflows:
# 1. Build (or rebuild) input.mv.db for a given country/start year.
# Both commands below rebuild the database identically — pick ONE,
# based on what you want to happen after the rebuild finishes:
#
# Option 1 — rebuild the database and exit (no simulation).
# Use this to prep the DB ahead of a multirun.
$ java -jar singlerun.jar -c PL -s 2011 -g false -Setup
#
# Option 2 — rebuild the database and then run a single-run simulation
# straight after, using the freshly built DB.
$ java -jar singlerun.jar -c PL -s 2011 -g false -e 2013 -p 30000 --rebuild-db
# 2. Run a single-run simulation on an existing input.mv.db (reuse as-is,
# or build it first if missing — no rebuild if it already exists)
$ java -jar singlerun.jar -c PL -s 2011 -g false -e 2013 -p 30000 --reuse-existing-db
# 3. Run a single-run simulation using the default DB-handling behaviour (no
# flag needed): reuses input.mv.db if present, builds it first if missing
$ java -jar singlerun.jar -c PL -s 2011 -g false -e 2013 -p 30000
# 4. Switch to training data — note that --rebuild-db (or -Setup) is REQUIRED
# when changing -t, otherwise input.mv.db from the previous mode is reused.
$ java -jar singlerun.jar -c PL -s 2019 -g false -t true --rebuild-db
$ java -jar singlerun.jar -c PL -s 2019 -g false -e 2022 -p 20000 -t true
# 5. Switch back to actual data — again, rebuild the DB when toggling -t.
$ java -jar singlerun.jar -c PL -s 2017 -g false -t false --rebuild-db
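Because input/input.mv.db is reused without any check on how it was built, a wrapper script can guard against stale databases by recording the build settings next to it. The sketch below is one possible approach, not something the repository provides: the function names, the marker file and the key format are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of SimPathsEU): remember the settings that
# input/input.mv.db was built with, and ask for a rebuild when they change.

# Print the DB-handling flag to pass to singlerun.jar for the given key.
db_flag_for() {
  local marker="$1" key="$2"
  if [ -f "$marker" ] && [ "$(cat "$marker")" = "$key" ]; then
    echo "--reuse-existing-db"
  else
    echo "--rebuild-db"
  fi
}

# Store the key after a successful build.
record_db_key() {
  local marker="$1" key="$2"
  printf '%s\n' "$key" > "$marker"
}

# Usage (marker path and key format are assumptions):
#   marker=input/.db_build_key
#   key="PL-2019-true"                      # country, start year, -t value
#   flag="$(db_flag_for "$marker" "$key")"
#   java -jar singlerun.jar -c PL -s 2019 -g false -e 2022 -p 20000 -t true "$flag" \
#     && record_db_key "$marker" "$key"
```

Including country and start year in the key covers the other two settings that make the database specific to a run.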
For multiple runs, multirun.jar takes the following options:
- -r random seed for first run (incremented by +1 for subsequent runs)
- -p simulated population size
- -n number of runs
- -s start year of runs
- -e end year of runs
- -g [true/false] show/hide gui
- -f write console output and logs to file (in 'output/logs/run_[seed].txt')
- -config <file> use a custom YAML config from config/ instead of default.yml
- -DBSetup build the input population database for the configured country/start year, then exit
- -t [true/false] use training data subset (same semantics as for singlerun). Overrides parameter_args.trainingFlag from the YAML config. The same caveat applies: if you switch -t, also rebuild the DB with -DBSetup before running the multirun.
Note: multirun.jar does not take a -c country flag — it resolves the country from input/DatabaseCountryYear.xlsx. Make sure that file reflects the country you intend to run.
Before running multiruns:
- Ensure input/DatabaseCountryYear.xlsx is populated with the country/year combination you intend to run; multirun reads it to resolve the country and start year.
- Rebuild the input database whenever it needs to be refreshed (especially when changing the start year). A different population size alone does not require a rebuild: the processed starting population is resampled to match the requested size.
Typical multirun workflow:
# 1. (Re)build the input database for the configured country/start year
$ java -jar multirun.jar -DBSetup -s 2011 -g false
# 2. Run N simulations over a year range
$ java -jar multirun.jar -g false -s 2011 -e 2013 -p 30000 -n 3
# 3. Training data — rebuild DB and run with -t true.
# REQUIRED to rebuild when toggling -t (same caveat as singlerun).
$ java -jar multirun.jar -DBSetup -s 2019 -g false -t true
$ java -jar multirun.jar -s 2019 -e 2022 -p 20000 -n 2 -g false -t true
Example with explicit seed and logging:
$ java -jar multirun.jar -r 100 -p 50000 -n 20 -s 2017 -e 2020 -g false -f
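Since the seed is incremented by one per run and -f writes to output/logs/run_[seed].txt, the log files a multirun should leave behind can be listed in advance. A small sketch, with expected_logs as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the console log files expected from a multirun
# invoked with -f, given the first seed (-r) and the number of runs (-n).
expected_logs() {
  local first_seed="$1" runs="$2" i=0
  while [ "$i" -lt "$runs" ]; do
    echo "output/logs/run_$((first_seed + i)).txt"
    i=$((i + 1))
  done
}

# Usage: for the example above (-r 100 -n 20) this lists
# output/logs/run_100.txt through output/logs/run_119.txt.
#   expected_logs 100 20
```

Comparing this list with the contents of output/logs/ is a quick way to spot runs that did not start.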
Run java -jar singlerun.jar -h or java -jar multirun.jar -h to show these help messages.
Each simulation writes a timestamped subdirectory under output/ (named YYYYMMDDHHMMSS), e.g.:
output/
├── <YYYYMMDDHHMMSS>/ # one run's artefacts
│ ├── database/ # H2 snapshot of the simulated population
│ └── input/ # copy of the inputs used for the run (for reproducibility)
└── logs/
    ├── run_<seed>.txt # console log when multirun is invoked with -f
    └── run_<seed>.log # logger output for the same run
Batch scripts in scripts/ move each scenario's outputs into output/<scenario-name>/ after the runs finish.
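Because run directories are named YYYYMMDDHHMMSS, they sort chronologically as plain text, which makes it easy to locate the most recent run from a script. The helper below is a sketch (latest_run_dir is a hypothetical name); it takes the output root as an argument so that it also works inside output/<scenario-name>/.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent run directory under an output root.
# Relies on 14-digit YYYYMMDDHHMMSS names sorting chronologically as text.
latest_run_dir() {
  local root="${1:-output}" d latest=""
  # Glob results are sorted, so the last directory seen is the newest.
  for d in "$root"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
    [ -d "$d" ] && latest="$d"
  done
  # Returns non-zero and prints nothing when no run directory exists.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage:
#   run_dir="$(latest_run_dir output)" && ls "$run_dir/database"
```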
Helper Bash scripts in scripts/ run multirun.jar across multiple alignment configs in sequence and move each scenario's output into output/<scenario-name>/:
- run_alignment_multiruns.sh: full set of alignment scenarios
Defaults (start/end year, population size, runs per scenario, JVM heap, random seed) are set at the top of each script and can be overridden via environment variables, e.g.:
$ POP_SIZE=10000 RUNS_PER_SCENARIO=2 ./scripts/run_alignment_multiruns.sh
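Environment overrides of this kind are usually implemented with Bash default-value expansion. The sketch below illustrates the mechanism only: POP_SIZE and RUNS_PER_SCENARIO come from the example above, while the default values, the helper name multirun_cmd and the config file name are placeholders rather than what the shipped script uses.

```shell
#!/usr/bin/env bash
# Illustrative only: the defaults below are placeholders, not the values
# set in scripts/run_alignment_multiruns.sh.
POP_SIZE="${POP_SIZE:-20000}"                 # environment value wins if set
RUNS_PER_SCENARIO="${RUNS_PER_SCENARIO:-2}"

# Hypothetical helper: print the multirun command for one scenario config.
multirun_cmd() {
  local config="$1"
  echo "java -jar multirun.jar -config $config -p $POP_SIZE -n $RUNS_PER_SCENARIO -g false"
}

# Dry run for a placeholder config name.
multirun_cmd alignment_example.yml
```

Printing the command before executing it is a cheap way to confirm that an override has taken effect.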
- Create a new branch for your contributions. This will likely be based on either the main branch of this repository (if you seek to modify the stable version of the model) or develop (if you seek to modify the most recent version of the model).
- Make your changes, add your code, and write tests if applicable.
- Commit your changes.
- Push your changes to your fork.
- Open a Pull Request (PR) on this repository from your fork. Be sure to provide a detailed description of your changes in the PR.