diff --git a/CLAUDE.md b/CLAUDE.md index 7404216..069b37f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,4 +1,4 @@ -# CLAUDE.md +# CLAUDE.md This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. @@ -31,6 +31,14 @@ mvn test -Dtest=SimPathsStartTest CLI help: `java -jar singlerun.jar -h` or `java -jar multirun.jar -h` +### Key CLI flags + +- `-c ` country code (`EL`, `IT`, `HU`, `PL`); `-s` start year; `-e` end year; `-p` population size; `-g true|false` show GUI. +- `-t true|false` (`--training`) — use the training-data subset under `input//InitialPopulations/training/` and `EUROMODoutput/training/` (uses `TaxDonorParserTraining`). On `multirun.jar` this **overrides** `parameter_args.trainingFlag` from the YAML config. +- `singlerun.jar -Setup` — setup phase only (build the H2 input DB, no simulation). Multi-run equivalent is `-DBSetup`. +- `multirun.jar -r ` random seed, `-n ` max runs, `-f` output to file, `-config ` custom config (default `config/default.yml`). +- **Training auto-detect**: if `-t` is omitted and `input//InitialPopulations/*.csv` is empty, `Parameters.trainingFlag` is flipped to `true` automatically and a notice is printed to stdout (`SimPathsStart.java:363-368, 520-525`). To diagnose which mode is active at runtime, look for either `Training-data flag set explicitly via CLI: -t ...` or `auto-switching to training data` in the console output.
+ ## Architecture ### Entity Hierarchy @@ -66,10 +74,22 @@ CLI help: `java -jar singlerun.jar -h` or `java -jar multirun.jar -h` ### Data Inputs - `input/input.mv.db` — H2 database with processed EU-SILC starting population -- `input/[COUNTRY]/` — Country-specific Excel parameter files, EUROMOD output CSVs +- `input/[COUNTRY]/InitialPopulations/` — actual starting-population CSVs; `…/training/` holds the shipped training subset +- `input/[COUNTRY]/EUROMODoutput/` — EUROMOD donor CSVs; `…/training/` holds the training subset +- `input/[COUNTRY]/` — country-specific Excel parameter files (e.g. `EUROMODpolicySchedule.xlsx`) - `input/DatabaseCountryYear.xlsx` — Cross-country/year index - `config/default.yml` — Default multi-run parameters (population size, year range, run count) - `config/alignment_*.yml` — Staged alignment configurations +- `config/test_create_database.yml`, `config/test_run.yml` — Configs used by the integration test + +### Repository layout (beyond `src/`) + +- `scripts/` — shell wrappers for batch multi-runs (`run_alignment_multiruns.sh`, `run_multiruns-alignPopOFF.sh`, `run_TEST_multiruns.sh`, …) +- `input_processing/` — Stata do-files that prepare model inputs upstream of the Java pipeline (master conditions, regression-estimate cleaning, lag-structure generation) +- `tools/generate_simpaths_eu_variable_codebook.py` — variable codebook generator +- `validation/` — Stata validation against EU-SILC/EUROMOD targets +- `documentation/` — supplementary documentation +- `output/` — timestamped simulation outputs (created at runtime) ### Tax/Benefit Imputation @@ -87,6 +107,7 @@ JUnit 5 + Mockito. 
Tests in `src/test/java/simpaths/`: - `experiment/SimPathsMultiRunTest` — Multi-run configuration - `experiment/PersonTest` — Person entity logic - `data/MahalanobisDistanceTest` — Statistical matching +- `integrationtest/RunSimPathsIntegrationTest` — End-to-end run using `config/test_create_database.yml` + `config/test_run.yml` ## Branch Conventions diff --git a/README.md b/README.md index 5a5ee78..6112445 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,37 @@ # SimPathsEU -by Matteo Richiardi, Patryk Bronka, Justin van de Ven, Mariia Vartuzova, David Sonnewald +by CeMPA (Centre for Microsimulation and Policy Analysis). + +## Documentation + +The entire SimPaths documentation is available on its [website](https://simpaths.github.io/SimPaths/), which includes: a detailed description of its building blocks; instructions on how to set up and run the model; and information about contributing to the model's development. The `documentation/` directory contains supplementary materials that complement this README (model specifications, variable references, etc.). ## Introduction -SimPaths is a family of models for individual and household life course events, all sharing common components. The framework is designed to project life histories through time, building up a detailed picture of career paths, family (inter)relations, health, and financial circumstances. The framework builds upon standardised assumptions and data sources, which facilitates adaptation to alternative countries. This repository, **SimPathsEU**, covers Greece (`EL`), Hungary (`HU`), Italy (`IT`), and Poland (`PL`), and integrates with EUROMOD for tax and benefit policy simulation. Careful attention is paid to model validation, and sensitivity of projections to key assumptions.
The modular nature of the SimPaths framework is designed to facilitate analysis of alternative assumptions concerning the tax and benefit system, sensitivity to parameter estimates and alternative approaches for projecting labour/leisure and consumption/savings decisions. Projections for a workhorse model parameterised to the UK context are reported in [Bronka, P., Richiardi, M., & van de Ven, J. (2023). *SimPaths: an open-source microsimulation model for life course analysis* (No. CEMPA6/23), Centre for Microsimulation and Policy Analysis at the Institute for Social and Economic Research*](https://www.microsimulation.ac.uk/publications/publication-557738/), which closely reflect observed data throughout a 10-year validation window. +SimPaths is a family of models for individual and household life course events, all sharing common components. The framework is designed to project life histories through time, building up a detailed picture of career paths, family (inter)relations, health, and financial circumstances. The framework builds upon standardised assumptions and data sources, which facilitates adaptation to alternative countries. This repository, **SimPathsEU**, covers Greece (`EL`), Hungary (`HU`), Italy (`IT`), and Poland (`PL`), and integrates with EUROMOD for tax and benefit policy simulation. Careful attention is paid to model validation, and sensitivity of projections to key assumptions. The modular nature of the SimPaths framework is designed to facilitate analysis of alternative assumptions concerning the tax and benefit system, sensitivity to parameter estimates and alternative approaches for projecting labour/leisure and consumption/savings decisions. + + +## License + +Released under the terms in [`license.txt`](license.txt). 
+ +## Repository layout + +``` +SimPathsEU/ +├── src/ # Java source (main + tests) +├── input/ # H2 DB + per-country starting populations and EUROMOD outputs +│ └── /InitialPopulations/{,training/} +│ └── /EUROMODoutput/{,training/} +├── input_processing/ # Stata do-files that prepare regression estimates and inputs +├── config/ # YAML configs (default.yml, alignment_*.yml, test_*.yml) +├── scripts/ # Bash wrappers for batch multi-run scenarios +├── validation/ # Stata validation against EU-SILC / EUROMOD targets +├── documentation/ # Supplementary documentation +├── output/ # Simulation outputs (created at runtime) +├── pom.xml +└── README.md +``` ## Getting Started @@ -30,11 +57,11 @@ However, please note that _training_ data is provided. It allows the simulation 1. **Java Development Kit (JDK):** the project targets **Java 19 or later** (see `pom.xml`, which pins `source`/`target` to 19). Install a compatible JDK, e.g. OpenJDK 19+ from [Adoptium](https://adoptium.net/). 2. **Maven:** required to build from the command line. See [installation instructions](https://maven.apache.org/install.html). (Not required if you only build via the IDE.) 3. **Download an IDE** (integrated development environment) of your choice - we recommend [IntelliJ IDEA](https://www.jetbrains.com/idea/download/); download the Community (free) or Ultimate (paid) edition, depending on your needs. -4. Clone your forked repository to your local machine. Import the cloned repository into IntelliJ as a Maven project +4. Clone your forked repository to your local machine. Import the cloned repository into IntelliJ as a Maven project. -### Compiling and running SimPaths with Maven in the CLI +### Compiling and running SimPaths with Maven from the CLI -SimPaths can also be compiled by Maven ([installation instructions here](https://maven.apache.org/install.html)) and run from the command line without an IDE. 
After cloning the repository and setting up the JDK, in the root directory you can run: +SimPaths can also be compiled with Maven ([installation instructions here](https://maven.apache.org/install.html)) and run from the command line without an IDE. After cloning the repository and setting up the JDK, in the root directory you can run: ``` $ mvn clean package ``` @@ -71,10 +98,10 @@ $ mvn verify -Dit.test=RunSimPathsIntegrationTest # run just the integration - `-p` Simulated population size - `-g` [true/false] show/hide gui - `-r` Re-write policy schedule from detected policy files -- `-Setup` do setup phases (creating input populations database) only +- `-Setup` perform the setup phase only (build the input population database, then exit) - `--rebuild-db` Force a rebuild of `input/input.mv.db` instead of reusing it (headless mode) - `--reuse-existing-db` Reuse `input/input.mv.db` if present, otherwise build it (headless mode) -- `-t` [true/false] use training data subset. When `true`, reads from `input//InitialPopulations/training/` and `input//EUROMODoutput/training/`, and uses `TaxDonorParserTraining` (which drops `deh`/`drgn1`/`lcs` and uses `idhh` as the tax-unit identifier). When `false` (default), reads from `InitialPopulations/` and `EUROMODoutput/` directly and uses the standard `TaxDonorDataParser`. If `-t` is omitted, an auto-detect kicks in: if `InitialPopulations//*.csv` is empty, the simulator falls back to training data and prints a console message. +- `-t` [true/false] use training data subset. When `true`, reads from `input//InitialPopulations/training/` and `input//EUROMODoutput/training/`. When `false` (default), reads from `InitialPopulations/` and `EUROMODoutput/` directly. If `-t` is omitted, an auto-detect kicks in: if `InitialPopulations//*.csv` is empty, the simulator falls back to training data and prints a console message. 
**Important:** the country (`-c`) and start year (`-s`) must be specified when creating or rebuilding the input population database — the resulting `input/input.mv.db` is country- and year-specific. @@ -153,9 +180,26 @@ $ java -jar multirun.jar -r 100 -p 50000 -n 20 -s 2017 -e 2020 -g false -f Run `java -jar singlerun.jar -h` or `java -jar multirun.jar -h` to show these help messages. +#### Output layout + +Each simulation writes a timestamped subdirectory under `output/` (named `YYYYMMDDHHMMSS`), e.g.: + +``` +output/ +├── / # one run's artefacts +│ ├── database/ # H2 snapshot of the simulated population +│ └── input/ # copy of the inputs used for the run (for reproducibility) +└── logs/ + ├── run_.txt # console log when multirun is invoked with -f + └── run_.log # logger output for the same run +``` + +Batch scripts in `scripts/` move each scenario's outputs into `output//` after the runs finish. + + ### Batch scenario scripts -Helper Bash scripts in `scripts/` run `multirun.jar` across multiple alignment configs in sequence and move each scenario's CSV output into `output//`: +Helper Bash scripts in `scripts/` run `multirun.jar` across multiple alignment configs in sequence and move each scenario's output into `output//`: - `run_alignment_multiruns.sh` — full set of alignment scenarios @@ -166,7 +210,7 @@ $ POP_SIZE=10000 RUNS_PER_SCENARIO=2 ./scripts/run_alignment_multiruns.sh ### Contributing -1. Create a new branch for your contributions. This will likely be based on either the `main` branch of this repository (if you seek to modify the stable version of the model) or `develop` (if you seek to modify the most recent version of the model). Please see branch naming convention below. +1. Create a new branch for your contributions. This will likely be based on either the `main` branch of this repository (if you seek to modify the stable version of the model) or `develop` (if you seek to modify the most recent version of the model). 2. 
Make your changes, add your code, and write tests if applicable. 3. Commit your changes. 4. Push your changes to your fork. diff --git a/documentation/PL_InitialPopulations_column_mapping_AB.xlsx b/documentation/PL_InitialPopulations_column_mapping_AB.xlsx deleted file mode 100644 index 0b9d79b..0000000 Binary files a/documentation/PL_InitialPopulations_column_mapping_AB.xlsx and /dev/null differ diff --git a/documentation/SimPathsEU_Variable_Codebook_updated.xlsx b/documentation/SimPathsEU_Variable_Codebook.xlsx similarity index 100% rename from documentation/SimPathsEU_Variable_Codebook_updated.xlsx rename to documentation/SimPathsEU_Variable_Codebook.xlsx diff --git a/documentation/SimPathsEU_Variable_Codebook_changes.xlsx b/documentation/SimPathsEU_Variable_Codebook_changes.xlsx deleted file mode 100644 index a5d4736..0000000 Binary files a/documentation/SimPathsEU_Variable_Codebook_changes.xlsx and /dev/null differ diff --git a/documentation/SimPaths_Variable_CodebookUK.xlsx b/documentation/SimPaths_Variable_CodebookUK.xlsx deleted file mode 100644 index a14bd35..0000000 Binary files a/documentation/SimPaths_Variable_CodebookUK.xlsx and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/disability_targets_ts.gph b/input/PL/DoFilesTargets/TargetsPlots/disability_targets_ts.gph deleted file mode 100644 index 8b25806..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/disability_targets_ts.gph and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/disability_targets_ts.png b/input/PL/DoFilesTargets/TargetsPlots/disability_targets_ts.png deleted file mode 100644 index 2443618..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/disability_targets_ts.png and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/employment_targets_ts.gph b/input/PL/DoFilesTargets/TargetsPlots/employment_targets_ts.gph deleted file mode 100644 index 79955a8..0000000 Binary files 
a/input/PL/DoFilesTargets/TargetsPlots/employment_targets_ts.gph and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/employment_targets_ts.png b/input/PL/DoFilesTargets/TargetsPlots/employment_targets_ts.png deleted file mode 100644 index 29409f6..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/employment_targets_ts.png and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/inSchool_targets_ts.gph b/input/PL/DoFilesTargets/TargetsPlots/inSchool_targets_ts.gph deleted file mode 100644 index 34a7946..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/inSchool_targets_ts.gph and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/inSchool_targets_ts.png b/input/PL/DoFilesTargets/TargetsPlots/inSchool_targets_ts.png deleted file mode 100644 index 6ae7595..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/inSchool_targets_ts.png and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/partnered_BUlogic_targets_ts.gph b/input/PL/DoFilesTargets/TargetsPlots/partnered_BUlogic_targets_ts.gph deleted file mode 100644 index 2a29325..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/partnered_BUlogic_targets_ts.gph and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/partnered_BUlogic_targets_ts.png b/input/PL/DoFilesTargets/TargetsPlots/partnered_BUlogic_targets_ts.png deleted file mode 100644 index 8432a53..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/partnered_BUlogic_targets_ts.png and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/retirement_targets_ts.gph b/input/PL/DoFilesTargets/TargetsPlots/retirement_targets_ts.gph deleted file mode 100644 index e6a9d19..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/retirement_targets_ts.gph and /dev/null differ diff --git a/input/PL/DoFilesTargets/TargetsPlots/retirement_targets_ts.png b/input/PL/DoFilesTargets/TargetsPlots/retirement_targets_ts.png deleted file 
mode 100644 index 822af97..0000000 Binary files a/input/PL/DoFilesTargets/TargetsPlots/retirement_targets_ts.png and /dev/null differ diff --git a/input/PL/DoFilesTargets/alignment_targets_disability.xlsx b/input/PL/DoFilesTargets/alignment_targets_disability.xlsx deleted file mode 100644 index 1447c24..0000000 Binary files a/input/PL/DoFilesTargets/alignment_targets_disability.xlsx and /dev/null differ diff --git a/input/PL/DoFilesTargets/alignment_targets_employment.xlsx b/input/PL/DoFilesTargets/alignment_targets_employment.xlsx deleted file mode 100644 index bfc0446..0000000 Binary files a/input/PL/DoFilesTargets/alignment_targets_employment.xlsx and /dev/null differ diff --git a/input/PL/DoFilesTargets/alignment_targets_inSchool.xlsx b/input/PL/DoFilesTargets/alignment_targets_inSchool.xlsx deleted file mode 100644 index 6cd2c33..0000000 Binary files a/input/PL/DoFilesTargets/alignment_targets_inSchool.xlsx and /dev/null differ diff --git a/input/PL/DoFilesTargets/alignment_targets_partnered_share.xlsx b/input/PL/DoFilesTargets/alignment_targets_partnered_share.xlsx deleted file mode 100644 index 19c680b..0000000 Binary files a/input/PL/DoFilesTargets/alignment_targets_partnered_share.xlsx and /dev/null differ diff --git a/input/PL/DoFilesTargets/alignment_targets_retirement.xlsx b/input/PL/DoFilesTargets/alignment_targets_retirement.xlsx deleted file mode 100644 index 7f2b2c6..0000000 Binary files a/input/PL/DoFilesTargets/alignment_targets_retirement.xlsx and /dev/null differ diff --git a/input/SimPaths_Variable_Codebook.xlsx b/input/SimPaths_Variable_Codebook.xlsx deleted file mode 100644 index c3e8209..0000000 Binary files a/input/SimPaths_Variable_Codebook.xlsx and /dev/null differ diff --git a/input_processing/00_master_conditions.do b/input_processing/00_master_conditions_PL.do similarity index 95% rename from input_processing/00_master_conditions.do rename to input_processing/00_master_conditions_PL.do index 7428d00..0a23828 100644 --- 
a/input_processing/00_master_conditions.do +++ b/input_processing/00_master_conditions_PL.do @@ -167,13 +167,13 @@ global r1a_if_condition "dcpst == 2 & dag >= ${age_can_retire} & flag_deceased ! global r1b_if_condition "ssscp != 1 & dcpst == 1 & dag >= ${age_can_retire} & flag_deceased != 1" * Wages -global W1fa_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & flag_deceased != 1" +global W1fa_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & deh_c4 != 0 & flag_deceased != 1" -global W1ma_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & flag_deceased != 1" +global W1ma_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & deh_c4 != 0 & flag_deceased != 1" -global W1fb_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 1 & flag_deceased != 1" +global W1fb_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & deh_c4 != 0 & previouslyWorking == 1 & flag_deceased != 1" -global W1mb_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 1 & flag_deceased != 1" +global W1mb_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & deh_c4 != 0 & previouslyWorking == 1 & flag_deceased != 1" * Capital income global i1a_if_condition "dag >= ${age_becomes_semi_responsible} & flag_deceased != 1" diff --git a/input_processing/01_key_function_income_thresholds_from_EUROMOD_2018.do b/input_processing/01_key_function_income_thresholds_from_EUROMOD_2018.do index 30ffa25..4e2b414 100644 --- a/input_processing/01_key_function_income_thresholds_from_EUROMOD_2018.do +++ b/input_processing/01_key_function_income_thresholds_from_EUROMOD_2018.do @@ -51,9 +51,8 @@ if _rc { exit 601 } -local dir_doc "`dir_w'/documentation" -local out_dta 
"`dir_doc'/key_function_income_thresholds_clean2018.dta" -local out_xlsx "`dir_doc'/key_function_income_thresholds_clean2018.xlsx" +local dir_out "`dir_w'/input_processing" +local out_xlsx "`dir_out'/key_function_income_thresholds_2018.xlsx" local ref_year 2018 local weeks_per_month = 365.25 / (7 * 12) @@ -248,7 +247,6 @@ sort country approach format lo_monthly hi_monthly lo_weekly hi_weekly %12.2f compress -save "`out_dta'", replace export excel using "`out_xlsx'", sheet("results") firstrow(variables) sheetreplace @@ -267,6 +265,5 @@ putexcel A11=("Output values") B11=("Local currency per week, exact and rounde putexcel A12=("Do-file") B12=("01_key_function_income_thresholds_from_EUROMOD_2018.do") di as txt "Saved results to:" -di as txt " `out_dta'" di as txt " `out_xlsx'" list country approach bu_id lo_hi_weekly_exact lo_hi_weekly_round, noobs abbreviate(32) diff --git a/input_processing/data_construction/PL/00_master_data_set_construction_PL.do b/input_processing/data_construction/PL/00_master_data_set_construction_PL.do index b3ef554..d808ac2 100644 --- a/input_processing/data_construction/PL/00_master_data_set_construction_PL.do +++ b/input_processing/data_construction/PL/00_master_data_set_construction_PL.do @@ -9,10 +9,12 @@ * DATA: Longitudinal EU-SILC UDB version, 2005 - 2020 * AUTHORS: Clare Fenwick, Daria Popova, Ashley Burdett, * Aleksandra Kolndrekaj -* LAST UPDATE: Jan 2026 AB +* LAST UPDATE: March 2026 AB * ******************************************************************************** * NOTES: +* ENSURE HAVE ALREADY RUN 00_master_conditions.do FILE. +* * Before running these files, the cumulative panel for each file type * (D, H, R, P) must be constructed. These cumulative panels should be created * following the procedure set out in *GESIS Papers 2022/10*. 
The do-files to @@ -115,7 +117,7 @@ global dir_ind "/Users/ashleyburdett/Library/CloudStorage/Box-Box" // Aleksandra - C:/Users/ak25793/Box * Working directory -global dir_work "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/initial_populations/PL" +global dir_work "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/input_processing/initial_populations/PL" * Directory containing do files global dir_do "$dir_work/do_files" @@ -146,7 +148,7 @@ global dir_data_05_20 "$dir_data/orig_panel_2005_2020" * DEFINE PARAMETERS & PROCESS IF CONDITIONS *******************************************************************************/ -do "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/00_master_conditions.do" +do "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/input_processing/00_master_conditions_PL.do" /******************************************************************************* @@ -154,13 +156,13 @@ do "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/00_master_conditions.do" *******************************************************************************/ //do "$dir_do/01_prepare_pooled_data.do" -do "$dir_do/02_create_variables_PL.do" +do "$dir_do/02_create_variables_${country}.do" -do "$dir_do/03_create_benefit_units_PL.do" +do "$dir_do/03_create_benefit_units_${country}.do" -do "$dir_do/04_reweight_PL.do" +do "$dir_do/04_reweight_${country}.do" -do "$dir_do/05_drop_hholds_and_slice_PL.do" +do "$dir_do/05_drop_hholds_slice_and_refactoring_${country}.do" -do "$dir_do/06_check_yearly_data_PL.do" +do "$dir_do/06_check_yearly_data_${country}.do" diff --git a/input_processing/data_construction/PL/01_prepare_pooled_data_PL.do b/input_processing/data_construction/PL/01_prepare_pooled_data_PL.do index 00d1c5e..d102f64 100644 --- a/input_processing/data_construction/PL/01_prepare_pooled_data_PL.do +++ b/input_processing/data_construction/PL/01_prepare_pooled_data_PL.do @@ -1,5 +1,5 @@ ******************************************************************************** -* PROJECT: 
ESPON +* PROJECT: SimPaths EU * DO-FILE NAME: 01_prepare_pooled_data.do * DESCRIPTION: Compiles panel dataset from EU-SILC ******************************************************************************** @@ -23,10 +23,42 @@ merge these chunks of data into one cumulative dataset (separately for the D-,H-,R- and P-data). */ /* -Initial populations: cross-sectional SILC for 2011-2023 (income 2010-2022), -2023 (income 2022) -Estimation sample: longitudinal SILC with observations from 2011-2023 -(income 2010-2022) +STRUCTURE OF THIS FILE + + The script builds a person-level panel dataset for a single country by + sequentially merging the four EU-SILC master files produced by the panel + construction scripts (01-04 in eu_silc_do_2025/). + + Files are merged in the following order, with R as the base: + + R (Personal Register) — loaded first as the base. Contains all persons + in the sample including children under 16. Key identifiers: upid + (unique person ID across releases), uhid (unique household ID), year. + + P (Personal Data) — merged 1:1 on year+upid+uhid. Contains income and + personal variables for adults aged 16 and above only. After this merge: + - Adults (in both R and P): have full R and P variables + - Children (in R only, not P): retained with R variables only + - Records in P but not R: dropped (should not occur in clean data) + + D (Household Register) — merged 1:m on year+uhid. D is household-level + so one D row maps to multiple persons. keep if _merge==3 retains only + persons whose household appears in D. A small number of households may + not merge — this is suspected to be an edge case from the cross-release + deduplication in 01_create_masterD.do but has not been fully investigated. + + H (Household Data) — merged 1:m on year+uhid, same logic as D. + + KEY IDENTIFIERS + upid — unique personal ID across releases (country + rotation group + + dropout year + pid).
Not the same as the raw pid in the source data + uhid — unique household ID across releases (same construction logic). + year — income reference year. + + OUTPUT + ${country}-SILC_pooled_all_obs_01.dta — person-level panel for the target + country, containing all household members (adults and children) with + combined R, P, D, and H variables. Flag variables (*_f, *_i) are dropped. */ ******************************************************************************** diff --git a/input_processing/data_construction/PL/02_create_variables_PL.do b/input_processing/data_construction/PL/02_create_variables_PL.do index 2acd982..4349df9 100644 --- a/input_processing/data_construction/PL/02_create_variables_PL.do +++ b/input_processing/data_construction/PL/02_create_variables_PL.do @@ -58,6 +58,7 @@ use "$dir_data/${country}-SILC_pooled_all_obs_01.dta", clear lab def dummy 1 "yes" 0 "no" + /**************************** DATA COLLECTION WAVE ****************************/ /* swv >> used to set the panel. 
The variable 'year' is unique to this @@ -70,6 +71,7 @@ lab var swv "Data collection wave" fre swv + /*********************** YEAR OF THE PERSONAL INTERVIEW ***********************/ /* stm: year of the personal interview (pb110) or year of the household @@ -96,6 +98,7 @@ fre stm assert swv == stm + /******************************** INTERVIEW DATE ******************************/ /* The exact date of interview is not available in EU-SILC (no day recorded, only @@ -126,6 +129,7 @@ lab var Int_Date "Interview Date" fre Int_Date + /**************************** HOUSEHOLD IDENTIFIER ****************************/ /* In the original EU-SILC longitudinal wave files, a household is identified @@ -142,6 +146,7 @@ format idhh %-18.0g bysort year: sum idhh + /********************************* INDIVIDUALS ID *****************************/ /* In the original EU-SILC longitudinal wave files, a person is identified with @@ -158,6 +163,7 @@ format idperson %-20.0g bysort year: sum idperson + /******************************** SET PANEL ***********************************/ duplicates report year idperson @@ -188,6 +194,7 @@ xtset idperson swv sort upid year + /***************************** DECEASED FLAG **********************************/ gen flag_deceased = 0 @@ -195,6 +202,7 @@ replace flag_deceased = 1 if rb110 == 6 lab var flag_deceased "FLAG: Individual deied in the previous year" + /********************************* GENDER *************************************/ gen dgn = rb090 recode dgn 2 = 0 //dgn = 0 is female, 1 is male @@ -217,6 +225,7 @@ forvalues i = 1/6 { fre year dgn bysort year: sum dgn + /********************************* ID PARTNER *********************************/ /* Dataset quirks: the original variable that identifies an individual's @@ -243,6 +252,7 @@ lab var idpartner "Unique cross wave identifier of partner" recode idpartner . 
= -9 format idpartner %18.0g + /**************** ID FATHER (includes natural/step/adoptive) ******************/ tostring rb220, replace format(%18.0g) gen idfather = (urtgrp + rb220) @@ -254,6 +264,7 @@ lab var idfather "Father unique identifier" format idfather %18.0g recode idfather . = -9 + /******************* ID MOTHER (includes natural/step/adoptive) ***************/ tostring rb230, replace format(%18.0f) gen str30 idmother = (urtgrp + rb230) @@ -267,6 +278,7 @@ format idmother %18.0g sort idperson year + /******************************* AGE ******************************************/ /* EU-SILC has a number of possible variables that could be used to create age. @@ -340,7 +352,7 @@ drop age_dif * Impose panel consistency to help overcome possible repeat ages due to * interview timing gen dag_new = dag -replace dag_n = dag_n[_n-1] + 1 if idperson == idperson[_n-1] & /// +replace dag_new = dag_new[_n-1] + 1 if idperson == idperson[_n-1] & /// swv == swv[_n-1] + 1 * Enforce top coding @@ -443,6 +455,7 @@ hist dag, discrete graph drop _all + /************************* REGION (NUTS 1) ************************************/ /* ISSUE: The number of NUTS1 regions changes in Poland. @@ -488,10 +501,12 @@ fre drgn1 tab drgn1 year, col bys swv: sum drgn1 if drgn1 > 0 + /******************************** COUNTRY *************************************/ gen dct = . lab var dct "Country code: $country" + /******************************** UNION ***************************************/ /* Generate union variable to indicate if there is a partner in the hh; dun should @@ -535,6 +550,7 @@ tab dun year, col bys dun: sum idpartner if idpartner == -9 bys dun: sum idpartner if idpartner > 0 + /**************************** PARTNER'S AGE ***********************************/ /* To preserve the the empirical relationship between partners ages, we adjust the @@ -863,6 +879,7 @@ drop dag_sim dag_sim2 dag_sim_orig dagsp2 mean_gap sd_gap sum dagsq count if dun == 1 & dagsp == . 
+ /************************** PARTNERSHIP STATUS ********************************/ /* Construct a variable that only indicates whether the individual is single or @@ -913,6 +930,7 @@ tab dcpst widow replace widow = 0 if dcpst == 1 // let idpartner overall widow status + /***************************** PARTNER'S GENDER *******************************/ /* In the cumulative longitidutional dataset created by GESIS, a unique @@ -944,6 +962,7 @@ recode dgnsp (. = -9) fre dgnsp if idpartner > 0 tab dgnsp year, col + /******************************* HEALTH STATUS ********************************/ fre ph010 /* Use ph010 (general health) variable: @@ -1045,6 +1064,7 @@ fre dhe tab dhe year, col bys swv: sum dhe + /************************** PARTNER'S HEALTH STATUS ***************************/ preserve @@ -1108,6 +1128,7 @@ tab2 dcpst dcpen if swv>=2011 & interview_count>=2 & first_appearance!=1, /// restore */ + /****************************** NEW PARTNERSHIP *******************************/ gen new_rel = 0 if dcpst == 1 replace new_rel = 1 if dcpen == 1 @@ -1117,6 +1138,7 @@ lab var new_rel "Partnership in first year" tab new_rel year, col bys swv: sum new_rel if new_rel >= 0 + /**************************** EXIT PARTNERSHIP ********************************/ /* Only populated if can transition out of a partnership (not because of death @@ -1148,7 +1170,7 @@ count if dag >= 65 & pb200 == 2 & idpartner == -9 & dcpex == 1 preserve -keep idperson swv pb190 pb200 pb205 rb110 rb120 flag_deceased +keep idperson swv rb120 flag_deceased rename idperson idpartner rename flag_deceased flag_deceased_sp @@ -1182,6 +1204,7 @@ bys swv: sum dcpex if dcpex >= 0 * Check consistency tab dun dcpex + /**************************** PARTNER AGE DIFFERENCE **************************/ gen dcpagdf = dag - dagsp if dagsp != . 
& idpartner != -9 @@ -1191,6 +1214,7 @@ fre dcpagdf // tab dcpagdf year, col bys swv: sum dcpagdf + /************************ ECONOMIC ACTIVITY STATUS ****************************/ /* Activity status is coded very differently in EU-SILC to Understanding Society @@ -1306,6 +1330,7 @@ bys swv: sum les_c3 replace les_c3 = -9 if les_c3 == . + /******************** ECONOMIC ACTIVITY STATUS WITH RETIREMENT ****************/ /* Variable construction choice seems to matter here. @@ -1385,6 +1410,7 @@ bys swv: sum les_c4 replace les_c4 = -9 if les_c4 == . + /************************ LONG-TERM SICK OR DISABLED **************************/ /* Effectively treat disabled/long-term sick as a mututlly exclusive activity @@ -1423,6 +1449,7 @@ bys swv: sum dlltsd tab les_c3 dll tab les_c4 dll + /******************* PARTNER LONG-TERM SICK OR DISABLED ***********************/ preserve @@ -1444,6 +1471,7 @@ drop _merge fre dlltsd_sp if idpartner > 0 tab dlltsd_sp year, col + /******************************* UNEMPLOYMENT *********************************/ fre pl020 pl031 @@ -1477,6 +1505,7 @@ tab unemp dlltsd tab unemp les_c3 tab unemp les_c4 + /*********************** IN INITIAL EDUCATION SPELL ***************************/ /* In the understanding society data notes from codebook differ from the code @@ -1554,6 +1583,7 @@ tab ded dlltsd * Age in estimation limited to 16-29 tab dag ded + /******************************** STUDENT *************************************/ gen studentflag = -9 replace studentflag = 0 if les_c3 == 1 | les_c3 == 3 @@ -1561,8 +1591,9 @@ replace studentflag = 1 if les_c3 == 2 label var studentflag "Student" -tab les_c3 student -tab les_c4 student +tab les_c3 studentflag +tab les_c4 studentflag + /**************************** HOURS OF WORK ***********************************/ /* @@ -1799,6 +1830,7 @@ count if les_c3 == -9 count if les_c4 == -9 count if les_c4 == -9 & lhw == . 
// 8,461 + /********************* LAGGED ECONOMIC ACTIVITY STATUS ************************/ * Without retirement xtset idperson swv @@ -1821,6 +1853,7 @@ lab def l1_les_c4 1 "Employed or self_employed" 2 "Student" /// lab val l1_les_c4 l1_les_c4 lab var l1_les_c4 "LABOUR MARKET: Activity status, inc retirement, t-1" + /************************** PARTNER'S ACTIVITY STATUS *************************/ * Without retirement preserve @@ -1861,6 +1894,7 @@ drop _merge fre lessp_c4 tab lessp_c4 year, col + /********************** OWN AND SPOUSE ACTIVITY LEVELS ************************/ gen lesdf_c4 = -9 replace lesdf_c4 = 1 if les_c3 == 1 & lessp_c3 == 1 & dcpst == 1 @@ -1882,6 +1916,7 @@ fre lesdf_c4 tab lesdf_c4 year, col bys swv: sum lesdf_c4 if lesdf_c4 >= 0 + /*************************** EMPLOYMENT EXPERIENCE ****************************/ gen liwwh = -9 replace liwwh = pl200 if pl200 >= 0 & pl200 != . @@ -1893,6 +1928,7 @@ fre liwwh tab liwwh year, col bys swv: sum liwwh if liwwh >= 0 + /************************* EDUCATIONAL ATTAINMENT *****************************/ /* Use pe040 variable, code negative values to missing @@ -2057,7 +2093,7 @@ forvalues i = 4(-1)1 { * Low in the future, low today (min and monotonicity) replace imp_deh_mono = imp_deh_mono[_n-1] if /// - idperson == idperson[_n-1] & imp_deh[_n-1] == 3 & /// + idperson == idperson[_n-1] & imp_deh_mono[_n-1] == 3 & /// imp_deh_mono == . & count == `i' * Populate with future observation if: @@ -2147,7 +2183,7 @@ missing values. 
*/ -* Create four category version with an unassigned cat for those in iniital edu +* Create four category version with an unassigned cat for those in initial edu * spell gen deh_c4 = deh_c3 @@ -2161,6 +2197,7 @@ lab values deh_c4 deh_c4 count if deh_c4 == -9 // 44,265 count if deh_c4 == -9 & les_c4 == -9 // 3,031 + /*************************** PARENT'S EDUCATION STATUS ************************/ /* There is no variable for parent's education status in EU-SILC, but can be @@ -2175,9 +2212,9 @@ Create variables but leave missing so able to utilize the stadnard strucuture of SimPaths. */ -gen dehm_c3 = . -gen dehf_c3 = . -gen dehmf_c3 = . +gen dehm_c4 = . +gen dehf_c4 = . +gen dehmf_c4 = . /* preserve @@ -2199,20 +2236,20 @@ merge m:1 swv idfather idhh using "$dir_data/father_edu" keep if _merge == 1 | _merge == 3 drop _merge -replace dehm_c3 = mother_educ -replace dehf_c3 = father_educ +replace dehm_c4 = mother_educ +replace dehf_c4 = father_educ -fre dehm_c3 if dgn > 0 & dag > 0 -fre dehf_c3 if dgn > 0 & dag > 0 +fre dehm_c4 if dgn > 0 & dag > 0 +fre dehf_c4 if dgn > 0 & dag > 0 * Identify the highest parental education status -//recode dehm_c3 dehf_c3 (.=0) -egen dehmf_c3 = rowmax(dehm_c3 dehf_c3) -lab var dehmf_c3 "highest parental education status" -fre dehmf_c3 -//recode dehm_c3 dehf_c3 (0 = .) -fre dehmf_c3 if dehm_c3 == . -fre dehmf_c3 if dehf_c3 == . +//recode dehm_c4 dehf_c4 (.=0) +egen dehmf_c4 = rowmax(dehm_c4 dehf_c4) +lab var dehmf_c4 "highest parental education status" +fre dehmf_c4 +//recode dehm_c4 dehf_c4 (0 = .) +fre dehmf_c4 if dehm_c4 == . +fre dehmf_c4 if dehf_c4 == . /* Only a third of the dataset has an observation for parental education and so the following code used to predict the value is not very accurate. @@ -2222,49 +2259,49 @@ is coded out. 
*Predict highest parental education status if missing *Recode education level (outcome variable) so 1 = Low education, * 2 = Medium education, 3 = High education -recode dehmf_c3 /// +recode dehmf_c4 /// (1 = 3) /// (3 = 1) /// - , gen(dehmf_c3_recoded) + , gen(dehmf_c4_recoded) -la def dehmf_c3_recoded 1 "Low" 2 "Medium" 3 "High" -la val dehmf_c3_recoded dehmf_c3_recoded -fre dehmf_c3_recoded +la def dehmf_c4_recoded 1 "Low" 2 "Medium" 3 "High" +la val dehmf_c4_recoded dehmf_c4_recoded +fre dehmf_c4_recoded *ordered probit model to replace missing values recode dgn dag drgn1 (-9=.) , gen (dgn2 dag2 drgn12) fre dgn2 dag2 drgn12 -xi: oprobit dehmf_c3_recoded i.dgn2 dag2 ib8.drgn12 i.swv, vce(robust) +xi: oprobit dehmf_c4_recoded i.dgn2 dag2 ib8.drgn12 i.swv, vce(robust) predict pred_probs1 pred_probs2 pred_probs3, pr //Identify the category with the highest predicted probability egen max_prob = rowmax(pred_probs1 pred_probs2 pred_probs3) //Impute missing values based on predicted probabilities -gen imp_dehmf_c3_recoded = . -replace imp_dehmf_c3_recoded = 1 if max_prob == pred_probs1 -replace imp_dehmf_c3_recoded = 2 if max_prob == pred_probs2 -replace imp_dehmf_c3_recoded = 3 if max_prob == pred_probs3 +gen imp_dehmf_c4_recoded = . 
+replace imp_dehmf_c4_recoded = 1 if max_prob == pred_probs1 +replace imp_dehmf_c4_recoded = 2 if max_prob == pred_probs2 +replace imp_dehmf_c4_recoded = 3 if max_prob == pred_probs3 -fre imp_dehmf_c3_recoded if missing(dehmf_c3_recoded) -fre imp_dehmf_c3_recoded if !missing(dehmf_c3_recoded) +fre imp_dehmf_c4_recoded if missing(dehmf_c4_recoded) +fre imp_dehmf_c4_recoded if !missing(dehmf_c4_recoded) -recode imp_dehmf_c3_recoded /// +recode imp_dehmf_c4_recoded /// (1 = 3) /// (3 = 1) /// - , gen(imp_dehmf_c3) + , gen(imp_dehmf_c4) -tab2 imp_dehmf_c3_recoded imp_dehmf_c3 +tab2 imp_dehmf_c4_recoded imp_dehmf_c4 -cap gen dehmf_c3_flag = missing(dehmf_c3) -lab var dehmf_c3_flag "=1 if dehmf_c3 is imputed" -replace dehmf_c3 = round(imp_dehmf_c3) if missing(dehmf_c3) -lab define dehmf_c3 1 "High" 2 "Medium" 3 "Low" +cap gen dehmf_c4_flag = missing(dehmf_c4) +lab var dehmf_c4_flag "=1 if dehmf_c4 is imputed" +replace dehmf_c4 = round(imp_dehmf_c4) if missing(dehmf_c4) +lab define dehmf_c4 1 "High" 2 "Medium" 3 "Low" -bys dehmf_c3_flag: fre dehmf_c3 +bys dehmf_c4_flag: fre dehmf_c4 -drop dehmf_c3_recoded dgn2 dag2 drgn12 _Idgn2_1 _Iswv_* pred_probs* max_prob /// - imp_dehmf_c3_recoded imp_dehmf_c3 +drop dehmf_c4_recoded dgn2 dag2 drgn12 _Idgn2_1 _Iswv_* pred_probs* max_prob /// + imp_dehmf_c4_recoded imp_dehmf_c4 */ */ @@ -2300,6 +2337,7 @@ tab dag der tab der les_c3 tab der les_c4 + /******************************* LEAVE EDUCATION ******************************/ /* Only populated if can transition out of education @@ -2331,6 +2369,7 @@ tab sedex les_c4 tab dag sedex + /****************************** RETIRED ***************************************/ gen dlrtrd = 0 replace dlrtrd = 1 if les_c4 == 4 @@ -2347,6 +2386,7 @@ tab dlrtrd year, col tab les_c3 dlrtrd tab les_c4 dlrtrd + /**************************** ENTER RETIREMENT ********************************/ /* Only populated if at risk of transition. 
@@ -2368,6 +2408,7 @@ tab drtren year, col tab drtren les_c4 + /**************************** PENSION AGE *************************************/ /*cap gen bdt = mdy(1, 15, birthy) /*no month of birth available in /EU-SILC*/ */ @@ -2433,9 +2474,9 @@ replace dagpns_y = 1 if dgn == 1 & dag == 66 & stm >= 2016 & stm < 2018 replace dagpns_y = 1 if dgn == 1 & dag == 65 & stm >= 2018 & stm <= 2024 * Women -replace dagpns_y = 1 if dgn == 1 & dag == 60 & stm >= 2006 & stm < 2016 -replace dagpns_y = 1 if dgn == 1 & dag == 61 & stm >= 2016 & stm < 2018 -replace dagpns_y = 1 if dgn == 1 & dag == 60 & stm >= 2018 & stm <= 2024 +replace dagpns_y = 1 if dgn == 0 & dag == 60 & stm >= 2006 & stm < 2016 +replace dagpns_y = 1 if dgn == 0 & dag == 61 & stm >= 2016 & stm < 2018 +replace dagpns_y = 1 if dgn == 0 & dag == 60 & stm >= 2018 & stm <= 2024 * Became eligable for state pension last year gen dagpns_y1 = 0 @@ -2446,9 +2487,9 @@ replace dagpns_y1 = 1 if dgn == 1 & dag == 67 & stm >= 2016 & stm < 2018 replace dagpns_y1 = 1 if dgn == 1 & dag == 66 & stm >= 2018 & stm <= 2024 * Women -replace dagpns_y1 = 1 if dgn == 1 & dag == 61 & stm >= 2005 & stm < 2016 -replace dagpns_y1 = 1 if dgn == 1 & dag == 62 & stm >= 2016 & stm < 2018 -replace dagpns_y1 = 1 if dgn == 1 & dag == 61 & stm >= 2018 & stm <= 2024 +replace dagpns_y1 = 1 if dgn == 0 & dag == 61 & stm >= 2005 & stm < 2016 +replace dagpns_y1 = 1 if dgn == 0 & dag == 62 & stm >= 2016 & stm < 2018 +replace dagpns_y1 = 1 if dgn == 0 & dag == 61 & stm >= 2018 & stm <= 2024 lab var dagpns_y "Age became eligable for pension" lab var dagpns_y1 "Age+1 became eligable for pension" @@ -2456,6 +2497,7 @@ lab var dagpns_y1 "Age+1 became eligable for pension" tab dag dagpns_y tab dag dagpns_y + /**************************** PENSION AGE OF SPOUSE ***************************/ * Above state pension age dummy preserve @@ -2502,6 +2544,7 @@ fre dagpns_sp fre dagpns_y_sp fre dagpns_y1_sp + /*************************** NOT RETIRED WORK STATUS 
**************************/ gen lesnr_c2 = -9 @@ -2515,6 +2558,7 @@ lab val lesnr_c2 lesnr_c2 fre lesnr_c2 tab lesnr_c2 year, col + /*************************** SAME SEX PARTNERSHIP *****************************/ gen ssscp = 0 if idpartner > 0 replace ssscp = 1 if dcpst == 1 & dgn == dgnsp & dgnsp != . @@ -2525,6 +2569,7 @@ lab var ssscp "Partnership is same sex" fre ssscp //0.02% tab ssscp year, col + /*************************** PARTNERSHIP DURATION *****************************/ /* There are no equivalent variables in EU-SILC for partnership duration @@ -2571,6 +2616,7 @@ tab dcpyy_st swv, col tab dcpst dcpyy_st + /*********************** YEAR PRIOR TO ENDING RELATIONSHIP ********************/ /* Impossible to know for the most recent wave so set to 0 to keep the variable. @@ -2588,6 +2634,7 @@ lab var scpexpy "Year prior to exiting partnership" fre scpexpy // 1% tab scpexpy year, col + /*************************** FEMALE FERTILE DUMMY *****************************/ gen sprfm = 0 replace sprfm = 1 if dgn == 0 & dag >= ${age_have_child_min} & /// @@ -2599,6 +2646,7 @@ lab var sprfm "Woman in fertility range dummy (18-49)" fre sprfm tab sprfm year, col + /**************************** NUMBER OF CHILDREN ******************************/ /* Note idmother and idfather are not just reported if the bioloigcal parent but @@ -2698,6 +2746,7 @@ No age consistency imposed here count if dag > 42 & dgn == 0 & dnc02 > 0 & dnc02 != . // 246 cases count if dag > 44 & dgn == 0 & dnc02 > 0 & dnc02 != . 
// 101 cases + /*********************** NUMBER OF NEW BORN CHILDREN **************************/ gen child0 = 0 replace child0 = 1 if dag < 1 @@ -2767,6 +2816,7 @@ tab dag give_birth if dgn == 0, col hist dag if give_birth == 1 & dgn == 0 + /***************************** ADULT CHILD FLAG *******************************/ /* Decision 24/10/25: Agreed that to be an adult child the following conditions @@ -2853,6 +2903,7 @@ tab adultchildflag year, col tab dag if adultchildflag == 1 & swv > 2010 + /************************ EXIT THE PARENTAL HOME ******************************/ /* Only populated if eligable for transition. 1 means that the individual exits the @@ -2885,6 +2936,7 @@ tab dlftphm year, col tab dlftphm adultchildflag + /************************ HOUSEHOLD COMPOSITION *******************************/ /* Note: For consistency with the simulation adult children and children above @@ -2938,6 +2990,7 @@ fre dhhtp_c8 // 1.87% single parents tab dhhtp_c8 year, col bys swv: sum dhhtp_c8 + /************************** OECD EQUIVALENCE SCALE ****************************/ * Temporary number of children 0-13 and 14-18 to create OECD hh equiv scale gen depChild_013 = 1 if (dag >= 0 & dag <= 13) & (idmother > 0 | idfather > 0) @@ -2956,6 +3009,7 @@ replace moecd_eq = 0.3*dnc013 + 0.5*dnc1418 + 1 if dhhtp_c4 == 4 drop dnc013 dnc1418 + /******************** IN INITIAL EDUCATION SPELL AGE RANGE ********************/ gen sedag = 1 if dag >= $age_leave_school & dag <= $age_force_leave_spell1_edu replace sedag = 0 if missing(sedag) @@ -2966,6 +3020,7 @@ lab var sedag "Initial education spell age range" fre sedag tab sedag year, col + /***************** WAS IN INITIAL EDUCATION SPELL SAMPLE **********************/ /* Consists of those observed in education in all preceding periods t-1,t-2,t-n, @@ -2981,6 +3036,7 @@ lab var sedcsmpl "SYSTEM: Continuous education sample" lab def sedcsmpl 1 "Aged 16-29 and were in continuous education" lab val sedcsmpl sedcsmpl + 
/********************** RETURN TO EDUCATION SAMPLE ****************************/ /* Consists of those who have left their initial education spell above the age of @@ -2993,6 +3049,7 @@ lab var sedrsmpl "SYSTEM : Return to education sample" lab def sedrsmpl 1 "Aged 16+, not retired and not in initial education spell" lab val sedrsmpl sedrsmpl + /******************* IN INITIAL EDUCATION SPELL SAMPLE ************************/ /* Generated from sedcsmpl and ded variables. Sample: Respondents who were in @@ -3006,6 +3063,7 @@ lab var scedsmpl "SYSTEM : Not in continuous education sample" lab def scedsmpl 1 "Left continuous education" lab val scedsmpl scedsmpl + /**************************** INCOME VARIABLES ********************************/ /* A key difference here appears to be that income in EU-SILC is yearly, whereas @@ -3018,7 +3076,6 @@ below. Generate individual income variables: */ -/*************** GROSS PERSONAL NON-BENEFIT MONTHLY INCOME ********************/ /* UK version: egen ypnb = rowtotal(fimnlabgrs_dv fimnpen_dv fimnmisc_dv /// inc_stp inc_tu inc_ma); @@ -3027,7 +3084,7 @@ inc_stp, inc_tu and inc_ma generated at the beginning from income file 1 - fimnlabgrs_dv: total personal monthly labour income gross: employee cash or near cash income (gross). -DP: Note that in UKHLS the variable fimnlabgrs_dv contains “labour income” +DP: Note that in UKHLS the variable fimnlabgrs_dv contains "labour income" (see here: https://www.understandingsociety.ac.uk/documentation/... mainstage/variables/fihhmnlabgrs_dv/_) so my understanding is that self-employment income should also be included here. @@ -3048,7 +3105,7 @@ These variables correspond to a the previous calender year. DP: The Usoc description says that this variable includes receipts reported in the income data file where w_ficode equals [2] pension from a previous employer, -or [3] pension from a spouse’s previous employer. +or [3] pension from a spouse's previous employer. 
This is assumed to be reported net of tax. So in the UK these are occupational pensions. I think it is correct to use py080g in SILC as an equivalent. @@ -3098,138 +3155,6 @@ in relation to child income. (in EUROMOD these types of incomes are split between the oldest couple in the household). */ -* Household level variables are assigned to all adult hh members -* ==> split them equally among all adults in hh -gen adult = (dag >= $age_adult) //18 yo and over -bysort stm idhh : egen n_adults = total(adult) - -lab var n_adults "Number of adults in hh" - -gen child = (dag < $age_adult) //below 18 yo -bysort stm idhh : egen n_child = total(child) - -lab var n_child "Number of children in hh" - -foreach var in hy080g hy110g hy040g hy090g { - - gen `var'_pc = `var'/n_adults - replace `var'_pc = 0 if child == 1 - -} - -//order stm idhh dag hy080g hy110g hy040g hy090g hy080g_pc hy110g_pc /// -// hy040g_pc hy090g_pc, last - -egen ypnb_temp = rowtotal(py010g py050g py080g hy080g_pc hy110g_pc /// - hy040g_pc hy090g_pc) -gen ypnb = ypnb_temp / 12 - -fre ypnb if ypnb < 0 -/* obs with negative income (due to negative self-employment income) but many of -these are close to zero ==> recode them to zero */ - -* Impose non-negativity -replace ypnb = 0 if ypnb < 0 - -sum ypnb -assert ypnb >= 0 - -sum ypnb if year == 2013 -sum ypnb if year == 2016 -sum ypnb if year == 2019 -sum ypnb if year == 2023 - -* Check for missing values == if missing on all the components -count if py010g >= . & py050g >= . & py080g >= . & hy080g >= . & /// - hy110g >= . & hy040g >= . & hy090g >= . // 0 obs - -count if (py010g >= . | py050g >= . | py080g >= . | hy080g >= . | /// - hy110g >= . | hy040g >= . | hy090g >= .) 
& dag >= 16 // 65,891 - -count if dag >= 16 // 332,907 -//=> 20% of adult observations have some missing income information - -/********** GROSS PERSONAL NON-EMPLOYMENT NON-BENEFIT MONTHLY INCOME **********/ -/* -UK version: egen yptc = rowtotal(fimnpen_dv fimnmisc_dv inc_stp inc_tu inc_ma) - -EU SILC use the same variables as indicated above. -*/ - -egen yptc = rowtotal(py080g hy080g_pc hy110g_pc hy040g_pc hy090g_pc) -replace yptc = yptc / 12 - -sum yptc -sum yptc if year == 2013 -sum yptc if year == 2016 -sum yptc if year == 2019 -sum yptc if year == 2023 - -* Check for missing values == if missing on all the components -count if py080g >= . & hy080g >= . & hy110g >= . & hy040g >= . & /// - hy090g >= . // 0 obs with all missing elements - -count if (py080g >= . | hy080g >= . | hy110g >= . | hy040g >= . | /// - hy090g >= .) & dag >= 16 - // 65,891 adult obs with at least one missing element - // 20% of adult observations have some missing income information same as - // gross personal non-ben income - -/***************** GROSS PERSONAL EMPLOYMENT MONTHLY INCOME *******************/ -/* -UK version: gen yplgrs = fimnlabgrs_dv -EU SILC version: As above. -*/ -egen yplgrs = rowtotal(py010g py050g) -replace yplgrs = yplgrs / 12 - -fre yplgrs if yplgrs < 0 // 0 obs - -* Impose non-negativity -replace yplgrs = 0 if yplgrs < 0 - -drop *_temp - -* Check for missing values == if missing on all the components -count if py010g >= . & py050g >= . & dag >= 16 // 22,426 adults missing both -count if (py010g >= . & py050g >= . ) & dag >= 16 & les_c3 == 1 - // 0 employed adults missing information - -replace yplgrs = -9 if (py010g >= . & py050g >= .) 
& dag >= 16 & les_c3 == 1 - -sum yplgrs -sum yplgrs if year == 2013 -sum yplgrs if year == 2016 -sum yplgrs if year == 2019 -sum yplgrs if year == 2023 - -/************* SPOUSE GROSS PERSONAL NON-BENEFIT MONTHLY INCOME ***************/ -preserve -keep swv idperson idhh ypnb -rename ypnb ypnbsp -rename idperson idpartner -save "$dir_data/temp_ypnb", replace -restore - -merge m:1 swv idpartner idhh using "$dir_data/temp_ypnb" -keep if _merge == 1 | _merge == 3 -drop _merge - -/****************** HH/BEN UNIT GROSS NON-BENEFIT MONTHLY INCOME **************/ -/* -Couples = sum of partners incomes. Singles = own income -*/ -sum ypnb ypnbsp - -egen yhhnb = rowtotal(ypnb ypnbsp) if dhhtp_c4 == 1 | dhhtp_c4 == 2 - -replace yhhnb = ypnb if dhhtp_c4 == 3 | dhhtp_c4 == 4 - -sum yhhnb -sum yhhnb if year == 2013 -sum yhhnb if year == 2016 -sum yhhnb if year == 2019 -sum yhhnb if year == 2023 /************************************ CPI *************************************/ /* @@ -3266,314 +3191,59 @@ replace CPI = 143.5 if stm == 2023 lab var CPI "HICP, all items, base 2015" -/************************ REAL MONTHLY GROSS INCOMES **************************/ -* For household income, equivalise and adjust for inflation -replace yhhnb = (yhhnb/moecd_eq)/(CPI/100) - -* Adjust for inflation: -replace ypnb = ypnb/(CPI/100) -replace yptc = yptc/(CPI/100) -replace yplgrs = yplgrs/(CPI/100) -replace ypnbsp = ypnbsp/(CPI/100) - -lab var ypnb "Gross monthy real personal non-benefit income " -lab var yptc "Gross real monthly personal non-employment, non-benefit income" -lab var yplgrs "Gross monthly real personal employment income" -lab var ypnbsp "Spouse gross real monthly personal non-benefit income" - -/************ INVERSE HYPERBOLIC SINE REAL MONTHLY GROSS INCOME ***************/ -/* -This (monotonic) transformation is useful for data that exhibit highly skewed -distributions, as it can help stabilize variance and normalise the -distribution. 
-*/ -gen yhhnb_asinh = asinh(yhhnb) -gen ypnbihs_dv = asinh(ypnb) -gen ypnbihs_dv_sp = asinh(ypnbsp) -gen yptciihs_dv = asinh(yptc) -gen yplgrs_dv = asinh(yplgrs) -replace yplgrs_dv = -9 if yplgrs_dv < 0 - // to account for missing values in the raw data coded as -9 in yplgrs - // (626 real changes made) +/****************************** REAL HOURLY WAGES *****************************/ +/* +There are data issues here: + - Data is collected at the annual level + - The annual information corresponds to the previous calender year + - Income from self-employment can be negative + +Decided on the following: -lab var yhhnb_asinh "Gross real monthly household non-benefit income, asinh" -lab var ypnbihs_dv "Gross real monthly personal non-benefit income, asinh" -lab var ypnbihs_dv_sp /// - "Spoues gross real monthly personal non-benefit income, asinh" -lab var yptciihs_dv /// - "Gross real monthly personal non-employment, non-benefit income, asinh" -lab var yplgrs_dv "Gross real monthly personal employment income, asinh" +wage_hr = Annual employemnt income / Annual # hours worked + = Annual employemnt income / (# months worked * # hours worked ... + a week * 4.33) -/* -sum ypnbihs_dv ypnbihs_dv_sp yptciihs_dv yplgrs_dv +Data year: T-1 T T+1 +Hrs: T-1 T T+1 +Income: T-2 T-1 T +# month: T-2 T-1 T - Variable | Obs Mean Std. dev. 
Min Max --------------+--------------------------------------------------------- - ypnbihs_dv | 754,135 2.907554 3.419338 0 11.12896 -ypnbihs_dv~p | 378,604 4.05074 3.494108 0 11.12896 - yptciihs_dv | 754,135 .3444508 1.204493 0 9.898644 - yplgrs_dv | 754,135 2.734086 3.439811 0 11.12896 +# Months worked last year can be constructed using the PL211* variables -*/ +PY211A - Main activity January -/*********** HOUSEHOLD GROSS NON-BENEFIT MONTHLY INCOME QUINTILES *************/ -sum yhhnb_asinh +1 Employee working full-time +2 Employee working part-time +3 Self-employed working full-time (including family worker) +4 Self-employed working part-time (including family worker) +5 Unemployed +6 Student, pupil +7 Retired +8 Unable to work due to long-standing health problems +10 Fulfilling domestic tasks +11 Other -/* - Variable | Obs Mean Std. dev. Min Max --------------+--------------------------------------------------------- - yhhnb_asinh | 754,135 3.390741 3.436031 0 10.57726 +=> Create a measure of hourly wages for year T using hour info from yr T +and wage info and number of months worked info from T+1 */ -/* -cap drop ydses* -forvalues stm=2005/2020 { - xtile ydses_c5_`stm' = yhhnb_asinh if depChild != 1 & stm==`stm', nq(5) - bys idhh: egen ydses_c5_tmp_`stm' = max(ydses_c5_`stm') if stm==`stm' - replace ydses_c5_`stm' = ydses_c5_tmp_`stm' if missing(ydses_c5_`stm') - drop ydses_c5_tmp_`stm' -} +xtset idperson swv +sort idperson swv -egen ydses_c5 = rowtotal(ydses_c5_2005 ydses_c5_2006 ydses_c5_2007 /// - ydses_c5_2008 ydses_c5_2009 ydses_c5_2010 ydses_c5_2011 ydses_c5_2012 /// - ydses_c5_2013 ydses_c5_2014 ydses_c5_2015 ydses_c5_2016 ydses_c5_2017 /// - ydses_c5_2018 ydses_c5_2019 ydses_c5_2020) -recode ydses_c5 (0=-9) -drop ydses_c5_2* -bys stm: fre ydses_c5 -*/ +* Create monthly income for T-1 +* Annual gross real labour income in T-1 +egen yplgrs_annual = rowtotal(py010g py050g) + // missing = 0 -/* -Problem: if many observations in yhhnb_asinh have exactly the 
same value, -xtile would group them into a single quintile, causing one or more quintiles to -have very few observations. -This results in 2nd quintile being extremely small compared to the first -quintile, which probably has many similar values -Adding a very small random amount to yhhnb_asinh can help differentiate tied -values enough to distribute them more evenly across quintiles without distorting -the data meaningfully. -*/ -sort idperson swv +* Impose non-negativity +gen flag_neg_labour_annual = (yplgrs_annual < 0) +lab var flag_neg_labour_annual "FLAG: negative labour income reported" -gen yhhnb_asinh_jittered = yhhnb_asinh + runiform() * 1e-5 +replace yplgrs_annual = 0 if yplgrs_annual < 0 -cap drop ydses* -forvalues stm = 2005/2023 { - - xtile ydses_c5_`stm' = yhhnb_asinh_jittered if depChild != 1 & /// - stm == `stm', nq(5) - - bys idhh: egen ydses_c5_tmp_`stm' = max(ydses_c5_`stm') if stm == `stm' - - replace ydses_c5_`stm' = ydses_c5_tmp_`stm' if missing(ydses_c5_`stm') - drop ydses_c5_tmp_`stm' - -} - -egen ydses_c5 = rowtotal(ydses_c5_2005 ydses_c5_2006 ydses_c5_2007 /// - ydses_c5_2008 ydses_c5_2009 ydses_c5_2010 ydses_c5_2011 ydses_c5_2012 /// - ydses_c5_2013 ydses_c5_2014 ydses_c5_2015 ydses_c5_2016 /// - ydses_c5_2017 ydses_c5_2018 ydses_c5_2019 ydses_c5_2020 ydses_c5_2021 /// - ydses_c5_2022 ydses_c5_2023) -recode ydses_c5 (0 = -9) -drop ydses_c5_2* -bys stm: fre ydses_c5 - -lab var ydses_c5 "Gross real monthly household non-benefit income quintiles" - -/********** COUPLE DIFFERENCE IN GROSS PERSONAL NON-BENEFIT INCOME ************/ -gen ynbcpdf_dv = ypnbihs_dv - ypnbihs_dv_sp -recode ynbcpdf_dv (. = -999) if idpartner < 0 -recode ynbcpdf_dv (. 
= -999) -sum ynbcpdf_dv - -lab var ynbcpdf_dv /// -"Difference between own and spouse's gross personal non-benefit income, asinh" - -/****************************** GROSS NET RATIO ******************************/ -/* -There are no net incomes in EU-SILC, will be computed using EUROMOD anyway -*/ -gen gross_net_ratio = 1 - -/******************** GROSS PERSONAL CAPITAL INCOME ***************************/ -/* -UK version: -gen ypncp = /// - asinh((fimninvnet_dv+fimnmisc_dv+fimnprben_dv)*gross_net_ratio*(1/CPI)) - -1 - fimninvnet_dv: Investment income - -2 - fimnmisc_dv: Net miscellaneous income. Educational grant - (not student loan or tuition fee loan), payments from a - family member not living here, or any other regular payment - (not asked in Wave 1). - -3 - fimnprben_dv: Net private benefit income. Trade union/friendly society - payment, maintenance or alimony, or sickness and accident - insurance. - -EU SILC version see above. -*/ -egen ypncp_temp = rowtotal(hy080g_pc hy110g_pc hy040g_pc hy090g_pc) -gen ypncp = ypncp_temp / 12 -replace ypncp = asinh(ypncp*(100/CPI)) - -lab var ypncp "Gross real monthly personal non-employment capital income, asinh" - -gen ln_ypncp = ln(sinh(ypncp)) - -lab var ln_ypncp "Gross real monthly personal non-employment capital income, ln" - -sum ypncp -sum ypncp if year == 2013 -sum ypncp if year == 2016 -sum ypncp if year == 2019 -sum ypncp if year == 2023 - -* Check for missing values == if missing on all the components -count if hy080g >= . & hy110g >= . & hy040g >= . & hy090g >= . // 0 obs -count if hy080g >= . | hy110g >= . | hy040g >= . | hy090g >= . - // 0 obs have some missing capital income information - -/************************* PRIVATE PENSION INCOME *****************************/ -/* -UK version: -fimnpen_dv: Monthly amount of net pension income - -Eu SILC version -py080g: Pension from individual private plans (gross) -*/ -gen ypnoab_lvl = (py080g/12)*(100/CPI) -recode ypnoab_lvl (. 
= 0) -gen ypnoab = asinh(ypnoab_lvl) - -lab var ypnoab "Gross real monthly personal private pension income" - -sum ypnoab -sum ypnoab if year == 2013 -sum ypnoab if year == 2016 -sum ypnoab if year == 2019 -sum ypnoab if year == 2023 - -count if py080g >= . & dag >= 16 // 65,855 obs - -* Final check there are no missing values in income vars -foreach var in ydses_c5 ypnbihs_dv yptciihs_dv yplgrs_dv ynbcpdf_dv /// - ypncp ypnoab { - - assert `var'!= . - -} - -/***************************** HOME OWNERSHIP *********************************/ -/* -Dhh_owned is the definition used in the initial population and in the model -predicting house ownership in the homeownership process of the simulation. -*/ -// bys swv: fre hh021 -gen dhh_owned = 0 -replace dhh_owned = 1 if hh021 == 1 | hh021 == 2 - -lab var dhh_owned "Home ownership dummy" - -fre dhh_owned -tab dhh_owned year, col - -/* -TO DO: Adjust for the new home ownership process? YES -*/ - -/**************************** DISABILITY BENEFIT ******************************/ -/* -In EU-SILC, the variables -- py130n: (disability benefits net), -- py130g: (disability benefits gross), -- py131g: (contributory and means-tested), -- py132g: (contributory and non means-tested), -- py133g: (non-contributory and means-tested), -- py134g: (non-contributory and non means-tested) - -All may contain information on disability benefits. - -For Poland, py131g only has zero entries - -The code below may well be PL specific as some of the coding of these variables -changes between countries. -I expect that there is probably a better/more efficient way of constructing this -code. -*/ -recode py130n (0 = -9)(. = -9), gen(py130nr) -recode py130g (0 = -9)(. = -9), gen(py130gr) -recode py132g (0 = -9)(. = -9), gen(py132gr) -recode py133g (0 = -9)(. = -9), gen(py133gr) -recode py134g (0 = -9)(. 
= -9), gen(py134gr) - -gen bdi = 0 -replace bdi = 1 if py130gr >= 1 | py130gr >= 1 | py132gr >= 1 | /// - py133gr >= 1 | py134gr >= 1 -lab val bdi dummy - -lab var bdi "Disability benefits (dummy)" - -drop py130nr py130gr py132gr py133gr py134gr - -fre bdi -tab bdi year, col - -/**************************** HOURLY LABOUR INCOME ****************************/ -/* -There are data issues here: - - Data is collected at the annual level - - The annual information corresponds to the previous calender year - - Income from self-employment can be negative - -Decided on the following: - -wage_hr = Annual employemnt income / Annual # hours worked - = Annual employemnt income / (# months worked * # hours worked ... - a week * 4.33) - -Data year: T-1 T T+1 -Hrs: T-1 T T+1 -Income: T-2 T-1 T -# month: T-2 T-1 T - -# Months worked last year can be constructed using the PL211* variables - -PY211A - Main activity January - -1 Employee working full-time -2 Employee working part-time -3 Self-employed working full-time (including family worker) -4 Self-employed working part-time (including family worker) -5 Unemployed -6 Student, pupil -7 Retired -8 Unable to work due to long-standing health problems -10 Fulfilling domestic tasks -11 Other - -=> Create a measure of hourly wages for year T using hour info from yr T -and wage info and number of months worked info from T+1 -*/ - -xtset idperson swv -sort idperson swv - -* Create monthly income for T-1 -* Annual gross real labour income in T-1 -egen yplgrs_annual = rowtotal(py010g py050g) - // missing = 0 - -* Impose non-negativity -gen flag_neg_labour_annual = (yplgrs_annual < 0) -lab var flag_neg_labour_annual "FLAG: negative labour income reported" - -replace yplgrs_annual = 0 if yplgrs_annual < 0 - -* Turn into real gross labour income using lagged CPI to account for timing +* Turn into real gross annual labour income using lagged CPI to account for timing gen CPI_5 = 80.2 gen CPI_6 = 81.2 gen CPI_7 = 83.3 @@ -3670,10 +3340,6 @@ 
replace yplgrs_mnth = yplgrs_annual / 12 if months_wrk == 0 * Check sum yplgrs_mnth -sum yplgrs -sum yplgrs if yplgrs_mnth != . - -bys stm: sum yplgrs_mnth sort idperson swv @@ -3685,7 +3351,7 @@ gen obs_earnings_hourly = . replace obs_earnings_hourly = f.yplgrs_mnth/(lhw*4.33) if les_c4 == 1 lab var obs_earnings_hourly /// - "Observed hourly wages, emp and self-emp, adjusted for timing" + "Observed hourly real wages, emp and self-emp, adjusted for timing" * Impose consistency replace obs_earnings_hourly = 0 if les_c3 == 2 | les_c3 == 3 @@ -3710,174 +3376,582 @@ count if obs_earnings_hourly == 0 & les_c3 == 1 & yplgrs_annual[_n+1] == 0 & /// // accounted for all cases /* -Missing wage observations: -1- almost all due to being the last observation in individual's panel -2- missing activity information -3- missing adjacent observation +Missing wage observations: +1- almost all due to being the last observation in individual's panel +2- missing activity information +3- missing adjacent observation + +Zero wage observations +4- next year is missing labour income information +5- next year reports zero labour income + +How to address each case: +- uprate previously reported wages +- use last years earnings and this years hours +- use next years wages + +- use hot deck imputation + +https://bdl.stat.gov.pl/bdl/dane/podgrup/tablica + +*/ +* Average gross wages +gen avg_wage_5 = 2506.93 +gen avg_wage_6 = 2636.81 +gen avg_wage_7 = 2866.04 +gen avg_wage_8 = 3158.48 +gen avg_wage_9 = 3315.38 +gen avg_wage_10 = 3435.00 +gen avg_wage_11 = 3625.21 +gen avg_wage_12 = 3744.38 +gen avg_wage_13 = 3877.43 +gen avg_wage_14 = 4003.99 +gen avg_wage_15 = 4150.86 +gen avg_wage_16 = 4290.52 +gen avg_wage_17 = 4527.89 +gen avg_wage_18 = 4834.76 +gen avg_wage_19 = 5181.63 +gen avg_wage_20 = 5523.32 +gen avg_wage_21 = 6001.02 +gen avg_wage_22 = 6705.62 +gen avg_wage_23 = 7595.30 + +replace obs_earnings_hourly = . 
if obs_earnings_hourly == 0 & les_c3 == 1 + +gen x = 1 if les_c3 == 1 & obs_earnings_hourly == . + +* Imputation +forvalues i = 6/23 { + + local j = `i'-1 + + gen nwage_growth_`j'`i' = avg_wage_`i'/avg_wage_`j' + + gen inflation_change_`j'`i' = CPI_`i'/CPI_`j' + + gen growth_factor_`j'`i' = nwage_growth_`j'`i'/inflation_change_`j'`i' + + * Use last years wages + replace obs_earnings_hourly = /// + obs_earnings_hourly[_n-1] * growth_factor_`j'`i' /// + if idperson == idperson[_n-1] & les_c3 == 1 & les_c3[_n-1] == 1 & /// + swv == 2000 +`i' & obs_earnings_hourly == . + + * Use the next years wages + replace obs_earnings_hourly = /// + obs_earnings_hourly[_n+1] / growth_factor_`j'`i' /// + if idperson == idperson[_n+1] & les_c3 == 1 & les_c3[_n+1] == 1 & /// + swv == 2000 +`i' & obs_earnings_hourly == . + + * Use last years earnings and this years hours + replace obs_earnings_hourly = /// + (yplgrs_mnth/(lhw*4.33)) * growth_factor_`j'`i' if /// + obs_earnings_hourly == . & swv == 2000 + `i' & yplgrs_mnth != 0 + +} + +gen flag_wage_imp_panel = (x == 1 & obs_earnings_hourly != . ) + +label var flag_wage_imp_panel /// + "FLAG: wage imputed using surrounding panel information and uprating" + +count if obs_earnings_hourly == . // 25,041 +count if obs_earnings_hourly == . & idperson != idperson[_n+1] // 14,613 +count if obs_earnings_hourly == . & idperson == idperson[_n+1] & /// + les_c3 == -9 // 408 +count if obs_earnings_hourly == . & idperson == idperson[_n+1] & /// + les_c3 == 1 & swv != swv[_n+1] - 1 // 89 + +count if obs_earnings_hourly == . & les_c3 == 1 & yplgrs_annual[_n+1] == 0 & /// + flag_missing_lbr_income[_n+1] == 1 & idperson == idperson[_n+1] // 605 +count if obs_earnings_hourly == . 
& les_c3 == 1 & yplgrs_annual[_n+1] == 0 & /// + flag_missing_lbr_income[_n+1] == 0 & idperson == idperson[_n+1] // 9,363 + +count if obs_earnings_hourl == 0 & les_c3 == 1 // 0 + +* Use hot deck imputation for the remaining missing observations among the +* working + +gen flag_wage_hotdeck = (les_c3 == 1 & missing(obs_earnings_hourly)) + +lab var flag_wage_hotdeck "FLAG: wage imputed using hotdeck imputation" + +* Strata +cap drop ageband +gen ageband = floor(dag/10)*10 +replace ageband = 60 if ageband == 70 + // group 70+ year olds with 60+ to ensure matches + +cap drop stratum +egen stratum = group(ageband drgn1 dgn swv), label(strutum, replace) + +* Define donor pool +preserve + +keep if les_c3 == 1 & obs_earnings_hourly != . +keep obs_earnings_hourly stratum idperson swv +bys stratum (idperson swv): gen draw = _n +bys stratum (idperson swv): gen n_donors = _N +rename obs_earnings_hourly donor_wages +drop idperson swv +save "$dir_data/temp_wages_donors", replace + +keep stratum n_donors +bys stratum: keep if _n == 1 +save "$dir_data/temp_donorsN", replace + +restore + +* Attached number of donors in each stratum +merge m:1 stratum using "$dir_data/temp_donorsN", nogen + +* Assign random donor +gen draw = . + +sort stratum idperson swv + +by stratum (idperson swv): replace draw = ceil(runiform()*n_donors[1]) if /// + flag_wage_hotdeck == 1 & n_donors > 0 + +* Attach donor +merge m:1 stratum draw using "$dir_data/temp_wages_donors", /// + keepusing(donor_wages draw) + +drop if _m == 2 +drop _m + +replace obs_earnings_hourly = donor_wage if flag_wage_hotdeck == 1 + +drop donor_wage ageband stratum dra n_donor + +count if obs_earnings_hourly == . & les_c3 == 1 + +* Lagged wage +xtset idperson swv + +gen l1_obs_earnings_hourly = . 
+ +replace l1_obs_earnings_hourly = l.obs_earnings_hourly +lab var l1_obs_earnings_hourly /// + "Observed hourly real wages, emp and self-emp, t-1, adjusted for timing" + +sum obs_earnings_hourly if les_c3 == 1 +sum obs_earnings_hourly if les_c3 == 2 +sum obs_earnings_hourly if les_c3 == 3 +sum obs_earnings_hourly if les_c3 == -9 + +drop yplgrs_annual yplgrs_mnth + + +/************** GROSS REAL MONTHLY PERSONAL EMPLOYMENT INCOME *****************/ +/* +Use wage and hours worked info instead of reported amounts in py010g py050g +Use real wages therefore already in real terms. +*/ +/* +egen yplgrs = rowtotal(py010g py050g) +replace yplgrs = yplgrs / 12 + +fre yplgrs if yplgrs < 0 // 0 obs + +* Impose non-negativity +replace yplgrs = 0 if yplgrs < 0 + +*/ + +gen yplgrs = obs_earnings_hourly * lhw * 4.33 +assert yplgrs >= 0 + +count if yplgrs == . // 8,461 + +* Checks +assert yplgrs == 0 if les_c4 != 1 & les_c4 > 0 +assert obs_earnings_hourly == 0 if les_c4 != 1 & les_c4 > 0 + +sum obs_earnings_hourly if les_c3 == 2 +count if obs_earnings_hourly == . & les_c3 == 2 +count if lhw == . & les_c3 == 2 + +sum obs_earnings_hourly if les_c3 == 3 +count if obs_earnings_hourly == . & les_c3 == 3 +count if lhw == . & les_c3 == 3 +// all missing for those who are working + +count if obs_earnings_hourly == . & les_c3 < 0 +count if lhw == . & les_c3 < 0 +// if missing some, missing all relevant info + + +/**************** GROSS NOMINAL MONTHLY PERSONAL CAPITAL INCOME ***************/ +/* +UK version: +gen ypncp = /// + asinh((fimninvnet_dv+fimnmisc_dv+fimnprben_dv)*gross_net_ratio*(1/CPI)) + +1 - fimninvnet_dv: Investment income + +2 - fimnmisc_dv: Net miscellaneous income. Educational grant + (not student loan or tuition fee loan), payments from a + family member not living here, or any other regular payment + (not asked in Wave 1). + +3 - fimnprben_dv: Net private benefit income. Trade union/friendly society + payment, maintenance or alimony, or sickness and accident + insurance. 
+ +EU SILC version see above. +NOTE: The raw variables have no missing or negative values. + +*/ + +* Household level variables are assigned to all adult hh members +* ==> split them equally among all adults in hh +gen adult = (dag >= $age_adult) //18 yo and over +bysort stm idhh : egen n_adults = total(adult) + +lab var n_adults "Number of adults in hh" + +gen child = (dag < $age_adult) //below 18 yo +bysort stm idhh : egen n_child = total(child) + +lab var n_child "Number of children in hh" + +* NOTE: No negative values or missing values +foreach var in hy080g hy110g hy040g hy090g { + + gen `var'_pc = `var'/n_adults + replace `var'_pc = 0 if child == 1 + +} + +egen ypncp_temp = rowtotal(hy080g_pc hy110g_pc hy040g_pc hy090g_pc) +gen ypncp = ypncp_temp / 12 + +* Check for missing values == if missing on all the components +count if hy080g == . & hy110g == . & hy040g == . & hy090g == . // 0 obs +count if hy080g == . | hy110g == . | hy040g == . | hy090g == . + + +/*********** GROSS NOMINAL MONTHLY PERSONAL PRIVATE PENSION INCOME ************/ +/* +UK version: +fimnpen_dv: Monthly amount of net pension income + +EU SILC version +py080g: Pension from individual private plans (gross) + +NOTE: The raw variable has many missing (.) values. +*/ + +gen ypnoab = py080g / 12 + +* Code missing as zero +recode ypnoab (. = 0) + +sum ypnoab +sum ypnoab if year == 2013 +sum ypnoab if year == 2016 +sum ypnoab if year == 2019 +sum ypnoab if year == 2023 + +count if py080g == .
& dag >= 16 + + +/*********** GROSS NOMINAL MONTHLY PERSONAL NON-BENEFIT INCOME ****************/ +/* +Note: This is supposed to mirror UKMOD market income + + = employment income + private pensions income + capital income + +Use components instead of raw vars so that changes feed through +*/ +/* +egen ypnb_temp = rowtotal(py010g py050g py080g hy080g_pc hy110g_pc /// + hy040g_pc hy090g_pc) +gen ypnb = ypnb_temp / 12 + +fre ypnb if ypnb < 0 +/* obs with negative income (due to negative self-employment income) but many of +these are close to zero ==> recode them to zero */ + +* Impose non-negativity +replace ypnb = 0 if ypnb < 0 + +sum ypnb +assert ypnb >= 0 +*/ + +* Adjust gross employment income (yplgrs) so it is in nominal terms +gen temp_yplgrs = yplgrs * (CPI/100) + +egen ypnb = rowtotal(temp_yplgrs ypncp ypnoab) + +sum ypnb if year == 2013 +sum ypnb if year == 2016 +sum ypnb if year == 2019 +sum ypnb if year == 2023 + +* Check for missing values == if missing on all the components +count if py080g >= . & hy080g >= . & /// + hy110g >= . & hy040g >= . & hy090g >= . + +count if (py080g >= . | hy080g >= . | /// + hy110g >= . | hy040g >= . | hy090g >= .) & dag >= 16 + +count if dag >= 16 +//=> 20% of adult observations have some missing income information + + +/****** GROSS NOMINAL MONTHLY PERSONAL NON-EMPLOYMENT NON-BENEFIT INCOME ******/ +/* + = capital income + private pension income + +UK version: egen yptc = rowtotal(fimnpen_dv fimnmisc_dv inc_stp inc_tu inc_ma) + +EU SILC use the same variables as indicated above.
+*/ + +egen yptc = rowtotal(ypncp ypnoab) + + +/************* SPOUSE GROSS PERSONAL MONTHLY NON-BENEFIT INCOME ***************/ +preserve +keep swv idperson idhh ypnb +rename ypnb ypnbsp +rename idperson idpartner +save "$dir_data/temp_ypnb", replace +restore + +merge m:1 swv idpartner idhh using "$dir_data/temp_ypnb" +keep if _merge == 1 | _merge == 3 +drop _merge + + +/************* EQUIV HH/BEN UNIT GROSS MONTHLY NON-BENEFIT INCOME *************/ +/* +Couples = sum of partners incomes. Singles = own income +*/ +sum ypnb ypnbsp + +egen yhhnb = rowtotal(ypnb ypnbsp) if dhhtp_c4 == 1 | dhhtp_c4 == 2 + +replace yhhnb = ypnb if dhhtp_c4 == 3 | dhhtp_c4 == 4 + +* Equivalise +replace yhhnb = (yhhnb/moecd_eq) + +sum yhhnb +sum yhhnb if year == 2013 +sum yhhnb if year == 2016 +sum yhhnb if year == 2019 +sum yhhnb if year == 2023 + + +/****************** NOMINAL MONTHLY PERSONAL DISPOSABLE INCOME *****************/ + +* Create hh value of company car variable +replace py021g = 0 if py021g == . 
+bysort stm idhh : egen hh_comp_car = total(py021g) + +* Hh disp net company car +gen ydisp_hh = hy020 - hh_comp_car + +* Split hh level vars into individual amounts +gen ydisp = ydisp_hh/n_adults +replace ydisp = 0 if child == 1 + +* Create monthly amount +replace ydisp = ydisp / 12 + + +/************************ REAL MONTHLY GROSS INCOMES **************************/ +* Adjust for inflation: +* NOTE: yplgrs already in real terms as derived from real wages +replace ypnb = ypnb/(CPI/100) +replace yptc = yptc/(CPI/100) +replace ypnbsp = ypnbsp/(CPI/100) +replace ypncp = ypncp/(CPI/100) +replace ypnoab = ypnoab/(CPI/100) +replace yhhnb = yhhnb/(CPI/100) +replace ydisp = ydisp/(CPI/100) + +lab var ypnb "Gross real monthly personal non-benefit income" +lab var yptc "Gross real monthly personal non-employment, non-benefit income" +lab var yplgrs "Gross real monthly personal employment income" +lab var ypnbsp "Spouse's gross real monthly personal non-benefit income" +lab var ypncp "Gross real monthly personal capital income" +lab var ypnoab "Gross real monthly personal private pension income" +lab var yhhnb "Equivalized gross real monthly non-benefit hh income" +lab var ydisp "Disposable real monthly personal income" + +gen ypnoab_lvl = ypnoab + + +/*********** INVERSE HYPERBOLIC SINE REAL MONTHLY GROSS INCOMES ***************/ +/* +This (monotonic) transformation is useful for data that exhibit highly skewed +distributions, as it can help stabilize variance and normalise the +distribution. 
+*/ +gen ypnbihs_dv = asinh(ypnb) +gen yptciihs_dv = asinh(yptc) +gen yplgrs_dv = asinh(yplgrs) +gen ypnbihs_dv_sp = asinh(ypnbsp) +replace ypncp = asinh(ypncp) +replace ypnoab = asinh(ypnoab) +gen yhhnb_asinh = asinh(yhhnb) + +lab var ypnbihs_dv "Gross real monthly personal non-benefit income, asinh" +lab var yptciihs_dv /// + "Gross real monthly personal non-employment, non-benefit income, asinh" +lab var yplgrs_dv "Gross real monthly personal employment income, asinh" +lab var ypnbihs_dv_sp /// + "Spouse's gross real monthly personal non-benefit income, asinh" +lab var ypncp "Gross real monthly personal capital income, asinh" +lab var ypnoab "Gross real monthly personal private pension income, asinh" +lab var yhhnb_asinh "Gross real monthly household non-benefit income, asinh" + +/* +sum ypnbihs_dv ypnbihs_dv_sp yptciihs_dv yplgrs_dv ypncp ypnoab +*/ -Zero wage observations -4- next year is missing labour income information -5- next year reports zero labour income +/************************ LOG CAPITAL INCOME **********************************/ -How to address each case: -- uprate previously reported wages -- use last years earnings and this years hours -- use next years wages + +gen ln_ypncp = ln(sinh(ypncp)) + +lab var ln_ypncp "Gross real monthly personal non-employment capital income, ln" + -- use hot deck imputation +/***** GROSS REAL MONTHLY EQUIV HOUSEHOLD NON-BENEFIT INCOME QUINTILES ********/ +sum yhhnb_asinh -https://bdl.stat.gov.pl/bdl/dane/podgrup/tablica +/* +cap drop ydses* +forvalues stm=2005/2020 { + xtile ydses_c5_`stm' = yhhnb_asinh if depChild != 1 & stm==`stm', nq(5) + bys idhh: egen ydses_c5_tmp_`stm' = max(ydses_c5_`stm') if stm==`stm' + replace ydses_c5_`stm' = ydses_c5_tmp_`stm' if missing(ydses_c5_`stm') + drop ydses_c5_tmp_`stm' +} -*/ -* Average gross wages -gen avg_wage_5 = 2506.93 -gen avg_wage_6 = 2636.81 -gen avg_wage_7 = 2866.04 -gen avg_wage_8 = 3158.48 -gen avg_wage_9 = 3315.38 -gen avg_wage_10 = 3435.00 -gen avg_wage_11 = 3625.21
-gen avg_wage_12 = 3744.38 -gen avg_wage_13 = 3877.43 -gen avg_wage_14 = 4003.99 -gen avg_wage_15 = 4150.86 -gen avg_wage_16 = 4290.52 -gen avg_wage_17 = 4527.89 -gen avg_wage_18 = 4834.76 -gen avg_wage_19 = 5181.63 -gen avg_wage_20 = 5523.32 -gen avg_wage_21 = 6001.02 -gen avg_wage_22 = 6705.62 -gen avg_wage_23 = 7595.30 +egen ydses_c5 = rowtotal(ydses_c5_2005 ydses_c5_2006 ydses_c5_2007 /// + ydses_c5_2008 ydses_c5_2009 ydses_c5_2010 ydses_c5_2011 ydses_c5_2012 /// + ydses_c5_2013 ydses_c5_2014 ydses_c5_2015 ydses_c5_2016 ydses_c5_2017 /// + ydses_c5_2018 ydses_c5_2019 ydses_c5_2020) +recode ydses_c5 (0=-9) +drop ydses_c5_2* +bys stm: fre ydses_c5 +*/ -replace obs_earnings_hourly = . if obs_earnings_hourly == 0 & les_c3 == 1 +/* +Problem: if many observations in yhhnb_asinh have exactly the same value, +xtile would group them into a single quintile, causing one or more quintiles to +have very few observations. +This results in 2nd quintile being extremely small compared to the first +quintile, which probably has many similar values +Adding a very small random amount to yhhnb_asinh can help differentiate tied +values enough to distribute them more evenly across quintiles without distorting +the data meaningfully. +*/ +sort idperson swv -gen x = 1 if les_c3 == 1 & obs_earnings_hourly == . +gen yhhnb_asinh_jittered = yhhnb_asinh + runiform() * 1e-5 -* Imputation -forvalues i = 6/23 { - - local j = `i'-1 - - gen nwage_growth_`j'`i' = avg_wage_`i'/avg_wage_`j' - - gen inflation_change_`j'`i' = CPI_`i'/CPI_`j' - - gen growth_factor_`j'`i' = nwage_growth_`j'`i'/inflation_change_`j'`i' - - * Use last years wages - replace obs_earnings_hourly = /// - obs_earnings_hourly[_n-1] * growth_factor_`j'`i' /// - if idperson == idperson[_n-1] & les_c3 == 1 & les_c3[_n-1] == 1 & /// - swv == 2000 +`i' & obs_earnings_hourly == . 
- - * Use the next years wages - replace obs_earnings_hourly = /// - obs_earnings_hourly[_n+1] / growth_factor_`j'`i' /// - if idperson == idperson[_n+1] & les_c3 == 1 & les_c3[_n+1] == 1 & /// - swv == 2000 +`i' & obs_earnings_hourly == . +cap drop ydses* +forvalues stm = 2005/2023 { - * Use last years earnings and this years hours - replace obs_earnings_hourly = /// - (yplgrs_mnth/(lhw*4.33)) * growth_factor_`j'`i' if /// - obs_earnings_hourly == . & swv == 2000 + `i' & yplgrs_mnth != 0 + xtile ydses_c5_`stm' = yhhnb_asinh_jittered if depChild != 1 & /// + stm == `stm', nq(5) -} - -gen flag_wage_imp_panel = (x == 1 & obs_earnings_hourly != . ) - -label var flag_wage_imp_panel /// - "FLAG: wage imputed using surrounding panel information and uprating" - -count if obs_earnings_hourly == . // 25,041 -count if obs_earnings_hourly == . & idperson != idperson[_n+1] // 14,613 -count if obs_earnings_hourly == . & idperson == idperson[_n+1] & /// - les_c3 == -9 // 408 -count if obs_earnings_hourly == . & idperson == idperson[_n+1] & /// - les_c3 == 1 & swv != swv[_n+1] - 1 // 89 - -count if obs_earnings_hourly == . & les_c3 == 1 & yplgrs_annual[_n+1] == 0 & /// - flag_missing_lbr_income[_n+1] == 1 & idperson == idperson[_n+1] // 605 -count if obs_earnings_hourly == . 
& les_c3 == 1 & yplgrs_annual[_n+1] == 0 & /// - flag_missing_lbr_income[_n+1] == 0 & idperson == idperson[_n+1] // 9,363 + bys idhh: egen ydses_c5_tmp_`stm' = max(ydses_c5_`stm') if stm == `stm' -count if obs_earnings_hourl == 0 & les_c3 == 1 // 0 + replace ydses_c5_`stm' = ydses_c5_tmp_`stm' if missing(ydses_c5_`stm') + drop ydses_c5_tmp_`stm' -* Use hot deck imputation for the remaining missing observations among the -* working +} -gen flag_wage_hotdeck = (les_c3 == 1 & missing(obs_earnings_hourly)) +egen ydses_c5 = rowtotal(ydses_c5_2005 ydses_c5_2006 ydses_c5_2007 /// + ydses_c5_2008 ydses_c5_2009 ydses_c5_2010 ydses_c5_2011 ydses_c5_2012 /// + ydses_c5_2013 ydses_c5_2014 ydses_c5_2015 ydses_c5_2016 /// + ydses_c5_2017 ydses_c5_2018 ydses_c5_2019 ydses_c5_2020 ydses_c5_2021 /// + ydses_c5_2022 ydses_c5_2023) +recode ydses_c5 (0 = -9) +drop ydses_c5_2* +bys stm: fre ydses_c5 -lab var flag_wage_hotdeck "FLAG: wage imputed using hotdeck imputation" +lab var ydses_c5 "Gross real monthly household non-benefit income quintiles" -* Strata -cap drop ageband -gen ageband = floor(dag/10)*10 -replace ageband = 60 if ageband == 70 - // group 70+ year olds with 60+ to ensure matches -cap drop stratum -egen stratum = group(ageband drgn1 dgn swv), label(strutum, replace) +/***** COUPLE DIFFERENCE IN GROSS REAL MONTHLY PERSONAL NON-BENEFIT INCOME ****/ +gen ynbcpdf_dv = ypnbihs_dv - ypnbihs_dv_sp +recode ynbcpdf_dv (. = -999) if idpartner < 0 +recode ynbcpdf_dv (. = -999) +sum ynbcpdf_dv -* Define donor pool -preserve +lab var ynbcpdf_dv /// +"Difference between own and spouse's gross personal non-benefit income, asinh" -keep if les_c3 == 1 & obs_earnings_hourly != . 
-keep obs_earnings_hourly stratum idperson swv -bys stratum (idperson swv): gen draw = _n -bys stratum (idperson swv): gen n_donors = _N -rename obs_earnings_hourly donor_wages -drop idperson swv -save "$dir_data/temp_wages_donors", replace -keep stratum n_donors -bys stratum: keep if _n == 1 -save "$dir_data/temp_donorsN", replace +/****************************** GROSS NET RATIO ******************************/ +/* +There are no net incomes in EU-SILC, will be computed using EUROMOD anyway +*/ +gen gross_net_ratio = 1 -restore -* Attached number of donors in each stratum -merge m:1 stratum using "$dir_data/temp_donorsN", nogen +/***************************** HOME OWNERSHIP *********************************/ +/* +Dhh_owned is the definition used in the initial population and in the model +predicting house ownership in the homeownership process of the simulation. +This variable is updated in the benefit unit construction do file. +*/ +// bys swv: fre hh021 +gen dhh_owned = 0 +replace dhh_owned = 1 if hh021 == 1 | hh021 == 2 -* Assign random donor -gen draw = . +lab var dhh_owned "Home ownership dummy" -sort stratum idperson swv +fre dhh_owned +tab dhh_owned year, col -by stratum (idperson swv): replace draw = ceil(runiform()*n_donors[1]) if /// - flag_wage_hotdeck == 1 & n_donors > 0 -* Attach donor -merge m:1 stratum draw using "$dir_data/temp_wages_donors", /// - keepusing(donor_wages draw) +/**************************** DISABILITY BENEFIT ******************************/ +/* +In EU-SILC, the variables +- py130n: (disability benefits net), +- py130g: (disability benefits gross), +- py131g: (contributory and means-tested), +- py132g: (contributory and non means-tested), +- py133g: (non-contributory and means-tested), +- py134g: (non-contributory and non means-tested) -drop if _m == 2 -drop _m - -replace obs_earnings_hourly = donor_wage if flag_wage_hotdeck == 1 +All may contain information on disability benefits.
-drop donor_wage ageband stratum dra n_donor +For Poland, py131g only has zero entries -count if obs_earnings_hourly == . & les_c3 == 1 +The code below may well be PL specific as some of the coding of these variables +changes between countries. +I expect that there is probably a better/more efficient way of constructing this +code. +*/ +recode py130n (0 = -9)(. = -9), gen(py130nr) +recode py130g (0 = -9)(. = -9), gen(py130gr) +recode py132g (0 = -9)(. = -9), gen(py132gr) +recode py133g (0 = -9)(. = -9), gen(py133gr) +recode py134g (0 = -9)(. = -9), gen(py134gr) -* Lagged wage -xtset idperson swv +gen bdi = 0 +replace bdi = 1 if py130gr >= 1 | py130gr >= 1 | py132gr >= 1 | /// + py133gr >= 1 | py134gr >= 1 +lab val bdi dummy -gen l1_obs_earnings_hourly = . +lab var bdi "Disability benefits (dummy)" -replace l1_obs_earnings_hourly = l.obs_earnings_hourly -lab var l1_obs_earnings_hourly /// - "Observed hourly wages, emp and self-emp, t-1, adjusted for timing" - -sum obs_earnings_hourly if les_c3 == 1 -sum obs_earnings_hourly if les_c3 == 2 -sum obs_earnings_hourly if les_c3 == 3 -sum obs_earnings_hourly if les_c3 == -9 +drop py130nr py130gr py132gr py133gr py134gr + +fre bdi +tab bdi year, col -/* -Note that annual labour income is not aligned with activity status and hours, -but hourly wage is. 
-*/ /*********************** EDUCATION STATUS - IMPUTATION 2 **********************/ /* AB: At the point missing education level for those that transition out of @@ -3976,7 +4050,6 @@ foreach k in 1 2 3 { } - * Impute cap drop missing_edu gen missing_edu = (deh_c3 == -9) @@ -4106,6 +4179,7 @@ graph drop _all drop dgn2 dag2 dagsq2 drgn12 les_c42 dcpst2 ydses_c52 p1* p2 p3 rnd imp_deh* + /******************** UPDATE PARTNER'S EDUCATION STATUS ***********************/ preserve @@ -4139,6 +4213,7 @@ bys swv: sum dehsp_c4 if dehsp_c4 > 0 sort idperson swv + /***************************** WEIGHTS ****************************************/ /* Clare's notes: @@ -4758,6 +4833,7 @@ Using the rescaled longitudinal weight did not work => use the rescaled base weight */ + /*************************** CONSISTENCY CHECKS *******************************/ * Economic activity tab les_c3 les_c4 @@ -4799,6 +4875,9 @@ sum obs_earnings_hourly if les_c3 != 1 sum obs_earnings_hourly if les_c4 == 1 sum obs_earnings_hourly if les_c4 != 1 +sum yplgrs_dv if les_c3 == 1 +sum yplgrs_dv if les_c3 != 1 + * Partnership tab dun dcpst @@ -4816,6 +4895,7 @@ tab ded deh_c4 tab deh_c3 deh_c4 + /*************************** KEEP RELEVANT WAVES ******************************/ /* Initial populations: longitudinal SILC for 2011-2023 @@ -4826,12 +4906,14 @@ keep if swv >= 2010 save "$dir_data/02_pre_drop.dta", replace + /**************************** SENSE CHECK PLOTS *******************************/ //do "$dir_do/02_01_checks" graph drop _all + /*********************** CREATE ASSUMPTION DESCRIPTIVES **********************/ * Health imputation @@ -5129,29 +5211,31 @@ putexcel D70 = ("All") /*************************** KEEP REQUIRED VARIABLES **************************/ keep idhh idperson idpartner idfather idmother dct drgn1 dnc02 dnc dgn dgnsp /// - dag dagsq dhe dhesp dcpst ded deh_c3 deh_c4 der dehsp_c3 dehm_c3 dehf_c3 /// - dehmf_c3 dcpen dcpyy dcpex dcpagdf dlltsd dlrtrd drtren dlftphm /// + dag dagsq dhe 
dhesp dcpst ded deh_c3 deh_c4 der dehsp_c3 dehm_c4 dehf_c4 /// + dehmf_c4 dcpen dcpyy dcpex dcpagdf dlltsd dlrtrd drtren dlftphm /// dhhtp_c4 dimlwt dimxwt dhhwt dwt les_c3 les_c4 lessp_c3 lessp_c4 /// lesdf_c4 ydses_c5 ypnbihs_dv yptciihs_dv yplgrs_dv ynbcpdf_dv ypncp /// ln_ypncp ypnoab swv sedex ssscp sprfm sedag stm dagsp lhw der /// adultchildflag sedcsmpl sedrsmpl scedsmpl dhh_owned dchpd dagpns /// - dagpns_sp CPI dlltsd_sp ypnoab_lvl flag_* Int_Date unemp yplgrs liwwh /// - dagpns_y dagpns_y1 dagpns_y_sp dagpns_y1_sp obs_earnings_hourly /// - l1_obs_earnings_hourly l1_les_c3 l1_les_c4 new_rel dcpyy_st student /// + dagpns_sp CPI dlltsd_sp ypnoab_lvl ydisp flag_* Int_Date unemp yplgrs /// + liwwh dagpns_y dagpns_y1 dagpns_y_sp dagpns_y1_sp obs_earnings_hourly /// + l1_obs_earnings_hourly l1_les_c3 l1_les_c4 new_rel dcpyy_st studentflag /// dcpyy_st dhhtp_c8 dehsp_c4 widow rb110 flag_deceased flag_deceased_sp sort swv idhh idperson + /************************* RECODE MISSING VALUES ******************************/ foreach var in idhh idperson idpartner idfather idmother dct drgn1 dnc02 /// dnc dgn dgnsp dag dagsq dhe dhesp dcpst ded deh_c3 deh_c4 der dehsp_c3 /// - dehm_c3 dehf_c3 dehmf_c3 dcpen dcpyy dcpex dlltsd dlrtrd drtren /// + dehm_c4 dehf_c4 dehmf_c4 dcpen dcpyy dcpex dlltsd dlrtrd drtren /// dlftphm dhhtp_c4 les_c3 les_c4 lessp_c3 lessp_c4 lesdf_c4 ydses_c5 /// swv sedex ssscp sprfm sedag stm dagsp lhw der dhh_owned /// dchpd dagpns dagpns_sp CPI dlltsd_sp flag* unemp liwwh /// dagpns_y dagpns_y1 dagpns_y_sp dagpns_y1_sp obs_earnings_hourly /// l1_obs_earnings_hourly l1_les_c3 l1_les_c4 new_rel dcpyy_st new_rel /// - dcpyy_st dhhtp_c8 student dehsp_c4 widow flag_deceased flag_deceased_sp { + dcpyy_st dhhtp_c8 studentflag dehsp_c4 widow flag_deceased /// + flag_deceased_sp { qui recode `var' (-9/-1 = -9) (. 
= -9) @@ -5207,15 +5291,16 @@ local files_to_drop mother_dchpd.dta temp_orig_econ_status.dta temp_orig_edu.dta + temp_orig_occu.dta temp_dagpns_y.dta temp_depChild_mother.dta temp_depChild_father.dta - temp_dehsp.dta temp_mother_info.dta temp_father_info.dta temp_donorsN.dta temp_lhw_donors.dta temp_wages_donors.dta + temp_rel_end.dta ; #delimit cr diff --git a/input_processing/data_construction/PL/03_create_benefit_units_PL.do b/input_processing/data_construction/PL/03_create_benefit_units_PL.do index a12df04..41266d3 100644 --- a/input_processing/data_construction/PL/03_create_benefit_units_PL.do +++ b/input_processing/data_construction/PL/03_create_benefit_units_PL.do @@ -833,19 +833,19 @@ count if dagsp == -9 & idpartner != -9 // 0 obs replace dropObs = 1 if dagsp == -9 & idpartner != -9 * Health status - remove household if missing for those 16+ -count if (dhe == -9 ) & dag > ${age_sample_min} +count if (dhe == -9 ) & dag > ${age_becomes_semi_responsible} // 0 obs due to imputation -count if (dhe == -9 ) & dag > 0 & dag <= ${age_sample_min} +count if (dhe == -9 ) & dag > 0 & dag <= ${age_becomes_semi_responsible} // 0 obs due to imputation -replace dropObs = 1 if (dhe == -9) & dag > ${age_sample_min} +replace dropObs = 1 if (dhe == -9) & dag > ${age_becomes_semi_responsible} * Health status of spouse - remove household if missing but ind has a spouse count if dhesp == -9 & idpartner != -9 // 0 obs replace dropObs = 1 if (dhesp == -9) & idpartner != -9 * Education - remove household if missing education level for 16+ -count if deh_c3 == -9 & dag >= ${age_sample_min} & ded == 0 -replace dropObs = 1 if deh_c3 == -9 & dag >= ${age_sample_min} & /// +count if deh_c3 == -9 & dag >= ${age_becomes_semi_responsible} & ded == 0 +replace dropObs = 1 if deh_c3 == -9 & dag >= ${age_becomes_semi_responsible} & /// ded == 0 * Education of spouse - remove household if missing but individual has a spouse @@ -857,12 +857,12 @@ count if dcpst == -9 // 0 obs replace dropObs = 1 
if dcpst == -9 * Activity status -count if les_c3 == -9 & dag >= ${age_sample_min} -replace dropObs = 1 if les_c3 == -9 & dag >= ${age_sample_min} +count if les_c3 == -9 & dag >= ${age_becomes_semi_responsible} +replace dropObs = 1 if les_c3 == -9 & dag >= ${age_becomes_semi_responsible} * Activity status with retirement as a separate category -count if les_c4 == -9 & dag >= ${age_sample_min} -replace dropObs = 1 if les_c4 == -9 & dag >= ${age_sample_min} +count if les_c4 == -9 & dag >= ${age_becomes_semi_responsible} +replace dropObs = 1 if les_c4 == -9 & dag >= ${age_becomes_semi_responsible} * Partner's activity status count if lessp_c3 == -9 & idpartner != -9 // 2,536 obs @@ -879,29 +879,29 @@ replace dropObs = 1 if dhhtp_c4 == -9 * Income * Gross personal non-benefit income //==> no missing values by construction, theoretically can be zero -count if ypnbihs_dv == 0 & dag >= ${age_sample_min} -count if ypnbihs_dv > 0 & dag >= ${age_sample_min} +count if ypnbihs_dv == 0 & dag >= ${age_becomes_semi_responsible} +count if ypnbihs_dv > 0 & dag >= ${age_becomes_semi_responsible} * Gross personal employment income //==> no missing values by construction but theoretically can be zero -count if yplgrs_dv < 0 & dag >= ${age_sample_min} -count if yplgrs_dv == 0 & dag >= ${age_sample_min} -count if yplgrs_dv > 0 & dag >= ${age_sample_min} +count if yplgrs_dv < 0 & dag >= ${age_becomes_semi_responsible} +count if yplgrs_dv == 0 & dag >= ${age_becomes_semi_responsible} +count if yplgrs_dv > 0 & dag >= ${age_becomes_semi_responsible} * Household income quintile //==> a few missing values for kids who live w/t other adults -count if ydses_c5 == -9 & dag >= ${age_sample_min} // 0 obs +count if ydses_c5 == -9 & dag >= ${age_becomes_semi_responsible} // 0 obs * Gross personal non-employment capital income //==> no missing values by construction -count if ypncp < 0 & dag >= ${age_sample_min} // 0 obs -count if ypncp == 0 & dag >= ${age_sample_min} -count if ypncp > 0 & dag >= 
${age_sample_min} +count if ypncp < 0 & dag >= ${age_becomes_semi_responsible} // 0 obs +count if ypncp == 0 & dag >= ${age_becomes_semi_responsible} +count if ypncp > 0 & dag >= ${age_becomes_semi_responsible} -replace dropObs = 1 if ypnbihs_dv == -9 & dag >= ${age_sample_min} -replace dropObs = 1 if yplgrs_dv == -9 & dag >= ${age_sample_min} +replace dropObs = 1 if ypnbihs_dv == -9 & dag >= ${age_becomes_semi_responsible} +replace dropObs = 1 if yplgrs_dv == -9 & dag >= ${age_becomes_semi_responsible} replace dropObs = 1 if ydses_c5 == -9 -replace dropObs = 1 if ypncp == -9 & dag >= ${age_sample_min} +replace dropObs = 1 if ypncp == -9 & dag >= ${age_becomes_semi_responsible} * Indicator for households with missing values @@ -1003,7 +1003,6 @@ drop dncold dnc02old * Home ownership variable - preserve egen tag_bu_wave = tag(idbenefitunit swv) @@ -1044,7 +1043,7 @@ save "$dir_data/temp_dhh_owned", replace restore -rename dhh_owned dhh_owned_old +rename dhh_owned dhh_owned_orig merge 1:1 idperson swv using "$dir_data/temp_dhh_owned" @@ -1052,6 +1051,18 @@ drop _m replace dhh_owned = 0 if dhh_owned == . 
+rename dhh_owned dhh_owned_ind + +lab var dhh_owned_ind "Home ownership flag, only = 1 for benefit unit head" + +gen dhh_owned = dhh_owned_ind + +bysort idbenefitunit swv (dhh_owned): replace dhh_owned = dhh_owned[_N] + +lab var dhh_owned "Home ownership flag, = 1 for all benefit unit members" + +sort idperson swv + /*************************** UPDATE FLAG EXCEL FILE ***************************/ @@ -1142,6 +1153,7 @@ local files_to_drop orphans.dta temp_depChild_mother.dta temp_depChild_father.dta + temp_dhh_owned.dta ; #delimit cr // cr stands for carriage return diff --git a/input_processing/data_construction/PL/05_drop_hholds_and_slice_PL.do b/input_processing/data_construction/PL/05_drop_hholds_slice_and_refactoring_PL.do similarity index 55% rename from input_processing/data_construction/PL/05_drop_hholds_and_slice_PL.do rename to input_processing/data_construction/PL/05_drop_hholds_slice_and_refactoring_PL.do index 6eb075b..ccee4ea 100644 --- a/input_processing/data_construction/PL/05_drop_hholds_and_slice_PL.do +++ b/input_processing/data_construction/PL/05_drop_hholds_slice_and_refactoring_PL.do @@ -73,7 +73,7 @@ sort idperson swv //cf _all using "$dir_data/${country}_pooled_ipop.dta" -save "$dir_data/${country}_pooled_ipop.dta", replace +save "$dir_data/${country}_pooled_ipop_pre.dta", replace // panel dataset with missing values removed /*************************** GENERATE FREQUENCY WEIGHTS ***********************/ @@ -128,6 +128,7 @@ forvalues yy = $first_sim_year/$last_sim_year { * Load pooled data with missing values removed use "$dir_data/${country}_pooled_ipop.dta", clear + rename *, l * Limit year @@ -153,7 +154,7 @@ forvalues yy = $first_sim_year/$last_sim_year { * Check for number of adults drop adult child adult_count adult_count2 //drop old vars - gen child = dag < $age_becomes_responsible + gen child = dag < ${age_becomes_responsible} gen adult = 1 - child bys stm idhh: egen adult_count = sum(adult) bys stm idbenefitunit: egen adult_count2 = 
sum(adult) @@ -162,7 +163,7 @@ forvalues yy = $first_sim_year/$last_sim_year { * Check for orphans assert (idfather > 0 | idmother > 0) if /// - (dag > 0 & dag < $age_becomes_responsible) + (dag > 0 & dag < ${age_becomes_responsible}) * Check weight is not zero and non-missing assert dwt > 0 & dwt < . @@ -170,38 +171,145 @@ forvalues yy = $first_sim_year/$last_sim_year { cap gen one = 1 sum one [w = dwt] - *Limit saved variables + * Limit saved variables keep idhh idbenefitunit idperson idpartner idmother idfather swv dgn dag /// - dcpst dnc02 dnc ded deh_c3 deh_c4 sedex les_c3 dlltsd dhe ydses_c5 /// - yplgrs_dv ypnbihs_dv yptciihs_dv dhhtp_c8 ssscp dcpen dcpyy dcpex /// - dcpagdf ynbcpdf_dv der sedag sprfm dagsp dehsp_c3 dehsp_c4 dhesp /// - lessp_c3 dehm_c3 dehf_c3 stm lesdf_c4 dhh_owned lhw drgn1 dct /// - dwt_sampling les_c4 lessp_c4 adultchildflag dwt obs_earnings_hourly /// - l1_obs_earnings_hourly ypncp ypnoab l1_les_c3 l1_les_c4 liwwh + dnc02 dnc ded deh_c3 deh_c4 sedex dlltsd dhe ydses_c5 yplgrs_dv /// + ypnbihs_dv yptciihs_dv dcpyy dcpagdf ynbcpdf_dv der dehm_c4 dehf_c4 stm /// + dhh_owned lhw drgn1 dct les_c4 adultchildflag dwt obs_earnings_hourly /// + l1_obs_earnings_hourly ypncp ypnoab ydisp l1_les_c4 liwwh order idhh idbenefitunit idperson idpartner idmother idfather swv dgn /// - dag dcpst dnc02 dnc ded deh_c3 deh_c4 sedex les_c3 dlltsd dhe ydses_c5 /// - yplgrs_dv ypnbihs_dv yptciihs_dv dhhtp_c8 ssscp dcpen dcpyy dcpex /// - dcpagdf ynbcpdf_dv der sedag sprfm dagsp dehsp_c3 dehsp_c4 dhesp /// - lessp_c3 dehm_c3 dehf_c3 stm lesdf_c4 dhh_owned lhw drgn1 dct /// - dwt_sampling les_c4 lessp_c4 adultchildflag dwt obs_earnings_hourly /// - l1_obs_earnings_hourly ypncp ypnoab l1_les_c3 l1_les_c4 liwwh + dag dnc02 dnc ded deh_c3 deh_c4 sedex dlltsd dhe ydses_c5 yplgrs_dv /// + ypnbihs_dv yptciihs_dv dcpyy dcpagdf ynbcpdf_dv der dehm_c4 dehf_c4 stm /// + dhh_owned lhw drgn1 dct les_c4 adultchildflag dwt obs_earnings_hourly /// + l1_obs_earnings_hourly 
ypncp ypnoab ydisp l1_les_c4 liwwh recode idhh idbenefitunit idperson idpartner idmother idfather swv dgn /// - dag dcpst dnc02 dnc ded deh_c3 deh_c4 sedex les_c3 dlltsd dhe ydses_c5 /// - yplgrs_dv ypnbihs_dv yptciihs_dv dhhtp_c8 ssscp dcpen dcpyy dcpex /// - dcpagdf ynbcpdf_dv der sedag sprfm dagsp dehsp_c3 dehsp_c4 dhesp /// - lessp_c3 dehm_c3 dehf_c3 stm lesdf_c4 dhh_owned lhw drgn1 dct /// - dwt_sampling les_c4 lessp_c4 adultchildflag dwt obs_earnings_hourly /// - l1_obs_earnings_hourly ypncp ypnoab l1_les_c3 l1_les_c4 liwwh /// - (missing = -9) - - gsort idhh idbenefitunit idperson - save "$dir_data/population_initial_${country}_${year}.dta", replace - - recode dgn (-9 = 0) + dag dnc02 dnc ded deh_c3 deh_c4 sedex dlltsd dhe ydses_c5 /// + yplgrs_dv ypnbihs_dv yptciihs_dv dcpyy dcpagdf ynbcpdf_dv der /// + dehm_c4 dehf_c4 stm dhh_owned lhw drgn1 dct les_c4 adultchildflag dwt /// + obs_earnings_hourly l1_obs_earnings_hourly ypncp ypnoab ydisp l1_les_c4 /// + liwwh (missing = -9) + + * Rename Variables following new Codebook + + * Identifiers + rename idhh idHh + rename idbenefitunit idBu + rename idperson idPers + rename idpartner idPartner + rename idmother idMother + rename idfather idFather + + * Time + rename stm statInterviewYear + rename swv statCollectionWave + + * Location + rename drgn1 demRgn + rename dct demCountry + + * Weights + rename dwt wgtCrossMainSurvey + + * Demographics + rename dgn demMaleFlag + rename dag demAge + //rename dcpst demPartnerStatus + rename dnc02 demNChild0to2 + rename dnc demNChild + //rename ssscp demPartnerSameSexFlag + //rename dcpen demEnterPartnerFlag + rename dcpyy demPartnerNYear + //rename dcpex demExitPartnerFlag + rename dcpagdf demAgePartnerDiff + //rename sedag demAgeEduRangeFlag + //rename sprfm demFertFlag + //rename dchpd demNChild0 + //rename dagsp demAgePartner + rename adultchildflag demAdultChildFlag + //rename dhhtp_c4 demCompHhC4 + //rename dhhtp_c8 demCompHhC8 + //rename multiplier demPopSurveyShare + 
//rename dot demEthnC4 + //rename dot01 demEthnC6 + + * Education + rename deh_c3 eduHighestC3 + rename deh_c4 eduHighestC4 + //rename dehsp_c3 eduHighestPartnerC3 + //rename dehsp_c4 eduHighestPartnerC4 + rename ded eduSpellFlag + rename sedex eduExitSampleFlag + rename der eduReturnFlag + rename dehm_c4 eduHighestMotherC4 + rename dehf_c4 eduHighestFatherC4 + + * Labour market + //rename les_c3 labStatusC3 + rename les_c4 labC4 + rename l1_les_c4 labC4L1 + //rename lessp_c3 labStatusPartnerC3 + //rename lessp_c4 labStatusPartnerC4 + //rename lesdf_c4 labStatusPartnerAndOwnC4 + rename lhw labHrsWorkWeek + //rename l1_lhw labHrsWorkWeekL1 + + * Income, labour, wealth + rename obs_earnings_hourly labWageHrly + rename l1_obs_earnings_hourly labWageHrlyL1 + + //rename liquid_wealth wealthLiq + //rename tot_pen wealthPensValue + //rename nvmhome wealthPrptyValue + + //rename total_wealth wealthTotValue + //rename mortgage_debt wealthMortgageDebtValue + //rename housing_wealth wealthPrptyValue + //rename total_pensions wealthPensValue + + rename ydses_c5 yHhQuintilesMonthC5 + rename ynbcpdf_dv yPersAndPartnerGrossDiffMonth + rename dhh_owned wealthPrptyFlag + + //rename econ_benefits yBenReceivedFlag + //rename econ_benefits_nonuc yBenNonUCReceivedFlag + //rename econ_benefits_uc yBenUCReceivedFlag + + rename ypncp yCapitalPersMonth + rename ypnoab yPensPersGrossMonth + rename yplgrs_dv yEmpPersGrossMonth + rename ypnbihs_dv yNonBenPersGrossMonth + rename yptciihs_dv yMiscPersGrossMonth + rename ydisp yPersDispMonth + + //rename unemp labUnempFlag + rename liwwh labEmpNyear + + * Health & wellbeing + rename dlltsd healthDsblLongtermFlag + rename dhe healthSelfRated + //rename dhesp healthPartnerSelfRated + //rename dhm healthWbScore0to36 + //rename dhm_ghq healthPsyDstrss0to12 + //rename dhe_mcs healthMentalMcs + //rename dhe_pcs healthPhysicalPcs + //rename dhe_mcssp healthMentalPartnerMcs + //rename dhe_pcssp healthPhysicalPartnerPcs + //rename dls 
demLifeSatScore0to10 + //rename financial_distress yFinDstrssFlag + + gsort idHh idBu idPers + + save "$dir_data/refactored/population_initial_${country}_${year}.dta", /// + replace + + recode demMaleFlag (-9 = 0) + export delimited using /// - "$dir_data/population_initial_${country}_${year}.csv", nolabel replace + "$dir_data/refactored/population_initial_${country}_${year}.csv", /// + nolabel replace + } cap log close diff --git a/input_processing/data_construction/PL/extra_var_info/prepare_pooled_data _05_20.do b/input_processing/data_construction/PL/extra_var_info/prepare_pooled_data _05_20.do index eeae564..66b2299 100644 --- a/input_processing/data_construction/PL/extra_var_info/prepare_pooled_data _05_20.do +++ b/input_processing/data_construction/PL/extra_var_info/prepare_pooled_data _05_20.do @@ -1,5 +1,5 @@ ******************************************************************************** -* PROJECT: ESPON +* PROJECT: SimPaths EU * DO-FILE NAME: prepare_pooled_data_05_20.do * DESCRIPTION: Compiles 2005-2020 panel dataset from 2005-2020 EU-SILC ******************************************************************************** diff --git a/input_processing/data_construction/PL/extra_var_info/vars_05_20_PL.do b/input_processing/data_construction/PL/extra_var_info/vars_05_20_PL.do deleted file mode 100644 index d6bb1d2..0000000 --- a/input_processing/data_construction/PL/extra_var_info/vars_05_20_PL.do +++ /dev/null @@ -1,74 +0,0 @@ -******************************************************************************** -* PROJECT: ESPON -* DO-FILE NAME: vars_05_20.do -* DESCRIPTION: Collects varaibles from 2005-2020 panel to merge in -******************************************************************************** -* COUNTRY: PL -* DATA: EU-SILC panel based on the EU-SILC longitudinal datasets -* created using GESIS Stata script: -* https://www.gesis.org/gml/european-microdata/eu-silc. 
-******************************************************************************** -* AUTHORS: Daria Popova, Ashley Burdett -* LAST UPDATE: Feb 2025 -* NOTE: Collects variables from the 2005-2020 panel to merge in -* into 2005-2023 panel. Relevant for variables that were -* replace sinces 2020, but the replacements have not -* been backcoded resulting in missing data. -* Temporary fix. -* If input data run "prepare_pooled_data _05_20.do" first. -******************************************************************************** - - -* Collect economic status information from previous panel -use "$dir_data_05_20/PL-SILC_pooled_all_obs_01.dta", clear - -keep hid pid year pl030 pl031 rb210 - -rename * *_orig -rename pid_orig pid -rename year_orig year -rename hid_orig hid - -save "$dir_data/temp_orig_econ_status", replace - -* Collect education information from previous panel -use "$dir_data_05_20/PL-SILC_pooled_all_obs_01.dta", clear - -keep hid pid year pe040 - -rename * *_orig -rename pid_orig pid -rename year_orig year -rename hid_orig hid - -save "$dir_data/temp_orig_edu", replace - -/* -use "$dir_data_05_20/PL-SILC_pooled_all_obs_02.dta", clear - -keep hid pid year les_c3 - -rename * *_orig -rename pid_orig pid -rename year_orig year -rename hid_orig hid - -save "$dir_data/temp_orig_econ_status2", replace -*/ - -* Collect occupation information - -use "$dir_data_05_20/PL-SILC_pooled_all_obs_01.dta", clear - -keep hid pid year pl051 - -rename * *_orig -rename pid_orig pid -rename year_orig year -rename hid_orig hid - -save "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/ESPON - OVERLAP/_countries/Digital_skills/data/temp_orig_occu", replace - - - - diff --git a/documentation/key_function_income_thresholds_clean2018.xlsx b/input_processing/key_function_income_thresholds_2018.xlsx similarity index 100% rename from documentation/key_function_income_thresholds_clean2018.xlsx rename to input_processing/key_function_income_thresholds_2018.xlsx diff 
--git a/input_processing/regression_estimates/PL/00_master_regression_estimates_PL.do b/input_processing/regression_estimates/PL/00_master_regression_estimates_PL.do index 250fdc9..550dbb3 100644 --- a/input_processing/regression_estimates/PL/00_master_regression_estimates_PL.do +++ b/input_processing/regression_estimates/PL/00_master_regression_estimates_PL.do @@ -5,8 +5,9 @@ * SimPaths for Poland using EU-SILC. * COUNTRY: Poland * DATA: GESIS - EU SILC -* AUTHORS: Daria Popova, Justin van de Ven, Ashley Burdett -* LAST UPDATE: Jannuary 2026 +* AUTHORS: Daria Popova, Justin van de Ven, Ashley Burdett, +* Aleksandra Kolndrekaj +* LAST UPDATE: March 2026 (AB) ******************************************************************************** * NOTES: * @@ -40,7 +41,7 @@ global dir_ind "/Users/ashleyburdett/Library/CloudStorage/Box-Box" //"/Users/ashleyburdett/Library/CloudStorage/Box-Box" * Working directory which contains sub-folders and will contain formatted output -global dir_work "$dir_ind/CeMPA shared area/_SimPaths/_SimpathsEU/regression_estimates/PL/with_time_trend" +global dir_work "$dir_ind/CeMPA shared area/_SimPaths/_SimpathsEU/input_processing/regression_estimates/PL/with_time_trend" * Directory which contains do files global dir_do "$dir_work/do_files" @@ -55,20 +56,17 @@ global dir_log "$dir_work/logs" global dir_raw_results "$dir_work/raw_results" * Directory which contains EU-SILC input dataset -global dir_input_data "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/initial_populations/PL/data" +global dir_input_data "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/input_processing/initial_populations/PL/data" * Directory containing external input data global dir_external_data "$dir_ind/CeMPA shared area/projects - completed/ESPON - OVERLAP/_countries/PL/regression_estimates/external_data" -* Directory containing internal validation output -global dir_internal_validation "$dir_work/internal_validation/graphs" - 
/******************************************************************************* * DEFINE PARAMETERS & PROCESS IF CONDITIONS *******************************************************************************/ -do "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/00_master_conditions.do" +do "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/input_processing/00_master_conditions.do" /******************************************************************************* diff --git a/input_processing/regression_estimates/PL/01_reg_education_PL.do b/input_processing/regression_estimates/PL/01_reg_education_PL.do index f6015ae..3c2207a 100644 --- a/input_processing/regression_estimates/PL/01_reg_education_PL.do +++ b/input_processing/regression_estimates/PL/01_reg_education_PL.do @@ -449,7 +449,7 @@ fre der if (dag >= 16 & l.les_c4 != 4 & l.les_c4 != 2) replace der = . if der == -9 * Estimation -probit der i.Dgn Dag Dag_sq i.Dcpst_Partnered li.Deh_c4_High /// +probit der i.Dgn Dag Dag_sq li.Dcpst_Partnered li.Deh_c4_High /// li.Deh_c4_Low li.Les_c3_NotEmployed li.Les_c3_Employed l.Dnc l.Dnc02 /// $regions Year_transformed Year_transformed_sq Y2021 if /// ${e1b_if_condition} [pw=dwt], vce(robust) diff --git a/input_processing/regression_estimates/PL/02_reg_leave_parental_home_PL.do b/input_processing/regression_estimates/PL/02_reg_leave_parental_home_PL.do index d044e8c..414bdca 100644 --- a/input_processing/regression_estimates/PL/02_reg_leave_parental_home_PL.do +++ b/input_processing/regression_estimates/PL/02_reg_leave_parental_home_PL.do @@ -179,7 +179,7 @@ xtset idperson swv fre dlftphm if (ded == 0 & dag >= 18) * Estimation -probit dlftphm i.Dgn Dag Dag_sq i.Deh_c3_Medium i.Deh_c3_Low /// +probit dlftphm i.Dgn Dag Dag_sq li.Deh_c4_Medium li.Deh_c4_Low /// li.Les_c3_Student li.Les_c3_NotEmployed li.Ydses_c5_Q2 li.Ydses_c5_Q3 /// li.Ydses_c5_Q4 li.Ydses_c5_Q5 $regions Year_transformed /// Year_transformed_sq i.Y2016 i.Y2018 if ${p1_if_condition} [pw=dwt], /// diff --git 
a/input_processing/regression_estimates/PL/03_reg_partnership_PL.do b/input_processing/regression_estimates/PL/03_reg_partnership_PL.do index ca5a37f..8ef678d 100644 --- a/input_processing/regression_estimates/PL/03_reg_partnership_PL.do +++ b/input_processing/regression_estimates/PL/03_reg_partnership_PL.do @@ -240,9 +240,13 @@ gen L1_Dnc = l.Dnc gen L1_Dehsp_c3_Medium = l.Dehsp_c3_Medium * Generate interactions -gen Les_c4_Student_Dgn = Dgn * Les_c4_Student -gen Les_c4_NotEmployed_Dgn = Dgn * Les_c4_NotEmployed -gen Les_c4_Retired_Dgn = Dgn * Les_c4_Retired +gen Dgn_Les_c4_Student = Dgn * Les_c4_Student +gen Dgn_Les_c4_NotEmployed = Dgn * Les_c4_NotEmployed +gen Dgn_Les_c4_Retired = Dgn * Les_c4_Retired + +gen Dgn_Les_c4_Student_L1 = Dgn * l.Les_c4_Student +gen Dgn_Les_c4_NotEmployed_L1 = Dgn * l.Les_c4_NotEmployed +gen Dgn_Les_c4_Retired_L1 = Dgn * l.Les_c4_Retired gen Ded_Dag = Ded * Dag gen Ded_Dag_sq = Ded * Dag_sq @@ -285,8 +289,8 @@ probit dcpen c.Dag c.Dag_sq Dgn lc.Dnc lc.Dnc02 li.Ydses_c5_Q2 /// Ded_Dgn Ded_Dnc_L1 Ded_Dnc02_L1 Ded_Ydses_c5_Q2_L1 Ded_Ydses_c5_Q3_L1 /// Ded_Ydses_c5_Q4_L1 Ded_Ydses_c5_Q5_L1 i.Deh_c4_Na i.Deh_c4_High /// i.Deh_c4_Low li.Les_c4_Student li.Les_c4_NotEmployed /// - li.Les_c4_Retired li.Les_c4_Student_Dgn li.Les_c4_NotEmployed_Dgn /// - li.Les_c4_Retired_Dgn i.Dhe_Fair i.Dhe_Good i.Dhe_VeryGood /// + li.Les_c4_Retired Dgn_Les_c4_Student_L1 Dgn_Les_c4_NotEmployed_L1 /// + Dgn_Les_c4_Retired_L1 i.Dhe_Fair i.Dhe_Good i.Dhe_VeryGood /// i.Dhe_Excellent $regions Year_transformed if /// ${u1_if_condition} [pw=dwt], vce(robust) diff --git a/input_processing/regression_estimates/PL/07_reg_retirement_PL.do b/input_processing/regression_estimates/PL/07_reg_retirement_PL.do index 32348c9..4d914d4 100644 --- a/input_processing/regression_estimates/PL/07_reg_retirement_PL.do +++ b/input_processing/regression_estimates/PL/07_reg_retirement_PL.do @@ -290,8 +290,8 @@ graph combine ret2010 ret2011 ret2012 ret2013 ret2014 ret2015 ret2016 ret2017 
re xtset idperson swv -probit drtren i.Dgn Dag Dag_sq i.Elig_pen i.Elig_pen_L1 i.Deh_c4_Medium /// - i.Deh_c4_Low i.Reached_Retirement_Age li.Les_c3_NotEmployed /// +probit drtren i.Dgn Dag Dag_sq i.Elig_pen i.Elig_pen_L1 li.Deh_c4_Medium /// + li.Deh_c4_Low i.Reached_Retirement_Age li.Les_c3_NotEmployed /// li.Ydses_c5_Q2 li.Ydses_c5_Q3 li.Ydses_c5_Q4 li.Ydses_c5_Q5 li.Dlltsd /// $regions Year_transformed Post2015 /// if ${r1a_if_condition} [pw=dwt], vce(robust) @@ -316,7 +316,7 @@ title("Process R1a: Probability of Retirement, Single") /// gen in_sample = e(sample) predict p -* Save sample for estimte validation +* Save sample for estimate validation save "$dir_data/R1a_sample", replace * Store model summary statistics @@ -494,7 +494,7 @@ xtset idperson swv * Estimation probit drtren i.Dgn Dag Dag_sq i.Elig_pen i.Elig_pen_L1 /// - i.Deh_c4_Medium i.Deh_c4_Low i.Reached_Retirement_Age /// + li.Deh_c4_Medium li.Deh_c4_Low i.Reached_Retirement_Age /// li.Les_c3_NotEmployed i.Reached_Retirement_Age_Les /// li.Ydses_c5_Q2 li.Ydses_c5_Q3 li.Ydses_c5_Q4 li.Ydses_c5_Q5 /// li.Dlltsd i.Reached_Retirement_Age_Sp i.Elig_pen_Sp i.Elig_pen_L1_Sp /// diff --git a/input_processing/regression_estimates/PL/08_reg_wages_PL.do b/input_processing/regression_estimates/PL/08_reg_wages_PL.do index 2606e28..546df28 100644 --- a/input_processing/regression_estimates/PL/08_reg_wages_PL.do +++ b/input_processing/regression_estimates/PL/08_reg_wages_PL.do @@ -21,6 +21,9 @@ * * Update the winsorization process if alter data * Update CPI if apply to a different country +* +* TO DO: Explore using the centering adjustment when adding +* random draw: -e(sigma)^2/2 *******************************************************************************/ clear all @@ -244,14 +247,14 @@ gen pred_hourly_wage = . 
/******************** WAGES: FEMALE, NO PREV WAGE OBSERVED ********************/ -global wage_eqn "lwage_hour dag dagsq i.deh_c3 i.deh_c3#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" -global seln_eqn "i.L1les_c3 dag dagsq i.deh_c3 i.deh_c3#c.dag i.mar i.child i.dhe i.drgn1 y2020 y2021" +global wage_eqn "lwage_hour dag dagsq ib1.deh_c4 ib1.deh_c4#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" +global seln_eqn "i.L1les_c3 dag dagsq ib1.deh_c4 ib1.deh_c4#c.dag li.mar li.child i.dhe i.drgn1 y2020 y2021" local filter = "${W1fa_if_condition} & previouslyWorking == 0" heckman $wage_eqn if ${W1fa_if_condition}, select($seln_eqn) twostep mills(lambda) -* Save raw restults +* Save raw results outreg2 stats(coef se pval) using "$dir_raw_results/wages/W1fa.doc", replace /// title("Heckman-corrected wage equation estimated on the sample of women who were not in employment last year") /// ctitle(Not working women) label side dec(2) noparen @@ -357,8 +360,8 @@ restore * Labelling putexcel set "$dir_work/reg_wages_${country}", sheet("W1fa") modify -local var_list Dag Dag_sq Deh_c3_Medium Deh_c3_Low Deh_c3_Medium_Dag /// - Deh_c3_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood Dhe_Excellent /// +local var_list Dag Dag_sq Deh_c4_Medium Deh_c4_Low Deh_c4_Medium_Dag /// + Deh_c4_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood Dhe_Excellent /// PL4 PL5 PL6 PL10 Pt RealWageGrowth Y2020 Y2021 Constant InverseMillsRatio putexcel A1 = "REGRESSOR" @@ -425,8 +428,8 @@ putexcel set "$dir_work/reg_employmentSelection_${country}", /// sheet("W1fa-sel") modify local var_list Les_c3_Student_L1 Les_c3_NotEmployed_L1 Dag Dag_sq /// - Deh_c3_Medium Deh_c3_Low Deh_c3_Medium_Dag Deh_c3_Low_Dag /// - Dcpst_Partnered D_Children Dhe_Fair Dhe_Good /// + Deh_c4_Medium Deh_c4_Low Deh_c4_Medium_Dag Deh_c4_Low_Dag /// + Dcpst_Partnered_L1 D_Children_L1 Dhe_Fair Dhe_Good /// Dhe_VeryGood Dhe_Excellent PL4 PL5 PL6 PL10 Y2020 Y2021 /// Constant @@ -461,8 +464,8 @@ cap drop lambda /********************* WAGES: 
MALE, NO PREV WAGE OBSERVED *********************/ -global wage_eqn "lwage_hour dag dagsq i.deh_c3 i.deh_c3#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" -global seln_eqn "i.L1les_c3 dag dagsq i.deh_c3 i.deh_c3#c.dag i.mar i.child i.dhe i.drgn1 y2020 y2021" +global wage_eqn "lwage_hour dag dagsq i.deh_c4 i.deh_c4#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" +global seln_eqn "i.L1les_c3 dag dagsq i.deh_c4 i.deh_c4#c.dag li.mar li.child i.dhe i.drgn1 y2020 y2021" local filter = "${W1ma_if_condition} & previouslyWorking == 0" heckman $wage_eqn if ${W1ma_if_condition}, select($seln_eqn) /// @@ -478,7 +481,7 @@ gen in_sample_mnpw = e(sample) gen epsilon = rnormal()*e(sigma) replace pred_hourly_wage = exp(lwage_hour_hat + epsilon) if `filter' - + * Save sample for validation save "$dir_data/Male_NPW_sample", replace cap drop pred epsilon @@ -566,8 +569,8 @@ restore * Labelling putexcel set "$dir_work/reg_wages_${country}", sheet("W1ma") modify -local var_list Dag Dag_sq Deh_c3_Medium Deh_c3_Low Deh_c3_Medium_Dag /// - Deh_c3_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood Dhe_Excellent /// +local var_list Dag Dag_sq Deh_c4_Medium Deh_c4_Low Deh_c4_Medium_Dag /// + Deh_c4_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood Dhe_Excellent /// PL4 PL5 PL6 PL10 Pt RealWageGrowth Y2020 Y2021 Constant InverseMillsRatio putexcel A1 = "REGRESSOR" @@ -635,8 +638,8 @@ putexcel set "$dir_work/reg_employmentSelection_${country}", /// sheet("W1ma-sel") modify local var_list Les_c3_Student_L1 Les_c3_NotEmployed_L1 Dag Dag_sq /// - Deh_c3_Medium Deh_c3_Low Deh_c3_Medium_Dag Deh_c3_Low_Dag /// - Dcpst_Partnered D_Children Dhe_Fair Dhe_Good /// + Deh_c4_Medium Deh_c4_Low Deh_c4_Medium_Dag Deh_c4_Low_Dag /// + Dcpst_Partnered_L1 D_Children_L1 Dhe_Fair Dhe_Good /// Dhe_VeryGood Dhe_Excellent PL4 PL5 PL6 PL10 Y2020 Y2021 /// Constant @@ -668,11 +671,11 @@ foreach var in `var_list' { cap drop lambda - +*/ /********************** WAGES: FEMALE, PREV WAGE OBSERVED *********************/ -global 
wage_eqn "lwage_hour L1.lwage_hour dag dagsq i.deh_c3 i.deh_c3#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" -global seln_eqn "dag dagsq i.deh_c3 i.deh_c3#c.dag i.mar i.child i.dhe i.drgn1 y2020 y2021" +global wage_eqn "lwage_hour L1.lwage_hour dag dagsq i.deh_c4 i.deh_c4#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" +global seln_eqn "dag dagsq i.deh_c4 i.deh_c4#c.dag li.mar li.child i.dhe i.drgn1 y2020 y2021" heckman $wage_eqn if ${W1fb_if_condition}, select($seln_eqn) twostep @@ -775,8 +778,8 @@ restore * Labelling putexcel set "$dir_work/reg_wages_${country}", sheet("W1fb") modify -local var_list L1_log_hourly_wage Dag Dag_sq Deh_c3_Medium Deh_c3_Low /// - Deh_c3_Medium_Dag Deh_c3_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood /// +local var_list L1_log_hourly_wage Dag Dag_sq Deh_c4_Medium Deh_c4_Low /// + Deh_c4_Medium_Dag Deh_c4_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood /// Dhe_Excellent PL4 PL5 PL6 PL10 Pt RealWageGrowth Y2020 Y2021 Constant /// InverseMillsRatio @@ -809,13 +812,15 @@ foreach var in `var_list' { * Calculate RMSE // Note: Sigma reported in the estimated regressions is the standard deviation // of the residuals (=RMSE assuming residuals are normally distributed) + +preserve + +keep if ${W1fb_if_condition} + cap drop residuals squared_residuals gen residuals = lwage_hour - lwage_hour_hat gen squared_residuals = residuals^2 -preserve - -keep if `filter' sum squared_residuals di "RMSE for Employed women: " sqrt(r(mean)) @@ -846,8 +851,8 @@ putexcel set "$dir_work/reg_employmentSelection_${country}", /// sheet("W1fb-sel") modify local var_list Dag Dag_sq /// - Deh_c3_Medium Deh_c3_Low Deh_c3_Medium_Dag Deh_c3_Low_Dag /// - Dcpst_Partnered D_Children Dhe_Fair Dhe_Good /// + Deh_c4_Medium Deh_c4_Low Deh_c4_Medium_Dag Deh_c4_Low_Dag /// + Dcpst_Partnered_L1 D_Children_L1 Dhe_Fair Dhe_Good /// Dhe_VeryGood Dhe_Excellent PL4 PL5 PL6 PL10 Y2020 Y2021 /// Constant @@ -881,12 +886,27 @@ cap drop lambda /*********************** WAGES: MEN, PREV WAGE 
OBSERVED ***********************/ +/* +NOTE: Explored using log age due to signs of age and age squared, but in +age validaition there is very little difference - main contrast among those +age > 65. +*/ + + +global wage_eqn "lwage_hour L1.lwage_hour dag dagsq i.deh_c4 i.deh_c4#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" +global seln_eqn "dag dagsq i.deh_c4 i.deh_c4#c.dag li.mar li.child i.dhe i.drgn1 y2020 y2021" +*/ +/* +gen ldag = ln(dag) + +global wage_eqn "lwage_hour L1.lwage_hour ldag i.deh_c4 i.deh_c4#c.ldag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" +global seln_eqn "ldag i.deh_c4 i.deh_c4#c.ldag i.mar i.child i.dhe i.drgn1 y2020 y2021" +*/ -global wage_eqn "lwage_hour L1.lwage_hour dag dagsq i.deh_c3 i.deh_c3#c.dag i.dhe i.drgn1 i.pt real_wage_growth y2020 y2021" -global seln_eqn "dag dagsq i.deh_c3 i.deh_c3#c.dag i.mar i.child i.dhe i.drgn1 y2020 y2021" heckman $wage_eqn if ${W1mb_if_condition}, select($seln_eqn) twostep + * Obtain predicted values (log wage) with selection correction predict pred if ${W1mb_if_condition}, ycond // ycond -> include IMR in prediction @@ -894,10 +914,14 @@ replace lwage_hour_hat = pred if ${W1mb_if_condition} gen in_sample_mpw = e(sample) + * Correct bias transforming from log to levels gen epsilon = rnormal()*e(sigma) replace pred_hourly_wage = exp(lwage_hour_hat + epsilon) if ${W1mb_if_condition} + +gen pred_level_noshock = exp(lwage_hour_hat) + * Save sample for validation save "$dir_data/Male_PW_sample", replace @@ -983,8 +1007,8 @@ restore * Labelling putexcel set "$dir_work/reg_wages_${country}", sheet("W1mb") modify -local var_list L1_log_hourly_wage Dag Dag_sq Deh_c3_Medium Deh_c3_Low /// - Deh_c3_Medium_Dag Deh_c3_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood /// +local var_list L1_log_hourly_wage Dag Dag_sq Deh_c4_Medium Deh_c4_Low /// + Deh_c4_Medium_Dag Deh_c4_Low_Dag Dhe_Fair Dhe_Good Dhe_VeryGood /// Dhe_Excellent PL4 PL5 PL6 PL10 Pt RealWageGrowth Y2020 Y2021 Constant /// InverseMillsRatio @@ -1023,7 
+1047,7 @@ gen squared_residuals = residuals^2 preserve -keep if `filter' +keep if ${W1mb_if_condition} sum squared_residuals di "RMSE for Employed men: " sqrt(r(mean)) @@ -1054,8 +1078,8 @@ putexcel set "$dir_work/reg_employmentSelection_${country}", /// sheet("W1mb-sel") modify local var_list Dag Dag_sq /// - Deh_c3_Medium Deh_c3_Low Deh_c3_Medium_Dag Deh_c3_Low_Dag /// - Dcpst_Partnered D_Children Dhe_Fair Dhe_Good /// + Deh_c4_Medium Deh_c4_Low Deh_c4_Medium_Dag Deh_c4_Low_Dag /// + Dcpst_Partnered_L1 D_Children_L1 Dhe_Fair Dhe_Good /// Dhe_VeryGood Dhe_Excellent PL4 PL5 PL6 PL10 Y2020 Y2021 /// Constant @@ -1123,7 +1147,7 @@ replace les_c3 = orig_les_c3 drop orig_les_c3 */ -save "$dir_data/${country}-_pooled_ipop_wages.dta", replace +save "$dir_data/${country}-_pooled_ipop2.dta", replace capture log close diff --git a/input_processing/regression_estimates/PL/10_parametric_matching_process_PL.do b/input_processing/regression_estimates/PL/10_parametric_matching_process_PL.do index bf2a4c1..6c1646b 100644 --- a/input_processing/regression_estimates/PL/10_parametric_matching_process_PL.do +++ b/input_processing/regression_estimates/PL/10_parametric_matching_process_PL.do @@ -16,7 +16,7 @@ set maxvar 30000 * Call dataset with Heckman estimates -use "$dir_data/${country}-_pooled_ipop_wages.dta", clear +use "$dir_data/${country}-_pooled_ipop2.dta", clear sort idperson stm xtset idperson stm @@ -83,7 +83,7 @@ drop if missing(dagDifference) | missing(predictedWageDifference) sum dagDifference predictedWageDifference //Get sample mean and std dev -putexcel set "$dir_work/scenario_parametricMatching_PL", replace +putexcel set "$dir_work/scenario_parametricMatching_PL", sheet("Parameters") replace putexcel A1 = ("Parameter") putexcel A2 = ("mean_dag_diff") putexcel A3 = ("mean_wage_diff") diff --git a/input_processing/regression_estimates/PL/11_spec_diagnostic_tests_PL.do b/input_processing/regression_estimates/PL/11_spec_diagnostic_tests_PL.do deleted file mode 100644 
index 9341f3e..0000000 --- a/input_processing/regression_estimates/PL/11_spec_diagnostic_tests_PL.do +++ /dev/null @@ -1,81 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Regression estimates -* OBJECT: Diagnostics -* AUTHORS: Ashley Burdett -* LAST UPDATE: Jan 2026 -* COUNTRY: Poland -******************************************************************************** -* NOTES: -*******************************************************************************/ - -* Test when fail eigenvalue tests in any of the regression models - -/* -Faiture of any of the processes is due to a multicollinearity issue, which can -prevent SimPaths from being able to invert the var-cov matrix. - -To investigate which variable is creating the issue you need to run -collinearity diagnostic tests as the conditions that break the do-files does not -provide any diagnostic information. This is most strighfroeardly done using -"collin". - -This command should be run after the regression model and does not permit index -or time series operators. - -Example below: - -VIF >10 suggests problematic. 
- -*/ - -gen l_Dnc = l.Dnc -gen l_Dnc02 = l.Dnc02 -gen l_Ydses_c5_Q2 = l.Ydses_c5_Q2 -gen l_Ydses_c5_Q3 = l.Ydses_c5_Q3 -gen l_Ydses_c5_Q4 = l.Ydses_c5_Q4 -gen l_Ydses_c5_Q5 = l.Ydses_c5_Q5 -gen l_Les_c4_Student = l.Les_c4_Student -gen l_Les_c4_NotEmployed = l.Les_c4_NotEmployed -gen l_Les_c4_Retired = l.Les_c4_Retired -gen l_Les_c4_Student_Dgn = l.Les_c4_Student_Dgn -gen l_Les_c4_NotEmployed_Dgn = l.Les_c4_NotEmployed_Dgn -gen l_Les_c4_Retired_Dgn = l.Les_c4_Retired_Dgn - - -collin Dag Dag_sq Dgn l_Dnc l_Dnc02 l_Ydses_c5_Q2 l_Ydses_c5_Q3 /// - l_Ydses_c5_Q4 l_Ydses_c5_Q5 Ded_Dag Ded_Dag_sq Ded_Dgn Ded_Dnc_L1 /// - Ded_Dnc02_L1 Ded_Ydses_c5_Q2_L1 Ded_Ydses_c5_Q3_L1 Ded_Ydses_c5_Q4_L1 /// - Ded_Ydses_c5_Q5_L1 Deh_c4_Na Deh_c4_High /// - Deh_c4_Low l_Les_c4_Student l_Les_c4_NotEmployed /// - l_Les_c4_Retired l_Les_c4_Student_Dgn l_Les_c4_NotEmployed_Dgn /// - l_Les_c4_Retired_Dgn Dhe_Fair Dhe_Good Dhe_VeryGood /// - Dhe_Excellent PL4 PL5 PL6 PL10 Year_transformed Y2020 Y2021 Y2022 if /// - ${u1_if_condition} - - - -gen l_Dhe_Fair = l.Dhe_Fair -gen l_Dhe_Good = l.Dhe_Good -gen l_Dhe_VeryGood = l.Dhe_VeryGood -gen l_Dhe_Excellent = l.Dhe_Excellent -gen l_Ln_Ypncp = l.Ln_Ypncp -gen l_Yplgrs_dv = l.Yplgrs_dv -gen l2_Yplgrs_dv = l2.Yplgrs_dv -gen l2_Ln_Ypncp = l2.Ln_Ypncp -gen l_Les_c4_Student = l.Les_c4_Student -gen l_Les_c4_NotEmployed = l.Les_c4_NotEmployed -gen l_Les_c4_Retired = l.Les_c4_Retired -gen l_Dhhtp_c4_CoupleChildren = l.Dhhtp_c4_CoupleChildren -gen l_Dhhtp_c4_SingleNoChildren = l.Dhhtp_c4_SingleNoChildren -gen l_Dhhtp_c4_SingleChildren = l.Dhhtp_c4_SingleChildren - -collin Dgn Dag Dag_sq l_Dhe_Fair l_Dhe_Good l_Dhe_VeryGood /// - l_Dhe_Excellent l_Ln_Ypncp l_Yplgrs_dv /// - l2_Yplgrs_dv l2_Ln_Ypncp Ded_Dgn Ded_Dag Ded_Dag_sq /// - Ded_Ln_Ypncp_L1 Ded_Yplgrs_dv_L1 Ded_Yplgrs_dv_L2 Ded_Ln_Ypncp_L2 /// - Deh_c4_Low Deh_c4_Medium Deh_c4_High l_Les_c4_Student /// - l_Les_c4_NotEmployed l_Les_c4_Retired l_Dhhtp_c4_CoupleChildren /// - l_Dhhtp_c4_SingleNoChildren 
l_Dhhtp_c4_SingleChildren PL4 PL5 PL6 PL10 /// - Year_transformed /// - if ${i1b_if_condition} diff --git a/scripts/run_TEST_multiruns.sh b/scripts/run_TEST_multiruns.sh deleted file mode 100644 index 424ca44..0000000 --- a/scripts/run_TEST_multiruns.sh +++ /dev/null @@ -1,76 +0,0 @@ -#!/usr/bin/env bash - -set -euo pipefail - -ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" -cd "$ROOT_DIR" - -JAR_PATH="${JAR_PATH:-multirun.jar}" -START_YEAR="${START_YEAR:-2011}" -END_YEAR="${END_YEAR:-2023}" -POP_SIZE="${POP_SIZE:-50000}" -RUNS_PER_SCENARIO="${RUNS_PER_SCENARIO:-5}" -SHOW_GUI="${SHOW_GUI:-false}" -RANDOM_SEED="${RANDOM_SEED:-1821}" - -# JVM heap: give the simulation generous room. -# Rule of thumb: (population × runs) drives peak usage. At 50k × 5 runs, -# 8g is comfortable on this machine (18 GB RAM). Lower to 6g if other -# processes are competing; raise to 12g if you see GC pauses or OOM. -JVM_HEAP="${JVM_HEAP:-12g}" - -CONFIGS=( - - "alignment_02c_population_disability.yml" - "alignment_02d_population_inschool.yml" - "alignment_02e_population_retirement.yml" -) - -if [[ ! -f "$JAR_PATH" ]]; then - echo "Jar not found: $JAR_PATH" >&2 - exit 1 -fi - -for cfg in "${CONFIGS[@]}"; do - scenario="${cfg%.yml}" # strip .yml → e.g. alignment_01_population - dest="output/${scenario}" - - echo "============================================" - echo "Running scenario: ${scenario}" - echo "============================================" - - # Marker file: anything created after this point is from the upcoming run. - marker=$(mktemp) - - java -Xms"${JVM_HEAP}" -Xmx"${JVM_HEAP}" -jar "$JAR_PATH" \ - -g "$SHOW_GUI" \ - -s "$START_YEAR" \ - -e "$END_YEAR" \ - -p "$POP_SIZE" \ - -n "$RUNS_PER_SCENARIO" \ - -r "$RANDOM_SEED" \ - -config "$cfg" - - # The CSV output folder is the timestamped directory WITHOUT a seed/counter - # suffix (e.g. output/20260323204849/csv). Find it by comparing to the marker. 
- new_csv_folder=$(find output -maxdepth 1 -type d -newer "$marker" -name '[0-9]*' \ - | grep -v '_' | sort -r | head -1) - rm -f "$marker" - - if [[ -n "$new_csv_folder" ]]; then - if [[ -d "$dest" ]]; then - echo "Warning: ${dest} already exists — removing before overwrite." >&2 - rm -rf "$dest" - fi - mv "$new_csv_folder" "$dest" - echo "Output saved → ${dest}" - else - echo "Warning: could not find new output folder for scenario ${scenario}." >&2 - fi -done - -echo "" -echo "All scenarios complete. Results are in:" -for cfg in "${CONFIGS[@]}"; do - echo " output/${cfg%.yml}/" -done diff --git a/scripts/run_multiruns-alignPopOFF.sh b/scripts/run_multiruns-alignPopOFF.sh deleted file mode 100644 index 7780a96..0000000 --- a/scripts/run_multiruns-alignPopOFF.sh +++ /dev/null @@ -1,73 +0,0 @@ -#!/usr/bin/env bash - -set -euo pipefail - -ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" -cd "$ROOT_DIR" - -JAR_PATH="${JAR_PATH:-multirun.jar}" -START_YEAR="${START_YEAR:-2011}" -END_YEAR="${END_YEAR:-2023}" -POP_SIZE="${POP_SIZE:-50000}" -RUNS_PER_SCENARIO="${RUNS_PER_SCENARIO:-5}" -SHOW_GUI="${SHOW_GUI:-false}" -RANDOM_SEED="${RANDOM_SEED:-1821}" - -# JVM heap: give the simulation generous room. -# Rule of thumb: (population × runs) drives peak usage. At 50k × 5 runs, -# 8g is comfortable on this machine (18 GB RAM). Lower to 6g if other -# processes are competing; raise to 12g if you see GC pauses or OOM. -JVM_HEAP="${JVM_HEAP:-12g}" - -CONFIGS=( - "alignment_00_populationOFF.yml" -) - -if [[ ! -f "$JAR_PATH" ]]; then - echo "Jar not found: $JAR_PATH" >&2 - exit 1 -fi - -for cfg in "${CONFIGS[@]}"; do - scenario="${cfg%.yml}" # strip .yml → e.g. alignment_01_population - dest="output/${scenario}" - - echo "============================================" - echo "Running scenario: ${scenario}" - echo "============================================" - - # Marker file: anything created after this point is from the upcoming run. 
- marker=$(mktemp) - - java -Xms"${JVM_HEAP}" -Xmx"${JVM_HEAP}" -jar "$JAR_PATH" \ - -g "$SHOW_GUI" \ - -s "$START_YEAR" \ - -e "$END_YEAR" \ - -p "$POP_SIZE" \ - -n "$RUNS_PER_SCENARIO" \ - -r "$RANDOM_SEED" \ - -config "$cfg" - - # The CSV output folder is the timestamped directory WITHOUT a seed/counter - # suffix (e.g. output/20260323204849/csv). Find it by comparing to the marker. - new_csv_folder=$(find output -maxdepth 1 -type d -newer "$marker" -name '[0-9]*' \ - | grep -v '_' | sort -r | head -1) - rm -f "$marker" - - if [[ -n "$new_csv_folder" ]]; then - if [[ -d "$dest" ]]; then - echo "Warning: ${dest} already exists — removing before overwrite." >&2 - rm -rf "$dest" - fi - mv "$new_csv_folder" "$dest" - echo "Output saved → ${dest}" - else - echo "Warning: could not find new output folder for scenario ${scenario}." >&2 - fi -done - -echo "" -echo "All scenarios complete. Results are in:" -for cfg in "${CONFIGS[@]}"; do - echo " output/${cfg%.yml}/" -done diff --git a/scripts/run_multiruns-alignPopOFF_QUICK.sh b/scripts/run_multiruns-alignPopOFF_QUICK.sh deleted file mode 100644 index f9461da..0000000 --- a/scripts/run_multiruns-alignPopOFF_QUICK.sh +++ /dev/null @@ -1,73 +0,0 @@ -#!/usr/bin/env bash - -set -euo pipefail - -ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" -cd "$ROOT_DIR" - -JAR_PATH="${JAR_PATH:-multirun.jar}" -START_YEAR="${START_YEAR:-2011}" -END_YEAR="${END_YEAR:-2023}" -POP_SIZE="${POP_SIZE:-30000}" -RUNS_PER_SCENARIO="${RUNS_PER_SCENARIO:-3}" -SHOW_GUI="${SHOW_GUI:-false}" -RANDOM_SEED="${RANDOM_SEED:-1821}" - -# JVM heap: give the simulation generous room. -# Rule of thumb: (population × runs) drives peak usage. At 50k × 5 runs, -# 8g is comfortable on this machine (18 GB RAM). Lower to 6g if other -# processes are competing; raise to 12g if you see GC pauses or OOM. -JVM_HEAP="${JVM_HEAP:-12g}" - -CONFIGS=( - "alignment_00_populationOFF.yml" -) - -if [[ ! 
-f "$JAR_PATH" ]]; then - echo "Jar not found: $JAR_PATH" >&2 - exit 1 -fi - -for cfg in "${CONFIGS[@]}"; do - scenario="${cfg%.yml}" # strip .yml → e.g. alignment_01_population - dest="output/${scenario}" - - echo "============================================" - echo "Running scenario: ${scenario}" - echo "============================================" - - # Marker file: anything created after this point is from the upcoming run. - marker=$(mktemp) - - java -Xms"${JVM_HEAP}" -Xmx"${JVM_HEAP}" -jar "$JAR_PATH" \ - -g "$SHOW_GUI" \ - -s "$START_YEAR" \ - -e "$END_YEAR" \ - -p "$POP_SIZE" \ - -n "$RUNS_PER_SCENARIO" \ - -r "$RANDOM_SEED" \ - -config "$cfg" - - # The CSV output folder is the timestamped directory WITHOUT a seed/counter - # suffix (e.g. output/20260323204849/csv). Find it by comparing to the marker. - new_csv_folder=$(find output -maxdepth 1 -type d -newer "$marker" -name '[0-9]*' \ - | grep -v '_' | sort -r | head -1) - rm -f "$marker" - - if [[ -n "$new_csv_folder" ]]; then - if [[ -d "$dest" ]]; then - echo "Warning: ${dest} already exists — removing before overwrite." >&2 - rm -rf "$dest" - fi - mv "$new_csv_folder" "$dest" - echo "Output saved → ${dest}" - else - echo "Warning: could not find new output folder for scenario ${scenario}." >&2 - fi -done - -echo "" -echo "All scenarios complete. 
Results are in:" -for cfg in "${CONFIGS[@]}"; do - echo " output/${cfg%.yml}/" -done diff --git a/src/main/java/simpaths/data/startingpop/DataParser.java b/src/main/java/simpaths/data/startingpop/DataParser.java index c56ae06..6183842 100644 --- a/src/main/java/simpaths/data/startingpop/DataParser.java +++ b/src/main/java/simpaths/data/startingpop/DataParser.java @@ -154,7 +154,6 @@ private static void parse(String inputFileLocation, String inputFileName, Connec //Labour Market Economic Status + "ALTER TABLE " + personTable + " ADD activity_status VARCHAR_IGNORECASE;" - + "UPDATE " + personTable + " SET labC4 = 3 WHERE labC4 = 1 AND CAST(labWageHrly AS FLOAT)<0.01;" + "UPDATE " + personTable + " SET activity_status = 'EmployedOrSelfEmployed' WHERE labC4 = 1;" + "UPDATE " + personTable + " SET activity_status = 'Student' WHERE labC4 = 2;" + "UPDATE " + personTable + " SET activity_status = 'NotEmployed' WHERE labC4 = 3;" @@ -164,7 +163,6 @@ private static void parse(String inputFileLocation, String inputFileName, Connec //Lag(1) of labC4 + "ALTER TABLE " + personTable + " ADD activity_status VARCHAR_IGNORECASE;" - + "UPDATE " + personTable + " SET labC4L1 = 3 WHERE labC4L1 = 1 AND CAST(labWageHrlyL1 AS FLOAT)<0.01;" + "UPDATE " + personTable + " SET activity_status = 'EmployedOrSelfEmployed' WHERE labC4L1 = 1;" + "UPDATE " + personTable + " SET activity_status = 'Student' WHERE labC4L1 = 2;" + "UPDATE " + personTable + " SET activity_status = 'NotEmployed' WHERE labC4L1 = 3;" diff --git a/src/main/java/simpaths/model/Person.java b/src/main/java/simpaths/model/Person.java index 5c7855d..24e2b93 100644 --- a/src/main/java/simpaths/model/Person.java +++ b/src/main/java/simpaths/model/Person.java @@ -148,6 +148,7 @@ public class Person implements EventListener, IDoubleSource, IIntSource, Weight, @Transient private Labour labHrsWorkEnumWeekL1; // Lag(1) (previous year's value) of weekly labour supply private Integer labHrsWorkWeek; + private Integer labHrsWorkWeekL1; // 
Potential earnings is the gross hourly wage an individual can earn while working // and is estimated, for each individual, on the basis of observable characteristics as @@ -493,6 +494,7 @@ else if (demAge > Parameters.MAX_AGE_TO_STAY_IN_CONTINUOUS_EDUCATION) healthPartnerSelfRatedL1 = originalPerson.healthPartnerSelfRatedL1; labHrsWorkWeek = originalPerson.labHrsWorkWeek; + labHrsWorkWeekL1 = originalPerson.labHrsWorkWeekL1; labHrsWorkEnumWeek = originalPerson.getLabourSupplyWeekly(); double[] sampleDifferentials = setMarriageTargets(); demAgeDiffDesired = Objects.requireNonNullElseGet(originalPerson.demAgeDiffDesired, () -> sampleDifferentials[0]); @@ -516,15 +518,24 @@ else if (demAge > Parameters.MAX_AGE_TO_STAY_IN_CONTINUOUS_EDUCATION) yBenReceivedFlag = originalPerson.yBenReceivedFlag; yBenReceivedFlagL1 = originalPerson.yBenReceivedFlagL1; - if (originalPerson.labWageHrly > Parameters.MIN_HOURLY_WAGE_RATE) { + + if (originalPerson.labWageHrly >= Parameters.MIN_HOURLY_WAGE_RATE) { labWageHrly = Math.min(Parameters.MAX_HOURLY_WAGE_RATE, Math.max(Parameters.MIN_HOURLY_WAGE_RATE, originalPerson.labWageHrly)); } else { - labWageHrly = -9.0; + labWageHrly = 0.0; + labHrsWorkWeek = 0; + labC4 = Les_c4.NotEmployed; } - if (originalPerson.labWageHrlyL1!=null && originalPerson.labWageHrlyL1>Parameters.MIN_HOURLY_WAGE_RATE) { + if (originalPerson.labWageHrlyL1!=null && originalPerson.labWageHrlyL1 >= Parameters.MIN_HOURLY_WAGE_RATE) { labWageHrlyL1 = Math.min(Parameters.MAX_HOURLY_WAGE_RATE, Math.max(Parameters.MIN_HOURLY_WAGE_RATE, originalPerson.labWageHrlyL1)); } else { - labWageHrlyL1 = labWageHrly; + if (originalPerson.labWageHrlyL1 == null && originalPerson.labWageHrly != null) + labWageHrlyL1 = labWageHrly; + if (labWageHrlyL1 == null || labWageHrlyL1 < Parameters.MIN_HOURLY_WAGE_RATE) { + labWageHrlyL1 = 0.0; + labHrsWorkWeekL1 = 0; + labC4L1 = Les_c4.NotEmployed; + } } } diff --git a/tools/generate_simpaths_eu_variable_codebook.py 
b/tools/generate_simpaths_eu_variable_codebook.py deleted file mode 100644 index 16f1825..0000000 --- a/tools/generate_simpaths_eu_variable_codebook.py +++ /dev/null @@ -1,1496 +0,0 @@ -#!/usr/bin/env python3 - -from __future__ import annotations - -import copy -import re -from dataclasses import dataclass -from datetime import date -from pathlib import Path -from typing import Iterable - -from openpyxl import Workbook, load_workbook -from openpyxl.styles import Alignment, Font, PatternFill - - -ROOT = Path(__file__).resolve().parents[1] -UK_CODEBOOK = ROOT / "documentation" / "SimPaths_Variable_CodebookUK.xlsx" -OUTPUT_CODEBOOK = ROOT / "documentation" / "SimPathsEU_variable_Codebook.xlsx" -TEMP_OUTPUT_CODEBOOK = ROOT / "documentation" / "SimPathsEU_variable_Codebook.__tmp__.xlsx" - - -CORE_JAVA_FILES = [ - ROOT / "src/main/java/simpaths/model/Person.java", - ROOT / "src/main/java/simpaths/model/BenefitUnit.java", - ROOT / "src/main/java/simpaths/model/Household.java", - ROOT / "src/main/java/simpaths/data/statistics/Statistics.java", - ROOT / "src/main/java/simpaths/data/statistics/Statistics2.java", - ROOT / "src/main/java/simpaths/data/statistics/AlignmentAdjustmentFactors.java", -] - -METADATA_JAVA_FILES = [ - ROOT / "src/main/java/simpaths/data/startingpop/Processed.java", -] - -CONTROL_ENUM_FILES = { - ROOT / "src/main/java/simpaths/model/enums/TimeSeriesVariable.java": "TimeSeriesVariable", - ROOT / "src/main/java/simpaths/model/enums/AlignmentVariable.java": "AlignmentVariable", -} - -DEFERRED_ENUMS = { - ROOT / "src/main/java/simpaths/model/Person.java": ["IntegerVariables", "DoublesVariables"], - ROOT / "src/main/java/simpaths/model/Validator.java": ["DoublesVariables"], - ROOT / "src/main/java/simpaths/experiment/SimPathsObserver.java": ["LongVariables"], -} - - -COMMENT_RE = re.compile(r"/\*.*?\*/|//.*?$", re.S | re.M) -FIELD_RE = re.compile( - r"^(?P\s*(?:@[A-Za-z_][\w.()=\" ,]+\s*)*\b(?:public|protected|private)\s+(?:static\s+)?(?:final\s+)?)" - 
r"(?P[^;=(){}]+?)\s+" - r"(?P[A-Za-z_][A-Za-z0-9_]*)\s*(?:=[^;]*)?;\s*$", - re.M, -) -COLUMN_RE = re.compile(r'@Column\s*\(\s*name\s*=\s*"([^"]+)"') -ENUM_DECL_RE = re.compile(r"\benum\s+([A-Za-z_][A-Za-z0-9_]*)\s*\{", re.M) - - -HEADER_STYLE_FILL = PatternFill(fill_type="solid", fgColor="D9E2F3") -MATCHED_FILL = PatternFill(fill_type="solid", fgColor="E2F0D9") -ADAPTED_FILL = PatternFill(fill_type="solid", fgColor="DDEBF7") -NEW_FILL = PatternFill(fill_type="solid", fgColor="FFF2CC") -UK_ONLY_FILL = PatternFill(fill_type="solid", fgColor="F4CCCC") -DEFERRED_FILL = PatternFill(fill_type="solid", fgColor="EDEDED") -REVIEW_FILL = PatternFill(fill_type="solid", fgColor="FCE4D6") - -MODULE_ORDER = { - "dem": 1, - "edu": 2, - "health": 3, - "care": 4, - "lab": 5, - "y": 6, - "x": 7, - "stat": 8, - "covid": 9, - "wealth": 10, - "id": 11, - "wgt": 12, -} - - -EXCLUDED_REASONS = { - "log": "Logger infrastructure", - "model": "Simulation manager reference", - "collector": "Collector/observer reference", - "key": "Persistence key", - "benefitUnit": "Entity association", - "household": "Entity association", - "members": "Entity association", - "benefitUnits": "Entity association", - "processed": "Metadata association", - "personIdCounter": "Static ID counter", - "benefitUnitIdCounter": "Static ID counter", - "householdIdCounter": "Static ID counter", - "ioFlag": "Regression helper flag", - "states": "Decision-state helper object", - "taxDbMatch": "Tax matching helper object", - "sIndexYearMap": "Derived series cache", - "personContinuousHoursLabourSupplyMap": "Derived hours cache", - "countMale": "Temporary diagnostic accumulator", - "countFemale": "Temporary diagnostic accumulator", - "reportedMissing": "Missing-value reporting helper", - "lastYear": "Cache/incremental helper", - "cachedMaleAtRiskOfWork": "Cache field", - "cachedFemaleAtRiskOfWork": "Cache field", - "labourChoiceCacheYear": "Cache field", - "labourChoiceCacheKey": "Cache field", - 
"cachedPossibleLabourCombinations": "Cache field", - "cachedEvalByLabourPairs": "Cache field", - "cachedUtilityScoreByLabourPairs": "Cache field", - "labourScoreCacheYear": "Cache field", - "labourScoreCacheKey": "Cache field", - "households": "Metadata collection", - "benefitUnits": "Metadata collection", - "persons": "Metadata collection", -} - - -@dataclass -class Candidate: - source_name: str - source_kind: str - java_class: str - java_path: Path - order: int - sibling_name: str | None = None - - -@dataclass -class ManualSpec: - status: str - source_kind: str - variable_name: str | None = None - reference_old: str | None = None - reference_new: str | None = None - match_basis: str = "Manual review" - review_required: str = "No" - review_note: str = "" - overrides: dict[str, object] | None = None - - -MANUAL_SPECS = { - "idMotherImmutable": ManualSpec( - status="New vs UK", - source_kind="Java field", - variable_name="idMotherImmutable", - overrides={ - "Local": None, - "Mod": "id", - "Main attribute": "Mother", - "Level": None, - "Measurement": None, - "Other info": "Immutable", - "Statistics type": None, - "Periodicity": None, - "Description": "Immutable mother ID used to preserve the original maternal link in simulation.", - "Initial population": "No", - "Module": "ID", - "Notes": "No direct UK equivalent found in the reviewed workbook.", - }, - ), - "immutable_mother_id": ManualSpec( - status="New vs UK", - source_kind="DB column", - variable_name="idMotherImmutable", - overrides={ - "Local": None, - "Mod": "id", - "Main attribute": "Mother", - "Other info": "Immutable", - "Statistics type": None, - "Description": "Database column backing the immutable mother ID.", - "Initial population": "No", - "Module": "ID", - "Notes": "Database column paired with idMotherImmutable.", - }, - ), - "idFatherImmutable": ManualSpec( - status="New vs UK", - source_kind="Java field", - variable_name="idFatherImmutable", - overrides={ - "Local": None, - "Mod": "id", - "Main 
attribute": "Father", - "Other info": "Immutable", - "Statistics type": None, - "Description": "Immutable father ID used to preserve the original paternal link in simulation.", - "Initial population": "No", - "Module": "ID", - "Notes": "No direct UK equivalent found in the reviewed workbook.", - }, - ), - "immutable_father_id": ManualSpec( - status="New vs UK", - source_kind="DB column", - variable_name="idFatherImmutable", - overrides={ - "Local": None, - "Mod": "id", - "Main attribute": "Father", - "Other info": "Immutable", - "Statistics type": None, - "Description": "Database column backing the immutable father ID.", - "Initial population": "No", - "Module": "ID", - "Notes": "Database column paired with idFatherImmutable.", - }, - ), - "staywparentsflag": ManualSpec( - status="New vs UK", - source_kind="Java field", - variable_name="demStayParentFlag", - review_required="Yes", - review_note="Declared in Person.java but not referenced elsewhere in the current EU codebase; confirm whether to retain or remove in the refactor.", - overrides={ - "Local": None, - "Mod": "dem", - "Main attribute": "StayParent", - "Statistics type": "Flag", - "Description": "Flag indicating that the person stays with parents.", - "Initial population": "No", - "Module": "Demography", - "Notes": "New EU-only variable; no UK counterpart located in the reviewed workbook.", - }, - ), - "deh_c4": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="eduHighestC4", - reference_old="deh_c3", - review_required="Yes", - review_note="EU uses a 4-state education enum including NotAssigned; confirm whether C4 should remain explicit in the refactor name.", - overrides={ - "Other info": "C4", - "Description": "Education - Highest Status\n 0 Not assigned\n 1 Low\n 2 Medium\n 3 High", - "Notes": "Adapted from UK eduHighestC3 because SimPathsEU uses Education c4 / NotAssigned.", - }, - ), - "deh_c4_lag1": ManualSpec( - status="Adapted from UK", - source_kind="Java field", 
- variable_name="eduHighestC4L1", - reference_old="deh_c3_lag1", - review_required="Yes", - review_note="Lag naming is clear, but the underlying education coding differs from the UK C3 variant.", - overrides={ - "Other info": "C4", - "Description": "Lag(1) of deh_c4", - "Notes": "Adapted from UK eduHighestC3L1 because SimPathsEU uses Education c4 / NotAssigned.", - }, - ), - "dehm_c4": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="eduHighestMotherC4", - reference_old="dehm_c3", - review_required="Yes", - review_note="EU uses a 4-state education enum including NotAssigned; confirm whether C4 should remain explicit in the refactor name.", - overrides={ - "Other info": "C4", - "Description": "Education - Mother's Highest Status\n 0 Not assigned\n 1 Low\n 2 Medium\n 3 High", - "Notes": "Adapted from UK eduHighestMotherC3 because SimPathsEU uses Education c4 / NotAssigned.", - }, - ), - "dehf_c4": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="eduHighestFatherC4", - reference_old="dehf_c3", - review_required="Yes", - review_note="EU uses a 4-state education enum including NotAssigned; confirm whether C4 should remain explicit in the refactor name.", - overrides={ - "Other info": "C4", - "Description": "Education - Father's Highest Status\n 0 Not assigned\n 1 Low\n 2 Medium\n 3 High", - "Notes": "Adapted from UK eduHighestFatherC3 because SimPathsEU uses Education c4 / NotAssigned.", - }, - ), - "dehsp_c4_lag1": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="eduHighestPartnerC4L1", - reference_old="dehsp_c3_lag1", - review_required="Yes", - review_note="EU uses a 4-state education enum including NotAssigned; confirm whether C4 should remain explicit in the refactor name.", - overrides={ - "Other info": "C4", - "Description": "Lag(1) of dehsp_c4", - "Notes": "Adapted from UK eduHighestPartnerC3L1 because SimPathsEU uses Education c4 / NotAssigned.", - }, - ), 
- "deh_c4Local": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="i_eduHighestC4", - reference_new="i_eduHighestC3", - review_required="Yes", - review_note="Local regression helper name follows the UK i_ prefix, but the EU education coding differs from UK C3.", - overrides={ - "Local": "i_", - "Other info": "C4", - "Description": "Education - Highest Status local (c4 variant).", - "Notes": "Adapted from UK i_eduHighestC3 because SimPathsEU uses Education c4 / NotAssigned.", - }, - ), - "ded_lag1": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="eduSpellFlagL1", - reference_old="ded", - overrides={ - "Periodicity": "L1", - "Description": "Lag(1) of ded / in continuous education.", - "Notes": "Derived from the UK eduSpellFlag naming pattern.", - }, - ), - "toRetire": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="demRtrdEnterFlag", - reference_new="demRtrdEnterFlag", - overrides={ - "Description": "Flag indicating that the individual is in the pool to retire this year.\n 0 No\n 1 Yes", - "Java class": "Person.java", - "Initial population": "No", - "Notes": "Mapped to the UK retirement-entry flag naming pattern.", - }, - ), - "dhmGhq_lag1": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="healthPsyDstrssFlagL1", - reference_new="healthPsyDstrssFlag", - overrides={ - "Periodicity": "L1", - "Description": "Lag(1) of dhm_ghq / psychological distress flag.", - "Notes": "Derived from the UK healthPsyDstrssFlag naming pattern.", - }, - ), - "ydispPersInitial": ManualSpec( - status="New vs UK", - source_kind="Java field", - variable_name="yDispPersMonthInitial", - overrides={ - "Local": None, - "Mod": "y", - "Main attribute": None, - "Level": "Pers", - "Measurement": "Disp", - "Other info": "Initial", - "Statistics type": None, - "Periodicity": "Month", - "Description": "Personal monthly disposable income carried from the initial 
population.", - "Initial population": "Yes", - "Module": "Income", - "Notes": "No direct UK equivalent found in the reviewed workbook.", - }, - ), - "ydisp_pers_initial": ManualSpec( - status="New vs UK", - source_kind="DB column", - variable_name="yDispPersMonthInitial", - overrides={ - "Local": None, - "Mod": "y", - "Level": "Pers", - "Measurement": "Disp", - "Other info": "Initial", - "Periodicity": "Month", - "Description": "Database column for personal monthly disposable income from the initial population.", - "Initial population": "Yes", - "Module": "Income", - "Notes": "Database column paired with ydispPersInitial.", - }, - ), - "numberChildren02_lag1": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="demNChild0to2L1", - reference_old="indicatorChildren03_lag1", - review_required="Yes", - review_note="UK uses an indicator-based lag for 0-2/3-under children here; EU stores a count, so the naming is adapted rather than copied.", - overrides={ - "Local": None, - "Mod": "dem", - "Main attribute": "NChild", - "Other info": "0to2", - "Periodicity": "L1", - "Description": "Lag(1) of the number of children aged 0-2 in the benefit unit.", - "Initial population": "No", - "Module": "Demography", - "Notes": "Adapted from the UK child-count/child-indicator naming pattern.", - }, - ), - "numberChildrenAll_lag1": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="demNChildL1", - reference_new="demNChild", - review_required="Yes", - review_note="No direct UK lag row exists for the benefit-unit child count; proposed from the UK demNChild base name.", - overrides={ - "Local": None, - "Mod": "dem", - "Main attribute": "NChild", - "Periodicity": "L1", - "Description": "Lag(1) of the number of dependent children in the benefit unit.", - "Initial population": "No", - "Module": "Demography", - "Notes": "Derived from the UK demNChild naming pattern.", - }, - ), - "dhhOwned_lag1": ManualSpec( - status="Adapted 
from UK", - source_kind="Java field", - variable_name="wealthPrptyFlagL1", - reference_new="wealthPrptyFlag", - review_required="Yes", - review_note="UK has a person-level demPrptyFlagL1 row; for the EU benefit-unit variable, a wealth-module lag name is more consistent with wealthPrptyFlag.", - overrides={ - "Local": None, - "Mod": "wealth", - "Main attribute": "Prpty", - "Statistics type": "Flag", - "Periodicity": "L1", - "Description": "Lag(1) of dhhOwned / benefit-unit home-ownership flag.", - "Initial population": "No", - "Module": "Wealth", - "Notes": "Derived from wealthPrptyFlag while keeping the lag suffix explicit.", - }, - ), - "edi_p50": ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="yHhDispEquivP50", - reference_new="yHhDispEquivP50", - overrides={ - "Java class": "Statistics.java", - "Output file name": "File: Statistics", - "Notes": "Mapped to the UK median equivalised disposable income naming pattern.", - }, - ), - "EDI_p50": ManualSpec( - status="Adapted from UK", - source_kind="DB column", - variable_name="yHhDispEquivP50", - reference_new="yHhDispEquivP50", - overrides={ - "Java class": "Statistics.java", - "Output file name": "File: Statistics", - "Notes": "Database column paired with edi_p50.", - }, - ), - "sIndex_p50": ManualSpec( - status="New vs UK", - source_kind="Java field", - variable_name="statSIndexP50", - reference_new="statSIndex", - overrides={ - "Local": None, - "Mod": "stat", - "Main attribute": "SIndex", - "Statistics type": "P50", - "Description": "Median S index.", - "Java class": "Statistics.java", - "Initial population": "No", - "Output file name": "File: Statistics", - "Module": "Statistical display", - "Notes": "UK workbook contains statSIndex/statSIndexNormal but not an S-index median row.", - }, - ), - "SIndex_p50": ManualSpec( - status="New vs UK", - source_kind="DB column", - variable_name="statSIndexP50", - reference_new="statSIndex", - overrides={ - "Local": None, - "Mod": "stat", 
- "Main attribute": "SIndex", - "Statistics type": "P50", - "Description": "Database column for the median S index.", - "Java class": "Statistics.java", - "Initial population": "No", - "Output file name": "File: Statistics", - "Module": "Statistical display", - "Notes": "Database column paired with sIndex_p50.", - }, - ), -} - -CLASS_MANUAL_SPECS = { - ("Person.java", "socialCareProvision"): ManualSpec( - status="New vs UK", - source_kind="Java field", - variable_name="careProvidedToC4", - match_basis="Manual review of EU-only care-provision categories", - review_required="Yes", - review_note="Person.socialCareProvision is a 4-category enum in SimPathsEU, not a simple provided-care flag as in the reviewed UK workbook.", - overrides={ - "Local": None, - "Mod": "care", - "Main attribute": None, - "Level": None, - "Measurement": None, - "Other info": "ProvidedTo", - "Statistics type": "C4", - "Periodicity": None, - "Description": "Social-care provision category\n 0 None\n 1 Only partner\n 2 Partner and other\n 3 Only other", - "Initial population": "No", - "Module": "Social care", - "Notes": "No exact UK analogue found; the UK workbook row for socialCareProvision is a benefit-unit flag rather than a person-level category.", - }, - ), - ("Person.java", "socialCareProvision_lag1"): ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="careProvidedToC4L1", - reference_old="socialCareProvision_lag1", - match_basis="Manual adaptation of care-provision lag naming", - review_required="Yes", - review_note="Lag naming follows the EU person-level category variable rather than the UK flag-style naming.", - overrides={ - "Other info": "ProvidedTo", - "Statistics type": "C4", - "Periodicity": "L1", - "Description": "Lag(1) of social-care provision category.", - "Notes": "Adapted from the reviewed UK lag row because SimPathsEU stores a category, not a flag.", - }, - ), - ("Person.java", "idFather"): ManualSpec( - status="Adapted from UK", - 
source_kind="Java field", - variable_name="idFather", - reference_new="idFather", - match_basis="Manual correction of UK workbook inconsistency", - overrides={ - "Main attribute": "Father", - "Description": "Father ID", - "Module": "ID", - "Notes": "Corrected against a conflicting UK workbook row where old name idFather was assigned the idMother variable name.", - }, - ), - ("Statistics.java", "ydses_p60"): ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="yHhQuintilesC5P60", - reference_old="ydses_p40", - match_basis="Manual correction of UK workbook inconsistency", - overrides={ - "Statistics type": "C5P60", - "Variable name (concatenate)": "yHhQuintilesC5P60", - "Variable name": "yHhQuintilesC5P60", - "Description": "Percentile (60) of ydses_c5", - "Java class": "Statistics.java", - "Output file name": "File: Statistics", - "Notes": "Corrected because the reviewed UK workbook labels ydses_p60 as yHhQuintilesC5P80.", - }, - ), - ("Statistics.java", "edi_p50"): ManualSpec( - status="Adapted from UK", - source_kind="Java field", - variable_name="yHhDispEquivP50Calc", - reference_new="yHhDispEquivP50", - match_basis="Manual review of duplicate EDI median fields", - review_required="Yes", - review_note="Statistics.java stores both medianEquivalisedHouseholdDisposableIncome and edi_p50; confirm whether both should survive the refactor or be consolidated.", - overrides={ - "Other info": "Calc", - "Variable name (concatenate)": "yHhDispEquivP50Calc", - "Variable name": "yHhDispEquivP50Calc", - "Description": "Percentile-calculated P50 of equivalised household disposable income.", - "Java class": "Statistics.java", - "Output file name": "File: Statistics", - "Notes": "Separated from the main yHhDispEquivP50 row because SimPathsEU stores two median-EDI fields in Statistics.java.", - }, - ), -} - - -ALIGNMENT_SPECS = { - "retirementAdjustmentFactor": ("labRtrdAdj", "lab", "Rtrd", "Adj", None, "Retirement adjustment factor."), - 
"retirement_adj_factor": ("labRtrdAdj", "lab", "Rtrd", "Adj", None, "Database column for the retirement adjustment factor."), - "disabilityAdjustmentFactor": ("healthDsblAdj", "health", "Dsbl", "Adj", None, "Disability adjustment factor."), - "disability_adj_factor": ("healthDsblAdj", "health", "Dsbl", "Adj", None, "Database column for the disability adjustment factor."), - "retirementShareSimulated": ("labRtrdSimShare", "lab", "Rtrd", None, "Sim", "Simulated retirement share."), - "retirement_share_sim": ("labRtrdSimShare", "lab", "Rtrd", None, "Sim", "Database column for the simulated retirement share."), - "retirementShareTarget": ("labRtrdTargetShare", "lab", "Rtrd", None, "Target", "Target retirement share."), - "retirement_share_tgt": ("labRtrdTargetShare", "lab", "Rtrd", None, "Target", "Database column for the target retirement share."), - "disabilityShareSimulated": ("healthDsblSimShare", "health", "Dsbl", None, "Sim", "Simulated disability share."), - "disability_share_sim": ("healthDsblSimShare", "health", "Dsbl", None, "Sim", "Database column for the simulated disability share."), - "disabilityShareTarget": ("healthDsblTargetShare", "health", "Dsbl", None, "Target", "Target disability share."), - "disability_share_tgt": ("healthDsblTargetShare", "health", "Dsbl", None, "Target", "Database column for the target disability share."), - "inSchoolAdjustmentFactor": ("eduSpellAdj", "edu", "Spell", "Adj", None, "In-school adjustment factor."), - "in_school_adj_factor": ("eduSpellAdj", "edu", "Spell", "Adj", None, "Database column for the in-school adjustment factor."), - "inSchoolShareSimulated": ("eduSpellSimShare", "edu", "Spell", None, "Sim", "Simulated in-school share."), - "in_school_share_sim": ("eduSpellSimShare", "edu", "Spell", None, "Sim", "Database column for the simulated in-school share."), - "inSchoolShareTarget": ("eduSpellTargetShare", "edu", "Spell", None, "Target", "Target in-school share."), - "in_school_share_tgt": ("eduSpellTargetShare", 
"edu", "Spell", None, "Target", "Database column for the target in-school share."), - "utilityAdjustmentFactorACMale": ("demUtilAdjAcMale", "dem", "Util", "Adj", "AcMale", "Utility adjustment factor adult-child males."), - "utility_adj_factor_ac_male": ("demUtilAdjAcMale", "dem", "Util", "Adj", "AcMale", "Database column for the adult-child male utility adjustment factor."), - "utilityAdjustmentFactorACFemale": ("demUtilAdjAcFemale", "dem", "Util", "Adj", "AcFemale", "Utility adjustment factor adult-child females."), - "utility_adj_factor_ac_female": ("demUtilAdjAcFemale", "dem", "Util", "Adj", "AcFemale", "Database column for the adult-child female utility adjustment factor."), - "utilityAdjustmentFactorMaleWithDep": ("demUtilAdjMaleWithDep", "dem", "Util", "Adj", "MaleWithDep", "Utility adjustment factor males with dependants."), - "utility_adj_factor_male_with_dep": ("demUtilAdjMaleWithDep", "dem", "Util", "Adj", "MaleWithDep", "Database column for the male-with-dependants utility adjustment factor."), - "utilityAdjustmentFactorFemaleWithDep": ("demUtilAdjFemaleWithDep", "dem", "Util", "Adj", "FemaleWithDep", "Utility adjustment factor females with dependants."), - "utility_adj_factor_female_with_dep": ("demUtilAdjFemaleWithDep", "dem", "Util", "Adj", "FemaleWithDep", "Database column for the female-with-dependants utility adjustment factor."), - "employedShareSimSingleMales": ("labEmpSimShareSingleM", "lab", "Emp", None, "SingleM", "Simulated employed share for single males."), - "employed_share_sim_smales": ("labEmpSimShareSingleM", "lab", "Emp", None, "SingleM", "Database column for the simulated employed share of single males."), - "employedShareTgtSingleMales": ("labEmpTargetShareSingleM", "lab", "Emp", None, "SingleMTarget", "Target employed share for single males."), - "employed_share_tgt_smales": ("labEmpTargetShareSingleM", "lab", "Emp", None, "SingleMTarget", "Database column for the target employed share of single males."), - 
"employedShareSimSingleFemales": ("labEmpSimShareSingleF", "lab", "Emp", None, "SingleF", "Simulated employed share for single females."), - "employed_share_sim_sfemales": ("labEmpSimShareSingleF", "lab", "Emp", None, "SingleF", "Database column for the simulated employed share of single females."), - "employedShareTgtSingleFemales": ("labEmpTargetShareSingleF", "lab", "Emp", None, "SingleFTarget", "Target employed share for single females."), - "employed_share_tgt_sfemales": ("labEmpTargetShareSingleF", "lab", "Emp", None, "SingleFTarget", "Database column for the target employed share of single females."), - "employedShareSimCouples": ("labEmpSimShareCouple", "lab", "Emp", None, "Couple", "Simulated employed share for couples."), - "employed_share_sim_couples": ("labEmpSimShareCouple", "lab", "Emp", None, "Couple", "Database column for the simulated employed share of couples."), - "employedShareTgtCouples": ("labEmpTargetShareCouple", "lab", "Emp", None, "CoupleTarget", "Target employed share for couples."), - "employed_share_tgt_couples": ("labEmpTargetShareCouple", "lab", "Emp", None, "CoupleTarget", "Database column for the target employed share of couples."), - "employedShareSimACMale": ("labEmpSimShareAcMale", "lab", "Emp", None, "AcMale", "Simulated employed share for adult-child males."), - "employed_share_sim_ac_male": ("labEmpSimShareAcMale", "lab", "Emp", None, "AcMale", "Database column for the simulated employed share of adult-child males."), - "employedShareTgtACMale": ("labEmpTargetShareAcMale", "lab", "Emp", None, "AcMaleTarget", "Target employed share for adult-child males."), - "employed_share_tgt_ac_male": ("labEmpTargetShareAcMale", "lab", "Emp", None, "AcMaleTarget", "Database column for the target employed share of adult-child males."), - "employedShareSimACFemale": ("labEmpSimShareAcFemale", "lab", "Emp", None, "AcFemale", "Simulated employed share for adult-child females."), - "employed_share_sim_ac_female": ("labEmpSimShareAcFemale", 
"lab", "Emp", None, "AcFemale", "Database column for the simulated employed share of adult-child females."), - "employedShareTgtACFemale": ("labEmpTargetShareAcFemale", "lab", "Emp", None, "AcFemaleTarget", "Target employed share for adult-child females."), - "employed_share_tgt_ac_female": ("labEmpTargetShareAcFemale", "lab", "Emp", None, "AcFemaleTarget", "Database column for the target employed share of adult-child females."), - "employedShareSimMaleWithDep": ("labEmpSimShareMaleWithDep", "lab", "Emp", None, "MaleWithDep", "Simulated employed share for males with dependants."), - "employed_share_sim_male_with_dep": ("labEmpSimShareMaleWithDep", "lab", "Emp", None, "MaleWithDep", "Database column for the simulated employed share of males with dependants."), - "employedShareTgtMaleWithDep": ("labEmpTargetShareMaleWithDep", "lab", "Emp", None, "MaleWithDepTarget", "Target employed share for males with dependants."), - "employed_share_tgt_male_with_dep": ("labEmpTargetShareMaleWithDep", "lab", "Emp", None, "MaleWithDepTarget", "Database column for the target employed share of males with dependants."), - "employedShareSimFemaleWithDep": ("labEmpSimShareFemaleWithDep", "lab", "Emp", None, "FemaleWithDep", "Simulated employed share for females with dependants."), - "employed_share_sim_female_with_dep": ("labEmpSimShareFemaleWithDep", "lab", "Emp", None, "FemaleWithDep", "Database column for the simulated employed share of females with dependants."), - "employedShareTgtFemaleWithDep": ("labEmpTargetShareFemaleWithDep", "lab", "Emp", None, "FemaleWithDepTarget", "Target employed share for females with dependants."), - "employed_share_tgt_female_with_dep": ("labEmpTargetShareFemaleWithDep", "lab", "Emp", None, "FemaleWithDepTarget", "Database column for the target employed share of females with dependants."), -} - - -CONTROL_MANUAL = { - "TimeSeriesVariable": { - "CareProvisionAdjustment": ("careAdj", "Adapted from UK", "Social care adjustment factor (time-series 
control).", "No", ""), - "CarerWageRate": ("careWageRate", "New vs UK", "Carer wage rate used for formal-care cost calculations.", "No", ""), - "FixedRetirementAge": ("labRtrdAgeFixed", "New vs UK", "Fixed retirement age schedule.", "No", ""), - "GDP": ("statGdp", "New vs UK", "GDP time-series control.", "No", ""), - "HighEducationRate": ("eduHighRate", "New vs UK", "Target rate for high education attainment.", "No", ""), - "Inflation": ("statInflation", "New vs UK", "Inflation time-series index.", "No", ""), - "LowEducationRate": ("eduLowRate", "New vs UK", "Target rate for low education attainment.", "No", ""), - "PartnershipAdjustment": ("demPartnerAdj", "Adapted from UK", "Partnership adjustment factor control.", "No", ""), - "FertilityAdjustment": ("demFertAdj", "Adapted from UK", "Fertility adjustment factor control.", "No", ""), - "DisabilityAdjustment": ("healthDsblAdj", "New vs UK", "Disability adjustment factor control.", "No", ""), - "UtilityAdjustment": ("demUtilAdj", "New vs UK", "Top-level utility adjustment factor.", "Yes", "No UK row exists for the aggregate utility-adjustment control; confirm whether the aggregate control should remain separate from subgroup controls."), - "UtilityAdjustmentSingleMales": ("demUtilAdjSingleM", "Adapted from UK", "Utility adjustment factor for single males.", "No", ""), - "UtilityAdjustmentACMales": ("demUtilAdjAcMale", "New vs UK", "Utility adjustment factor for adult-child males.", "Yes", "Adult-child subgroup naming is inferred because the UK workbook has no direct AC row."), - "UtilityAdjustmentSingleFemales": ("demUtilAdjSingleF", "Adapted from UK", "Utility adjustment factor for single females.", "No", ""), - "UtilityAdjustmentACFemales": ("demUtilAdjAcFemale", "New vs UK", "Utility adjustment factor for adult-child females.", "Yes", "Adult-child subgroup naming is inferred because the UK workbook has no direct AC row."), - "UtilityAdjustmentCouples": ("demUtilAdjCouple", "Adapted from UK", "Utility adjustment 
factor for couples.", "No", ""), - "UtilityAdjustmentMaleWithDep": ("demUtilAdjMaleWithDep", "New vs UK", "Utility adjustment factor for males with dependants.", "Yes", "With-dependants subgroup naming is inferred because the UK workbook has no direct row."), - "UtilityAdjustmentFemaleWithDep": ("demUtilAdjFemaleWithDep", "New vs UK", "Utility adjustment factor for females with dependants.", "Yes", "With-dependants subgroup naming is inferred because the UK workbook has no direct row."), - "RetirementAdjustment": ("labRtrdAdj", "New vs UK", "Retirement adjustment factor control.", "No", ""), - "InSchoolAdjustment": ("eduSpellAdj", "New vs UK", "In-school adjustment factor control.", "No", ""), - "WageGrowth": ("labWageGrowth", "New vs UK", "Wage-growth time-series index.", "No", ""), - }, - "AlignmentVariable": { - "PartnershipAlignment": ("demPartnerAlign", "New vs UK", "Partnership alignment control enum.", "No", ""), - "FertilityAlignment": ("demFertAlign", "New vs UK", "Fertility alignment control enum.", "No", ""), - "RetirementAlignment": ("labRtrdAlign", "New vs UK", "Retirement alignment control enum.", "No", ""), - "DisabilityAlignment": ("healthDsblAlign", "New vs UK", "Disability alignment control enum.", "No", ""), - }, -} - - -def read_variables_sheet(workbook): - ws = workbook["Variables"] - rows = list(ws.iter_rows(values_only=True)) - header = list(rows[0]) - row_dicts = [] - for row_number, row in enumerate(rows[1:], start=2): - values = list(row) + [None] * (len(header) - len(row)) - row_dict = {header[index]: values[index] for index in range(len(header))} - row_dict["_row_number"] = row_number - row_dicts.append(row_dict) - return header, row_dicts - - -def build_row_maps(uk_rows): - by_old_exact = {} - by_old_lower = {} - by_new = {} - for row in uk_rows: - old_name = row.get("SimPaths old name") - new_name = row.get("Variable name") - if old_name: - old_text = str(old_name).strip() - by_old_exact.setdefault(old_text, []).append(row) - 
             by_old_lower.setdefault(old_text.lower(), []).append(row)
-        if new_name:
-            by_new.setdefault(str(new_name).strip().lower(), []).append(row)
-    return by_old_exact, by_old_lower, by_new
-
-
-def strip_comments(text: str) -> str:
-    return COMMENT_RE.sub("", text)
-
-
-def parse_fields(java_path: Path) -> list[Candidate]:
-    text = strip_comments(java_path.read_text())
-    candidates = []
-    order = 0
-    for match in FIELD_RE.finditer(text):
-        order += 1
-        name = match.group("name")
-        column_match = COLUMN_RE.search(match.group(0))
-        column_name = column_match.group(1) if column_match else None
-        candidates.append(
-            Candidate(
-                source_name=name,
-                source_kind="Java field",
-                java_class=java_path.name,
-                java_path=java_path,
-                order=order,
-                sibling_name=column_name,
-            )
-        )
-    return candidates
-
-
-def parse_enum_constants(java_path: Path, enum_name: str) -> list[str]:
-    text = strip_comments(java_path.read_text())
-    match = re.search(r"\benum\s+" + re.escape(enum_name) + r"\s*\{", text)
-    if not match:
-        return []
-    start = match.end()
-    depth = 1
-    cursor = start
-    while cursor < len(text) and depth > 0:
-        if text[cursor] == "{":
-            depth += 1
-        elif text[cursor] == "}":
-            depth -= 1
-        cursor += 1
-    body = text[start : cursor - 1]
-    constants_section = body.split(";", 1)[0]
-    constants = []
-    for raw_part in constants_section.split(","):
-        token = raw_part.strip()
-        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", token):
-            constants.append(token)
-    return constants
-
-
-def candidate_is_excluded(candidate: Candidate) -> str | None:
-    if candidate.source_name in EXCLUDED_REASONS:
-        return EXCLUDED_REASONS[candidate.source_name]
-    lower_name = candidate.source_name.lower()
-    if candidate.java_class == "Processed.java":
-        return "Metadata record field"
-    if lower_name.endswith("series"):
-        return "Series wrapper helper"
-    if lower_name.endswith("cachekey") or lower_name.endswith("cacheyear"):
-        return "Cache field"
-    if lower_name.startswith("cached"):
-        return "Cache field"
-    if lower_name.endswith("map") and candidate.source_kind == "Java field":
-        return "Helper map/cache"
-    if candidate.source_name in {"benefitUnit", "household", "members"}:
-        return "Entity association"
-    return None
-
-
-def make_base_row(header: list[str]) -> dict[str, object]:
-    row = {column: None for column in header}
-    row["Variable name (concatenate)"] = None
-    row["Variable name"] = None
-    row["Description"] = None
-    row["Java class"] = None
-    row["Initial population"] = "No"
-    row["Module"] = None
-    row["Output file name"] = None
-    row["Notes"] = None
-    row["SimPaths old name"] = None
-    return row
-
-
-def clone_row(template_row: dict[str, object], header: list[str]) -> dict[str, object]:
-    row = {column: template_row.get(column) for column in header}
-    return row
-
-
-def choose_template_by_new(
-    new_name: str,
-    class_hint: str,
-    uk_by_new: dict[str, list[dict[str, object]]],
-) -> dict[str, object] | None:
-    options = uk_by_new.get(new_name.lower(), [])
-    if not options:
-        return None
-    for option in options:
-        if option.get("Java class") == class_hint:
-            return option
-    return options[0]
-
-
-def choose_template_by_old(
-    old_name: str,
-    class_hint: str,
-    uk_by_old_exact: dict[str, list[dict[str, object]]],
-    uk_by_old_lower: dict[str, list[dict[str, object]]],
-) -> tuple[dict[str, object] | None, str | None]:
-    if not old_name:
-        return None, None
-
-    best_template = None
-    best_basis = None
-    best_score = -1
-
-    search_spaces = [
-        ("UK old name exact", uk_by_old_exact.get(old_name, []), 10),
-        ("UK old name lower-case", uk_by_old_lower.get(old_name.lower(), []), 5),
-    ]
-
-    for basis, options, base_score in search_spaces:
-        for option in options:
-            score = base_score
-            if option.get("Java class") == class_hint:
-                score += 3
-            if str(option.get("SimPaths old name") or "") == old_name:
-                score += 1
-            if score > best_score:
-                best_template = option
-                best_basis = basis
-                best_score = score
-
-    return best_template, best_basis
-
-
-def find_template_for_candidate(
-    candidate: Candidate,
-    uk_by_old_exact: dict[str, list[dict[str, object]]],
-    uk_by_old_lower: dict[str, list[dict[str, object]]],
-) -> tuple[dict[str, object] | None, str | None]:
-    search_items = [
-        ("EU field", candidate.source_name),
-        ("EU DB alias", candidate.sibling_name),
-    ]
-
-    best_template = None
-    best_basis = None
-    best_score = -1
-
-    for label, raw_name in search_items:
-        if not raw_name:
-            continue
-        template, basis = choose_template_by_old(raw_name, candidate.java_class, uk_by_old_exact, uk_by_old_lower)
-        if not template:
-            continue
-        score = 0
-        if label == "EU field":
-            score += 4
-        if basis == "UK old name exact":
-            score += 2
-        if template.get("Java class") == candidate.java_class:
-            score += 3
-        if score > best_score:
-            best_template = template
-            best_basis = f"{label} via {basis}"
-            best_score = score
-
-    return best_template, best_basis
-
-
-def build_manual_row(
-    header: list[str],
-    uk_by_old_exact: dict[str, list[dict[str, object]]],
-    uk_by_old_lower: dict[str, list[dict[str, object]]],
-    uk_by_new: dict[str, list[dict[str, object]]],
-    candidate: Candidate,
-    spec: ManualSpec,
-) -> tuple[dict[str, object], int | None]:
-    if spec.reference_old:
-        template, _ = choose_template_by_old(spec.reference_old, candidate.java_class, uk_by_old_exact, uk_by_old_lower)
-    elif spec.reference_new:
-        template = choose_template_by_new(spec.reference_new, candidate.java_class, uk_by_new)
-    else:
-        template = None
-
-    if template:
-        row = clone_row(template, header)
-        template_row_number = template["_row_number"]
-    else:
-        row = make_base_row(header)
-        template_row_number = None
-
-    if spec.variable_name:
-        row["Variable name (concatenate)"] = spec.variable_name
-        row["Variable name"] = spec.variable_name
-    row["Java class"] = candidate.java_class
-    row["SimPaths old name"] = candidate.source_name
-    row["EU DB column / alias"] = candidate.sibling_name if candidate.sibling_name != candidate.source_name else None
-    if candidate.java_class in {"Statistics.java", "Statistics2.java", "AlignmentAdjustmentFactors.java"}:
-        row["Output file name"] = f"File: {candidate.java_class.replace('.java', '')}"
-
-    if spec.overrides:
-        row.update(spec.overrides)
-
-    row["Comparison status"] = spec.status
-    row["Match basis"] = spec.match_basis
-    row["UK reference"] = spec.reference_new or spec.reference_old
-    row["Needs review"] = spec.review_required
-    row["Review note"] = spec.review_note
-    return row, template_row_number
-
-
-def build_alignment_row(header: list[str], candidate: Candidate) -> dict[str, object]:
-    variable_name, mod, main_attribute, measurement, other_info, description = ALIGNMENT_SPECS[candidate.source_name]
-    row = make_base_row(header)
-    row["Local"] = None
-    row["Mod"] = mod
-    row["Main attribute"] = main_attribute
-    row["Measurement"] = measurement
-    row["Other info"] = other_info
-    row["Variable name (concatenate)"] = variable_name
-    row["Variable name"] = variable_name
-    row["Description"] = description
-    row["Java class"] = candidate.java_class
-    row["Initial population"] = "No"
-    row["Output file name"] = "File: AlignmentAdjustmentFactors"
-    row["Module"] = {
-        "dem": "Demography",
-        "edu": "Education",
-        "health": "Health",
-        "lab": "Labour",
-    }[mod]
-    row["Notes"] = "New SimPathsEU alignment output relative to the reviewed UK workbook."
-    row["SimPaths old name"] = candidate.source_name
-    row["EU DB column / alias"] = candidate.sibling_name if candidate.sibling_name != candidate.source_name else None
-    row["Comparison status"] = "New vs UK"
-    row["Match basis"] = "Manual review of SimPathsEU alignment outputs"
-    row["UK reference"] = None
-    row["Needs review"] = "Yes" if "Ac" in variable_name or "WithDep" in variable_name else "No"
-    row["Review note"] = (
-        "Subgroup naming is inferred because the reviewed UK workbook does not contain this alignment subgroup."
-        if row["Needs review"] == "Yes"
-        else ""
-    )
-    return row
-
-
-def build_exact_row(
-    header: list[str],
-    template: dict[str, object],
-    candidate: Candidate,
-    status: str,
-    match_basis: str,
-) -> dict[str, object]:
-    row = clone_row(template, header)
-    row["Java class"] = candidate.java_class
-    row["SimPaths old name"] = candidate.source_name
-    row["EU DB column / alias"] = candidate.sibling_name if candidate.sibling_name != candidate.source_name else None
-    row["Comparison status"] = status
-    row["Match basis"] = match_basis
-    row["UK reference"] = template.get("Variable name")
-    row["Needs review"] = "No"
-    row["Review note"] = ""
-    if candidate.java_class in {"Statistics.java", "Statistics2.java", "AlignmentAdjustmentFactors.java"}:
-        row["Output file name"] = f"File: {candidate.java_class.replace('.java', '')}"
-    return row
-
-
-def lookup_manual_spec(candidate: Candidate) -> ManualSpec | None:
-    return CLASS_MANUAL_SPECS.get((candidate.java_class, candidate.source_name)) or MANUAL_SPECS.get(candidate.source_name)
-
-
-def row_fill_for_status(status: str):
-    if status == "Matched UK":
-        return MATCHED_FILL
-    if status == "Adapted from UK":
-        return ADAPTED_FILL
-    if status == "New vs UK":
-        return NEW_FILL
-    return None
-
-
-def autofit_columns(ws):
-    for column_cells in ws.columns:
-        values = ["" if cell.value is None else str(cell.value) for cell in column_cells]
-        max_length = max(len(value) for value in values)
-        width = min(max(max_length + 2, 10), 40)
-        ws.column_dimensions[column_cells[0].column_letter].width = width
-
-
-def reset_sheet(ws):
-    if ws.max_row > 1:
-        ws.delete_rows(2, ws.max_row - 1)
-
-
-def write_header(ws, header: list[str]):
-    for index, column_name in enumerate(header, start=1):
-        cell = ws.cell(row=1, column=index, value=column_name)
-        cell.font = Font(bold=True)
-        cell.fill = HEADER_STYLE_FILL
-        cell.alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
-    ws.freeze_panes = "A2"
-    ws.auto_filter.ref = ws.dimensions
-
-
-def write_rows(ws, header: list[str], rows: Iterable[dict[str, object]]):
-    for row_number, row in enumerate(rows, start=2):
-        for column_number, column_name in enumerate(header, start=1):
-            ws.cell(row=row_number, column=column_number, value=row.get(column_name))
-        fill = row_fill_for_status(str(row.get("Comparison status")))
-        if fill:
-            for column_number in range(1, len(header) + 1):
-                ws.cell(row=row_number, column=column_number).fill = fill
-        if row.get("Needs review") == "Yes":
-            for column_name in ("Needs review", "Review note"):
-                column_index = header.index(column_name) + 1
-                ws.cell(row=row_number, column=column_index).fill = REVIEW_FILL
-        for column_name in ("Description", "Notes", "Review note"):
-            column_index = header.index(column_name) + 1
-            ws.cell(row=row_number, column=column_index).alignment = Alignment(wrap_text=True, vertical="top")
-    ws.auto_filter.ref = ws.dimensions
-
-
-def sort_codebook_rows(rows: list[dict[str, object]]) -> list[dict[str, object]]:
-    def key(row):
-        mod = str(row.get("Mod") or "")
-        variable_name = str(row.get("Variable name") or "")
-        java_class = str(row.get("Java class") or "")
-        old_name = str(row.get("SimPaths old name") or "")
-        return (MODULE_ORDER.get(mod, 99), mod, variable_name.lower(), java_class.lower(), old_name.lower())
-
-    return sorted(rows, key=key)
-
-
-def update_cover_sheet(workbook):
-    ws = workbook["Cover"]
-    ws["B2"] = "SimPathsEU Codebook"
-    ws["C4"] = date.today()
-    ws["C5"] = "Codex review based on SimPaths_Variable_CodebookUK"
-    ws["B7"] = "Workbook contents"
-    ws["B8"] = "Variables: one logical SimPathsEU field per row, with DB aliases attached to the same row"
-    ws["B9"] = "EU Review Summary: scope, counts, highlight legend, and review totals"
-    ws["B10"] = "Collision Review: proposed names still shared across multiple EU fields/classes"
-    ws["B11"] = "UK Review Issues: inconsistencies detected in the reviewed UK workbook"
-    ws["B12"] = "Controls: time-series and alignment enums reviewed for refactoring"
-    ws["B13"] = "UK Only: variables present in the reviewed UK workbook but not matched in SimPathsEU"
-    ws["B14"] = "Excluded Internals: infrastructure/cache/metadata fields excluded from the codebook"
-    ws["B15"] = "Deferred Inventory: regression and validation enums reviewed but left outside the main UK-style codebook pass"
-
-
-def build_summary_rows(
-    included_rows: list[dict[str, object]],
-    uk_only_rows: list[dict[str, object]],
-    excluded_rows: list[dict[str, object]],
-    deferred_rows: list[dict[str, object]],
-    control_rows: list[dict[str, object]],
-    collision_rows: list[dict[str, object]],
-    uk_issue_rows: list[dict[str, object]],
-):
-    counts = {
-        "Core variables included": len(included_rows),
-        "Matched UK": sum(1 for row in included_rows if row["Comparison status"] == "Matched UK"),
-        "Adapted from UK": sum(1 for row in included_rows if row["Comparison status"] == "Adapted from UK"),
-        "New vs UK": sum(1 for row in included_rows if row["Comparison status"] == "New vs UK"),
-        "Needs review": sum(1 for row in included_rows if row["Needs review"] == "Yes"),
-        "Remaining EU name-collision groups": len({row["Variable name"] for row in collision_rows}),
-        "UK workbook review issues logged": len(uk_issue_rows),
-        "Controls reviewed": len(control_rows),
-        "UK-only rows": len(uk_only_rows),
-        "Excluded internal/metadata fields": len(excluded_rows),
-        "Deferred regression/validation enum constants": len(deferred_rows),
-    }
-
-    lines = [
-        ("Generated on", date.today().isoformat()),
-        ("Scope", "Person, BenefitUnit, Household, Statistics, Statistics2, AlignmentAdjustmentFactors; UK workbook reviewed as naming reference."),
-        ("Coverage note", "Core variables are codebooked one logical field per row. Time-series/alignment enums are reviewed in Controls. Regression/validation enums are inventoried separately in Deferred Inventory."),
-    ]
-    for key, value in counts.items():
-        lines.append((key, value))
-    lines.extend(
-        [
-            ("Highlight legend", None),
-            ("Green rows", "Direct UK mapping reused for SimPathsEU."),
-            ("Blue rows", "UK naming pattern reused but adapted for an EU variant."),
-            ("Yellow rows", "New SimPathsEU variable relative to the reviewed UK workbook."),
-            ("Orange review cells", "Name needs confirmation because the UK workbook had no exact analogue or the EU semantics differ."),
-        ]
-    )
-    return lines
-
-
-def build_collision_rows(included_rows: list[dict[str, object]]) -> list[dict[str, object]]:
-    grouped = {}
-    for row in included_rows:
-        grouped.setdefault(row["Variable name"], []).append(row)
-
-    collisions = []
-    for variable_name, rows in grouped.items():
-        if len(rows) < 2:
-            continue
-        for row in rows:
-            collisions.append(
-                {
-                    "Variable name": variable_name,
-                    "Java class": row.get("Java class"),
-                    "SimPaths old name": row.get("SimPaths old name"),
-                    "EU DB column / alias": row.get("EU DB column / alias"),
-                    "Comparison status": row.get("Comparison status"),
-                    "Needs review": row.get("Needs review"),
-                    "Review note": row.get("Review note") or "Shared proposed name across multiple EU fields/classes; review if global uniqueness is required for the refactor.",
-                }
-            )
-    return sorted(collisions, key=lambda item: (str(item["Variable name"]), str(item["Java class"]), str(item["SimPaths old name"])))
-
-
-def build_uk_review_issues(uk_rows: list[dict[str, object]]) -> list[dict[str, object]]:
-    issues = []
-
-    for row in uk_rows:
-        old_name = str(row.get("SimPaths old name") or "")
-        variable_name = str(row.get("Variable name") or "")
-        if old_name == "idFather" and variable_name == "idMother":
-            issues.append(
-                {
-                    "Issue type": "Incorrect mapping",
-                    "Variable name": variable_name,
-                    "SimPaths old name": old_name,
-                    "Java class": row.get("Java class"),
-                    "Description": row.get("Description"),
- "Review note": "UK workbook row uses idMother as the proposed name for old name idFather despite a Father ID description.", - } - ) - if old_name == "ydses_p60" and variable_name == "yHhQuintilesC5P80": - issues.append( - { - "Issue type": "Incorrect statistic label", - "Variable name": variable_name, - "SimPaths old name": old_name, - "Java class": row.get("Java class"), - "Description": row.get("Description"), - "Review note": "UK workbook labels the 60th percentile row as C5P80.", - } - ) - - duplicate_counts = {} - for row in uk_rows: - variable_name = row.get("Variable name") - if variable_name: - duplicate_counts.setdefault(variable_name, []).append(row) - - for variable_name, rows in duplicate_counts.items(): - if len(rows) < 2: - continue - issues.append( - { - "Issue type": "Non-unique variable name", - "Variable name": variable_name, - "SimPaths old name": "; ".join(str(row.get("SimPaths old name")) for row in rows), - "Java class": "; ".join(str(row.get("Java class")) for row in rows), - "Description": rows[0].get("Description"), - "Review note": f"Reviewed UK workbook uses this proposed name for {len(rows)} different rows/classes.", - } - ) - - return sorted(issues, key=lambda item: (str(item["Issue type"]), str(item["Variable name"]))) - - -def create_or_replace_sheet(workbook, title: str): - if title in workbook.sheetnames: - index = workbook.sheetnames.index(title) - del workbook[title] - return workbook.create_sheet(title, index) - return workbook.create_sheet(title) - - -def clone_sheet_layout(source_ws, target_ws): - target_ws.sheet_format.defaultColWidth = source_ws.sheet_format.defaultColWidth - target_ws.sheet_format.defaultRowHeight = source_ws.sheet_format.defaultRowHeight - target_ws.freeze_panes = source_ws.freeze_panes - target_ws.auto_filter.ref = source_ws.auto_filter.ref - target_ws.sheet_view.zoomScale = source_ws.sheet_view.zoomScale - target_ws.sheet_properties.tabColor = source_ws.sheet_properties.tabColor - - for merged_range in 
source_ws.merged_cells.ranges: - target_ws.merge_cells(str(merged_range)) - - for key, dimension in source_ws.column_dimensions.items(): - target_dimension = target_ws.column_dimensions[key] - target_dimension.width = dimension.width - target_dimension.hidden = dimension.hidden - target_dimension.bestFit = dimension.bestFit - - for key, dimension in source_ws.row_dimensions.items(): - target_dimension = target_ws.row_dimensions[key] - target_dimension.height = dimension.height - target_dimension.hidden = dimension.hidden - target_dimension.outlineLevel = dimension.outlineLevel - - -def clone_sheet_cells(source_ws, target_ws): - for row in source_ws.iter_rows(): - for cell in row: - target = target_ws.cell(row=cell.row, column=cell.column, value=cell.value) - if cell.has_style: - target.font = copy.copy(cell.font) - target.fill = copy.copy(cell.fill) - target.border = copy.copy(cell.border) - target.alignment = copy.copy(cell.alignment) - target.number_format = cell.number_format - target.protection = copy.copy(cell.protection) - if cell.hyperlink: - target._hyperlink = copy.copy(cell.hyperlink) - if cell.comment: - target.comment = copy.copy(cell.comment) - - -def copy_reference_sheet(source_workbook, target_workbook, title: str): - source_ws = source_workbook[title] - target_ws = target_workbook.create_sheet(title) - clone_sheet_cells(source_ws, target_ws) - clone_sheet_layout(source_ws, target_ws) - return target_ws - - -def main(): - uk_workbook = load_workbook(UK_CODEBOOK) - base_header, uk_rows = read_variables_sheet(uk_workbook) - uk_by_old_exact, uk_by_old_lower, uk_by_new = build_row_maps(uk_rows) - - workbook = Workbook() - default_sheet = workbook.active - workbook.remove(default_sheet) - - copy_reference_sheet(uk_workbook, workbook, "Cover") - variables_ws = workbook.create_sheet("Variables") - for title in ("Rules", "Modules", "Abbreviations", "Coding Style", "Country Specific Vars"): - copy_reference_sheet(uk_workbook, workbook, title) - - 
extended_header = base_header + [ - "EU DB column / alias", - "Comparison status", - "Match basis", - "UK reference", - "Needs review", - "Review note", - ] - - included_rows = [] - excluded_rows = [] - used_uk_rows = set() - - all_candidates = [] - for java_file in CORE_JAVA_FILES: - all_candidates.extend(parse_fields(java_file)) - - for candidate in all_candidates: - exclusion_reason = candidate_is_excluded(candidate) - if exclusion_reason: - excluded_rows.append( - { - "Java class": candidate.java_class, - "Raw name": candidate.source_name, - "Source kind": "Java field", - "Reason": exclusion_reason, - } - ) - continue - - manual_spec = lookup_manual_spec(candidate) - if manual_spec: - row, template_row_number = build_manual_row( - extended_header, - uk_by_old_exact, - uk_by_old_lower, - uk_by_new, - candidate, - manual_spec, - ) - included_rows.append(row) - if template_row_number: - used_uk_rows.add(template_row_number) - continue - - if candidate.source_name in ALIGNMENT_SPECS: - row = build_alignment_row(extended_header, candidate) - included_rows.append(row) - continue - - template, match_basis = find_template_for_candidate(candidate, uk_by_old_exact, uk_by_old_lower) - if template: - row = build_exact_row(extended_header, template, candidate, "Matched UK", match_basis or "UK old name") - included_rows.append(row) - used_uk_rows.add(template["_row_number"]) - continue - - template = choose_template_by_new(candidate.source_name, candidate.java_class, uk_by_new) - if template: - row = build_exact_row(extended_header, template, candidate, "Matched UK", "UK variable name") - included_rows.append(row) - used_uk_rows.add(template["_row_number"]) - continue - - excluded_rows.append( - { - "Java class": candidate.java_class, - "Raw name": candidate.source_name, - "Source kind": "Java field", - "Reason": "Unmapped after review; not added to the core UK-style codebook.", - } - ) - - included_rows = sort_codebook_rows(included_rows) - collision_rows = 
build_collision_rows(included_rows) - uk_issue_rows = build_uk_review_issues(uk_rows) - - reset_sheet(variables_ws) - write_header(variables_ws, extended_header) - write_rows(variables_ws, extended_header, included_rows) - autofit_columns(variables_ws) - - update_cover_sheet(workbook) - - collision_ws = create_or_replace_sheet(workbook, "Collision Review") - collision_header = [ - "Variable name", - "Java class", - "SimPaths old name", - "EU DB column / alias", - "Comparison status", - "Needs review", - "Review note", - ] - write_header(collision_ws, collision_header) - for row_number, row in enumerate(collision_rows, start=2): - for column_number, column_name in enumerate(collision_header, start=1): - collision_ws.cell(row=row_number, column=column_number, value=row.get(column_name)) - for column_number in range(1, len(collision_header) + 1): - collision_ws.cell(row=row_number, column=column_number).fill = REVIEW_FILL - for column_name in ("Review note",): - collision_ws.cell(row=row_number, column=collision_header.index(column_name) + 1).alignment = Alignment(wrap_text=True, vertical="top") - autofit_columns(collision_ws) - - uk_issues_ws = create_or_replace_sheet(workbook, "UK Review Issues") - uk_issue_header = ["Issue type", "Variable name", "SimPaths old name", "Java class", "Description", "Review note"] - write_header(uk_issues_ws, uk_issue_header) - for row_number, row in enumerate(uk_issue_rows, start=2): - for column_number, column_name in enumerate(uk_issue_header, start=1): - uk_issues_ws.cell(row=row_number, column=column_number, value=row.get(column_name)) - for column_number in range(1, len(uk_issue_header) + 1): - uk_issues_ws.cell(row=row_number, column=column_number).fill = REVIEW_FILL - for column_name in ("Description", "Review note"): - uk_issues_ws.cell(row=row_number, column=uk_issue_header.index(column_name) + 1).alignment = Alignment(wrap_text=True, vertical="top") - autofit_columns(uk_issues_ws) - - uk_only_ws = 
create_or_replace_sheet(workbook, "UK Only") - uk_only_header = [ - "Variable name", - "SimPaths old name", - "Java class", - "Module", - "Description", - "Reason", - ] - write_header(uk_only_ws, uk_only_header) - uk_only_rows = [] - included_variable_names = {row.get("Variable name") for row in included_rows} - for row in uk_rows: - if row["_row_number"] in used_uk_rows: - continue - if (row.get("SimPaths old name"), row.get("Variable name")) in { - ("idFather", "idMother"), - ("ydses_p60", "yHhQuintilesC5P80"), - }: - continue - if row.get("Variable name") in included_variable_names: - continue - uk_only_rows.append( - { - "Variable name": row.get("Variable name"), - "SimPaths old name": row.get("SimPaths old name"), - "Java class": row.get("Java class"), - "Module": row.get("Module"), - "Description": row.get("Description"), - "Reason": "Present in reviewed UK workbook but no SimPathsEU core-variable match was found in this pass.", - "Comparison status": "UK only", - } - ) - for row_number, row in enumerate(sorted(uk_only_rows, key=lambda item: (str(item["Module"]), str(item["Variable name"]))), start=2): - for column_number, column_name in enumerate(uk_only_header, start=1): - uk_only_ws.cell(row=row_number, column=column_number, value=row.get(column_name)) - for column_number in range(1, len(uk_only_header) + 1): - uk_only_ws.cell(row=row_number, column=column_number).fill = UK_ONLY_FILL - for column_name in ("Description", "Reason"): - uk_only_ws.cell(row=row_number, column=uk_only_header.index(column_name) + 1).alignment = Alignment(wrap_text=True, vertical="top") - autofit_columns(uk_only_ws) - - controls_ws = create_or_replace_sheet(workbook, "Controls") - controls_header = [ - "Enum type", - "Raw name", - "Proposed name", - "Status", - "Needs review", - "Description", - "Notes", - ] - write_header(controls_ws, controls_header) - control_rows = [] - for java_path, enum_name in CONTROL_ENUM_FILES.items(): - for raw_name in parse_enum_constants(java_path, 
enum_name): - proposed_name, status, description, needs_review, notes = CONTROL_MANUAL[enum_name][raw_name] - control_rows.append( - { - "Enum type": enum_name, - "Raw name": raw_name, - "Proposed name": proposed_name, - "Status": status, - "Needs review": needs_review, - "Description": description, - "Notes": notes, - } - ) - for row_number, row in enumerate(control_rows, start=2): - for column_number, column_name in enumerate(controls_header, start=1): - controls_ws.cell(row=row_number, column=column_number, value=row.get(column_name)) - fill = row_fill_for_status(row["Status"]) - if fill: - for column_number in range(1, len(controls_header) + 1): - controls_ws.cell(row=row_number, column=column_number).fill = fill - if row["Needs review"] == "Yes": - controls_ws.cell(row=row_number, column=controls_header.index("Needs review") + 1).fill = REVIEW_FILL - controls_ws.cell(row=row_number, column=controls_header.index("Notes") + 1).fill = REVIEW_FILL - autofit_columns(controls_ws) - - excluded_ws = create_or_replace_sheet(workbook, "Excluded Internals") - excluded_header = ["Java class", "Raw name", "Source kind", "Reason"] - write_header(excluded_ws, excluded_header) - for row_number, row in enumerate(sorted(excluded_rows, key=lambda item: (item["Java class"], item["Raw name"])), start=2): - for column_number, column_name in enumerate(excluded_header, start=1): - excluded_ws.cell(row=row_number, column=column_number, value=row.get(column_name)) - for column_number in range(1, len(excluded_header) + 1): - excluded_ws.cell(row=row_number, column=column_number).fill = DEFERRED_FILL - autofit_columns(excluded_ws) - - deferred_ws = create_or_replace_sheet(workbook, "Deferred Inventory") - deferred_header = ["Source", "Enum", "Raw variable", "Reason", "Suggested next step"] - write_header(deferred_ws, deferred_header) - deferred_rows = [] - for java_path, enum_names in DEFERRED_ENUMS.items(): - for enum_name in enum_names: - for raw_name in parse_enum_constants(java_path, 
enum_name): - deferred_rows.append( - { - "Source": java_path.name, - "Enum": enum_name, - "Raw variable": raw_name, - "Reason": "Reviewed but left outside the core UK-style codebook because this enum is a regression term or validation series rather than a core state/output variable.", - "Suggested next step": "Handle in a second pass if you want regression/validation identifiers renamed as well.", - } - ) - for row_number, row in enumerate(deferred_rows, start=2): - for column_number, column_name in enumerate(deferred_header, start=1): - deferred_ws.cell(row=row_number, column=column_number, value=row.get(column_name)) - for column_number in range(1, len(deferred_header) + 1): - deferred_ws.cell(row=row_number, column=column_number).fill = DEFERRED_FILL - deferred_ws.cell(row=row_number, column=deferred_header.index("Reason") + 1).alignment = Alignment(wrap_text=True, vertical="top") - deferred_ws.cell(row=row_number, column=deferred_header.index("Suggested next step") + 1).alignment = Alignment(wrap_text=True, vertical="top") - autofit_columns(deferred_ws) - - summary_ws = create_or_replace_sheet(workbook, "EU Review Summary") - summary_header = ["Item", "Value"] - write_header(summary_ws, summary_header) - summary_rows = build_summary_rows(included_rows, uk_only_rows, excluded_rows, deferred_rows, control_rows, collision_rows, uk_issue_rows) - for row_number, (item, value) in enumerate(summary_rows, start=2): - summary_ws.cell(row=row_number, column=1, value=item) - summary_ws.cell(row=row_number, column=2, value=value) - if item == "Highlight legend": - summary_ws.cell(row=row_number, column=1).font = Font(bold=True) - summary_ws.cell(row=row_number, column=2).alignment = Alignment(wrap_text=True, vertical="top") - autofit_columns(summary_ws) - - if TEMP_OUTPUT_CODEBOOK.exists(): - TEMP_OUTPUT_CODEBOOK.unlink() - workbook.save(TEMP_OUTPUT_CODEBOOK) - TEMP_OUTPUT_CODEBOOK.replace(OUTPUT_CODEBOOK) - - print(f"Wrote {OUTPUT_CODEBOOK}") - print(f"Core rows: 
{len(included_rows)}") - print(f"UK-only rows: {len(uk_only_rows)}") - print(f"Excluded rows: {len(excluded_rows)}") - print(f"Deferred rows: {len(deferred_rows)}") - - -if __name__ == "__main__": - main() diff --git a/validation/01_estimate_validation/do_files/00_master_estimate_validation_PL.do b/validation/01_estimate_validation/do_files/00_master_estimate_validation_PL.do index d04c652..5de2018 100644 --- a/validation/01_estimate_validation/do_files/00_master_estimate_validation_PL.do +++ b/validation/01_estimate_validation/do_files/00_master_estimate_validation_PL.do @@ -24,6 +24,10 @@ set matsize 1000 //*C:/Users/Aleksandra/Box //*/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/validation/01_estimate_validation/PL/TIDIED/with_time_trend +* Individual main path +global dir_ind "/Users/ashleyburdett/Library/CloudStorage/Box-Box" + + * Working directory which contains sub-folders and will contain formatted output global dir_work "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/validation/01_estimate_validation/PL/with_time_trend" @@ -31,7 +35,7 @@ global dir_work "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared global dir_do "$dir_work/do_files" * Directory which contains data files generated in the do files -global dir_data "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/regression_estimates/PL/with_time_trend/data" +global dir_data "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/input_processing/regression_estimates/PL/with_time_trend/data" * Directory containing internal validation output global dir_internal_validation "$dir_work/graphs" @@ -48,92 +52,11 @@ global first_sim_year "2011" global last_sim_year "2023" -* Define threshold ages -/* -Ages used for specifying samples. 
- -ENSURE THE SAME AS THE GLOBALS USED IN THE INTIIAL POPULATIONS MASTER FILE -*/ - -* Age become an adult in various dimensions -global age_becomes_responsible 18 - -global age_becomes_semi_responsible 16 - -global age_seek_employment 16 - -global age_leave_school 16 - -global age_form_partnership 18 - -global age_have_child_min 18 - -global age_leave_parental_home 18 - -global age_own_home 18 - -* Age can/must/cannot make various transitions -global age_max_dep_child 17 - -global age_adult 18 - -global age_can_retire 50 - -global age_force_retire 75 - -global age_force_leave_spell1_edu 30 - -global age_have_child_max 49 // allow this to be led by the data - - /******************************************************************************* -* PROCESS IF CONDITIONS +* DEFINE PARAMETERS & PROCESS IF CONDITIONS *******************************************************************************/ -* Education -global e1a_if_condition "dag >= ${age_leave_school} & dag < ${age_force_leave_spell1_edu} & l.les_c4 == 2 & flag_deceased != 1" - -global e1b_if_condition "dag >= ${age_leave_school} & l.les_c4 != 4 & l.les_c4 != 2 & flag_deceased != 1" - -global e2_if_condition "dag >= ${age_leave_school} & l.les_c4 == 2 & les_c4 != 2 & flag_deceased != 1" - -* Leave the parental home -global p1_if_condition "ded == 0 & dag >= ${age_leave_parental_home} & flag_deceased != 1" - -* Partnership -global u1_if_condition "dag >= ${age_form_partnership} & ssscp != 1 & flag_deceased != 1" - -global u2_if_condition "dgn == 0 & dag >= ${age_form_partnership} & l.ssscp != 1 & dag < ${age_cannot_separate} & flag_deceased != 1 & flag_deceased != 1" - -* Fertility -global f1_if_condition "dag >= ${age_have_child_min} & dag <= ${age_have_child_max} & dgn == 0 & flag_deceased != 1" - -* Health -global h1_if_condition "dag >= ${age_becomes_semi_responsible} & flag_dhe_imp == 0 & flag_deceased != 1" - -global h2_if_condition "dag >= ${age_becomes_semi_responsible} & ded == 0 & flag_deceased != 1" - 
-* Home ownership -global ho1_if_condition "dag >= ${age_own_home} & flag_deceased != 1" - -* Retirment -global r1a_if_condition "dcpst == 2 & dag >= ${age_can_retire} & flag_deceased != 1" - -global r1b_if_condition "ssscp != 1 & dcpst == 1 & dag >= ${age_can_retire} & flag_deceased != 1" - -* WAGES -global W1fa_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & flag_deceased != 1" - -global W1ma_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & flag_deceased != 1" - -global W1fb_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 1 & flag_deceased != 1" - -global W1mb_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 1 & flag_deceased != 1" - -* CAPITAL INCOME -global i1a_if_condition "dag >= ${age_becomes_semi_responsible} & flag_deceased != 1" - -global i1b_if_condition "dag >= ${age_becomes_semi_responsible} & receives_ypncp == 1 & flag_deceased != 1" +do "$dir_ind/CeMPA shared area/_SimPaths/_SimPathsEU/input_processing/00_master_conditions.do" /******************************************************************************* diff --git a/validation/01_estimate_validation/do_files/03_estimate_validation_partnership_PL.do b/validation/01_estimate_validation/do_files/03_estimate_validation_partnership_PL.do index 0b5c784..0a54041 100644 --- a/validation/01_estimate_validation/do_files/03_estimate_validation_partnership_PL.do +++ b/validation/01_estimate_validation/do_files/03_estimate_validation_partnership_PL.do @@ -18,9 +18,9 @@ ******************************************************************************** -****************************************************** -* U1a: Partnership formation, left initial edu spell * -****************************************************** +***************************** +* U1: Partnership formation * +***************************** * Overall use 
"$dir_data/U1_sample", clear @@ -275,6 +275,7 @@ graph drop _all restore + ******************************* * U2: Partnership termination * ******************************* diff --git a/validation/01_estimate_validation/do_files/08_estimate_validation_wages_PL.do b/validation/01_estimate_validation/do_files/08_estimate_validation_wages_PL.do index 8a755fb..b50a1f5 100644 --- a/validation/01_estimate_validation/do_files/08_estimate_validation_wages_PL.do +++ b/validation/01_estimate_validation/do_files/08_estimate_validation_wages_PL.do @@ -3,18 +3,12 @@ * SECTION: Wages * OBJECT: Internal validation * AUTHORS: Ashley Burdett, Aleksandra Kolndrekaj -* LAST UPDATE: JAn 2026 +* LAST UPDATE: Feb 2026 * COUNTRY: Poland ******************************************************************************** -* NOTES: Compares predicted values to the observed values of the -* 3 education processes estimated. -* Individual heterogeneity added to the standard predicted -* values form the using a random draw like in stochasitic -* imputation. The pooled mean is obtained as in multiple -* imputation by repeating the random draw 20 times for each -* process. +* NOTES: * -* Run after "reg_income_PL.do" +* ******************************************************************************** /******************** WAGES: FEMALE, NO PREV WAGE OBSERVED *********************/ @@ -45,6 +39,38 @@ graph export /// sum wage_hour if `filter' & (!missing(wage_hour)) & in_sample_fnpw == 1 [aw=dwt] sum pred_hourly_wage if `filter' & (!missing(wage_hour)) & /// in_sample_fnpw == 1 [aw=dwt] + + +* Mean by age + +use "$dir_data/Female_NPW_sample", clear + +preserve + collapse (mean) wage_hour pred_hourly_wage if ${W1fa_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter wage_hour dag, mcolor(grey%40)) /// + (line pred_hourly_wage dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. 
Predicted Wages ") /// + xtitle("Age") ytitle("Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore + + +preserve + collapse (mean) lwage_hour lwage_hour_hat if ${W1fa_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter lwage_hour dag, mcolor(grey%40)) /// + (line lwage_hour_hat dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. Predicted Wages ") /// + xtitle("Age") ytitle("Log Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore + + /********************* WAGES: MALE, NO PREV WAGE OBSERVED *********************/ @@ -75,6 +101,36 @@ graph export /// sum wage_hour if `filter' & (!missing(wage_hour)) & in_sample_mnpw == 1 [aw=dwt] sum pred_hourly_wage if `filter' & (!missing(wage_hour)) & /// in_sample_mnpw == 1 [aw=dwt] + + +* Mean by age + +use "$dir_data/Male_NPW_sample", clear + +preserve + collapse (mean) wage_hour pred_hourly_wage if ${W1ma_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter wage_hour dag, mcolor(grey%40)) /// + (line pred_hourly_wage dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. Predicted Wages ") /// + xtitle("Age") ytitle("Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore + + +preserve + collapse (mean) lwage_hour lwage_hour_hat if ${W1ma_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter lwage_hour dag, mcolor(grey%40)) /// + (line lwage_hour_hat dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. 
Predicted Wages ") /// + xtitle("Age") ytitle("Log Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore /********************** WAGES: FEMALE, PREV WAGE OBSERVED *********************/ @@ -103,6 +159,36 @@ sum wage_hour if ${W1fb_if_condition} [aw=dwt] sum pred_hourly_wage if ${W1fb_if_condition} & (!missing(wage_hour)) [aw=dwt] +* Mean by age + +use "$dir_data/Female_PW_sample", clear + +preserve + collapse (mean) wage_hour pred_hourly_wage if ${W1fb_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter wage_hour dag, mcolor(grey%40)) /// + (line pred_hourly_wage dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. Predicted Wages") /// + xtitle("Age") ytitle("Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore + + +preserve + collapse (mean) lwage_hour lwage_hour_hat if ${W1fb_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter lwage_hour dag, mcolor(grey%40)) /// + (line lwage_hour_hat dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. Predicted Wages") /// + xtitle("Age") ytitle("Log Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore + + /*********************** WAGES:MALE, PREV WAGE OBSERVED ***********************/ use "$dir_data/Male_PW_sample", clear @@ -129,8 +215,35 @@ sum wage_hour if ${W1fb_if_condition} [aw=dwt] sum pred_hourly_wage if ${W1fb_if_condition} & (!missing(wage_hour)) [aw=dwt] -graph drop _all +* Mean by age +use "$dir_data/Male_PW_sample", clear +preserve + collapse (mean) wage_hour pred_hourly_wage if ${W1mb_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter wage_hour dag, mcolor(grey%40)) /// + (line pred_hourly_wage dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. 
Predicted Wages ") /// + xtitle("Age") ytitle("Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore + + +preserve + collapse (mean) lwage_hour lwage_hour_hat if ${W1mb_if_condition} & /// + lwage_hour != . , by(dag) + + twoway (scatter lwage_hour dag, mcolor(grey%40)) /// + (line lwage_hour_hat dag, lcolor(blue) lwidth(medthick)), /// + title("Observed vs. Predicted Wages ") /// + xtitle("Age") ytitle("Log Hourly Wage") /// + legend(label(1 "Observed Mean") label(2 "Predicted ")) /// + graphregion(color(white)) +restore +graph drop _all + diff --git a/validation/02_simulation_validation/do_files/00_master_simulation_validation.do b/validation/02_simulation_validation/do_files/00_master_simulation_validation.do deleted file mode 100644 index 6512d73..0000000 --- a/validation/02_simulation_validation/do_files/00_master_simulation_validation.do +++ /dev/null @@ -1,131 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Master file - longitudinal SILC -* AUTHORS: Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This master do file organises do files used for validating -* SimPaths model using EU-SILC data for Poland. This version -* utilizes longitudinal SILC data from the initial populations -* -* -* Copy and paste the relevant simulated output do files into -* the data subfolder. 
-*******************************************************************************/ -clear all - -set logtype smcl -set more off -set mem 200m -set type double - - -/******************************************************************************* -* DEFINE GLOBALS -*******************************************************************************/ - -global country = "PL" -global country_lower = "pl" -display in y "Country selected: ${country}" - -global silc_UDB = "UDB_c" - - -/******************************************************************************* -* DEFINE DIRECTORIES -*******************************************************************************/ - - -* Ashley - -* VM -//global path "C:\Users\aburde\Box\CeMPA shared area\ESPON - OVERLAP\_countries\IT\validation" - -* Mac -global path "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/validation/02_simulation_output_validation/PL" - - -///Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/validation/PL/validation -//"/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/ESPON - OVERLAP/_countries/${country}/validation" -//"/Users/ashleyburdett/Documents/ESPON/${country}/validation" - -//global dir_data "$path/data" //folder where output files stored -global dir_do_files "$path/longitudinal_SILC/do_files" //folder where do-files are stored -//global dir_work "$path/data" //folder where output files stored -//global dir_data "$path/data" - -global dir_simulated_data "$path/data" -global dir_work "/$path/data" -global dir_data "$path/data" - -global dir_init_pop_data "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU/initial_populations/PL/data" - -global dir_output_files "$path/longitudinal_SILC/graphs" //folder where validations graphs are stored - - -/******************************************************************************* -* DEFINE SAMPLE PARAMETERS 
-*******************************************************************************/ - -global use_assert "0" - -* Trim outliers -global trim_outliers true - -* Min age of individuals included in plots -global min_age 18 - -* Max age of individuals included in plots -global max_age 65 - -* Observations up to and including this simulated year will be kept in the sample -global max_year 2023 - -* Define age to become responsible as defined in the simulation -global age_become_responsible 16 - -* Set labour supply categories -global ls_cat "ZERO TWENTY FORTY FIFTY" -// works if the genders are symmetric -// still need to alter code in specific do files to print graphs - -global ls_cat_labour "TWENTY FORTY FIFTY" - -/******************************************************************************* -CALL WORKER DO FILES -*******************************************************************************/ - -* Prepare observed data -do "${dir_do_files}/02_prepare_EU_SILC_data.do" -do "${dir_do_files}/04_create_EU_SILC_validation_targets.do" - -* Prepare simulated data -do "${dir_do_files}/01_prepare_simulated_data.do" -do "${dir_do_files}/05_create_simulated_validation_targets.do" - -* Prepare EUROMOD data -//do "${dir_do_files}/07_create_euromod_validation_targets.do" - -* Plot figures -do "${dir_do_files}/06_01_plot_activity_status.do" -do "${dir_do_files}/06_02_plot_education_level.do" -do "${dir_do_files}/06_03_plot_gross_income.do" -do "${dir_do_files}/06_04_plot_gross_labour_income.do" -do "${dir_do_files}/06_05_plot_capital_income.do" -do "${dir_do_files}/06_07_plot_disposable_income.do" -do "${dir_do_files}/06_08_plot_equivalised_disposable_income.do" -do "${dir_do_files}/06_09_plot_hourly_wages.do" -do "${dir_do_files}/06_10_plot_hours_worked.do" -do "${dir_do_files}/06_11_plot_income_shares.do" -do "${dir_do_files}/06_12_plot_partnership_status.do" -do "${dir_do_files}/06_13_plot_health.do" -do "${dir_do_files}/06_14_plot_at_risk_of_poverty.do" -do 
"${dir_do_files}/06_15_plot_income_ratios.do" -do "${dir_do_files}/06_16_plot_number_children.do" -do "${dir_do_files}/06_17_plot_disability" - -* Calculate other statistics -do "${dir_do_files}/07_01_correlations.do" - diff --git a/validation/02_simulation_validation/do_files/00_master_simulation_validation_PL.do b/validation/02_simulation_validation/do_files/00_master_simulation_validation_PL.do new file mode 100644 index 0000000..317ef75 --- /dev/null +++ b/validation/02_simulation_validation/do_files/00_master_simulation_validation_PL.do @@ -0,0 +1,184 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Master file - longitudinal SILC +* AUTHORS: Ashley Burdett, Mariia Vartuzova +* LAST UPDATE: 05/2026 (AB) +* COUNTRY: Poland +******************************************************************************** +* NOTES: This master do file organises do files used for validating +* SimPaths model using EU-SILC data for Poland. This version +* utilizes the unique input dataset to create the targets. +* +* *** Latest addition +* Adds a new "Plot alignment targets" section that calls +* 06_18_plot_alignment_targets.do to produce simulated-vs- +* target time-series plots (with run variability shown as a +* shaded area band) based on AlignmentAdjustmentFactors1.csv. +* The corresponding output folder alignment_targets is also +* created. +* +* Copy and paste the relevant simulated output data files into +* the data subfolder. 
+*******************************************************************************/ +clear all + +set logtype smcl +set more off +set mem 200m +set type double + + +/******************************************************************************* +* 1 - STATIC SET UP +*******************************************************************************/ + +/******************************************************************************* +* DEFINE GLOBALS +*******************************************************************************/ + +global country = "PL" +global country_lower = "pl" +display in y "Country selected: ${country}" + +global silc_UDB = "UDB_c" + + +/******************************************************************************* +* DEFINE DIRECTORIES +*******************************************************************************/ + +* Individual path +global dir_ind "/Users/ashleyburdett/Library/CloudStorage/Box-Box/CeMPA shared area/_SimPaths/_SimPathsEU" + +* Main folder path +global path "${dir_ind}/validation/02_simulation_output_validation/PL/longitudinal_SILC" + +global dir_do_files "$path/do_files_refactored" //folder where do-files are stored + +global dir_work "$path/data" +global dir_data "$path/data" + +global dir_init_pop_data "$dir_ind/input_processing/initial_populations/PL/data" + + +/******************************************************************************* +* DEFINE SAMPLE PARAMETERS +*******************************************************************************/ + +global use_assert "0" + +* Trim outliers +global trim_outliers true + +* Min age of individuals included in plots +global min_age 18 + +* Max age of individuals included in plots +global max_age 65 + +* Observations up to and including this simulated year will be kept in the sample +global min_year 2011 +global max_year 2023 + +* Define age to become responsible as defined in the simulation +global age_become_responsible 16 + +* Set labour supply categories +global ls_cat 
"ZERO TWENTY FORTY FIFTY" +global ls_cat_labour "TWENTY FORTY FIFTY" + +* Number of runs +global max_n_runs 3 + + +/******************************************************************************* +* CALL WORKER DO FILES +*******************************************************************************/ + +* Prepare SILC data +do "${dir_do_files}/03_create_EU_SILC_validation_targets.do" + + +/******************************************************************************* +* 2 - DYNAMIC SET UP +*******************************************************************************/ + +* List of SimPaths setups to loop through +local alignments "output_refactored_runs3_popsize30000" + +foreach align in `alignments' { + + +/******************************************************************************* +* DEFINE DIRECTORIES +*******************************************************************************/ + + * Simulated data CSV files folder + global dir_simulated_data "$path/simulated_data/`align'/csv" + + * Graphs folder + global dir_output_files "$path/graphs/`align'" + + +/******************************************************************************* +* CREATE OUTPUT FOLDERS +*******************************************************************************/ + + capture mkdir "$path/graphs" + capture mkdir "$path/graphs/`align'" + capture mkdir "$path/graphs/`align'/alignment_targets" + capture mkdir "$path/graphs/`align'/children" + capture mkdir "$path/graphs/`align'/correlations" + capture mkdir "$path/graphs/`align'/disability" + capture mkdir "$path/graphs/`align'/economic_activity" + capture mkdir "$path/graphs/`align'/education" + capture mkdir "$path/graphs/`align'/health" + capture mkdir "$path/graphs/`align'/hours_worked" + capture mkdir "$path/graphs/`align'/income" + capture mkdir "$path/graphs/`align'/income/capital_income" + capture mkdir "$path/graphs/`align'/income/pension_income" + capture mkdir "$path/graphs/`align'/income/disposable_income" + capture mkdir 
"$path/graphs/`align'/income/equivalised_disposable_income" + capture mkdir "$path/graphs/`align'/income/gross_income" + capture mkdir "$path/graphs/`align'/income/gross_labour_income" + capture mkdir "$path/graphs/`align'/income/income_shares" + capture mkdir "$path/graphs/`align'/inequality" + capture mkdir "$path/graphs/`align'/partnership" + capture mkdir "$path/graphs/`align'/poverty" + capture mkdir "$path/graphs/`align'/wages" + +/******************************************************************************* +* RUN DO FILES +*******************************************************************************/ + + * Prepare simulated data + do "${dir_do_files}/01_prepare_simulated_data.do" + do "${dir_do_files}/02_create_simulated_validation_targets.do" + + * Prepare EUROMOD data + //do "${dir_do_files}/07_create_euromod_validation_targets.do" + + * Plot figures + do "${dir_do_files}/04_01_plot_activity_status.do" + do "${dir_do_files}/04_02_plot_education_level.do" + do "${dir_do_files}/04_03_plot_gross_income.do" + do "${dir_do_files}/04_04_plot_gross_labour_income.do" + do "${dir_do_files}/04_05_plot_capital_income.do" + do "${dir_do_files}/04_07_plot_disposable_income.do" + do "${dir_do_files}/04_08_plot_equivalised_disposable_income.do" + do "${dir_do_files}/04_09_plot_hourly_wages.do" + do "${dir_do_files}/04_10_0_plot_hours_worked.do" + do "${dir_do_files}/04_10_1_plot_hours_worked_discrete.do" + do "${dir_do_files}/04_11_plot_income_shares.do" + do "${dir_do_files}/04_12_plot_partnership_status.do" + do "${dir_do_files}/04_13_plot_health.do" + do "${dir_do_files}/04_14_plot_at_risk_of_poverty.do" + do "${dir_do_files}/04_15_plot_inequality.do" + do "${dir_do_files}/04_16_plot_number_children.do" + do "${dir_do_files}/04_17_plot_disability.do" + + do "${dir_do_files}/04_18_plot_alignment_targets.do" + +} + diff --git a/validation/02_simulation_validation/do_files/01_prepare_simulated_data.do 
b/validation/02_simulation_validation/do_files/01_prepare_simulated_data.do index 27209ed..1fd1448 100644 --- a/validation/02_simulation_validation/do_files/01_prepare_simulated_data.do +++ b/validation/02_simulation_validation/do_files/01_prepare_simulated_data.do @@ -4,153 +4,68 @@ * OBJECT: Simulation output pre-processing * AUTHORS: Patryk Bronka, Ashley Burdett * LAST UPDATE: 06/2025 (AB) -* COUNTRY: Poland +* COUNTRY: PL * NOTES: ******************************************************************************** +local align "output_refactored_runs3_popsize30000" + +global dir_simulated_data "$path/simulated_data/`align'/csv" + * Import required variables from household file -import delimited "$dir_simulated_data/Household.csv", clear +import delimited "$dir_simulated_data/Household.csv", case(preserve) clear -keep run time id_household +keep run time id_Household -rename id_household idhousehold +rename id_Household idHh keep if time <= ${max_year} save "$dir_data/household_validation", replace + * Import required variables from benefit unit file -import delimited "$dir_simulated_data/BenefitUnit.csv", clear +import delimited "$dir_simulated_data/BenefitUnit.csv", case(preserve) clear -keep run time idhousehold id_benefitunit disposableincome /// - equivaliseddisposableincomeyearl grossincomemonthly dhhtp_c4 region +keep run time idHh id_BenefitUnit yDispMonth yDispEquivYear yGrossMonth -rename id_benefitunit idbenefitunit +rename id_BenefitUnit idBu keep if time <= ${max_year} save "$dir_data/benefitunit_validation", replace -* Import required variables from person file -import delimited "$dir_simulated_data/Person.csv", clear -keep run time id_person idpartner idmother idfather dcpst idbenefitunit dag /// - les_c4 deh_c3 dhe dgn ypnbihs_dv yplgrs_dv ypncp ypnoab /// - fulltimehourlyearningspotential laboursupplyweekly hoursworkedweekly /// - dlltsd ded +* Import required variables from person file +import delimited "$dir_simulated_data/Person.csv", 
case(preserve) clear + +keep run time id_Person idPartner idBu idMother idFather demAge /// + demMaleFlag demPartnerStatus labC4 eduHighestC4 /// + healthDsblLongtermFlag healthSelfRated /// + yNonBenPersGrossMonth yEmpPersGrossMonth yCapitalPersMonth /// + yPensPersGrossMonth yMiscPersGrossMonth /// + labHrsWorkWeek labHrsWorkEnumWeek /// + labWageHrly -rename id_person idperson -rename idmother id_mother -rename idfather id_father +rename id_Person idPers keep if time <= ${max_year} save "$dir_data/person_validation", replace + * Merge data sets use "$dir_data/person_validation", clear -merge m:1 run time idbenefitunit using "$dir_data/benefitunit_validation", /// +merge m:1 run time idBu using "$dir_data/benefitunit_validation", /// nogen keep(matched) -save "$dir_data/baseline_validation", replace - -order run time id* - -* Destring vars -destring yplgrs_dv ypnbihs_dv ypncp ypnoab, replace ignore("null") - -* Label vars -capture label var run "Simulation repetition number" -capture label var time "year" -capture label var idhousehold "Household ID" -capture label var idbenefitunit "Benefit unit ID" -capture label var idfemale "Benefit unit responsible female ID" -capture label var idmale "Benefit unit responsible male ID" -capture label var idperson "Person ID" -capture label var idfather "Father ID" -capture label var idmother "Mother ID" -capture label var idpartner "Partner ID" -capture label var atriskofpoverty /// - "At risk of poverty indicator using 60% of median income" -capture label var dhhtp_c4 "Benefit unit composition" -capture label var disposableincomemonthly /// - "Disposable income, benefit unit, monthly" -capture label var equivaliseddisposableincomeyearl /// - "Equivalised disposable income, benefit unit, yearly" -capture label var occupancy "Benefit unit occupancy (responsible persons)" -capture label var region "Region" -capture label var size "Benefit unit size" -capture label var ydses_c5 /// - "Benefit unit gross 
equivalised normalised income quintile" -capture label var adultchildflag "Adult child living at home" -capture label var dag "Age" -capture label var dcpagdf "Difference in age between partners" -capture label var dcpen "Entered partnership" -capture label var dcpex "Exited partnership " -capture label var dcpst "Partnership status" -capture label var dcpyy "years in partnership" -capture label var ded "In continuous education" -capture label var deh_c3 "Education level" -capture label var dehf_c3 "Father's education level" -capture label var dehm_c3 "Mother's education level" -capture label var dehsp_c3 "Partner's education level'" -capture label var der "Returned to education" -capture label var dgn "Gender" -capture label var dhe "Self-rated health" -capture label var dhm "Pscyhological distress score" -capture label var dhm_ghq "Psychological distress case" -capture label var dhesp "Partner's self-rated health" -capture capture label var dlltsd "Long-term sick / disabled" -capture label var sedex "Left education this year" -capture capture label var women_fertility /// - "Indicator, women aged 18 to 44 who can have children" -capture label var laboursupplyweekly /// - "Discretized hours of labour supply, weekly" -capture label var hoursworkedweekly "Continuous hours of labour supply, weekly" -capture label var les_c4 "Activity status" -capture label var lessp_c4 "Partner's activity status" -capture label var lesdf_c4 /// - "Own and partner's activity status (only if partnered)" -capture label var fulltimehourlyearningspotential /// - "Potential (model-based) hourly gross wage" -capture label var sindex "Security index (5-year lead)" -capture label var sindexnormalised "Normalised security index (5-year lead)" -capture label var scaling_factor /// - "Scaling factor (one individual represents this many in population)" -capture label var ynbcpdf_dv /// - "Difference between (asinh of) own and spouse's gross personal non-benefit income" -capture gen yplgrs_dv_lvl 
= sinh(yplgrs_dv) -capture label var yplgrs_dv_lvl "Gross personal employment income, in " -capture label var yplgrs_dv "Gross personal employment income, in €, asinh" -capture destring ypnbihs_dv, force replace -capture gen ypnbihs_dv_lvl = sinh(ypnbihs_dv) -capture label var ypnbihs_dv_lvl "Gross personal non-benefit income, in €" -capture label var ypnbihs_dv "Gross personal non-benefit income, in €, asinh" -capture gen ypncp_lvl = sinh(ypncp) -capture label var ypncp_lvl "Capital income, in €" -capture label var ypncp "Capital income, in €, asinh" -capture gen ypnoab_lvl = sinh(ypnoab) -capture label var ypnoab_lvl "Pension income, in €" -capture label var ypnoab "Pension income, in €, asinh" -capture gen yptciihs_dv_lvl = sinh(yptciihs_dv) -capture label var yptciihs_dv_lvl "Gross personal non-employment, in €" -capture label var grossincomemonthly "Gross monthly benefit unit income, in €" -capture label define dhe_lbl 1 "Poor" 2 "Fair" 3 "Good" 4 "VeryGood" /// - 5 "Excellent" - -* Renaming vars -rename dhe dhe2 -encode dhe2, gen(dhe) label(dhe_lbl) -drop dhe2 -capture drop weight - -rename equivaliseddisposableincomeyearl equivalisedincome -rename time year - -* Select runs -keep if run <= 3 - -save "$dir_data/simulated_data_prep1.dta", replace +save "$dir_data/loaded_simulation_data.dta", replace + + +* Tidy up +erase "${dir_data}/household_validation.dta" +erase "${dir_data}/person_validation.dta" +erase "${dir_data}/benefitunit_validation.dta" diff --git a/validation/02_simulation_validation/do_files/02_create_simulated_validation_targets.do b/validation/02_simulation_validation/do_files/02_create_simulated_validation_targets.do new file mode 100644 index 0000000..f23c810 --- /dev/null +++ b/validation/02_simulation_validation/do_files/02_create_simulated_validation_targets.do @@ -0,0 +1,309 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Validation targets +* 
AUTHORS: Ashley Burdett +* LAST UPDATE: 11/2025 (AB) +* COUNTRY: PL +* DESCRIPTION: This file creates variables from the simulated data +* that are used to generate the comparison plots. +******************************************************************************** +* NOTES: Income amounts are converted from IHS to levels and from +* monthly to annual. +* Two datasets are saved, one containing all observations +* (..._full) and one containing only the adult population +* (18-65 inc). +*******************************************************************************/ + +* Generate/Tidy required variables + +* Load simulated panel +use "$dir_data/loaded_simulation_data.dta", clear + + +** TIME + +rename time year + + +** DEMOGRAPHICS + +* Gender +gen demMaleFlag_coded = . +replace demMaleFlag_coded = 1 if demMaleFlag == "Male" +replace demMaleFlag_coded = 0 if demMaleFlag == "Female" + +drop demMaleFlag +rename demMaleFlag_coded demMaleFlag + + +* Age +* Define age groups +gen ageGroup = . +replace ageGroup = 0 if demAge >= 0 & demAge < 15 +replace ageGroup = 1 if demAge >= 15 & demAge < 20 +replace ageGroup = 2 if demAge >= 20 & demAge < 25 +replace ageGroup = 3 if demAge >= 25 & demAge < 30 +replace ageGroup = 4 if demAge >= 30 & demAge < 35 +replace ageGroup = 5 if demAge >= 35 & demAge < 40 +replace ageGroup = 6 if demAge >= 40 & demAge < 60 +replace ageGroup = 7 if demAge >= 60 & demAge < 80 +replace ageGroup = 8 if demAge >= 80 & demAge <= 100 + +label def ageGroup /// + 0 "ageGroup_0_14" /// + 1 "ageGroup_15_19" /// + 2 "ageGroup_20_24" /// + 3 "ageGroup_25_29" /// + 4 "ageGroup_30_34" /// + 5 "ageGroup_35_39" /// + 6 "ageGroup_40_59" /// + 7 "ageGroup_60_79" /// + 8 "ageGroup_80_100" /// + +label val ageGroup ageGroup + +gen ageGroup2 = . 
+replace ageGroup2 = 0 if demAge >= 16 & demAge < 25 +replace ageGroup2 = 1 if demAge >= 25 & demAge < 30 +replace ageGroup2 = 2 if demAge >= 30 & demAge < 35 +replace ageGroup2 = 3 if demAge >= 35 & demAge < 40 +replace ageGroup2 = 4 if demAge >= 40 & demAge < 45 +replace ageGroup2 = 5 if demAge >= 45 & demAge < 50 +replace ageGroup2 = 6 if demAge >= 50 & demAge < 55 +replace ageGroup2 = 7 if demAge >= 55 & demAge < 60 +replace ageGroup2 = 8 if demAge >= 60 & demAge <= 65 + +label def ageGrouplb2 /// + 0 "ageGroup_16_24" /// + 1 "ageGroup_25_29" /// + 2 "ageGroup_30_34" /// + 3 "ageGroup_35_39" /// + 4 "ageGroup_40_44" /// + 5 "ageGroup_45_49" /// + 6 "ageGroup_50_54" /// + 7 "ageGroup_55_59" /// + 8 "ageGroup_60_65" /// + +label val ageGroup2 ageGrouplb2 + + +* Max benefit unit age +bys run year idBu: egen demAgeBuMax = max(demAge) + +* Partnership status +gen demPartnerStatus_coded = . +replace demPartnerStatus_coded = 1 if demPartnerStatus == "Partnered" +replace demPartnerStatus_coded = 2 if demPartnerStatus == "Single" + +drop demPartnerStatus +rename demPartnerStatus_coded demPartnerStatus + +gen sim_partnered = (demPartnerStatus == 1) +gen sim_single = (demPartnerStatus == 2) + +replace idPartner = "" if idPartner == "null" +destring idPartner , replace +gen sim_has_partner = (idPartner != .) + + +* Number of children +gen child = (demAge < ${age_become_responsible}) +bys run year idBu: egen sim_demNChild = total(child) + +gen child02 = (demAge < 3) +bys run year idBu: egen sim_demNChild0to2 = total(child02) + +gen child00 = (demAge < 1) +bys run year idBu: egen sim_demNChild0 = total(child00) + +gen sim_new_born = (demMaleFlag == 0 & sim_demNChild0 > 0 & /// + sim_demNChild0 != . 
& demAge >= 18) + +gen children_0 = (sim_demNChild == 0) +gen children_1 = (sim_demNChild == 1) +gen children_2 = (sim_demNChild == 2) +gen children_3plus = (sim_demNChild >= 3) + + +* Interact partnership status and number of children +foreach var1 in sim_partnered sim_single { + + foreach var2 in children_0 children_1 children_2 children_3plus { + + gen `var1'_`var2' = (`var1' & `var2') + + } +} + + +** EDUCATION +* Education dummies +gen sim_edu_na = (eduHighestC4 == "NotAssigned") +gen sim_edu_high = (eduHighestC4 == "High") +gen sim_edu_med = (eduHighestC4 == "Medium") +gen sim_edu_low = (eduHighestC4 == "Low") + + +** HEALTH +* Disabled / LT sick +gen sim_healthDsblLongtermFlag = (healthDsblLongtermFlag == "True") + +* Self-rated health +gen sim_healthSelfRated = . +replace sim_healthSelfRated = 1 if healthSelfRated == "Poor" +replace sim_healthSelfRated = 2 if healthSelfRated == "Fair" +replace sim_healthSelfRated = 3 if healthSelfRated == "Good" +replace sim_healthSelfRated = 4 if healthSelfRated == "VeryGood" +replace sim_healthSelfRated = 5 if healthSelfRated == "Excellent" + +lab def sim_healthSelfRated 1 "Poor" 2 "Fair" 3 "Good" 4 "VeryGood" /// + 5 "Excellent" + +lab val sim_healthSelfRated sim_healthSelfRated + + +** LABOUR MARKET + +* Economic activity dummies +gen sim_employed = (labC4 == "EmployedOrSelfEmployed") +gen sim_student = (labC4 == "Student") +gen sim_inactive = (labC4 == "NotEmployed") +gen sim_retired = (labC4 == "Retired") + + +* Hours worked weekly (continuous) +gen sim_labHrsWorkWeek = labHrsWorkWeek + + +* Hours worked weekly (categories) +/* +For Poland these categories are symmetric across genders. 
+*/ +* Address generic labels +replace labHrsWorkEnumWeek = "ZERO" if labHrsWorkEnumWeek == "null" +replace labHrsWorkEnumWeek = "TWENTY" if labHrsWorkEnumWeek == "CATEGORY_PL_1" +replace labHrsWorkEnumWeek = "FORTY" if labHrsWorkEnumWeek == "CATEGORY_PL_2" +replace labHrsWorkEnumWeek = "FIFTY" if labHrsWorkEnumWeek == "CATEGORY_PL_3" + +gen sim_labHrsWorkEnumWeek = labHrsWorkEnumWeek + +gen sim_labHrsWorkEnum_no = . +replace sim_labHrsWorkEnum_no = 0 if labHrsWorkEnumWeek == "ZERO" +replace sim_labHrsWorkEnum_no = 20 if labHrsWorkEnumWeek == "TWENTY" +replace sim_labHrsWorkEnum_no = 40 if labHrsWorkEnumWeek == "FORTY" +replace sim_labHrsWorkEnum_no = 50 if labHrsWorkEnumWeek == "FIFTY" + +* Categorical variable +gen sim_cat_hours = . +replace sim_cat_hours = 1 if labHrsWorkEnumWeek == "ZERO" +replace sim_cat_hours = 2 if labHrsWorkEnumWeek == "TWENTY" +replace sim_cat_hours = 3 if labHrsWorkEnumWeek == "FORTY" +replace sim_cat_hours = 4 if labHrsWorkEnumWeek == "FIFTY" + +tab labHrsWorkEnumWeek year + + +* Hourly wage +gen sim_pred_wage = labWageHrly + + +** INCOME (ANNUAL) +/* +Amounts of personal income are stored with the IHS transformation. +Benefit-unit-level measures (gross and disposable income) are stored without the +transformation. 
+*/ + +* Destring individual amounts +destring yNonBenPersGrossMonth yEmpPersGrossMonth yCapitalPersMonth /// + yPensPersGrossMonth, replace ignore("null" "NaN") + + +* Annual individual gross employment income +* Convert to levels +gen yEmpPersGrossLevelMonth = sinh(yEmpPersGrossMonth) +* Convert to annual +gen sim_yEmpPersGrossLevelYear = yEmpPersGrossLevelMonth * 12 + +* Annual benefit unit gross employment income +bys run year idBu: egen sim_yEmpBuGrossLevelYear = /// + total(sim_yEmpPersGrossLevelYear) + + +* Annual individual capital income +* Convert to levels +gen yCapitalPersLevelMonth = sinh(yCapitalPersMonth) +* Convert to annual +gen sim_yCapitalPersLevelYear = yCapitalPersLevelMonth * 12 + +* Annual benefit unit capital income +bys run year idBu: egen sim_yCapitalBuLevelYear = /// + total(sim_yCapitalPersLevelYear) + + +* Annual individual gross private pension income +* Convert to levels +gen yPensPersGrossLevelMonth = 0 +* Convert to annual +gen sim_yPensPersGrossLevelYear = yPensPersGrossLevelMonth * 12 + +* Annual benefit unit gross private pension income +bys run year idBu: egen sim_yPensBuGrossLevelYear = /// + total(sim_yPensPersGrossLevelYear) + + +* Annual individual gross non-benefit income +* Converts to levels +gen yNonBenPersGrossLevelMonth = sinh(yNonBenPersGrossMonth) +* Convert to annual +gen sim_yNonBenPersGrossLevelYear = sim_yPensPersGrossLevelYear + /// + sim_yCapitalPersLevelYear + sim_yEmpPersGrossLevelYear + +// gen sim_yNonBenPersGrossLevelYear = yNonBenPersGrossLevelMonth * 12 + +* Annual benefit unit gross non-benefit income +gen sim_yNonBenBuGrossLevelYear = sim_yPensBuGrossLevelYear + /// + sim_yCapitalBuLevelYear + sim_yEmpBuGrossLevelYear + + +* Annual benefit unit gross income (level, non-benefit) +/* +Note this should be the same as sim_yNonBenBuGrossLevelYear +*/ +gen sim_yGrossBuLevelMonth = yGrossMonth +gen sim_yGrossBuLevelYear = sim_yGrossBuLevelMonth * 12 + +* Check +gen diff = sim_yGrossBuLevelYear - 
sim_yNonBenBuGrossLevelYear + +hist diff // very close + +replace sim_yGrossBuLevelYear = sim_yNonBenBuGrossLevelYear + +gen sim_yGrossPersLevelYear = sim_yNonBenPersGrossLevelYear + + +* Annual benefit unit disposable income (level) +gen yDispBuLevelMonth = yDispMonth +gen sim_yDispBuLevelYear = yDispBuLevelMonth * 12 + + +* Annual benefit unit equivalised disposable income (BU, level) +gen sim_yDispEquivYear = yDispEquivYear + + +* Benefit unit - Net transfers +gen sim_net_transfers = sim_yDispBuLevelYear - sim_yNonBenBuGrossLevelYear + + +* Restrict sample to relevant validation years +keep if year >= ${min_sim_year} +keep if year <= ${max_sim_year} + +drop diff + +save "$dir_data/simulation_sample.dta", replace + + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/02_prepare_EU_SILC_data.do b/validation/02_simulation_validation/do_files/02_prepare_EU_SILC_data.do deleted file mode 100644 index 6b8ecc4..0000000 --- a/validation/02_simulation_validation/do_files/02_prepare_EU_SILC_data.do +++ /dev/null @@ -1,32 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: SILC data pre-processing -* AUTHORS: Daria Popova, Ashley Burdett -* LAST UPDATE: Nov 2025 (AB) -* COUNTRY: Poland - -* NOTES: This file computes the additional variables needed from SILC -* that are not in the initial populations data. -* - disposable income -* -*******************************************************************************/ - -* Annual individual real disposable income - -use "$dir_init_pop_data/02_pre_drop.dta", clear - -gen valid_y_disp_ind_yr = (hy020/hhsize - py021g) - -replace valid_y_disp_ind_yr = (hy020/hhsize) if py021g == . 
- -replace valid_y_disp_ind_yr = 0 if valid_y_disp_ind_yr < 0 -assert valid_y_disp_ind_yr >= 0 - -replace valid_y_disp_ind_yr = valid_y_disp_ind_yr / (CPI/100) - -keep valid_y_disp_ind_yr idperson swv - - - -save "$dir_data/silc_ind_dispos_y.dta", replace diff --git a/validation/02_simulation_validation/do_files/03_create_EU_SILC_benefit_units.do b/validation/02_simulation_validation/do_files/03_create_EU_SILC_benefit_units.do deleted file mode 100644 index 7651702..0000000 --- a/validation/02_simulation_validation/do_files/03_create_EU_SILC_benefit_units.do +++ /dev/null @@ -1,446 +0,0 @@ -******************************************************************************** -* PROJECT: ESPON -* SECTION: Validation -* OBJECT: Benefit unit creation -* AUTHORS: Daria Popova, Ashley Burdett -* LAST UPDATE: Feb 2025 (AB) -* COUNTRY: Poland - -* DESCRIPTION: Screens data and identifies benefit units - -* NOTES: -******************************************************************************** - -cd "${dir_data}" - -use "${country}-eusilc_validation_sample_prep1.dta", clear - -fre swv - - -/********************** Rules and assumptions ********************************** -1. Each HH can contain: Responsible Male, and/or Responsible Female, Children, -Other members. In the simulation everyone starts as "Other member" and is -assigned one of the roles in the HH. - - 1.1. Responsible male and female create a partnership couple leading the HH. - Any additional couple creates new HH. A couple with / composed of people - under the age to leave home (18) will still leave together and set up a new - HH. - - 1.1.1. Children should follow the mother if she's moving to a new HH. - - 1.2. After the above there should be only singles left in addition to the - leading couple. If they are above 18, they will leave and set up their own - HH. - - 1.3. After the above there should only be children left in addition to the - original HH. 
Children will live with mother if defined in the data, - otherwise with father. If neither exists, they will be considered as - orphans. - - 1.4. Orphans are assigned a woman or a man from the household in which they - live as a parent. -*/****************************************************************************** - -* Adult is defined as 18 or over, or if married or has their own kids -* (DP: last condition added to avoid splitting kids from their teenager parents) -gen child = dag < $age_become_responsible & dcpst!=1 - -* Count number of dep children of each person -* For mother -count -preserve -sort swv idhh idperson -save "$dir_data/motherinfo.dta", replace - -keep swv idhh idmother child -rename idmother idperson -bysort swv idperson: egen int n_child_mother = total(child) - //number of dependent children who have this idmother - -duplicates drop swv idperson, force -drop child -save "$dir_data/motherinfo.dta", replace -restore - -sort swv idhh idperson -merge m:1 swv idhh idperson using "$dir_data/motherinfo.dta" -fre _merge -drop if _merge == 2 -drop _merge -count -recode n_child_mother (. = 0) - -* For father -count -preserve -sort swv idhh idperson -save "$dir_data/fatherinfo.dta", replace - -keep swv idhh idfather child -rename idfather idperson -bysort swv idperson: egen int n_child_father = total(child) - //number of dependent children who have this idfather - -duplicates drop swv idperson, force -drop child -save "$dir_data/fatherinfo.dta", replace -restore - -sort swv idhh idperson -merge m:1 swv idhh idperson using "$dir_data/fatherinfo.dta" -fre _merge -drop if _merge == 2 -drop _merge -count -recode n_child_father (. 
= 0) - -gen n_child = n_child_mother+n_child_father - //n of kids this individual has ==> no double count because father's kids - //will be in their line while mothers kids will be in their line - -sum n_child_mother if n_child_mother > 0 -sum n_child_father if n_child_father > 0 -sum n_child if n_child > 0 - -order swv idhh idperson idpartner idmother idfather dag n_child, last - -count if child == 1 & n_child > 0 //42 obs who are kids but have their own kids -replace child = 0 if n_child > 0 //rule out - -gen adult = 1 - child -sum child adult //81% adults - -gen partnered = (idpartner > 0) - -* Check if there are hhlds without adult -assert child == 1 - adult -cap drop num_adults -bys swv idhh: egen num_adults = total(adult) -fre num_adults //8 obs have no adults - -fre swv idhh if num_adults == 0 - tab dag if num_adults == 0 //==> these are 16/17 yo kids living alone - -* Define benefit units -cap gen long idbenefitunit = . -cap gen long idbupartner = . -format idbenefitunit %19.0g -format idbupartner %19.0g - -** Create benefit units -* Assign first couples -order swv idhh idbenefitunit idbupartner idperson idpartner idmother /// - idfather dag n_child, last - -gsort swv idhh -partnered -dag idperson / -/* -sort hh members in descending order by partnership status and then age -(this ensures that partnered adults go first) -*/ - -bys swv idhh: replace idbenefitunit = idperson[1] - //oldest person becomes head of first benefit unit - -bys swv idhh: replace idbupartner = idpartner[1] - //partner of oldest person becomes first benefit unit partner - -replace idbupartner = . if idbupartner == -9 - -* Remove those who do not belong to first benefit unit -replace idbupartner = . if (adult == 1 & idperson != idbenefitunit & /// - idpartner != idbenefitunit) - //remove partner id for those who are not head or partner - -replace idbenefitunit = . 
if (adult == 1 & idperson != idbenefitunit & /// - idpartner != idbenefitunit) - //remove other adults who are not head or partner - -replace idbupartner = . if (child == 1 & idfather != idbenefitunit & /// - idmother != idbenefitunit & ((idfather != idbupartner & /// - idmother != idbupartner) | idbupartner < 0)) - //remove partner id for kids that are not head's or partner's - -replace idbenefitunit = . if (child == 1 & idfather != idbenefitunit & /// - idmother != idbenefitunit & ((idfather != idbupartner & /// - idmother != idbupartner) & idbupartner > 0)) - //remove kids that are not head's or partner's - -* Create new ben unit for single adults in hh -replace idbenefitunit = idperson if (missing(idbenefitunit) & adult == 1 & /// - (missing(idpartner) | idpartner < 0)) - -// Loop over number of extra couples residing in the hh that constitute an -// additional benefit unit -forvalues i = 1/3 { - - //count if adult == 1 & idbenefitunit == . & partnered == 1 - //8 partnered adults still have no benefit unit - //list idperson if adult == 1 & idbenefitunit == . & partnered == 1 - - gsort swv idhh -partnered idbenefitunit -dag idperson - //sort so those with a partner who have been assigned a ben unit come - //before those who haven't - - - bys swv idhh: replace idbenefitunit = idperson if /// - (missing(idbenefitunit) & adult == 1 & !missing(idbenefitunit[_n-1]) & /// - idpartner != idbenefitunit[_n-1]) & idbupartner[_n-1] != . - //added the final condition to ensure only one additional benefit unit - //is created - - //ensure the partner is assigned to the same benefit unit if don't align - //by age - replace idbupartner = idpartner if (missing(idbupartner) & /// - idbenefitunit == idperson & !missing(idpartner) & idpartner > 0) - - replace idbenefitunit = idbenefitunit[_n-1] if idhh == idhh[_n-1] & /// - swv == swv[_n-1] & idbenefitunit == . // fill down idbu for household - - replace idbenefitunit = . 
if child == 1 & partnered == 0 & /// - idbenefitunit != idmother & idbenefitunit != idfather - //keep ben unit info for children of the first benefit unit - - replace idbenefitunit = . if idbenefitunit != idpartner & /// - idbupartner == . & partnered == 1 - //remove if this individual is in a partnership and the id doesn't - //correspond to their partner - - replace idbupartner = idperson if idbenefitunit != . & /// - idbupartner == . & partnered == 1 - //the person who caught the last condition is the benefit units partner - -} - -* Done? Any unassigned couples? -count if adult == 1 & idbenefitunit == . & partnered == 1 - // 0 partnered adults still have no benefit unit -recode idbupartner (. = -9) - -* Check if all adults are assigned to ben units -count if adult == 1 & idbenefitunit == . // 0 -assert idbenefitunit != . if adult == 1 - -** Assign remaining children to benefit unit -* Assign children to their mothers' benunits (where they are heads or partners) -forvalues i = 1/20 { - replace idbenefitunit = idbenefitunit[_n-`i'] if idmother > 0 & /// - missing(idbenefitunit) & child == 1 & /// - (idmother == idbenefitunit[_n-`i'] | idmother == idbupartner[_n-`i']) -} - -* If some kids are still not assigned - assign them to father's ben units -forvalues i = 1/20 { - replace idbenefitunit = idbenefitunit[_n-`i'] if idfather > 0 & /// - missing(idbenefitunit) & child == 1 & /// - (idfather == idbenefitunit[_n-`i'] | idfather == idbupartner[_n-`i']) -} - -* Check if all kids are assigned -count if child == 1 & idbenefitunit == . //864 kids are still not assigned - -* Address orphans -*bys swv idhh: replace idbenefitunit = idbenefitunit[1] if /// -* missing(idbenefitunit) & orphan ==1 - //for now assign orphans to the first benunit -cap drop orphan -gen orphan = (idfather < 0 & idmother < 0 & child == 1) -fre orphan if idbenefitunit == . 
//864 => all remaining are orphans - -cap drop n_orphan -bys stm idhh: egen n_orphan = sum(orphan) - //count N of orphans per household ==> can be up to 10! -fre n_orphan -/* -n_orphan - - | Freq. Percent Valid Cum. ---------------+-------------------------------------------- -Valid 0 | 508512 99.51 99.51 99.51 - 1 | 1906 0.37 0.37 99.88 - 2 | 352 0.07 0.07 99.95 - 3 | 107 0.02 0.02 99.97 - 4 | 48 0.01 0.01 99.98 - 5 | 56 0.01 0.01 99.99 - 6 | 9 0.00 0.00 100.00 - 7 | 10 0.00 0.00 100.00 - 10 | 12 0.00 0.00 100.00 - Total | 511012 100.00 100.00 ------------------------------------------------------------ - -*/ -order stm idhh idperson idpartner idfather idmother dag dgn adult orphan /// - n_orphan, last - -* Create variables storing ages for all orphans in hh -preserve -keep if n_orphan > 0 -keep stm idhh idperson idpartner idfather idmother dag dgn adult orphan n_orphan -keep if orphan == 1 -bys stm idhh: gen orphan_number = _n if orphan == 1 - -// Loop over each possible orphan and create corresponding age variables -forvalues i = 1/10 { - bys stm idhh: egen temp_dag_orphan`i' = sum(dag) if orphan_number == `i' - bys stm idhh: egen dag_orphan`i' = sum(temp_dag_orphan`i') - drop temp_dag_orphan`i' -} - -save "$dir_data/orphans.dta", replace -restore - -count - -* Add info on orphan's age to the main dataset -merge 1:1 stm idhh idperson using "$dir_data/orphans.dta", /// - keepusing(dag_orphan* orphan_number) -keep if _merge==1 | _merge ==3 -drop _merge -count - -sum n_orphan -local max_orphan = r(max) - -// Loop over the number of orphans in a hh -forvalues i = 1/`max_orphan' { - - * Create age difference between them and each adult in the hh - gen temp_target_age`i' = dag_orphan`i'+20 if dag_orphan`i' > 0 - bys stm idhh: egen target_age`i' = mean(temp_target_age`i') - gen agediff`i' = abs(dag -target_age`i') if adult == 1 - - * Select new parent for each orphan who's age is closest to target age - sort stm idhh agediff`i' idperson - by stm idhh: gen newparent`i' 
= _n - by stm idhh: replace newparent`i' = 0 if _n > 1 - replace newparent`i' = . if n_orphan == 0 - - drop dag_orphan`i' temp_target_age`i' target_age`i' agediff`i' - - * Assign this parent's idperson as orphan's idmother or idfather - cap drop temp_idmother_orphan`i' - gen double temp_idmother_orphan`i' = idperson if newparent`i' == 1 & /// - dgn == 0 - bys stm idhh: egen idmother_orphan`i' = max(temp_idmother_orphan`i') - format idmother_orphan`i' %19.0g - replace idmother_orphan`i' = 0 if orphan == 0 - drop temp_idmother_orphan`i' - - cap drop temp_idfather_orphan`i' - gen double temp_idfather_orphan`i' = idperson if newparent`i' == 1 & /// - dgn == 1 - bys stm idhh: egen idfather_orphan`i' = max(temp_idfather_orphan`i') - format idfather_orphan`i' %19.0g - replace idfather_orphan`i' = 0 if orphan == 0 - drop temp_idfather_orphan`i' - -} - -* Create newidmother for orphans -cap gen newidmother = . -cap gen newidfather = . - -// Loop over the number of orphan in a hh -forvalues i = 1/`max_orphan' { - replace newidmother = idmother_orphan`i' if orphan_number == `i' - replace newidfather = idfather_orphan`i' if orphan_number == `i' -} - -format newidmother %19.0g -format newidfather %19.0g - -* Replace idmother/idfather of former orphans -replace idmother = newidmother if orphan == 1 -replace idfather = newidfather if orphan == 1 - -drop newparent* idmother_orphan* idfather_orphan* - -* Assign orphans to their new mothers' benunits (they are heads or partners) -gsort swv idhh -dag -partnered -idperson - -forvalues i = 1/13 { - replace idbenefitunit = idbenefitunit[_n-`i'] if idmother > 0 & /// - missing(idbenefitunit) & orphan == 1 & /// - (idmother == idbenefitunit[_n-`i'] | idmother == idbupartner[_n-`i']) -} - -* If some orphans are still not assinged - asign them to father's benunits -forvalues i = 1/13 { - replace idbenefitunit = idbenefitunit[_n-`i'] if idfather > 0 & /// - missing(idbenefitunit) & orphan == 1 & /// - (idfather == idbenefitunit[_n-`i'] | 
idfather == idbupartner[_n-`i']) -} - -bys stm idhh idbenefitunit: egen temp_idbupartner = max(idbupartner) -fre temp_idbupartner if orphan == 1 -replace idbupartner = temp_idbupartner if orphan == 1 - //fill in benefit unit partner id - -* Assign them a second parent if first parent partnered -replace idfather = idbenefitunit if idmother == idbupartner & orphan == 1 -replace idfather = idbupartner if idmother == idbenefitunit & orphan == 1 - -** Run checks -* Check if all kids are assigned -count if idbenefitunit == . // 1 -count if child == 1 & idbenefitunit == . //2 obs are still not assigned -count if child == 1 & idbenefitunit == . & orphan == 1 - -drop if child == 1 & idbenefitunit == . & orphan == 1 - -fre adult child orphan dag if idbenefitunit == . -fre idperson if orphan == 1 & missing(idbenefitunit) - -* Check if everyone is assigned to benunits at this point -count if idbenefitunit == . -assert idbenefitunit != . -assert idbenefitunit != -9 - -* Check that everyone in benuint has the same benunit partner id assigned -//AB: idbupartner added as a secondary sorting variable to ensure replicability -assert idbupartner != . 
-bys swv idbenefitunit (idbupartner): replace idbupartner = idbupartner[1] if /// - idbupartner != idbupartner[1] - //(8,024 real changes made) this applies to children missing other parents - //information -assert idbupartner != idbenefitunit - -* Screen out benefit units with multiple adults of same sex -gen adultMan = adult * (dgn == 1) -gen adultWoman = adult * (dgn == 0) -gsort swv idbenefitunit -bys swv idbenefitunit: egen sumMen = sum(adultMan) -bys swv idbenefitunit: egen sumWomen = sum(adultWoman) -tab swv sumMen -tab swv sumWomen -if ${use_assert} assert sumMen < 2 & sumWomen < 2 - //this is not the case I suppose because we kept same sex couples - -* Check for duplicates in terms of swv and idperson -duplicates report swv idperson -duplicates report stm idperson //no such cases - -sort swv idbenefitunit idperson - -drop child n_child_mother n_child_father partnered num_adults n_child orphan /// - adult n_orphan orphan_number newidmother newidfather temp_idbupartner /// - adultMan adultWoman sumMen sumWomen - -save "$dir_data/${country}-eusilc_validation_sample_prep2.dta", replace - - -******************************************************************************** -* Clean-up and exit -******************************************************************************** -#delimit ; -local files_to_drop - motherinfo.dta - fatherinfo.dta - orphans.dta - ; -#delimit cr // cr stands for carriage return - -foreach file of local files_to_drop { - erase "$dir_data/`file'" -} diff --git a/validation/02_simulation_validation/do_files/03_create_EU_SILC_validation_targets.do b/validation/02_simulation_validation/do_files/03_create_EU_SILC_validation_targets.do new file mode 100644 index 0000000..52c79f4 --- /dev/null +++ b/validation/02_simulation_validation/do_files/03_create_EU_SILC_validation_targets.do @@ -0,0 +1,381 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Targets 
+* AUTHORS: Patryk Bronka, Ashley Burdett, Daria Popova +* LAST UPDATE: April 2026 (AB) +* COUNTRY: PL +* DESCRIPTION: This file creates validation targets using longitudinal EU +* SILC data +* generated by do files +********************************************************************************* +* NOTES: +* +*******************************************************************************/ + +* Generate/Tidy required variables + +* Load initial populations +use "$dir_init_pop_data/${country}_pooled_ipop.dta", clear + + +** IDENTIFIERS +rename idhh idHh +rename idbenefitunit idBu +rename idperson idPers +rename idpartner idPartner +rename idmother idMother +rename idfather idFather +rename swv statCollectionWave + + +** TIME +fre stm +keep if stm <= $max_year +gen year = stm + + +** DEMOGRAPHICS + +* Gender +rename dgn demMaleFlag + +replace demMaleFlag = . if demMaleFlag < 0 + + +* Age +rename dag demAge + +replace demAge = . if demAge < 0 + +* Age groups +gen ageGroup = . +replace ageGroup = 0 if demAge >= 0 & demAge < 15 +replace ageGroup = 1 if demAge >= 15 & demAge < 20 +replace ageGroup = 2 if demAge >= 20 & demAge < 25 +replace ageGroup = 3 if demAge >= 25 & demAge < 30 +replace ageGroup = 4 if demAge >= 30 & demAge < 35 +replace ageGroup = 5 if demAge >= 35 & demAge < 40 +replace ageGroup = 6 if demAge >= 40 & demAge < 60 +replace ageGroup = 7 if demAge >= 60 & demAge < 80 +replace ageGroup = 8 if demAge >= 80 & demAge <= 100 + +label def ageGrouplb /// + 0 "ageGroup_0_14" /// + 1 "ageGroup_15_19" /// + 2 "ageGroup_20_24" /// + 3 "ageGroup_25_29" /// + 4 "ageGroup_30_34" /// + 5 "ageGroup_35_39" /// + 6 "ageGroup_40_59" /// + 7 "ageGroup_60_79" /// + 8 "ageGroup_80_100" /// + +label val ageGroup ageGrouplb +fre ageGroup + +gen ageGroup2 = . 
+replace ageGroup2 = 0 if demAge >= 16 & demAge < 25 +replace ageGroup2 = 1 if demAge >= 25 & demAge < 30 +replace ageGroup2 = 2 if demAge >= 30 & demAge < 35 +replace ageGroup2 = 3 if demAge >= 35 & demAge < 40 +replace ageGroup2 = 4 if demAge >= 40 & demAge < 45 +replace ageGroup2 = 5 if demAge >= 45 & demAge < 50 +replace ageGroup2 = 6 if demAge >= 50 & demAge < 55 +replace ageGroup2 = 7 if demAge >= 55 & demAge < 60 +replace ageGroup2 = 8 if demAge >= 60 & demAge <= 65 + +label def ageGrouplb2 /// + 0 "ageGroup_16_24" /// + 1 "ageGroup_25_29" /// + 2 "ageGroup_30_34" /// + 3 "ageGroup_35_39" /// + 4 "ageGroup_40_44" /// + 5 "ageGroup_45_49" /// + 6 "ageGroup_50_54" /// + 7 "ageGroup_55_59" /// + 8 "ageGroup_60_65" /// + +label val ageGroup2 ageGrouplb2 +fre ageGroup2 + + +* Partnership status +rename dcpst demPartnerStatus + +gen valid_partnered = (demPartnerStatus == 1) +gen valid_single = (demPartnerStatus == 2) + +replace valid_partnered = . if demPartnerStatus == . | demPartnerStatus < 0 +replace valid_single = . if demPartnerStatus == . | demPartnerStatus < 0 + + +* Number of children +rename dnc demNChild +rename dnc02 demNChild0to2 + +gen new_born = (demAge == 0 & demNChild0to2 != 0) +bysort idBu year (new_born): gen demNChild0 = (new_born[_N] == 1) + +gen children_0 = (demNChild == 0) +gen children_1 = (demNChild == 1) +gen children_2 = (demNChild == 2) +gen children_3plus = (demNChild >= 3 & demNChild != .) + +* Interaction of partnership status and number of children +foreach var1 in valid_partnered valid_single { + + foreach var2 in children_0 children_1 children_2 children_3plus { + + gen `var1'_`var2' = (`var1' & `var2') + + } + +} + +* Have a new born +gen valid_new_born = (dchpd > 0 & dchpd != .) 
+ +order valid_new_born dchpd + + +** EDUCATION +* Attainment dummies +rename deh_c4 eduHighestC4 + +gen valid_edu_na = (eduHighestC4 == 0) +gen valid_edu_high = (eduHighestC4 == 1) +gen valid_edu_med = (eduHighestC4 == 2) +gen valid_edu_low = (eduHighestC4 == 3) + +gen deh_flag = /// + (flag_deh_imp_mono == 1 | flag_deh_imp_reg == 1 | flag_dehsp_imp_reg == 1) + +replace valid_edu_na = . if eduHighestC4 == . | eduHighestC4 < 0 | deh_flag == 1 +replace valid_edu_high = . if eduHighestC4 == . | eduHighestC4 < 0 | /// + deh_flag == 1 +replace valid_edu_med = . if eduHighestC4 == . | eduHighestC4 < 0 | /// + deh_flag == 1 +replace valid_edu_low = . if eduHighestC4 == . | eduHighestC4 < 0 | /// + deh_flag == 1 + + +** HEALTH +* Disabled / LT sick +rename dlltsd healthDsblLongtermFlag + +gen valid_healthDsblLongtermFlag = healthDsblLongtermFlag +replace valid_healthDsblLongtermFlag = . if valid_healthDsblLongtermFlag < 0 + +* Self rated health +rename dhe valid_healthSelfRated + +replace valid_healthSelfRated = . if valid_healthSelfRated < 0 | /// + flag_dhe_imp == 1 + + +** LABOUR MARKET +* Economic activity dummies +rename les_c3 labC3 +rename les_c4 labC4 + +gen valid_employed = (labC4 == 1) +gen valid_student = (labC4 == 2) +gen valid_inactive = (labC4 == 3) +gen valid_retired = (labC4 == 4) + +replace valid_employed = . if labC4 < 0 | labC4 == . +replace valid_student = . if labC4 < 0 | labC4 == . +replace valid_inactive = . if labC4 < 0 | labC4 == . +replace valid_retired = . if labC4 < 0 | labC4 == . 
+ +* Hours worked weekly (continuous) +rename lhw labHrsWorkWeek + +gen valid_labHrsWorkWeek = labHrsWorkWeek + +* Hours worked weekly (categories) +gen valid_labHrsWorkEnumWeek = "ZERO" +replace valid_labHrsWorkEnumWeek = "TWENTY" if labHrsWorkWeek >= 1 & /// + labHrsWorkWeek < 40 +replace valid_labHrsWorkEnumWeek = "FORTY" if labHrsWorkWeek == 40 +replace valid_labHrsWorkEnumWeek = "FIFTY" if labHrsWorkWeek > 40 & /// + !missing(labHrsWorkWeek) + +gen valid_labHrsWorkEnum_no = . +replace valid_labHrsWorkEnum_no = 0 if valid_labHrsWorkEnumWeek == "ZERO" +replace valid_labHrsWorkEnum_no = 20 if valid_labHrsWorkEnumWeek == "TWENTY" +replace valid_labHrsWorkEnum_no = 40 if valid_labHrsWorkEnumWeek == "FORTY" +replace valid_labHrsWorkEnum_no = 50 if valid_labHrsWorkEnumWeek == "FIFTY" + +* Categorical variable +gen valid_cat_hours = . +replace valid_cat_hours = 1 if valid_labHrsWorkEnumWeek == "ZERO" +replace valid_cat_hours = 2 if valid_labHrsWorkEnumWeek == "TWENTY" +replace valid_cat_hours = 3 if valid_labHrsWorkEnumWeek == "FORTY" +replace valid_cat_hours = 4 if valid_labHrsWorkEnumWeek == "FIFTY" + +* Hourly wage +// obs_earnings_hourly - alternative only containing observed wages +// pred_hourly_wage - alternative containing predicted wages +gen valid_wage = obs_earnings_hourly +replace valid_wage = . if obs_earnings_hourly < 0 +replace valid_wage = . if valid_wage == 0 & labC4 == 1 + + +** INCOME (ANNUAL) +/* +Amounts of personal income stored with the IHS transformation. +Benefit Unit level measure (gross and disposable income) are stored without the +transformation. + +No missing observations in income amounts. +*/ + +* Annual individual employment gross income +rename yplgrs_dv yEmpPersGrossMonth + +sum yEmpPersGrossMonth +count if yEmpPersGrossMonth == . 
+ +* Convert to levels +gen yEmpPersGrossLevelMonth = sinh(yEmpPersGrossMonth) +* Convert to annual +gen valid_yEmpPersGrossLevelYear = yEmpPersGrossLevelMonth * 12 + +* Annual benefit unit gross employment income +bys year idBu: egen valid_yEmpBuGrossLevelYear = /// + total(valid_yEmpPersGrossLevelYear) + + +* Annual individual capital income +rename ypncp yCapitalPersMonth + +sum yCapitalPersMonth +count if yCapitalPersMonth == . + +* Convert to levels +gen yCapitalPersLevelMonth = sinh(yCapitalPersMonth) +* Convert to annual +gen valid_yCapitalPersLevelYear = yCapitalPersLevelMonth * 12 + +* Annual benefit unit capital income +bys year idBu: egen valid_yCapitalBuLevelYear = /// + total(valid_yCapitalPersLevelYear) + + +* Annual individual gross private pension income +rename ypnoab yPensPersGrossMonth + +sum yPensPersGrossMonth +count if yPensPersGrossMonth == . + +* Convert to levels +gen yPensPersGrossLevelMonth = 0 +* Convert to annual +gen valid_yPensPersGrossLevelYear = yPensPersGrossLevelMonth * 12 + +* Annual benefit unit gross private pension income +bys year idBu: egen valid_yPensBuGrossLevelYear = /// + total(valid_yPensPersGrossLevelYear) + + +* Annual individual gross non-benefit income +/* +rename ypnbihs_dv yNonBenPersGrossMonth + +* Convert to levels +gen yNonBenPersGrossLevelMonth = sinh(yNonBenPersGrossMonth) +* Convert to annual +gen valid_yNonBenPersGrossLevelYear = yNonBenPersGrossLevelMonth * 12 +*/ +egen valid_yNonBenPersGrossLevelYear = /// + rowtotal(valid_yPensPersGrossLevelYear valid_yCapitalPersLevelYear /// + valid_yEmpPersGrossLevelYear) + +* Annual benefit unit gross non-benefit income +bys year idBu: egen valid_yNonBenBuGrossLevelYear = /// + total(valid_yNonBenPersGrossLevelYear) + + +* Annual benefit unit gross income (level, non-benefit) +/* +Gross income is the same as non-benefit private income. 
+*/ +gen valid_yGrossBuLevelYear = valid_yNonBenBuGrossLevelYear + +gen valid_yGrossPersLevelYear = valid_yNonBenPersGrossLevelYear + + +* Annual benefit unit disposable income (level) +rename ydisp yDispPersMonth +* Convert to annual +gen valid_yDispPersYear = yDispPersMonth * 12 + +* Convert to benefit unit +bys year idBu: egen valid_yDispBuLevelYear = total(valid_yDispPersYear) + + +* Benefit unit - Net transfers +gen valid_net_transfers = valid_yDispBuLevelYear - valid_yNonBenBuGrossLevelYear + + +* Equivalised disposable income per benefit unit + +* Compute equivalence scale +* Identify types of children +gen is_older_child = 1 if inrange(demAge,14,18) & (idMother < . | idFather < .) +gen is_child = 1 if demAge < 14 & (idMother < . | idFather < .) + +* Sum up number in hh +bysort idHh: egen num_older_children = total(is_older_child) +bysort idHh: egen num_children = total(is_child) + +* Compute Modified OECD equivalence scale + +gen moecd_eq = . +replace moecd_eq = 1.5 if dhhtp_c4 == 1 +replace moecd_eq = 0.3 * num_children + 0.5 * num_older_children + 1.5 if /// + dhhtp_c4 == 2 +replace moecd_eq = 1 if dhhtp_c4 == 3 +replace moecd_eq = 0.3 * num_children + 0.5 * num_older_children + 1 if /// + dhhtp_c4 == 4 + +* Apply equivalence scale +gen valid_yDispBuEquivYear = valid_yDispBuLevelYear / moecd_eq + +drop is_older_child is_child moecd_eq + + +* Minimum wage variable +gen min_wage = 8.66 if year == 2011 +replace min_wage = 9.38 if year == 2012 +replace min_wage = 10.00 if year == 2013 +replace min_wage = 10.5 if year == 2014 +replace min_wage = 10.64 if year == 2015 +replace min_wage = 11.56 if year == 2016 +replace min_wage = 12.50 if year == 2017 +replace min_wage = 13.13 if year == 2018 +replace min_wage = 14.06 if year == 2019 +replace min_wage = 16.25 if year == 2020 +replace min_wage = 17.50 if year == 2021 +replace min_wage = 18.81 if year == 2022 +replace min_wage = 22.8 if year == 2023 + +* Convert into real terms +replace min_wage = min_wage * 
(CPI/100) + +gen valid_wage_hour_adj_mw = valid_wage +replace valid_wage_hour_adj_mw = min_wage if valid_wage < min_wage + +gen valid_wage_hour_adj = valid_wage +replace valid_wage_hour_adj = . if valid_wage < min_wage + + +* Restrict sample to relevant validation years +drop if year < ${min_sim_year} +drop if year > ${max_sim_year} + + +save "$dir_data/${country}-eusilc_validation_sample.dta", replace + diff --git a/validation/02_simulation_validation/do_files/04_01_plot_activity_status.do b/validation/02_simulation_validation/do_files/04_01_plot_activity_status.do new file mode 100644 index 0000000..7845fdf --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_01_plot_activity_status.do @@ -0,0 +1,2286 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Economic Activity Status plots +* AUTHORS: Ashley Burdett +* LAST UPDATE: 4/26 +* COUNTRY: PL +* DESCRIPTION: This do file plots validation graphs for economic activity +* status (4 cat). 
+******************************************************************************** +* NOTES: +*******************************************************************************/ + +clear all + +******************************************************************************** +* 0 : Programmes +******************************************************************************** + +* Time series plot, all activity statuses +cap program drop make_activity_plot + +program define make_activity_plot + syntax, subtitle(string) saving(string) note(string) + + twoway /// + (rarea sim_employed_high sim_employed_low year, sort color(green%20) legend(label(1 "Employed, SimPaths"))) /// + (line valid_employed year, sort color(green) legend(label(2 "Employed, EU-SILC"))) /// + (rarea sim_student_high sim_student_low year, sort color(blue%20) legend(label(3 "Students, SimPaths"))) /// + (line valid_student year, sort color(blue) legend(label(4 "Students, EU-SILC"))) /// + (rarea sim_inactive_high sim_inactive_low year, sort color(red%20) legend(label(5 "Non-employed, SimPaths"))) /// + (line valid_inactive year, sort color(red) legend(label(6 "Non-employed, EU-SILC"))) /// + (rarea sim_retired_high sim_retired_low year, sort color(grey%20) legend(label(7 "Retired, SimPaths"))) /// + (line valid_retired year, sort color(grey) legend(label(8 "Retired, EU-SILC"))), /// + title("Economic Activity Status") /// + subtitle("`subtitle'") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + graphregion(color(white)) /// + legend(size(small)) /// + note(`note', size(vsmall)) + + graph export "$dir_output_files/economic_activity/`saving'.jpg", replace width(2400) height(1350) quality(100) + +end + + +* Time series plot, non-employed statuses only +* Time series plot all +cap program drop make_activity_ne_plot + +program define make_activity_ne_plot + syntax, subtitle(string) saving(string) note(string) + + twoway /// + 
(rarea sim_student_high sim_student_low year, sort color(blue%20) legend(label(1 "Students, SimPaths"))) /// + (line valid_student year, sort color(blue) legend(label(2 "Students, EU-SILC"))) /// + (rarea sim_inactive_high sim_inactive_low year, sort color(red%20) legend(label(3 "Non-employed, SimPaths"))) /// + (line valid_inactive year, sort color(red) legend(label(4 "Non-employed, EU-SILC"))) /// + (rarea sim_retired_high sim_retired_low year, sort color(grey%20) legend(label(5 "Retired, SimPaths"))) /// + (line valid_retired year, sort color(grey) legend(label(6 "Retired, EU-SILC"))), /// + title("Non-Employed Economic Activity Status") /// + subtitle("`subtitle'") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + graphregion(color(white)) /// + legend(size(small)) /// + note(`note', size(vsmall)) + + graph export "$dir_output_files/economic_activity/`saving'.jpg", replace width(2400) height(1350) quality(100) +end + + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** +******************************************************************************** +* 1.1 : Mean values over time - Economic activity status +******************************************************************************** +******************************************************************************** +* 1.1.1 : Young people (16-30) +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", /// + clear + +drop if demAge > 30 +drop if demAge < 16 + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save 
"$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demAge using /// + "$dir_data/simulation_sample.dta", clear + +drop if demAge > 30 +drop if demAge < 16 + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +* Compute 95% confidence interval +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_activity_plot, /// + subtitle("Ages 16-30") /// + saving("validation_${country}_activity_status_ts_16_30_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + + +******************************************************************************** +* 1.1.1.1 : Young people (16-30), by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demAge demMaleFlag /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", /// + clear + +drop if demAge > 30 +drop if demAge < 16 + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demAge /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +drop if demAge > 30 +drop if demAge < 16 + +collapse 
(mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +* Compute 95% confidence interval +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Males +preserve + +keep if demMaleFlag == 1 + +make_activity_plot, /// + subtitle("Ages 16-30, males") /// + saving("validation_${country}_activity_status_ts_16_30_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +restore + +* Female + +keep if demMaleFlag == 0 + +* Plot figure +make_activity_plot, /// + subtitle("Ages 16-30, females") /// + saving("validation_${country}_activity_status_ts_16_30_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + + +******************************************************************************** +* 1.1.2 : Working age (16-65) +******************************************************************************** + +* Prepare validation data +use idPers year dwt demMaleFlag demAge valid_employed valid_student /// + valid_inactive valid_retired using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use 
run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_activity_plot, /// + subtitle("Ages 16-65") /// + saving("validation_${country}_activity_status_ts_16_65_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + + +******************************************************************************** +* 1.1.2.1 : Working age (16-65), by gender +******************************************************************************** + +* Prepare validation data +use idPers year dwt demMaleFlag demAge valid_employed valid_student /// + valid_inactive valid_retired using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year 
demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Male +preserve + +keep if demMaleFlag == 1 + +make_activity_plot, /// + subtitle("Ages 16-65, males") /// + saving("validation_${country}_activity_status_ts_16_65_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +restore + +* Female + +keep if demMaleFlag == 0 + +make_activity_plot, /// + subtitle("Ages 16-65, females") /// + saving("validation_${country}_activity_status_ts_16_65_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +graph drop _all + + +******************************************************************************** +* 1.1.2.2 : Working age (18-65) by partnership status +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demPartnerStatus) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive 
sim_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year demPartnerStatus) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demPartnerStatus) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demPartnerStatus using "$dir_data/temp_valid_stats.dta", /// + keep(3) nogen + +* Plot figures + +* Partnered +preserve + +keep if demPartnerStatus == 1 + +make_activity_plot, /// + subtitle("Ages 18-65, partnered") /// + saving("validation_${country}_activity_status_ts_18_65_both_partnered") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +restore + +* Single +keep if demPartnerStatus == 2 + +make_activity_plot, /// + subtitle("Ages 18-65, single") /// + saving("validation_${country}_activity_status_ts_18_65_both_single") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +graph drop _all + + +******************************************************************************** +* 1.1.2.2.1 : Working age (18-65), by partnership status, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear 
+ +* Select sample +keep if inrange(demAge,18,65) + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demPartnerStatus demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year demPartnerStatus demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demPartnerStatus demMaleFlag) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demPartnerStatus demMaleFlag using /// + "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figures +foreach g in 0 1 { + local gname = cond(`g'==1, "male", "female") + + foreach p in 1 2 { + local pname = cond(`p'==1, "partnered", "single") + + preserve + keep if demMaleFlag == `g' & demPartnerStatus == `p' + + make_activity_plot, /// + subtitle("Ages 18-65, `pname' `gname's") /// + saving("validation_${country}_activity_status_ts_18_65_`gname'_`pname'") /// + note(`""Notes: ..." "..." 
"') + restore + } +} + +graph drop _all + + +******************************************************************************** +* 1.1.3 : Female working age (16-60) +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", /// + clear + +* Select sample +keep if inrange(demAge,16,60) + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,60) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Females +keep if demMaleFlag == 0 + +* Plot figure +make_activity_plot, /// + subtitle("Ages 16-60, females") /// + saving("validation_${country}_activity_status_ts_16_60_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + + +******************************************************************************** +* 1.1.4 : All ages 
+******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag /// + valid_retired demAge labC4 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge labC4 using "$dir_data/simulation_sample.dta", clear + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +make_activity_plot, /// + subtitle("All ages") /// + saving("validation_${country}_activity_status_ts_all_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +graph drop _all + + +******************************************************************************** +* 1.1.4.1 : All ages, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag /// + valid_retired demAge labC4 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = 
dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge labC4 using "$dir_data/simulation_sample.dta", clear + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure + +* Males +preserve +keep if demMaleFlag == 1 + +make_activity_plot, /// + subtitle("All ages, males") /// + saving("validation_${country}_activity_status_ts_all_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +restore + +* Females +keep if demMaleFlag == 0 + +make_activity_plot, /// + subtitle("All ages, females") /// + saving("validation_${country}_activity_status_ts_all_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +graph drop _all + + +******************************************************************************** +* 1.1.5 : Adult population 18+ +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if 
demAge < 18 + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +drop if demAge < 18 + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_activity_plot, /// + subtitle("Ages 18+") /// + saving("validation_${country}_activity_status_ts_18plus_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + + +******************************************************************************** +* 1.1.5.1 : Adult population 18+, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", 
clear + +drop if demAge < 18 + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Males +preserve + +keep if demMaleFlag == 1 + +make_activity_plot, /// + subtitle("Ages 18+, males") /// + saving("validation_${country}_activity_status_ts_18plus_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +restore + +* Females +keep if demMaleFlag == 0 + +make_activity_plot, /// + subtitle("Ages 18+, females") /// + saving("validation_${country}_activity_status_ts_18plus_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +graph drop _all + + +******************************************************************************** +* 1.1.6 : Labour supply age group (16-75) +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", /// + clear + +* Select sample +keep if inrange(demAge,18,75) + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare 
simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,75) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_activity_plot, /// + subtitle("Ages 16-75") /// + saving("validation_${country}_activity_status_ts_16_75_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + + +******************************************************************************** +* 1.1.6.1 : Labour supply age group (16-75), by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", /// + clear + +* Select sample +keep if inrange(demAge,18,75) + +collapse (mean) valid_employed valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,75) + +collapse (mean) sim_employed sim_student sim_inactive 
sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_employed sim_student sim_inactive sim_retired /// + (sd) sim_employed_sd = sim_employed /// + sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_employed sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Male +preserve + +keep if demMaleFlag == 1 + +make_activity_plot, /// + subtitle("Ages 16-75, males") /// + saving("validation_${country}_activity_status_ts_16_75_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +restore + +* Female +keep if demMaleFlag == 0 + +* Plot figure +make_activity_plot, /// + subtitle("Ages 16-75, females") /// + saving("validation_${country}_activity_status_ts_16_75_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired.""') + +graph drop _all + + +******************************************************************************** +* 1.2 : Mean values over time, share employed +******************************************************************************** + +******************************************************************************** +* 1.2.1 : Mean values over time, share employed, by age group, by gender +******************************************************************************** + +* Prepare validation data +use year dwt demMaleFlag ageGroup valid_employed demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +gen employed_f = (valid_employed) if demMaleFlag == 0 +gen employed_m = (valid_employed) 
if demMaleFlag == 1 + +drop if ageGroup == 0 | ageGroup == 8 + +collapse (mean) employed_f employed_m [aw=dwt], /// + by(ageGroup year) + +drop if missing(ageGroup) + +reshape wide employed_f employed_m, i(year) j(ageGroup) + +forvalues i = 1(1)7 { + + rename employed_f`i' employed_f_`i'_valid + rename employed_m`i' employed_m_`i'_valid + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demMaleFlag ageGroup sim_employed using /// + "$dir_data/simulation_sample.dta", clear + +gen employed_f = (sim_employed) if demMaleFlag == 0 +gen employed_m = (sim_employed) if demMaleFlag == 1 + +drop if ageGroup == 0 | ageGroup == 8 + +collapse (mean) employed_f employed_m, by(ageGroup run year) +drop if missing(ageGroup) + +reshape wide employed_f employed_m, i(year run) j(ageGroup) + +forvalues i = 1(1)7 { + + rename employed_f`i' employed_f_`i'_sim + rename employed_m`i' employed_m_`i'_sim + +} + +collapse (mean) employed* /// + (sd) sd_employed_f_1_sim=employed_f_1_sim /// + sd_employed_f_2_sim=employed_f_2_sim /// + sd_employed_f_3_sim=employed_f_3_sim /// + sd_employed_f_4_sim=employed_f_4_sim /// + sd_employed_f_5_sim=employed_f_5_sim /// + sd_employed_f_6_sim=employed_f_6_sim /// + sd_employed_f_7_sim=employed_f_7_sim /// + sd_employed_m_1_sim=employed_m_1_sim /// + sd_employed_m_2_sim=employed_m_2_sim /// + sd_employed_m_3_sim=employed_m_3_sim /// + sd_employed_m_4_sim=employed_m_4_sim /// + sd_employed_m_5_sim=employed_m_5_sim /// + sd_employed_m_6_sim=employed_m_6_sim /// + sd_employed_m_7_sim=employed_m_7_sim /// + , by(year) + + +forvalues i = 1(1)7 { + + gen employed_f_`i'_sim_high = /// + employed_f_`i'_sim + 1.96*sd_employed_f_`i'_sim + gen employed_f_`i'_sim_low = /// + employed_f_`i'_sim - 1.96*sd_employed_f_`i'_sim + gen employed_m_`i'_sim_high = /// + employed_m_`i'_sim + 1.96*sd_employed_m_`i'_sim + gen employed_m_`i'_sim_low = /// + employed_m_`i'_sim - 1.96*sd_employed_m_`i'_sim + +} + +merge 1:1 year using 
"$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Define the age labels in a local macro +local age_labels `" "16-19" "20-24" "25-29" "30-34" "35-39" "40-59" "60-79" "' + +foreach vble in "employed_f" "employed_m" { + + *Loop through the 7 age groups + forvalues i = 1/7 { + + * Extract the label for the current index i + local title : word `i' of `age_labels' + + twoway (rarea `vble'_`i'_sim_high `vble'_`i'_sim_low year, /// + sort color(green%20) /// + legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_`i'_valid year, sort color(green) /// + legend(label(2 "UKHLS"))), /// + title("Age `title'") /// + name(`vble'_`i', replace) /// + ylabel(0.2(0.4)1) /// + xtitle("") /// + graphregion(color(white)) + } + + * Determine gender subtitle for the combined plot + local gsubtitle = cond("`vble'" == "employed_m", "Males", "Females") + local gsuffix = cond("`vble'" == "employed_m", "male", "female") + + * Combine plots + grc1leg `vble'_1 `vble'_2 `vble'_3 `vble'_4 `vble'_5 `vble'_6 `vble'_7, /// + title("Share Employed by Age Group") /// + subtitle("`gsubtitle'") /// + legendfrom(`vble'_1) /// + ycomm /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + + * 5. 
Export + graph export "$dir_output_files/economic_activity/validation_${country}_employed_ts_age_groups_`gsuffix'.jpg", /// + replace width(2400) height(1350) quality(100) + +} + +graph drop _all + + +******************************************************************************** +* 1.3 : Mean values over time, non-employed shares +******************************************************************************** + +******************************************************************************** +* 1.3.1 : Non-employed shares, working age (16-65) +******************************************************************************** + +* Prepare validation data +use idPers year dwt valid_employed valid_student valid_inactive /// + valid_retired demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demAge /// + using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +make_activity_ne_plot, /// + subtitle("Ages 16-65") /// + saving("validation_${country}_activity_status_ts_not_employed_16_65_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers 
etc.)" "minus students and retired. Denominator is the full population.""') + + +******************************************************************************** +* 1.3.1.1 : Non-employed shares, Working age (16-65), by gender +******************************************************************************** + +* Prepare validation data +use idPers year dwt valid_employed valid_student valid_inactive /// + valid_retired demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demAge /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Males +preserve + +keep if demMaleFlag == 1 + +make_activity_ne_plot, /// + subtitle("Ages 16-65, males") /// + saving("validation_${country}_activity_status_ts_not_employed_16_65_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
Denominator is the full population.""') + +restore + +* Females +keep if demMaleFlag == 0 + +* Plot figure +make_activity_ne_plot, /// + subtitle("Ages 16-65, females") /// + saving("validation_${country}_activity_status_ts_not_employed_16_65_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. Denominator is the full population.""') + +graph drop _all + + +******************************************************************************** +* 1.3.1.2 : Non-employed shares, working age (18-65), by partnership status +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demPartnerStatus demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demPartnerStatus) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +* Compute mean and sd +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year demPartnerStatus) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demPartnerStatus) + +* Approx 95% confidence interval +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demPartnerStatus using "$dir_data/temp_valid_stats.dta", /// + keep(3) nogen + + +* Plot figure 
+ +* Partnered +preserve + +keep if demPartnerStatus == 1 + +make_activity_ne_plot, /// + subtitle("Ages 18-65, partnered") /// + saving("validation_${country}_activity_status_ts_not_employed_18_65_partnered") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. Denominator is the full population.""') + +restore + +* Single +keep if demPartnerStatus == 2 + +make_activity_ne_plot, /// + subtitle("Ages 18-65, singles") /// + saving("validation_${country}_activity_status_ts_not_employed_18_65_single") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. Denominator is the full population.""') + +graph drop _all + + +******************************************************************************** +* 1.3.1.3 : Non-employed shares, working age (18-65), by partnership status, +* by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demPartnerStatus demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired /// + demPartnerStatus demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,18,65) + +* Compute mean and sd +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year demPartnerStatus demMaleFlag) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + 
sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demPartnerStatus demMaleFlag) + +* Approx 95% confidence interval +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demPartnerStatus demMaleFlag using /// + "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +foreach g in 1 0 { + * Define gender labels + local gname = cond(`g' == 1, "male", "female") + local gtitle = cond(`g' == 1, "males", "females") + + foreach p in 1 2 { + * Define partnership labels + local pname = cond(`p' == 1, "partnered", "single") + + preserve + * Filter data + keep if demMaleFlag == `g' & demPartnerStatus == `p' + + * Generate the plot + make_activity_ne_plot, /// + subtitle("Ages 18-65, `pname' `gtitle'") /// + saving("validation_${country}_activity_status_ts_not_employed_18_65_`gname'_`pname'") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
Denominator is the full population.""') + restore + + } +} + +graph drop _all + + +******************************************************************************** +* 1.3.2 : Non-employed shares, Female working age (16-60) +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", /// + clear + +* Select sample +keep if inrange(demAge,16,60) + +drop if demMaleFlag == 1 +drop valid_employed demMaleFlag + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,60) + +drop if demMaleFlag == 1 +drop sim_employed demMaleFlag + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +make_activity_ne_plot, /// + subtitle("Ages 16-60, females") /// + saving("validation_${country}_activity_status_ts_not_employed_16_60_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
Denominator is the full population.""') + + +******************************************************************************** +* 1.3.3 : Non-employed shares, all ages +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +drop valid_employed + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +drop sim_employed + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_activity_ne_plot, /// + subtitle("All ages") /// + saving("validation_${country}_activity_status_ts_not_employed_all_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
Denominator is the full population.""') + + +******************************************************************************** +* 1.3.3.1 : Non-employed shares, all ages, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +drop valid_employed + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +drop sim_employed + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Males +preserve + +keep if demMaleFlag == 1 + +* Plot figure +make_activity_ne_plot, /// + subtitle("All ages, males") /// + saving("validation_${country}_activity_status_ts_not_employed_all_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
Denominator is the full population.""') + +restore + +* Females +keep if demMaleFlag == 0 + +* Plot figure +make_activity_ne_plot, /// + subtitle("All ages, females") /// + saving("validation_${country}_activity_status_ts_not_employed_all_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. Denominator is the full population.""') + +graph drop _all + + +******************************************************************************** +* 1.3.4 : Non-employed shares, adult population 18+ +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_activity_ne_plot, /// + subtitle("Ages 18+") /// + saving("validation_${country}_activity_status_ts_not_employed_18plus_both") /// + note(`""Notes: Non-employed includes the unemployed and inactive 
(homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. Denominator is the full population.""') + + +******************************************************************************** +* 1.3.4.1 : Non-employed shares, adult population 18+, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive demMaleFlag demAge /// + valid_retired using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +collapse (mean) valid_student valid_inactive valid_retired /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_employed sim_student sim_inactive sim_retired demMaleFlag /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +collapse (mean) sim_student sim_inactive sim_retired, /// + by(run year demMaleFlag) + +collapse (mean) sim_student sim_inactive sim_retired /// + (sd) sim_student_sd = sim_student /// + sim_inactive_sd = sim_inactive /// + sim_retired_sd = sim_retired /// + , by(year demMaleFlag) + +foreach varname in sim_student sim_inactive sim_retired { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure + +* Males +preserve + +keep if demMaleFlag == 1 + +make_activity_ne_plot, /// + subtitle("Ages 18+, males") /// + saving("validation_${country}_activity_status_ts_not_employed_18plus_male") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
Denominator is the full population.""') + +restore + +* Females +keep if demMaleFlag == 0 + +make_activity_ne_plot, /// + subtitle("Ages 18+, females") /// + saving("validation_${country}_activity_status_ts_not_employed_18plus_female") /// + note(`""Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. Denominator is the full population.""') + +graph drop _all + + +******************************************************************************** +* 1.4 Mean values over time, share students +******************************************************************************** + +******************************************************************************** +* 1.4.1 Share of students, by age group +******************************************************************************** + +* Prepare validation data +use year dwt demMaleFlag ageGroup valid_student demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +gen student = valid_student + +* Select sample +drop if ageGroup == 0 | ageGroup == 8 + +* Compute means +collapse (mean) student [aw=dwt], by(ageGroup year) + +drop if missing(ageGroup) + +* Restructure data +reshape wide student , i(year) j(ageGroup) + +forvalues i = 1(1)7 { + + rename student`i' student_`i'_valid + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demMaleFlag ageGroup sim_student using /// + "$dir_data/simulation_sample.dta", clear + +gen student = sim_student + +* Compute means +collapse (mean) student, by(ageGroup run year) + +drop if missing(ageGroup) + +* Restructure data +reshape wide student, i(year run) j(ageGroup) + +forvalues i=1(1)7{ + + rename student`i' student_`i'_sim + +} + +collapse (mean) student* /// + (sd) sd_student_1_sim =student_1_sim /// + sd_student_2_sim = student_2_sim /// + sd_student_3_sim = student_3_sim /// + sd_student_4_sim = student_4_sim /// + 
sd_student_5_sim = student_5_sim /// + sd_student_6_sim = student_6_sim /// + sd_student_7_sim = student_7_sim /// + , by(year) + + +forvalues i = 1(1)7 { + + gen student_`i'_sim_high = student_`i'_sim + 1.96*sd_student_`i'_sim + gen student_`i'_sim_low = student_`i'_sim - 1.96*sd_student_`i'_sim + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures +* Define the specific age titles in a local macro +local age_titles `" "16-19" "20-24" "25-29" "30-34" "35-39" "40-59" "' + +* Loop through the 6 groups +forvalues i = 1/6 { + + * Extract the corresponding title from the macro + local title : word `i' of `age_titles' + + twoway (rarea student_`i'_sim_high student_`i'_sim_low year, /// + sort color(blue%20) /// + legend(label(1 "SimPaths") position(6) rows(1))) /// + (line student_`i'_valid year, sort color(blue) /// + legend(label(2 "UKHLS"))), /// + title("Age `title'") /// + name(student_`i', replace) /// + ylabel(0(0.4)0.8) /// // Note: Standard Stata syntax is 0(step)max + xtitle("") /// + graphregion(color(white)) +} + +* 3. 
Combine and Save +grc1leg student_1 student_2 student_3 student_4 student_5 student_6 , /// + title("Share of Students by Age Group") /// + legendfrom(student_1) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +graph export /// +"$dir_output_files/economic_activity/validation_${country}_students_ts_age_groups_both.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all + + +******************************************************************************** +* 1.5 Mean values over time - Partners combined status +******************************************************************************** + +* LF Non-employed partners with LF Non-employed + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired demAge /// + demPartnerStatus idPers idPartner idBu idHh /// + using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != -9 + +sort idPartner year + +* Address multiple partners +gen to_drop = 1 if idPartner == idPartner[_n-1] & year == year[_n-1] +replace to_drop = 1 if to_drop[_n+1] == 1 & idPartner == idPartner[_n+1] + +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename valid_* valid_ptnr_* +rename to_drop to_drop_ptnr + +save "$dir_data/temp_valid_partner.dta", replace + +restore + +* Address multiple partners +drop if to_drop == 1 + +* Merge in partner info +merge 1:1 year idPartner using "$dir_data/temp_valid_partner.dta" + +drop if to_drop_ptnr == 1 + +* Only keep those in which partner's info is available +keep if _m == 3 + +* Select sample +keep if valid_inactive == 1 + +keep if inrange(demAge,18,65) + +collapse (mean) valid_ptnr_inactive [aw=dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulation data +use year sim_employed sim_student sim_inactive sim_retired demAge run /// + idPers idPartner idBu /// + using 
"$dir_data/simulation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != . + +* Collect partner employement information +preserve + +drop idPartner demAge +rename idPers idPartner +rename sim_* sim_ptnr_* + +save "$dir_data/temp_sim_partner.dta", replace + +restore + +* Merge in partner info +merge 1:1 year idPartner run using "$dir_data/temp_sim_partner.dta" +drop _m + +* Compute share of those who are non-employed and in a partnership whose partner +* is also non-emplyed compared to working + +* Select sample +keep if sim_inactive == 1 + +keep if inrange(demAge,18,65) + + +* Compute mean and sd +collapse (mean) sim_ptnr_inactive, by(run year) + +collapse (mean) sim_ptnr_inactive /// + (sd) sim_ptnr_inactive_sd = sim_ptnr_inactive /// + , by(year) + +* Compute 95% confidence interval +foreach varname in sim_ptnr_inactive { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta" +keep if _m == 3 +drop _m + +* Plot +twoway /// +(rarea sim_ptnr_inactive_high sim_ptnr_inactive_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_ptnr_inactive year, sort color(green) /// + legend(label(2 " UKHLS"))), /// + title("Economic Activity Status") /// + subtitle("Share of LF non-employed partnered & partner LF non-employed") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(vsmall)) /// + note("Notes: Ages 18-65 included in sample. Non-employed includes the unemployed and inactive (homemakers, incapacity, carers," "discouraged workers etc.) minus students and retired. 
", /// + size(vsmall)) + +graph export /// +"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_partnerhip_shares_non_non_18_65.jpg", /// + replace width(2400) height(1350) quality(100) + + +* LF Non-employed partners with not LF +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demAge demPartnerStatus idPers idPartner idBu idHh using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != -9 + +sort idPartner year + +* Address multiple partners +gen to_drop = 1 if idPartner == idPartner[_n-1] & year == year[_n-1] +replace to_drop = 1 if to_drop[_n+1] == 1 & idPartner == idPartner[_n+1] + +* Collect partner employement information +preserve + +drop idPartner demAge +rename idPers idPartner +rename valid_* valid_ptnr_* +rename to_drop to_drop_ptnr + +save "$dir_data/temp_valid_partner.dta", replace + +restore + +* Address multiple partners +drop if to_drop == 1 + +* Merge in partner info +merge 1:1 year idPartner using "$dir_data/temp_valid_partner.dta" + +drop if to_drop_ptnr == 1 + +* Only keep those in which partner's info is available +keep if _m == 3 + +* Select sample +keep if valid_inactive == 1 + +keep if inrange(demAge,18,65) + +gen valid_partner_nlf = 0 +replace valid_partner_nlf = 1 if valid_ptnr_student == 1 | /// + valid_ptnr_retired ==1 + +collapse (mean) valid_partner_nlf [aw=dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulation data +use year sim_employed sim_student sim_inactive sim_retired demAge run /// + idPers idPartner idBu /// + using "$dir_data/simulation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != . 
+ +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename sim_* sim_ptnr_* + +save "$dir_data/temp_sim_partner.dta", replace + +restore + +* Merge in partner info +merge 1:1 year idPartner run using "$dir_data/temp_sim_partner.dta" +drop _m + +* Compute share of those who are non-employed and in a partnership whose partner +* is not in the labour force (student or retired) + +* Select sample +keep if sim_inactive == 1 + +keep if inrange(demAge,18,65) + +gen sim_partner_nlf = 0 +replace sim_partner_nlf = 1 if sim_ptnr_student == 1 | sim_ptnr_retired == 1 + + +* Compute mean and sd +collapse (mean) sim_partner_nlf, by(run year) + +collapse (mean) sim_partner_nlf /// + (sd) sim_partner_nlf_sd = sim_partner_nlf /// + , by(year) + +* Compute 95% confidence interval +foreach varname in sim_partner_nlf { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta" +keep if _m == 3 +drop _m + +* Plot +twoway /// +(rarea sim_partner_nlf_high sim_partner_nlf_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_partner_nlf year, sort color(green) /// + legend(label(2 " UKHLS"))), /// + title("Economic Activity Status") /// + subtitle("Share of LF non-employed partnered & partner not LF") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(vsmall)) /// + note("Notes: Ages 18-65 included in sample. 
Not LF includes student and retired here.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_partnerhip_shares_non_notlf_18_65.jpg", /// + replace width(2400) height(1350) quality(100) + + +* Employed partnered with employed + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demAge demPartnerStatus idPers idPartner idBu idHh /// + using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != -9 + +sort idPartner year + +* Address multiple partners +gen to_drop = 1 if idPartner == idPartner[_n-1] & year == year[_n-1] +replace to_drop = 1 if to_drop[_n+1] == 1 & idPartner == idPartner[_n+1] + +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename valid_* valid_ptnr_* +rename to_drop to_drop_ptnr + +save "$dir_data/temp_valid_partner.dta", replace + +restore + +* Address multiple partners +drop if to_drop == 1 + +* Merge in partner info +merge 1:1 year idPartner using "$dir_data/temp_valid_partner.dta" + +drop if to_drop_ptnr == 1 + +* Only keep those in which partner's info is available +keep if _m == 3 + +* Select sample +keep if valid_employed == 1 + +keep if inrange(demAge,18,65) + +collapse (mean) valid_ptnr_employed [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulation data +use year sim_employed sim_student sim_inactive sim_retired demAge run /// + idPers idPartner idBu /// + using "$dir_data/simulation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != . 
+ +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename sim_* sim_ptnr_* + +save "$dir_data/temp_sim_partner.dta", replace + +restore + +* Merge in partner info +merge 1:1 year idPartner run using "$dir_data/temp_sim_partner.dta" +drop _m + +* Compute share of those who are employed and in a partnership whose partner +* is also employed + +* Select sample +keep if sim_employed == 1 + +keep if inrange(demAge,18,65) + +* Compute mean and sd +collapse (mean) sim_ptnr_employed, by(run year) + +collapse (mean) sim_ptnr_employed /// + (sd) sim_ptnr_employed_sd = sim_ptnr_employed /// + , by(year) + +* Compute 95% confidence interval +foreach varname in sim_ptnr_employed { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + + +merge 1:1 year using "$dir_data/temp_valid_stats.dta" +keep if _m == 3 +drop _m + + +twoway /// +(rarea sim_ptnr_employed_high sim_ptnr_employed_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_ptnr_employed year, sort color(green) /// + legend(label(2 " UKHLS"))), /// + title("Economic Activity Status") /// + subtitle("Share of employed partnered whose partner is also employed") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(vsmall)) /// + note("Notes: Ages 18-65. 
", /// + size(vsmall)) + +graph export /// + "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_partnerhip_shares_emp_emp_18_65.jpg", /// + replace width(2400) height(1350) quality(100) + + + +* Employed share patterned with LF non-employed + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired demAge /// + demPartnerStatus idPers idPartner idBu idHh /// + using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != -9 + +sort idPartner year + +* Address multiple partners +gen to_drop = 1 if idPartner == idPartner[_n-1] & year == year[_n-1] +replace to_drop = 1 if to_drop[_n+1] == 1 & idPartner == idPartner[_n+1] + +* Collect partner employement information +preserve + +drop idPartner demAge +rename idPers idPartner +rename valid_* valid_ptnr_* +rename to_drop to_drop_ptnr + +save "$dir_data/temp_valid_partner.dta", replace + +restore + +* Address multiple partners +drop if to_drop == 1 + +* Merge in partner info +merge 1:1 year idPartner using "$dir_data/temp_valid_partner.dta" + +drop if to_drop_ptnr == 1 + +* Only keep those in which partner's info is available +keep if _m == 3 + +* Select sample +keep if valid_employed == 1 + +drop if demAge > 65 +drop if demAge < 18 + +collapse (mean) valid_ptnr_inactive [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulation data +use year sim_employed sim_student sim_inactive sim_retired demAge run /// + idPers idPartner idBu /// + using "$dir_data/simulation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != . 
+ +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename sim_* sim_ptnr_* + +save "$dir_data/temp_sim_partner.dta", replace + +restore + +* Merge in partner info +merge 1:1 year idPartner run using "$dir_data/temp_sim_partner.dta" +drop _m + +* Compute share of those who are employed and in a partnership whose partner +* is LF non-employed + +* Select sample +keep if sim_employed == 1 + +drop if demAge > 65 +drop if demAge < 18 + +* Compute mean and sd +collapse (mean) sim_ptnr_inactive, by(run year) + +collapse (mean) sim_ptnr_inactive /// + (sd) sim_ptnr_inactive_sd = sim_ptnr_inactive /// + , by(year) + +* Compute 95% confidence interval +foreach varname in sim_ptnr_inactive { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + + +merge 1:1 year using "$dir_data/temp_valid_stats.dta" +keep if _m == 3 +drop _m + + +twoway /// +(rarea sim_ptnr_inactive_high sim_ptnr_inactive_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_ptnr_inactive year, sort color(green) /// + legend(label(2 " UKHLS"))), /// + title("Economic Activity Status") /// + subtitle("Share of employed partnered & partner is non-employed") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(vsmall)) /// + note("Notes: Ages 18-65.", /// + size(vsmall)) + + +graph export /// + "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_partnerhip_shares_emp_non_18_65.jpg", /// + replace width(2400) height(1350) quality(100) + + +* Employed partnered with not LF + +* Prepare validation data +use year dwt valid_employed valid_student valid_inactive valid_retired /// + demAge demPartnerStatus idPers idPartner idBu idHh /// + using "$dir_data/${country}-eusilc_validation_sample.dta", 
clear + +* Only keep those with a partner +keep if idPartner != -9 + +sort idPartner year + +* Address multiple partners +gen to_drop = 1 if idPartner == idPartner[_n-1] & year == year[_n-1] +replace to_drop = 1 if to_drop[_n+1] == 1 & idPartner == idPartner[_n+1] + +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename valid_* valid_ptnr_* +rename to_drop to_drop_ptnr + +save "$dir_data/temp_valid_partner.dta", replace + +restore + +* Address multiple partners +drop if to_drop == 1 + +* Merge in partner info +merge 1:1 year idPartner using "$dir_data/temp_valid_partner.dta" + +drop if to_drop_ptnr == 1 + +* Only keep those in which partner's info is available +keep if _m == 3 + +* Select sample +keep if valid_employed == 1 + +drop if demAge > 65 +drop if demAge < 18 + +gen valid_ptnr_out = 0 +replace valid_ptnr_out = valid_ptnr_student + valid_ptnr_retired + +collapse (mean) valid_ptnr_out [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulation data +use year sim_employed sim_student sim_inactive sim_retired demAge run /// + idPers idPartner idBu /// + using "$dir_data/simulation_sample.dta", clear + +* Only keep those with a partner +keep if idPartner != . 
+ +* Collect partner employment information +preserve + +drop idPartner demAge +rename idPers idPartner +rename sim_* sim_ptnr_* + +save "$dir_data/temp_sim_partner.dta", replace + +restore + +* Merge in partner info +merge 1:1 year idPartner run using "$dir_data/temp_sim_partner.dta" +drop _m + +* Compute share of those who are employed and in a partnership whose partner +* is not in the labour force (student or retired) + +* Select sample +keep if sim_employed == 1 + +drop if demAge > 65 +drop if demAge < 18 + + +gen sim_ptnr_out = 0 +replace sim_ptnr_out = sim_ptnr_student + sim_ptnr_retired + + +* Compute mean and sd +collapse (mean) sim_ptnr_out, by(run year) + +collapse (mean) sim_ptnr_out /// + (sd) sim_ptnr_out_sd = sim_ptnr_out /// + , by(year) + +* Compute 95% confidence interval +foreach varname in sim_ptnr_out { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta" +keep if _m == 3 +drop _m + +twoway /// +(rarea sim_ptnr_out_high sim_ptnr_out_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_ptnr_out year, sort color(green) /// + legend(label(2 " UKHLS"))), /// + title("Economic Activity Status") /// + subtitle("Share of employed partnered & partner is not LF") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(vsmall)) /// + note("Notes: Ages 18-65. 
", /// + size(vsmall)) + +graph export /// +"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_partnerhip_shares_emp_notlf_18_65.jpg", /// + replace width(2400) height(1350) quality(100) + + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_02_plot_education_level.do b/validation/02_simulation_validation/do_files/04_02_plot_education_level.do new file mode 100644 index 0000000..4549f20 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_02_plot_education_level.do @@ -0,0 +1,721 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Education +* AUTHORS: Ashley Burdett +* LAST UPDATE: April 2026 (AB) +* COUNTRY: PL +******************************************************************************** +* NOTES: This do file plots simulated and UKHLS education. +* Unable to look at transitions because use X-sectional +* SILC data. +******************************************************************************** + +******************************************************************************** +* 0 : Programmes +******************************************************************************** + +* Time series plot, all +cap program drop make_edu_plot + +program define make_edu_plot + syntax, subtitle(string) saving(string) note(string) + + twoway /// + (rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// + legend(label(1 "High education, SimPaths"))) /// + (line valid_edu_high year, sort color(green) /// + legend(label(2 "High education, UKHLS"))) /// + (rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// + legend(label(3 "Medium education, SimPaths"))) /// + (line valid_edu_med year, sort color(blue) /// + legend(label(4 "Medium education, UKHLS"))) /// + (rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// + legend(label(5 "Low education, SimPaths"))) /// + (line 
valid_edu_low year, sort color(red) /// + legend(label(6 "Low education, UKHLS"))) /// + (rarea sim_edu_na_high sim_edu_na_low year, sort color(purple%20) /// + legend(label(7 "Initial education spell, SimPaths"))) /// + (line valid_edu_na year, sort color(purple) /// + legend(label(8 "Initial education spell, UKHLS"))), /// + title("Educational Attainment") /// + subtitle("`subtitle'") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + note(`note', size(vsmall)) + + graph export "$dir_output_files/education/`saving'.jpg", replace width(2400) height(1350) quality(100) +end + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Educational attainment +******************************************************************************** + +******************************************************************************** +* 1.1.1 : Educational attainment - 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + demAge labC4 using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Compute annual shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na labC4 /// + demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Compute shares and standard deviation +collapse 
(mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, by(run year) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + sim_edu_med_sd = sim_edu_med /// + sim_edu_low_sd = sim_edu_low /// + sim_edu_na_sd = sim_edu_na /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +make_edu_plot, /// + subtitle("Ages 16-65") /// + saving("validation_${country}_education_ts_16_65_both") /// + note(`""Notes:""') + + +******************************************************************************** +* 1.1.2 : Educational attainment - 16-65, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + demMaleFlag labC4 demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Compute annual shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demMaleFlag /// + labC4 demAge using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, /// + by(run year demMaleFlag) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + (sd) sim_edu_med_sd = sim_edu_med /// + (sd) sim_edu_low_sd = sim_edu_low /// + (sd) sim_edu_na_sd = sim_edu_na /// + , 
by(year demMaleFlag ) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Female +preserve + +keep if demMaleFlag == 0 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 16-65, females") /// + saving("validation_${country}_education_ts_16_65_female") /// + note(`""Notes:""') + +restore + +* male +preserve + +keep if demMaleFlag == 1 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 16-65, males") /// + saving("validation_${country}_education_ts_16_65_male") /// + note(`""Notes:""') + +restore + + +******************************************************************************** +* 1.1.3 : Educational attainment - 16-30 +******************************************************************************** + +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + demAge labC4 demAge using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 30 +drop if demAge < 16 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge labC4 /// + using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 30 +drop if demAge < 16 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, by(run year) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + sim_edu_med_sd = sim_edu_med /// + sim_edu_low_sd = sim_edu_low /// + sim_edu_na_sd = sim_edu_na /// + , by(year) + +* Approx 95% 
confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +make_edu_plot, /// + subtitle("Ages 16-30") /// + saving("validation_${country}_education_ts_16_30_both") /// + note(`""Notes:""') + + +******************************************************************************** +* 1.1.4 : Educational attainment - 16-30, by gender +******************************************************************************** +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na demAge /// + demMaleFlag labC4 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 30 +drop if demAge < 16 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge /// + demMaleFlag labC4 using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 30 +drop if demAge < 16 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, /// + by(run year demMaleFlag) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + (sd) sim_edu_med_sd = sim_edu_med /// + (sd) sim_edu_low_sd = sim_edu_low /// + (sd) sim_edu_na_sd = sim_edu_na /// + , by(year demMaleFlag) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year demMaleFlag using 
"$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Female +preserve + +keep if demMaleFlag == 0 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 16-30, females") /// + saving("validation_${country}_education_ts_16_30_female") /// + note(`""Notes:""') + +restore + +* male +preserve + +keep if demMaleFlag == 1 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 16-30, males") /// + saving("validation_${country}_education_ts_16_30_male") /// + note(`""Notes:""') + +restore + + +******************************************************************************** +* 1.1.5 : Educational attainment - 31-40 +******************************************************************************** + +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + demAge labC4 demAge using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 40 +drop if demAge < 31 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge labC4 /// + using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 40 +drop if demAge < 31 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, by(run year) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + sim_edu_med_sd = sim_edu_med /// + sim_edu_low_sd = sim_edu_low /// + sim_edu_na_sd = sim_edu_na /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + 
+make_edu_plot, /// + subtitle("Ages 31-40") /// + saving("validation_${country}_education_ts_31_40_both") /// + note(`""Notes:""') + + +******************************************************************************** +* 1.1.6 : Educational attainment 31-40, by gender +******************************************************************************** +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na demAge /// + demMaleFlag labC4 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 40 +drop if demAge < 31 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge /// + demMaleFlag labC4 using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 40 +drop if demAge < 31 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, /// + by(run year demMaleFlag) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + (sd) sim_edu_med_sd = sim_edu_med /// + (sd) sim_edu_low_sd = sim_edu_low /// + (sd) sim_edu_na_sd = sim_edu_na /// + , by(year demMaleFlag) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Female +preserve + +keep if demMaleFlag == 0 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 31-40, females") /// + saving("validation_${country}_education_ts_31_40_female") /// + note(`""Notes:""') + +restore + +* male +preserve + +keep if 
demMaleFlag == 1 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 31-40, males") /// + saving("validation_${country}_education_ts_31_40_male") /// + note(`""Notes:""') + +restore + + +******************************************************************************** +* 1.1.7 : Educational attainment - 41-65 +******************************************************************************** + +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + demAge labC4 demAge using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 65 +drop if demAge < 41 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge labC4 /// + using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 65 +drop if demAge < 41 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, by(run year) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + sim_edu_med_sd = sim_edu_med /// + sim_edu_low_sd = sim_edu_low /// + sim_edu_na_sd = sim_edu_na /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +make_edu_plot, /// + subtitle("Ages 41-65") /// + saving("validation_${country}_education_ts_41_65_both") /// + note(`""Notes:""') + + +******************************************************************************** +* 1.1.8 : Educational attainment 41-65, by gender 
+******************************************************************************** +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na demAge /// + demMaleFlag labC4 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 65 +drop if demAge < 41 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge /// + demMaleFlag labC4 using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 65 +drop if demAge < 41 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, /// + by(run year demMaleFlag) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + (sd) sim_edu_med_sd = sim_edu_med /// + (sd) sim_edu_low_sd = sim_edu_low /// + (sd) sim_edu_na_sd = sim_edu_na /// + , by(year demMaleFlag) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Female +preserve + +keep if demMaleFlag == 0 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 41-65, females") /// + saving("validation_${country}_education_ts_41_65_female") /// + note(`""Notes:""') + +restore + +* Male +preserve + +keep if demMaleFlag == 1 + +* Plot figure +make_edu_plot, /// + subtitle("Ages 41-65, males") /// + saving("validation_${country}_education_ts_41_65_male") /// + note(`""Notes:""') + +restore + + +******************************************************************************** +* 
1.1.9 : Educational attainment - 66-70 +******************************************************************************** + +* Prepare validation data +use year dwt valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + demAge labC4 demAge using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge > 70 +drop if demAge < 66 +drop labC4 + +* Compute shares +collapse (mean) valid_edu_high valid_edu_med valid_edu_low valid_edu_na /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year sim_edu_high sim_edu_med sim_edu_low sim_edu_na demAge labC4 /// + using "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge > 70 +drop if demAge < 66 + +* Compute shares and sd +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na, by(run year) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low sim_edu_na /// + (sd) sim_edu_high_sd = sim_edu_high /// + sim_edu_med_sd = sim_edu_med /// + sim_edu_low_sd = sim_edu_low /// + sim_edu_na_sd = sim_edu_na /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_edu_high sim_edu_med sim_edu_low sim_edu_na { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +make_edu_plot, /// + subtitle("Ages 66-70") /// + saving("validation_${country}_education_ts_66_70_both") /// + note(`""Notes:""') + + +******************************************************************************** +* 1.2 : Educational attainment when leave education +******************************************************************************** + +******************************************************************************** +* 1.2.1 : Educational attainment when leave education - 16 - 65 
+******************************************************************************** + +* Prepare validation data +use year idPers dwt valid_edu_high valid_edu_med valid_edu_low labC4 /// + demAge using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Select relevant observations +sort idPers year +gen left_edu = 1 if idPers == idPers[_n-1] & /// + labC4 != 2 & labC4[_n-1] == 2 & year == year[_n-1]+1 + +keep if left_edu == 1 + +collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year idPers sim_edu_high sim_edu_med sim_edu_low labC4 demAge /// + using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Select relevant observations +sort idPers year +gen left_edu_sim = 1 if idPers == idPers[_n-1] & /// + labC4 != "Student" & labC4[_n-1] == "Student" & year == year[_n-1]+1 + +keep if left_edu_sim == 1 + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year) + +collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// + (sd) sim_edu_high_sd = sim_edu_high /// + sim_edu_med_sd = sim_edu_med /// + sim_edu_low_sd = sim_edu_low /// + , by(year) + +foreach varname in sim_edu_high sim_edu_med sim_edu_low { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway /// +(rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// + legend(label(1 "High education, simulated"))) /// +(line valid_edu_high year, sort color(green) /// + legend(label(2 "High education, UKHLS"))) /// +(rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// + legend(label(3 "Medium education, simulated"))) /// +(line valid_edu_med year, sort color(blue) /// + legend(label(4 "Medium education, 
UKHLS"))) /// +(rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// + legend(label(5 "Low education, simulated"))) /// +(line valid_edu_low year, sort color(red) /// + legend(label(6 "Low education, UKHLS"))), /// + title("Educational Attainment When Leave Education") /// + subtitle("Ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + note("Notes: ", /// + size(vsmall)) + +graph export /// +"$dir_output_files/education/validation_${country}_leave_education_ts_16_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_03_plot_gross_income.do b/validation/02_simulation_validation/do_files/04_03_plot_gross_income.do new file mode 100644 index 0000000..8c7aa81 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_03_plot_gross_income.do @@ -0,0 +1,991 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Gross income +* AUTHORS: Ashley Burdett +* LAST UPDATE: April 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - Benefit unit amounts +******************************************************************************** + +* Prepare validation data +use year dwt idBu idPers demAge valid_yGrossBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", 
clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossBuLevelYear, d + + replace valid_yGrossBuLevelYear = . if /// + valid_yGrossBuLevelYear < r(p1) | /// + valid_yGrossBuLevelYear > r(p99) + +} + +collapse (mean) valid_yGrossBuLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers idBu demAge sim_yGrossBuLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yGrossBuLevelYear, d + + replace sim_yGrossBuLevelYear = . if sim_yGrossBuLevelYear < r(p1) | /// + sim_yGrossBuLevelYear > r(p99) + +} + +collapse (mean) sim_yGrossBuLevelYear, by(run year) + +collapse (mean) sim_yGrossBuLevelYear /// + (sd) sim_yGrossBuLevelYear_sd = sim_yGrossBuLevelYear /// + , by(year) + +foreach varname in sim_yGrossBuLevelYear { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +twoway /// + (rarea sim_yGrossBuLevelYear_high sim_yGrossBuLevelYear_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// + (line valid_yGrossBuLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Benefit Unit Gross Income") /// + subtitle("") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Series represents average benefit unit gross income through time. 
Gross income is the sum of capital income, private" "pension income and employment income. One observation per benefit unit plotted. Amounts in 2015 prices. Top and" "bottom percentiles trimmed.", /// + size(vsmall)) + +graph export /// +"$dir_output_files/income/gross_income/validation_${country}_gross_income_bu_ts.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time, individual level amounts +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yGrossPersLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossPersLevelYear, d + + replace valid_yGrossPersLevelYear = . if /// + valid_yGrossPersLevelYear < r(p1) | /// + valid_yGrossPersLevelYear > r(p99) + +} + +collapse (mean) valid_yGrossPersLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yGrossPersLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yGrossPersLevelYear, d + + replace sim_yGrossPersLevelYear = . 
if /// + sim_yGrossPersLevelYear < r(p1) | /// + sim_yGrossPersLevelYear > r(p99) + +} + +collapse (mean) sim_yGrossPersLevelYear, by(run year) + +collapse (mean) sim_yGrossPersLevelYear /// + (sd) sim_yGrossPersLevelYear_sd = sim_yGrossPersLevelYear, /// + by(year) + +foreach varname in sim_yGrossPersLevelYear { + + gen `varname'_hi = `varname' + 1.96*`varname'_sd + gen `varname'_lo = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway /// +(rarea sim_yGrossPersLevelYear_hi sim_yGrossPersLevelYear_lo /// + year, sort color(green%20) /// + legend(label(1 "SimPaths"))) /// +(line valid_yGrossPersLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Individual Gross Income") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Series represents average individual gross income through time. Gross income is the sum of capital income, private pension" "income and employment income. Values in 2015 prices. 
Top and bottom percentiles trimmed.", /// + size(vsmall)) + +graph export /// +"$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.2.1 : Mean values over time, individual level amounts, by gender +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yGrossPersLevelYear demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) +keep if demMaleFlag == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossPersLevelYear, d + + replace valid_yGrossPersLevelYear = . if /// + valid_yGrossPersLevelYear < r(p1) | /// + valid_yGrossPersLevelYear > r(p99) + +} + +collapse (mean) valid_yGrossPersLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year demAge sim_yGrossPersLevelYear demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) +keep if demMaleFlag == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yGrossPersLevelYear, d + + replace sim_yGrossPersLevelYear = . 
if /// + sim_yGrossPersLevelYear < r(p1) | /// + sim_yGrossPersLevelYear > r(p99) + +} + +collapse (mean) sim_yGrossPersLevelYear, by(run year) + +collapse (mean) sim_yGrossPersLevelYear /// + (sd) sim_yGrossPersLevelYear_sd = /// + sim_yGrossPersLevelYear, by(year) + +foreach varname in sim_yGrossPersLevelYear { + + gen `varname'_hi = `varname' + 1.96*`varname'_sd + gen `varname'_lo = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway /// +(rarea sim_yGrossPersLevelYear_hi sim_yGrossPersLevelYear_lo /// + year, sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_yGrossPersLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Individual Gross Income") /// + subtitle("Ages 18-65, males") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Series represents average individual gross income through time. Gross income is the sum of capital income, private pension" "income and employment income. Values in 2015 prices. Top and bottom percentiles trimmed.", /// + size(vsmall)) + +graph export /// +"$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_ts_18_65_male.jpg", /// + replace width(2400) height(1350) quality(100) + + +* Female +* Prepare validation data +use year demAge dwt valid_yGrossPersLevelYear demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +keep if demMaleFlag == 0 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossPersLevelYear, d + + replace valid_yGrossPersLevelYear = . 
if /// + valid_yGrossPersLevelYear < r(p1) | /// + valid_yGrossPersLevelYear > r(p99) + +} + +collapse (mean) valid_yGrossPersLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year demAge sim_yGrossPersLevelYear demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +keep if demMaleFlag == 0 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yGrossPersLevelYear, d + + replace sim_yGrossPersLevelYear= . if /// + sim_yGrossPersLevelYear< r(p1) | /// + sim_yGrossPersLevelYear> r(p99) + +} + +collapse (mean) sim_yGrossPersLevelYear, by(run year) + +collapse (mean) sim_yGrossPersLevelYear /// + (sd) sim_yGrossPersLevelYear_sd = /// + sim_yGrossPersLevelYear, by(year) + +foreach varname in sim_yGrossPersLevelYear { + + gen `varname'_hi = `varname' + 1.96*`varname'_sd + gen `varname'_lo = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway /// +(rarea sim_yGrossPersLevelYear_hi /// + sim_yGrossPersLevelYear_lo year, sort color(green%20) /// + legend(label(1 "SimPaths"))) /// +(line valid_yGrossPersLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Individual Gross Income") /// + subtitle("Ages 18-65, females") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Series represents average individual gross income through time. Gross income is the sum of capital income, private pension" "income and employment income. Values in 2015 prices. 
Top and bottom percentiles trimmed.", /// + size(vsmall)) + +graph export /// +"$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_ts_18_65_female.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all + + +/******************************************************************************* +* 2 : Histograms +*******************************************************************************/ + +/******************************************************************************* +* 2.1 : Histograms - Benefit unit gross income by year, and by category of +weekly labour supply +*******************************************************************************/ + +* Prepare validation data +use year demAge dwt valid_yGrossBuLevelYear valid_labHrsWorkEnumWeek /// + using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossBuLevelYear, d + + replace valid_yGrossBuLevelYear = . 
if /// + valid_yGrossBuLevelYear < r(p1) | /// + valid_yGrossBuLevelYear > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yGrossBuLevelYear if year == `year', /// + width(750) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yGrossBuLevelYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } + +} + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year demAge sim_yGrossBuLevelYear sim_labHrsWorkEnumWeek using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yGrossBuLevelYear, d + + replace sim_yGrossBuLevelYear = . if /// + sim_yGrossBuLevelYear < r(p1) | sim_yGrossBuLevelYear > r(p99) + +} + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + +* Plot sub-figures +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +//local year = 2010 + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yGrossBuLevelYear if year == `year', /// + width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + quietly sum max_value, meanonly + if (r(N) == 0 | missing(r(max)) | r(max) <= 0) { + local max_y = 1 + local steps = 0.3333333 + } + else { + local max_y = 1.25*r(max) + local steps = `max_y'/3 + } + + + twoway (hist sim_yGrossBuLevelYear if year == `year', width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yGrossBuLevelYear if year == `year', width(750) /// + color(red%30) legend(label(2 "SILC"))) , /// + title("ALL hours") /// + name(gross_inc_`year'_all, replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y',labsize(vsmall)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + foreach ls in $ls_cat { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yGrossBuLevelYear if /// + year == `year' & sim_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + quietly sum max_value, meanonly + if (r(N) == 0 | missing(r(max)) | r(max) <= 0) { + local max_y = 1 + local steps = 0.3333333 + } + else { + local max_y = 1.25*r(max) + local steps = `max_y'/3 + } + + * Plot by weekly hours work + twoway (hist sim_yGrossBuLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) color(green%30) /// + legend(label(1 "SimPaths"))) /// + (hist valid_yGrossBuLevelYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(750) color(red%30) /// + legend(label(2 "SILC"))) , /// + title("`ls' hours") /// + name(gross_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y',labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +* Combine plots by year +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forvalues year = `min_year'/`max_year' { + + grc1leg gross_inc_`year'_all /// + gross_inc_`year'_ZERO /// + gross_inc_`year'_TWENTY /// + gross_inc_`year'_FORTY /// + gross_inc_`year'_FIFTY , /// + title("Benefit Unit Gross Income by Weekly Hours of Work") /// + subtitle("`year'") /// + legendfrom(gross_inc_`year'_ZERO) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Gross income is the sum of capital income, private pension income and employment income." 
"Individual observations of benefit unit amounts plotted.", /// + size(vsmall)) + + graph export "$dir_output_files/income/gross_income/validation_${country}_gross_income_bu_dist_`year'_18_65.png", /// + replace width(2400) height(1350) + +} + +graph drop _all + + +/******************************************************************************* +* 2.2 : Histograms - Individual gross income by year, and by category of weekly +labour supply, by gender +*******************************************************************************/ + +* Males + +* Prepare validation data +use year demAge dwt valid_yGrossBuLevelYear valid_labHrsWorkEnumWeek /// + demMaleFlag using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +keep if demMaleFlag == 1 +drop demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossBuLevelYear, d + + replace valid_yGrossBuLevelYear = . if /// + valid_yGrossBuLevelYear < r(p1) | /// + valid_yGrossBuLevelYear > r(p99) + +} + + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yGrossBuLevelYear if /// + year == `year', width(750) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yGrossBuLevelYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yGrossBuLevelYear sim_labHrsWorkEnumWeek /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +keep if demMaleFlag == 1 +drop demMaleFlag + +* Trim outliers +if 
"$trim_outliers" == "true" { + + sum sim_yGrossBuLevelYear, d + + replace sim_yGrossBuLevelYear= . if /// + sim_yGrossBuLevelYear < r(p1) | /// + sim_yGrossBuLevelYear > r(p99) + +} + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + +* Plot sub-figures +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yGrossBuLevelYear if year == `year', /// + width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + quietly sum max_value, meanonly + if (r(N) == 0 | missing(r(max)) | r(max) <= 0) { + local max_y = 1 + local steps = 0.3333333 + } + else { + local max_y = 1.25*r(max) + local steps = `max_y'/3 + } + + * Plot all hours + twoway (hist sim_yGrossBuLevelYear if year == `year', width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yGrossBuLevelYear if year == `year', width(750) /// + color(red%30) legend(label(2 "SILC"))) , /// + title("ALL hours") /// + name(ind_gross_inc_`year'_all, replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + foreach ls in $ls_cat { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yGrossBuLevelYear if /// + year == `year' & sim_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + quietly sum max_value, meanonly + if (r(N) == 0 | missing(r(max)) | r(max) <= 0) { + local max_y = 1 + local steps = 0.3333333 + } + else { + local max_y = 1.25*r(max) + local steps = `max_y'/3 + } + + * Plot by weekly hours work + twoway (hist sim_yGrossBuLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) color(green%30) /// + legend(label(1 "SimPaths"))) /// + (hist valid_yGrossBuLevelYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(750) color(red%30) /// + legend(label(2 "SILC"))) , /// + title("`ls' hours") /// + name(ind_gross_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +* Combine plots by year +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forvalues year = `min_year'/`max_year' { + + grc1leg ind_gross_inc_`year'_all /// + ind_gross_inc_`year'_ZERO /// + ind_gross_inc_`year'_TWENTY /// + ind_gross_inc_`year'_FORTY /// + ind_gross_inc_`year'_FIFTY , /// + title("Individual Gross Income by Weekly Hours of Work") /// + subtitle("`year', males") /// + legendfrom(ind_gross_inc_`year'_ZERO) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Sample include males age 18-65. Top and bottom percentiles trimmed. 
Gross income is" "the sum of capital income, private pension income and employment income.", /// + size(vsmall)) + + graph export "$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_dist_`year'_male.png", /// + replace width(2400) height(1350) + +} + +graph drop _all + + +* Females + +* Prepare validation data +use year demAge dwt valid_yGrossBuLevelYear valid_labHrsWorkEnumWeek /// + demMaleFlag using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +keep if demMaleFlag == 0 +drop demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yGrossBuLevelYear, d + + replace valid_yGrossBuLevelYear = . if /// + valid_yGrossBuLevelYear < r(p1) | /// + valid_yGrossBuLevelYear > r(p99) + +} + + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yGrossBuLevelYear if /// + year == `year' , width(750) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yGrossBuLevelYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yGrossBuLevelYear sim_labHrsWorkEnumWeek /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +keep if demMaleFlag == 0 +drop demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yGrossBuLevelYear, d + + replace sim_yGrossBuLevelYear = . 
if sim_yGrossBuLevelYear< r(p1) | /// + sim_yGrossBuLevelYear> r(p99) + +} + +keep if run == 1 + + +append using "$dir_data/temp_valid_stats.dta" + +* Plot sub-figures +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yGrossBuLevelYear if year == `year', /// + width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/3 + + * Plot all hours + twoway (hist sim_yGrossBuLevelYear if year == `year', width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yGrossBuLevelYear if year == `year', width(750) /// + color(red%30) legend(label(2 "SILC"))) , /// + title("ALL hours") /// + name(ind_gross_inc_`year'_all, replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + foreach ls in $ls_cat { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yGrossBuLevelYear if /// + year == `year' & sim_labHrsWorkEnumWeek == "`ls'", /// + width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/3 + + * Plot by weekly hours work + twoway (hist sim_yGrossBuLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yGrossBuLevelYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(750) color(red%30) /// + legend(label(2 "SILC"))) , /// + title("`ls' hours") /// + name(ind_gross_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +* Combine plots by year +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forvalues year = `min_year'/`max_year' { + + grc1leg ind_gross_inc_`year'_all /// + ind_gross_inc_`year'_ZERO /// + ind_gross_inc_`year'_TWENTY /// + ind_gross_inc_`year'_FORTY /// + ind_gross_inc_`year'_FIFTY, /// + title("Individual Gross Income by Weekly Hours of Work") /// + subtitle("`year', females") /// + legendfrom(ind_gross_inc_`year'_ZERO) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Sample include females age 18-65. Top and bottom percentiles trimmed. Gross income is" "the sum of capital income, private pension income and employment income.", /// + size(vsmall)) + + graph export "$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_dist_`year'_female.png", /// + replace width(2400) height(1350) + +} + +graph drop _all + + + +/* + +* Investigation into who the people are with high working hours and low gross +* income +/* +Note plot ben unit observations using individual level data. + +Components of gross income. 
+ +Gross personal income components +• PY010G - Gross employee cash or near cash employee income +• PY050G - Gross cash benefits or losses from self-employment + (including royalties) +• PY080G - Pensions received from individual private plans (other than those + covered under ESSPROS) + +Plus gross income components at household level +• HY040G - Income from rental of a property or land +• HY080G - Regular inter-household cash transfers received +• HY090G - Interests, dividends, profit from capital investments in + unincorporated business +• HY110G - Income received by people aged under 16 +*/ + +* Explore 2018 FIFTY hours +use "$dir_data/SILC_validation_full_sample.dta", clear + +keep if year == 2018 & labHrsWorkEnumWeek == "FIFTY" + +order idperson idbenefit lhw valid_yGrossBuLevelYear /// + y_gross_labour_person valid_wage_hour /// + py010g* py050g py080g /// + hy080g_pc hy110g_pc hy040g_pc hy090g_pc missing* + +fre missing_py010g missing_py050g missing_py080g missing_hy080g /// + missing_hy110g missing_hy040g missing_hy090g missing_lhw if /// + valid_yGrossBuLevelYear == 0 // none missing seems to be in the data + + diff --git a/validation/02_simulation_validation/do_files/04_04_plot_gross_labour_income.do b/validation/02_simulation_validation/do_files/04_04_plot_gross_labour_income.do new file mode 100644 index 0000000..09b7ec1 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_04_plot_gross_labour_income.do @@ -0,0 +1,583 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Gross labour income +* AUTHORS: Ashley Burdett +* LAST UPDATE: Feb 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: Plotted using individual level data +* => multiple observations per ben unit. 
+******************************************************************************** + +******************************************************************************** +* 1 : Mean labour income +******************************************************************************** + +******************************************************************************** +* 1.1: Mean labour income - benefit unit +******************************************************************************** + +* Prepare validation data +use year idBu idPers demAge dwt labC4 valid_yEmpBuGrossLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == 1 + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yEmpBuGrossLevelYear, d + + replace valid_yEmpBuGrossLevelYear = . if /// + valid_yEmpBuGrossLevelYear < r(p1) | valid_yEmpBuGrossLevelYear > r(p99) + +} + +collapse (mean) valid_yEmpBuGrossLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare SimPaths data +use run year idPers idBu demAge labC4 sim_yEmpBuGrossLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yEmpBuGrossLevelYear, d + + replace sim_yEmpBuGrossLevelYear = .
if /// + sim_yEmpBuGrossLevelYear < r(p1) | sim_yEmpBuGrossLevelYear > r(p99) + +} + +collapse (mean) sim_yEmpBuGrossLevelYear, by(run year) + +collapse (mean) sim_yEmpBuGrossLevelYear /// + (sd) sim_yEmpBuGrossLevelYear_sd = sim_yEmpBuGrossLevelYear /// + , by(year) + +foreach varname in sim_yEmpBuGrossLevelYear { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway /// +(rarea sim_yEmpBuGrossLevelYear_high sim_yEmpBuGrossLevelYear_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_yEmpBuGrossLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Benefit Unit Gross Labour Income") /// + subtitle("") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + graphregion(color(white)) /// + legend(size(small)) /// + note("Note: Amounts in 2015 prices. Top and bottom percentiles trimmed.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_bu_ts.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.2: Mean labour income - individual +******************************************************************************** + +* Prepare validation data +use year demAge dwt labC4 valid_yEmpPersGrossLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == 1 + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yEmpPersGrossLevelYear, d + + replace valid_yEmpPersGrossLevelYear = .
if /// + valid_yEmpPersGrossLevelYear < r(p1) | /// + valid_yEmpPersGrossLevelYear > r(p99) + +} + +collapse (mean) valid_yEmpPersGrossLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare SimPaths data +use run year demAge labC4 sim_yEmpPersGrossLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yEmpPersGrossLevelYear, d + + replace sim_yEmpPersGrossLevelYear = . if /// + sim_yEmpPersGrossLevelYear < r(p1) | sim_yEmpPersGrossLevelYear > r(p99) + +} + +collapse (mean) sim_yEmpPersGrossLevelYear, by(run year) + +collapse (mean) sim_yEmpPersGrossLevelYear /// + (sd) sim_yEmpPersGrossLevelYear_sd = sim_yEmpPersGrossLevelYear /// + , by(year) + +foreach varname in sim_yEmpPersGrossLevelYear { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway /// +(rarea sim_yEmpPersGrossLevelYear_high sim_yEmpPersGrossLevelYear_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_yEmpPersGrossLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Gross Labour Income") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + graphregion(color(white)) /// + legend(size(small)) /// + note("Note: Amounts at the individual level, individual data plotted. Statistics calculated on the sample of employed individuals" "ages 18-65. Amounts in 2015 prices. 
Top and bottom percentiles trimmed.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/income/gross_labour_income/validation_${country}_ind_gross_labour_income_ts_18_65.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 2 : Histograms +******************************************************************************** + +******************************************************************************** +* 2.1 : Histograms - working age, benefit unit +******************************************************************************** + +* Prepare validation data +use year idPers idBu demAge dwt labC4 valid_yEmpBuGrossLevelYear /// + valid_labHrsWorkEnumWeek using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == 1 +drop labC4 + +keep if inrange(demAge,18,65) + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yEmpBuGrossLevelYear, d + + replace valid_yEmpBuGrossLevelYear = .
if /// + valid_yEmpBuGrossLevelYear < r(p1) | valid_yEmpBuGrossLevelYear > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yEmpBuGrossLevelYear if year == `year', /// + width(750) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yEmpBuGrossLevelYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare SimPaths data +use run year idPers idBu demAge labC4 sim_yEmpBuGrossLevelYear /// + sim_labHrsWorkEnumWeek using "$dir_data/simulation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" +drop labC4 + +keep if inrange(demAge,18,65) + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yEmpBuGrossLevelYear, d + + replace sim_yEmpBuGrossLevelYear = .
if /// + sim_yEmpBuGrossLevelYear < r(p1) | sim_yEmpBuGrossLevelYear > r(p99) + + } + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + +* Plot sub-figures +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yEmpBuGrossLevelYear if year == `year', /// + width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + + * Plot all hours + twoway (hist sim_yEmpBuGrossLevelYear if year == `year' , width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yEmpBuGrossLevelYear if year == `year' , width(750) /// + color(red%30) legend(label(2 "SILC"))) , /// + subtitle("ALL hours") /// + name(gross_lab_inc_`year'_all, replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y',labsize(vsmall)) /// + graphregion(color(white)) + + + drop d_sim v1 max_d_sim max_value + + * Plot by weekly hours work + foreach ls in $ls_cat_labour { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yEmpBuGrossLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot by weekly hours work + twoway (hist sim_yEmpBuGrossLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) color(green%30) /// + legend(label(1 "SimPaths"))) /// + (hist valid_yEmpBuGrossLevelYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(750) color(red%30) /// + legend(label(2 "SILC"))) , /// + subtitle("`ls' hours") /// + name(gross_lab_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y',labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +* Combine plots by year +forvalues year = `min_year'/`max_year' { + + grc1leg gross_lab_inc_`year'_all /// + gross_lab_inc_`year'_TWENTY /// + gross_lab_inc_`year'_FORTY /// + gross_lab_inc_`year'_FIFTY, /// + title("Benefit Unit Gross Labour Income by Weekly Hours of Work") /// + subtitle("`year'") /// + legendfrom(gross_lab_inc_`year'_TWENTY) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Top and bottom percentiles trimmed.
Individual observations of benefit unit amount plotted", /// + size(vsmall)) + + graph export "$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_bu_dist_`year'_18_65.png", /// + replace width(2400) height(1350) + +} + +graph drop _all + + +******************************************************************************** +* 2.2 : Histograms - working age, individual +******************************************************************************** + +* Prepare validation data +use year demAge dwt labC4 valid_yEmpPersGrossLevelYear /// + valid_labHrsWorkEnumWeek using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 == 1 +drop labC4 + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yEmpPersGrossLevelYear, d + + replace valid_yEmpPersGrossLevelYear = . if /// + valid_yEmpPersGrossLevelYear < r(p1) | /// + valid_yEmpPersGrossLevelYear > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yEmpPersGrossLevelYear if year == `year', /// + width(750) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yEmpPersGrossLevelYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare SimPaths data +use run year demAge labC4 sim_yEmpPersGrossLevelYear sim_labHrsWorkEnumWeek /// + using "$dir_data/simulation_sample.dta", clear + +* Keep only employed individuals +keep if labC4 ==
"EmployedOrSelfEmployed" +drop labC4 + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yEmpPersGrossLevelYear, d + + replace sim_yEmpPersGrossLevelYear = . if /// + sim_yEmpPersGrossLevelYear < r(p1) | sim_yEmpPersGrossLevelYear > r(p99) + + } + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + +* Plot sub-figures +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yEmpPersGrossLevelYear if year == `year', /// + width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + + * Plot all hours + twoway (hist sim_yEmpPersGrossLevelYear if year == `year' , width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yEmpPersGrossLevelYear if year == `year' , width(750) /// + color(red%30) legend(label(2 "SILC"))) , /// + subtitle("ALL hours") /// + name(gross_lab_inc_`year'_all, replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y',labsize(vsmall)) /// + graphregion(color(white)) + + + drop d_sim v1 max_d_sim max_value + + * Plot by weekly hours work + foreach ls in $ls_cat_labour { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yEmpPersGrossLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if 
max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot by weekly hours work + twoway (hist sim_yEmpPersGrossLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) color(green%30) /// + legend(label(1 "SimPaths"))) /// + (hist valid_yEmpPersGrossLevelYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(750) color(red%30) /// + legend(label(2 "SILC"))) , /// + subtitle("`ls' hours") /// + name(gross_lab_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y',labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +* Combine plots by year +forvalues year = `min_year'/`max_year' { + + grc1leg gross_lab_inc_`year'_all /// + gross_lab_inc_`year'_TWENTY /// + gross_lab_inc_`year'_FORTY /// + gross_lab_inc_`year'_FIFTY, /// + title("Individual Gross Labour Income by Weekly Hours of Work") /// + subtitle("`year'") /// + legendfrom(gross_lab_inc_`year'_TWENTY) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Employed 18-65 year olds included in the sample.
Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export "$dir_output_files/income/gross_labour_income/validation_${country}_ind_gross_labour_income_dist_`year'_both.png", /// + replace width(2400) height(1350) + +} + +graph drop _all + diff --git a/validation/02_simulation_validation/do_files/04_05_plot_capital_income.do b/validation/02_simulation_validation/do_files/04_05_plot_capital_income.do new file mode 100644 index 0000000..5c3e147 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_05_plot_capital_income.do @@ -0,0 +1,379 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Capital income +* AUTHORS: Ashley Burdett +* LAST UPDATE: April 2026 (AB) +* COUNTRY: PL +******************************************************************************** +* NOTES: This do file plots simulated and SILC capital income, +* per benefit unit + +******************************************************************************** + +******************************************************************************** +* 1 : Time series +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean through time, benefit unit +******************************************************************************** + +* Prepare validation data +use year idPers idBu demAge dwt valid_yCapitalBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +/* +if "$trim_outliers" == "true" { + + sum valid_yCapitalBuLevelYear, d + + replace valid_yCapitalBuLevelYear = .
if /// + valid_yCapitalBuLevelYear < r(p1) | /// + valid_yCapitalBuLevelYear > r(p99) + +} +*/ + +collapse (mean) valid_yCapitalBuLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers idBu demAge sim_yCapitalBuLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +/* +if "$trim_outliers" == "true" { + + sum sim_yCapitalBuLevelYear, d + + replace sim_yCapitalBuLevelYear = . if /// + sim_yCapitalBuLevelYear < r(p1) | sim_yCapitalBuLevelYear > r(p99) + +} +*/ + +collapse (mean) sim_yCapitalBuLevelYear, by(run year) + +collapse (mean) sim_yCapitalBuLevelYear /// + (sd) sim_yCapitalBuLevelYear_sd = sim_yCapitalBuLevelYear /// + , by(year) + +foreach varname in sim_yCapitalBuLevelYear { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_yCapitalBuLevelYear_high sim_yCapitalBuLevelYear_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_yCapitalBuLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Benefit Unit Capital Income") /// + subtitle("") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Series represents average benefit unit capital income per year.
Amounts in 2015 prices.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/income/capital_income/validation_${country}_capital_income_bu_ts.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.2 : Share with no capital income, benefit unit +******************************************************************************** + +* Share with no capital income +* Prepare validation data +use year idPers idBu demAge dwt valid_yCapitalBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +/* +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yCapitalBuLevelYear, d + + replace valid_yCapitalBuLevelYear = . if /// + valid_yCapitalBuLevelYear < r(p1) | /// + valid_yCapitalBuLevelYear > r(p99) + +} +*/ + +gen valid_no_capital = (valid_yCapitalBuLevelYear == 0) + +collapse (mean) valid_no_capital [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers idBu demAge sim_yCapitalBuLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +/* +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yCapitalBuLevelYear, d + + replace sim_yCapitalBuLevelYear = .
if /// + sim_yCapitalBuLevelYear < r(p1) | sim_yCapitalBuLevelYear > r(p99) + +} +*/ + +gen sim_no_capital = (sim_yCapitalBuLevelYear == 0) + +collapse (mean) sim_no_capital, by(run year) + +collapse (mean) sim_no_capital /// + (sd) sim_no_capital_sd = sim_no_capital /// + , by(year) + +foreach varname in sim_no_capital { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_no_capital_high sim_no_capital_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_no_capital year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("No Capital Income") /// + subtitle("") /// + xtitle("Year", size(small)) /// + ytitle("Share of benefit units", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/income/capital_income/validation_${country}_no_capital_income_bu_ts.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 2 : Histograms +******************************************************************************** + + +******************************************************************************** +* 2.1 : Benefit unit by year, +******************************************************************************** + +* Prepare validation data +use year idBu demAge dwt valid_yCapitalBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yCapitalBuLevelYear, d + + replace 
valid_yCapitalBuLevelYear = . if /// + valid_yCapitalBuLevelYear < r(p1) | /// + valid_yCapitalBuLevelYear > r(p99) + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idBu demAge sim_yCapitalBuLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yCapitalBuLevelYear, d + + replace sim_yCapitalBuLevelYear = . if /// + sim_yCapitalBuLevelYear < r(p1) | sim_yCapitalBuLevelYear > r(p99) + +} + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway (hist sim_yCapitalBuLevelYear if year == `year', width(50) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yCapitalBuLevelYear if year == `year', /// + width(50) color(red%30) legend(label(2 "SILC"))) , /// + title("Benefit Unit Capital Income") /// + subtitle("`year'") /// + name(capital_inc_`year'_all, replace) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. 
Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/income/capital_income/validation_${country}_capital_income_bu_dist_`year'.png", /// + replace width(2560) height(1440) + +} + +******************************************************************************** +* 2.2 : Positive amounts only, benefit unit +******************************************************************************** + +* Prepare validation data +use year idBu demAge dwt valid_yCapitalBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge >= 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers + +if "$trim_outliers" == "true" { + + sum valid_yCapitalBuLevelYear, d + + replace valid_yCapitalBuLevelYear = . if /// + valid_yCapitalBuLevelYear < r(p1) | /// + valid_yCapitalBuLevelYear > r(p99) + +} + + +drop if valid_yCapitalBuLevelYear == 0 + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idBu demAge sim_yCapitalBuLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers + +if "$trim_outliers" == "true" { + + sum sim_yCapitalBuLevelYear, d + + replace sim_yCapitalBuLevelYear = .
if /// + sim_yCapitalBuLevelYear < r(p1) | sim_yCapitalBuLevelYear > r(p99) + +} + +drop if sim_yCapitalBuLevelYear == 0 + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway (hist sim_yCapitalBuLevelYear if year == `year', /// + width(50) color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yCapitalBuLevelYear if year == `year', width(50) /// + color(red%30) legend(label(2 "SILC"))) , /// + title("Benefit Unit Capital Income") /// + subtitle("Positive amounts, `year'") /// + name(capital_inc_`year'_all, replace) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export /// +"$dir_output_files/income/capital_income/validation_${country}_positive_capital_income_bu_dist_`year'.png", /// + replace width(2560) height(1440) + +} + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_07_plot_disposable_income.do b/validation/02_simulation_validation/do_files/04_07_plot_disposable_income.do new file mode 100644 index 0000000..fe0fc43 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_07_plot_disposable_income.do @@ -0,0 +1,379 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Disposable income +* AUTHORS: Ashley Burdett +* LAST UPDATE: April 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: This do file plots simulated and SILC disposable income, +* per benefit unit.
Individual level data plotted. +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time, benefit unit +******************************************************************************** + +* Prepare validation data +use year idBu demAge dwt valid_yDispBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuLevelYear, d + + replace valid_yDispBuLevelYear = . if /// + valid_yDispBuLevelYear < r(p1) | valid_yDispBuLevelYear > r(p99) + +} + +collapse (mean) valid_yDispBuLevelYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idBu year demAge sim_yDispBuLevelYear using /// + "$dir_data/simulation_sample.dta", clear + +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispBuLevelYear, d + + replace sim_yDispBuLevelYear = . 
if /// + sim_yDispBuLevelYear < r(p1) | sim_yDispBuLevelYear > r(p99) + +} + +collapse (mean) sim_yDispBuLevelYear, by(run year) + +collapse (mean) sim_yDispBuLevelYear /// + (sd) sim_yDispBuLevelYear_sd = sim_yDispBuLevelYear /// + , by(year) + +foreach varname in sim_yDispBuLevelYear { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_yDispBuLevelYear_high sim_yDispBuLevelYear_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_yDispBuLevelYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Benefit Unit Disposable Income") /// + subtitle("") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. 
Top and bottom percentiles trimmed.", /// + size(vsmall)) + +graph export /// +"$dir_output_files/income/disposable_income/validation_${country}_disposable_income_bu_ts.jpg", /// + replace + + +******************************************************************************** +* 2 : Histograms +******************************************************************************** + +******************************************************************************** +* 2.1 : By year, benefit unit +******************************************************************************** + +* Prepare validation data +use year idBu demAge dwt valid_yDispBuLevelYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Sample selection +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuLevelYear, d + + replace valid_yDispBuLevelYear = . if /// + valid_yDispBuLevelYear < r(p1) | /// + valid_yDispBuLevelYear > r(p99) + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idBu sim_yDispBuLevelYear demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Sample selection +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispBuLevelYear, d + + replace sim_yDispBuLevelYear = . 
if /// + sim_yDispBuLevelYear < r(p1) | sim_yDispBuLevelYear > r(p99) + +} + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway (hist sim_yDispBuLevelYear if year == `year', width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yDispBuLevelYear if year == `year', /// + width(750) color(red%30) legend(label(2 "SILC"))) , /// + title("Benefit Unit Disposable Income") /// + subtitle("`year'") /// + name(disp_inc_`year'_all, replace) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/income/disposable_income/validation_${country}_disposable_income_bu_dist_`year'.png", /// + replace width(2560) height(1440) + +} + +graph drop _all + + +******************************************************************************** +* 2.2 : Histograms - Benefit unit, ages 18-65, by year, by hours of work +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuLevelYear valid_labHrsWorkEnumWeek /// + using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuLevelYear, d + + replace valid_yDispBuLevelYear = . 
if /// + valid_yDispBuLevelYear < r(p1) | valid_yDispBuLevelYear > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yDispBuLevelYear if year == `year' , /// + width(750) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yDispBuLevelYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(750) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year sim_yDispBuLevelYear sim_labHrsWorkEnumWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispBuLevelYear, d + + replace sim_yDispBuLevelYear = . if /// + sim_yDispBuLevelYear < r(p1) | sim_yDispBuLevelYear > r(p99) + +} + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + +* Plot sub-figures +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yDispBuLevelYear if year == `year', width(750) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot all hours + twoway (hist sim_yDispBuLevelYear if year == `year', width(750) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yDispBuLevelYear if year == `year', width(750) /// + color(red%30) /// + legend(label(2 "SILC"))) , /// + subtitle("ALL hours") /// + name(disp_inc_`year'_all, replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + foreach ls in $ls_cat { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yDispBuLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot by weekly hours of work + twoway (hist sim_yDispBuLevelYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(750) color(green%30) /// + legend(label(1 "SimPaths"))) /// + (hist valid_yDispBuLevelYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(750) color(red%30) /// + legend(label(2 "SILC"))) , /// + subtitle("`ls' hours") /// + name(disp_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +* Combine plots by year +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forvalues year = `min_year'/`max_year' { + + grc1leg disp_inc_`year'_all /// + disp_inc_`year'_ZERO /// + disp_inc_`year'_TWENTY /// + disp_inc_`year'_FORTY /// + disp_inc_`year'_FIFTY , /// + title("Benefit Unit Disposable Income by Weekly Hours of Work") /// + subtitle("`year'") /// + legendfrom(disp_inc_`year'_ZERO) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Individual level data of benefit unit level amounts plotted." 
"Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/income/disposable_income/validation_${country}_disposable_income_bu_dist_`year'_hrs_work.png", /// + replace width(2400) height(1350) + + +} + + +graph drop _all + diff --git a/validation/02_simulation_validation/do_files/04_08_plot_equivalised_disposable_income.do b/validation/02_simulation_validation/do_files/04_08_plot_equivalised_disposable_income.do new file mode 100644 index 0000000..bb50189 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_08_plot_equivalised_disposable_income.do @@ -0,0 +1,368 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Equivalised disposable income +* AUTHORS: Ashley Burdett +* LAST UPDATE: April 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: This do file plots simulated and SILC equivalised +* disposable income, per benefit unit +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time, benefit unit +******************************************************************************** + +* Prepare validation data +use year idBu demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . 
if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +collapse (mean) valid_yDispBuEquivYear [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run idBu year demAge sim_yDispEquivYear using /// + "$dir_data/simulation_sample.dta", clear + +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +collapse (mean) sim_yDispEquivYear, by(run year) + +collapse (mean) sim_yDispEquivYear /// + (sd) sim_yDispEquivYear_sd = sim_yDispEquivYear, by(year) + +foreach varname in sim_yDispEquivYear { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_yDispEquivYear_high sim_yDispEquivYear_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_yDispBuEquivYear year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Benefit Unit Equivalised Disposable Income") /// + subtitle("") /// + xtitle("Year", size(small)) /// + ytitle("Euro per year", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Equivalised disposable income computed by the modified OECD scale. Top and bottom percentiles trimmed. 
Amounts" "annual, in 2015 prices.", /// + size(vsmall)) + +graph export /// +"$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_income_bu_ts.jpg", /// + replace width(2400) height(1350) + + +******************************************************************************** +* 2 : Histograms +******************************************************************************** + +******************************************************************************** +* 2.1 : By year, benefit unit +******************************************************************************** + +* Prepare validation data +use year idBu demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Sample selection +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . if /// + valid_yDispBuEquivYear < r(p1) | /// + valid_yDispBuEquivYear > r(p99) + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idBu sim_yDispEquivYear demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Sample selection +drop if demAge < 16 + +* Keep one observation per benefit unit +bysort run year idBu: gen first_person = (_n == 1) +keep if first_person == 1 + + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . 
if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway (hist sim_yDispEquivYear if year == `year', width(250) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yDispBuEquivYear if year == `year', /// + width(250) color(red%30) legend(label(2 "SILC"))) , /// + title("Benefit Unit Equivalised Disposable Income") /// + subtitle("`year'") /// + name(disp_inc_`year'_all, replace) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Amounts in Euro per year, 2015 prices. Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_income_bu_dist_`year'.png", /// + replace width(2560) height(1440) + +} + +graph drop _all + + +******************************************************************************** +* 2.2 : Histograms by year and by weekly labour supply category, benefit unit +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuEquivYear valid_labHrsWorkEnumWeek using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . 
if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_yDispBuEquivYear if year == `year' , /// + width(250) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + + foreach ls in $ls_cat { + + twoway__histogram_gen valid_yDispBuEquivYear if /// + year == `year' & valid_labHrsWorkEnumWeek == "`ls'", width(250) /// + den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year'_`ls' = r(max) + + drop d_valid v2 + + } +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yDispEquivYear sim_labHrsWorkEnumWeek using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yDispEquivYear if year == `year', width(250) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot all hours + twoway (hist sim_yDispEquivYear if year == `year', width(250) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yDispBuEquivYear if year == `year', width(250) color(red%30) /// + legend(label(2 "SILC"))) , /// + subtitle("ALL hours") /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + name(eqdisp_inc_`year'_all, replace) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + foreach ls in $ls_cat { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_yDispEquivYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(250) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year'_`ls' if /// + max_d_valid_`year'_`ls' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_yDispEquivYear if year == `year' & /// + sim_labHrsWorkEnumWeek == "`ls'", width(250) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_yDispBuEquivYear if year == `year' & /// + valid_labHrsWorkEnumWeek == "`ls'", width(250) color(red%30) /// + legend(label(2 "SILC"))) , /// + subtitle("`ls' hours") /// + name(eqdisp_inc_`year'_`ls', replace) /// + xtitle("Euro", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(vsmall) angle(forty_five)) /// + ylabel(0(`steps')`max_y', labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + drop d_sim v1 max_d_sim max_value + + } +} + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forvalues year = `min_year'/`max_year' { + + grc1leg eqdisp_inc_`year'_all /// + eqdisp_inc_`year'_ZERO /// + eqdisp_inc_`year'_TWENTY /// + eqdisp_inc_`year'_FORTY /// + eqdisp_inc_`year'_FIFTY , /// + title("Equivalised Disposable Income") /// + subtitle("`year'") /// + legendfrom(eqdisp_inc_`year'_all) /// + rows(2) /// + graphregion(color(white)) /// + note("Notes: Distribution of benefit unit equivalised disposable income. Individual level data plotted 18-65 year olds included in sample. Amounts in" "Euro per year, 2015 prices. 
Top and bottom percentiles trimmed.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_inc_dist_`year'.png", /// + replace width(2560) height(1440) + + +} + +graph drop _all + diff --git a/validation/02_simulation_validation/do_files/04_09_plot_hourly_wages.do b/validation/02_simulation_validation/do_files/04_09_plot_hourly_wages.do new file mode 100644 index 0000000..9bf7ec0 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_09_plot_hourly_wages.do @@ -0,0 +1,572 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Hourly wages +* AUTHORS: Ashley Burdett +* LAST UPDATE: April 2026 +* COUNTRY: EU +******************************************************************************** +* NOTES: This master do file organises do files used for validating +* SimPaths model using SILC data. +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - 16-65 +******************************************************************************** + +* Prepare validation data +use year demAge dwt labC4 valid_wage using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_wage, d + + replace valid_wage = . 
if /// + valid_wage < r(p1) | valid_wage > r(p99) + +} + +* Compute means +collapse (mean) valid_wage [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 sim_pred_wage demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_pred_wage, d + + replace sim_pred_wage = . if /// + sim_pred_wage < r(p1) | sim_pred_wage > r(p99) + +} + +* Compute means and sd +collapse (mean) sim_pred_wage, by(run year) + +collapse (mean) sim_pred_wage /// + (sd) sim_pred_wage_sd = sim_pred_wage /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_pred_wage { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_pred_wage_high /// + sim_pred_wage_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_wage year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Hourly Wage") /// + subtitle("Ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Euro per hour", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Statistics calculated on sample of employed and self-employed individuals. 
Amounts in 2015 prices.""Top and bottom percentiles trimmed.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/wages/validation_${country}_wages_ts_16_65_both.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time - 16-65, by gender +******************************************************************************** + +* Prepare validation data +use year demAge dwt labC4 valid_wage demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_wage, d + + replace valid_wage = . if /// + valid_wage < r(p1) | valid_wage > r(p99) + +} + +* Compute mean +collapse (mean) valid_wage [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 sim_pred_wage demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_pred_wage , d + + replace sim_pred_wage = . 
if /// + sim_pred_wage < r(p1) | sim_pred_wage > r(p99) + +} + +collapse (mean) sim_pred_wage, by(run year demMaleFlag) + +collapse (mean) sim_pred_wage /// + (sd) sim_pred_wage_sd = sim_pred_wage /// + , by(year demMaleFlag) + +foreach varname in sim_pred_wage { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_pred_wage_high /// + sim_pred_wage_low year if demMaleFlag == 0, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_wage year if demMaleFlag == 0, sort color(green) /// + legend(label(2 "SILC"))), /// + subtitle("Females") /// + name(wages_female, replace) /// + xtitle("Year", size(small)) /// + ytitle("Euro per hour", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) + +twoway (rarea sim_pred_wage_high /// + sim_pred_wage_low year if demMaleFlag == 1, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_wage year if demMaleFlag == 1, sort color(green) /// + legend(label(2 "SILC"))), /// + subtitle("Males") /// + name(wages_male, replace) /// + xtitle("Year", size(small)) /// + ytitle("Euro per hour", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) + + +grc1leg wages_female wages_male, /// + title("Hourly Wage") /// + subtitle("Ages 16-65") /// + legendfrom(wages_female) rows(1) /// + graphregion(color(white)) /// + ycomm /// + note("Notes: Statistics calculated on sample of employed and self-employed individuals. Amounts in 2015 prices. 
Top and bottom" "percentiles trimmed.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/wages/validation_${country}_wages_ts_16_65_gender.jpg", /// + replace width(2560) height(1440) quality(100) + +graph drop _all + + +******************************************************************************** +* 2 : Histograms by year +******************************************************************************** + +******************************************************************************** +* 2.1 : Histograms by year - ages 16-65 +******************************************************************************** + +* Prepare validation data +use year demAge dwt labC4 valid_wage demAge flag_wage_imp_panel /// + flag_wage_hotdeck using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +drop labC4 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_wage, d + + replace valid_wage = . if /// + valid_wage < r(p1) | valid_wage > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_wage if year == `year' , /// + bin(10) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 sim_pred_wage demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +drop labC4 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_pred_wage, d + + replace sim_pred_wage = . 
if /// + sim_pred_wage < r(p1) | sim_pred_wage > r(p99) + +} + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_pred_wage if year == `year', /// + bin(5) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot all hours + twoway (hist sim_pred_wage if year == `year', /// + width(0.5) color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_wage if year == `year', width(0.5) color(red%30) /// + legend(label(2 "SILC"))) , /// + title("Hourly Wage") /// + subtitle("`year'") /// + name(hourly_wages_`year'_all, replace) /// + xtitle("Euro per hour", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Statistics calculated on subsample of employed and self-employed individuals aged 16-65. 
Amounts in 2015 prices." "Top and bottom percentiles trimmed.", size(vsmall)) + + graph export /// + "$dir_output_files/wages/validation_${country}_wages_dist_`year'.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + +graph drop _all + + +******************************************************************************** +* 2.2 : Histograms by year - ages 16-65 by gender +******************************************************************************** + +* Females +* Prepare validation data +use year demAge dwt labC4 valid_wage demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +drop labC4 demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_wage, d + + replace valid_wage = . if /// + valid_wage < r(p1) | valid_wage > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_wage if year == `year' , /// + bin(10) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 sim_pred_wage demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +drop labC4 demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_pred_wage, d + + replace sim_pred_wage = . 
if /// + sim_pred_wage < r(p1) | sim_pred_wage > r(p99) + +} + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_pred_wage if year == `year', /// + bin(10) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot all hours + twoway (hist sim_pred_wage if year == `year', /// + width(0.5) color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_wage if year == `year', width(0.5) color(red%30) /// + legend(label(2 "SILC"))) , /// + title("Hourly Wage") /// + subtitle("`year', females") /// + name(hourly_wages_`year'_all, replace) /// + xtitle("Euro per hour", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Statistics calculated on subsample of employed and self-employed individuals aged 16-65. Amounts in 2015 prices." "Top and bottom percentiles trimmed.", size(vsmall)) + + graph export /// + "$dir_output_files/wages/validation_${country}_wages_dist_`year'_female.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + + +* Males +* Prepare validation data +use year demAge dwt labC4 valid_wage demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +drop labC4 demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_wage, d + + replace valid_wage = . 
if /// + valid_wage < r(p1) | valid_wage > r(p99) + +} + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_wage if year == `year' , /// + bin(10) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 sim_pred_wage demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +drop labC4 demMaleFlag + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_pred_wage, d + + replace sim_pred_wage = . if /// + sim_pred_wage < r(p1) | sim_pred_wage > r(p99) + +} + +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_pred_wage if year == `year', /// + bin(10) den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + * Plot all hours + twoway (hist sim_pred_wage if year == `year', /// + width(0.5) color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_wage if year == `year', width(0.5) color(red%30) /// + legend(label(2 "SILC"))) , /// + title("Hourly Wage") /// + subtitle("`year', males") /// + name(hourly_wages_`year'_all, replace) /// + xtitle("Euro per hour", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Statistics calculated on subsample of employed and self-employed individuals aged 16-65. Amounts in 2015 prices.""Top and bottom percentiles trimmed.", size(vsmall)) + + + graph export /// + "$dir_output_files/wages/validation_${country}_wages_dist_`year'_male.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + + +} + + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_10_0_plot_hours_worked.do b/validation/02_simulation_validation/do_files/04_10_0_plot_hours_worked.do new file mode 100644 index 0000000..d448fbf --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_10_0_plot_hours_worked.do @@ -0,0 +1,2077 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Hours worked per week +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: Current implementation explores the impact of how +* heterogeneity is imposed on the uppermost category. 
+******************************************************************************** + +set seed 12345 + +******************************************************************************** +* UNIFORM HETEROGENEITY +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - Ages 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +save "$dir_data/temp_sim_mean_uni.dta", replace + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 
"SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_both.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time - Ages 16-65, by gender +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year demMaleFlag) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year demMaleFlag) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen 
+ +* Plot figure + +* Males +preserve + +keep if demMaleFlag == 1 + +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Males, ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_male.jpg", /// + replace width(2560) height(1440) quality(100) + +restore + +* Females + +keep if demMaleFlag == 0 + +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Females, ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_female.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.3 : Mean values over time - Ages 16-75 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + 
"$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,75) + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,75) + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +save "$dir_data/temp_sim_mean_uni.dta", replace + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Ages 16-75") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_75_both.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.4 : Mean values over time - Ages 16-75, by gender 
+******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,75) + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,75) + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year demMaleFlag) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year demMaleFlag) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +* Male +preserve + +keep if demMaleFlag == 1 + +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Males, ages 16-75") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_75_male.jpg", /// + 
replace width(2560) height(1440) quality(100) + +restore + +* Females + +keep if demMaleFlag == 0 + +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Females, ages 16-75") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_75_female.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 2 : Histograms by year +******************************************************************************** + +******************************************************************************** +* 2.1 : Histograms by year - ages 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 + +save 
"$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 idPers sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year', width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year', width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle(" `year', ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed individuals.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + +******************************************************************************** +* 2.2 : Histograms by year - ages 16-65, by gender +******************************************************************************** + +* Female + +* 
Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 demMaleFlag + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idPers year labC4 demMaleFlag sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +* Plot by year +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year', width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year', width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', females, ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed females.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_female.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + + +* Male + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 demMaleFlag + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idPers year labC4 demMaleFlag sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +* Combine datasets +append using 
"$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year' , width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year' , width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', males, ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed males.", /// + size(vsmall)) + + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_male.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value +} + + +graph drop _all + + +******************************************************************************** +* 2.3 : Histograms by year - ages 16-75 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,75) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2011 +if 
"$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 idPers sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,75) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = 2023 +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year', width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year', width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("Ages 16-75, `year'") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed individuals.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_16_75.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + +graph drop _all + +/* + +******************************************************************************** +* LOG-NORMAL HETEROGENEITY +******************************************************************************** + +******************************************************************************** +* 0 : IMPUTATION OF HOURS WORKED FOR THOSE IN TOP CATEGORY, LOG NORMAL +******************************************************************************** + +* Estimate parameters of truncated log normal distribution fit to SILC +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) +keep if labC4 == 1 +//remove those that report working very low >0 hours + +drop labC4 + +* Keep those in top hours category +keep if valid_labHrsWorkWeek >= 50 & valid_labHrsWorkWeek != . 
+ +* Proportion of the top group that work 40 valid_labHrsWorkWeek +gen exact_40 = (valid_labHrsWorkWeek == 40) + +preserve +collapse (mean) exact_40 [aw=dwt] +local valid_share_40 = exact_40 +restore + +* Create log variable +gen ln_y = ln(valid_labHrsWorkWeek) + +* Non-truncation +sum ln_y + +* Set truncation points (using observed range) +sum valid_labHrsWorkWeek +local a = r(min) +local b = r(max) +local ln_a = ln(`a') +local ln_b = ln(`b') + +* Estimate parameters +truncreg ln_y, ll(`ln_a') ul(`ln_b') nolog + +* Results +matrix b = e(b) +local mu_hat = b[1,1] +local sigma_hat = b[1,2] +local median_est = exp(`mu_hat') +local mean_est = exp(`mu_hat' + `sigma_hat'^2/2) + +* Generate new values directly in simulated dataset +use run idPers year labC4 sim_labHrsWorkWeek using /// + "$dir_data/simulation_sample.dta", clear + +rename sim_labHrsWorkWeek labHrsWorkWeek + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" + +* Observations to be adjusted +gen top = 1 if labHrsWorkWeek > 35 + +gen new_sim_labHrsWorkWeek = labHrsWorkWeek + +* Calculate the CDF bounds once +local Fa = normal((`ln_a' - `mu_hat')/`sigma_hat') +local Fb = normal((`ln_b' - `mu_hat')/`sigma_hat') + +* For observations with valid_labHrsWorkWeek >= 36, generate random values +replace new_sim_labHrsWorkWeek = exp(`mu_hat' + `sigma_hat' * /// + invnormal(`Fa' + runiform()*(`Fb' - `Fa'))) /// + if top == 1 + +keep run year idPers new_sim_labHrsWorkWeek top + +save "$dir_data/simulation_sample_hrs_adjusted.dta", replace + + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - Ages 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek 
demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_mean.dta", replace + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted.dta" + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +save "$dir_data/temp_sim_mean_ln.dta", replace + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_mean.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(33 [2] 40 ,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals. 
Log-normal heterogeneity" "imposed on top category.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_both_ln.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.1.1 : Mean values over time - Ages 16-65, by gender +******************************************************************************** + +* Males + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 + +keep if inrange(demAge,16,65) +keep if demMaleFlag == 1 + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted.dta" +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if demMaleFlag == 1 +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + 
color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Males, ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(35 [2] 43 ,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals. Log-normal heterogeneity" "imposed on top category.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_male_ln.jpg", /// + replace width(2560) height(1440) quality(100) + + +* Females + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) +keep if demMaleFlag == 0 + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Merge in updated hours worked for top category +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted.dta" +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if demMaleFlag == 0 +keep if labC4 == "EmployedOrSelfEmployed" + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = 
`varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + + *Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort color(green%20) /// + legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Females, ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals. Log-normal heterogeneity" "imposed on top category.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_female_ln.jpg", /// + replace width(2560) height(1440) quality(100) + + + +******************************************************************************** +* 2 : Histograms by year +******************************************************************************** + +******************************************************************************** +* 2.1 : Histograms by year - ages 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen 
max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 idPers sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted.dta" +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year' /*& labHrsWorkWeek <= 65*/, width(1) color(green%30) /// + legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year' /*& valid_labHrsWorkWeek <= 65*/, width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', age 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed individuals.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_ln.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + +******************************************************************************** +* 2.1.1 : Histograms by year - ages 16-65, by gender +******************************************************************************** + +* Female + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 demMaleFlag + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idPers year labC4 demMaleFlag sim_labHrsWorkWeek 
demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted.dta" +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +* Plot by year +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year', width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year', width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', females, ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed females.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_female_ln.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + + +* Male + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 demMaleFlag + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idPers year labC4 demMaleFlag sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted.dta" +drop _m + +rename sim_labHrsWorkWeek 
sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year' , width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year' , width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', males, ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed males.", /// + size(vsmall)) + + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_male_ln.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value +} + +graph drop _all + + +/* +******************************************************************************** +* LOG-NORMAL HETEROGENEITY WITH SPIKE AT 40 +******************************************************************************** + +******************************************************************************** +* 0 : IMPUTATION OF valid_labHrsWorkWeek 
WORK FOR THOSE IN TOP CATEGORY, LOG NORMAL WITH MASS +******************************************************************************** +/* +* Load data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) +keep if labC4 == 1 +//remove those that report working very low >0 valid_labHrsWorkWeek + +drop labC4 + +save "$dir_data/temp_valid_stats.dta", replace + +twoway (histogram valid_labHrsWorkWeek, fraction color(ltblue)) + +* Top group only +keep if valid_labHrsWorkWeek > 35 & valid_labHrsWorkWeek != . + +* Proportion of the top group that work 40 valid_labHrsWorkWeek +gen exact_40 = (valid_labHrsWorkWeek == 40) + +preserve +collapse (mean) exact_40 [aw=dwt] +local valid_share_40 = exact_40 +restore + +* Create log variable +gen ln_y = ln(valid_labHrsWorkWeek) + +* Set truncation points (using observed range) +summarize valid_labHrsWorkWeek +local a = r(min) +local b = r(max) +local ln_a = ln(`a') +local ln_b = ln(`b') + +* Estimate parameters +truncreg ln_y, ll(`ln_a') ul(`ln_b') nolog + +* Results +matrix b = e(b) +local mu_hat = b[1,1] +local sigma_hat = b[1,2] +local median_est = exp(`mu_hat') +local mean_est = exp(`mu_hat' + `sigma_hat'^2/2) + +disp "** RESULTS **" +disp "μ (for lnY): " round(`mu_hat', 0.001) +disp "σ (for lnY): " round(`sigma_hat', 0.001) +disp "Estimated median: " round(`median_est', 0.01) +disp "Estimated mean: " round(`mean_est', 0.01) + +* Visualization +range y_plot `a' `b' 150 +gen pdf_fitted = (1/(y_plot*`sigma_hat')) * /// + normalden((ln(y_plot)-`mu_hat')/`sigma_hat') / /// + (normal((`ln_b'-`mu_hat')/`sigma_hat') - /// + normal((`ln_a'-`mu_hat')/`sigma_hat')) + +twoway (histogram valid_labHrsWorkWeek, fraction color(ltblue)) /// + (line pdf_fitted y_plot, color(red) lwidth(*1)), /// + title("Truncated Log-Normal Distribution Fit") /// + legend(order(1 "SILC" 2 "Fitted Distribution")) /// + 
xtitle("valid_labHrsWorkWeek") /// + ytitle("Density") /// + graphregion(color(white)) /// + note("Note: ", /// + size(vsmall)) + +** Apply to simulated data +* Load simulation data +use run idPers year labC4 sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +rename sim_labHrsWorkWeek labHrsWorkWeek + + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" & inrange(demAge,16,65) + +* Observations to be adjusted +gen top = 1 if labHrsWorkWeek > 35 + +* Add log-normal heterogeneity +gen new_labHrsWorkWeek = labHrsWorkWeek + +* Calculate the CDF bounds once +local Fa = normal((`ln_a' - `mu_hat')/`sigma_hat') +local Fb = normal((`ln_b' - `mu_hat')/`sigma_hat') + +* For observations with valid_labHrsWorkWeek >= 36, generate random values +replace new_labHrsWorkWeek = exp(`mu_hat' + `sigma_hat' * /// + invnormal(`Fa' + runiform()*(`Fb' - `Fa'))) /// + if top == 1 +*/ + + +use "$dir_data/simulation_sample_hrs_adjusted", clear + +replace new_labHrsWorkWeek = round(new_labHrsWorkWeek,1) + +twoway(hist new_labHrsWorkWeek) + + +* Proportion at 40 +gen sim_exact_40 = (new_labHrsWorkWeek == 40) + +preserve +mean sim_exact_40 +local sim_40_share = el(r(table),1,1) +restore + +local add_to_40 = `valid_share_40' - `sim_40_share' + +* Identify candidates (36-39 valid_labHrsWorkWeek) +gen candidate = inrange(new_labHrsWorkWeek, 36, 39) if top == 1 + +* Calculate how many to convert +count if top == 1 +local total_top = r(N) +local num_to_convert = round(`total_top' * `add_to_40') + +* Randomly select candidates +gen u = runiform() if candidate == 1 +gsort u +gen convert = (_n <= `num_to_convert') if candidate == 1 + +* Apply conversion +replace new_labHrsWorkWeek = 40 if convert == 1 + +* Clean up +drop u convert candidate + +twoway hist new_labHrsWorkWeek + +rename new_labHrsWorkWeek new_sim_labHrsWorkWeek + +save "$dir_data/simulation_sample_hrs_adjusted_40.dta", replace + +graph drop _all + 
+******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - Ages 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 + +keep if inrange(demAge,16,65) + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted_40.dta" +keep if _m == 3 +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Keep only employed individuals +keep if labC4 == "EmployedOrSelfEmployed" + +twoway hist sim_labHrsWorkWeek + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +save "$dir_data/temp_sim_mean_ln_40.dta", replace + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort color(green%20) /// + legend(label(1 
"SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Ages 16-65 ") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(33 [2] 40 ,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals. Log-normal heterogeneity" "imposed on top category with a mass at 40.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_both_ln_40.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time - Ages 16-65, by gender +******************************************************************************** + +* Males + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 + +keep if inrange(demAge,16,65) +keep if demMaleFlag == 1 + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +* Merge in new hours worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted_40.dta" +keep if _m == 3 +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +twoway hist sim_labHrsWorkWeek + +* Select sample +keep if demMaleFlag == 1 +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Compute mean and 
sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, sort /// + color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Males, ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(35 [2] 43 ,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals. 
Log-normal heterogeneity" "imposed on top category with a mass at 40.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_male_ln_40.jpg", /// + replace width(2560) height(1440) quality(100) + + +* Females + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) +keep if demMaleFlag == 0 + +* Compute mean +collapse (mean) valid_labHrsWorkWeek [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year idPers labC4 sim_labHrsWorkWeek demAge demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if inrange(demAge,16,65) + +* Merge in new hours worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted_40.dta" +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if demMaleFlag == 0 +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +twoway hist sim_labHrsWorkWeek + +* Compute mean and sd +collapse (mean) sim_labHrsWorkWeek, by(run year) + +collapse (mean) sim_labHrsWorkWeek /// + (sd) sim_labHrsWorkWeek_sd = sim_labHrsWorkWeek /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_labHrsWorkWeek { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_labHrsWorkWeek_high sim_labHrsWorkWeek_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_labHrsWorkWeek year, sort color(green) legend(label(2 "SILC"))), 
/// + title("Average Weekly Hours Worked") /// + subtitle("Females, ages 16-65") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals. Log-normal heterogeneity" "imposed on top category with a mass at 40.", /// + size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_16_65_female_ln_40.jpg", /// + replace width(2560) height(1440) quality(100) + + + +******************************************************************************** +* 2 : Histograms by year +******************************************************************************** + +******************************************************************************** +* 2.1 : Histograms by year - ages 16-65 +******************************************************************************** + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if inrange(demAge,16,65) + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year labC4 idPers sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if inrange(demAge,16,65) + +* Merge in new 
valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted_40.dta" +keep if _m == 3 +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +twoway hist sim_labHrsWorkWeek + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year', width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year', width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("Ages 16-65, `year'") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed individuals.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_ln_40.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + +******************************************************************************** +* 2.1 : Histograms by year - ages 16-65, by gender +******************************************************************************** + +* Female + +* 
Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 demMaleFlag + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idPers year labC4 demMaleFlag sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted_40.dta" +keep if _m == 3 +drop _m + +rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 0 +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +* Plot by year +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . 
+ + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year', width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year', width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', females, ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed females.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_female_ln_40.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value + +} + + +* Male + +* Prepare validation data +use year dwt labC4 valid_labHrsWorkWeek demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + + +* Prepare info needed for dynamic y axis labels +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + twoway__histogram_gen valid_labHrsWorkWeek if year == `year' , /// + bin(60) den gen(d_valid v2) + + qui sum d_valid + gen max_d_valid_`year' = r(max) + + drop d_valid v2 + +} + +drop labC4 demMaleFlag + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run idPers year labC4 demMaleFlag sim_labHrsWorkWeek demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Merge in new valid_labHrsWorkWeek worked for top category using log normal distribution +merge 1:1 year run idPers using "$dir_data/simulation_sample_hrs_adjusted_40.dta" +keep if _m == 3 +drop _m + 
+rename sim_labHrsWorkWeek sim_labHrsWorkWeek_orig +rename new_sim_labHrsWorkWeek sim_labHrsWorkWeek + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +keep if demMaleFlag == 1 +keep if inrange(demAge,16,65) + +* Combine datasets +append using "$dir_data/temp_valid_stats.dta" + +qui sum year +local min_year = 2019 +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +forval year = `min_year'/`max_year' { + + * Prepare info needed for dynamic y axis labels + twoway__histogram_gen sim_labHrsWorkWeek if year == `year', bin(60) /// + den gen(d_sim v1) + + qui sum d_sim + gen max_d_sim = r(max) + + gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim + replace max_value = max_d_sim if max_value == . + + sum max_value + local max_y = 1.25*r(max) + local steps = `max_y'/2 + + twoway (hist sim_labHrsWorkWeek if year == `year' , width(1) /// + color(green%30) legend(label(1 "SimPaths"))) /// + (hist valid_labHrsWorkWeek if year == `year' , width(1) color(red%30) /// + legend(label(2 "SILC"))), /// + title("Weekly Hours Worked") /// + subtitle("`year', males, ages 16-65") /// + xtitle("Hours per week", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0(`steps')`max_y', labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of employed and self-employed males.", /// + size(vsmall)) + + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_male_ln_40.png", /// + replace width(2400) height(1350) + + drop d_sim v1 max_d_sim max_value +} + + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_10_1_plot_hours_worked_discrete.do b/validation/02_simulation_validation/do_files/04_10_1_plot_hours_worked_discrete.do new file mode 100644 index 0000000..b133fbc --- /dev/null +++ 
b/validation/02_simulation_validation/do_files/04_10_1_plot_hours_worked_discrete.do @@ -0,0 +1,351 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Hours worked (discrete) +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: EU +******************************************************************************** +* NOTES: Need to update to account for additional labour supply +* categories +*******************************************************************************/ + +******************************************************************************** +* 1 : Distribution +******************************************************************************** + +******************************************************************************** +* 1.1 : Distribution, 16-65 +******************************************************************************** + +* Comparison of the discretized labour supply hours + +* Load SILC data +use year dwt labC4 valid_cat_hours valid_labHrsWorkEnum_no demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +drop if valid_labHrsWorkEnum_no == 0 +keep if inrange(demAge,16,65) + +* Hours dummies +tab valid_cat_hours, gen(hours_cat_) + +* Calculate weighted proportions +collapse (mean) hours_cat_* [aw=dwt] + +gen sim = 0 + +save "$dir_data/valid_props", replace + +* Prepare simulated data +use run year labC4 idPers sim_cat_hours sim_labHrsWorkEnum_no demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +drop if sim_labHrsWorkEnum_no == 0 +keep if inrange(demAge,16,65) + +* Hours dummies +tab sim_cat_hours, gen(hours_cat_) + +* Calculate proportions +collapse (mean) hours_cat_*, by(run) + +collapse (mean) hours_cat_* /// + (sd) hours_cat_1_sd = hours_cat_1 /// + hours_cat_2_sd = hours_cat_2 /// + hours_cat_3_sd = 
hours_cat_3 + +foreach varname in hours_cat_1 hours_cat_2 hours_cat_3 /// + { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +gen sim = 1 + +* Combine datasets +append using "$dir_data/valid_props" + +* Plot +reshape long hours_cat_@ hours_cat_@_high hours_cat_@_low, i(sim) j(category) + +gen prop_SILC = hours_cat_ if sim == 0 +gen prop_sim = hours_cat_ if sim == 1 + +gen x_SILC = category - 0.2 +gen x_sim = category + 0.2 + +twoway (bar prop_SILC x_SILC, barw(0.4) color(red%50)) /// + (bar prop_sim x_sim, barw(0.4) color(green%50)) /// + (rcap hours_cat__high hours_cat__low x_sim, lcolor(green)), /// + xlabel(1/3, valuelabel) /// + xtitle("Hours Category", size(small)) /// + ytitle("Proportion", size(small)) /// + title("Share in Each Labour Hours Category") /// + subtitle("Ages 16-65") /// + legend(order(1 "SILC" 2 "SimPaths" 3 "95% CI")) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Years 2011-2023.", size(vsmall)) + +graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_cat_all.png", /// + replace width(2400) height(1350) + + +******************************************************************************** +* 1.2 : Distribution, 16-65 by year +******************************************************************************** + +* Load SILC data +use year dwt labC4 valid_cat_hours valid_labHrsWorkEnum_no demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +drop if valid_labHrsWorkEnum_no == 0 +keep if inrange(demAge,16,65) + +tab valid_cat_hours, gen(hours_cat_) + +* Calculate weighted proportions by year +collapse (mean) hours_cat_* [aw=dwt], by(year) + +gen sim = 0 + +save "$dir_data/valid_props", replace + +* Load SimPaths data +use run year labC4 idPers sim_cat_hours sim_labHrsWorkEnum_no demAge using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample 
+keep if labC4 == "EmployedOrSelfEmployed" +drop if sim_labHrsWorkEnum_no == 0 +keep if inrange(demAge,16,65) + +tab sim_cat_hours, gen(hours_cat_) + +* Calculate proportions by run and year +collapse (mean) hours_cat_*, by(run year) + +* Calculate Mean and SD across runs by year +collapse (mean) hours_cat_* /// + (sd) hours_cat_1_sd = hours_cat_1 /// + hours_cat_2_sd = hours_cat_2 /// + hours_cat_3_sd = hours_cat_3, by(year) + +foreach varname in hours_cat_1 hours_cat_2 hours_cat_3 /// + { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd +} + +gen sim = 1 + +* Combine datasets +append using "$dir_data/valid_props" + +* Plot +reshape long hours_cat_@ hours_cat_@_high hours_cat_@_low, i(sim year) /// + j(category) + +gen prop_SILC = hours_cat_ if sim == 0 +gen prop_sim = hours_cat_ if sim == 1 +gen x_SILC = category - 0.2 +gen x_sim = category + 0.2 + +levelsof year, local(years) +foreach y in `years' { + + twoway (bar prop_SILC x_SILC if year == `y', barw(0.4) color(red%50)) /// + (bar prop_sim x_sim if year == `y', barw(0.4) color(green%50)) /// + (rcap hours_cat__high hours_cat__low x_sim if year == `y', lcolor(green)), /// + xlabel(1/3, valuelabel) /// + xtitle("Hours Category", size(small)) /// + ytitle("Proportion", size(small)) /// + title("Share in Each Labour Hours Category") /// + subtitle("`y'") /// + legend(order(1 "SILC" 2 "SimPaths" 3 "95% CI")) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Ages 16-65. 
", size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_cat_`y'.png", /// + replace width(2400) height(1350) +} + +graph drop _all + + +******************************************************************************** +* 1.3 : Distribution, 16-65 by year, by gender +******************************************************************************** + +* Load SILC data +use year dwt labC4 valid_cat_hours valid_labHrsWorkEnum_no demAge /// + demMaleFlag using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +keep if labC4 == 1 +drop if valid_labHrsWorkEnum_no == 0 +keep if inrange(demAge,16,65) + +tab valid_cat_hours, gen(hours_cat_) + +* Calculate weighted proportions by year and gender +collapse (mean) hours_cat_* [aw=dwt], by(year demMaleFlag) + +gen sim = 0 + +save "$dir_data/valid_props", replace + +* Load SimPaths data +use run year labC4 idPers sim_cat_hours sim_labHrsWorkEnum_no demAge /// + demMaleFlag using "$dir_data/simulation_sample.dta", clear + +* Select sample +keep if labC4 == "EmployedOrSelfEmployed" +drop if sim_labHrsWorkEnum_no == 0 +keep if inrange(demAge,16,65) + +tab sim_cat_hours, gen(hours_cat_) + +* Calculate proportions and SD by run, year and gender +collapse (mean) hours_cat_*, by(run year demMaleFlag) + +* Calculate Mean and SD across runs by year and gender +collapse (mean) hours_cat_* /// + (sd) hours_cat_1_sd = hours_cat_1 /// + hours_cat_2_sd = hours_cat_2 /// + hours_cat_3_sd = hours_cat_3, by(year demMaleFlag) + +foreach varname in hours_cat_1 hours_cat_2 hours_cat_3 /// + { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd +} + +gen sim = 1 + +* Combine datasets +append using "$dir_data/valid_props" + +* PLot +* Note: Added demMaleFlag to the identifier i() +reshape long hours_cat_@ hours_cat_@_high hours_cat_@_low, /// + i(sim year demMaleFlag) j(category) + +gen prop_SILC = hours_cat_ if sim == 0 
+gen prop_sim = hours_cat_ if sim == 1 +gen x_SILC = category - 0.2 +gen x_sim = category + 0.2 + +* Label gender for plot titles +label define sex_lbl 0 "Females" 1 "Males" +label values demMaleFlag sex_lbl + +levelsof year, local(years) +levelsof demMaleFlag, local(sexes) + +foreach y in `years' { + foreach s in `sexes' { + + * Get the label text + local sextext : label sex_lbl `s' + + twoway (bar prop_SILC x_SILC if year == `y' & demMaleFlag == `s', /// + barw(0.4) color(red%50)) /// + (bar prop_sim x_sim if year == `y' & demMaleFlag == `s', /// + barw(0.4) color(green%50)) /// + (rcap hours_cat__high hours_cat__low x_sim if year == `y' & /// + demMaleFlag == `s', lcolor(green)), /// + xlabel(1/3, valuelabel) /// + xtitle("Hours Category", size(small)) /// + ytitle("Proportion", size(small)) /// + title("Share in Each Labour Hours Category") /// + subtitle("`y', `sextext'") /// + legend(order(1 "SILC" 2 "SimPaths" 3 "95% CI")) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Comparison for `sextext' in `y'.", size(vsmall)) + + graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_cat_`y'_`sextext'.png", /// + replace width(2400) height(1350) + } + +} + +graph drop _all + + +/* +******************************************************************************** +* 3 : Mean hours of work, comparison across all options explored, 16-65 +******************************************************************************** + +use "$dir_data/temp_valid_mean_disc", clear + +merge 1:1 year using "$dir_data/temp_valid_mean", nogen + +rename hours valid_hours + +merge 1:1 year using "$dir_data/temp_sim_mean_uni.dta", nogen + +drop sim_labHrsWorkWeek_sim sim_labHrsWorkWeek_sim_sd +rename sim_labHrsWorkWeek_sim_high sim_labHrsWorkWeek_sim_u_high +rename sim_labHrsWorkWeek_sim_low sim_labHrsWorkWeek_sim_u_low + +merge 1:1 year using "$dir_data/temp_sim_mean_ln.dta", nogen + +drop sim_labHrsWorkWeek_sim 
sim_labHrsWorkWeek_sim_sd +rename sim_labHrsWorkWeek_sim_high sim_labHrsWorkWeek_sim_ln_high +rename sim_labHrsWorkWeek_sim_low sim_labHrsWorkWeek_sim_ln_low + +merge 1:1 year using "$dir_data/temp_sim_mean_ln_40.dta", nogen + +drop sim_labHrsWorkWeek_sim sim_labHrsWorkWeek_sim_sd +rename sim_labHrsWorkWeek_sim_high sim_labHrsWorkWeek_sim_40_high +rename sim_labHrsWorkWeek_sim_low sim_labHrsWorkWeek_sim_40_low + + +* Plot comparison +drop if year < 2011 + +twoway (line valid_hours year, sort color(cranberry) /// + legend(label(1 "SILC, continuous"))) /// +(rarea sim_labHrsWorkWeek_sim_u_high sim_labHrsWorkWeek_sim_u_low year, sort color(purple%20) /// + legend(label(2 "Simulated, uniform"))) /// +(rarea sim_labHrsWorkWeek_sim_ln_high sim_labHrsWorkWeek_sim_ln_low year, sort color(blue%20) /// + legend(label(3 "Simulated, log normal"))) /// +(rarea sim_labHrsWorkWeek_sim_40_high sim_labHrsWorkWeek_sim_40_low year, sort color(green%20) /// + legend(label(4 "Simulated, log normal + 40"))), /// + title("Average Weekly Hours Worked") /// + subtitle("Ages 16-65 ") /// + xtitle("Year", size(small)) /// + ytitle("Hours per week", size(small)) /// + ylabel(33 [2] 40 ,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Statistics calculated on sample of working age employed and self-employed individuals.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/hours_worked/validation_${country}_hours_worked_comparison.png", /// + replace width(2400) height(1350) + + +//(line valid_disc_hours year, sort color(blue) /// +// legend(label(2 "SILC, discretized"))) /// + \ No newline at end of file diff --git a/validation/02_simulation_validation/do_files/04_11_plot_income_shares.do b/validation/02_simulation_validation/do_files/04_11_plot_income_shares.do new file mode 100644 index 0000000..f54929e --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_11_plot_income_shares.do @@ -0,0 +1,448 
@@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Income shares +* AUTHORS: Patryk Bronka, Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: This do file plots simulated and observed income shares and +* incomes by deciles of gross income +* Altered pension age to 65 + +*******************************************************************************/ + +** SimPaths output + + +* Create variables + +use "$dir_data/simulation_sample.dta", clear + +* keep only one observation per benefit unit +sort run idBu year + +* Check data structure +count if year == year[_n-1] & idBu == idBu[_n-1] & /// + sim_yEmpBuGrossLevelYear != sim_yEmpBuGrossLevelYear[_n-1] + +count if year == year[_n-1] & idBu == idBu[_n-1] & /// + sim_yCapitalBuLevelYear != sim_yCapitalBuLevelYear[_n-1] + +count if year == year[_n-1] & idBu == idBu[_n-1] & /// + sim_yPensBuGrossLevelYear != sim_yPensBuGrossLevelYear[_n-1] + + +* Variables of interest + +* Keep one observation per benefit unit +* Create a marker for the last observation (oldest member, sorted by demAge) in each group +bys run year idBu (demAge): gen byte to_keep = (_n == _N) + +* Keep only the marked rows +keep if to_keep == 1 +drop to_keep + +* Create gross income deciles +sort run idPers year + +xtile sim_decile = sim_yNonBenBuGrossLevelYear , n(10) + +tab sim_decile + +gen zero_gross = (sim_yNonBenBuGrossLevelYear == 0) +tab zero_gross + +/* +8.6% of observations have 0 gross non-benefit income. +Equal shares in deciles. 
+*/ + +* Generate share variables +gen sim_share_emp = sim_yEmpBuGrossLevelYear / sim_yNonBenBuGrossLevelYear +gen sim_share_cap = sim_yCapitalBuLevelYear / sim_yNonBenBuGrossLevelYear +gen sim_share_pen = sim_yPensBuGrossLevelYear / sim_yNonBenBuGrossLevelYear + +order idPers idBu year sim_yNonBenBuGrossLevelYear sim_yEmpBuGrossLevelYear /// + sim_yCapitalBuLevelYear sim_yPensBuGrossLevelYear sim_yDispBuLevelYear /// + sim_share_* + +gen check1 = sim_share_emp + sim_share_cap + sim_share_pen +sum check1, det + +order idPers idBu year sim_decile sim_yNonBenBuGrossLevelYear /// + sim_yEmpBuGrossLevelYear sim_yCapitalBuLevelYear /// + sim_yPensBuGrossLevelYear check1 sim_share_* + +drop check* + + +* Plots + +* Sources + +* All +graph bar (mean) sim_yEmpBuGrossLevelYear (mean) sim_yCapitalBuLevelYear /// + (mean) sim_yPensBuGrossLevelYear, over(sim_decile) stack /// + title("SimPaths") /// + legend(order(1 "Labour" 2 "Capital" 3 "Private Pension") /// + position(6) rows(1)) /// + name(simulated_income_comp_all, replace) /// + b1title("Decile", size(small)) /// + ytitle("Euro") /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +* Oldest <= 65 +preserve + +drop if demAge > 65 + +graph bar (mean) sim_yEmpBuGrossLevelYear (mean) sim_yCapitalBuLevelYear /// + (mean) sim_yPensBuGrossLevelYear, over(sim_decile) stack /// + title("SimPaths") /// + legend(order(1 "Labour" 2 "Capital" 3 "Private Pension") /// + position(6) rows(1)) /// + name(simulated_income_comp_upto65, replace) /// + b1title("Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +restore + +* Oldest > 65 +preserve + +drop if demAge <= 65 + +graph bar (mean) sim_yEmpBuGrossLevelYear (mean) sim_yCapitalBuLevelYear /// + (mean) sim_yPensBuGrossLevelYear, over(sim_decile) stack /// + title("SimPaths") /// + legend(order(1 "Labour" 2 "Capital" 3 "Private Pension") /// + position(6) rows(1)) /// + 
name(simulated_income_comp_66plus, replace) /// + b1title("Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +restore + +* Shares +preserve + +collapse (mean) sim_share_emp sim_share_cap sim_share_pen, /// + by(sim_decile) + +graph bar (asis) sim_share_emp sim_share_cap sim_share_pen, /// + over(sim_decile) stack /// + title("SimPaths") /// + legend(label(1 "Employment") label(2 "Capital") /// + label(3 "Private Pension")) /// + name(simulated_income_share_all, replace) /// + graphregion(color(white)) + +restore + +* Age <= 65 +preserve + +keep if demAge <= 65 + +collapse (mean) sim_share_emp sim_share_cap sim_share_pen, /// + by(sim_decile) + +graph bar (asis) sim_share_emp sim_share_cap sim_share_pen, /// + over(sim_decile) stack /// + title("SimPaths") /// + legend(label(1 "Employment") label(2 "Capital") /// + label(3 "Private Pension")) /// + name(simulated_income_share_upto65, replace) /// + graphregion(color(white)) + +restore + +* Age > 65 +preserve + +keep if demAge > 65 + +collapse (mean) sim_share_emp sim_share_cap sim_share_pen, /// + by(sim_decile) + +graph bar (asis) sim_share_emp sim_share_cap sim_share_pen, /// + over(sim_decile) stack /// + title("SimPaths") /// + legend(label(1 "Employment") label(2 "Capital") /// + label(3 "Private Pension")) /// + name(simulated_income_share_66plus, replace) /// + graphregion(color(white)) + +restore + + +** SILC data + +use "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* keep only one observation per benefit unit +sort idBu year + +* Check data structure +count if year == year[_n-1] & idBu == idBu[_n-1] & /// + valid_yEmpBuGrossLevelYear != valid_yEmpBuGrossLevelYear[_n-1] + +count if year == year[_n-1] & idBu == idBu[_n-1] & /// + valid_yCapitalBuLevelYear != valid_yCapitalBuLevelYear[_n-1] + +count if year == year[_n-1] & idBu == idBu[_n-1] & /// + valid_yPensBuGrossLevelYear != 
valid_yPensBuGrossLevelYear[_n-1] + +* Variable of interest + +* Keep one observation per benefit unit +* Create a marker for the last observation (oldest member, sorted by demAge) in each group +bys year idBu (demAge): gen byte to_keep = (_n == _N) + +* Keep only the marked rows +keep if to_keep == 1 +drop to_keep + +* Create gross income deciles +sort idPers year + +xtile valid_decile = valid_yNonBenBuGrossLevelYear , n(10) + +tab valid_decile + +gen zero_gross = (valid_yNonBenBuGrossLevelYear == 0) +tab zero_gross + +/* +19.36% of observations have 0 gross non-benefit income. +Non-equal shares in the bottom two deciles +*/ + +* Generate share variables +gen valid_share_emp = valid_yEmpBuGrossLevelYear / valid_yNonBenBuGrossLevelYear +gen valid_share_cap = valid_yCapitalBuLevelYear / valid_yNonBenBuGrossLevelYear +gen valid_share_pen = valid_yPensBuGrossLevelYear / valid_yNonBenBuGrossLevelYear + +order idPers idBu year valid_yNonBenBuGrossLevelYear valid_yEmpBuGrossLevelYear /// + valid_yCapitalBuLevelYear valid_yPensBuGrossLevelYear valid_yDispBuLevelYear /// + valid_share_* + +gen check1 = valid_share_emp + valid_share_cap + valid_share_pen +sum check1, det + +order idPers idBu year valid_decile valid_yNonBenBuGrossLevelYear /// + valid_yEmpBuGrossLevelYear valid_yCapitalBuLevelYear /// + valid_yPensBuGrossLevelYear check1 valid_share_* + +drop check* + +* Plots + +* Sources + +* All +graph bar (mean) valid_yEmpBuGrossLevelYear (mean) valid_yCapitalBuLevelYear /// + (mean) valid_yPensBuGrossLevelYear [aw = dwt], over(valid_decile) stack /// + title("SILC") /// + legend(order(1 "Labour" 2 "Capital" 3 "Private Pension") /// + position(6) rows(1)) /// + name(SILC_income_comp_all, replace) /// + b1title("Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +* Oldest <= 65 +preserve + +drop if demAge > 65 + +graph bar (mean) valid_yEmpBuGrossLevelYear (mean) valid_yCapitalBuLevelYear /// + (mean) 
valid_yPensBuGrossLevelYear [aw = dwt], over(valid_decile) stack /// + title("SILC") /// + legend(order(1 "Labour" 2 "Capital" 3 "Private Pension") /// + position(6) rows(1)) /// + name(SILC_income_comp_upto65, replace) /// + b1title("Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +restore + +* Oldest > 65 +preserve + +drop if demAge <= 65 + +graph bar (mean) valid_yEmpBuGrossLevelYear (mean) valid_yCapitalBuLevelYear /// + (mean) valid_yPensBuGrossLevelYear [aw = dwt], over(valid_decile) stack /// + title("SILC") /// + legend(order(1 "Labour" 2 "Capital" 3 "Private Pension") /// + position(6) rows(1)) /// + name(SILC_income_comp_66plus , replace) /// + b1title("Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +restore + +* Shares + +* All ages +preserve + +collapse (mean) valid_share_emp valid_share_cap valid_share_pen [aw = dwt], /// + by(valid_decile) + +graph bar (asis) valid_share_emp valid_share_cap valid_share_pen, /// + over(valid_decile) stack /// + title("SILC") /// + legend(label(1 "Employment") label(2 "Capital") /// + label(3 "Private Pension")) /// + name(SILC_income_share_all, replace) /// + graphregion(color(white)) +restore + +* Age <= 65 +preserve + +keep if demAge <= 65 + +collapse (mean) valid_share_emp valid_share_cap valid_share_pen [aw = dwt], /// + by(valid_decile) + +graph bar (asis) valid_share_emp valid_share_cap valid_share_pen, /// + over(valid_decile) stack /// + title("SILC") /// + legend(label(1 "Employment") label(2 "Capital") /// + label(3 "Private Pension")) /// + name(SILC_income_share_upto65, replace) /// + graphregion(color(white)) + +restore + +* Age > 65 +preserve + +keep if demAge > 65 + +collapse (mean) valid_share_emp valid_share_cap valid_share_pen [aw = dwt], /// + by(valid_decile) + +graph bar (asis) valid_share_emp valid_share_cap 
valid_share_pen, /// + over(valid_decile) stack /// + title("SILC") /// + legend(label(1 "Employment") label(2 "Capital") /// + label(3 "Private Pension")) /// + name(SILC_income_share_66plus, replace) /// + graphregion(color(white)) + +restore + + +* Combine plots + +* Composition - All +grc1leg simulated_income_comp_all SILC_income_comp_all, /// + rows(1) ycommon /// + legendfrom(simulated_income_comp_all) /// + graphregion(color(white)) /// + title("Average Benefit Unit Gross Income Sources") /// + subtitle("All age") /// + note("NOTE: Benefit income excluded. Values in real 2015 amounts.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/income/income_shares/validation_${country}_income_levels_all.png", /// + replace width(2400) height(1350) + +* Composition - Working age +grc1leg simulated_income_comp_upto65 SILC_income_comp_upto65, /// + rows(1) ycommon /// + legendfrom(simulated_income_comp_upto65) /// + graphregion(color(white)) /// + title("Average Benefit Unit Gross Income Sources") /// + subtitle("Oldest Working Age, <=65") /// + note("NOTE: Benefit income excluded. Values in real 2015 amounts.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/income/income_shares/validation_${country}_income_levels_upto65.png", /// + replace width(2400) height(1350) + + +* Composition - Retirement age +grc1leg simulated_income_comp_66plus SILC_income_comp_66plus, /// + rows(1) ycommon /// + legendfrom(simulated_income_comp_66plus) /// + graphregion(color(white)) /// + title("Average Benefit Unit Gross Income Sources") /// + subtitle("Oldest Age > 65") /// + note("NOTE: Benefit income excluded. 
Values in real 2015 amounts.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/income/income_shares/validation_${country}_income_levels_66plus.png", /// + replace width(2400) height(1350) + + + +* Shares - All +grc1leg simulated_income_share_all SILC_income_share_all, /// + rows(1) ycommon /// + legendfrom(simulated_income_share_all) /// + graphregion(color(white)) /// + title("Average Benefit Unit Gross Income Shares") /// + subtitle("All age") /// + note("NOTE: Benefit income excluded. Values in real 2015 amounts.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/income/income_shares/validation_${country}_income_shares_all.png", /// + replace width(2400) height(1350) + +* Shares - Working age +grc1leg simulated_income_share_upto65 SILC_income_share_upto65, /// + rows(1) ycommon /// + legendfrom(simulated_income_share_upto65) /// + graphregion(color(white)) /// + title("Average Benefit Unit Gross Income Shares") /// + subtitle("Oldest Working Age, <=65") /// + note("NOTE: Benefit income excluded. Values in real 2015 amounts.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/income/income_shares/validation_${country}_income_sharess_upto65.png", /// + replace width(2400) height(1350) + + +* Shares - Retirement age +grc1leg simulated_income_share_66plus SILC_income_share_66plus, /// + rows(1) ycommon /// + legendfrom(simulated_income_share_66plus) /// + graphregion(color(white)) /// + title("Average Benefit Unit Gross Income Shares") /// + subtitle("Oldest Age > 65") /// + note("NOTE: Benefit income excluded. 
Values in real 2015 amounts.", /// + size(vsmall)) + +graph export /// + "$dir_output_files/income/income_shares/validation_${country}_income_sharess_66plus.png", /// + replace width(2400) height(1350) + + +graph drop _all + diff --git a/validation/02_simulation_validation/do_files/04_12_plot_partnership_status.do b/validation/02_simulation_validation/do_files/04_12_plot_partnership_status.do new file mode 100644 index 0000000..999a13b --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_12_plot_partnership_status.do @@ -0,0 +1,613 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Partnership +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - ages 18-65 +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_partnered valid_single using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Compute shares +collapse (mean) valid_partnered valid_single [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare SimPaths data +use run year demAge sim_partnered sim_single using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Compute shares and sd +collapse (mean) sim_partnered sim_single /// + , by(run year) + +collapse (mean) sim_partnered sim_single /// + (sd) sim_partnered_sd = 
sim_partnered /// + sim_single_sd = sim_single /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_partnered sim_single { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figures +* Share partnered +twoway (rarea sim_partnered_high sim_partnered_low year, sort color(green%20) /// + legend(label(1 "SimPaths"))) /// +(line valid_partnered year, sort color(green) /// + legend(label(2 "SILC "))), /// + title("Partnered") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0[0.1]0.9, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +* Save figure +graph export /// +"$dir_output_files/partnership/validation_${country}_partnered_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + +* Partnership status shares +twoway (rarea sim_partnered_high sim_partnered_low year, sort color(green%20) /// + legend(label(1 "Partnered, SimPaths"))) /// +(line valid_partnered year, sort color(green) /// + legend(label(2 "Partnered, SILC "))) /// +(rarea sim_single_high sim_single_low year, sort color(red%20) /// + legend(label(3 "Single, SimPaths"))) /// +(line valid_single year, sort color(red) /// + legend(label(4 "Single, SILC "))), /// + title("Partnership status") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0[0.1]0.8, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_partnership_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + + 
+******************************************************************************** +* 1.2 : Mean values over time - by age group +******************************************************************************** + +* Define each age grouping as indexed locals: condition, subtitle, file suffix +local age_cond1 "ageGroup == 2 | ageGroup == 3" +local age_sub1 "Ages 20-29" +local age_suff1 "20_29" + +local age_cond2 "ageGroup == 4 | ageGroup == 5" +local age_sub2 "Ages 30-39" +local age_suff2 "30_39" + +local age_cond3 "ageGroup == 6" +local age_sub3 "Ages 40-59" +local age_suff3 "40_59" + +* Loop through the 3 groups +forvalues i = 1/3 { + + * Validation data + use year demAge dwt valid_partnered valid_single ageGroup using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + + * Select sample + keep if `age_cond`i'' + + collapse (mean) valid_partnered valid_single [aw = dwt], by(year) + tempfile valid_stats + save `valid_stats' + + * Simulated data + use run year demAge sim_partnered sim_single ageGroup using /// + "$dir_data/simulation_sample.dta", clear + + * Select sample + keep if `age_cond`i'' + + collapse (mean) sim_partnered sim_single, by(run year) + + collapse (mean) sim_partnered sim_single /// + (sd) sim_partnered_sd = sim_partnered /// + sim_single_sd = sim_single, by(year) + + foreach varname in sim_partnered sim_single { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + + } + + * Combine + merge 1:1 year using `valid_stats', keep(3) nogen + + twoway (rarea sim_partnered_high sim_partnered_low year, sort color(green%20) /// + legend(label(1 "Partnered, SimPaths"))) /// + (line valid_partnered year, sort color(green) /// + legend(label(2 "Partnered, SILC"))) /// + (rarea sim_single_high sim_single_low year, sort color(red%20) /// + legend(label(3 "Single, SimPaths"))) /// + (line valid_single year, sort color(red) /// + legend(label(4 "Single, SILC"))), /// + title("Partnership status") /// + 
subtitle("`age_sub`i''") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(0(0.2)1, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + + graph export /// + "$dir_output_files/partnership/validation_${country}_partnership_ts_`age_suff`i''_both.jpg", /// + replace width(2400) height(1350) quality(100) + +} + +graph drop _all + + +******************************************************************************** +* 1.3 : Mean values over time - by children +******************************************************************************** + +* Load validation data +use year demAge dwt valid_partnered_children_0 valid_partnered_children_1 /// + valid_partnered_children_2 valid_partnered_children_3plus /// + valid_single_children_0 valid_single_children_1 /// + valid_single_children_2 valid_single_children_3plus using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +collapse (mean) valid_partnered_children_0 valid_partnered_children_1 /// + valid_partnered_children_2 valid_partnered_children_3plus /// + valid_single_children_0 valid_single_children_1 /// + valid_single_children_2 valid_single_children_3plus /// + [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Load SimPaths data +use run year demAge sim_partnered_children_0 sim_partnered_children_1 /// + sim_partnered_children_2 sim_partnered_children_3plus /// + sim_single_children_0 sim_single_children_1 sim_single_children_2 /// + sim_single_children_3plus /// + using "$dir_data/simulation_sample.dta", clear + +* Compute shares and sd +collapse (mean) sim_partnered_children_0 sim_partnered_children_1 /// + sim_partnered_children_2 sim_partnered_children_3plus /// + sim_single_children_0 sim_single_children_1 /// + sim_single_children_2 sim_single_children_3plus, /// + by(run year) + +collapse (mean) sim_partnered_children_0 sim_partnered_children_1 /// 
+ sim_partnered_children_2 sim_partnered_children_3plus /// + sim_single_children_0 sim_single_children_1 sim_single_children_2 /// + sim_single_children_3plus /// + (sd) sim_partnered_children_0_sd = sim_partnered_children_0 /// + sim_partnered_children_1_sd = sim_partnered_children_1 /// + sim_partnered_children_2_sd = sim_partnered_children_2 /// + sim_partnered_children_3plus_sd = sim_partnered_children_3plus /// + sim_single_children_0_sd = sim_single_children_0 /// + sim_single_children_1_sd = sim_single_children_1 /// + sim_single_children_2_sd = sim_single_children_2 /// + sim_single_children_3plus_sd = sim_single_children_3plus /// + , by(year) + +* Approx 95% confidence interval +foreach varname in sim_partnered_children_0 sim_partnered_children_1 /// + sim_partnered_children_2 sim_partnered_children_3plus sim_single_children_0 /// + sim_single_children_1 sim_single_children_2 sim_single_children_3plus { + + gen `varname'_h = `varname' + 1.96*`varname'_sd + gen `varname'_l = `varname' - 1.96*`varname'_sd + +} + +* Combine datasets +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Label variables +label var sim_partnered_children_0 "No children" +label var sim_partnered_children_1 "1 child" +label var sim_partnered_children_2 "2 children" +label var sim_partnered_children_3plus "3+ children" +label var sim_single_children_0 "No children" +label var sim_single_children_1 "1 child" +label var sim_single_children_2 "2 children" +label var sim_single_children_3plus "3+ children" + +* Plot figures + +* Partnered +foreach varname in partnered_children_0 partnered_children_1 /// + partnered_children_2 partnered_children_3plus { + + local vtext : variable label sim_`varname' + if `"`vtext'"' == "" local vtext "sim_`varname'" + twoway (rarea sim_`varname'_h sim_`varname'_l year, sort color(red%20) /// + legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`varname' year, sort color(red) /// + legend(label(2 "SILC"))), /// + 
subtitle("`vtext'") /// + name(`varname', replace) /// + ytitle("Share", size(small)) /// + xtitle("") /// + ylabel(0[0.1]0.5,labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + +} + +* Combine plots +grc1leg partnered_children_0 partnered_children_1 partnered_children_2 /// + partnered_children_3plus , /// + title("Share Partnered and Number of Children") /// + legendfrom(partnered_children_0) /// + rows(2) /// + graphregion(color(white)) /// + ycomm /// + note("Notes: Sample contains all individual ages 18-65. ", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_partnership_children_ts_18_65_partnered.jpg", /// + replace width(2400) height(1350) quality(100) + + +* Single +foreach varname in single_children_0 single_children_1 single_children_2 /// + single_children_3plus { + + local vtext : variable label sim_`varname' + if `"`vtext'"' == "" local vtext "sim_`varname'" + twoway (rarea sim_`varname'_h sim_`varname'_l year, sort color(red%20) /// + legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`varname' year, sort color(red) /// + legend(label(2 "SILC"))), /// + subtitle("`vtext'") /// + name(`varname', replace) /// + ytitle("Share", size(small)) /// + xtitle("") /// + ylabel(0[0.1]0.5,labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + +} + +* Combine plots +grc1leg single_children_0 single_children_1 single_children_2 /// + single_children_3plus , /// + title("Share Single and Number of Children") /// + legendfrom(single_children_0) /// + rows(2) /// + graphregion(color(white)) /// + ycomm /// + note("Notes: Sample contains all individual ages 18-65. 
", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_partnership_children_ts_18_65_single.jpg", /// + replace width(2400) height(1350) quality(100) + + + + +*** Investigate transitions + +** At risk +* Into partnership + +use idPers year dwt valid_partnered valid_single demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort idPers year +gen form_rel = 0 if valid_partnered[_n-1] == 0 & idPers == idPers[_n-1] +replace form_rel = 1 if valid_partnered == 1 & form_rel == 0 + +drop if form_rel == . + +collapse (mean) form_rel [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use idPers run year demAge sim_partnered sim_single using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort run idPers year + +gen sim_form_rel = 0 if sim_partnered[_n-1] == 0 & idPers == idPers[_n-1] +replace sim_form_rel = 1 if sim_partnered == 1 & sim_form_rel == 0 + +drop if sim_form_rel == . 
+ + +collapse (mean) sim_form_rel, by(run year) + +collapse (mean) sim_form_rel /// + (sd) sim_form_rel_sd = sim_form_rel /// + , by(year) + +foreach varname in sim_form_rel { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +drop if year < 2012 + +* Plot figure +twoway (rarea sim_form_rel_high sim_form_rel_low year, sort color(green%20) /// + legend(label(1 "Simulated"))) /// +(line form_rel year, sort color(green) /// + legend(label(2 "SILC "))), /// + title("Share of singles that form partnership") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + ylabel(, labsize(small)) /// + xlabel(,labsize(small)) /// + graphregion(color(white)) /// + note("Notes: Sample contains all individual ages 18-65.", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_into_partnership_ts_18_65_at_risk.jpg", /// + replace width(2400) height(1350) quality(100) + + +* Out of partnership +use idPers year dwt valid_partnered valid_single demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort idPers year +gen exit_rel = 0 if valid_partnered[_n-1] == 1 & idPers == idPers[_n-1] +replace exit_rel = 1 if valid_partnered == 0 & exit_rel == 0 + +drop if exit_rel == . + +collapse (mean) exit_rel [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use idPers run year demAge sim_partnered sim_single using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort run idPers year + +gen sim_exit_rel = 0 if sim_partnered[_n-1] == 1 & idPers == idPers[_n-1] +replace sim_exit_rel = 1 if sim_partnered == 0 & sim_exit_rel == 0 + +drop if sim_exit_rel == . 
+ +collapse (mean) sim_exit_rel, by(run year) + +collapse (mean) sim_exit_rel /// + (sd) sim_exit_rel_sd = sim_exit_rel /// + , by(year) + +foreach varname in sim_exit_rel { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +drop if year < 2012 + +* Plot figure +twoway (rarea sim_exit_rel_high sim_exit_rel_low year, sort color(green%20) /// + legend(label(1 "Simulated"))) /// +(line exit_rel year, sort color(green) /// + legend(label(2 "SILC "))), /// + title("Share of partnered that exit partnership") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + ylabel(, labsize(small)) /// + xlabel(,labsize(small)) /// + graphregion(color(white)) /// + note("Notes: Sample contains all individual ages 18-65.", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_exit_partnership_ts_18_65_at_risk.jpg", /// + replace width(2400) height(1350) quality(100) + + +** Whole pop +* Into partnership + +use idPers year dwt valid_partnered valid_single demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort idPers year + +gen form_rel = 0 if valid_partnered[_n-1] == 0 & idPers == idPers[_n-1] +replace form_rel = 1 if valid_partnered == 1 & form_rel == 0 +replace form_rel = 0 if form_rel == . + +drop if idPers != idPers[_n-1] + +collapse (mean) form_rel [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use idPers run year demAge sim_partnered sim_single using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort run idPers year + +gen sim_form_rel = 0 if sim_partnered[_n-1] == 0 & idPers == idPers[_n-1] +replace sim_form_rel = 1 if sim_partnered == 1 & sim_form_rel == 0 +replace sim_form_rel = 0 if sim_form_rel == . 
+ +drop if idPers != idPers[_n-1] + +collapse (mean) sim_form_rel, by(run year) + +collapse (mean) sim_form_rel /// + (sd) sim_form_rel_sd = sim_form_rel /// + , by(year) + +foreach varname in sim_form_rel { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +drop if year < 2012 + +* Plot figure +twoway (rarea sim_form_rel_high sim_form_rel_low year, sort color(green%20) /// + legend(label(1 "Simulated"))) /// +(line form_rel year, sort color(green) /// + legend(label(2 "SILC "))), /// + title("Share of population that form partnership") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + ylabel(, labsize(small)) /// + xlabel(,labsize(small)) /// + graphregion(color(white)) /// + note("Notes: Sample contains all individual ages 18-65.", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_into_partnership_ts_18_65_all.jpg", /// + replace width(2400) height(1350) quality(100) + + +* Out of partnership +use idPers year dwt valid_partnered valid_single demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort idPers year + +gen exit_rel = 0 if valid_partnered[_n-1] == 1 & idPers == idPers[_n-1] +replace exit_rel = 1 if valid_partnered == 0 & exit_rel == 0 +replace exit_rel = 0 if exit_rel == . 
+ +drop if idPers != idPers[_n-1] + +collapse (mean) exit_rel [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use idPers run year demAge sim_partnered sim_single using /// + "$dir_data/simulation_sample.dta", clear + +keep if demAge > 17 +keep if demAge < 66 + +sort run idPers year + +gen sim_exit_rel = 0 if sim_partnered[_n-1] == 1 & idPers == idPers[_n-1] +replace sim_exit_rel = 1 if sim_partnered == 0 & sim_exit_rel == 0 +replace sim_exit_rel = 0 if sim_exit_rel == . + +drop if idPers != idPers[_n-1] + +collapse (mean) sim_exit_rel, by(run year) + +collapse (mean) sim_exit_rel /// + (sd) sim_exit_rel_sd = sim_exit_rel /// + , by(year) + +foreach varname in sim_exit_rel { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +drop if year < 2012 + +* Plot figure +twoway (rarea sim_exit_rel_high sim_exit_rel_low year, sort color(green%20) /// + legend(label(1 "Simulated"))) /// +(line exit_rel year, sort color(green) /// + legend(label(2 "SILC "))), /// + title("Share of population that exit partnership") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + ylabel(, labsize(small)) /// + xlabel(,labsize(small)) /// + graphregion(color(white)) /// + note("Notes: Sample contains all individual ages 18-65.", size(vsmall)) + +graph export /// +"$dir_output_files/partnership/validation_${country}_exit_partnership_ts_18_65_all.jpg", /// + replace width(2400) height(1350) quality(100) + + +graph drop _all + + + + + diff --git a/validation/02_simulation_validation/do_files/04_13_plot_health.do b/validation/02_simulation_validation/do_files/04_13_plot_health.do new file mode 100644 index 0000000..b9337e3 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_13_plot_health.do @@ -0,0 +1,483 @@ 
+/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Health +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: +*******************************************************************************/ + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time - Self rated health, 16-65 +******************************************************************************** + +use year dwt valid_healthSelfRated demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen health = valid_healthSelfRated + +collapse (mean) health [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare Simulated data +use run year sim_healthSelfRated demAge run using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen healthSelfRated = sim_healthSelfRated + +collapse (mean) healthSelfRated, by(year run) + +collapse (mean) healthSelfRated /// + (sd) healthSelfRated_sd = healthSelfRated /// + , by(year) + +* Compute 95% confidence interval +foreach varname in healthSelfRated { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea healthSelfRated_high healthSelfRated_low year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line health year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Self Rated Health") /// + subtitle("Ages 16-65") /// + xtitle("Year", 
size(small)) /// + ytitle("Score", size(small)) /// + ylabel(,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: The health variable is a self-assessed variable and follows a 5-point Likert scale (1 = poor, ..., 5 = excellent). ", /// + size(vsmall)) + +graph export /// +"$dir_output_files/health/validation_${country}_self_rated_ts_16_65_both.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time - Self rated health, 16-65, by gender +******************************************************************************** + +* Prepare validation data +use year dwt valid_healthSelfRated demAge demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen health = valid_healthSelfRated + +collapse (mean) health [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulation data +use run year sim_healthSelfRated demAge demMaleFlag run using "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen healthSelfRated = sim_healthSelfRated + +collapse (mean) healthSelfRated, by(year demMaleFlag run) + +collapse (mean) healthSelfRated /// + (sd) healthSelfRated_sd = healthSelfRated /// + , by(year demMaleFlag) + +* Compute 95% confidence interval +foreach varname in healthSelfRated { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea healthSelfRated_high healthSelfRated_low year if /// + demMaleFlag == 0, sort color(green%20) legend(label(1 "SimPaths"))) /// +(line health year if demMaleFlag == 0, sort color(green) /// + legend(label(2 "SILC"))), /// + subtitle("Females") /// + 
name(health_female, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3[.2]3.8,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + + +twoway (rarea healthSelfRated_high healthSelfRated_low year if /// + demMaleFlag == 1, sort color(green%20) legend(label(1 "SimPaths"))) /// +(line health year if demMaleFlag == 1, sort color(green) /// + legend(label(2 "SILC"))), /// + subtitle("Males") /// + name(health_male, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3[.2]3.8,labsize(small)) /// + xlabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + +grc1leg health_female health_male, /// + title("Self Rated Health") /// + subtitle("Ages 16-65") /// + legendfrom(health_female) rows(1) /// + graphregion(color(white)) /// + ycomm /// +note("Notes: The health variable is a self-assessed variable and follows a 5-point Likert scale (1 = poor, ..., 5 = excellent). 
", /// + size(vsmall)) + +graph export /// +"$dir_output_files/health/validation_${country}_self_rated_ts_16_65_gender.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.3 : Mean values over time - self rated health, by age group and gender +******************************************************************************** + +* Prepare validation data +use year dwt demAge demMaleFlag ageGroup valid_healthSelfRated using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +gen health_m = valid_healthSelfRated if demMaleFlag == 1 +gen health_f = valid_healthSelfRated if demMaleFlag == 0 + +drop if ageGroup == 0 + +collapse (mean) health* [aw = dwt], by(ageGroup year) + +drop if missing(ageGroup) +reshape wide health*, i(year) j(ageGroup) + +forvalues i = 1(1)8 { + + rename health_f`i' health_f_`i'_valid + rename health_m`i' health_m_`i'_valid + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare Simulated data +use run year demMaleFlag ageGroup sim_healthSelfRated using /// + "$dir_data/simulation_sample.dta", clear + +gen health_m = sim_healthSelfRated if demMaleFlag == 1 +gen health_f = sim_healthSelfRated if demMaleFlag == 0 + +collapse (mean) health*, by(ageGroup run year) + +drop if missing(ageGroup) + +reshape wide health*, i(year run) j(ageGroup) + +collapse (mean) health* /// + (sd) health_m_1_sd = health_m1 /// + (sd) health_f_1_sd = health_f1 /// + (sd) health_m_2_sd = health_m2 /// + (sd) health_f_2_sd = health_f2 /// + (sd) health_m_3_sd = health_m3 /// + (sd) health_f_3_sd = health_f3 /// + (sd) health_m_4_sd = health_m4 /// + (sd) health_f_4_sd = health_f4 /// + (sd) health_m_5_sd = health_m5 /// + (sd) health_f_5_sd = health_f5 /// + (sd) health_m_6_sd = health_m6 /// + (sd) health_f_6_sd = health_f6 /// + (sd) health_m_7_sd = health_m7 /// + (sd) health_f_7_sd = health_f7 /// + (sd) health_m_8_sd = health_m8 /// + (sd) 
health_f_8_sd = health_f8 /// + , by(year) + +forvalues i=1(1)8 { + + gen health_f_`i'_sim_high = health_f`i' + 1.96*health_f_`i'_sd + gen health_f_`i'_sim_low = health_f`i' - 1.96*health_f_`i'_sd + gen health_m_`i'_sim_high = health_m`i' + 1.96*health_m_`i'_sd + gen health_m_`i'_sim_low = health_m`i' - 1.96*health_m_`i'_sd + +} + + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +foreach vble in "health_f" "health_m" { + + twoway (rarea `vble'_1_sim_high `vble'_1_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_1_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 15-19") /// + name(`vble'_1, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_2_sim_high `vble'_2_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_2_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 20-24") /// + name(`vble'_2, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_3_sim_high `vble'_3_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_3_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 25-29") /// + name(`vble'_3, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_4_sim_high `vble'_4_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line 
`vble'_4_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 30-34") /// + name(`vble'_4, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_5_sim_high `vble'_5_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_5_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 35-39") /// + name(`vble'_5, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_6_sim_high `vble'_6_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_6_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 40-59") /// + name(`vble'_6, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_7_sim_high `vble'_7_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_7_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 60-79") /// + name(`vble'_7, replace) /// + xtitle("Year", size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea `vble'_8_sim_high `vble'_8_sim_low year, sort /// + color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line `vble'_8_valid year, sort color(red) legend(label(2 "SILC"))), /// + subtitle("Age 80-100") /// + name(`vble'_8, replace) /// + xtitle("Year", 
size(small)) /// + ytitle("Score", size(small)) /// + ylabel(3 [1] 5, labsize(vsmall)) /// + xlabel(,labsize(vsmall)) /// + legend(size(small)) /// + graphregion(color(white)) +} + +* Save figures +grc1leg health_f_1 health_f_2 health_f_3 health_f_4 health_f_5 /// + health_f_6 health_f_7 health_f_8, /// + title("Self Rated Health") /// + subtitle("Females") /// + legendfrom(health_f_1) /// + graphregion(color(white)) /// + ycomm /// +note("Notes: The health variable is a self-assessed variable and follows a 5-point Likert scale (1 = poor, ..., 5 = excellent). ", /// + size(vsmall)) + +graph export /// + "$dir_output_files/health/validation_${country}_self_rated_ts_age_groups_female.jpg", /// + replace width(2400) height(1350) quality(100) + + +grc1leg health_m_1 health_m_2 health_m_3 health_m_4 health_m_5 /// + health_m_6 health_m_7 health_m_8, /// + title("Self Rated Health") /// + subtitle("Males") /// + legendfrom(health_m_1) /// + graphregion(color(white)) /// + ycomm /// +note("Notes: The health variable is a self-assessed variable and follows a 5-point Likert scale (1 = poor, ..., 5 = excellent). 
", /// + size(vsmall)) + +graph export /// + "$dir_output_files/health/validation_${country}_self_rated_ts_age_groups_male.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all + + + +******************************************************************************** +* 2 : Histograms +******************************************************************************** + +******************************************************************************** +* 2.1 : Histograms - Working age +******************************************************************************** + +* Working age +use year dwt valid_healthSelfRated demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen health = valid_healthSelfRated + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare Simulated data +use run year sim_healthSelfRated demAge run using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen healthSelfRated = sim_healthSelfRated + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + + +* Plot figure +twoway (hist healthSelfRated, width(0.2) color(green%30) /// + legend(label(1 "SimPaths"))) /// +(hist health, width(0.2) color(red%30) legend(label(2 "SILC"))), /// + title("Self Rated Health") /// + subtitle("Ages 16-65") /// + xtitle("Score", size(small)) /// + ytitle("Year", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: The health variable is a self-assessed variable and follows a 5-point Likert scale (1 = poor, ..., 5 = excellent). 
", /// + size(vsmall)) + +graph export /// +"$dir_output_files/health/validation_${country}_self_rated_hist_16_65_both.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 2.2 : Histograms - Working age, by gender +******************************************************************************** + +use year dwt demAge valid_healthSelfRated demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen health = valid_healthSelfRated + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare Simulated data +use run year demAge sim_healthSelfRated demMaleFlag run using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,16,65) + +gen healthSelfRated = sim_healthSelfRated + +keep if run == 1 + +append using "$dir_data/temp_valid_stats.dta" + + +* Plot figure + +twoway (hist healthSelfRated if demMaleFlag == 0, width(0.2) color(green%30) /// + legend(label(1 "SimPaths"))) /// +(hist health if demMaleFlag == 0, width(0.2) color(red%30) /// + legend(label(2 "SILC"))), /// + subtitle("Females") /// + name(health_female, replace) /// + xtitle("Score", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0[.5]2.5,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) + + +twoway (hist healthSelfRated if demMaleFlag == 1, width(0.2) color(green%30) /// + legend(label(1 "SimPaths"))) /// +(hist health if demMaleFlag == 1, width(0.2) color(red%30) /// + legend(label(2 "SILC"))), /// + subtitle("Males") /// + name(health_male, replace) /// + xtitle("Score", size(small)) /// + ytitle("Density", size(small)) /// + xlabel(,labsize(small)) /// + ylabel(0[.5]2.5,labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) + +grc1leg health_female health_male, /// + title("Self Rated Health") /// + subtitle("Ages 16-65") /// + 
legendfrom(health_male) rows(1) /// + graphregion(color(white)) /// + ycomm /// + note("Notes: The health variable is a self-assessed variable and follows a 5-point Likert scale (1 = poor, ..., 5 = excellent). ", /// + size(vsmall)) + + +graph export /// +"$dir_output_files/health/validation_${country}_self_rated_hist_16_65_gender.jpg", /// + replace width(2560) height(1440) quality(100) + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_14_plot_at_risk_of_poverty.do b/validation/02_simulation_validation/do_files/04_14_plot_at_risk_of_poverty.do new file mode 100644 index 0000000..08b0c16 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_14_plot_at_risk_of_poverty.do @@ -0,0 +1,216 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Risk of poverty +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: +*******************************************************************************/ + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time, working age (18-65) +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . 
if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + +gen poverty_line = . +forval year = `min_year'/`max_year' { + + sum valid_yDispBuEquivYear if year == `year', d + replace poverty_line = 0.6*r(p50) if year == `year' + +} + +gen arop = (valid_yDispBuEquivYear < poverty_line) + +collapse (mean) arop [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulated data +use run year demAge sim_yDispEquivYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +bys run year: egen equivincome_median = median(sim_yDispEquivYear) + +gen poverty_line = 0.6*equivincome_median + +gen arop_sim = (sim_yDispEquivYear < poverty_line) + +collapse (mean) arop_sim, by(run year) + +collapse (mean) arop_sim /// + (sd) arop_sim_sd = arop_sim /// + , by(year) + +foreach varname in arop_sim { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea arop_sim_high arop_sim_low year, sort color(green%20) /// + legend(label(1 "SimPaths"))) /// +(line arop year, sort color(green) legend(label(2 "SILC"))), /// + title("At Risk of Poverty") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Poverty line calculated within each year as 60% of the median equivalised disposable income of benefit unit. 
Calculated" "using individual level observations.", /// + size(vsmall)) + +* Save figure +graph export "$dir_output_files/poverty/validation_${country}_at_risk_of_poverty_18_${max_age}.jpg", /// + replace width(2560) height(1440) quality(100) + + + +******************************************************************************** +* 1.2 : Mean values over time, 18+ +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +qui sum year +local min_year = 2011 +if "$min_sim_year" != "" local min_year = $min_sim_year +local max_year = r(max) +if "$max_sim_year" != "" local max_year = $max_sim_year + + +gen poverty_line = . + +forval year = `min_year'/`max_year' { + + sum valid_yDispBuEquivYear if year == `year', d + replace poverty_line = 0.6*r(p50) if year == `year' + +} + +gen arop = (valid_yDispBuEquivYear < poverty_line) + +collapse (mean) arop [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yDispEquivYear using /// + "$dir_data/simulation_sample.dta", clear + +* Select sample +drop if demAge < 18 + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . 
if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +bys run year: egen equivincome_median = median(sim_yDispEquivYear) + +gen poverty_line = 0.6*equivincome_median + +gen arop_sim = (sim_yDispEquivYear < poverty_line) + +collapse (mean) arop_sim, by(run year) + +collapse (mean) arop_sim /// + (sd) arop_sim_sd = arop_sim /// + , by(year) + +foreach varname in arop_sim { + + gen `varname'_high = `varname' + 1.96*`varname'_sd + gen `varname'_low = `varname' - 1.96*`varname'_sd + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea arop_sim_high arop_sim_low year, sort color(green%20) /// + legend(label(1 "SimPaths"))) /// +(line arop year, sort color(green) legend(label(2 "SILC"))), /// + title("At Risk of Poverty") /// + subtitle("Ages 18+") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Note: Poverty line calculated within each year as 60% of the median equivalised disposable income of benefit unit. 
Calculated" "using individual level observations.", /// + size(vsmall)) + +* Save figure +graph export "$dir_output_files/poverty/validation_${country}_at_risk_of_poverty_18plus.jpg", /// + replace width(2560) height(1440) quality(100) + + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_15_plot_inequality.do b/validation/02_simulation_validation/do_files/04_15_plot_inequality.do new file mode 100644 index 0000000..f9b623a --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_15_plot_inequality.do @@ -0,0 +1,457 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Inequality +* AUTHORS: Ashley Burdett +* LAST UPDATE: 9/2025 (AB) +* COUNTRY: PL +******************************************************************************** +* NOTES: Equivalized disposable income used to create ratios +******************************************************************************** + +//ssc install ineqdeco + +******************************************************************************** +* 1 : Income ratios through time +******************************************************************************** + +******************************************************************************** +* 1.1 : Income ratio, 90/50 +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . 
if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +collapse (p90) p90_disp = valid_yDispBuEquivYear /// + (p50) p50_disp = valid_yDispBuEquivYear /// + [aw = dwt] , by(year) + +gen p90_p50_ratio_disp_obs = p90_disp/p50_disp + +* Align reference years +gen l_p90_p50_ratio_disp_obs = p90_p50_ratio_disp_obs[_n+1] + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yDispEquivYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +collapse (p90) p90_disp = sim_yDispEquivYear /// + (p50) p50_disp = sim_yDispEquivYear, by(run year) + +gen p90_p50_ratio_disp = p90_disp/p50_disp + +collapse (mean) p90_p50_ratio_disp /// + (sd) sd_p90_p50_ratio_disp = p90_p50_ratio_disp /// + , by(year) + + foreach var in p90_p50_ratio_disp { + + gen `var'_high = `var' + 1.96*sd_`var' + gen `var'_low = `var' - 1.96*sd_`var' + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +twoway (rarea p90_p50_ratio_disp_high p90_p50_ratio_disp_low year, sort /// + color(green%20) legend(label(1 "SimPaths") position(6) rows(1))) /// +(line p90_p50_ratio_disp_obs year, sort color(green)legend(label(2 "SILC"))), /// + title("P90/P50 Disposable Income Ratio") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Ratio", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Ratios computed using individual observations of benefit unit measure of equivalized disposable income.", /// + size(vsmall)) + +* Save figure +graph export "$dir_output_files/inequality/validation_${country}_p90p50.jpg", /// + replace width(2400) height(1350) quality(100) 
+ + +******************************************************************************** +* 1.2 : Income ratio, 90/10 +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +collapse (p90) p90_disp = valid_yDispBuEquivYear /// + (p10) p10_disp = valid_yDispBuEquivYear /// + [aw = dwt], by(year) + +gen p90_p10_ratio_disp_obs = p90_disp/p10_disp + +* Align reference years +gen l_p90_p10_ratio_disp_obs = p90_p10_ratio_disp_obs[_n+1] + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yDispEquivYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . 
if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +collapse (p90) p90_disp = sim_yDispEquivYear /// + (p10) p10_disp = sim_yDispEquivYear, by(run year) + +gen p90_p10_ratio_disp = p90_disp/p10_disp + +collapse (mean) p90_p10_ratio_disp /// + (sd) sd_p90_p10_ratio_disp = p90_p10_ratio_disp /// + , by(year) + + foreach var in p90_p10_ratio_disp { + + gen `var'_high = `var' + 1.96*sd_`var' + gen `var'_low = `var' - 1.96*sd_`var' + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +twoway (rarea p90_p10_ratio_disp_high p90_p10_ratio_disp_low year, sort /// + color(green%20) legend(label(1 "SimPaths") position(6) rows(1))) /// +(line p90_p10_ratio_disp_obs year, sort color(green) legend(label(2 "SILC"))), /// + title("P90/P10 Disposable Income Ratio") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Ratio", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Ratios computed using individual observations of benefit unit measure of equivalized disposable income.", /// + size(vsmall)) + +* Save figure +graph export "$dir_output_files/inequality/validation_${country}_p90p10.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.3 : Gini coefficient +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_yDispBuEquivYear using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum valid_yDispBuEquivYear, d + + replace valid_yDispBuEquivYear = . 
if /// + valid_yDispBuEquivYear < r(p1) | valid_yDispBuEquivYear > r(p99) + +} + +* Calculate gini for each year (named valid_gini so the merge below does not discard it in favour of the simulated gini) +statsby valid_gini = r(gini), by(year) clear: ineqdeco valid_yDispBuEquivYear [aw=dwt] + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge sim_yDispEquivYear using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Trim outliers +if "$trim_outliers" == "true" { + + sum sim_yDispEquivYear, d + + replace sim_yDispEquivYear = . if /// + sim_yDispEquivYear < r(p1) | sim_yDispEquivYear > r(p99) + +} + +* Calculate gini for each year and run +statsby gini = r(gini), by(year run) clear: ineqdeco sim_yDispEquivYear + +* Obtain the mean and standard deviation by year +collapse (mean) gini /// + (sd) gini_sd = gini, by(year) + +* Compute the 95% confidence interval +gen gini_high = gini + 1.96 * gini_sd +gen gini_low = gini - 1.96 * gini_sd + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea gini_high gini_low year, sort /// + color(green%20) legend(label(1 "SimPaths") position(6) rows(1))) /// +(line valid_gini year, sort color(green) legend(label(2 "SILC"))), /// + title("Gini Coefficient") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Coefficient", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Gini coefficient computed using individual observations of benefit unit measure of equivalized disposable income.", /// + size(vsmall)) + +* Save figure +graph export "$dir_output_files/inequality/validation_${country}_gini.jpg", /// + replace width(2400) height(1350) quality(100) + + +graph drop _all + + + +******************************************************************************** +* 1.4 : Net transfers +******************************************************************************** + +use 
"$dir_data/simulation_sample.dta", clear + +* keep only one observation per benefit unit +sort run idBu year + +* Keep one observation per benefit unit +* Create a marker for the first observation in each group +bys run year idBu (demAge): gen byte to_keep = (_n == _N) + +* Keep only the marked rows +keep if to_keep == 1 +drop to_keep + +* Create gross income deciles +sort run idPers year + +xtile sim_decile = sim_yNonBenBuGrossLevelYear , n(10) + +tab sim_decile + + +* Plots + +* Sources + +* All +graph bar (mean) sim_net_transfers , over(sim_decile) /// + title("SimPaths") /// + name(simulated_net_trans_all, replace) /// + b1title("BU Gross Income Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + + +* Oldest <= 65 +preserve + +drop if demAge > 65 + +graph bar (mean) sim_net_transfers , over(sim_decile) /// + title("SimPaths") /// + name(simulated_net_trans_upto65, replace) /// + b1title("BU Gross Income Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + + +restore + +* Oldest > 65 +preserve + +drop if demAge <= 65 + +graph bar (mean) sim_net_transfers , over(sim_decile) /// + title("SimPaths") /// + name(simulated_net_trans_66plus, replace) /// + b1title("BU Gross Income Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +restore + + +use "$dir_data/${country}-eusilc_validation_sample.dta", clear + +* keep only one observation per benefit unit +sort idBu year + +* Keep one observation per benefit unit +* Create a marker for the first observation in each group +bys year idBu (demAge): gen byte to_keep = (_n == _N) + +* Keep only the marked rows +keep if to_keep == 1 +drop to_keep + +* Create gross income deciles +sort idPers year + +xtile valid_decile = valid_yNonBenBuGrossLevelYear , n(10) + +tab 
valid_decile + + +* Plots + +* Sources + +* All +graph bar (mean) valid_net_transfers , over(valid_decile) /// + title("SILC") /// + name(valid_net_trans_all, replace) /// + b1title("BU Gross Income Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + + +* Oldest <= 65 +preserve + +drop if demAge > 65 + +graph bar (mean) valid_net_transfers , over(valid_decile) /// + title("SILC") /// + name(valid_net_trans_upto65, replace) /// + b1title("BU Gross Income Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + + +restore + +* Oldest > 65 +preserve + +drop if demAge <= 65 + +graph bar (mean) valid_net_transfers , over(valid_decile) /// + title("SILC") /// + name(valid_net_trans_66plus, replace) /// + b1title("BU Gross Income Decile", size(small)) /// + ytitle("Euro") /// + ylabel(#5, format(%12.0fc)) /// + ylabel(,labsize(small)) /// + graphregion(color(white)) + +restore + +//net install grc1leg2, from(http://digital.cgdev.org/doc/stata/MO/Misc) + + +* Shares - Retirement age +grc1leg2 simulated_net_trans_all valid_net_trans_all, /// + rows(1) ycommon loff /// + graphregion(color(white)) /// + title("Average Net Transfers") /// + subtitle("All ages") /// + note("NOTE: ", /// + size(vsmall)) + +graph export /// + "$dir_output_files/inequality/validation_${country}_net_transfers_all.png", /// + replace width(2400) height(1350) + + +grc1leg2 simulated_net_trans_upto65 valid_net_trans_upto65 , /// + rows(1) ycommon loff /// + graphregion(color(white)) /// + title("Average Net Transfers") /// + subtitle("Oldest Age <= 65") /// + note("NOTE: ", /// + size(vsmall)) + +graph export /// + "$dir_output_files/inequality/validation_${country}_net_transfers_upto65.png", /// + replace width(2400) height(1350) + + +grc1leg2 simulated_net_trans_66plus valid_net_trans_66plus , /// + rows(1) ycommon loff /// + 
graphregion(color(white)) /// + title("Average Net Transfers") /// + subtitle("Oldest Age > 65") /// + note("NOTE: ", /// + size(vsmall)) + +graph export /// + "$dir_output_files/inequality/validation_${country}_net_transfers_66plus.png", /// + replace width(2400) height(1350) + +graph drop _all + diff --git a/validation/02_simulation_validation/do_files/04_16_plot_number_children.do b/validation/02_simulation_validation/do_files/04_16_plot_number_children.do new file mode 100644 index 0000000..0512b68 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_16_plot_number_children.do @@ -0,0 +1,643 @@ +******************************************************************************** +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Children +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: This do file plots simulated and SILC % of benefit units +* with a given number of children +******************************************************************************** + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time, working age (18-65), children < 18 +******************************************************************************** + +* Prepare validation data +use year demAge idPers idBu dwt children_* using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children +collapse (mean) children_* [aw = dwt], by(year) + +foreach varname in children_0 children_1 children_2 children_3p { + + rename `varname' valid_`varname' + +} + +save "$dir_data/temp_valid_stats.dta", replace + 
+ +* Prepare simulated data +use run year demAge idBu children_* using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +collapse (mean) children_*, by(run year) + +rename children_3plus children_3p + +collapse (mean) children_* /// + (sd) children_0_sd = children_0 /// + children_1_sd = children_1 /// + children_2_sd = children_2 /// + children_3p_sd = children_3p /// + , by(year) + +foreach varname in children_0 children_1 children_2 children_3p { + + gen sim_`varname'_h = `varname' + 1.96*`varname'_sd + gen sim_`varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures +label var sim_children_0 "No children" +label var sim_children_1 "1 child" +label var sim_children_2 "2 children" +label var sim_children_3p "3+ children" + + +twoway (rarea sim_children_0_h sim_children_0_l year, /// + sort color(green%20) legend(label(1 "No children, SimPaths"))) /// +(line valid_children_0 year, sort color(green) /// + legend(label(2 "No children, SILC"))) /// + (rarea sim_children_1_h sim_children_1_l year, sort color(blue%20) /// + legend(label(3 "1 child, SimPaths"))) /// +(line valid_children_1 year, sort color(blue) /// + legend(label(4 "1 child, SILC"))) /// +(rarea sim_children_2_h sim_children_2_l year, sort color(red%20) /// + legend(label(5 "2 children, SimPaths"))) /// +(line valid_children_2 year, sort color(red) /// + legend(label(6 "2 children, SILC"))) /// +(rarea sim_children_3p_h sim_children_3p_l year, sort color(grey%20) /// + legend(label(7 "3+ children, SimPaths"))) /// +(line valid_children_3p year, sort color(grey) /// + legend(label(8 "3+ children, SILC"))), /// + title("Number of Children") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) ///) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) 
/// + note("Notes: Individual observations plotted.", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_children_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time, working age (18-65), children < 18, by gender +******************************************************************************** + +* Prepare validation data +use year demAge idPers idBu dwt children_* demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children +collapse (mean) children_* [aw = dwt], by(year demMaleFlag) + +foreach varname in children_0 children_1 children_2 children_3p { + + rename `varname' valid_`varname' + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge idBu children_* demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +collapse (mean) children_*, by(run year demMaleFlag) + +rename children_3plus children_3p + +collapse (mean) children_* /// + (sd) children_0_sd = children_0 /// + children_1_sd = children_1 /// + children_2_sd = children_2 /// + children_3p_sd = children_3p /// + , by(year demMaleFlag) + +foreach varname in children_0 children_1 children_2 children_3p { + + gen sim_`varname'_h = `varname' + 1.96*`varname'_sd + gen sim_`varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures +label var sim_children_0 "No children" +label var sim_children_1 "1 child" +label var sim_children_2 "2 children" +label var sim_children_3p "3+ children" + +* Males + +preserve + +keep if demMaleFlag == 1 + +twoway (rarea sim_children_0_h sim_children_0_l 
year, /// + sort color(green%20) legend(label(1 "No children, SimPaths"))) /// +(line valid_children_0 year, sort color(green) /// + legend(label(2 "No children, SILC"))) /// + (rarea sim_children_1_h sim_children_1_l year, sort color(blue%20) /// + legend(label(3 "1 child, SimPaths"))) /// +(line valid_children_1 year, sort color(blue) /// + legend(label(4 "1 child, SILC"))) /// +(rarea sim_children_2_h sim_children_2_l year, sort color(red%20) /// + legend(label(5 "2 children, SimPaths"))) /// +(line valid_children_2 year, sort color(red) /// + legend(label(6 "2 children, SILC"))) /// +(rarea sim_children_3p_h sim_children_3p_l year, sort color(grey%20) /// + legend(label(7 "3+ children, SimPaths"))) /// +(line valid_children_3p year, sort color(grey) /// + legend(label(8 "3+ children, SILC"))), /// + title("Number of Children") /// + subtitle("Ages 18-65, males") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) ///) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Individual observations plotted.", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_children_ts_18_65_male.jpg", /// + replace width(2400) height(1350) quality(100) + +restore + +* Females + +keep if demMaleFlag== 0 + +twoway (rarea sim_children_0_h sim_children_0_l year, /// + sort color(green%20) legend(label(1 "No children, SimPaths"))) /// +(line valid_children_0 year, sort color(green) /// + legend(label(2 "No children, SILC"))) /// + (rarea sim_children_1_h sim_children_1_l year, sort color(blue%20) /// + legend(label(3 "1 child, SimPaths"))) /// +(line valid_children_1 year, sort color(blue) /// + legend(label(4 "1 child, SILC"))) /// +(rarea sim_children_2_h sim_children_2_l year, sort color(red%20) /// + legend(label(5 "2 children, SimPaths"))) /// +(line valid_children_2 year, sort color(red) /// + legend(label(6 "2 children, SILC"))) /// 
+(rarea sim_children_3p_h sim_children_3p_l year, sort color(grey%20) /// + legend(label(7 "3+ children, SimPaths"))) /// +(line valid_children_3p year, sort color(grey) /// + legend(label(8 "3+ children, SILC"))), /// + title("Number of Children") /// + subtitle("Ages 18-65, females") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) ///) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Individual observations plotted.", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_children_ts_18_65_female.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.3 : Mean values over time, working age (18-65), children < 3 +******************************************************************************** + +* Prepare validation data +use year demAge idBu dwt demNChild0to2 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +gen child02 = . +replace child02 = 0 if demNChild0to2 == 0 +replace child02 = 1 if demNChild0to2 > 0 & demNChild0to2 != . + +* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children +collapse (mean) child02 [aw = dwt], by(year) + +foreach varname in child02 { + + rename `varname' valid_`varname' + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge idBu sim_demNChild0to2 using /// + "$dir_data/simulation_sample.dta", clear + +gen sim_child02 = . +replace sim_child02 = 0 if sim_demNChild0to2 == 0 +replace sim_child02 = 1 if sim_demNChild0to2 > 0 & sim_demNChild0to2 != . 
+ +collapse (mean) sim_child02, by(run year) + +collapse (mean) sim_child02 /// + (sd) sim_child02_sd = sim_child02 /// + , by(year) + +foreach varname in sim_child02 { + + gen `varname'_h = `varname' + 1.96*`varname'_sd + gen `varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figures +twoway (rarea sim_child02_h sim_child02_l year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_child02 year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share With Child 0-2 Years Old") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes:", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_young_child_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all + + +******************************************************************************** +* 1.4 : Mean values over time, working age (18-65), children < 3, by gender +******************************************************************************** + +* Prepare validation data +use year demAge idBu dwt demNChild0to2 demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +gen child02 = . +replace child02 = 0 if demNChild0to2 == 0 +replace child02 = 1 if demNChild0to2 > 0 & demNChild0to2 != . 
+ +* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children +collapse (mean) child02 [aw = dwt], by(year demMaleFlag) + +foreach varname in child02 { + + rename `varname' valid_`varname' + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge idBu sim_demNChild0to2 demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +gen sim_child02 = . +replace sim_child02 = 0 if sim_demNChild0to2 == 0 +replace sim_child02 = 1 if sim_demNChild0to2 > 0 & sim_demNChild0to2 != . + +collapse (mean) sim_child02, by(run year demMaleFlag) + +collapse (mean) sim_child02 /// + (sd) sim_child02_sd = sim_child02 /// + , by(year demMaleFlag) + +foreach varname in sim_child02 { + + gen `varname'_h = `varname' + 1.96*`varname'_sd + gen `varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figures + +* Males +preserve + +keep if demMaleFlag == 1 + +twoway (rarea sim_child02_h sim_child02_l year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_child02 year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share With Child 0-2 Years Old") /// + subtitle("Ages 18-65, males") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_young_child_ts_18_65_male.jpg", /// + replace width(2400) height(1350) quality(100) + +restore + +* Females +keep if demMaleFlag == 0 + +twoway (rarea sim_child02_h sim_child02_l year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_child02 year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share With Child 0-2 Years Old") /// + 
subtitle("Ages 18-65, females") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_young_child_ts_18_65_female.jpg", /// + replace width(2400) height(1350) quality(100) + + +graph drop _all + + + +******************************************************************************** +* 1.5 : Mean values over time, working age (18-65), new born child +******************************************************************************** + +* Prepare validation data +use year demAge idBu dwt demNChild0 using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +gen child0 = . +replace child0 = 0 if demNChild0 == 0 +replace child0 = 1 if demNChild0 > 0 & demNChild0 != . + +* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children +collapse (mean) child0 [aw = dwt], by(year) + +foreach varname in child0 { + + rename `varname' valid_`varname' + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge idBu sim_demNChild0 using /// + "$dir_data/simulation_sample.dta", clear + +gen sim_child0 = . +replace sim_child0 = 0 if sim_demNChild0 == 0 +replace sim_child0 = 1 if sim_demNChild0 > 0 & sim_demNChild0 != . 
+ +collapse (mean) sim_child0, by(run year) + +collapse (mean) sim_child0 /// + (sd) sim_child0_sd = sim_child0 /// + , by(year) + +foreach varname in sim_child0 { + + gen `varname'_h = `varname' + 1.96*`varname'_sd + gen `varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figures +twoway (rarea sim_child0_h sim_child0_l year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_child0 year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share With New Born Child") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: Constructed from benefit unit information.", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_new_born_child_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all + + +******************************************************************************** +* 1.6 : Mean values over time, working age (18-65), new born child, by gender +******************************************************************************** + +* Prepare validation data +use year demAge idBu dwt demNChild0 demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +gen child0 = . +replace child0 = 0 if demNChild0 == 0 +replace child0 = 1 if demNChild0 > 0 & demNChild0 != . 
+ +* Calculate weighted share of benefit units +collapse (mean) child0 [aw = dwt], by(year demMaleFlag) + +foreach varname in child0 { + + rename `varname' valid_`varname' + +} + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year demAge idBu sim_demNChild0 demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +gen sim_child0 = . +replace sim_child0 = 0 if sim_demNChild0 == 0 +replace sim_child0 = 1 if sim_demNChild0 > 0 & sim_demNChild0 != . + +collapse (mean) sim_child0, by(run year demMaleFlag) + +collapse (mean) sim_child0 /// + (sd) sim_child0_sd = sim_child0 /// + , by(year demMaleFlag) + +foreach varname in sim_child0 { + + gen `varname'_h = `varname' + 1.96*`varname'_sd + gen `varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figures + +* Males +preserve + +keep if demMaleFlag == 1 + +twoway (rarea sim_child0_h sim_child0_l year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_child0 year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share With New Born Child") /// + subtitle("Ages 18-65, males") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_new_born_child_ts_18_65_male.jpg", /// + replace width(2400) height(1350) quality(100) + +restore + +* Females +keep if demMaleFlag == 0 + +twoway (rarea sim_child0_h sim_child0_l year, /// + sort color(green%20) legend(label(1 "SimPaths"))) /// +(line valid_child0 year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share With New Born Child") /// + subtitle("Ages 18-65, females") /// + xtitle("Year", size(small)) /// + 
ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +* Save figure +graph export "$dir_output_files/children/validation_${country}_new_born_child_ts_18_65_female.jpg", /// + replace width(2400) height(1350) quality(100) + + + +******************************************************************************** +* 1.7 : Have child +******************************************************************************** + +* Prepare validation data +use year valid_new_born dwt demMaleFlag demAge using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if demMaleFlag == 0 +keep if inrange(demAge,18,40) + +* Calculate share by year +collapse (mean) valid_new_born [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulated data +use run year sim_new_born demMaleFlag demAge using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,40) +keep if demMaleFlag == 0 + +collapse (mean) sim_new_born, by(run year) + +collapse (mean) sim_new_born /// + (sd) sim_new_born_sd = sim_new_born /// + , by(year) + +foreach varname in sim_new_born { + + gen `varname'_h = `varname' + 1.96*`varname'_sd + gen `varname'_l = `varname' - 1.96*`varname'_sd + rename `varname' sim_`varname' + +} + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figures + +twoway (rarea sim_new_born_h sim_new_born_l year, /// + sort color(green%20) legend(label(1 "Simulated"))) /// +(line valid_new_born year, sort color(green) /// + legend(label(2 "SILC"))), /// + title("Share of Females Who Have a New Born Baby") /// + subtitle("Ages 18-40") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + note("Notes: ", size(vsmall)) + +* Save figure 
+graph export "$dir_output_files/children/validation_${country}_birth_ts_18_40_female.jpg", /// + replace width(2400) height(1350) quality(100) + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_17_plot_disability.do b/validation/02_simulation_validation/do_files/04_17_plot_disability.do new file mode 100644 index 0000000..a003615 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_17_plot_disability.do @@ -0,0 +1,321 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Disability +* AUTHORS: Ashley Burdett +* LAST UPDATE: Jan 2026 +* COUNTRY: PL +******************************************************************************** +* NOTES: +*******************************************************************************/ + +******************************************************************************** +* 1 : Mean values over time +******************************************************************************** + +******************************************************************************** +* 1.1 : Mean values over time, working age (18-65) +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_healthDsblLongtermFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +collapse (mean) valid_healthDsblLongtermFlag [aw = dwt], by(year) + +save "$dir_data/temp_valid_stats.dta", replace + + +* Prepare simulation data +use year demAge sim_healthDsblLongtermFlag run using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +collapse (mean) sim_healthDsblLongtermFlag, by(run year) + +collapse (mean) sim_healthDsblLongtermFlag /// + (sd) sim_healthDsblLongtermFlag_sd = sim_healthDsblLongtermFlag, /// + by(year) + +gen sim_healthDsblLongtermFlag_high = /// + 
sim_healthDsblLongtermFlag + 1.96*sim_healthDsblLongtermFlag_sd +gen sim_healthDsblLongtermFlag_low = /// + sim_healthDsblLongtermFlag - 1.96*sim_healthDsblLongtermFlag_sd + +merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen + + +* Plot figure +twoway (rarea sim_healthDsblLongtermFlag_high /// + sim_healthDsblLongtermFlag_low year, sort color(green%20) /// + legend(label(1 "Simulated "))) /// +(line valid_healthDsblLongtermFlag year, sort color(green) /// + legend(label(2 "SILC "))), /// + title("Disabled/Long-term Sick ") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +graph export /// +"$dir_output_files/disability/validation_${country}_disability_ts_18_65_both.jpg", /// + replace width(2400) height(1350) quality(100) + + +******************************************************************************** +* 1.2 : Mean values over time, working age (18-65), by gender +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_healthDsblLongtermFlag demMaleFlag using /// + "$dir_data/${country}-eusilc_validation_sample.dta", clear + +keep if inrange(demAge,18,65) + +collapse (mean) valid_healthDsblLongtermFlag [aw = dwt], by(year demMaleFlag) + +save "$dir_data/temp_valid_stats.dta", replace + +* Prepare simulation data +use year demAge sim_healthDsblLongtermFlag run demMaleFlag using /// + "$dir_data/simulation_sample.dta", clear + +keep if inrange(demAge,18,65) + +collapse (mean) sim_healthDsblLongtermFlag, by(run year demMaleFlag) + +collapse (mean) sim_healthDsblLongtermFlag /// + (sd) sim_healthDsblLongtermFlag_sd = sim_healthDsblLongtermFlag, /// + by(year demMaleFlag) + +gen sim_healthDsblLongtermFlag_high = sim_healthDsblLongtermFlag + /// + 
1.96*sim_healthDsblLongtermFlag_sd + +gen sim_healthDsblLongtermFlag_low = sim_healthDsblLongtermFlag - /// + 1.96*sim_healthDsblLongtermFlag_sd + +merge 1:1 year demMaleFlag using "$dir_data/temp_valid_stats.dta", keep(3) nogen + +* Plot figure +twoway (rarea sim_healthDsblLongtermFlag_high /// + sim_healthDsblLongtermFlag_low year if demMaleFlag == 0, /// + sort color(green%20) legend(label(1 "Female, SimPaths"))) /// +(line valid_healthDsblLongtermFlag year if demMaleFlag == 0, sort color(green) /// + legend(label(2 "Female, SILC "))) /// + (rarea sim_healthDsblLongtermFlag_high sim_healthDsblLongtermFlag_low /// + year if demMaleFlag == 1, sort color(red%20) /// + legend(label(3 "Male, SimPaths"))) /// +(line valid_healthDsblLongtermFlag year if demMaleFlag == 1, sort color(red) /// + legend(label(4 "Male, SILC"))), /// + title("Disabled/Long-term Sick ") /// + subtitle("Ages 18-65") /// + xtitle("Year", size(small)) /// + ytitle("Share", size(small)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + note("Notes: ", size(vsmall)) + +graph export /// +"$dir_output_files/disability/validation_${country}_disability_ts_18_65_male_female.jpg", /// + replace width(2560) height(1440) quality(100) + + +******************************************************************************** +* 1.3 : Mean values over time, by age group +******************************************************************************** + +* Prepare validation data +use year demAge dwt valid_healthDsblLongtermFlag demMaleFlag demAge /// + ageGroup using "$dir_data/${country}-eusilc_validation_sample.dta", clear + +drop if ageGroup == 0 + +collapse (mean) valid_healthDsblLongtermFlag [aw=dwt], by(ageGroup year) + +drop if missing(ageGroup) + +reshape wide valid_healthDsblLongtermFlag, i(year) j(ageGroup) + +forvalues i = 1(1)8 { + + rename valid_healthDsblLongtermFlag`i' /// + valid_healthDsblLongtermFlag_`i' + +} + +save 
"$dir_data/temp_valid_stats_full.dta", replace + + +* Prepare simulation data +use run year sim_healthDsblLongtermFlag ageGroup using /// + "$dir_data/simulation_sample.dta", clear + +collapse (mean) sim_healthDsblLongtermFlag, by(ageGroup run year) + +drop if missing(ageGroup) + +reshape wide sim_healthDsblLongtermFlag, i(year run) j(ageGroup) + +forvalues i = 1(1)8 { + + rename sim_healthDsblLongtermFlag`i' sim_healthDsblLongtermFlag_`i' + +} + +collapse (mean) sim_healthDsblLongtermFlag* /// + (sd) sd_sim_healthDsblLongtermFlag_1 = sim_healthDsblLongtermFlag_1 /// + sd_sim_healthDsblLongtermFlag_2 = sim_healthDsblLongtermFlag_2 /// + sd_sim_healthDsblLongtermFlag_3 = sim_healthDsblLongtermFlag_3 /// + sd_sim_healthDsblLongtermFlag_4 = sim_healthDsblLongtermFlag_4 /// + sd_sim_healthDsblLongtermFlag_5 = sim_healthDsblLongtermFlag_5 /// + sd_sim_healthDsblLongtermFlag_6 = sim_healthDsblLongtermFlag_6 /// + sd_sim_healthDsblLongtermFlag_7 = sim_healthDsblLongtermFlag_7 /// + sd_sim_healthDsblLongtermFlag_8 = sim_healthDsblLongtermFlag_8 /// + , by(year) + +forvalues i = 1(1)8 { + + gen sim_healthDsblLongtermFlag_`i'_h = /// + sim_healthDsblLongtermFlag_`i' + 1.96*sd_sim_healthDsblLongtermFlag_`i' + gen sim_healthDsblLongtermFlag_`i'_l = /// + sim_healthDsblLongtermFlag_`i' - 1.96*sd_sim_healthDsblLongtermFlag_`i' + +} + +recast double year + +merge 1:1 year using "$dir_data/temp_valid_stats_full.dta", keep(3) nogen + +* Plot figures +foreach vble in "healthDsblLongtermFlag" { + + twoway (rarea sim_`vble'_1_h sim_`vble'_1_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_1 year, sort legend(label(2 "SILC"))), /// + title("Age 15-19") /// + name(`vble'_1, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_2_h sim_`vble'_2_l year, /// + 
sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_2 year, sort legend(label(2 "SILC"))), /// + title("Age 20-24") /// + name(`vble'_2, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_3_h sim_`vble'_3_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_3 year, sort legend(label(2 "SILC"))), /// + title("Age 25-29") /// + name(`vble'_3, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_4_h sim_`vble'_4_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_4 year, sort legend(label(2 "SILC"))), /// + title("Age 30-34") /// + name(`vble'_4, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_5_h sim_`vble'_5_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_5 year, sort legend(label(2 "SILC"))), /// + title("Age 35-39") /// + name(`vble'_5, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_6_h sim_`vble'_6_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) /// + rows(1))) /// + (line valid_`vble'_6 year, sort legend(label(2 "SILC"))), /// + 
title("Age 40-59") /// + name(`vble'_6, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_7_h sim_`vble'_7_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_7 year, sort legend(label(2 "SILC"))), /// + title("Age 60-79") /// + name(`vble'_7, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + graphregion(color(white)) + + twoway (rarea sim_`vble'_8_h sim_`vble'_8_l year, /// + sort color(red%20) legend(label(1 "SimPaths") position(6) rows(1))) /// + (line valid_`vble'_8 year, sort legend(label(2 "SILC"))), /// + title("Age 80-100") /// + name(`vble'_8, replace) /// + ylabel(, labsize(vsmall)) /// + xlabel(, labsize(vsmall)) /// + ytitle("Share", size(small)) /// + xtitle("Year", size(small)) /// + legend(size(small)) /// + graphregion(color(white)) /// + graphregion(color(white)) + +} + +grc1leg healthDsblLongtermFlag_1 healthDsblLongtermFlag_2 /// + healthDsblLongtermFlag_3 healthDsblLongtermFlag_4 /// + healthDsblLongtermFlag_5 healthDsblLongtermFlag_6 /// + healthDsblLongtermFlag_7 healthDsblLongtermFlag_8, /// + title("Disabled/Long-term Sick by Age Group") /// + legendfrom(healthDsblLongtermFlag_1) /// + graphregion(color(white)) /// + ycomm /// + note("Notes:", size(vsmall)) + +graph export /// +"$dir_output_files/disability/validation_${country}_disability_ts_age_groups_both.jpg", /// + replace width(2400) height(1350) quality(100) + + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_18_plot_alignment_targets.do b/validation/02_simulation_validation/do_files/04_18_plot_alignment_targets.do new file mode 100644 index 
0000000..b24d423 --- /dev/null +++ b/validation/02_simulation_validation/do_files/04_18_plot_alignment_targets.do @@ -0,0 +1,154 @@ +/******************************************************************************* +* PROJECT: SimPaths EU +* SECTION: Validation +* OBJECT: Alignment targets (refactored variable names) +* AUTHORS: Ashley Burdett +* LAST UPDATE: 05/2026 (refactor) +* COUNTRY: Poland +******************************************************************************** +* NOTES: Refactored to use the new SimPaths EU variable names from +* AlignmentAdjustmentFactors1.csv (align* prefix). +*******************************************************************************/ + +* Ensure output folder exists +capture mkdir "$dir_output_files/alignment_targets" + +/******************************************************************************* +* IMPORT & PREPARE DATA +*******************************************************************************/ + +import delimited "$dir_simulated_data/AlignmentAdjustmentFactors1.csv", /// + clear varnames(1) case(preserve) + +* Restrict sample years +keep if time <= ${max_year} + +rename time year + +* Drop row id (not needed) +capture drop id_AlignmentAdjustmentFactors1 + +save "$dir_data/alignment_targets_raw.dta", replace + + +******************************************************************************** +* 1 : Simulated vs Target pairs +******************************************************************************** + +* Parallel lists: simulated var, target var, title, filename suffix +local sim_vars "alignDsblSimShare alignFertRateSim alignInSchoolSimShare alignRtrdSimShare alignPartnerSimShare alignEmpSimACFShare alignEmpSimACMShare alignEmpSimCouplesShare alignEmpSimFWithDepShare alignEmpSimMWithDepShare alignEmpSimSingleFShare alignEmpSimSingleMShare" + +local tgt_vars "alignDsblTgtShare alignFertRateTarget alignInSchoolTgtShare alignRtrdTgtShare alignPartnerTargetShare alignEmpTgtACFShare alignEmpTgtACMShare 
alignEmpTgtCouplesShare alignEmpTgtFWithDepShare alignEmpTgtMWithDepShare alignEmpTgtSingleFShare alignEmpTgtSingleMShare" + +local titles `" "Disability share" "Fertility rate" "In school share" "Retirement share" "Share cohabiting" "Employed share - adult children (female)" "Employed share - adult children (male)" "Employed share - couples" "Employed share - female with dependants" "Employed share - male with dependants" "Employed share - single females" "Employed share - single males" "' + +local filenames "disability_share fertility_rate inschool_share retirement_share share_cohabiting employed_ACFemale employed_ACMale employed_couples employed_femaleWithDep employed_maleWithDep employed_singleFemales employed_singleMales" + +local n : word count `sim_vars' + +forvalues i = 1/`n' { + + local sim : word `i' of `sim_vars' + local tgt : word `i' of `tgt_vars' + local ttl : word `i' of `titles' + local fn : word `i' of `filenames' + + use "$dir_data/alignment_targets_raw.dta", clear + keep run year `sim' `tgt' + + * Collapse across runs: mean and sd for simulated and target + collapse (mean) m_sim = `sim' m_tgt = `tgt' /// + (sd) sd_sim = `sim' sd_tgt = `tgt' /// + , by(year) + + * Replace missing sd (single-run years) with 0 so bands still plot + replace sd_sim = 0 if missing(sd_sim) + replace sd_tgt = 0 if missing(sd_tgt) + + * Build 95% bands + gen sim_h = m_sim + 1.96*sd_sim + gen sim_l = m_sim - 1.96*sd_sim + gen tgt_h = m_tgt + 1.96*sd_tgt + gen tgt_l = m_tgt - 1.96*sd_tgt + + twoway (rarea sim_h sim_l year, /// + sort color(green%20)) /// + (line m_sim year, sort color(green) lpattern(solid)) /// + (line m_tgt year, sort color(red) lpattern(dash)), /// + title("`ttl'") /// + xtitle("Year", size(small)) /// + ytitle("Value", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(order(1 "Simulated" 3 "Target") /// + position(6) rows(1) size(small)) /// + note("Notes: Shaded area = mean +/- 
1.96*sd across simulation runs.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/alignment_targets/validation_${country}_alignment_`fn'.jpg", /// + replace width(2400) height(1350) quality(100) +} + + +******************************************************************************** +* 2 : Adjustment factors (no target) +******************************************************************************** + +local adj_vars "alignDsblAdj alignFertAdj alignInSchoolAdj alignPartnerAdj alignRtrdAdj alignUtilAdjACF alignUtilAdjACM alignUtilAdjCouple alignUtilAdjFWithDep alignUtilAdjMWithDep alignUtilAdjSingleF alignUtilAdjSingleM" + +local adj_titles `" "Disability adjustment factor" "Fertility adjustment factor" "In school adjustment factor" "Partnership adjustment factor" "Retirement adjustment factor" "Utility adjustment factor - adult children (female)" "Utility adjustment factor - adult children (male)" "Utility adjustment factor - couples" "Utility adjustment factor - female with dependants" "Utility adjustment factor - male with dependants" "Utility adjustment factor - single females" "Utility adjustment factor - single males" "' + +local adj_filenames "adj_disability adj_fertility adj_inschool adj_partnership adj_retirement adj_utility_ACFemale adj_utility_ACMale adj_utility_couples adj_utility_femaleWithDep adj_utility_maleWithDep adj_utility_singleFemales adj_utility_singleMales" + +local na : word count `adj_vars' + +forvalues i = 1/`na' { + + local v : word `i' of `adj_vars' + local ttl : word `i' of `adj_titles' + local fn : word `i' of `adj_filenames' + + use "$dir_data/alignment_targets_raw.dta", clear + capture confirm variable `v' + if _rc { + display as txt "Variable `v' not found in data - skipping." 
+ continue + } + + keep run year `v' + + collapse (mean) m_adj = `v' /// + (sd) sd_adj = `v' /// + , by(year) + + replace sd_adj = 0 if missing(sd_adj) + + gen adj_h = m_adj + 1.96*sd_adj + gen adj_l = m_adj - 1.96*sd_adj + + twoway (rarea adj_h adj_l year, /// + sort color(blue%20)) /// + (line m_adj year, sort color(blue) lpattern(solid)), /// + yline(0, lcolor(black) lpattern(dot)) /// + title("`ttl'") /// + xtitle("Year", size(small)) /// + ytitle("Adjustment factor", size(small)) /// + graphregion(color(white)) /// + xlabel(, labsize(small)) /// + ylabel(, labsize(small)) /// + legend(order(1 "Simulated") /// + position(6) rows(1) size(small)) /// + note("Notes: Shaded area = mean +/- 1.96*sd across simulation runs.", /// + size(vsmall)) + + graph export /// + "$dir_output_files/alignment_targets/validation_${country}_`fn'.jpg", /// + replace width(2400) height(1350) quality(100) +} + +* Clean up +capture erase "$dir_data/alignment_targets_raw.dta" + +graph drop _all diff --git a/validation/02_simulation_validation/do_files/04_create_EU_SILC_validation_targets.do b/validation/02_simulation_validation/do_files/04_create_EU_SILC_validation_targets.do deleted file mode 100644 index 624570c..0000000 --- a/validation/02_simulation_validation/do_files/04_create_EU_SILC_validation_targets.do +++ /dev/null @@ -1,314 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Targets -* AUTHORS: Patryk Bronka, Ashley Burdett, Daria Popova -* LAST UPDATE: Nov 2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* DESCRIPTION: This file creates validation targets using longitudinal EU -* SILC data -* generated by do files - - -* NOTES: Consider trimming gross labour income at the -* bottom of the file. 
-*******************************************************************************/ - -* Load initial populations (from 05_ weight adjusted for grossing) -use "$dir_init_pop_data/${country}_pooled_ipop.dta", clear - -// alternative option would be to use the estimation sample and after benefit units have been created "03_" - - -* Restrict sample to observations up to and including specified maximum year -fre swv -keep if swv <= $max_year -gen year = stm - - -** Age groups -gen ageGroup = . -replace ageGroup = 0 if dag >= 0 & dag <= 14 -replace ageGroup = 1 if dag >= 15 & dag <= 19 -replace ageGroup = 2 if dag >= 20 & dag <= 24 -replace ageGroup = 3 if dag >= 25 & dag <= 29 -replace ageGroup = 4 if dag >= 30 & dag <= 34 -replace ageGroup = 5 if dag >= 35 & dag <= 39 -replace ageGroup = 6 if dag >= 40 & dag <= 59 -replace ageGroup = 7 if dag >= 60 & dag <= 79 -replace ageGroup = 8 if dag >= 80 & dag <= 100 - -label def ageGrouplb /// - 0 "ageGroup_0_14" /// - 1 "ageGroup_15_19" /// - 2 "ageGroup_20_24" /// - 3 "ageGroup_25_29" /// - 4 "ageGroup_30_34" /// - 5 "ageGroup_35_39" /// - 6 "ageGroup_40_59" /// - 7 "ageGroup_60_79" /// - 8 "ageGroup_80_100" /// - -label val ageGroup ageGrouplb -fre ageGroup - -gen ageGroup2 = . -replace ageGroup2 = 0 if dag >= 16 & dag <= 24 -replace ageGroup2 = 1 if dag >= 25 & dag <= 29 -replace ageGroup2 = 2 if dag >= 30 & dag <= 34 -replace ageGroup2 = 3 if dag >= 35 & dag <= 39 -replace ageGroup2 = 4 if dag >= 40 & dag <= 44 -replace ageGroup2 = 5 if dag >= 45 & dag <= 49 -replace ageGroup2 = 6 if dag >= 50 & dag <= 54 -replace ageGroup2 = 7 if dag >= 55 & dag <= 59 -replace ageGroup2 = 8 if dag >= 60 & dag <= 65 - -label def ageGrouplb2 /// - 0 "ageGroup_16_24" /// - 1 "ageGroup_25_29" /// - 2 "ageGroup_30_34" /// - 3 "ageGroup_35_39" /// - 4 "ageGroup_40_44" /// - 5 "ageGroup_45_49" /// - 6 "ageGroup_50_54" /// - 7 "ageGroup_55_59" /// - 8 "ageGroup_60_65" /// - -label val ageGroup2 ageGrouplb2 -fre ageGroup2 - -** Sex -replace dgn = . 
if dgn < 0 - - -*** Income variables - all annual unless stated - -** Real gross income (non-benefit) -* by individual -gen ypnb = sinh(ypnbihs_dv) -gen valid_y_gross_ind_yr = ypnb * 12 - -* by benefit unit -bys stm idhh idbenefitunit: /// - egen valid_y_gross_bu_yr = total(valid_y_gross_ind_yr) - - -** Disposable income -* by individual -merge 1:1 idperson swv using "$dir_data/silc_ind_dispos_y.dta" -drop if _m == 2 -drop _m - -* by benefit unit -bys stm idhh idbenefitunit: /// - egen valid_y_disp_bu_yr = total(valid_y_disp_ind_yr) - - -** Real gross labour income -* by individual -gen y_gross_labour_ind_yr = sinh(yplgrs_dv) * 12 - -* by benefit unit -bys stm idhh idbenefitunit: /// - egen valid_y_gross_labour_bu_yr = total(y_gross_labour_ind_yr) - -** Real capital income -* by benefit unit -gen y_gross_capital_ind_yr = sinh(ypncp) * 12 - -* by benefit unit -bys stm idhh idbenefitunit: /// - egen valid_y_gross_capital_bu_yr = total(y_gross_capital_ind_yr) - - -* Equivalised disposable income per benefit unit -* Generate number of dependent children in a benefit unit -gen depChild = 1 if (dag >= 0 & dag < $age_become_responsible) -bys swv idhh idbenefitunit: egen dnc_bu = sum(depChild) - -gen depChild02 = 1 if (dag >= 0 & dag <= 2) -bys swv idhh idbenefitunit: egen dnc02_bu = sum(depChild02) - -lab var dnc02 "Number of dependent children 0 - 2" - -* Generate modified-OECD equivalence scale: 1 for the household head, 0.5 for -* additional adults, 0.3 for children < 14 years old -bys swv idhh idbenefitunit: gen people_in_bu = _N -cap drop child -gen child = (dag < 14) -bys swv idhh idbenefitunit: egen children_in_bu = total(child) -gen other_adults = people_in_bu - children_in_bu - 1 - // -1 for the household head - -gen equiv_factor = 1 + (0.5 * other_adults) + (0.3 * children_in_bu) - // Start with 1 because each household must have at least the head -lab var equiv_factor "OECD-modified scale equivalence factor" - -gen valid_y_eq_disp_bu_yr = valid_y_disp_bu_yr / 
equiv_factor - -drop child people_in_bu child children_in_bu other_adults dnc_bu dnc02_bu - - -** Annual income shares -/* -xtile obs_gross_income_group = valid_y_gross_bu_yr, nq(10) - This is not correct for pooled data - -Problem: if many observations have exactly the same value, xtile would group -them into a single decile, causing one or more deciles to have very few -observations. -Adding a very small random amount can help differentiate tied values enough to -distribute them more evenly across deciles without distorting the data -meaningfully. -*/ -gen valid_y_gross_bu_yr_jit = valid_y_gross_bu_yr + runiform() * 1e-5 - -forvalues stm = 2011/$max_year { - - xtile obs_gross_income_group_`stm' = valid_y_gross_bu_yr_jit if /// - depChild != 1 & stm == `stm', nq(10) - - bys idhh: egen temp_obs_gross_income_group_`stm' = /// - max(obs_gross_income_group_`stm') if stm == `stm' - - replace obs_gross_income_group_`stm' = /// - temp_obs_gross_income_group_`stm' if /// - missing(obs_gross_income_group_`stm') - drop temp_obs_gross_income_group_`stm' - -} - -* Unify into a single variable -egen obs_gross_income_group = rowtotal(obs_gross_income_group_2011 /// - obs_gross_income_group_2012 obs_gross_income_group_2013 /// - obs_gross_income_group_2014 obs_gross_income_group_2015 /// - obs_gross_income_group_2016 obs_gross_income_group_2017 /// - obs_gross_income_group_2018 obs_gross_income_group_2019 /// - obs_gross_income_group_2020 obs_gross_income_group_2021 /// - obs_gross_income_group_2022 obs_gross_income_group_2023) - -drop obs_gross_income_group_2* -bys stm: fre obs_gross_income_group - -** Activity status -* Activity dummies -gen valid_employed = (les_c4 == 1) -gen valid_student = (les_c4 == 2) -gen valid_inactive = (les_c4 == 3) -gen valid_retired = (les_c4 == 4) - -replace valid_employed = . if les_c4 < 0 | les_c4 == . -replace valid_student = . if les_c4 < 0 | les_c4 == . -replace valid_inactive = . if les_c4 < 0 | les_c4 == . -replace valid_retired = . 
if les_c4 < 0 | les_c4 == . - - -** Education level -* Attainment dummies -replace deh_c3 = 3 if ded == 1 - -gen valid_edu_high = (deh_c3 == 1) -gen valid_edu_med = (deh_c3 == 2) -gen valid_edu_low = (deh_c3 == 3) - -/* -replace valid_edu_high = . if deh_c3 == . | deh_c3 < 0 | deh_flag == 1 -replace valid_edu_med = . if deh_c3 == . | deh_c3 < 0 | deh_flag == 1 -replace valid_edu_low = . if deh_c3 == . | deh_c3 < 0 | deh_flag == 1 -*/ - -** Family -* Partnership status -gen valid_dcpst_p = (dcpst == 1) // partnered -gen valid_dcpst_s = (dcpst == 2) // single - -replace valid_dcpst_p = . if dcpst == . | dcpst < 0 -replace valid_dcpst_s = . if dcpst == . | dcpst < 0 - -* Number of children -gen children_0 = (dnc == 0) -gen children_1 = (dnc == 1) -gen children_2 = (dnc == 2) -gen children_3plus = (dnc >= 3 & dnc != .) - -* Interaction of partnership status and number of children -foreach var1 in valid_dcpst_p valid_dcpst_s { - - foreach var2 in children_0 children_1 children_2 children_3p { - - gen `var1'_`var2' = (`var1' & `var2') - - } - -} - - -** Self-rated health -replace dhe = . if dhe < 0 | dhe_flag == 1 - - -** Hours worked (weekly) - -* Impose consistency with les_c4 -* Prioritize les_c4 as we did with the European models -replace lhw = . if les_c4 != 1 - -count if (lhw == 0 | lhw == .) & les_c4 == 1 -// note that 0s could be generated from missing values in rowtotal function - -tab les_c4 if lhw != 0 & lhw != . - -gen hours = lhw -gen valid_lhw = lhw - - -* Labour supply categories -gen laboursupplyweekly_hu = "ZERO" -replace laboursupplyweekly_hu = "TWENTY" if hours >= 1 & hours < 40 -replace laboursupplyweekly_hu = "FORTY" if hours == 40 -replace laboursupplyweekly_hu = "FIFTY" if hours > 40 & !missing(hours) - - -* Hourly wages -gen valid_wage_hour = . -replace valid_wage_hour = obs_earnings_hourly -replace valid_wage_hour = . if obs_earnings_hourly < 0 - -replace valid_wage_hour = . 
if valid_wage_hour == 0 & les_c4 == 1 - - -* Minimum wage variable -gen min_wage = 8.66 if swv == 2011 -replace min_wage = 9.38 if swv == 2012 -replace min_wage = 10.00 if swv == 2013 -replace min_wage = 10.5 if swv == 2014 -replace min_wage = 10.64 if swv == 2015 -replace min_wage = 11.56 if swv == 2016 -replace min_wage = 12.50 if swv == 2017 -replace min_wage = 13.13 if swv == 2018 -replace min_wage = 14.06 if swv == 2019 -replace min_wage = 16.25 if swv == 2020 -replace min_wage = 17.50 if swv == 2021 -replace min_wage = 18.81 if swv == 2022 -replace min_wage = 22.8 if swv == 2023 - -* Convert into real terms -replace min_wage = min_wage * (CPI/100) - - -gen valid_wage_hour_adj_mw = valid_wage_hour -replace valid_wage_hour_adj_mw = min_wage if valid_wage_hour < min_wage - -replace valid_wage_hour_adj = valid_wage_hour -replace valid_wage_hour_adj = . if valid_wage_hour < min_wage - - -save "$dir_data/${country}-eusilc_validation_full_sample_long.dta", replace - - -* Restrict sample to individuals between min and max age defined in -* 00_master file -keep if dag>= $min_age & dag <= $max_age - -save "$dir_data/${country}-eusilc_validation_sample_long.dta", replace diff --git a/validation/02_simulation_validation/do_files/05_create_simulated_validation_targets.do b/validation/02_simulation_validation/do_files/05_create_simulated_validation_targets.do deleted file mode 100644 index ce394c8..0000000 --- a/validation/02_simulation_validation/do_files/05_create_simulated_validation_targets.do +++ /dev/null @@ -1,196 +0,0 @@ -******************************************************************************** -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Validation targets -* AUTHORS: Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland - -* NOTES: This file creates validation targets using simulated data -* generated by do file: 01_prepare_simulated_data.do -******************************************************************************** - -* 
Generate required variables - -use "$dir_data/simulated_data_prep1.dta", clear - -* Sex -gen sim_sex = . -replace sim_sex = 1 if dgn == "Male" -replace sim_sex = 2 if dgn == "Female" - -la def sim_sex_lb 1 "Male" 2 "Female" -la val sim_sex sim_sex_lb - -* Hours worked weekly -replace hoursworkedweekly = "" if hoursworkedweekly == "null" -destring hoursworkedweekly, replace - -* Define age groups -gen ageGroup = . -replace ageGroup = 0 if dag >= 0 & dag <= 14 -replace ageGroup = 1 if dag >= 15 & dag <= 19 -replace ageGroup = 2 if dag >= 20 & dag <= 24 -replace ageGroup = 3 if dag >= 25 & dag <= 29 -replace ageGroup = 4 if dag >= 30 & dag <= 34 -replace ageGroup = 5 if dag >= 35 & dag <= 39 -replace ageGroup = 6 if dag >= 40 & dag <= 59 -replace ageGroup = 7 if dag >= 60 & dag <= 79 -replace ageGroup = 8 if dag >= 80 & dag <= 100 - - -label def ageGrouplb /// - 0 "ageGroup_0_14" /// - 1 "ageGroup_15_19" /// - 2 "ageGroup_20_24" /// - 3 "ageGroup_25_29" /// - 4 "ageGroup_30_34" /// - 5 "ageGroup_35_39" /// - 6 "ageGroup_40_59" /// - 7 "ageGroup_60_79" /// - 8 "ageGroup_80_100" /// - -label val ageGroup ageGrouplb - -gen ageGroup2 = . 
-replace ageGroup2 = 0 if dag >= 16 & dag <= 24 -replace ageGroup2 = 1 if dag >= 25 & dag <= 29 -replace ageGroup2 = 2 if dag >= 30 & dag <= 34 -replace ageGroup2 = 3 if dag >= 35 & dag <= 39 -replace ageGroup2 = 4 if dag >= 40 & dag <= 44 -replace ageGroup2 = 5 if dag >= 45 & dag <= 49 -replace ageGroup2 = 6 if dag >= 50 & dag <= 54 -replace ageGroup2 = 7 if dag >= 55 & dag <= 59 -replace ageGroup2 = 8 if dag >= 60 & dag <= 65 - -label def ageGrouplb2 /// - 0 "ageGroup_16_24" /// - 1 "ageGroup_25_29" /// - 2 "ageGroup_30_34" /// - 3 "ageGroup_35_39" /// - 4 "ageGroup_40_44" /// - 5 "ageGroup_45_49" /// - 6 "ageGroup_50_54" /// - 7 "ageGroup_55_59" /// - 8 "ageGroup_60_65" /// - -label val ageGroup2 ageGrouplb2 - -* Partnership status -gen sim_dcpst_p = (dcpst == "Partnered") -gen sim_dcpst_s = (dcpst == "Single") - -replace idpartner = "" if idpartner == "null" -destring idpartner , replace -gen sim_has_partner = (idpartner != .) - -* Number of children -gen child = (dag < $age_become_responsible) -bys run year idbenefitunit: egen sim_dnc = total(child) - -gen child02 = (dag < 3) -bys run year idbenefitunit: egen sim_dnc02 = total(child02) - -gen children_0 = (sim_dnc == 0) -gen children_1 = (sim_dnc == 1) -gen children_2 = (sim_dnc == 2) -gen children_3plus = (sim_dnc >= 3) - -* Intersect partnership status and number of children -foreach var1 in sim_dcpst_p sim_dcpst_s { - - foreach var2 in children_0 children_1 children_2 children_3p { - - gen `var1'_`var2' = (`var1' & `var2') - - } -} - -* Activity status -gen sim_employed = (les_c4 == "EmployedOrSelfEmployed") -gen sim_student = (les_c4 == "Student") -gen sim_inactive = (les_c4 == "NotEmployed") -gen sim_retired = (les_c4 == "Retired") - -* Disabled / LT sick -gen sim_dlltsd = (dlltsd == "True") - -* Education -gen sim_edu_high = (deh_c3 == "High") -gen sim_edu_med = (deh_c3 == "Medium") -gen sim_edu_low = (deh_c3 == "Low") - -/* -replace sim_edu_high = . if sim_student | dag < 18 -replace sim_edu_med = . 
if sim_student | dag < 18 -replace sim_edu_low = . if sim_student | dag < 18 -*/ - -* Hours of work -/* -For Poland these categries are symmetric cross genders. -*/ -* Hours of work -replace laboursupplyweekly = "" if laboursupplyweekly == "null" -replace laboursupplyweekly = "TWENTY" if laboursupplyweekly == "CATEGORY_PL_1" -replace laboursupplyweekly = "FORTY" if laboursupplyweekly == "CATEGORY_PL_2" -replace laboursupplyweekly = "FIFTY" if laboursupplyweekly == "CATEGORY_PL_3" - -gen lhw = . -replace lhw = 0 if laboursupplyweekly == "ZERO" - // Zero = 0 -replace lhw = 20 if laboursupplyweekly == "TWENTY" - // TWENTY = [1,39] -replace lhw = 40 if laboursupplyweekly == "FORTY" - // FORTY = 40 -replace lhw = 50 if laboursupplyweekly == "FIFTY" - // FIFTY = [41,inf)] - - -* Potential earnings -rename fulltimehourlyearningspotential potential_earnings_hourly - -gen sim_y_disp_yr_bu = disposableincomemonthly * 12 -gen sim_y_gross_yr_bu = grossincomemonthly * 12 - - -foreach var in yplgrs_dv_lvl ypncp_lvl ypnoab_lvl { - - replace `var' = `var' * 12 - rename `var' sim_`var' - -} - -gen sim_y_gross_yr = sim_yplgrs_dv_lvl + sim_ypnoab_lvl + sim_ypncp_lvl - -* Aggregate simulated values to benefit unit level. -* Note that sim_y_gross_yr and sim_y_disp_yr are already at BU level. 
-foreach observed_var in sim_yplgrs_dv_lvl sim_ypnoab_lvl sim_ypncp_lvl { - - bys run year idbenefitunit: egen `observed_var'_bu = total(`observed_var') - -} - -* Generate max age in BU -bys run year idbenefitunit: egen max_age_in_bu = max(dag) - -preserve - -* Restrict sample to observations up to and including specified maximum year -keep if year <= $max_year - -save "$dir_data/simulated_data_full.dta", replace - -restore - - -* Restrict sample to individuals aged 16+ to maintain consistency -keep if dag >= 16 - -* Restrict sample to individuals between min and max age defined in 00_master -keep if dag >= $min_age & dag <= $max_age - -* Restrict sample to observations up to and including specified maximum year -keep if year <= $max_year - -save "$dir_data/simulated_data.dta", replace diff --git a/validation/02_simulation_validation/do_files/06_01_plot_activity_status.do b/validation/02_simulation_validation/do_files/06_01_plot_activity_status.do deleted file mode 100644 index 0d22aa0..0000000 --- a/validation/02_simulation_validation/do_files/06_01_plot_activity_status.do +++ /dev/null @@ -1,2052 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths -* SECTION: Validation -* OBJECT: Activity status plots -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland - -* NOTES: This do file plots validation graphs for economics activity -* status (4 cat). 
-*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time - Economic Activity Status -******************************************************************************** -******************************************************************************** -* 1.1.1 : Young people (18-30) -******************************************************************************** -******************************************************************************** -* 1.1.1.1 : Young people (18-30), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dag /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 30 - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dag using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 30 - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year) -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) 
nogen - - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Economic Activity Status") /// - subtitle("Ages ${min_age}-30") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_30_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.1.2 : Young people (18-30), By gender -******************************************************************************** - -** Male -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dag dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", /// - clear - -drop if dag > 30 -drop if dgn == 0 - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dag dgn using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 30 -drop if dgn == "Female" - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year) - -* Compute 95% confidence interval -foreach varname in sim_employed sim_student sim_inactive sim_retired { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - 
legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Economic Activity Status") /// - subtitle("Ages ${min_age}-30, males ") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. ", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_30_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -** Female - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dag dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 30 -drop if dgn == 1 - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - dgn [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dag dgn using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 30 -drop if dgn == "Male" - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year) -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - 
sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year) - -* Compute 95% confidence interval -foreach varname in sim_employed sim_student sim_inactive sim_retired { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Economic Activity Status") /// - subtitle("Ages ${min_age}-30, females ") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_30_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.2 : Working age (18-65) -******************************************************************************** -******************************************************************************** -* 1.1.2.1 : Working age (18-65), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn dag /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn using /// - "$dir_data/simulated_data.dta", clear - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -** All - -preserve - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, 
discouraged workers etc.)" "minus students and retired. ", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -******************************************************************************** -* 1.1.2.2 : Working age (18-65), By gender -******************************************************************************** - -* Male -keep if dgn == 1 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore, preserve - - -* Female -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - -******************************************************************************** -* 1.1.2.3 : Working age (18-65), By partnership status -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive valid_retired dcpst /// - dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dcpst dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dcpst dgn /// - using "$dir_data/simulated_data.dta", clear - -gen dcpst_coded = . -replace dcpst_coded = 1 if dcpst == "Partnered" -replace dcpst_coded = 2 if dcpst == "SingleNeverMarried" - -drop dcpst -rename dcpst_coded dcpst - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dcpst dgn) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dcpst dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dcpst dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -** All - -preserve - -collapse (mean) sim* valid*, by(year dcpst) - -* Plot figure: dcpst == 1, partnered -keep if dcpst == 1 - -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// -title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, partnered") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: 
Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. ", /// - size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_both_partnered.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -collapse (mean) sim* valid*, by(year dcpst) - -* Plot figure: dcpst == 2, single -keep if dcpst == 2 - -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, single") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_both_single.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore - -******************************************************************************** -* 1.1.2.4 : Working age (18-65), By gender and partnership status -******************************************************************************** - -** Males - -* Plot figure: dcpst == 1, partnered -preserve - -keep if dcpst == 1 & dgn == 1 - -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, partnered males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_male_partnered.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -* Plot figure: dcpst == 2, single -keep if dcpst == 2 & dgn == 1 - -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, single males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_male_single.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - - -** Females - -* Plot figure: dcpst == 1, partnered -preserve - -keep if dcpst == 1 & dgn == 0 - -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, partnered females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_female_partnered.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -* Plot figure: dcpst == 2, single -keep if dcpst == 2 & dgn == 0 - -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-${max_age}, single females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_${max_age}_female_single.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - - -******************************************************************************** -* 1.1.3 : Before female state pension age (18-60) -******************************************************************************** -******************************************************************************** -* 1.1.3.1 : Before female state pension age (18-60), Females -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn dag /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 60 - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn dag using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 60 - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Females -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("Ages ${min_age}-60, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus 
students and retired. ", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_${min_age}_60_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.4 : All ages -******************************************************************************** -******************************************************************************** -* 1.1.4 : All ages, All -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dag dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn using /// - "$dir_data/simulated_data_full.dta", clear - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -** All - -preserve - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("All ages") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers 
etc.)" "minus students and retired. ", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_all_both.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -******************************************************************************** -* 1.1.4.2 : Adults (18+), By gender -******************************************************************************** - -keep if dgn == 1 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("All ages, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_all_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore, preserve - -** Females - -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Employed, simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "Employed, SILC"))) /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(3 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(4 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(5 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(6 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(7 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(8 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("All ages, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_ts_all_female.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - -graph drop _all - - -******************************************************************************** -* 1.2 : Mean values over time - Employed -******************************************************************************** -******************************************************************************** -* 1.2.1 : Employed, Working age(18-65) -******************************************************************************** -******************************************************************************** -* 1.2.1.1 : Employed, Working age(18-65), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn using /// - "$dir_data/simulated_data.dta", clear - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -** All - -preserve - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Employed") /// - subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -******************************************************************************** -* 1.2.1.2 : Employed, Working age(18-65), By gender -******************************************************************************** - -keep if dgn == 1 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 " SILC"))), /// - title("Employed") /// - subtitle("Ages 
${min_age}-${max_age}, males") /// - xtitle("Year") /// - ytitle("Share") /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_employed_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore, preserve - - -** Females - -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Employed") /// - subtitle("Ages ${min_age}-${max_age}, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - -******************************************************************************** -* 1.2.2 : Employed, Before female state pension age (18-60) -******************************************************************************** -******************************************************************************** -* 1.2.2.1 : Employed Working age(18-65), Before female state pension age -* (18-60), Female -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn dag /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 60 - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year 
dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn dag using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 60 - -gen dgn_coded = . -replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Females -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Employed") /// - subtitle("Ages ${min_age}-60") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_${min_age}_60_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.2.3 : Employed, All ages -******************************************************************************** -******************************************************************************** -* 1.2.3.1 : Employed, All ages , All 
-******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", /// - clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn using /// - "$dir_data/simulated_data_full.dta", clear - -gen dgn_coded = . -replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -** All - -preserve - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Employed") /// - subtitle("All ages") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_all_both.jpg", /// - 
replace width(2400) height(1350) quality(100) - -restore, preserve - - -******************************************************************************** -* 1.2.3.2 : Employed, All ages, By gender -******************************************************************************** - -keep if dgn == 1 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Employed") /// - subtitle("All ages, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_all_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore, preserve - - -** Females - -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_employed_high sim_employed_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_employed year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Employed") /// - subtitle("All ages, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_all_female.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - -graph drop _all - - -******************************************************************************** -* 1.2.3.3 : Employed, Adults (18+), By age group -******************************************************************************** - -* Prepare validation data -use year dwt dgn 
ageGroup valid_employed dag using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", clear - -gen employed_f = (valid_employed) if dgn == 0 -gen employed_m = (valid_employed) if dgn == 1 - -drop if ageGroup == 0 | ageGroup == 8 - -collapse (mean) employed_f employed_m [aweight=dwt], /// - by(ageGroup year) -drop if missing(ageGroup) -reshape wide employed_f employed_m, i(year) j(ageGroup) - -forvalues i = 1(1)7 { - rename employed_f`i' employed_f_`i'_valid - rename employed_m`i' employed_m_`i'_valid -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_sex ageGroup sim_employed using /// - "$dir_data/simulated_data_full.dta", clear - -gen employed_f = (sim_employed) if sim_sex == 2 -gen employed_m = (sim_employed) if sim_sex == 1 - -collapse (mean) employed_f employed_m, by(ageGroup run year) -drop if missing(ageGroup) -reshape wide employed_f employed_m, i(year run) j(ageGroup) - -forvalues i = 1(1)7{ - rename employed_f`i' employed_f_`i'_sim - rename employed_m`i' employed_m_`i'_sim -} - -collapse (mean) employed* /// - (sd) sd_employed_f_1_sim = employed_f_1_sim /// - sd_employed_f_2_sim = employed_f_2_sim /// - sd_employed_f_3_sim = employed_f_3_sim /// - sd_employed_f_4_sim = employed_f_4_sim /// - sd_employed_f_5_sim = employed_f_5_sim /// - sd_employed_f_6_sim = employed_f_6_sim /// - sd_employed_f_7_sim = employed_f_7_sim /// - sd_employed_m_1_sim = employed_m_1_sim /// - sd_employed_m_2_sim = employed_m_2_sim /// - sd_employed_m_3_sim = employed_m_3_sim /// - sd_employed_m_4_sim = employed_m_4_sim /// - sd_employed_m_5_sim = employed_m_5_sim /// - sd_employed_m_6_sim = employed_m_6_sim /// - sd_employed_m_7_sim = employed_m_7_sim /// - , by(year) - - /* sd_employed_f_8_sim=employed_f_8_sim /// - sd_employed_m_8_sim=employed_m_8_sim /// */ - -forvalues i = 1(1)7 { - gen employed_f_`i'_sim_high = /// - employed_f_`i'_sim + 1.96*sd_employed_f_`i'_sim - gen employed_f_`i'_sim_low = /// - 
employed_f_`i'_sim - 1.96*sd_employed_f_`i'_sim - gen employed_m_`i'_sim_high = /// - employed_m_`i'_sim + 1.96*sd_employed_m_`i'_sim - gen employed_m_`i'_sim_low = /// - employed_m_`i'_sim - 1.96*sd_employed_m_`i'_sim -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figures -foreach vble in "employed_f" "employed_m" { - - twoway (rarea `vble'_1_sim_high `vble'_1_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_1_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 15-19") name(`vble'_1, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_2_sim_high `vble'_2_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_2_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 20-24") name(`vble'_2, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_3_sim_high `vble'_3_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_3_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 25-29") name(`vble'_3, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_4_sim_high `vble'_4_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_4_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 30-34") name(`vble'_4, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_5_sim_high `vble'_5_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_5_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 35-39") name(`vble'_5, replace) ylabel(0 [0.5] 1) /// - 
graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_6_sim_high `vble'_6_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) /// - rows(1)))(line `vble'_6_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 40-59") name(`vble'_6, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_7_sim_high `vble'_7_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_7_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 60-79") name(`vble'_7, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - -} - - -* Save figures -* Share employed women -grc1leg employed_f_1 employed_f_2 employed_f_3 employed_f_4 employed_f_5 /// - employed_f_6 employed_f_7 , title("Employment rate by age") /// - subtitle("Females") /// - legendfrom(employed_f_1) /// - graphregion(color(white)) /// - note("Notes: ", size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_all_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -* Share employed men -grc1leg employed_m_1 employed_m_2 employed_m_3 employed_m_4 employed_m_5 /// - employed_m_6 employed_m_7 , title("Employment rate by age") /// - subtitle("Males") /// - legendfrom(employed_m_1) /// - graphregion(color(white)) /// - note("Notes: ", size(vsmall)) - - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_employed_ts_all_male.jpg", /// - replace width(2400) height(1350) quality(100) - -graph drop _all - - -******************************************************************************** -* 1.3 : Mean values over time - Not employed -******************************************************************************** -******************************************************************************** -* 1.3.1 : Not employed, Working age (18-65) 
-******************************************************************************** -******************************************************************************** -* 1.3.1.1 : Not employed, Working age (18-65), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn using /// - "$dir_data/simulated_data.dta", clear - -gen dgn_coded = . -replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -** All - -preserve - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - 
legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status of those not employed") /// - subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.) minus students and retired. ", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -******************************************************************************** -* 1.3.1.2 : Not employed, Working age (18-65), By gender -******************************************************************************** - -keep if dgn == 1 - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status of those not employed") /// - subtitle("Ages ${min_age}-${max_age}, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", 
size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. ", /// - size(vsmall)) - -* Save figure -graph export /// - "$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore, preserve - - -** Females - -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status of those not employed") /// - subtitle("Ages ${min_age}-${max_age}, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - - -******************************************************************************** -* 1.3.2 : Not employed, Before female state pension age (18-60) -******************************************************************************** -******************************************************************************** -* 1.3.2.1 : Not employed Working age(18-65), Before female state pension age -* (18-60), Female -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn dag /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", /// - clear - -drop if dag > 60 - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn dag using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 60 - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Females -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status of those not employed") /// - subtitle("Ages ${min_age}-60") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_${min_age}_60_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.3.3 : Not employed, All ages -******************************************************************************** -******************************************************************************** -* 1.3.3.1 : Not employed, All ages, All -******************************************************************************** - -* Prepare validation data -use year dwt valid_employed valid_student valid_inactive dgn /// - valid_retired using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", /// - clear - -collapse (mean) valid_employed valid_student valid_inactive valid_retired /// - [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_employed sim_student sim_inactive sim_retired dgn using /// - "$dir_data/simulated_data_full.dta", clear - -gen dgn_coded = . 
-replace dgn_coded = 1 if dgn == "Male" -replace dgn_coded = 0 if dgn == "Female" - -drop dgn -rename dgn_coded dgn - -collapse (mean) sim_employed sim_student sim_inactive sim_retired, /// - by(run year dgn) - -collapse (mean) sim_employed sim_student sim_inactive sim_retired /// - (sd) sim_employed_sd = sim_employed /// - sim_student_sd = sim_student /// - sim_inactive_sd = sim_inactive /// - sim_retired_sd = sim_retired /// - , by(year dgn) - -foreach varname in sim_employed sim_student sim_inactive sim_retired { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -** All - -preserve - -collapse (mean) sim* valid*, by(year) - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("All ages") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_all_both.jpg", /// - replace width(2400) height(1350) quality(100) - -restore, preserve - -******************************************************************************** -* 1.3.3.2 : Not employed, All ages, By gender -******************************************************************************** - -keep if dgn == 1 - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("All ages, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_all_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -restore, preserve - - -** Females - -keep if dgn == 0 - -* Plot figure -twoway /// -(rarea sim_student_high sim_student_low year, sort color(blue%20) /// - legend(label(1 "Students, simulated"))) /// -(line valid_student year, sort color(blue) /// - legend(label(2 "Students, SILC"))) /// -(rarea sim_inactive_high sim_inactive_low year, sort color(red%20) /// - legend(label(3 "Non-employed, simulated"))) /// -(line valid_inactive year, sort color(red) /// - legend(label(4 "Non-employed, SILC"))) /// -(rarea sim_retired_high sim_retired_low year, sort color(grey%20) /// - legend(label(5 "Retired, simulated"))) /// -(line valid_retired year, sort color(grey) /// - legend(label(6 "Retired, SILC"))), /// - title("Activity status") /// - subtitle("All ages, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Non-employed includes the unemployed and inactive (homemakers, incapacity, carers, discouraged workers etc.)" "minus students and retired. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/economic_activity/validation_${country}_activity_status_not_employed_ts_all_female.jpg", /// - replace width(2400) height(1350) quality(100) - -restore - - -******************************************************************************** -* 1.3.4 : Students, Adults (15+) -******************************************************************************** -******************************************************************************** -* 1.3.4.1 : Students, Adults (15+), By age group -******************************************************************************** - -* Prepare validation data -use year dwt dgn ageGroup valid_student valid_employed using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", clear - -gen student = valid_student - -drop if ageGroup == 0 | ageGroup == 8 - -collapse (mean) student [aweight=dwt], by(ageGroup year) -drop if missing(ageGroup) -reshape wide student, i(year) j(ageGroup) - -forvalues i = 1(1)7 { - rename student`i' student_`i'_valid -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_sex ageGroup sim_student using /// - "$dir_data/simulated_data_full.dta", clear - -gen student = sim_student - -collapse (mean) student, by(ageGroup run year) -drop if missing(ageGroup) -reshape wide student , i(year run) j(ageGroup) - -forvalues i = 1(1)7{ - rename student`i' student_`i'_sim -} - -collapse (mean) student* /// - (sd) sd_student_1_sim = student_1_sim /// - sd_student_2_sim = student_2_sim /// - sd_student_3_sim = student_3_sim /// - sd_student_4_sim = student_4_sim /// - sd_student_5_sim = student_5_sim /// - sd_student_6_sim = student_6_sim /// - sd_student_7_sim = student_7_sim /// - , by(year) - - /* sd_student_8_sim=student_8_sim /// - sd_employed_f_8_sim=employed_f_8_sim /// - sd_employed_m_8_sim=employed_m_8_sim /// */ - -forvalues i = 1(1)7 { - gen student_`i'_sim_high = student_`i'_sim 
+ 1.96*sd_student_`i'_sim - gen student_`i'_sim_low = student_`i'_sim - 1.96*sd_student_`i'_sim -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figures -foreach vble in "student" { - - twoway (rarea `vble'_1_sim_high `vble'_1_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_1_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 15-19") name(`vble'_1, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_2_sim_high `vble'_2_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_2_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 20-24") name(`vble'_2, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_3_sim_high `vble'_3_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_3_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 25-29") name(`vble'_3, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_4_sim_high `vble'_4_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_4_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 30-34") name(`vble'_4, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_5_sim_high `vble'_5_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_5_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 35-39") name(`vble'_5, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_6_sim_high `vble'_6_sim_low year, /// - sort color(green%20) /// - legend(label(1 
"Simulated") position(6) /// - rows(1)))(line `vble'_6_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 40-59") name(`vble'_6, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - - twoway (rarea `vble'_7_sim_high `vble'_7_sim_low year, /// - sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_7_valid year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Age 60-79") name(`vble'_7, replace) ylabel(0 [0.5] 1) /// - graphregion(color(white)) xtitle("") - -} - -* Save figures - -* Share students -grc1leg student_1 student_2 student_3 student_4 student_5 student_6 /// - student_7 , title("Share of students by age") legendfrom(student_1) /// - graphregion(color(white)) /// - note("Notes: ", size(vsmall)) - -graph export /// -"$dir_output_files/economic_activity/validation_${country}_students_ts_all_both.jpg", /// - replace width(2400) height(1350) quality(100) - -grc1leg student_1 student_2 student_3 , /// - title("Share of students by age") legendfrom(student_1) /// - graphregion(color(white)) /// - note("Notes: ", size(vsmall)) - -graph export /// - "$dir_output_files/economic_activity/validation_${country}_students_ts_15_29_both.jpg", /// - replace width(2400) height(1350) quality(100) - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_02_plot_education_level.do b/validation/02_simulation_validation/do_files/06_02_plot_education_level.do deleted file mode 100644 index 9e87ac6..0000000 --- a/validation/02_simulation_validation/do_files/06_02_plot_education_level.do +++ /dev/null @@ -1,632 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Education -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This do file 
plots simulated and observed education. -* -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time - Educational attainment -******************************************************************************** -******************************************************************************** -* 1.1.1 : Young people (18-30) -******************************************************************************** -******************************************************************************** -* 1.1.1.1 : Young people (18-30), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low dag using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 30 - -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - // accounts for missing values - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dag using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 30 - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year) - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - sim_edu_med_sd = sim_edu_med /// - sim_edu_low_sd = sim_edu_low /// - , by(year) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", 
keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// - legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// - legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages ${min_age}-30") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Not required to have left initial education spell.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_30_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.1.2 : Young people (18-30), Non-students -******************************************************************************** -* Prepare validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low dag les_c3 using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 30 - -drop if les_c3 == 2 - -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - // accounts for missing values - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dag les_c4 using /// - 
"$dir_data/simulated_data.dta", clear - -drop if dag > 30 -drop if les_c4 == "Student" - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year) - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - sim_edu_med_sd = sim_edu_med /// - sim_edu_low_sd = sim_edu_low /// - , by(year) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// - legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// - legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - title("Educational attainment, no students") /// - subtitle("Ages ${min_age}-30") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: ", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_30_both_not_student.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.1.2 : Young people (18-30), By gender -******************************************************************************** - -* Prepare 
validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low dag dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if dag > 30 - -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - // accounts for missing values - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], /// - by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dag dgn using /// - "$dir_data/simulated_data.dta", clear - -drop if dag > 30 - -gen dgn2 = 0 if dgn == "Female" -replace dgn2 = 1 if dgn == "Male" - -drop dgn -rename dgn2 dgn - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year dgn) - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - (sd) sim_edu_med_sd = sim_edu_med /// - (sd) sim_edu_low_sd = sim_edu_low /// - , by(year dgn) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure - female -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year if dgn == 0, /// - sort color(green%20) legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year if dgn == 0, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year if dgn == 0, /// - sort color(blue%20) legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year if dgn == 0, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year if dgn == 0, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year if dgn == 0, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - 
title("Educational attainment") /// - subtitle("Ages ${min_age}-30, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Not required to have left initial education spell.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_30_female.jpg", /// - replace width(2400) height(1350) quality(100) - -* Plot figure - male -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year if dgn == 1, /// - sort color(green%20) legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year if dgn == 1, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year if dgn == 1, /// - sort color(blue%20) legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year if dgn == 1, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year if dgn == 1, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year if dgn == 1, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages ${min_age}-30, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Not required to have left initial education spell.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_30_male.jpg", /// - replace width(2400) height(1350) quality(100) - -******************************************************************************** -* 1.1.2 : Working age (18-65) 
-******************************************************************************** -******************************************************************************** -* 1.1.2.1 : Working age (18-65), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Get rid of observations with missing values -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low using /// - "$dir_data/simulated_data.dta", clear - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year) -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - sim_edu_med_sd = sim_edu_med /// - sim_edu_low_sd = sim_edu_low /// - , by(year) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// - legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// - legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year, sort color(red) /// - legend(label(6 "Low 
education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Year") /// - ytitle("Share") /// - graphregion(color(white)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - legend(size(small)) /// - note("Notes: Not required to have left initial education spell.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.2.1 : Working age (18-65), By gender -******************************************************************************** - -* Prepare validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Get rid of observations with missing values -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], /// - by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dgn using /// - "$dir_data/simulated_data.dta", clear - -gen dgn2 = 0 if dgn == "Female" -replace dgn2 = 1 if dgn == "Male" - -drop dgn -rename dgn2 dgn - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year dgn) - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - (sd) sim_edu_med_sd = sim_edu_med /// - (sd) sim_edu_low_sd = sim_edu_low /// - , by(year dgn ) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure - female -twoway /// -(rarea sim_edu_high_high 
sim_edu_high_low year if dgn == 0, /// - sort color(green%20) legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year if dgn == 0, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year if dgn == 0, /// - sort color(blue%20) legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year if dgn == 0, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year if dgn == 0, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year if dgn == 0, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages ${min_age}-${max_age}, females") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Not required to have left initial education spell.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) quality(100) - -* Plot figure - male -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year if dgn == 1, /// - sort color(green%20) legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year if dgn == 1, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year if dgn == 1, /// - sort color(blue%20) legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year if dgn == 1, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year if dgn == 1, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year if dgn == 1, sort color(red) /// - 
legend(label(6 "Low education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages ${min_age}-${max_age}, males") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Not required to have left initial education spell.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.3 : Just above retirement age (66-70) -******************************************************************************** -******************************************************************************** -* 1.1.3.1 : Just above retirement age (66-70), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low dag using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", clear - -drop if dag < 66 -drop if dag > 70 - -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - // accounts for missing values - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dag using /// - "$dir_data/simulated_data_full.dta", clear - -drop if dag < 66 -drop if dag > 70 - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year) - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - sim_edu_med_sd = sim_edu_med /// - sim_edu_low_sd = sim_edu_low /// - , by(year) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 
1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// - legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// - legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages 66-70") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_66_70_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1.4 : Around retirement age (60-70) -******************************************************************************** -******************************************************************************** -* 1.1.4.1 : Around retirement age (60-70), All -******************************************************************************** - -* Prepare validation data -use year dwt valid_edu_high valid_edu_med valid_edu_low dag using /// - "$dir_data/${country}-eusilc_validation_full_sample_long.dta", clear - -drop if dag < 60 -drop if dag > 70 - -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - // 
accounts for missing values - -collapse (mean) valid_edu_high valid_edu_med valid_edu_low [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dag using /// - "$dir_data/simulated_data_full.dta", clear - -drop if dag < 60 -drop if dag > 70 - -collapse (mean) sim_edu_high sim_edu_med sim_edu_low, by(run year) -collapse (mean) sim_edu_high sim_edu_med sim_edu_low /// - (sd) sim_edu_high_sd = sim_edu_high /// - sim_edu_med_sd = sim_edu_med /// - sim_edu_low_sd = sim_edu_low /// - , by(year) - -foreach varname in sim_edu_high sim_edu_med sim_edu_low { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_edu_high_high sim_edu_high_low year, sort color(green%20) /// - legend(label(1 "High education, simulated"))) /// -(line valid_edu_high year, sort color(green) /// - legend(label(2 "High education, observed"))) /// -(rarea sim_edu_med_high sim_edu_med_low year, sort color(blue%20) /// - legend(label(3 "Medium education, simulated"))) /// -(line valid_edu_med year, sort color(blue) /// - legend(label(4 "Medium education, observed"))) /// -(rarea sim_edu_low_high sim_edu_low_low year, sort color(red%20) /// - legend(label(5 "Low education, simulated"))) /// -(line valid_edu_low year, sort color(red) /// - legend(label(6 "Low education, observed"))), /// - title("Educational attainment") /// - subtitle("Ages 60-70") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note(Notes:, size(vsmall)) - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_60_70_both.jpg", /// - replace width(2400) height(1350) quality(100) - - 
-******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time - Student age -******************************************************************************** - -use year dwt valid_edu_high valid_edu_med valid_edu_low dag les_c3 using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if les_c3 == 2 - -drop if valid_edu_high == 0 & valid_edu_med == 0 & valid_edu_low == 0 - // accounts for missing values - -rename dag valid_dag - -collapse (mean) valid_dag [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_edu_high sim_edu_med sim_edu_low dag les_c4 using /// - "$dir_data/simulated_data.dta", clear - -keep if les_c4 == "Student" - -collapse (mean) dag, by(run year) - -collapse (mean) dag /// - (sd) dag_sd = dag /// - , by(year) - -foreach varname in dag { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea dag_high dag_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_dag year, sort color(green) /// - legend(label(2 "Observed"))), /// - title("Average age of students") subtitle("") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Sample limited to working age individuals, 16-65", /// - size(vsmall)) - - -graph export /// -"$dir_output_files/education/validation_${country}_education_ts_both_avg_age.jpg", /// - replace width(2400) height(1350) quality(100) - diff --git 
a/validation/02_simulation_validation/do_files/06_03_plot_gross_income.do b/validation/02_simulation_validation/do_files/06_03_plot_gross_income.do deleted file mode 100644 index f41ebec..0000000 --- a/validation/02_simulation_validation/do_files/06_03_plot_gross_income.do +++ /dev/null @@ -1,896 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Gross income -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: Data details: The underlying variable is composed of -* - wages -* - self-employment eanrings/losses -* - private pension income -* - regular inter-hh transfers -* - child earnings (<16)* -* - Income from rental property of land* -* - Interest, dividends, profit from capital investment* - -* Underlying variable measured at the hh level and assumed to -* be spread evenly among adult hh members. -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time - By benefit unit -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_gross_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_bu_yr, d - - replace valid_y_gross_bu_yr = . 
if /// - valid_y_gross_bu_yr < r(p1) | valid_y_gross_bu_yr > r(p99) - -} - -collapse (mean) valid_y_gross_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_gross_yr_bu laboursupplyweekly using /// - "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr_bu, d - - replace sim_y_gross_yr_bu = . if sim_y_gross_yr_bu < r(p1) | /// - sim_y_gross_yr_bu > r(p99) - -} - -collapse (mean) sim_y_gross_yr_bu, by(run year) - -collapse (mean) sim_y_gross_yr_bu /// - (sd) sim_y_gross_yr_bu_sd = sim_y_gross_yr_bu /// - , by(year) - -foreach varname in sim_y_gross_yr_bu { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - - } - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -* Plot figure -twoway /// - (rarea sim_y_gross_yr_bu_high sim_y_gross_yr_bu_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// - (line valid_y_gross_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Gross Income") /// - subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit gross income without benefits through time. Statistics computed by averaging benefit" "unit-level gross income over all persons ages ${min_age}-${max_age}. Amounts in 2015 prices. 
Top and bottom percentiles trimmed.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/gross_income/validation_${country}_gross_income_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.2 : Mean values over time - By individual -******************************************************************************** -******************************************************************************** -* 1.2.1 : Mean values over time - By individual, All -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_gross_ind_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_ind_yr, d - - replace valid_y_gross_ind_yr = . if /// - valid_y_gross_ind_yr < r(p1) | /// - valid_y_gross_ind_yr > r(p99) - -} - -collapse (mean) valid_y_gross_ind_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_gross_yr using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr, d - - replace sim_y_gross_yr = . 
if /// - sim_y_gross_yr < r(p1) | sim_y_gross_yr > r(p99) - - } - -collapse (mean) sim_y_gross_yr, by(run year) - -collapse (mean) sim_y_gross_yr /// - (sd) sim_y_gross_yr_sd = sim_y_gross_yr /// - , by(year) - -foreach varname in sim_y_gross_yr { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_y_gross_yr_high sim_y_gross_yr_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_y_gross_ind_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Individual Gross Income") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Series represents average individual gross income without benefits through time. Statistics computed by averaging" "person-level gross income over all persons ages ${min_age}-${max_age}. Values in 2015 prices. Top and bottom percentiles trimmed.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.2.2 : Mean values over time - By individual, By gender -******************************************************************************** -* Prepare validation data -use year dwt valid_y_gross_ind_yr dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_ind_yr, d - - replace valid_y_gross_ind_yr = . 
if /// - valid_y_gross_ind_yr < r(p1) | /// - valid_y_gross_ind_yr > r(p99) - -} - -collapse (mean) valid_y_gross_ind_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_y_gross_yr dgn using "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr, d - - replace sim_y_gross_yr = . if /// - sim_y_gross_yr < r(p1) | sim_y_gross_yr > r(p99) - -} - -collapse (mean) sim_y_gross_yr, by(run year) - -collapse (mean) sim_y_gross_yr /// - (sd) sim_y_gross_yr_sd = sim_y_gross_yr /// - , by(year) - -foreach varname in sim_y_gross_yr { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_y_gross_yr_high sim_y_gross_yr_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_y_gross_ind_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Individual Gross Income") /// - subtitle("Males") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Series represents average individual gross income without benefits through time. Statistics computed by averaging" "person-level gross income over all persons ages ${min_age}-${max_age}. Values in 2015 prices. 
Top and bottom percentiles trimmed.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -* Female -* Prepare validation data -use year dwt valid_y_gross_ind_yr dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_ind_yr, d - - replace valid_y_gross_ind_yr = . if /// - valid_y_gross_ind_yr < r(p1) | /// - valid_y_gross_ind_yr > r(p99) - -} - -collapse (mean) valid_y_gross_ind_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_y_gross_yr dgn using "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr, d - - replace sim_y_gross_yr = . if /// - sim_y_gross_yr < r(p1) | sim_y_gross_yr > r(p99) - -} - -collapse (mean) sim_y_gross_yr, by(run year) - -collapse (mean) sim_y_gross_yr /// - (sd) sim_y_gross_yr_sd = sim_y_gross_yr /// - , by(year) - -foreach varname in sim_y_gross_yr { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_y_gross_yr_high sim_y_gross_yr_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_y_gross_ind_yr year, sort color(green) /// - legend(label(2 "SILC"))), //// - title("Individual Gross Income") /// - subtitle("Females") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Series represents average individual gross income without benefits through time. 
Statistics computed by averaging" "person-level gross income over all persons ages ${min_age}-${max_age}. Values in 2015 prices. Top and bottom percentiles trimmed.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -/******************************************************************************* -* 2 : Histograms -*******************************************************************************/ - -/******************************************************************************* -* 2.1 : Histograms - Benefit unit gross income by year, and by category of -weekly labour supply -*******************************************************************************/ - -* Prepare validation data -use year dwt valid_y_gross_bu_yr laboursupplyweekly_hu using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_bu_yr, d - - replace valid_y_gross_bu_yr = . 
if /// - valid_y_gross_bu_yr < r(p1) | valid_y_gross_bu_yr > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_gross_bu_yr if year == `year', /// - width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_gross_bu_yr if /// - year == `year' & labour == "`ls'", width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } - -} - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_y_gross_yr_bu laboursupplyweekly using /// - "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr_bu, d - - replace sim_y_gross_yr_bu = . if /// - sim_y_gross_yr_bu < r(p1) | sim_y_gross_yr_bu > r(p99) - -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 -local max_year = r(max) - -//local year = 2011 - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_gross_yr if year == `year', width(750) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/3 - - - twoway (hist sim_y_gross_yr_bu if year == `year', width(750) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_bu_yr if year == `year', width(750) /// - color(red%30) legend(label(2 "SILC"))) , /// - title("ALL hours") /// - name(gross_inc_`year'_all, replace) /// - xtitle("€ per year", size(small)) /// - ytitle("", size(small)) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_gross_yr if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/3 - - * Plot by weekly hours work - twoway (hist sim_y_gross_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_gross_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(750) color(red%30) /// - legend(label(2 "SILC"))) , /// - title("`ls' hours") /// - name(gross_inc_`year'_`ls', replace) /// - xtitle("€ per year", size(small)) /// - ytitle("", size(small)) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg gross_inc_`year'_all gross_inc_`year'_ZERO /// - gross_inc_`year'_TWENTY /// - gross_inc_`year'_FORTY gross_inc_`year'_FIFTY, /// - title("Gross Income By Weekly Hours of Work") /// - subtitle("`year'") /// - legendfrom(gross_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit gross income without benefits through time. Statistics computed by averaging benefit" "unit-level gross income for all males aged ${min_age}-${max_age}. Values in € per year, 2015 prices. Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+. . 
Top and bottom percentiles trimmed.", /// - size(vsmall)) - - graph export "$dir_output_files/income/gross_income/validation_${country}_gross_income_dist_`year'.png", /// - replace width(2400) height(1350) - -} - -graph drop _all - - -/******************************************************************************* -* 2.2 : Histograms - Individual gross income by year, and by category of weekly -labour supply -*******************************************************************************/ -/******************************************************************************* -* 2.2.1 : Histograms - Individual gross income by year, and by category of weekly -labour supply, By gender -*******************************************************************************/ - -* Males - -* Prepare validation data -use year dwt valid_y_gross_ind_yr laboursupplyweekly_hu dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_ind_yr, d - - replace valid_y_gross_ind_yr = . 
if /// - valid_y_gross_ind_yr < r(p1) | /// - valid_y_gross_ind_yr > r(p99) - -} - - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_gross_ind_yr if year == `year' , /// - width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_gross_ind_yr if /// - year == `year' & labour == "`ls'", width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_gross_yr laboursupplyweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr, d - - replace sim_y_gross_yr = . if sim_y_gross_yr < r(p1) | /// - sim_y_gross_yr > r(p99) - -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_gross_yr if year == `year', width(750) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/3 - - * Plot all hours - twoway (hist sim_y_gross_yr if year == `year', width(750) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_ind_yr if year == `year', width(750) /// - color(red%30) legend(label(2 "SILC"))) , /// - title("ALL hours") /// - name(ind_gross_inc_`year'_all, replace) /// - xtitle("€ per year", size(small)) /// - ytitle("", size(small)) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_gross_yr if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/3 - - * Plot by weekly hours work - twoway (hist sim_y_gross_yr if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_gross_ind_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(750) color(red%30) /// - legend(label(2 "SILC"))) , /// - title("`ls' hours") /// - name(ind_gross_inc_`year'_`ls', replace) /// - xtitle("€ per year", size(small)) /// - ytitle("", size(small)) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg ind_gross_inc_`year'_all ind_gross_inc_`year'_ZERO /// - ind_gross_inc_`year'_TWENTY /// - ind_gross_inc_`year'_FORTY ind_gross_inc_`year'_FIFTY, /// - title("Individual Gross Income By Weekly Hours of Work") /// - subtitle("`year', Males") /// - legendfrom(ind_gross_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Series represents average individual gross income without benefits through time. Statistics computed by averaging individual" "level gross income for all males aged ${min_age}-${max_age}. Values in € per year, 2015 prices. Sample trimmed. Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+. 
Top and bottom percentiles trimmed.", /// - size(vsmall)) - - graph export "$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_dist_`year'_male.png", /// - replace width(2400) height(1350) - -} - -graph drop _all - - -* Females - -* Prepare validation data -use year dwt valid_y_gross_ind_yr laboursupplyweekly_hu dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_ind_yr, d - - replace valid_y_gross_ind_yr = . if /// - valid_y_gross_ind_yr < r(p1) | /// - valid_y_gross_ind_yr > r(p99) - -} - - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_gross_ind_yr if year == `year' , /// - width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_gross_ind_yr if /// - year == `year' & labour == "`ls'", width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_gross_yr laboursupplyweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_gross_yr, d - - replace sim_y_gross_yr = . 
if sim_y_gross_yr < r(p1) | /// - sim_y_gross_yr > r(p99) - -} - -keep if run == 1 - - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_gross_yr if year == `year', width(750) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/3 - - * Plot all hours - twoway (hist sim_y_gross_yr if year == `year', width(750) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_ind_yr if year == `year', width(750) /// - color(red%30) legend(label(2 "SILC"))) , /// - title("ALL hours") /// - name(ind_gross_inc_`year'_all, replace) /// - xtitle("€ per year", size(small)) /// - ytitle("", size(small)) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_gross_yr if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/3 - - * Plot by weekly hours work - twoway (hist sim_y_gross_yr if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_gross_ind_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(750) color(red%30) /// - legend(label(2 "SILC"))) , /// - title("`ls' hours") /// - name(ind_gross_inc_`year'_`ls', replace) /// - xtitle("€ per year", size(small)) /// - ytitle("", size(small)) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg ind_gross_inc_`year'_all ind_gross_inc_`year'_ZERO /// - ind_gross_inc_`year'_TWENTY ind_gross_inc_`year'_FORTY /// - ind_gross_inc_`year'_FIFTY, /// - title("Individual Gross Income by Weekly Hours of Work") /// - subtitle("`year', Females") /// - legendfrom(ind_gross_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Series represents average individual gross income without benefits through time. Statistics computed by averaging individual" "level gross income for all females aged ${min_age}-${max_age}. Values in € per year, 2015 prices. Sample trimmed. Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+. Top and bottom percentiles trimmed.", /// - size(vsmall)) - - graph export "$dir_output_files/income/gross_income/validation_${country}_ind_gross_income_dist_`year'_female.png", /// - replace width(2400) height(1350) - -} - -graph drop _all - - - -/* - -* Investigation into who the people are with high working hours and low gross -* income -/* -Note plot ben unit observations using individual level data. 
- -Components of gross income. - -Gross personal income components -• PY010G - Gross employee cash or near cash employee income -• PY050G - Gross cash benefits or losses from self-employment - (including royalties) -• PY080G - Pensions received from individual private plans (other than those - covered under ESSPROS) - -Plus gross income components at household level -• HY040G - Income from rental of a property or land -• HY080G - Regular inter-household cash transfers received -• HY090G - Interests, dividends, profit from capital investments in - unincorporated business -• HY110G - Income received by people aged under 16 -*/ - -* Explore 2018 FIFTY hours -use "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if year == 2018 & laboursupplyweekly_hu == "FIFTY" - -order idperson idbenefit lhw valid_y_gross_nsbc_person_yr /// - y_gross_labour_person valid_wage_hour /// - py010g* py050g py080g /// - hy080g_pc hy110g_pc hy040g_pc hy090g_pc missing* - -fre missing_py010g missing_py050g missing_py080g missing_hy080g /// - missing_hy110g missing_hy040g missing_hy090g missing_lhw if /// - valid_y_gross_nsbc_person_yr == 0 // none missing seems to be in the data - - diff --git a/validation/02_simulation_validation/do_files/06_04_plot_gross_labour_income.do b/validation/02_simulation_validation/do_files/06_04_plot_gross_labour_income.do deleted file mode 100644 index 90611d7..0000000 --- a/validation/02_simulation_validation/do_files/06_04_plot_gross_labour_income.do +++ /dev/null @@ -1,756 +0,0 @@ -******************************************************************************** -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Gross labour income -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: Plotted using individual level data -* => multiple observations per ben unit. 
-******************************************************************************** - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time - By individual, benefit unit amount -******************************************************************************** - -* Prepare validation data -use year dwt les_c4 valid_y_gross_labour_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_labour_bu_yr, d - - replace valid_y_gross_labour_bu_yr = . if /// - valid_y_gross_labour_bu_yr < r(p1) | valid_y_gross_labour_bu_yr > r(p99) - -} - -collapse (mean) valid_y_gross_labour_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 sim_yplgrs_dv_lvl_bu using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_yplgrs_dv_lvl_bu, d - - replace sim_yplgrs_dv_lvl_bu = . 
if /// - sim_yplgrs_dv_lvl_bu < r(p1) | sim_yplgrs_dv_lvl_bu > r(p99) - -} - -collapse (mean) sim_yplgrs_dv_lvl_bu, by(run year) - -collapse (mean) sim_yplgrs_dv_lvl_bu /// - (sd) sim_yplgrs_dv_lvl_bu_sd = sim_yplgrs_dv_lvl_bu /// - , by(year) - -foreach varname in sim_yplgrs_dv_lvl_bu { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_yplgrs_dv_lvl_bu_high sim_yplgrs_dv_lvl_bu_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_gross_labour_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Gross Labour Income") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Amount at the benefit unit level, individual data plotted. Statistics calculated on the sample of employed individuals" "ages ${min_age}-${max_age}. 2019 X-sectional data used in underlying estimation. Amounts in 2015 prices. Top and bottom percentiles trimmed.", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_ts_${min_age}_${max_age}.jpg", /// - replace width(2400) height(1350) quality(100) - -/* -* Males - -* Prepare validation data -use year dwt les_c4 valid_y_gross_labour_bu_yr dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 - -* Keep only employed individuals -keep if les_c4 == 1 - - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_gross_labour_bu_yr, d - replace valid_y_gross_labour_bu_yr = . 
if /// - valid_y_gross_labour_bu_yr < r(p1) | valid_y_gross_labour_bu_yr > r(p99) -} - -collapse (mean) valid_y_gross_labour_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 sim_yplgrs_dv_lvl_bu dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - - -* Trim outliers -if "$trim_outliers" == "true" { - sum sim_yplgrs_dv_lvl_bu, d - replace sim_yplgrs_dv_lvl_bu = . if /// - sim_yplgrs_dv_lvl_bu < r(p1) | sim_yplgrs_dv_lvl_bu > r(p99) -} - -collapse (mean) sim_yplgrs_dv_lvl_bu, by(run year) -collapse (mean) sim_yplgrs_dv_lvl_bu /// - (sd) sim_yplgrs_dv_lvl_bu_sd = sim_yplgrs_dv_lvl_bu /// - , by(year) - -foreach varname in sim_yplgrs_dv_lvl_bu { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_yplgrs_dv_lvl_bu_high sim_yplgrs_dv_lvl_bu_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_gross_labour_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Gross labour income") subtitle("Males") /// - xtitle("Year") /// - ytitle("€ per year") /// - ylabel(,labsize(small)) xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on the sample of employed males ages 18-65. Yearly amounts. 2019 X-sectional data used in underlying" "estimation. 
Amounts in 2015 prices.", size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -* Females - -* Prepare validation data -use year dwt les_c4 valid_y_gross_labour_bu_yr dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 - -* Keep only employed individuals -keep if les_c4 == 1 - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_gross_labour_bu_yr, d - replace valid_y_gross_labour_bu_yr = . if /// - valid_y_gross_labour_bu_yr < r(p1) | valid_y_gross_labour_bu_yr > r(p99) -} - -collapse (mean) valid_y_gross_labour_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 sim_yplgrs_dv_lvl_bu dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - - -* Trim outliers -if "$trim_outliers" == "true" { - sum sim_yplgrs_dv_lvl_bu, d - replace sim_yplgrs_dv_lvl_bu = . 
if /// - sim_yplgrs_dv_lvl_bu < r(p1) | sim_yplgrs_dv_lvl_bu > r(p99) -} - -collapse (mean) sim_yplgrs_dv_lvl_bu, by(run year) -collapse (mean) sim_yplgrs_dv_lvl_bu /// - (sd) sim_yplgrs_dv_lvl_bu_sd = sim_yplgrs_dv_lvl_bu /// - , by(year) - -foreach varname in sim_yplgrs_dv_lvl_bu { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway /// -(rarea sim_yplgrs_dv_lvl_bu_high sim_yplgrs_dv_lvl_bu_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_gross_labour_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("Gross labour income") subtitle("Females") /// - xtitle("Year") /// - ytitle("€ per year") /// - ylabel(,labsize(small)) xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on the sample of employed females ages 18-65. Yearly amounts. 2019 X-sectional data used in underlying" "estimation. Amounts in 2015 prices.", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) quality(100) -*/ - - -******************************************************************************** -* 2 : Histograms by year, and by category of weekly labour supply -******************************************************************************** - -* All - -* Prepare validation data -use year dwt les_c4 valid_y_gross_labour_bu_yr /// - laboursupplyweekly_hu using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 -drop les_c4 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_labour_bu_yr, d - - replace valid_y_gross_labour_bu_yr = . 
if /// - valid_y_gross_labour_bu_yr < r(p1) | valid_y_gross_labour_bu_yr > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_gross_labour_bu_yr if year == `year', /// - width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_gross_labour_bu_yr if /// - year == `year' & labour == "`ls'", width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 sim_yplgrs_dv_lvl_bu laboursupplyweekly using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -drop les_c4 - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_yplgrs_dv_lvl_bu, d - - replace sim_yplgrs_dv_lvl_bu = . if /// - sim_yplgrs_dv_lvl_bu < r(p1) | sim_yplgrs_dv_lvl_bu > r(p99) - -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 // Calculate the minimum value of the 'year' variable -local max_year = r(max) // Calculate the maximum value of the 'year' variable - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_yplgrs_dv_lvl_bu if year == `year', width(750) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - - * Plot all hours - twoway (hist sim_yplgrs_dv_lvl_bu if year == `year' , width(750) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_labour_bu_yr if year == `year' , width(750) /// - color(red%30) legend(label(2 "SILC"))) , /// - subtitle("ALL hours") name(gross_labour_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - ytitle(, size(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - - drop d_sim v1 max_d_sim max_value - - * Plot by weekly hours work - foreach ls in $ls_cat_labour { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_yplgrs_dv_lvl_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot by weekly hours work - twoway (hist sim_yplgrs_dv_lvl_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_gross_labour_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(750) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("`ls' hours") /// - name(gross_labour_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - ytitle(, size(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -forvalues year = `min_year'/`max_year' { - - grc1leg gross_labour_inc_`year'_all /// - gross_labour_inc_`year'_TWENTY gross_labour_inc_`year'_FORTY /// - gross_labour_inc_`year'_FIFTY, /// - title("Gross Labour Income") /// - subtitle("`year'") /// - legendfrom(gross_labour_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Amount at the benefit unit level, individual data plotted. Individuals aged ${min_age}-${max_age}. Values in € per year, 2015 prices. Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+. Top and bottom percentiles trimmed.", /// - size(vsmall)) - - graph export "$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_dist_`year'_both.png", /// - replace width(2400) height(1350) - -} - -graph drop _all - -/* -* Males - -* Prepare validation data -use year dwt les_c4 valid_y_gross_labour_bu_yr /// - laboursupplyweekly_hu dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 -drop dgn - -* Keep only employed individuals -keep if les_c4 == 1 -drop les_c4 - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_gross_labour_bu_yr, d - replace valid_y_gross_labour_bu_yr = . 
if /// - valid_y_gross_labour_bu_yr < r(p1) | valid_y_gross_labour_bu_yr > r(p99) -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_gross_labour_bu_yr if year == `year', /// - width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_gross_labour_bu_yr if /// - year == `year' & labour == "`ls'", width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 sim_yplgrs_dv_lvl_bu laboursupplyweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" -drop dgn - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -drop les_c4 - - -* Trim outliers -if "$trim_outliers" == "true" { - sum sim_yplgrs_dv_lvl_bu, d - replace sim_yplgrs_dv_lvl_bu = . if /// - sim_yplgrs_dv_lvl_bu < r(p1) | sim_yplgrs_dv_lvl_bu > r(p99) -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - - -* Plot sub-figures -qui sum year -local min_year = 2011 // Calculate the minimum value of the 'year' variable -local max_year = r(max) // Calculate the maximum value of the 'year' variable - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_yplgrs_dv_lvl_bu if year == `year', width(750) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - - * Plot all hours - twoway (hist sim_yplgrs_dv_lvl_bu if year == `year' , width(750) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_labour_bu_yr if year == `year' , width(750) /// - color(red%30) legend(label(2 "SILC"))) , /// - subtitle("ALL hours") name(gross_labour_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - graphregion(color(white)) - - - drop d_sim v1 max_d_sim max_value - - * Plot by weekly hours work - foreach ls in $ls_cat_labour { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_yplgrs_dv_lvl_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot by weekly hours work - twoway (hist sim_yplgrs_dv_lvl_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_gross_labour_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(750) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("`ls' hours") name(gross_labour_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -forvalues year = `min_year'/`max_year' { - - grc1leg gross_labour_inc_`year'_all /// - gross_labour_inc_`year'_TWENTY gross_labour_inc_`year'_FORTY /// - gross_labour_inc_`year'_FIFTY, /// - title("Gross labour income") /// - subtitle("`year', Males") /// - legendfrom(gross_labour_inc_`year'_all) rows(2) 
/// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit gross labour income through time. Statistics computed by averaging benefit unit" "level gross income for all males ages 18-65. Values in € per year, 2015 prices. Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+", /// - size(vsmall)) - - graph export "$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_dist_`year'_male.png", /// - replace width(2400) height(1350) - -} - -graph drop _all - - -* Females - -* Prepare validation data -use year dwt les_c4 valid_y_gross_labour_bu_yr /// - laboursupplyweekly_hu dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 -drop dgn - -* Keep only employed individuals -keep if les_c4 == 1 -drop les_c4 - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_gross_labour_bu_yr, d - replace valid_y_gross_labour_bu_yr = . if /// - valid_y_gross_labour_bu_yr < r(p1) | valid_y_gross_labour_bu_yr > r(p99) -} - - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_gross_labour_bu_yr if year == `year', /// - width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat_labour { - - twoway__histogram_gen valid_y_gross_labour_bu_yr if /// - year == `year' & labour == "`ls'", width(750) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 sim_yplgrs_dv_lvl_bu laboursupplyweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" -drop dgn - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -drop les_c4 - - -* Trim 
outliers -if "$trim_outliers" == "true" { - sum sim_yplgrs_dv_lvl_bu, d - replace sim_yplgrs_dv_lvl_bu = . if /// - sim_yplgrs_dv_lvl_bu < r(p1) | sim_yplgrs_dv_lvl_bu > r(p99) -} - -keep if run == 1 - - -append using "$dir_data/temp_valid_stats.dta" - - -* Plot sub-figures -qui sum year -local min_year = 2011 // Calculate the minimum value of the 'year' variable -local max_year = r(max) // Calculate the maximum value of the 'year' variable - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_yplgrs_dv_lvl_bu if year == `year', width(750) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - - * Plot all hours - twoway (hist sim_yplgrs_dv_lvl_bu if year == `year' , width(750) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_labour_bu_yr if year == `year' , width(750) /// - color(red%30) legend(label(2 "SILC"))) , /// - subtitle("ALL hours") name(gross_labour_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - graphregion(color(white)) - - - drop d_sim v1 max_d_sim max_value - - * Plot by weekly hours work - foreach ls in $ls_cat_labour { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_yplgrs_dv_lvl_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim & max_d_valid_`year'_`ls' - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot by weekly hours work - twoway (hist sim_yplgrs_dv_lvl_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(750) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_gross_labour_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(750) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("`ls' hours") name(gross_labour_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y',labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -forvalues year = `min_year'/`max_year' { - - grc1leg gross_labour_inc_`year'_all gross_labour_inc_`year'_TWENTY /// - gross_labour_inc_`year'_FORTY /// - gross_labour_inc_`year'_FIFTY, /// - title("Gross labour income") /// - subtitle("`year', Females") /// - legendfrom(gross_labour_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit gross labour income through time. Statistics computed by averaging benefit-unit level" "gross income for all females ages 18-65. Values in € per year, 2015 prices. 
Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+.", /// - size(vsmall)) - - graph export "$dir_output_files/income/gross_labour_income/validation_${country}_gross_labour_income_dist_`year'_female.png", /// - replace width(2400) height(1350) - - -} - -graph drop _all - diff --git a/validation/02_simulation_validation/do_files/06_05_plot_capital_income.do b/validation/02_simulation_validation/do_files/06_05_plot_capital_income.do deleted file mode 100644 index fee90ff..0000000 --- a/validation/02_simulation_validation/do_files/06_05_plot_capital_income.do +++ /dev/null @@ -1,305 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Capital income -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 06/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This do file plots simulated and SILC capital income, -* per benefit unit - -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time - By individual, benefit unit amounts -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_gross_capital_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_capital_bu_yr, d - - replace valid_y_gross_capital_bu_yr = . 
if /// - valid_y_gross_capital_bu_yr < r(p1) | /// - valid_y_gross_capital_bu_yr > r(p99) - -} - -collapse (mean) valid_y_gross_capital_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_ypncp_lvl_bu using "$dir_data/simulated_data.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_ypncp_lvl_bu, d - - replace sim_ypncp_lvl_bu = . if /// - sim_ypncp_lvl_bu < r(p1) | sim_ypncp_lvl_bu > r(p99) - -} - - -collapse (mean) sim_ypncp_lvl_bu, by(run year) - -collapse (mean) sim_ypncp_lvl_bu /// - (sd) sim_ypncp_lvl_bu_sd = sim_ypncp_lvl_bu /// - , by(year) - -foreach varname in sim_ypncp_lvl_bu { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_ypncp_lvl_bu_high sim_ypncp_lvl_bu_low year, sort /// - color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_gross_capital_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// -title("Capital Income") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit capital income, annual. Statistics computed by averaging benefit unit-level gross" "income for all persons ages 18-65. Top and bottom percentiles trimmed. Amounts in 2015 prices. 
", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/income/capital_income/validation_${country}_capital_income_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.2 : Share with no capital income -******************************************************************************** -* Prepare validation data -use year dwt valid_y_gross_capital_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_capital_bu_yr, d - - replace valid_y_gross_capital_bu_yr = . if /// - valid_y_gross_capital_bu_yr < r(p1) | valid_y_gross_capital_bu_yr > r(p99) - -} - -gen valid_no_capital = (valid_y_gross_capital_bu_yr == 0) - -collapse (mean) valid_no_capital [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_ypncp_lvl_bu using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_ypncp_lvl_bu, d - - replace sim_ypncp_lvl_bu = . 
if /// - sim_ypncp_lvl_bu < r(p1) | sim_ypncp_lvl_bu > r(p99) - -} - -gen sim_no_capital = (sim_ypncp_lvl_bu == 0) - -collapse (mean) sim_no_capital, by(run year) - -collapse (mean) sim_no_capital /// - (sd) sim_no_capital_sd = sim_no_capital /// - , by(year) - -foreach varname in sim_no_capital { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_no_capital_high sim_no_capital_low year, sort /// - color(green%20) legend(label(1 "Simulated"))) /// -(line valid_no_capital year, sort color(green) /// - legend(label(2 "SILC"))), /// -title("No Capital Income") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Series represents share of individuals in benefit units that do not report receiving any capital income, annual. Statistics computed" "by averaging benefit unit-level gross income for all persons ages 18-65. 
Amounts in 2015 prices.", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/income/capital_income/validation_${country}_no_capital_income_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - -******************************************************************************** -* 2 : Histograms -******************************************************************************** -******************************************************************************** -* 2.1 : Histograms by year, All -******************************************************************************** - -* All -* Prepare validation data -use year dwt valid_y_gross_capital_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_capital_bu_yr, d - - replace valid_y_gross_capital_bu_yr = . if /// - valid_y_gross_capital_bu_yr < r(p1) | /// - valid_y_gross_capital_bu_yr > r(p99) - -} - - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_ypncp_lvl_bu using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_ypncp_lvl_bu, d - - replace sim_ypncp_lvl_bu = . 
if /// - sim_ypncp_lvl_bu < r(p1) | sim_ypncp_lvl_bu > r(p99) - -} - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 // Calculate the minimum value of the 'year' variable -local max_year = r(max) // Calculate the maximum value of the 'year' variable - -forval year = `min_year'/`max_year' { - - //Entire sample - twoway (hist sim_ypncp_lvl_bu if year == `year' & sim_ypncp_lvl_bu < 100, /// - width(1) color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_capital_bu_yr if year == `year' & valid_y_gross_capital_bu_yr < 100, /// - width(1) color(red%30) legend(label(2 "SILC"))) , /// - title("Capital Income") /// - subtitle("`year'") /// - name(capital_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - xtitle("€ per year", size(small)) /// - ytitle(, size(small)) /// - graphregion(color(white)) /// - note("Notes: Individual level observations plotted. All persons ages 18-65. Values in € per year, 2015 prices. X axis range limited to 100." "Top and bottom percentails trimmed.", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/capital_income/validation_${country}_capital_income_dist_`year'.png", /// - replace width(2560) height(1440) - -} - -******************************************************************************** -* 2.1 : Histograms by year, Positive amount -******************************************************************************** -* Prepare validation data -use year dwt valid_y_gross_capital_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_gross_capital_bu_yr, d - - replace valid_y_gross_capital_bu_yr = . 
if /// - valid_y_gross_capital_bu_yr < r(p1) | valid_y_gross_capital_bu_yr > r(p99) - -} - -drop if valid_y_gross_capital_bu_yr == 0 - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_ypncp_lvl_bu using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_ypncp_lvl_bu, d - - replace sim_ypncp_lvl_bu = . if /// - sim_ypncp_lvl_bu < r(p1) | sim_ypncp_lvl_bu > r(p99) - -} - -drop if sim_ypncp_lvl_bu == 0 - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 // Calculate the minimum value of the 'year' variable -local max_year = r(max) // Calculate the maximum value of the 'year' variable - -forval year = `min_year'/`max_year' { - - //Entire sample - twoway (hist sim_ypncp_lvl_bu if year == `year' & sim_ypncp_lvl_bu < 100, /// - width(1) color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_gross_capital_bu_yr if year == `year' & valid_y_gross_capital_bu_yr < 100, /// - width(1) color(red%30) legend(label(2 "SILC"))) , /// - title("Capital Income, Positive Amounts") /// - subtitle("`year'") /// - name(capital_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - xtitle("€ per year", size(small)) /// - ytitle(, size(small)) /// - graphregion(color(white)) /// - note("Notes: Individual level observations plotted. All persons ages 18-65. Values in € per year, 2015 prices. X axis range limited to 100." 
"Top percentile trimmed", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/capital_income/validation_${country}_positive_capital_income_dist_`year'.png", /// - replace width(2560) height(1440) - -} - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_07_plot_disposable_income.do b/validation/02_simulation_validation/do_files/06_07_plot_disposable_income.do deleted file mode 100644 index 6e6d51d..0000000 --- a/validation/02_simulation_validation/do_files/06_07_plot_disposable_income.do +++ /dev/null @@ -1,626 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Disposable income -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This do file plots simulated and SILC disposable income, -* per benefit unit. Individual level data plotted. -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, By benefit unit -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_disp_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_disp_bu_yr, d - - replace valid_y_disp_bu_yr = . 
if /// - valid_y_disp_bu_yr < r(p1) | valid_y_disp_bu_yr > r(p99) - -} - -collapse (mean) valid_y_disp_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_disp_yr_bu using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_disp_yr_bu, d - - replace sim_y_disp_yr_bu = . if /// - sim_y_disp_yr_bu < r(p1) | sim_y_disp_yr_bu > r(p99) - -} - -collapse (mean) sim_y_disp_yr_bu, by(run year) - -collapse (mean) sim_y_disp_yr_bu /// - (sd) sim_y_disp_yr_bu_sd = sim_y_disp_yr_bu /// - , by(year) - -foreach varname in sim_y_disp_yr_bu { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_y_disp_yr_bu_high sim_y_disp_yr_bu_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_disp_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// -title("Disposable Income") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Statistics computed at the benefit unit level. Plotted using individual level data. Sample includes all individuals ages ${min_age}-${max_age}." "Amounts in 2015 prices. Top and bottom percentials trimmed. ", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/disposable_income/validation_${country}_disposable_income_ts_${min_age}_${max_age}_both.jpg", /// - replace //width(2560) height(1440) quality(100) - - -/* -* EUROMOD - -* Prepare EM data -use "$dir_work/${country}_EM_validation_data.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_disp_bu_yr, d - replace valid_y_disp_bu_yr = . 
if /// - valid_y_disp_bu_yr < r(p1) | valid_y_disp_bu_yr > r(p99) -} - - -collapse (mean) valid_y_disp_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - - -* Prepare simulated data -use run year sim_y_disp_yr_bu using "$dir_data/simulated_data.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - sum sim_y_disp_yr_bu, d - replace sim_y_disp_yr_bu = . if /// - sim_y_disp_yr_bu < r(p1) | sim_y_disp_yr_bu > r(p99) -} - - -collapse (mean) sim_y_disp_yr_bu, by(run year) -collapse (mean) sim_y_disp_yr_bu /// - (sd) sim_y_disp_yr_bu_sd = sim_y_disp_yr_bu /// - , by(year) - -foreach varname in sim_y_disp_yr_bu { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_y_disp_yr_bu_high sim_y_disp_yr_bu_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_disp_bu_yr year, sort color(green) /// - legend(label(2 "SILC"))), /// -title("Disposable income") xtitle("Year") ytitle("€ per year (2015 prices)") /// - ylabel(,labsize(small)) xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Notes: Statistics computed at the benefit unit level.", size(vsmall)) -*/ -*/ - - -******************************************************************************** -* 2 : Histograms -******************************************************************************** -******************************************************************************** -* 2.1 : Histograms, By year, and category of weekly labour supply -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_disp_bu_yr laboursupplyweekly_hu dag /// - using "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_disp_bu_yr, d - - replace 
valid_y_disp_bu_yr = . if /// - valid_y_disp_bu_yr < r(p1) | valid_y_disp_bu_yr > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_disp_bu_yr if year == `year' , /// - width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_disp_bu_yr if /// - year == `year' & labour == "`ls'", width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_disp_yr_bu laboursupplyweekly dag using /// - "$dir_data/simulated_data.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum sim_y_disp_yr_bu, d - - replace sim_y_disp_yr_bu = . if /// - sim_y_disp_yr_bu < r(p1) | sim_y_disp_yr_bu > r(p99) - -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_disp_yr_bu if year == `year', width(500) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist sim_y_disp_yr_bu if year == `year', width(500) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_disp_bu_yr if year == `year' , width(500) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("ALL hours") name(disp_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_disp_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot by weekly hours work - twoway (hist sim_y_disp_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_disp_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(500) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("`ls' hours") name(disp_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg disp_inc_`year'_all disp_inc_`year'_ZERO disp_inc_`year'_TWENTY /// - disp_inc_`year'_FORTY /// - disp_inc_`year'_FIFTY, /// - title("Disposable Income By Weekly Hours of Work") /// - subtitle("`year',") /// - legendfrom(disp_inc_`year'_all) rows(2) /// - 
graphregion(color(white)) /// - note("Notes: Sample includes all males aged ${min_age}-${max_age}. Values in € per year (2015 prices). Indiviudal level data for benefit level variable. Samples" "trimmed. Weekly hours worked categories: ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/disposable_income/validation_${country}_disposable_income_dist_`year'.png", /// - replace width(2400) height(1350) -} - - -graph drop _all - - -/* - -* Males - -* Prepare validation data -use year dwt valid_y_disp_bu_yr laboursupplyweekly_hu dag dgn /// - using "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_disp_bu_yr, d - - replace valid_y_disp_bu_yr = . if /// - valid_y_disp_bu_yr < r(p1) | valid_y_disp_bu_yr > r(p99) -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_disp_bu_yr if year == `year' , /// - width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_disp_bu_yr if /// - year == `year' & labour == "`ls'", width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_y_disp_yr_bu laboursupplyweekly dag dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - sum sim_y_disp_yr_bu, d - replace sim_y_disp_yr_bu = . 
if /// - sim_y_disp_yr_bu < r(p1) | sim_y_disp_yr_bu > r(p99) -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_disp_yr_bu if year == `year', width(500) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist sim_y_disp_yr_bu if year == `year', width(500) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_disp_bu_yr if year == `year' , width(500) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("ALL hours") name(disp_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_disp_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot by weekly hours work - twoway (hist sim_y_disp_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_disp_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(500) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("`ls' hours") name(disp_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg disp_inc_`year'_all disp_inc_`year'_ZERO disp_inc_`year'_TWENTY /// - disp_inc_`year'_FORTY /// - disp_inc_`year'_FIFTY, /// - title("Disposable income by weekly hours of work") /// - subtitle("`year', Males") /// - legendfrom(disp_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Sample includes all males aged 18-65. Values in € per year (2015 prices). Indiviudal level data for benefit level variable. Samples" "trimmed. Weekly hours worked categories: ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/disposable_income/validation_${country}_disposable_income_dist_`year'_male.png", /// - replace width(2400) height(1350) -} - - -graph drop _all - - -* Females - -* Prepare validation data -use year dwt valid_y_disp_bu_yr laboursupplyweekly_hu dag dgn /// - using "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_disp_bu_yr, d - - replace valid_y_disp_bu_yr = . 
if /// - valid_y_disp_bu_yr < r(p1) | valid_y_disp_bu_yr > r(p99) -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_disp_bu_yr if year == `year' , /// - width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_disp_bu_yr if /// - year == `year' & labour == "`ls'", width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year sim_y_disp_yr_bu laboursupplyweekly dag dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" -drop dgn - -* Trim outliers -if "$trim_outliers" == "true" { - sum sim_y_disp_yr_bu, d - replace sim_y_disp_yr_bu = . if /// - sim_y_disp_yr_bu < r(p1) | sim_y_disp_yr_bu > r(p99) -} - -keep if run == 1 - - -append using "$dir_data/temp_valid_stats.dta" - -* Plot sub-figures -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_disp_yr_bu if year == `year', width(500) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist sim_y_disp_yr_bu if year == `year' , width(500) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_disp_bu_yr if year == `year' , width(500) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("ALL hours") name(disp_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen sim_y_disp_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot by weekly hours work - twoway (hist sim_y_disp_yr_bu if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_disp_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(500) color(red%30) /// - legend(label(2 "SILC"))) , /// - subtitle("`ls' hours") name(disp_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -* Combine plots by year -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg disp_inc_`year'_all disp_inc_`year'_ZERO disp_inc_`year'_TWENTY /// - disp_inc_`year'_FORTY /// - disp_inc_`year'_FIFTY, /// - title("Disposable income by weekly hours of work") /// - subtitle("`year', Females") /// - legendfrom(disp_inc_`year'_all) rows(2) /// - 
graphregion(color(white)) /// - note("Notes: Sample includes all females aged 18-65. Values in € per year (2015 prices). Indiviudal level data for benefit level variable. Samples" "trimmed. Weekly hours worked categories: ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+.", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/disposable_income/validation_${country}_disposable_income_dist_`year'_female.png", /// - replace width(2400) height(1350) -} - - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_08_plot_equivalised_disposable_income.do b/validation/02_simulation_validation/do_files/06_08_plot_equivalised_disposable_income.do deleted file mode 100644 index f5c928b..0000000 --- a/validation/02_simulation_validation/do_files/06_08_plot_equivalised_disposable_income.do +++ /dev/null @@ -1,686 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Equivalised disposable income -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This do file plots simulated and observed equivalised -* disposable income, per benefit unit -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_eq_disp_bu_yr, d - - replace valid_y_eq_disp_bu_yr = . 
if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) - -} - - -collapse (mean) valid_y_eq_disp_bu_yr [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year equivalisedincome using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum equivalisedincome, d - - replace equivalisedincome = . if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) - -} - -collapse (mean) equivalisedincome, by(run year) - -collapse (mean) equivalisedincome /// - (sd) equivalisedincome_sd = equivalisedincome, by(year) - -foreach varname in equivalisedincome { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea equivalisedincome_high equivalisedincome_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_eq_disp_bu_yr year, sort color(green) /// - legend(label(2 "Observed"))), /// - title("Equivalised Disposable Income") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Equivalised disposable income assigned to each person computed by adjusting benefit unit's disposable income by the" "modified OECD scale. Samples includes all individuals ages ${min_age}-${max_age}. Trimmed sample. 
Amounts in 2015 prices.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_income_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) - -/* -* Males - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_eq_disp_bu_yr, d - replace valid_y_eq_disp_bu_yr = . if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) -} - -collapse (mean) valid_y_eq_disp_bu_yr [aw = dwt], by(year) - - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year equivalisedincome dgn using "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" - -* Trim outliers -if "$trim_outliers" == "true" { - sum equivalisedincome, d - replace equivalisedincome = . if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) -} - -collapse (mean) equivalisedincome, by(run year) -collapse (mean) equivalisedincome /// - (sd) equivalisedincome_sd = equivalisedincome, by(year) - -foreach varname in equivalisedincome { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea equivalisedincome_high equivalisedincome_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_eq_disp_bu_yr year, sort color(green) /// - legend(label(2 "Observed"))), /// - title("Equivalised disposable income") subtitle("Males") /// - xtitle("Year") /// - ytitle("€ per year.") /// - ylabel(,labsize(small)) xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Notes: Equivalised disposable income assigned to each person computed by adjusting benefit unit's disposable income by the" "modified 
OECD scale. Samples includes males ages 18-65. Trimmed sample_long. Amounts in 2015 prices.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_income_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2400) height(1350) - - -* Females - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_eq_disp_bu_yr, d - replace valid_y_eq_disp_bu_yr = . if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) -} - -collapse (mean) valid_y_eq_disp_bu_yr [aw = dwt], by(year) - - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulated data -use run year equivalisedincome dgn using "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" - -* Trim outliers -if "$trim_outliers" == "true" { - sum equivalisedincome, d - replace equivalisedincome = . 
if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) -} - -collapse (mean) equivalisedincome, by(run year) -collapse (mean) equivalisedincome /// - (sd) equivalisedincome_sd = equivalisedincome, by(year) - -foreach varname in equivalisedincome { - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea equivalisedincome_high equivalisedincome_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_y_eq_disp_bu_yr year, sort color(green) /// - legend(label(2 "Observed"))), /// - title("Equivalised disposable income") subtitle("Females") /// - xtitle("Year") /// - ytitle("€ per year.") /// - ylabel(,labsize(small)) xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Notes: Equivalised disposable income assigned to each person computed by adjusting benefit unit's disposable income by the" "modified OECD scale. Samples includes females ages 18-65. Trimmed sample_long. 
Amounts in 2015 prices.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_income_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2400) height(1350) -*/ - -******************************************************************************** -* 2 : Histograms -******************************************************************************** -******************************************************************************** -* 2 : Histograms, By year, and by category of weekly labour supply -******************************************************************************** - -* Males - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr laboursupplyweekly_hu using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_eq_disp_bu_yr, d - - replace valid_y_eq_disp_bu_yr = . if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_eq_disp_bu_yr if year == `year' , /// - width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_eq_disp_bu_yr if /// - year == `year' & labour == "`ls'", width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year equivalisedincome laboursupplyweekly using /// - "$dir_data/simulated_data.dta", clear - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum equivalisedincome, d - - replace equivalisedincome = . 
if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) - -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen equivalisedincome if year == `year', width(500) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist equivalisedincome if year == `year', width(500) /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_eq_disp_bu_yr if year == `year', width(500) color(red%30) /// - legend(label(2 "Observed"))) , /// - subtitle("ALL hours") /// - name(eqdisp_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen equivalisedincome if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist equivalisedincome if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_eq_disp_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", width(500) color(red%30) /// - legend(label(2 "Observed"))) , /// - subtitle("`ls' hours") name(eqdisp_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - ytitle(, size(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -qui sum year -local min_year = 2011 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg eqdisp_inc_`year'_all eqdisp_inc_`year'_ZERO /// - eqdisp_inc_`year'_TWENTY eqdisp_inc_`year'_FORTY /// - eqdisp_inc_`year'_FIFTY, /// - title("Equivalised Disposable Income") /// - subtitle("`year', Males") /// - legendfrom(eqdisp_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Distribution of benefit unit equivalised disposable, individual. Individual observations plotted, ${min_age}-${max_age} yo. Values in € per year," "2015 prices. Top and bottom percentiles trimmed. Weekly hours worked categories: ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_inc_dist_`year'.png", /// - replace width(2560) height(1440) - -} - -graph drop _all - - - -/* -* Males - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr laboursupplyweekly_hu dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 -drop dgn - - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_eq_disp_bu_yr, d - - replace valid_y_eq_disp_bu_yr = . 
if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2010 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_eq_disp_bu_yr if year == `year' , /// - width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_eq_disp_bu_yr if /// - year == `year' & labour == "`ls'", width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year equivalisedincome laboursupplyweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Male" -drop dgn - - -* Trim outliers -if "$trim_outliers" == "true" { - sum equivalisedincome, d - replace equivalisedincome = . if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2010 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen equivalisedincome if year == `year', width(500) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist equivalisedincome if year == `year' , /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_eq_disp_bu_yr if year == `year' , color(red%30) /// - legend(label(2 "Observed"))) , /// - subtitle("ALL hours") name(eqdisp_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen equivalisedincome if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist equivalisedincome if year == `year' & /// - laboursupplyweekly == "`ls'", color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_eq_disp_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", color(red%30) /// - legend(label(2 "Observed"))) , /// - subtitle("`ls' hours") name(eqdisp_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -qui sum year -local min_year = 2010 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg eqdisp_inc_`year'_all eqdisp_inc_`year'_ZERO /// - eqdisp_inc_`year'_TWENTY eqdisp_inc_`year'_FORTY /// - eqdisp_inc_`year'_FIFTY, /// - title("Equivalised disposable income") /// - subtitle("`year', Males") /// - legendfrom(eqdisp_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit 
equivalised disposable income for all persons ages 18-65. Individual observations plotted." "Values in € per year, 2015 prices. Sample trimmed. Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_inc_dist_`year'_male.png", /// - replace width(2560) height(1440) - -} - -graph drop _all - - -* Females - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr laboursupplyweekly_hu dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 -drop dgn - - -* Trim outliers -if "$trim_outliers" == "true" { - sum valid_y_eq_disp_bu_yr, d - replace valid_y_eq_disp_bu_yr = . if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2010 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_y_eq_disp_bu_yr if year == `year' , /// - width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - - foreach ls in $ls_cat { - - twoway__histogram_gen valid_y_eq_disp_bu_yr if /// - year == `year' & labour == "`ls'", width(500) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year'_`ls' = r(max) - - drop d_valid v2 - - } -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year equivalisedincome laboursupplyweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -keep if dgn == "Female" -drop dgn - - -* Trim outliers -if "$trim_outliers" == "true" { - sum equivalisedincome, d - replace equivalisedincome = . 
if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) -} - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2010 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen equivalisedincome if year == `year', width(500) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist equivalisedincome if year == `year' , /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_y_eq_disp_bu_yr if year == `year' , color(red%30) /// - legend(label(2 "Observed"))) , /// - subtitle("ALL hours") name(eqdisp_inc_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - foreach ls in $ls_cat { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen equivalisedincome if year == `year' & /// - laboursupplyweekly == "`ls'", width(500) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year'_`ls' if /// - max_d_valid_`year'_`ls' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist equivalisedincome if year == `year' & /// - laboursupplyweekly == "`ls'", color(green%30) /// - legend(label(1 "Simulated"))) /// - (hist valid_y_eq_disp_bu_yr if year == `year' & /// - laboursupplyweekly_hu == "`ls'", color(red%30) /// - legend(label(2 "Observed"))) , /// - subtitle("`ls' hours") name(eqdisp_inc_`year'_`ls', replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - graphregion(color(white)) - - drop d_sim v1 max_d_sim max_value - - } -} - -qui sum year -local min_year = 2010 -local max_year = r(max) - -forvalues year = `min_year'/`max_year' { - - grc1leg eqdisp_inc_`year'_all eqdisp_inc_`year'_ZERO /// - eqdisp_inc_`year'_TWENTY eqdisp_inc_`year'_FORTY /// - eqdisp_inc_`year'_FIFTY, /// - title("Equivalised disposable income") /// - subtitle("`year', Females") /// - legendfrom(eqdisp_inc_`year'_all) rows(2) /// - graphregion(color(white)) /// - note("Notes: Series represents average benefit unit equivalised disposable income for all persons ages 18-65. Individual observations plotted" "Values in € per year, 2015 prices. Sample trimmed. 
Weekly hours worked categories:" "ZERO = 0, TWENTY = [1,39], FORTY = 40, FIFTY = 41+.", /// - size(vsmall)) - - graph export /// - "$dir_output_files/income/equivalised_disposable_income/validation_${country}_equivalised_disposable_inc_dist_`year'_female.png", /// - replace width(2560) height(1440) - -} - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_09_plot_hourly_wages.do b/validation/02_simulation_validation/do_files/06_09_plot_hourly_wages.do deleted file mode 100644 index e7c20f0..0000000 --- a/validation/02_simulation_validation/do_files/06_09_plot_hourly_wages.do +++ /dev/null @@ -1,695 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Hourly wages -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 1/2025 (AB) -* COUNTRY: Poland -*******************************************************************************/ -* NOTES: This master do file organises do files used for validating -* SimPaths model using EU-SILC data for Poland. -******************************************************************************** - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, All -******************************************************************************** -* Prepare validation data -use year dwt les_c4 valid_wage_hour using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -drop if valid_wage_hour == -9 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour, d - replace valid_wage_hour = . 
if /// - valid_wage_hour < r(p1) | valid_wage_hour > r(p99) - -} - -collapse (mean) valid_wage_hour [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly, d - - replace potential_earnings_hourly = . if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -collapse (mean) potential_earnings_hourly, by(run year) - -collapse (mean) potential_earnings_hourly /// - (sd) potential_earnings_hourly_sd = potential_earnings_hourly /// - , by(year) - -foreach varname in potential_earnings_hourly { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea potential_earnings_hourly_high /// - potential_earnings_hourly_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_wage_hour year, sort color(green) /// - legend(label(2 "Observed"))), /// -title("Hourly Wages") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Statistics calculated on sample of employed and self-employed individuals. 
Values in 2015 prices.", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/wages/validation_${country}_wages_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2560) height(1440) quality(100) - - -**** With min wage adjustment - -* Prepare validation data -use year dwt les_c4 valid_wage_hour_adj_mw using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -drop if valid_wage_hour_adj == -9 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour_adj_mw, d - replace valid_wage_hour_adj_mw = . if /// - valid_wage_hour_adj_mw < r(p1) | valid_wage_hour_adj_mw > r(p99) - -} - -collapse (mean) valid_wage_hour_adj_mw [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly, d - - replace potential_earnings_hourly = . 
if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -collapse (mean) potential_earnings_hourly, by(run year) - -collapse (mean) potential_earnings_hourly /// - (sd) potential_earnings_hourly_sd = potential_earnings_hourly /// - , by(year) - -foreach varname in potential_earnings_hourly { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea potential_earnings_hourly_high /// - potential_earnings_hourly_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_wage_hour_adj_mw year, sort color(green) /// - legend(label(2 "Observed"))), /// -title("Hourly Wages") /// - subtitle("Min wage adjusted") /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Statistics calculated on sample of employed and self-employed individuals. Values in 2015 prices.", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/wages/validation_${country}_wages_ts_${min_age}_${max_age}_both_mw.jpg", /// - replace width(2560) height(1440) quality(100) - - -**** Without those below the min wage - -* Prepare validation data -use year dwt les_c4 valid_wage_hour_adj using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -drop if valid_wage_hour_adj == -9 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour_adj, d - replace valid_wage_hour_adj = . 
if /// - valid_wage_hour_adj < r(p1) | valid_wage_hour_adj > r(p99) - -} - -collapse (mean) valid_wage_hour_adj [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly, d - - replace potential_earnings_hourly = . if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -collapse (mean) potential_earnings_hourly, by(run year) - -collapse (mean) potential_earnings_hourly /// - (sd) potential_earnings_hourly_sd = potential_earnings_hourly /// - , by(year) - -foreach varname in potential_earnings_hourly { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea potential_earnings_hourly_high /// - potential_earnings_hourly_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_wage_hour_adj year, sort color(green) /// - legend(label(2 "Observed"))), /// -title("Hourly Wages") /// - subtitle("Belwo min wage removed") /// - xtitle("Year, Min wage adjusted", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Statistics calculated on sample of employed and self-employed individuals. 
Values in 2015 prices.", /// - size(vsmall)) - - -******************************************************************************** -* 1.2 : Mean values over time, By gender -******************************************************************************** -* Prepare validation data -use year dwt les_c4 valid_wage_hour dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -drop if valid_wage_hour == -9 - - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour, d - - replace valid_wage_hour = . if /// - valid_wage_hour < r(p1) | valid_wage_hour > r(p99) - -} - -collapse (mean) valid_wage_hour [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly dgn using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -gen dgn2 = 0 if dgn == "Female" -replace dgn2 = 1 if dgn == "Male" - -drop dgn -rename dgn2 dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly , d - - replace potential_earnings_hourly = . 
if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -collapse (mean) potential_earnings_hourly, by(run year dgn) - -collapse (mean) potential_earnings_hourly /// - (sd) potential_earnings_hourly_sd = potential_earnings_hourly /// - , by(year dgn) - -foreach varname in potential_earnings_hourly { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea potential_earnings_hourly_high /// - potential_earnings_hourly_low year if dgn == 0, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_wage_hour year if dgn == 0, sort color(green) /// - legend(label(2 "Observed"))), /// -subtitle("Females") name(wages_female, replace) /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - - -twoway (rarea potential_earnings_hourly_high /// - potential_earnings_hourly_low year if dgn == 1, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_wage_hour year if dgn == 1, sort color(green) /// - legend(label(2 "Observed"))), /// -subtitle("Males") name(wages_male, replace) /// - xtitle("Year", size(small)) /// - ytitle("€ per year", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - - -grc1leg wages_female wages_male, /// - title("Hourly Wage") /// - legendfrom(wages_female) rows(1) /// - graphregion(color(white)) /// - note("Notes: Statistics calculated on sample of employed and self-employed individuals. 
Values in 2015 prices.", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/wages/validation_${country}_wages_ts_${min_age}_${max_age}_gender.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 2 : Histograms -******************************************************************************** -******************************************************************************** -* 2.1 : Histograms, By year -******************************************************************************** - -* Prepare validation data -use year dwt les_c4 valid_wage_hour using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -drop if valid_wage_hour == -9 - -drop les_c4 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour, d - - replace valid_wage_hour = . if /// - valid_wage_hour < r(p1) | valid_wage_hour > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = 2022 - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_wage_hour if year == `year' , /// - bin(60) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -drop les_c4 - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly, d - - replace potential_earnings_hourly = . 
if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = 2022 - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen potential_earnings_hourly if year == `year', /// - bin(60) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist potential_earnings_hourly if year == `year', /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_wage_hour if year == `year', color(red%30) /// - legend(label(2 "Observed"))), /// - title("Hourly Wage") /// - subtitle("`year'") /// - name(hourly_wages_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - xtitle("€ per year", size(small)) /// - ytitle(, size(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Statistics calculated on subsample of employed and self-employed individuals. 
Values in €, 2015 prices.", size(vsmall)) - - graph export /// - "$dir_output_files/wages/validation_${country}_wages_dist_`year'.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value - -} - -******************************************************************************** -* 2.2 : Histograms, By gender -******************************************************************************** - -* Males -* Prepare validation data -use year dwt les_c4 valid_wage_hour dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 -keep if dgn == 0 - -drop if valid_wage_hour == -9 - - -drop les_c4 dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour, d - - replace valid_wage_hour = . if /// - valid_wage_hour < r(p1) | valid_wage_hour > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = 2022 - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_wage_hour if year == `year' , /// - bin(60) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly dgn using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -keep if dgn == "Female" -drop les_c4 dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly, d - - replace potential_earnings_hourly = . 
if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = 2022 - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen potential_earnings_hourly if year == `year', /// - bin(60) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist potential_earnings_hourly if year == `year', /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_wage_hour if year == `year', color(red%30) /// - legend(label(2 "Observed"))) , /// - title("Hourly Wage") /// - subtitle("`year', females") /// - name(hourly_wages_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - xtitle("€ per year", size(small)) /// - ytitle(, size(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Statistics calculated on subsample of employed and self-employed individuals. Values in €, 2015 prices.", size(vsmall)) - - graph export /// - "$dir_output_files/wages/validation_${country}_wages_dist_`year'_female.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value - -} - - -* Females -* Prepare validation data -use year dwt les_c4 valid_wage_hour dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 -keep if dgn == 1 - -drop if valid_wage_hour == -9 - -drop les_c4 dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_wage_hour, d - - replace valid_wage_hour = . 
if /// - valid_wage_hour < r(p1) | valid_wage_hour > r(p99) - -} - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = 2022 - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_wage_hour if year == `year' , /// - bin(60) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 potential_earnings_hourly dgn using /// - "$dir_data/simulated_data.dta", clear - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -keep if dgn == "Male" -drop les_c4 dgn - -* Trim outliers -if "$trim_outliers" == "true" { - - sum potential_earnings_hourly, d - - replace potential_earnings_hourly = . if /// - potential_earnings_hourly < r(p1) | potential_earnings_hourly > r(p99) - -} - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = 2022 - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen potential_earnings_hourly if year == `year', /// - bin(60) den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - * Plot all hours - twoway (hist potential_earnings_hourly if year == `year', /// - color(green%30) legend(label(1 "Simulated"))) /// - (hist valid_wage_hour if year == `year', color(red%30) /// - legend(label(2 "Observed"))) , /// - title("Hourly Wage") /// - subtitle("`year', males") /// - name(hourly_wages_`year'_all, replace) /// - xlabel(,labsize(vsmall) angle(forty_five)) /// - ylabel(0(`steps')`max_y', labsize(vsmall)) /// - xtitle("€ per year", size(small)) /// - ytitle(, size(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes: Statistics calculated on subsample of employed and self-employed individuals. Values in €, 2015 prices.", size(vsmall)) - - graph export /// - "$dir_output_files/wages/validation_${country}_wages_dist_`year'_male.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value - -} - - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_10_plot_hours_worked.do b/validation/02_simulation_validation/do_files/06_10_plot_hours_worked.do deleted file mode 100644 index c0d4009..0000000 --- a/validation/02_simulation_validation/do_files/06_10_plot_hours_worked.do +++ /dev/null @@ -1,507 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Hours worked -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: Histrograms by gender restrict hours. 
-*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, All -******************************************************************************** - -* Prepare validation data -use year dwt les_c4 valid_lhw using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -collapse (mean) valid_lhw [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 hoursworkedweekly using /// - "$dir_data/simulated_data.dta", clear - -rename hoursworkedweekly lhw_sim - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -collapse (mean) lhw_sim, by(run year) - -collapse (mean) lhw_sim /// - (sd) lhw_sim_sd = lhw_sim /// - , by(year) - -foreach varname in lhw_sim { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea lhw_sim_high lhw_sim_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_lhw year, sort color(green) legend(label(2 "SILC"))), /// - title("Average weekly hours worked") /// - xtitle("Year", size(small)) /// - ytitle("Hours per week", size(small)) /// - ylabel(34 [2] 44,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals", /// - size(vsmall)) - -* Save figure -graph export /// 
-"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 1.2 : Mean values over time, By gender -******************************************************************************** - -* Prepare validation data -use year dwt les_c4 valid_lhw dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 1 - -* Keep only employed individuals -keep if les_c4 == 1 - -collapse (mean) valid_lhw [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 hoursworkedweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -rename hoursworkedweekly lhw - -keep if dgn == "Male" - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -collapse (mean) lhw, by(run year) - -collapse (mean) lhw /// - (sd) lhw_sd = lhw /// - , by(year) - -foreach varname in lhw { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea lhw_high lhw_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_lhw year, sort color(green) legend(label(2 "SILC"))), /// - title("Average weekly hours worked") /// - subtitle("Males") /// - xtitle("Year", size(small)) /// - ytitle("Hours per week", size(small)) /// - ylabel(34 [2] 44,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_${min_age}_${max_age}_male.jpg", /// - replace width(2560) 
height(1440) quality(100) - - -* Females - -* Prepare validation data -use year dwt les_c4 valid_lhw dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if dgn == 0 - -* Keep only employed individuals -keep if les_c4 == 1 - -collapse (mean) valid_lhw [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year les_c4 hoursworkedweekly dgn using /// - "$dir_data/simulated_data.dta", clear - -rename hoursworkedweekly lhw - -keep if dgn == "Female" - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -collapse (mean) lhw, by(run year) - -collapse (mean) lhw /// - (sd) lhw_sd = lhw /// - , by(year) - -foreach varname in lhw { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea lhw_high lhw_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_lhw year, sort color(green) legend(label(2 "SILC"))), /// - title("Average weekly hours worked") /// - subtitle("Females") /// - xtitle("Year", size(small)) /// - ytitle("Hours per week", size(small)) /// - ylabel(34 [2] 44,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals", /// - size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/hours_worked/validation_${country}_hours_worked_ts_${min_age}_${max_age}_female.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 2 : Histograms -******************************************************************************** -******************************************************************************** -* 2.1 : Histograms, By Year, All 
-******************************************************************************** - -* Prepare validation data -use year dwt les_c4 valid_lhw using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_lhw if year == `year' , /// - bin(60) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run idperson year les_c4 hoursworkedweekly using /// - "$dir_data/simulated_data.dta", clear - -rename hoursworkedweekly lhw - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" - -collapse (mean) lhw, by(idperson year) - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen lhw if year == `year', bin(60) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist lhw if year == `year' /*& lhw <= 65*/, width(1) color(green%20) /// - legend(label(1 "Simulated"))) /// - (hist valid_lhw if year == `year' /*& valid_lhw <= 65*/, width(1) color(red%20) /// - legend(label(2 "SILC"))), /// - title("Weekly hours worked") /// - subtitle("`year'") /// - xtitle("Hours", size(small)) /// - ytitle(, size(small)) /// - xlabel(,labsize(small)) /// - ylabel(0(`steps')`max_y', labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals. SILC hours unrestricted.", /// - size(vsmall)) - - graph export /// - "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_unrestricted.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value -} - - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen lhw if year == `year', bin(60) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist lhw if year == `year' & lhw <= 65, width(1) color(green%20) /// - legend(label(1 "Simulated"))) /// - (hist valid_lhw if year == `year' & valid_lhw <= 65, width(1) color(red%20) /// - legend(label(2 "SILC"))), /// - title("Weekly hours worked") /// - subtitle("`year'") /// - xtitle("Hours") /// - xlabel(,labsize(small)) /// - ylabel(0(`steps')`max_y', labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals. 
Hours restricted to <= 65 per week.", /// - size(vsmall)) - - graph export /// - "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value -} - - -******************************************************************************** -* 2.1 : Histograms, By Year, By gender -******************************************************************************** - -* Female -* Prepare validation data -use year dwt les_c4 valid_lhw dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 -keep if dgn == 0 - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_lhw if year == `year' , /// - bin(60) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run idperson year les_c4 dgn hoursworkedweekly using /// - "$dir_data/simulated_data.dta", clear - -rename hoursworkedweekly lhw - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -keep if dgn == "Female" - -collapse (mean) lhw, by(idperson year) - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - twoway__histogram_gen lhw if year == `year', bin(60) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . 
- - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist lhw if year == `year' /*& lhw <= 65*/, width(1) /// - color(green%20) legend(label(1 "Simulated"))) /// - (hist valid_lhw if year == `year' /*& valid_lhw <= 65*/, width(1) color(red%20) /// - legend(label(2 "SILC"))), /// - title("Weekly hours worked") /// - subtitle("`year', females") /// - xtitle("Hours", size(small)) /// - ytitle(, size(small)) /// - xlabel(,labsize(small)) /// - ylabel(0(`steps')`max_y', labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals", /// - size(vsmall)) - - graph export /// - "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_female.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value -} - - -* Male -* Prepare validation data -use year dwt les_c4 valid_lhw dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Keep only employed individuals -keep if les_c4 == 1 -keep if dgn == 1 - -* Prepare info needed for dynamic y axis labels -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - twoway__histogram_gen valid_lhw if year == `year' , /// - bin(60) den gen(d_valid v2) - - qui sum d_valid - gen max_d_valid_`year' = r(max) - - drop d_valid v2 - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run idperson year les_c4 dgn hoursworkedweekly using /// - "$dir_data/simulated_data.dta", clear - -rename hoursworkedweekly lhw - -* Keep only employed individuals -keep if les_c4 == "EmployedOrSelfEmployed" -keep if dgn == "Male" - -collapse (mean) lhw, by(idperson year) - -append using "$dir_data/temp_valid_stats.dta" - -qui sum year -local min_year = 2011 -local max_year = r(max) - -forval year = `min_year'/`max_year' { - - * Prepare info needed for dynamic y axis labels - 
twoway__histogram_gen lhw if year == `year', bin(60) /// - den gen(d_sim v1) - - qui sum d_sim - gen max_d_sim = r(max) - - gen max_value = max_d_valid_`year' if max_d_valid_`year' > max_d_sim - replace max_value = max_d_sim if max_value == . - - sum max_value - local max_y = 1.25*r(max) - local steps = `max_y'/2 - - twoway (hist lhw if year == `year' /*& lhw <= 65*/, width(1) /// - color(green%20) legend(label(1 "Simulated"))) /// - (hist valid_lhw if year == `year' /*& lhw <= 65*/, width(1) color(red%20) /// - legend(label(2 "SILC"))), /// - title("Weekly hours worked") /// - subtitle("`year', males") /// - xtitle("Hours", size(small)) /// - ytitle(, size(small)) /// - xlabel(,labsize(small)) /// - ylabel(0(`steps')`max_y', labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Statistics calculated on sample of employed and self-employed individuals", /// - size(vsmall)) - - graph export /// - "$dir_output_files/hours_worked/validation_${country}_hours_worked_hist_`year'_male.png", /// - replace width(2400) height(1350) - - drop d_sim v1 max_d_sim max_value -} - - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_11_plot_income_shares.do b/validation/02_simulation_validation/do_files/06_11_plot_income_shares.do deleted file mode 100644 index 84ffffe..0000000 --- a/validation/02_simulation_validation/do_files/06_11_plot_income_shares.do +++ /dev/null @@ -1,457 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Income shares -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This do file plots simulated and observed income shares and -* incomes by deciles of gross income -* Altered pension age to 65 
-*******************************************************************************/ - -******************************************************************************** -* Observed -******************************************************************************** - -use year dwt max_age_in_bu gross_labour_income_share_bu /// - pension_income_share_bu capital_income_share_bu social_income_share_bu /// - net_income_share_bu gross_income_bu net_income_bu gross_labour_income_bu /// - pension_income_bu capital_income_bu social_income_bu /// - gross_income_bu_jittered using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -xtile obs_gross_income_group = gross_income_bu, nq(10) -//xtile obs_gross_income_group = gross_income_bu_jittered, nq(10) - - -* All ages - -* Graph income shares (gross) -graph bar (mean) gross_labour_income_share_bu (mean) /// - pension_income_share_bu (mean) capital_income_share_bu [aweight = dwt], /// - over(obs_gross_income_group) stack title(`"Observed (all ages)"', /// - size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(observed_income_shares, replace) b1title("Decile", size(small)) /// - ytitle("Share", size(small)) ylabel(0 [0.5] 1) /// - graphregion(color(white)) - -* Graph income level (gross) -graph bar (mean) gross_labour_income_bu (mean) pension_income_bu (mean) /// - capital_income_bu [aweight = dwt], /// - over(obs_gross_income_group) stack title(`"Observed (all ages)"', /// - size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(observed_incomes, replace) ylabel(0 [100000] 100000) /// - ytitle("€", size(small)) b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph share of net income in gross income -graph bar (mean) net_income_share_bu [aweight = dwt], /// - over(obs_gross_income_group) stack title(`"Observed (all ages)"', /// - size(medium)) legend(order(1 "Net income") position(6) rows(1)) /// - 
name(observed_net_income_shares, replace) ytitle("Share", size(small)) /// - b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph income level (net) -graph bar (mean) net_income_bu [aweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (all ages)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(observed_net_income, replace) ylabel(0 [80000] 80000) /// - ytitle("€", size(small)) b1title("Decile", size(small)) legend(off) /// - graphregion(color(white)) - - -* Oldest person above65 (pension age) - -preserve -drop obs_gross_income_group -keep if max_age_in_bu >= 65 -xtile obs_gross_income_group = gross_income_bu, nq(10) - -* Graph income shares (gross) -graph bar (mean) gross_labour_income_share_bu (mean) /// - pension_income_share_bu (mean) capital_income_share_bu [pweight = dwt], /// - over(obs_gross_income_group) stack title(`"Observed (oldest age >=65)"', /// - size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(observed_income_shares_o68, replace) /// - b1title("Decile", size(small)) /// - ytitle("Share", size(small)) ylabel(0 [0.5] 1) /// - graphregion(color(white)) - -* Graph income level (gross) -graph bar (mean) gross_labour_income_bu (mean) pension_income_bu /// - (mean) capital_income_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age >=65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(observed_incomes_o68, replace) ylabel(0 [100000] 100000) /// - ytitle("€") b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph share of net income in gross income -graph bar (mean) net_income_share_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age >=65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(observed_net_income_shares_o68, replace) /// 
- b1title("Decile", size(small)) ytitle("Share", size(small)) /// - graphregion(color(white)) - //ylabel(0 [0.5] 1) - -// Graph income level (net) -graph bar (mean) net_income_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age >=65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(observed_net_income_o68, replace) ylabel(0 [80000] 80000) /// - ytitle("£") b1title("Decile", size(small)) legend(off) /// - graphregion(color(white)) - - -* Oldest person below65 (pension age) - -restore -drop obs_gross_income_group -keep if max_age_in_bu <65 -xtile obs_gross_income_group = gross_income_bu, nq(10) - -* Graph income shares (gross) -graph bar (mean) gross_labour_income_share_bu (mean) /// - pension_income_share_bu (mean) capital_income_share_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age <65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(observed_income_shares_u68, replace) /// - b1title("Decile", size(small)) ytitle("Share", size(small)) /// - ylabel(0 [0.5] 1) /// - graphregion(color(white)) - -* Graph income level (gross) -graph bar (mean) gross_labour_income_bu (mean) pension_income_bu /// - (mean) capital_income_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age <65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(observed_incomes_u68, replace) ylabel(0 [100000] 100000) /// - ytitle("£") b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph share of net income in gross income -graph bar (mean) net_income_share_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age <65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(observed_net_income_shares_u68, replace) /// - b1title("Decile", 
size(small)) ytitle("Share", size(small)) /// - graphregion(color(white)) - -* Graph income level (net) -graph bar (mean) net_income_bu [pweight = dwt], /// - over(obs_gross_income_group) /// - stack title(`"Observed (oldest age <65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(observed_net_income_u68, replace) ylabel(0 [80000] 80000) /// - ytitle("€") b1title("Decile", size(small)) legend(off) /// - graphregion(color(white)) - - -******************************************************************************** -* Simulated -******************************************************************************** - -* Load simulated data -use run year idperson max_age_in_bu sim_yplgrs_dv_lvl_bu sim_ypnoab_lvl_bu /// - sim_ypncp_lvl_bu sim_y_disp_yr_bu sim_y_gross_yr_bu using /// - "$dir_data/simulated_data.dta", clear - -gen calc_bu_gross_income = sim_yplgrs_dv_lvl_bu + sim_ypncp_lvl_bu + /// - sim_ypnoab_lvl_bu - -collapse max_age_in_bu sim_yplgrs_dv_lvl_bu sim_ypnoab_lvl_bu /// - sim_ypncp_lvl_bu sim_y_disp_yr_bu sim_y_gross_yr_bu /// - calc_bu_gross_income, by(idperson year) - -* Income shares: -gen gross_labour_income_share_bu = sim_yplgrs_dv_lvl_bu / calc_bu_gross_income -gen pension_income_share_bu = sim_ypnoab_lvl_bu / calc_bu_gross_income -gen capital_income_share_bu = sim_ypncp_lvl_bu / calc_bu_gross_income -gen net_income_share_bu = sim_y_disp_yr_bu / calc_bu_gross_income -replace net_income_share_bu = . 
if net_income_share_bu >= 50 - -xtile sim_gross_income_group = calc_bu_gross_income, nq(10) - -* All ages - -* Graph income shares (gross) -graph bar (mean) gross_labour_income_share_bu (mean) pension_income_share_bu /// - (mean) capital_income_share_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (all ages)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") /// - position(6) rows(1)) name(simulated_income_shares, replace) /// - b1title("Decile", size(small)) ytitle("Share", size(small)) /// - ylabel(0 [0.5] 1) /// - graphregion(color(white)) - -* Graph income level (gross) -graph bar (mean) sim_yplgrs_dv_lvl_bu (mean) sim_ypnoab_lvl_bu /// - (mean) sim_ypncp_lvl_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (all ages)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") /// - position(6) rows(1)) name(simulated_incomes, replace) /// - ylabel(0 [100000] 100000) ytitle("€") b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph share of net income in gross income -graph bar (mean) net_income_share_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (all ages)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(simulated_net_income_shares, replace) /// - b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph income level (net) -graph bar (mean) sim_y_disp_yr_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (all ages)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - legend(off) name(simulated_net_income, replace) /// - ylabel(0 [80000] 80000) ytitle("€") /// - b1title("Decile", size(small)) ytitle("€") /// - graphregion(color(white)) - - -* Oldest person above65 (pension age) - -preserve -drop sim_gross_income_group -keep if max_age_in_bu >= 65 -xtile sim_gross_income_group = calc_bu_gross_income, nq(10) - -* Graph income shares (gross) -graph bar (mean) 
gross_labour_income_share_bu (mean) pension_income_share_bu /// - (mean) capital_income_share_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (oldest age >=65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(simulated_income_shares_o68, replace) /// - b1title("Decile", size(small)) ytitle("Share", size(small)) /// - ylabel(0 [0.5] 1) /// - graphregion(color(white)) - -* Graph income level (gross) -graph bar (mean) sim_yplgrs_dv_lvl_bu (mean) sim_ypnoab_lvl_bu /// - (mean) sim_ypncp_lvl_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (oldest age >=65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(simulated_incomes_o68, replace) ylabel(0 [100000] 100000) /// - ytitle("€") b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph share of net income in gross income -graph bar (mean) net_income_share_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (oldest age >=65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(simulated_net_income_shares_o68, replace) /// - b1title("Decile", size(small)) /// - ytitle("Share", size(small)) /// - graphregion(color(white)) - -* Graph income level (net) -graph bar (mean) sim_y_disp_yr_bu, over(sim_gross_income_group) /// - stack title(`"Simulated (oldest age >=65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(simulated_net_income_o68, replace) ylabel(0 [80000] 80000) /// - ytitle("€") b1title("Decile", size(small)) legend(off) /// - graphregion(color(white)) - - -* Oldest person below 68 (pension age) - -restore -drop sim_gross_income_group -keep if max_age_in_bu < 65 -xtile sim_gross_income_group = calc_bu_gross_income, nq(10) - -* Graph income shares (gross) -graph bar (mean) gross_labour_income_share_bu (mean) /// - pension_income_share_bu (mean) capital_income_share_bu, /// - 
over(sim_gross_income_group) /// - stack title(`"Simulated (oldest age <65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") /// - position(6) rows(1)) name(simulated_income_shares_u68, replace) /// - b1title("Decile", size(small)) ytitle("Share", size(small)) /// - ylabel(0 [0.5] 1) /// - graphregion(color(white)) - -* Graph income level (gross) -graph bar (mean) sim_yplgrs_dv_lvl_bu (mean) sim_ypnoab_lvl_bu /// - (mean) sim_ypncp_lvl_bu, over(sim_gross_income_group) stack /// - title(`"Observed (oldest age <65)"', size(medium)) /// - legend(order(1 "Labour" 2 "Pension" 3 "Capital") position(6) rows(1)) /// - name(simulated_incomes_u68, replace) ylabel(0 [100000] 100000) /// - ytitle("") b1title("Decile", size(small)) /// - graphregion(color(white)) - -* Graph share of net income in gross income -graph bar (mean) net_income_share_bu, over(sim_gross_income_group) stack /// - title(`"Observed (oldest age <65)"', size(medium)) /// - legend(order(1 "Net income") position(6) rows(1)) /// - name(simulated_net_income_shares_u68, replace) /// - b1title("Decile", size(small)) ytitle("Share", size(small)) /// - graphregion(color(white)) - -* Graph income level (net) -graph bar (mean) sim_y_disp_yr_bu, over(sim_gross_income_group) /// - stack title(`"Observed (oldest age <65)"', /// - size(medium)) legend(order(1 "Net income") /// - position(6) rows(1)) legend(off) name(simulated_net_income_u68, replace) /// - ylabel(0 [80000] 80000) ytitle("€") b1title("Decile", size(small)) /// - graphregion(color(white)) - - -******************************************************************************** -* Combine graphs -******************************************************************************** - -* Simulated and observed gross income shares, -* all / above pension age / below pension age -grc1leg simulated_income_shares observed_income_shares /// - simulated_income_shares_o68 observed_income_shares_o68 /// - simulated_income_shares_u68 
observed_income_shares_u68, /// - legendfrom(observed_income_shares) rows(3) /// - graphregion(color(white)) /// - title("Gross income shares") /// - subtitle("By age and benefit unit gross income decile") /// - note("Notes: Statistics computed at the benefit unit level. Based on values in € per year (2015 prices).", /// - size(vsmall)) - -graph export "$dir_output_files/income/validation_${country}_income_shares.png", /// - replace width(2400) height(1350) - - -* Simulated and observed income levels, -* all / above pension age / below pension age -grc1leg simulated_incomes observed_incomes simulated_incomes_o68 /// - observed_incomes_o68 simulated_incomes_u68 observed_incomes_u68, /// - legendfrom(observed_incomes) rows(3) /// - graphregion(color(white)) /// - subtitle("Gross income sources, by age and ben unit gross income decile") /// - note("Statistics computed at the benefit unit level. Values in € per year (2015 prices).", /// - size(vsmall)) - -graph export /// - "$dir_output_files/income/validation_${country}_combined_income_levels.png", /// - replace width(2400) height(1350) - -* Simulated and observed net income share in gross, -* all / above pension age / below pension age -grc1leg simulated_net_income_shares observed_net_income_shares /// - simulated_net_income_shares_o68 observed_net_income_shares_o68 /// - simulated_net_income_shares_u68 observed_net_income_shares_u68, /// - legendfrom(observed_net_income_shares) rows(3) /// - graphregion(color(white)) /// - subtitle("Net income shares, by age and ben unit gross income decile") /// - note("Statistics computed at the benefit unit level. 
Based on values in € per year (2015 prices).", /// - size(vsmall)) - -//graph export "$dir_output_files/combined_net_income_share.jpg", /// -// replace width(2560) height(1440) quality(100) - -// Simulated and observed net income level, -* all / above pension age / below pension age -grc1leg simulated_net_income observed_net_income simulated_net_income_o68 /// - observed_net_income_o68 simulated_net_income_u68 /// - observed_net_income_u68, rows(3) ycommon /// - subtitle("Net income, by age and ben unit gross income decile") /// - legendfrom(simulated_net_income) /// - graphregion(color(white)) /// - note("Statistics computed at the benefit unit level. Values in € per year (2015 prices).", /// - size(vsmall)) - -graph export /// - "$dir_output_files/income/validation_${country}_combined_net_income_levels.png", /// - replace width(2400) height(1350) - - -graph drop _all - - -/* -** Investigate components of capital income fpr third decile - -use "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -// how many BU have no gross income? 
-sum gross_income_bu -count if gross_income_bu == 0 - // 65,775 obs have no gross income from labour, capital or private pension - // 20% of bs have no gross income - -cap drop obs_gross_income_group -xtile obs_gross_income_group = gross_income_bu, nq(10) - -//26% of BU have no gross income -sum hy080g_pc if obs_gross_income_group == 3, de //inter-hh transfers 90% pop -sum hy110g_pc if obs_gross_income_group == 3, de //child income 95% pop -sum hy040g_pc if obs_gross_income_group == 3, de //property income 99% pop -sum hy090g_pc if obs_gross_income_group == 3, de //cap investments 99% -// inter hh transfer and child income are the largest sources - -sum py010g if obs_gross_income_group == 3, de //wages 90% pop -sum py050g if obs_gross_income_group == 3, de //self emp 90% pop -// most don't have labour income - -// pension income small for all - - -foreach var in hy080g_pc hy110g_pc hy040g_pc hy090g_pc py010g py050g { - - gen d_`var' = (`var' != 0) - -} - -tab d_hy080g_pc if obs_gross_income_group == 3 // 35% inter-hh transfers -tab d_hy110g_pc if obs_gross_income_group == 3 // 13% child income -tab d_hy040g_pc if obs_gross_income_group == 3 // 6% property income -tab d_hy090g_pc if obs_gross_income_group == 3 // 7% capital investments -tab d_py010g if obs_gross_income_group == 3 // 16% wages -tab d_py050g if obs_gross_income_group == 3 // 17% self employment - -gen ind_work_income = (d_py010g == 1 | d_py050g == 1) -tab ind_work_income if obs_gross_income_group == 3 - -// => 68% report no income from work - - - -tab hhsize if obs_gross_income_group == 3 - -/* - hhsize | Freq. Percent Cum. 
-------------+----------------------------------- - 1 | 747 8.93 8.93 - 2 | 1,624 19.41 28.33 - 3 | 1,848 22.08 50.42 - 4 | 1,660 19.84 70.26 - 5 | 1,168 13.96 84.21 - 6 | 739 8.83 93.04 - 7 | 343 4.10 97.14 - 8 | 130 1.55 98.70 - 9 | 44 0.53 99.22 - 10 | 47 0.56 99.78 - 11 | 12 0.14 99.93 - 12 | 4 0.05 99.98 - 15 | 2 0.02 100.00 */ - -* Age -histogram dag if obs_gross_income_group == 3 // U-shaped - -* Activity -tab les_c3 if obs_gross_income_group == 3 // 57% not employed, 19% students - - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_12_plot_partnership_status.do b/validation/02_simulation_validation/do_files/06_12_plot_partnership_status.do deleted file mode 100644 index 0e0e13c..0000000 --- a/validation/02_simulation_validation/do_files/06_12_plot_partnership_status.do +++ /dev/null @@ -1,377 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Partnership -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 06/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, All -******************************************************************************** - -* Prepare validation data -use year dwt valid_dcpst_p valid_dcpst_s using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - - -collapse (mean) valid_dcpst_p valid_dcpst_s [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_dcpst_p sim_dcpst_s 
sim_has_partner using /// - "$dir_data/simulated_data.dta", clear - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner, by(run year) - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner /// - (sd) sim_dcpst_p_sd = sim_dcpst_p /// - sim_dcpst_s_sd = sim_dcpst_s /// - sim_has_partner_sd = sim_has_partner /// - , by(year) - -foreach varname in sim_dcpst_p sim_dcpst_s sim_has_partner { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_dcpst_p_high sim_dcpst_p_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// -(line valid_dcpst_p year, sort color(green) /// - legend(label(2 "SILC "))), /// - title("Partnered") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - ylabel(0.5[0.1]0.7, labsize(small)) /// - xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Notes: Sample contains all individual ages ${min_age}-${max_age}.", size(vsmall)) - -* Save figure -graph export /// -"$dir_output_files/partnership/validation_${country}_partnered_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -twoway (rarea sim_dcpst_p_high sim_dcpst_p_low year, sort color(green%20) /// - legend(label(1 "Partnered, simulated"))) /// -(line valid_dcpst_p year, sort color(green) /// - legend(label(2 "Partnered, SILC "))) /// -(rarea sim_dcpst_s_high sim_dcpst_s_low year, sort color(red%20) /// - legend(label(3 "Single, simulated"))) /// -(line valid_dcpst_s year, sort color(red) /// - legend(label(4 "Single, SILC "))), /// - title("Partnership Status") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - ylabel(0.5[0.1]0.7, labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Notes: Sample contains all individual ages 
${min_age}-${max_age}.", size(vsmall)) - -graph export /// -"$dir_output_files/partnership/validation_${country}_partnership_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - -graph drop _all - -******************************************************************************** -* 1.2 : Mean values over time, All, By children -******************************************************************************** - -* Load validation data -use year dwt valid_dcpst_p_children_0 valid_dcpst_p_children_1 /// - valid_dcpst_p_children_2 valid_dcpst_p_children_3p /// - valid_dcpst_s_children_0 valid_dcpst_s_children_1 /// - valid_dcpst_s_children_2 valid_dcpst_s_children_3p /// - using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) valid_dcpst_p_children_0 valid_dcpst_p_children_1 /// - valid_dcpst_p_children_2 valid_dcpst_p_children_3p /// - valid_dcpst_s_children_0 valid_dcpst_s_children_1 /// - valid_dcpst_s_children_2 valid_dcpst_s_children_3p /// - [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Load simulated data -use run year sim_dcpst_p_children_0 sim_dcpst_p_children_1 /// - sim_dcpst_p_children_2 sim_dcpst_p_children_3p sim_dcpst_s_children_0 /// - sim_dcpst_s_children_1 sim_dcpst_s_children_2 /// - sim_dcpst_s_children_3p /// - using "$dir_data/simulated_data.dta", clear - -collapse (mean) sim_dcpst_p_children_0 sim_dcpst_p_children_1 /// - sim_dcpst_p_children_2 sim_dcpst_p_children_3p /// - sim_dcpst_s_children_0 sim_dcpst_s_children_1 /// - sim_dcpst_s_children_2 sim_dcpst_s_children_3p, /// - by(run year) - -collapse (mean) sim_dcpst_p_children_0 sim_dcpst_p_children_1 /// - sim_dcpst_p_children_2 sim_dcpst_p_children_3p sim_dcpst_s_children_0 /// - sim_dcpst_s_children_1 sim_dcpst_s_children_2 /// - sim_dcpst_s_children_3p /// - (sd) sim_dcpst_p_children_0_sd = sim_dcpst_p_children_0 /// - sim_dcpst_p_children_1_sd = sim_dcpst_p_children_1 /// - 
sim_dcpst_p_children_2_sd = sim_dcpst_p_children_2 /// - sim_dcpst_p_children_3p_sd = sim_dcpst_p_children_3p /// - sim_dcpst_s_children_0_sd = sim_dcpst_s_children_0 /// - sim_dcpst_s_children_1_sd = sim_dcpst_s_children_1 /// - sim_dcpst_s_children_2_sd = sim_dcpst_s_children_2 /// - sim_dcpst_s_children_3p_sd = sim_dcpst_s_children_3 /// - , by(year) - -foreach varname in sim_dcpst_p_children_0 sim_dcpst_p_children_1 sim_dcpst_p_children_2 sim_dcpst_p_children_3p sim_dcpst_s_children_0 sim_dcpst_s_children_1 sim_dcpst_s_children_2 sim_dcpst_s_children_3p { - - gen `varname'_h = `varname' + 1.96*`varname'_sd - gen `varname'_l = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figures - -// Labels of simulated variables are used as titles for the graphs below -label var sim_dcpst_p_children_0 "Partnered, no children" -label var sim_dcpst_p_children_1 "Partnered, 1 child" -label var sim_dcpst_p_children_2 "Partnered, 2 children" -label var sim_dcpst_p_children_3p "Partnered, 3+ children" -label var sim_dcpst_s_children_0 "Not partnered, no children" -label var sim_dcpst_s_children_1 "Not partnered, 1 child" -label var sim_dcpst_s_children_2 "Not partnered, 2 children" -label var sim_dcpst_s_children_3p "Not partnered, 3+ children" - -foreach varname in dcpst_p_children_0 dcpst_p_children_1 dcpst_p_children_2 dcpst_p_children_3p dcpst_s_children_0 dcpst_s_children_1 dcpst_s_children_2 dcpst_s_children_3p { - - local vtext : variable label sim_`varname' - if `"`vtext'"' == "" local vtext "sim_`varname'" - twoway (rarea sim_`varname'_h sim_`varname'_l year, sort color(green%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line valid_`varname' year, sort color(green) /// - legend(label(2 "SILC"))), /// - subtitle("`vtext'") /// - name(`varname', replace) /// - ytitle("Share", size(small)) /// - xtitle("". 
size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - graphregion(color(white)) - -} - -grc1leg dcpst_p_children_0 dcpst_p_children_1 dcpst_p_children_2 /// - dcpst_p_children_3p dcpst_s_children_0 dcpst_s_children_1 /// - dcpst_s_children_2 dcpst_s_children_3p, /// -title("Partnership and Number of Children") /// - legendfrom(dcpst_p_children_0) /// - graphregion(color(white)) /// - note("Notes: Samples contains all individual ages 18-65. ", size(vsmall)) - -graph export /// -"$dir_output_files/partnership/validation_${country}_partnership_children_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - - -******************************************************************************** -* 1.3 : Mean values over time, All, By age group -******************************************************************************** - -* Those in their 20s - -* Validation data -use year dwt valid_dcpst_p valid_dcpst_s ageGroup using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if ageGroup == 2 | ageGroup == 3 - -collapse (mean) valid_dcpst_p valid_dcpst_s [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_dcpst_p sim_dcpst_s sim_has_partner ageGroup using /// - "$dir_data/simulated_data.dta", clear - -keep if ageGroup == 2 | ageGroup == 3 - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner, by(run year) - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner /// - (sd) sim_dcpst_p_sd = sim_dcpst_p /// - sim_dcpst_s_sd = sim_dcpst_s /// - sim_has_partner_sd = sim_has_partner /// - , by(year) - -foreach varname in sim_dcpst_p sim_dcpst_s sim_has_partner { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_dcpst_p_high sim_dcpst_p_low year, sort 
color(green%20) /// - legend(label(1 "Partnered, simulated"))) /// -(line valid_dcpst_p year, sort color(green) /// - legend(label(2 "Partnered, SILC "))) /// -(rarea sim_dcpst_s_high sim_dcpst_s_low year, sort color(red%20) /// - legend(label(3 "Single, simulated"))) /// -(line valid_dcpst_s year, sort color(red) /// - legend(label(4 "Single, SILC "))), /// -title("Partnership Status") /// - subtitle("Ages 20-29") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(,labsize(small)) /// - ylabel(0[0.2]0.8, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes:", size(vsmall)) - -graph export /// -"$dir_output_files/partnership/validation_${country}_partnership_ts_20_29_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -* Those in their 30s - -* Validation data -use year dwt valid_dcpst_p valid_dcpst_s ageGroup using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if ageGroup == 4 | ageGroup == 5 - -collapse (mean) valid_dcpst_p valid_dcpst_s [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_dcpst_p sim_dcpst_s sim_has_partner ageGroup using /// -"$dir_data/simulated_data.dta", clear - -keep if ageGroup == 4 | ageGroup == 5 - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner, by(run year) - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner /// - (sd) sim_dcpst_p_sd = sim_dcpst_p /// - sim_dcpst_s_sd = sim_dcpst_s /// - sim_has_partner_sd = sim_has_partner /// - , by(year) - -foreach varname in sim_dcpst_p sim_dcpst_s sim_has_partner { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_dcpst_p_high sim_dcpst_p_low year, sort color(green%20) /// - legend(label(1 "Partnered, simulated"))) /// -(line 
valid_dcpst_p year, sort color(green) /// - legend(label(2 "Partnered, SILC "))) /// -(rarea sim_dcpst_s_high sim_dcpst_s_low year, sort color(red%20) /// - legend(label(3 "Single, simulated"))) /// -(line valid_dcpst_s year, sort color(red) /// - legend(label(4 "Single, SILC "))), /// - title("Partnership Status") /// - subtitle("Ages 30-39") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(,labsize(small)) /// - ylabel(0[0.1]0.7, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes:", size(vsmall)) - -graph export /// -"$dir_output_files/partnership/validation_${country}_partnership_ts_30_39_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -* Those in their 40-59 - -* Validation data -use year dwt valid_dcpst_p valid_dcpst_s ageGroup using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -keep if ageGroup == 6 - -collapse (mean) valid_dcpst_p valid_dcpst_s [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year sim_dcpst_p sim_dcpst_s sim_has_partner ageGroup using /// -"$dir_data/simulated_data.dta", clear - -keep if ageGroup == 6 - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner, by(run year) - -collapse (mean) sim_dcpst_p sim_dcpst_s sim_has_partner /// - (sd) sim_dcpst_p_sd = sim_dcpst_p /// - sim_dcpst_s_sd = sim_dcpst_s /// - sim_has_partner_sd = sim_has_partner /// - , by(year) - -foreach varname in sim_dcpst_p sim_dcpst_s sim_has_partner { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea sim_dcpst_p_high sim_dcpst_p_low year, sort color(green%20) /// - legend(label(1 "Partnered, simulated"))) /// -(line valid_dcpst_p year, sort color(green) /// - legend(label(2 "Partnered, SILC "))) /// -(rarea sim_dcpst_s_high 
sim_dcpst_s_low year, sort color(red%20) /// - legend(label(3 "Single, simulated"))) /// -(line valid_dcpst_s year, sort color(red) /// - legend(label(4 "Single, SILC "))), /// - title("Partnership Status") /// - subtitle("Ages 40-59") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(,labsize(small)) /// - ylabel(0[0.1]0.8, labsize(small)) /// - graphregion(color(white)) /// - legend(size(small)) /// - note("Notes:", size(vsmall)) - -graph export /// -"$dir_output_files/partnership/validation_${country}_partnership_ts_40_59_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -graph drop _all - diff --git a/validation/02_simulation_validation/do_files/06_13_plot_health.do b/validation/02_simulation_validation/do_files/06_13_plot_health.do deleted file mode 100644 index 22a3781..0000000 --- a/validation/02_simulation_validation/do_files/06_13_plot_health.do +++ /dev/null @@ -1,408 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Health -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: Simulated data doesn't contain 80-100 year olds which make -* up group 8. -* Adjusted the code so that runs without this group. 
-*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, Working age (16-65) -******************************************************************************** - -* Working age -use year dwt dhe dag using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -gen health = dhe - -collapse (mean) health [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare Simulated data -use run year dhe dag run using "$dir_data/simulated_data.dta", clear - -collapse (mean) dhe, by(year run) - -collapse (mean) dhe /// - (sd) dhe_sd = dhe /// - , by(year) - -foreach varname in dhe { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea dhe_high dhe_low year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line health year, sort color(green) /// - legend(label(2 "SILC"))), /// -title("Health score") /// - subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Year", size(small)) /// - ytitle("", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - graphregion(color(white)) /// - note("Notes: The health variable is a self-assessed variable with 5 categories. The greater the value the better the self-perceived health. 
", /// - size(vsmall)) - -graph export /// -"$dir_output_files/health/validation_${country}_health_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 1.2 : Mean values over time, Working age (16-65), By gender -******************************************************************************** -* Prepare validation data -use year dwt dhe dag dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -gen health = dhe - -collapse (mean) health [aw = dwt], by(year dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulation data -use run year dhe dag dgn run using "$dir_data/simulated_data.dta", clear - -gen dgn2 = 0 if dgn == "Female" -replace dgn2 = 1 if dgn == "Male" - -drop dgn -rename dgn2 dgn - -collapse (mean) dhe, by(year dgn run) - -collapse (mean) dhe /// - (sd) dhe_sd = dhe /// - , by(year dgn) - -foreach varname in dhe { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -twoway (rarea dhe_high dhe_low year if dgn == 0, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line health year if dgn == 0, sort color(green) /// - legend(label(2 "SILC"))), /// - subtitle("Females") name(health_female, replace) /// - xtitle("Year", size(small)) /// - ytitle("", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - -twoway (rarea dhe_high dhe_low year if dgn == 1, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line health year if dgn == 1, sort color(green) /// - legend(label(2 "SILC"))), /// -subtitle("Males") name(health_male, replace) /// - xtitle("Year", size(small)) /// - ytitle("", size(small)) /// - ylabel(,labsize(small)) /// - 
xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - -grc1leg health_female health_male, /// - title("Health score") /// - subtitle("Ages ${min_age}-${max_age}") /// - legendfrom(health_female) rows(1) /// - graphregion(color(white)) /// - note("Notes: The health variable is a self-assessed variable with 5 categories. The greater the value the better the self-perceived health.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/health/validation_${country}_health_ts_${min_age}_${max_age}_gender.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 1.2 : Mean values over time, Working age (16-65), By age group -******************************************************************************** - -* Prepare validation data -use year dwt dgn ageGroup dhe using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -gen health_m = dhe if dgn == 1 -gen health_f = dhe if dgn == 0 - -drop if ageGroup == 0 | ageGroup == 8 - -collapse (mean) health* [aw = dwt], by(ageGroup year) - -drop if missing(ageGroup) -reshape wide health*, i(year) j(ageGroup) - -forvalues i = 1(1)7 { - - rename health_f`i' health_f_`i'_valid - rename health_m`i' health_m_`i'_valid - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare Simulated data -use run year sim_sex ageGroup dhe using "$dir_data/simulated_data.dta", clear - -gen health_m = dhe if sim_sex == 1 -gen health_f = dhe if sim_sex == 2 - -collapse (mean) health*, by(ageGroup run year) -drop if missing(ageGroup) -reshape wide health*, i(year run) j(ageGroup) - -collapse (mean) health* /// - (sd) health_m_1_sd = health_m1 /// - (sd) health_f_1_sd = health_f1 /// - (sd) health_m_2_sd = health_m2 /// - (sd) health_f_2_sd = health_f2 /// - (sd) health_m_3_sd = health_m3 /// - (sd) health_f_3_sd = health_f3 /// - (sd) health_m_4_sd = health_m4 /// - (sd) health_f_4_sd = health_f4 
/// - (sd) health_m_5_sd = health_m5 /// - (sd) health_f_5_sd = health_f5 /// - (sd) health_m_6_sd = health_m6 /// - (sd) health_f_6_sd = health_f6 /// - (sd) health_m_7_sd = health_m7 /// - (sd) health_f_7_sd = health_f7 /// - , by(year) - /*(sd) health_m_8_sd = health_m8 /// - *(sd) health_f_8_sd = health_f8 /// */ - -forvalues i=1(1)7 { - gen health_f_`i'_sim_high = health_f`i' + 1.96*health_f_`i'_sd - gen health_f_`i'_sim_low = health_f`i' - 1.96*health_f_`i'_sd - gen health_m_`i'_sim_high = health_m`i' + 1.96*health_m_`i'_sd - gen health_m_`i'_sim_low = health_m`i' - 1.96*health_m_`i'_sd -} - - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figure -foreach vble in "health_f" "health_m" { - twoway (rarea `vble'_1_sim_high `vble'_1_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_1_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 15-19") name(`vble'_1, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - twoway (rarea `vble'_2_sim_high `vble'_2_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_2_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 20-24") name(`vble'_2, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - twoway (rarea `vble'_3_sim_high `vble'_3_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_3_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 25-29") name(`vble'_3, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - twoway (rarea `vble'_4_sim_high `vble'_4_sim_low year, sort /// - color(green%20) 
legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_4_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 30-34") name(`vble'_4, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - twoway (rarea `vble'_5_sim_high `vble'_5_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_5_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 35-39") name(`vble'_5, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - twoway (rarea `vble'_6_sim_high `vble'_6_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_6_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 40-59") name(`vble'_6, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - twoway (rarea `vble'_7_sim_high `vble'_7_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_7_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("Age 60-79") name(`vble'_7, replace) ylabel(3 [1] 5) /// - xlabel(,labsize(small)) ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - - /*twoway (rarea `vble'_8_sim_high `vble'_8_sim_low year, sort /// - color(green%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_8_valid year, sort color(green) legend(label(2 "SILC"))), /// - title("age 80-100") name(`vble'_8, replace) ylabel(1 [1] 5)*/ -} - -* Save figures -grc1leg health_f_1 health_f_2 health_f_3 health_f_4 health_f_5 /// - health_f_6 health_f_7 /*health_f_8*/, /// - title("Health score") /// - subtitle("Females") /// - legendfrom(health_f_1) /// - 
graphregion(color(white)) /// - note("Notes: The health variable is a self-assessed variable with 5 categories. The greater the value the better the self-perceived health.", /// - size(vsmall)) - -graph export /// - "$dir_output_files/health/validation_${country}_health_ts_all_female.jpg", /// - replace width(2400) height(1350) quality(100) - - -grc1leg health_m_1 health_m_2 health_m_3 health_m_4 health_m_5 /// - health_m_6 health_m_7 /*health_m_8*/, /// - title("Health score") /// - subtitle("Males") /// - legendfrom(health_m_1) /// - graphregion(color(white)) /// - note("Notes: The health variable is a self-assessed variable with 5 categories. The greater the value the better the self-perceived health.", /// - size(vsmall)) - -graph export /// - "$dir_output_files/health/validation_${country}_health_ts_all_male.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 2 : Histograms -******************************************************************************** -******************************************************************************** -* 2.1 : Histograms, Working age -******************************************************************************** - -use year dwt dhe dag using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -gen health = dhe - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare Simulated data -use run year dhe dag run using "$dir_data/simulated_data.dta", clear - -keep if run == 1 - -append using "$dir_data/temp_valid_stats.dta" - - -* Plot figure -twoway (hist dhe, width(0.2) color(green%30) legend(label(1 "Simulated"))) /// -(hist health, width(0.2) color(red%30) legend(label(2 "SILC"))), /// -title("Health score") subtitle("Ages ${min_age}-${max_age}") /// - xtitle("Health score", size(small)) /// - ytitle(, size(small)) /// - xlabel(,labsize(small)) /// - ylabel(,labsize(small)) /// - graphregion(color(white)) /// 
- legend(size(small)) /// - note("Notes: The health score is a self-assessed. The greater the value the better the self-perceived health.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/health/validation_${country}_health_hist_${min_age}_${max_age}_both.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 2.1 : Histograms, Working age, By gender -******************************************************************************** -use year dwt dhe dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -gen health = dhe - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare Simulated data -use run year dhe dgn run using "$dir_data/simulated_data.dta", clear - -keep if run == 1 - -gen dgn2 = 0 if dgn == "Female" -replace dgn2 = 1 if dgn == "Male" - -drop dgn -rename dgn2 dgn - -append using "$dir_data/temp_valid_stats.dta" - - -* Plot figure - -twoway (hist dhe if dgn == 0, width(0.2) color(green%30) /// - legend(label(1 "Simulated"))) /// -(hist health if dgn == 0, width(0.2) color(red%30) /// - legend(label(2 "SILC"))), /// -subtitle("Females") /// - name(health_female, replace) /// - xtitle("Health score", size(small)) /// - xtitle("", size(small)) /// - xlabel(,labsize(small)) /// - ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - -twoway (hist dhe if dgn == 1, width(0.2) color(green%30) /// - legend(label(1 "Simulated"))) /// -(hist health if dgn == 1, width(0.2) color(red%30) /// - legend(label(2 "SILC"))), /// -subtitle("Males") /// - name(health_male, replace) xtitle("Health score", size(small)) /// - xtitle("", size(small)) /// - xlabel(,labsize(small)) /// - ylabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - -grc1leg health_female health_male, /// - title("Health score") /// - subtitle("Ages ${min_age}-${max_age}") /// - legendfrom(health_male) rows(1) 
/// - graphregion(color(white)) /// - note("Notes: The health score is a self-assessed. The greater the value the better the self-perceived health.", /// - size(vsmall)) - - -graph export /// -"$dir_output_files/health/validation_${country}_health_hist_${min_age}_${max_age}_gender.jpg", /// - replace width(2560) height(1440) quality(100) - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_14_plot_at_risk_of_poverty.do b/validation/02_simulation_validation/do_files/06_14_plot_at_risk_of_poverty.do deleted file mode 100644 index e6678dc..0000000 --- a/validation/02_simulation_validation/do_files/06_14_plot_at_risk_of_poverty.do +++ /dev/null @@ -1,104 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Risk of poverty -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 06/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This master do file organises do files used for validating -* SimPaths model using EU-SILC data for Poland. -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_eq_disp_yr_bu using /// - "$dir_data/${country}-eusilc_validation_sample.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_eq_disp_yr_bu, d - - replace valid_y_eq_disp_yr_bu = . if /// - valid_y_eq_disp_yr_bu < r(p1) | valid_y_eq_disp_yr_bu > r(p99) - -} - -qui sum year -local min_year = 2011 -local max_year = r(max) - -gen poverty_line = . 
-forval year = `min_year'/`max_year' { - - sum valid_y_eq_disp_yr_bu if year == `year', d - replace poverty_line = 0.6*r(p50) if year == `year' - -} - -gen arop = (valid_y_eq_disp_yr_bu < poverty_line) - -collapse (mean) arop [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year equivalisedincome using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum equivalisedincome, d - - replace equivalisedincome = . if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) - -} - -bys run year: egen equivincome_median = median(equivalisedincome) -gen poverty_line = 0.6*equivincome_median -gen arop_sim = (equivalisedincome < poverty_line) - -collapse (mean) arop_sim, by(run year) - -collapse (mean) arop_sim /// - (sd) arop_sim_sd = arop_sim /// - , by(year) - -foreach varname in arop_sim { - - gen `varname'_high = `varname' + 1.96*`varname'_sd - gen `varname'_low = `varname' - 1.96*`varname'_sd - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -* Plot figure - - -twoway /// - (rarea arop_sim_high arop_sim_low year, sort color(green%20) /// - legend(label(1 "Simulated"))) /// - (line arop year, sort color(green) /// - legend(label(2 "Observed"))), /// - title("At Risk of Poverty") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) /// - note("Note: Poverty line calculated within each year as equivalised disposable income of benefit unit 60% of the median value.", /// - size(vsmall)) - - -* Save figure -graph export "$dir_output_files/poverty/validation_${country}_at_risk_of_poverty_EUSILC_age_${min_age}_${max_age}.jpg", /// - replace width(2560) height(1440) quality(100) diff --git a/validation/02_simulation_validation/do_files/06_15_plot_income_ratios.do 
b/validation/02_simulation_validation/do_files/06_15_plot_income_ratios.do deleted file mode 100644 index 0d63bda..0000000 --- a/validation/02_simulation_validation/do_files/06_15_plot_income_ratios.do +++ /dev/null @@ -1,124 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Income ratios -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: -*******************************************************************************/ - -******************************************************************************** -* 1 : Time series -******************************************************************************** - -* Prepare validation data -use year dwt valid_y_eq_disp_bu_yr using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum valid_y_eq_disp_bu_yr, d - - replace valid_y_eq_disp_bu_yr = . if /// - valid_y_eq_disp_bu_yr < r(p1) | valid_y_eq_disp_bu_yr > r(p99) - -} - -collapse (p90) p90_disp = valid_y_eq_disp_bu_yr (p50) /// - p50_disp = valid_y_eq_disp_bu_yr (p10) /// - p10_disp = valid_y_eq_disp_bu_yr, by(year) - -gen p90_p10_ratio_disp_obs = p90_disp/p10_disp -gen p90_p50_ratio_disp_obs = p90_disp/p50_disp - -* Align reference years -gen l_p90_p10_ratio_disp_obs = p90_p10_ratio_disp_obs[_n+1] -gen l_p90_p50_ratio_disp_obs = p90_p50_ratio_disp_obs[_n+1] - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year equivalisedincome using "$dir_data/simulated_data.dta", clear - -* Trim outliers -if "$trim_outliers" == "true" { - - sum equivalisedincome, d - replace equivalisedincome = . 
if /// - equivalisedincome < r(p1) | equivalisedincome > r(p99) - -} - -collapse (p90) p90_disp = equivalisedincome (p50) /// - p50_disp = equivalisedincome (p10) /// - p10_disp = equivalisedincome, by(run year) - -gen p90_p10_ratio_disp = p90_disp/p10_disp -gen p90_p50_ratio_disp = p90_disp/p50_disp - -collapse (mean) p90_p10_ratio_disp p90_p50_ratio_disp /// - (sd) sd_p90_p10_ratio_disp = p90_p10_ratio_disp /// - sd_p90_p50_ratio_disp = p90_p50_ratio_disp /// - , by(year) - - foreach var in p90_p10_ratio_disp p90_p50_ratio_disp { - - gen `var'_HI = `var' + 1.96*sd_`var' - gen `var'_LO = `var' - 1.96*sd_`var' - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -* Plot figure -foreach var in p90_p10_ratio_disp { - - twoway (rarea `var'_HI `var'_LO year, sort color(red%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `var'_obs year, sort legend(label(2 "SILC"))), /// - title("P90/P10") /// - name(`var', replace) /// - xtitle("", size(small)) /// - ytitle("", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - -} - -foreach var in p90_p50_ratio_disp { - - twoway (rarea `var'_HI `var'_LO year, sort color(red%20) /// - legend(label(1 "Simulated") position(6) rows(1))) /// - (line `var'_obs year, sort legend(label(2 "SILC"))), /// - title("P90/P50") /// - name(`var', replace) /// - xtitle("", size(small)) /// - ytitle("", size(small)) /// - ylabel(,labsize(small)) /// - xlabel(,labsize(small)) /// - legend(size(small)) /// - graphregion(color(white)) - -} - -* Save figure -grc1leg p90_p10_ratio_disp p90_p50_ratio_disp, /// - title("Inequality") /// - legendfrom(p90_p10_ratio_disp) /// - graphregion(color(white)) /// - note("Notes: Figures contain household income decile ratios. Sample contains all individuals ages ${min_age}-${max_age}. 
Individual observatioons plotted, beneift unit" "variable.", /// - size(vsmall)) - -graph export /// -"$dir_output_files/inequality/validation_${country}_disposable_income_ratio_ts_all_both.jpg", /// - replace width(2400) height(1350) - - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_16_plot_number_children.do b/validation/02_simulation_validation/do_files/06_16_plot_number_children.do deleted file mode 100644 index 87c3029..0000000 --- a/validation/02_simulation_validation/do_files/06_16_plot_number_children.do +++ /dev/null @@ -1,171 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Children -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 06/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This do file plots simulated and SILC % of benefit units -* with a given number of children -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time -******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, All children -******************************************************************************** - -* Prepare validation data -use year idbenefitunit dwt children_* using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -bys year idbenefitunit: keep if _n == 1 - -* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children -collapse (mean) children_* [aw = dwt], by(year) - -foreach varname in children_0 children_1 children_2 children_3p { - - rename `varname' valid_`varname' - -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data 
-use run year idbenefitunit children_* using /// - "$dir_data/simulated_data.dta", clear - -bys run year idbenefitunit: keep if _n == 1 - -collapse (mean) children_*, by(run year) - -rename children_3plus children_3p - -collapse (mean) children_* /// - (sd) children_0_sd = children_0 /// - children_1_sd = children_1 /// - children_2_sd = children_2 /// - children_3p_sd = children_3p /// - , by(year) - -foreach varname in children_0 children_1 children_2 children_3p { - - gen sim_`varname'_h = `varname' + 1.96*`varname'_sd - gen sim_`varname'_l = `varname' - 1.96*`varname'_sd - rename `varname' sim_`varname' - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figures -label var sim_children_0 "No children" -label var sim_children_1 "1 child" -label var sim_children_2 "2 children" -label var sim_children_3p "3+ children" - - -twoway (rarea sim_children_0_h sim_children_0_l year, /// - sort color(green%20) legend(label(1 "No children, simulated"))) /// -(line valid_children_0 year, sort color(green) /// - legend(label(2 "No children, SILC"))) /// -(rarea sim_children_1_h sim_children_1_l year, sort color(blue%20) /// - legend(label(3 "1 child, simulated"))) /// -(line valid_children_1 year, sort color(blue) /// - legend(label(4 "1 child, SILC"))) /// -(rarea sim_children_2_h sim_children_2_l year, sort color(red%20) /// - legend(label(5 "2 children, simulated"))) /// -(line valid_children_2 year, sort color(red) /// - legend(label(6 "2 children, SILC"))) /// -(rarea sim_children_3p_h sim_children_3p_l year, sort color(grey%20) /// - legend(label(7 "3+ children, simulated"))) /// -(line valid_children_3p year, sort color(grey) /// - legend(label(8 "3+ children, SILC"))), /// - title("Number of children") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - graphregion(color(white)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - legend(size(small)) /// - note("Notes: Statistics computed at the 
benefit unit level.", size(vsmall)) - -* Save figure -graph export "$dir_output_files/children/validation_${country}_children_ts_all_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.1 : Mean values over time, Children under 2 -******************************************************************************** - -* Prepare validation data -use year idbenefitunit dwt dnc02 using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -gen child02 = . -replace child02 = 0 if dnc02 == 0 -replace child02 = 1 if dnc02 > 0 & dnc02 != . - -bys year idbenefitunit: keep if _n == 1 - -* Calculate weighted share of benefit units with 0, 1, 2, 3 or more children -collapse (mean) child02 [aw = dwt], by(year) - -foreach varname in child02 { - rename `varname' valid_`varname' -} - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulated data -use run year idbenefitunit sim_dnc02 using /// - "$dir_data/simulated_data.dta", clear - -gen sim_child02 = . -replace sim_child02 = 0 if sim_dnc02 == 0 -replace sim_child02 = 1 if sim_dnc02 > 0 & sim_dnc02 != . 
- -bys run year idbenefitunit: keep if _n == 1 - -collapse (mean) sim_child02, by(run year) - -collapse (mean) sim_child02 /// - (sd) sim_child02_sd = sim_child02 /// - , by(year) - -foreach varname in sim_child02 { - - gen `varname'_h = `varname' + 1.96*`varname'_sd - gen `varname'_l = `varname' - 1.96*`varname'_sd - rename `varname' sim_`varname' - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figures - -twoway (rarea sim_child02_h sim_child02_l year, /// - sort color(green%20) legend(label(1 "Simulated"))) /// -(line valid_child02 year, sort color(green) /// - legend(label(2 "SILC"))), /// - title("With child 0-2 years old") /// - xtitle("Year", size(small)) /// - ytitle("Share", size(small)) /// - graphregion(color(white)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - legend(size(small)) /// - note("Notes: Statistics computed at the benefit unit level.", size(vsmall)) - -* Save figure -graph export "$dir_output_files/children/validation_${country}_young_child_ts_all_both.jpg", /// - replace width(2400) height(1350) quality(100) - -//graph drop _all diff --git a/validation/02_simulation_validation/do_files/06_17_plot_disability.do b/validation/02_simulation_validation/do_files/06_17_plot_disability.do deleted file mode 100644 index 43dd723..0000000 --- a/validation/02_simulation_validation/do_files/06_17_plot_disability.do +++ /dev/null @@ -1,225 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* OBJECT: Disability -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: -*******************************************************************************/ - -******************************************************************************** -* 1 : Mean values over time 
-******************************************************************************** -******************************************************************************** -* 1.1 : Mean values over time, Working age, All -******************************************************************************** -* Prepare validation data -use year dwt dlltsd using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) dlltsd [aw = dwt], by(year) - -save "$dir_data/temp_valid_stats.dta", replace - - -* Prepare simulation data -use year sim_dlltsd run using "$dir_data/simulated_data.dta", clear - -collapse (mean) sim_dlltsd, by(run year) - -collapse (mean) sim_dlltsd (sd) sim_dlltsd_sd = sim_dlltsd, by(year) - -gen sim_dlltsd_high = sim_dlltsd + 1.96*sim_dlltsd_sd -gen sim_dlltsd_low = sim_dlltsd - 1.96*sim_dlltsd_sd - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -* Plot figure -twoway (rarea sim_dlltsd_high sim_dlltsd_low year, sort color(green%20) /// - legend(label(1 "Simulated "))) /// -(line dlltsd year, sort color(green) /// - legend(label(2 "SILC "))), /// - title("Disabled/Long-term sick ") /// - xtitle("", size(small)) /// - ytitle("Share", size(small)) /// - legend(size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - note("Notes:Sample contains those aged ${min_age}-${max_age}.", size(vsmall)) - -graph export /// -"$dir_output_files/disability/validation_${country}_disability_ts_${min_age}_${max_age}_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -******************************************************************************** -* 1.2 : Mean values over time, Working age, By gender -******************************************************************************** -* Prepare validation data -use year dwt dlltsd dgn using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -collapse (mean) dlltsd [aw = dwt], by(year 
dgn) - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulation data -use year sim_dlltsd run dgn using "$dir_data/simulated_data.dta", clear - -gen dgn2 = 0 if dgn == "Female" -replace dgn2 = 1 if dgn == "Male" - -drop dgn -rename dgn2 dgn - -collapse (mean) sim_dlltsd, by(run year dgn) - -collapse (mean) sim_dlltsd (sd) sim_dlltsd_sd = sim_dlltsd, by(year dgn) - -gen sim_dlltsd_high = sim_dlltsd + 1.96*sim_dlltsd_sd -gen sim_dlltsd_low = sim_dlltsd - 1.96*sim_dlltsd_sd - -merge 1:1 year dgn using "$dir_data/temp_valid_stats.dta", keep(3) nogen - - -twoway (rarea sim_dlltsd_high sim_dlltsd_low year if dgn == 0, sort color(green%20) /// - legend(label(1 "Female, simulated"))) /// -(line dlltsd year if dgn == 0, sort color(green) /// - legend(label(2 "Female, SILC "))) /// - (rarea sim_dlltsd_high sim_dlltsd_low year if dgn == 1, sort color(red%20) /// - legend(label(3 "Male, simulated"))) /// -(line dlltsd year if dgn == 1, sort color(red) /// - legend(label(4 "Male, SILC"))), /// -title("Disabled/Long-term sick by gender ") xtitle("", size(small)) /// - ytitle("Share", size(small)) /// - legend(size(small)) /// - xlabel(, labsize(small)) /// - ylabel(, labsize(small)) /// - graphregion(color(white)) /// - note("Notes:Sample contains those aged ${min_age}-${max_age}.", size(vsmall)) - -graph export /// -"$dir_output_files/disability/validation_${country}_disability_ts_${min_age}_${max_age}_male_female.jpg", /// - replace width(2560) height(1440) quality(100) - - -******************************************************************************** -* 1.2 : Mean values over time, Working age, By age -******************************************************************************** - -* Prepare validation data -use year dwt dlltsd dgn ageGroup using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -drop if ageGroup == 0 | ageGroup == 8 - -collapse (mean) dlltsd [aweight=dwt], by(ageGroup year) - -drop if missing(ageGroup) - -reshape 
wide dlltsd, i(year) j(ageGroup) - -forvalues i = 1(1)7 { - rename dlltsd`i' dlltsd_`i'_valid -} - -save "$dir_data/temp_valid_stats.dta", replace - -* Prepare simulation data -use run year sim_dlltsd ageGroup using "$dir_data/simulated_data.dta", clear - -collapse (mean) sim_dlltsd, by(ageGroup run year) - -drop if missing(ageGroup) - -reshape wide sim_dlltsd, i(year run) j(ageGroup) - -forvalues i = 1(1)7{ - - rename sim_dlltsd`i' dlltsd_`i'_sim - -} - -collapse (mean) dlltsd* /// - (sd) sd_dlltsd_1_sim = dlltsd_1_sim /// - sd_dlltsd_2_sim = dlltsd_2_sim /// - sd_dlltsd_3_sim = dlltsd_3_sim /// - sd_dlltsd_4_sim = dlltsd_4_sim /// - sd_dlltsd_5_sim = dlltsd_5_sim /// - sd_dlltsd_6_sim = dlltsd_6_sim /// - sd_dlltsd_7_sim = dlltsd_7_sim /// - , by(year) - -forvalues i = 1(1)7 { - - gen dlltsd_`i'_sim_high = dlltsd_`i'_sim + 1.96*sd_dlltsd_`i'_sim - gen dlltsd_`i'_sim_low = dlltsd_`i'_sim - 1.96*sd_dlltsd_`i'_sim - -} - -merge 1:1 year using "$dir_data/temp_valid_stats.dta", keep(3) nogen - -* Plot figures -foreach vble in "dlltsd" { - - twoway (rarea `vble'_1_sim_high `vble'_1_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_1_valid year, sort legend(label(2 "SILC"))), /// - title("age 15-19") name(`vble'_1, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - - twoway (rarea `vble'_2_sim_high `vble'_2_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_2_valid year, sort legend(label(2 "SILC"))), /// - title("age 20-24") name(`vble'_2, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - - twoway (rarea `vble'_3_sim_high `vble'_3_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_3_valid year, sort legend(label(2 "SILC"))), /// - title("age 25-29") name(`vble'_3, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - - twoway (rarea `vble'_4_sim_high 
`vble'_4_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_4_valid year, sort legend(label(2 "SILC"))), /// - title("age 30-34") name(`vble'_4, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - - twoway (rarea `vble'_5_sim_high `vble'_5_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_5_valid year, sort legend(label(2 "SILC"))), /// - title("age 35-39") name(`vble'_5, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - - twoway (rarea `vble'_6_sim_high `vble'_6_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) /// - rows(1)))(line `vble'_6_valid year, sort /// - legend(label(2 "SILC"))), /// - title("age 40-59") name(`vble'_6, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - - twoway (rarea `vble'_7_sim_high `vble'_7_sim_low year, /// - sort color(red%20) legend(label(1 "Simulated") position(6) rows(1))) /// - (line `vble'_7_valid year, sort legend(label(2 "SILC"))), /// - title("age 60-65") name(`vble'_7, replace) ylabel(0 [0.05] 0.1) /// - graphregion(color(white)) - -} - -grc1leg dlltsd_1 dlltsd_2 dlltsd_3 dlltsd_4 dlltsd_5 dlltsd_6 /// - dlltsd_7 , title("Disabled/Long-term sick by age") legendfrom(dlltsd_1) /// - graphregion(color(white)) /// - note("Notes:", size(vsmall)) - - -graph export /// -"$dir_output_files/disability/validation_${country}_disability_ts_all_both.jpg", /// - replace width(2400) height(1350) quality(100) - - -graph drop _all diff --git a/validation/02_simulation_validation/do_files/07_01_correlations.do b/validation/02_simulation_validation/do_files/07_01_correlations.do deleted file mode 100644 index e8cc947..0000000 --- a/validation/02_simulation_validation/do_files/07_01_correlations.do +++ /dev/null @@ -1,153 +0,0 @@ -/******************************************************************************* -* PROJECT: SimPaths EU -* SECTION: Validation -* 
OBJECT: Correlation -* AUTHORS: Patryk Bronka, Ashley Burdett -* LAST UPDATE: 11/2025 (AB) -* COUNTRY: Poland -******************************************************************************** -* NOTES: This file calculates correlations between variables of -* interest, in observed and simulated data -* -* List of variables considered. Name : simulated : validation -* -* 1. disposable income per benefit unit : sim_y_disp_yr_bu : -* valid_y_disp_yr_bu -* 2. labour market status : sim_employed, sim_student, -* sim_inactive, sim_retired : valid_employed, -* valid_student, valid_inactive, valid_retired -* -*******************************************************************************/ - -global sim_varlist sim_employed sim_inactive sim_retired sim_edu_high /// - sim_edu_med sim_edu_low sim_y_gross_yr_bu sim_yplgrs_dv_lvl_bu /// - sim_ypncp_lvl_bu sim_y_disp_yr_bu equivalisedincome /// - /*obs_earnings_hourly*/ hoursworkedweekly sim_dcpst_s /// - sim_dcpst_p dhe - -global valid_varlist valid_employed valid_inactive valid_retired /// - valid_edu_high valid_edu_med valid_edu_low valid_y_gross_bu_yr /// - valid_y_gross_labour_bu_yr valid_y_gross_capital_bu_yr /// - valid_y_disp_bu_yr valid_y_eq_disp_bu_yr /*valid_wage_hour*/ valid_lhw /// - valid_dcpst_s valid_dcpst_p dhe - -* Simulated correlations - -use run year ${sim_varlist} using "$dir_data/simulated_data.dta", clear - -lab var dhe "Health" -lab var sim_employed "Employed" -lab var sim_inactive "Non-employed" -lab var sim_retired "Retired" -lab var sim_edu_high "High education" -lab var sim_edu_med "Medium education" -lab var sim_edu_low "Low education" -lab var sim_y_disp_yr_bu "Disposable income" -lab var sim_y_gross_yr_bu "Gross income" -lab var sim_yplgrs_dv_lvl_bu "Gross labour income" -lab var sim_ypncp_lvl_bu "Capital income" -lab var sim_ypnoab_lvl_bu "Private pension income" -lab var equivalisedincome "Equivalised disposable income" -//lab var potential_earnings_hourly "Hourly wage" -lab var 
hoursworkedweekly "Hours worked" -lab var sim_dcpst_s "Single" -lab var sim_dcpst_p "Partnered" - -keep if run == 1 - -quietly correlate ${sim_varlist} -matrix CS = r(C) - -heatplot CS, values(format(%3.2f) size(1.1)) cuts(-1.05(.1)1.05) /// - color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) /// - lower label xlabel(, angle(90) labsize(vsmall)) /// - ylabel(, labsize(vsmall)) title("Simulated") name(sim_corr, replace) /// - graphregion(color(white)) - -* Save figure -graph export /// - "$dir_output_files/correlations/validation_correlations_simulated_${min_age}_${max_age}.jpg", /// - replace width(2560) height(1440) quality(100) - - -* SILC correlations -use year dwt ${valid_varlist} using /// - "$dir_data/${country}-eusilc_validation_sample_long.dta", clear - -lab var dhe "Health" -lab var valid_employed "Employed" -lab var valid_inactive "Non-employed" -lab var valid_retired "Retired" -lab var valid_edu_high "High education" -lab var valid_edu_med "Medium education" -lab var valid_edu_low "Low education" -lab var valid_y_disp_bu_yr "Disposable income" -lab var valid_y_gross_bu_yr "Gross income" -lab var valid_y_gross_labour_bu_yr "Gross labour income" -lab var valid_y_gross_capital_bu_yr "Capital income" -//lab var pension_income_bu "Private pension income" -lab var valid_y_eq_disp_bu_yr "Equivalised disposable income" -//lab var valid_wage_hour "Hourly wage" -lab var valid_lhw "Hours worked" -lab var valid_dcpst_p "Partnered" -lab var valid_dcpst_s "Single" - -replace valid_lhw = 0 if valid_inactive == 1 | valid_retired == 1 - -quietly correlate ${valid_varlist} - -matrix CV = r(C) - -heatplot CV, values(format(%3.2f) size(1.1)) cuts(-1.05(.1)1.05) /// - color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) /// - lower label xlabel(, angle(90) labsize(vsmall)) /// - ylabel(, labsize(vsmall)) title("Observed") name(obs_corr, replace) /// - graphregion(color(white)) - - -* Save figure -graph combine sim_corr obs_corr, title("Correlation 
coefficients") /// - note("Notes: Ages 18-65 included. ", size(vsmall)) graphregion(color(white)) - - -graph export /// -"$dir_output_files/correlations/validation_correlations_simulated_observed_${min_age}_${max_age}.png", /// - replace width(2560) height(1440) - -/* -Calculate the difference and absolute difference matrix -*/ - -matrix CDiff = CS - CV -//matewmf CDiff CDiffAbs, f(abs) - -/* -* Heatplot for the distance matrix -heatplot CDiff, values(format(%3.2f) size(tiny)) cuts(-1.05(.1)1.05) /// - color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) /// - lower label xlabel(, angle(45) labsize(small)) /// - ylabel(, labsize(small)) /// - title("Distance between simulated" "and observed correlations") - -* Save figure -graph export /// - "$dir_output_files/validation_correlations_distance_${min_age}_${max_age}.jpg", /// - replace width(2560) height(1440) quality(100) -*/ - -* Heatplot for the difference matrix -heatplot CDiff, values(format(%3.2f) size(tiny)) cuts(-1.05(.1)1.05) /// - color(hcl diverging, intensity(.6)) legend(off) aspectratio(1) /// - lower label xlabel(, angle(90) labsize(vsmall)) /// - ylabel(, labsize(vsmall)) /// - title("Difference between simulated" "and observed correlations") /// - note("Positive values indicate that simulated correlation was stronger than observed", size(vsmall)) graphregion(color(white)) - -* Save figure -graph export /// -"$dir_output_files/correlations/validation_correlations_difference_${min_age}_${max_age}.jpg", /// - replace width(2560) height(1440) quality(100) - - -graph drop _all - diff --git a/validation/02_simulation_validation/do_files/07_create_euromod_validation_targets.do b/validation/02_simulation_validation/do_files/07_create_euromod_validation_targets.do deleted file mode 100644 index 5cb5a93..0000000 --- a/validation/02_simulation_validation/do_files/07_create_euromod_validation_targets.do +++ /dev/null @@ -1,116 +0,0 @@ -/* -This file pre-processes observed output data from EUROMOD, which 
will be used for validation of SimPaths. - -Author: Ashley Burdett -Last modified: April 2025 - - - -*/ -clear all - -local bu_headid = "tu_cbfam_hu_headid" - -* Import the data - -forvalues year = 2011/2023 { - - insheet using "$dir_euromod_data/${country_lower}_`year'_std.txt", clear - - gen year = `year' - - save "$dir_work/${country}_EM_`year'.dta", replace - -} - -* Append data - -use "$dir_work/${country}_EM_2011.dta" - -forvalues year = 2012/2023 { - - append using "$dir_work/${country}_EM_`year'.dta" - -} - -* Formatting of ids -format idhh %-18.0g -format tu_household_hu_headid %-18.0g -format tu_cbfam_hu_headid %-18.0g -format tu_cpfam_hu_headid %-18.0g -format idperson %-18.0g -format idmother %-18.0g -format idfather %-18.0g -format idpartner %-18.0g - - -* Benefit unit dispoaable income - -* Real terms -gen CPI = . - -replace CPI = 68.98 if year == 2005 -replace CPI = 71.76 if year == 2006 -replace CPI = 77.45 if year == 2007 -replace CPI = 82.12 if year == 2008 -replace CPI = 85.43 if year == 2009 -replace CPI = 89.47 if year == 2010 -replace CPI = 92.98 if year == 2011 -replace CPI = 98.24 if year == 2012 -replace CPI = 99.92 if year == 2013 -replace CPI = 99.94 if year == 2014 -replace CPI = 100 if year == 2015 -replace CPI = 100.45 if year == 2016 -replace CPI = 102.84 if year == 2017 -replace CPI = 105.84 if year == 2018 -replace CPI = 109.46 if year == 2019 -replace CPI = 113.15 if year == 2020 -replace CPI = 119.04 if year == 2021 -replace CPI = 137.22 if year == 2022 -replace CPI = 160.59 if year == 2023 - -lab var CPI "HICP, all items, base 2015" - -gen ind_disp_y = ils_dispy / (CPI/100) - - -* Collapse by benefit unit head id -preserve - -collapse (sum) ind_disp_y, by(`bu_headid' year) - -rename ind_disp_y em_bu_disp_y - -save "$dir_work/${country}_EM_collapse.dta", replace - -restore - -merge m:1 `bu_headid' year using "$dir_work/${country}_EM_collapse.dta" -drop _m - - -* Equivalised disposable income -gen depChild = 1 if (dag >= 0 & dag <= 
18) -bys year idhh `bu_headid': egen dnc = sum(depChild) -lab var dnc "Number of dependent children 0 - 18" - -* Generate modified-OECD equivalence scale: 1 for the household head, 0.5 for -* additional adults, 0.3 for children < 14 years old -bys year idhh `bu_headid': gen people_in_hh = _N -gen child = (dag < 14) -bys year idhh `bu_headid': egen children_in_hh = total(child) -gen other_adults = people_in_hh - children_in_hh - 1 - // -1 for the household head - -gen equiv_factor = 1 + (0.5 * other_adults) + (0.3 * children_in_hh) - // Start with 1 because each household must have at least the head -la var equiv_factor "OECD-modified scale equivalence factor" - -gen em_bu_eq_disp_y = em_bu_disp_y / equiv_factor - -drop child people_in_hh child children_in_hh other_adults - - -rename em_bu_disp_y valid_y_disp_yr_bu - -save "$dir_work/${country}_EM_validation_data.dta", replace