@@ -0,0 +1,219 @@
/*******************************************************************************
* PROJECT: SimPaths UK
* DO-FILE NAME: 00_master_conditions.do
* DESCRIPTION: Sets out the assumptions and conditions imposed in the
* creation of the unique dataset and the if conditions
* imposed when estimating the processes for SimPaths.
********************************************************************************
* COUNTRY: UK
* AUTHORS: Daria Popova
* LAST UPDATE: 6 May 2026 DP
********************************************************************************
* -----------------------------------------------------------------------------
* Assumptions imposed to align the initial populations with simulation rules
* -----------------------------------------------------------------------------
*
* - Retirement:
* - Treated as an absorbing state
* - Must retire by a specified maximum age
* - Cannot retire before a specified minimum age
*
* - Education:
* - Leave education no earlier than a specified minimum age
* - Must leave the initial education spell by a specified maximum age
* - Cannot return to education after retirement
*
* - Work:
* - Can work from a specified minimum age
* - Activity status and hours of work populated consistently:
* → Assume not working if reported hours = 0
* → Assume hours = 0 if not working
* - If information is partially missing, do not assume the missing value is 0;
* impute it instead (hot-deck)
*
* - Leaving the parental home:
* - Can leave from a specified minimum age
* - Become the effective head of household, even when living with parents,
* once the parents retire or reach state retirement age
*
* - Home ownership:
* - Can own a home from a specified minimum age
*
* - Partnership formation:
* - Can form a partnership from a specified minimum age
*
* - Disability:
* - Treated as a subsample of the not-employed population
*
* The relevant age thresholds are set as globals in the "DEFINE
* PARAMETERS" section below.
* Relevant flags are also constructed throughout, and an Excel file "flag_descriptves" is
* produced to show the extent of the adjustments to the raw data.
*
* -----------------------------------------------------------------------
* Additional notes on implementation:
* -----------------------------------------------------------------------
* Current imputations :
* - Self-rated health status (ordered probit model)
* - Subjective well-being (linear regression)
* - Mental and physical component summaries (linear regression)
* - Impute highest parental education status (ordered probit model)
* - Impute education status using lagged observation and generalized ordered logit
* - Impute working hours if missing but the person is in work (panel based imputation + hot-deck)
* - Impute observed hourly wages if missing but the person is in work (panel based imputation + hot-deck)
*
* -----------------------------------------------------------------------
* Remaining disparities between initial populations and simulation rules:
* -----------------------------------------------------------------------
* - Ages at which females can have a child. [Be informed by the sample?]
* Permit teenage mothers in this script (deal with in 03_ )
* - A few higher/older education spells (30+) last multiple years, whereas
* the simulation only allows returns to education for single-year spells.
* - Should we have people becoming adults at 18 or 16 for income/number of
* children purposes?
* Considered a child if live with parents until 18 and in ft education?
* - Don't impose monotonicity on reported educational attainment information.
* - Number of children vars (all ages or 0-2) don't account for feasibility
* of age at birth of the mother.
*******************************************************************************/

/*******************************************************************************
* DEFINE PARAMETERS
*******************************************************************************/

global country "UK"

global first_sim_year "2010"

global last_sim_year "2025"



* Globals used for all processes
global weight "dwt"

//global regions "UKC UKD UKE UKF UKG UKH UKJ UKK UKL UKM UKN" //UKI is London (reference)
global regions "demRgnUKC demRgnUKD demRgnUKE demRgnUKF demRgnUKG demRgnUKH demRgnUKJ demRgnUKK demRgnUKL demRgnUKM demRgnUKN" //demRgnUKI is London (reference)

//global ethnicity "Ethn_Asian Ethn_Black Ethn_Other" //White is reference. Mixed race & undefined are in Other category
global ethnicity "demEthnC4Asian demEthnC4Black demEthnC4Other" //White is reference. Mixed race & undefined are in Other category

* Define threshold ages
/*
Ages used for specifying samples.
ENSURE THESE ARE THE SAME AS THE GLOBALS USED IN THE INITIAL POPULATIONS MASTER FILE
*/

* Age at which a person becomes an adult in various dimensions
global age_becomes_responsible 18

global age_becomes_semi_responsible 16

global age_seek_employment 16

global age_leave_school 16

global age_form_partnership 18

global age_have_child_min 18

global age_leave_parental_home 18

global age_own_home 18

* Age can/must/cannot make various transitions
global age_max_dep_child 17

global age_adult 18

global age_can_retire 50

global age_force_retire 75

global age_force_leave_spell1_edu 30

global age_have_child_max 49 // allow this to be led by the data
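
* Illustrative sketch, not part of the original script: stop early if the
* threshold ages defined above contradict each other. Which comparisons count
* as "consistent" is an assumption made for this example; adjust as needed.
if ${age_can_retire} >= ${age_force_retire} | ///
   ${age_have_child_min} > ${age_have_child_max} | ///
   ${age_leave_school} >= ${age_force_leave_spell1_edu} | ///
   ${age_max_dep_child} >= ${age_adult} {
    display as error "Threshold age globals are mutually inconsistent"
    exit 198
}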


/*******************************************************************************
* PROCESS IF CONDITIONS
*******************************************************************************/

* Education
global e1a_if_condition "dag >= ${age_leave_school} & dag < ${age_force_leave_spell1_edu} & l.les_c4 == 2"

global e1b_if_condition "dag >= ${age_leave_school} & l.les_c4 != 4 & l.les_c4 != 2"

global e2_if_condition "dag >= ${age_leave_school} & l.les_c4 == 2 & les_c4 != 2"

* Leave the parental home
global p1_if_condition "ded == 0 & dag >= ${age_leave_parental_home}"

* Partnership
global u1_if_condition "dag >= ${age_form_partnership} & ssscp != 1"

global u2_if_condition "dgn == 0 & dag >= ${age_form_partnership} & l.ssscp != 1"

* Fertility
global f1_if_condition "dag >= ${age_have_child_min} & dag <= ${age_have_child_max} & dgn == 0"

* Health
global h1_if_condition "dag >= ${age_becomes_semi_responsible} & flag_dhe_imp == 0"

global h2_if_condition "dag >= ${age_becomes_semi_responsible} & ded == 0"

* Home ownership
global ho1_if_condition "dag >= ${age_own_home}"

* Retirement
global r1a_if_condition "dcpst == 2 & dag >= ${age_can_retire}"

global r1b_if_condition "ssscp != 1 & dcpst == 1 & dag >= ${age_can_retire}"


* WAGES
global wages_f_no_prev_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 0 & deh_c4>0"

global wages_m_no_prev_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 0 & deh_c4>0"

global wages_f_prev_if_condition "dgn == 0 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 1 & deh_c4>0"

global wages_m_prev_if_condition "dgn == 1 & dag >= ${age_seek_employment} & dag <= ${age_force_retire} & previouslyWorking == 1 & deh_c4>0"


* CAPITAL INCOME
global i1a_if_condition "dag >= ${age_becomes_semi_responsible}"

global i1b_if_condition "dag >= ${age_becomes_semi_responsible} & receives_ypncp == 1"

* PRIVATE PENSION INCOME
global i2b_if_condition "dag >= ${age_can_retire} & dlrtrd == 1 & l.dlrtrd==1 & receives_ypnoab==1"

global i3a_if_condition "dag >= ${age_can_retire} & dlrtrd == 1 & l.dlrtrd!=1 & l.les_c4 != 2"

global i3b_if_condition "dag >= ${age_can_retire} & dlrtrd == 1 & l.dlrtrd!=1 & l.les_c4 != 2 & receives_ypnoab==1"


* SOCIAL CARE
global s2a_if_condition "dag > 64 & stm >= 15 & stm <= 22" // Need care

global s2b_if_condition "dag > 64 & stm >= 16 & stm <= 21" // Receive care

global s2c_if_condition "dag > 64 & receive_care & stm >= 16 & stm <= 21" // Care mix received

global s2d_if_condition "dag > 64 & receive_informal_care & stm >= 16 & stm <= 21" // Informal care hours received

global s2e_if_condition "dag > 64 & receive_formal_care & stm >= 16 & stm <= 21" // Formal care hours received


global s3a_if_condition "Single & stm >= 15" // Provide care, Singles

global s3b_if_condition "Partnered & stm >= 15" // Provide care, Partnered

global s3c_if_condition "provide_informal_care & Single & stm >= 15" // Informal care hours provided, Singles

global s3d_if_condition "provide_informal_care & Partnered & stm >= 15" // Informal care hours provided, Partnered


* Financial distress and health processes
* TO ADD
@@ -0,0 +1,127 @@

***************************************************************************************
* PROJECT: SimPaths UK: regression estimates for SimPaths using UKHLS data
* DO-FILE NAME: master.do
* DESCRIPTION: Main do-file to set the main parameters (country, paths) and call sub-scripts
***************************************************************************************
* COUNTRY: UK
* DATA: UKHLS EUL version - UKDA-6614-stata [to wave o]
*
* AUTHORS: Daria Popova, Justin van de Ven
* LAST UPDATE: 6 May 2026 DP
***************************************************************************************

***************************************************************************************
* General comments:
* - Note that in the following scripts some standard commands may be
* abbreviated: (gen)erate, (tab)ulate, (sum)marize, (di)splay,
* (cap)ture, (qui)etly, (noi)sily

*Stata packages to install
*ssc install fre
*ssc install tsspell
*ssc install carryforward
*ssc install outreg2
*ssc install oparallel
*ssc install gologit2
*ssc install winsor
*ssc install reghdfe
*ssc install ftools
*ssc install require
*
* NOTES:
* The income and union parameter do-files must be run after
* the wage estimates are obtained because they use
* predicted wages. The order of the remaining files is
* arbitrary.
***************************************************************************************
***************************************************************************************

clear all
set more off
set type double
set maxvar 30000
set matsize 1000
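
* Illustrative sketch, not part of the original script: report any of the
* user-written packages listed above that cannot be found on the ado-path.
* Assumes each package installs an ado-file with the same name as the package.
foreach pkg in fre tsspell carryforward outreg2 oparallel gologit2 winsor reghdfe ftools require {
    capture which `pkg'
    if _rc display as error "Package `pkg' not found; install with: ssc install `pkg'"
}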


/**************************************************************************************
* DEFINE DIRECTORIES
**************************************************************************************/

* Working directory

global path "D:\Dasha\ESSEX\_SimPaths\_SimPaths_UK\input_processing"

global dir_work "${path}/regression_estimates"

* Directory which contains do files
global dir_do "${dir_work}/do"

* Directory which contains log files
global dir_log "${dir_work}/log"

* Directory which contains raw output: Excel and Word tables
global dir_raw_results "${dir_work}/raw_results"

* Directory which contains final Excel files read by the model
global dir_results "${dir_work}/results"

* Pooled dataset for estimates
global estimation_sample "${path}/initial_populations/data/ukhls_pooled_ipop.dta"

* Pooled dataset with predicted wages after Heckman
global estimation_sample2 "${path}/initial_populations/data/UKHLS_pooled_ipop2.dta"

* Directory containing external data used for the estimates (e.g. fertility rates, wage growth)
global dir_external_data "${dir_work}/external_data"

* Directory to save data for internal validation
global dir_validation_data "${dir_work}/internal_validation/data"
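
* Illustrative sketch, not part of the original script: stop before any
* estimation runs if the pooled estimation sample defined above is missing.
* The choice of return code 601 (file not found) is an assumption.
capture confirm file "${estimation_sample}"
if _rc {
    display as error "Estimation sample not found: ${estimation_sample}"
    exit 601
}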

/*******************************************************************************
* DEFINE PARAMETERS & PROCESS IF CONDITIONS
*******************************************************************************/

do "${path}/00_master_conditions.do"


/*******************************************************************************
* ESTIMATION FILES
*******************************************************************************/
/*
Two additional do-files are called from each of these do-files
- variable_update.do refactors variable names
- programs.do contains Stata programs to process the output of regressions and create Excel files with results used by SimPaths
*/

do "${dir_do}/01_reg_education.do"

do "${dir_do}/02_reg_leave_parental_home.do"

do "${dir_do}/03_reg_partnership.do"

do "${dir_do}/04_reg_fertility.do"

do "${dir_do}/05_reg_health.do"

do "${dir_do}/06_reg_home_ownership.do"

do "${dir_do}/07_reg_retirement.do"

do "${dir_do}/08_reg_wages.do"

do "${dir_do}/09_reg_income.do"

do "${dir_do}/10_reg_socialcare.do"

/*Note that the do-files below are not yet refactored */
do "${dir_do}/11_reg_financial_distress.do"

do "${dir_do}/12_reg_health_mental.do"

do "${dir_do}/13_reg_health_wellbeing.do"


/**************************************************************************************
* END OF FILE
**************************************************************************************/