Skip to content

AB-programs do file take2#451

Closed
ANBurdett wants to merge 2 commits intodevelopfrom
AB-programs_do_file_take2
Closed

AB-programs do file take2#451
ANBurdett wants to merge 2 commits intodevelopfrom
AB-programs_do_file_take2

Conversation

@ANBurdett
Copy link
Copy Markdown
Contributor

Apologies for the confusion last time; it seems I'm still not as fluent with GitHub as I would like to be. Hopefully the new files are now in this branch...

I've copied the text from the previous branch below for reference:

In this branch, I have included the draft programs do files, in which all of the Excel files containing the process estimates (except wages) are produced using the programs file.

A few things worth noting:

The Excel file name is now dynamic and fed into the relevant program.
There are a few different possible processing programs. The one to use depends on the type of model you are running.
Probit and logit models are processed in the same way, even if the probit option is always used.
Review Request:
@andrewbaxter439 - It would be great if you could take a look at the files for the well-being, mental health and financial distress to see if they are working as expected.
Please let me know if you spot any bugs or have any issues.

The one obvious thing that is outstanding to me is that some of the labels are incorrect. Specifically, it is easier to create dummy variables from categorical variables with the name that will be picked up by SimPaths. Lags are dealt with in the code, but category names are not. I think the best place for these variables to be created is in the variable_update do-file. @dariaple is currently updating this file (along with the full set of estimation do files) with the new variable names, so perhaps this is best added after she has completed that work. However, this shouldn't impact the programs themselves it just alters what is fed into them.

@ANBurdett ANBurdett self-assigned this Apr 1, 2026
@ANBurdett ANBurdett changed the title Ab programs do file take2 AB-programs do file take2 Apr 1, 2026
@andrewbaxter439 andrewbaxter439 changed the base branch from main to develop April 7, 2026 14:26
@andrewbaxter439
Copy link
Copy Markdown
Collaborator

Hi @ANBurdett - sorry wasn't at meeting today to review. Have had a quick look and changed the PR to go to develop branch as I think this helpfully is only including the one commit needing reviewed. Will try and give it a run and make sure it's working, apologies for the delay in this!

Copy link
Copy Markdown
Collaborator

@andrewbaxter439 andrewbaxter439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ANBurdett thanks for doing this - the programs file and associated functions look great and tidy up the code a lot! thanks for refactoring files accordingly 👍

I've left a couple of comments on parts that don't work on my computer, though I'm not sure why this is only my machine (maybe Windows does it in a buggy way?). I've made suggestions on potential fixes to try. These are based around my understanding of how the matrix is being written, with the old code writing all as one block and the new code inserting cell by cell. If it doesn't make sense, or if there is a clear reason why the nested-loop method for matrix writing is used then can perhaps look at other fixes.

I might try and get another Windows machine running the code to see if there's a shared problem or whether it's just a strange aspect of my own machine!

Comment on lines 27 to 54
void write_all_to_excel() {
b_trimmed = st_matrix("b_trimmed")
V_trimmed = st_matrix("V_trimmed")
n = cols(b_trimmed)
for (i=1; i<=n; i++) {
row = i + 1
coef = b_trimmed[1,i]
stata("quietly putexcel B" + strofreal(row) + " = (" + strofreal(coef) + ")")
}
printf("Writing V-C matrix\n")
for (i=1; i<=n; i++) {
for (j=1; j<=n; j++) {
row = i + 1
col_num = j + 2
col_name = ""
temp = col_num
while (temp > 0) {
rem = mod(temp - 1, 26)
col_name = char(65 + rem) + col_name
temp = floor((temp - 1) / 26)
}
val = V_trimmed[i,j]
stata("quietly putexcel " + col_name + strofreal(row) + " = (" + strofreal(val) + ")")
}
if (mod(i, 5) == 0) printf(" Row %g/%g\n", i, n)
}
printf("Done\n")
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a brief thought on this - when I run the process this is the part my computer gets stuck on each time. I think the problem it faces is that each iteration in the loop a) opens the Excel file, b) writes one cell, c) closes the Excel file. But at some point in each run it crashes when it doesn't register as having completed c) before starting a) on the next value (file is locked for editing).

In the prior code, the matrix was written as one block, so the Excel file was opened, written and saved once for the whole matrix. I experimented with this on a different branch and wonder if this would achieve the same result but smoother?

Suggested change
}
void write_all_to_excel() {
printf("Writing V-C matrix\n")
stata("quietly putexcel B2 = matrix(b_trimmed')")
stata("quietly putexcel C2 = matrix(V_trimmed)")
printf("Done\n")
}

No stata programming expert here, so do review and if there was a reason for the rowXcolumn looping over a one-block writing then this wouldn't be a fix. I'm not sure otherwise though how to get Stata putexcel to stop tripping over itself.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this seems preferable - it's more understandable and potentially a lot faster (since it only writes to a file twice instead of i*j times). I'm also not a Mata expert by any means, but I don't think the function is doing anything other than this - maybe @dariaple or @dav-sonn would be able to confirm.

Since it only calls Stata commands, I would imagine this could then be a Stata program rather than a Mata function? But maybe not if you want to keep edits minimal for the purpose of this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Not sure why I took the brute force approach, I think I was originally wanting to maintain more control to ensure maximum flexibility, but I agree it's overcomplicated, slow and not needed here.
Further, I agree that it could be written as a Stata command, however I don't think it will make much of a difference. Thanks both!

Comment on lines +124 to +141
for (i=1; i<=n; i++) {
for (j=1; j<=n; j++) {
row = i + 1
col_num = j + 2
col_name = ""
temp = col_num
while (temp > 0) {
rem = mod(temp - 1, 26)
col_name = char(65 + rem) + col_name
temp = floor((temp - 1) / 26)
}
// Only write actual variance on diagonal, zero elsewhere
if (i == j) val = V_trimmed[i,j]
else val = 0
stata("quietly putexcel " + col_name + strofreal(row) + " = (" + strofreal(val) + ")")
}
if (mod(i, 5) == 0) printf(" Row %g/%g\n", i, n)
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In relation to above comment on block-writing of matrices, this one seems a little tricker as it's modifying matrix values as it writes them. This I presume could be changed to first zero the off-diagonals in memory then block-write? Something like:

for (i=1; i<=n; i++) {
    for (j=1; j<=n; j++) {
        if (i != j) V_trimmed[i,j] = 0
    }
}

stata("quietly putexcel C2 = matrix(V_trimmed)")

is perhaps a useful approach?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this seems preferable and likely a big performance improvement since it only writes once instead of i*j times.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at this briefly, but I think the following should work:

void write_diagonal_to_excel() {
   
    V_trimmed = st_matrix("V_trimmed")
    b_trimmed = st_matrix("b_trimmed")
    
    stata("quietly putexcel B2 = matrix(b_trimmed')")
    
    printf("Creating diagonal matrix...\n")
    
    V_diag = diag(diagonal(V_trimmed))
    
    st_replacematrix("V_trimmed", V_diag)
    
    stata("quietly putexcel C2 = matrix(V_trimmed)")
    
    printf("Done (Diagonal matrix written)\n")
}

Let me know what you think.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @andrewbaxter439, on second thought, let me take a closer look before you try running the file. I'll check for any further loops and putpexel calls, as those seem to be the main issues. Give me a couple of days, and I’ll provide an updated version. Thanks!

@andrewbaxter439
Copy link
Copy Markdown
Collaborator

Also to note - these set of do files currently will not work for financial_distress, health_wellbeing and health_mental as they don't use the updated structure of variable names in the regression itself - the reg files produced by these codes wont match up with Enums in the SimPaths code. This is something I'd started working on in #359 (example there for financial distress) but was dependent on the programs.do updates. I wonder if once the refactored programs.do is working I should do a PR with my updated changes to this branch before it's commited to develop?

@igelstorm
Copy link
Copy Markdown
Contributor

Also to note - these set of do files currently will not work for financial_distress, health_wellbeing and health_mental as they don't use the updated structure of variable names in the regression itself - the reg files produced by these codes wont match up with Enums in the SimPaths code. This is something I'd started working on in #359 (example there for financial distress) but was dependent on the programs.do updates. I wonder if once the refactored programs.do is working I should do a PR with my updated changes to this branch before it's commited to develop?

I agree, this will need done - I think there are three options:

  1. What you suggested (add the changes from Feature/refactor health do files #359 into this PR)
  2. Update the variable names in these three do files in a separate PR (but without updating the verbose writing-to-Excel code), merge that first, then rebase/merge this branch to incorporate those changes
  3. Merge this PR first, accept that these do files will temporarily be broken in develop (i.e. generate output that doesn't work with the model), and then make your fixes from Feature/refactor health do files #359 in a new PR

Pragmatically I'd be tempted to suggest 3 even though it's a bit "dirty" (in that develop will temporarily be in an inconsistent state), because the others add a fair bit of faff for (IMO) limited gain? This PR is already quite large.

@ANBurdett
Copy link
Copy Markdown
Contributor Author

Hello,

Quick update: Following a request from Justin, instead of using a PR on the main repo to continue working on the do files, I have invited @andrewbaxter439 and @igelstorm to my forked repo (SimPaths_AB) with the thought that we could collaborate on proposed changes there and still use the PR-track changes + discussion functions without "prematurely" flagging a PR to Justin.

Please let me know if you think there is a better approach for WIP code sharing. Again, I'm still learning, so feedback is very warmly welcomed. Thanks again for your patience.

Regarding the files themselves, I updated the programs do-file in the branch (AB-programs do file take2) to avoid brute force populating of the var-cov matrices and the labels (less of an issue, but while we are cleaning up, why not). Unfortunately, the track changes weren't preserved when I was trying to figure out the repos and branches for collaboration, but given it's not too much code, hopefully it's not too inconvenient.

Note, I have also added a few programs to the do file that automate the populating of the Heckman wage models Excel files. This process was not previously incorporated into the automated workflow. You can ignore these.

All the best,

Ashley

@justin-ven
Copy link
Copy Markdown
Contributor

Closing pull request to wait for issue to be fully resolved.

@justin-ven justin-ven closed this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants