Conversation
|
Hi @ANBurdett - sorry wasn't at meeting today to review. Have had a quick look and changed the PR to go to |
andrewbaxter439
left a comment
There was a problem hiding this comment.
Hi @ANBurdett thanks for doing this - the programs file and associated functions look great and tidy up the code a lot! thanks for refactoring files accordingly 👍
I've left a couple of comments on parts that don't work on my computer, though I'm not sure why this is only my machine (maybe Windows does it in a buggy way?). I've made suggestions on potential fixes to try. These are based around my understanding of how the matrix is being written, with the old code writing all as one block and the new code inserting cell by cell. If it doesn't make sense, or if there is a clear reason why the nested-loop method for matrix writing is used then can perhaps look at other fixes.
I might try and get another Windows machine running the code to see if there's a shared problem or whether it's just a strange aspect of my own machine!
| void write_all_to_excel() { | ||
| b_trimmed = st_matrix("b_trimmed") | ||
| V_trimmed = st_matrix("V_trimmed") | ||
| n = cols(b_trimmed) | ||
| for (i=1; i<=n; i++) { | ||
| row = i + 1 | ||
| coef = b_trimmed[1,i] | ||
| stata("quietly putexcel B" + strofreal(row) + " = (" + strofreal(coef) + ")") | ||
| } | ||
| printf("Writing V-C matrix\n") | ||
| for (i=1; i<=n; i++) { | ||
| for (j=1; j<=n; j++) { | ||
| row = i + 1 | ||
| col_num = j + 2 | ||
| col_name = "" | ||
| temp = col_num | ||
| while (temp > 0) { | ||
| rem = mod(temp - 1, 26) | ||
| col_name = char(65 + rem) + col_name | ||
| temp = floor((temp - 1) / 26) | ||
| } | ||
| val = V_trimmed[i,j] | ||
| stata("quietly putexcel " + col_name + strofreal(row) + " = (" + strofreal(val) + ")") | ||
| } | ||
| if (mod(i, 5) == 0) printf(" Row %g/%g\n", i, n) | ||
| } | ||
| printf("Done\n") | ||
| } |
There was a problem hiding this comment.
As a brief thought on this - when I run the process this is the part my computer gets stuck on each time. I think the problem it faces is that each iteration in the loop a) opens the Excel file, b) writes one cell, c) closes the Excel file. But at some point in each run it crashes when it doesn't register as having completed c) before starting a) on the next value (file is locked for editing).
In the prior code, the matrix was written as one block, so the Excel file was opened, written and saved once for the whole matrix. I experimented with this on a different branch and wonder if this would achieve the same result but smoother?
| } | |
| void write_all_to_excel() { | |
| printf("Writing V-C matrix\n") | |
| stata("quietly putexcel B2 = matrix(b_trimmed')") | |
| stata("quietly putexcel C2 = matrix(V_trimmed)") | |
| printf("Done\n") | |
| } |
No stata programming expert here, so do review and if there was a reason for the rowXcolumn looping over a one-block writing then this wouldn't be a fix. I'm not sure otherwise though how to get Stata putexcel to stop tripping over itself.
There was a problem hiding this comment.
In general this seems preferable - it's more understandable and potentially a lot faster (since it only writes to a file twice instead of i*j times). I'm also not a Mata expert by any means, but I don't think the function is doing anything other than this - maybe @dariaple or @dav-sonn would be able to confirm.
Since it only calls Stata commands, I would imagine this could then be a Stata program rather than a Mata function? But maybe not if you want to keep edits minimal for the purpose of this PR.
There was a problem hiding this comment.
Looks good to me.
Not sure why I took the brute force approach, I think I was originally wanting to maintain more control to ensure maximum flexibility, but I agree it's overcomplicated, slow and not needed here.
Further, I agree that it could be written as a Stata command, however I don't think it will make much of a difference. Thanks both!
| for (i=1; i<=n; i++) { | ||
| for (j=1; j<=n; j++) { | ||
| row = i + 1 | ||
| col_num = j + 2 | ||
| col_name = "" | ||
| temp = col_num | ||
| while (temp > 0) { | ||
| rem = mod(temp - 1, 26) | ||
| col_name = char(65 + rem) + col_name | ||
| temp = floor((temp - 1) / 26) | ||
| } | ||
| // Only write actual variance on diagonal, zero elsewhere | ||
| if (i == j) val = V_trimmed[i,j] | ||
| else val = 0 | ||
| stata("quietly putexcel " + col_name + strofreal(row) + " = (" + strofreal(val) + ")") | ||
| } | ||
| if (mod(i, 5) == 0) printf(" Row %g/%g\n", i, n) | ||
| } |
There was a problem hiding this comment.
In relation to above comment on block-writing of matrices, this one seems a little tricker as it's modifying matrix values as it writes them. This I presume could be changed to first zero the off-diagonals in memory then block-write? Something like:
for (i=1; i<=n; i++) {
for (j=1; j<=n; j++) {
if (i != j) V_trimmed[i,j] = 0
}
}
stata("quietly putexcel C2 = matrix(V_trimmed)")
is perhaps a useful approach?
There was a problem hiding this comment.
As above, this seems preferable and likely a big performance improvement since it only writes once instead of i*j times.
There was a problem hiding this comment.
I just looked at this briefly, but I think the following should work:
void write_diagonal_to_excel() {
V_trimmed = st_matrix("V_trimmed")
b_trimmed = st_matrix("b_trimmed")
stata("quietly putexcel B2 = matrix(b_trimmed')")
printf("Creating diagonal matrix...\n")
V_diag = diag(diagonal(V_trimmed))
st_replacematrix("V_trimmed", V_diag)
stata("quietly putexcel C2 = matrix(V_trimmed)")
printf("Done (Diagonal matrix written)\n")
}
Let me know what you think.
There was a problem hiding this comment.
Hi @andrewbaxter439, on second thought, let me take a closer look before you try running the file. I'll check for any further loops and putpexel calls, as those seem to be the main issues. Give me a couple of days, and I’ll provide an updated version. Thanks!
|
Also to note - these set of do files currently will not work for financial_distress, health_wellbeing and health_mental as they don't use the updated structure of variable names in the regression itself - the reg files produced by these codes wont match up with Enums in the SimPaths code. This is something I'd started working on in #359 (example there for financial distress) but was dependent on the programs.do updates. I wonder if once the refactored programs.do is working I should do a PR with my updated changes to this branch before it's commited to develop? |
I agree, this will need done - I think there are three options:
Pragmatically I'd be tempted to suggest 3 even though it's a bit "dirty" (in that develop will temporarily be in an inconsistent state), because the others add a fair bit of faff for (IMO) limited gain? This PR is already quite large. |
|
Hello, Quick update: Following a request from Justin, instead of using a PR on the main repo to continue working on the do files, I have invited @andrewbaxter439 and @igelstorm to my forked repo ( Please let me know if you think there is a better approach for WIP code sharing. Again, I'm still learning, so feedback is very warmly welcomed. Thanks again for your patience. Regarding the files themselves, I updated the Note, I have also added a few programs to the do file that automate the populating of the Heckman wage models Excel files. This process was not previously incorporated into the automated workflow. You can ignore these. All the best, Ashley |
|
Closing pull request to wait for issue to be fully resolved. |
Apologies for the confusion last time; it seems I'm still not as fluent with GitHub as I would like to be. Hopefully the new files are now in this branch...
I've copied the text from the previous branch below for reference:
In this branch, I have included the draft programs do files, in which all of the Excel files containing the process estimates (except wages) are produced using the programs file.
A few things worth noting:
The Excel file name is now dynamic and fed into the relevant program.
There are a few different possible processing programs. The one to use depends on the type of model you are running.
Probit and logit models are processed in the same way, even if the probit option is always used.
Review Request:
@andrewbaxter439 - It would be great if you could take a look at the files for the well-being, mental health and financial distress to see if they are working as expected.
Please let me know if you spot any bugs or have any issues.
The one obvious thing that is outstanding to me is that some of the labels are incorrect. Specifically, it is easier to create dummy variables from categorical variables with the name that will be picked up by SimPaths. Lags are dealt with in the code, but category names are not. I think the best place for these variables to be created is in the variable_update do-file. @dariaple is currently updating this file (along with the full set of estimation do files) with the new variable names, so perhaps this is best added after she has completed that work. However, this shouldn't impact the programs themselves it just alters what is fed into them.