Thanks for creating this convenient package!
I've tested the example script on my machine for the $3 \times 4$ case and it runs pretty well (`doED` is a simple function that wraps the whole script). The multi-core parallelization also works well: 59% of my 96 CPUs are used for this run.

However, when I simply change the script from the $3 \times 4$ case to the $4 \times 4$ case, the process uses only 1 CPU, although the computation time seems reasonably consistent with the expected scaling.

I wonder whether this is an issue or a feature. I would also like to know why you chose to demonstrate the $3 \times 4$ case in the sample script. Thank you!