Clarify explanation of requires_grad in PyTorch #3717
FlightVin wants to merge 6 commits into pytorch:main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3717
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 46faa19 with merge base b863c3b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
random: Should I also link this?
Fix broken links
@FlightVin thanks for this update. Can you fix the lint issues from the failing lint check?
Forgot this wasn't work, lol. I'll request reviews from the UI, sorry for the tag 😭
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Fixes #3716
Description
It was challenging for me to initially grasp why `requires_grad` was enabled after the `weights` initialization, but on the same line as `bias`. The existing explanation ("we don't want that step included in the gradient") is technically correct but omits the practical consequence: Leaf Node status.

If `requires_grad=True` is set before the initialization math (the division by `sqrt(n)`), the `weights` tensor records that operation and becomes a calculated output (a non-leaf node) rather than a source parameter. This makes it impossible for optimizers to update it.

This PR clarifies that we set `requires_grad` after the math to ensure the tensor remains a trainable Leaf Node.

Checklist
P.S. Leaving the 3rd point unchecked since there are no labels on the issue as of yet.
cc @svekars @sekyondaMeta @AlannaBurke @albanD @jbschlosser
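
A minimal sketch of the leaf-node distinction described in this PR (the shapes and the `sqrt(n)` scaling mirror the tutorial's initialization; the variable names here are illustrative):

```python
import math
import torch

# Non-leaf: requires_grad is enabled BEFORE the scaling math, so autograd
# records the division and the result is a computed output, not a leaf.
w_nonleaf = torch.randn(784, 10, requires_grad=True) / math.sqrt(784)
print(w_nonleaf.is_leaf)  # False

# Leaf: do the initialization math first, then enable gradient tracking
# in place. The tensor stays a leaf, so an optimizer can update it.
w_leaf = torch.randn(784, 10) / math.sqrt(784)
w_leaf.requires_grad_()
print(w_leaf.is_leaf)  # True
```

After a backward pass, `.grad` is populated only for the leaf tensor, which is why the order of operations matters for trainable parameters.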