Skip to content

Commit 96099ce

Browse files
yoffCopilot
andcommitted
Python: visit function parameter and return annotations in new CFG
The new (shared-CFG-based) Python control flow graph in `semmle.python.controlflow.internal.Cfg` previously did not emit CFG nodes for parameter type annotations (`def f(x: T): ...`) or for the return type annotation (`-> T`). The legacy CFG emitted both, and a small number of framework models rely on this: `LocalSources.qll`'s `annotatedInstance` walks the parameter annotation expression by way of its CFG node to track that a parameter receives an instance of the annotated class. After the dataflow flip to the new CFG/SSA this regression manifested as lost flows in any test exercising annotation-based parameter tracking: FastAPI `Depends()` receivers, Pydantic request bodies, Starlette `WebSocket`, the call-graph type-annotation test, and so on. Extend `FunctionDefExpr` to visit each annotation as a child of the function-def expression, in CPython evaluation order: positional parameter annotations, `*args` annotation, keyword-only parameter annotations, `**kwargs` annotation, then the return annotation. (Lambda expressions have no annotations in Python syntax, so `LambdaExpr` is unchanged.) PEP 695 type parameters remain out of scope; they belong to the inner annotation scope, not the enclosing CFG. Restored test results across `framework/aiohttp`, `framework/fastapi`, `framework/lxml`, the `CallGraph-type-annotations` test, and `CWE-022-PathInjection`. Two FastAPI list-comprehension MISSING markers become positive (`taint_test.py:41,55`). CPython CFG consistency remains clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent f79f239 commit 96099ce

9 files changed

Lines changed: 72 additions & 168 deletions

File tree

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
---
2+
category: minorAnalysis
3+
---
4+
* The new (shared-CFG-based) Python control flow graph now visits parameter and return type annotations as CFG nodes for function definitions, matching the legacy CFG. This restores annotation-based type tracking through framework models such as FastAPI's `Depends()`, Pydantic request models, Starlette `WebSocket` handlers, and any other models that flow a class reference through `Parameter.getAnnotation()` to identify instances of the annotated class.

python/ql/lib/semmle/python/controlflow/internal/AstNodeImpl.qll

Lines changed: 59 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1441,10 +1441,19 @@ module Ast implements AstSig<Py::Location> {
14411441

14421442
/**
14431443
* A function definition expression (visits positional and keyword
1444-
* defaults, but NOT PEP 695 type parameters — those bind in an
1445-
* annotation scope that nests the function body, so they belong to
1446-
* the inner scope's CFG, not the enclosing scope's; the legacy CFG
1447-
* also omitted them).
1444+
* defaults followed by parameter and return type annotations, but NOT
1445+
* PEP 695 type parameters — those bind in an annotation scope that
1446+
* nests the function body, so they belong to the inner scope's CFG,
1447+
* not the enclosing scope's; the legacy CFG also omitted them).
1448+
*
1449+
* Evaluation order follows CPython: defaults are pushed first, then
1450+
* keyword-only defaults, then annotations (the `__annotations__` dict
1451+
* is built last, before `MAKE_FUNCTION`). Annotations are emitted as
1452+
* CFG nodes so that flows from a class reference into a parameter's
1453+
* type annotation are visible to dataflow (e.g. so that framework
1454+
* models like FastAPI's `Depends()` can use a parameter's type hint
1455+
* to track that the parameter receives an instance of the annotated
1456+
* class — see `LocalSources::annotatedInstance`).
14481457
*/
14491458
additional class FunctionDefExpr extends Expr {
14501459
private Py::FunctionExpr funcExpr;
@@ -1468,15 +1477,61 @@ module Ast implements AstSig<Py::Location> {
14681477
rank[n + 1](Py::Expr d, int i | d = funcExpr.getArgs().getKwDefault(i) | d order by i)
14691478
}
14701479

1480+
/**
1481+
* Gets the `n`th annotation expression, in CPython evaluation
1482+
* order: positional parameter annotations (by argument position),
1483+
* `*args` annotation, keyword-only parameter annotations (by
1484+
* argument position), `**kwargs` annotation, then the return
1485+
* annotation. Each annotation appears at most once.
1486+
*/
1487+
Expr getAnnotation(int n) {
1488+
result.asExpr() =
1489+
rank[n + 1](Py::Expr a, int subOrder, int subIndex |
1490+
functionAnnotation(funcExpr, a, subOrder, subIndex)
1491+
|
1492+
a order by subOrder, subIndex
1493+
)
1494+
}
1495+
14711496
int getNumberOfDefaults() { result = count(funcExpr.getArgs().getADefault()) }
14721497

1498+
int getNumberOfKwDefaults() { result = count(funcExpr.getArgs().getAKwDefault()) }
1499+
1500+
int getNumberOfAnnotations() {
1501+
result = count(Py::Expr a | functionAnnotation(funcExpr, a, _, _))
1502+
}
1503+
14731504
override AstNode getChild(int index) {
14741505
result = this.getDefault(index)
14751506
or
14761507
result = this.getKwDefault(index - this.getNumberOfDefaults())
1508+
or
1509+
result = this.getAnnotation(index - this.getNumberOfDefaults() - this.getNumberOfKwDefaults())
14771510
}
14781511
}
14791512

1513+
/**
1514+
* Holds if `a` is an annotation of `funcExpr` in slot
1515+
* `(subOrder, subIndex)`. Slots are CPython evaluation order:
1516+
* positional param annotations (subOrder 0, subIndex = argument
1517+
* position), `*args` annotation (1, 0), keyword-only annotations
1518+
* (2, position), `**kwargs` annotation (3, 0), return annotation
1519+
* (4, 0).
1520+
*/
1521+
private predicate functionAnnotation(
1522+
Py::FunctionExpr funcExpr, Py::Expr a, int subOrder, int subIndex
1523+
) {
1524+
a = funcExpr.getArgs().getAnnotation(subIndex) and subOrder = 0
1525+
or
1526+
a = funcExpr.getArgs().getVarargannotation() and subOrder = 1 and subIndex = 0
1527+
or
1528+
a = funcExpr.getArgs().getKwAnnotation(subIndex) and subOrder = 2
1529+
or
1530+
a = funcExpr.getArgs().getKwargannotation() and subOrder = 3 and subIndex = 0
1531+
or
1532+
a = funcExpr.getReturns() and subOrder = 4 and subIndex = 0
1533+
}
1534+
14801535
/** A lambda expression (has default args evaluated at definition time). */
14811536
additional class LambdaExpr extends Expr {
14821537
private Py::Lambda lambda;
Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
testFailures
2-
| type_annotations.py:6:16:6:32 | Comment # $ tt=Foo.method | Missing result: tt=Foo.method |
3-
| type_annotations.py:16:16:16:32 | Comment # $ tt=Foo.method | Missing result: tt=Foo.method |
42
debug_callableNotUnique
53
pointsTo_found_typeTracker_notFound
64
typeTracker_found_pointsTo_notFound
5+
| type_annotations.py:6:5:6:14 | Attribute() | Foo.method |
6+
| type_annotations.py:16:5:16:14 | Attribute() | Foo.method |
77
| type_annotations.py:29:5:29:14 | Attribute() | Foo.method |
Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,3 @@
11
argumentToEnsureNotTaintedNotMarkedAsSpurious
22
untaintedArgumentToEnsureTaintedNotMarkedAsMissing
3-
| taint_test.py:151:9:151:15 | taint_test.py:151 | ERROR, you should add `# $ MISSING: tainted` annotation | request |
4-
| taint_test.py:152:9:152:19 | taint_test.py:152 | ERROR, you should add `# $ MISSING: tainted` annotation | request.url |
5-
| taint_test.py:153:9:153:36 | taint_test.py:153 | ERROR, you should add `# $ MISSING: tainted` annotation | Await |
63
testFailures
7-
| taint_test.py:151:18:151:28 | Comment # $ tainted | Missing result: tainted |
8-
| taint_test.py:152:22:152:32 | Comment # $ tainted | Missing result: tainted |
9-
| taint_test.py:153:39:153:49 | Comment # $ tainted | Missing result: tainted |
10-
| taint_test.py:168:76:168:96 | Comment # $ SPURIOUS: tainted | Fixed spurious result: tainted |
Lines changed: 0 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -1,45 +0,0 @@
1-
| response_test.py:11:30:11:37 | Parameter | Unexpected result: routedParameter=response |
2-
| response_test.py:12:41:12:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieHttpOnly=false |
3-
| response_test.py:12:41:12:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieName="key" |
4-
| response_test.py:12:41:12:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSameSite=Lax |
5-
| response_test.py:12:41:12:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSecure=false |
6-
| response_test.py:12:41:12:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieValue="value" |
7-
| response_test.py:12:41:12:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieWrite |
8-
| response_test.py:13:51:13:161 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieHttpOnly=false |
9-
| response_test.py:13:51:13:161 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieName="key" |
10-
| response_test.py:13:51:13:161 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSameSite=Lax |
11-
| response_test.py:13:51:13:161 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSecure=false |
12-
| response_test.py:13:51:13:161 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieValue="value" |
13-
| response_test.py:13:51:13:161 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieWrite |
14-
| response_test.py:14:96:14:205 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=true CookieSameSite=Lax | Missing result: CookieHttpOnly=true |
15-
| response_test.py:14:96:14:205 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=true CookieSameSite=Lax | Missing result: CookieName="key" |
16-
| response_test.py:14:96:14:205 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=true CookieSameSite=Lax | Missing result: CookieSameSite=Lax |
17-
| response_test.py:14:96:14:205 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=true CookieSameSite=Lax | Missing result: CookieSecure=false |
18-
| response_test.py:14:96:14:205 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=true CookieSameSite=Lax | Missing result: CookieValue="value" |
19-
| response_test.py:14:96:14:205 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=true CookieSameSite=Lax | Missing result: CookieWrite |
20-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieHttpOnly=false |
21-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieRawHeader="key2=value2" |
22-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSameSite=Lax |
23-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSecure=false |
24-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieWrite |
25-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: headerWriteName="Set-Cookie" |
26-
| response_test.py:15:58:15:221 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: headerWriteValue="key2=value2" |
27-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieHttpOnly=false |
28-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieRawHeader="key2=value2" |
29-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSameSite=Lax |
30-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSecure=false |
31-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieWrite |
32-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: headerWriteName="Set-Cookie" |
33-
| response_test.py:16:68:16:231 | Comment # $ headerWriteName="Set-Cookie" headerWriteValue="key2=value2" CookieWrite CookieRawHeader="key2=value2" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: headerWriteValue="key2=value2" |
34-
| response_test.py:17:53:17:116 | Comment # $ headerWriteName="X-MyHeader" headerWriteValue="header-value" | Missing result: headerWriteName="X-MyHeader" |
35-
| response_test.py:17:53:17:116 | Comment # $ headerWriteName="X-MyHeader" headerWriteValue="header-value" | Missing result: headerWriteValue="header-value" |
36-
| response_test.py:23:26:23:29 | Parameter | Unexpected result: routedParameter=resp |
37-
| response_test.py:41:42:41:49 | Parameter | Unexpected result: routedParameter=response |
38-
| response_test.py:48:41:48:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieHttpOnly=false |
39-
| response_test.py:48:41:48:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieName="key" |
40-
| response_test.py:48:41:48:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSameSite=Lax |
41-
| response_test.py:48:41:48:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieSecure=false |
42-
| response_test.py:48:41:48:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieValue="value" |
43-
| response_test.py:48:41:48:151 | Comment # $ CookieWrite CookieName="key" CookieValue="value" CookieSecure=false CookieHttpOnly=false CookieSameSite=Lax | Missing result: CookieWrite |
44-
| response_test.py:49:87:49:184 | Comment # $ headerWriteName="Custom-Response-Type" headerWriteValue="yes, but only after function has run" | Missing result: headerWriteName="Custom-Response-Type" |
45-
| response_test.py:49:87:49:184 | Comment # $ headerWriteName="Custom-Response-Type" headerWriteValue="yes, but only after function has run" | Missing result: headerWriteValue="yes, but only after function has run" |

0 commit comments

Comments
 (0)