Python: Improve some flow summaries #22101

Merged
hvitved merged 2 commits into github:main from hvitved:python/flow-summaries-improvements on Jul 1, 2026

@hvitved hvitved commented Jul 1, 2026

Contributor

This PR rewrites some Python flow summaries into equivalent* but much more performant summaries. Some summaries, like builtins.enumerate, could be rewritten directly by making use of existing ContentSets, while others, such as builtins.dict, required adding support for "with content".

*The flow summary for builtins.zip has been generalized by removing the restriction that only the first two arguments are taken into account.
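
As a reminder of the runtime behaviour that the generalized `builtins.zip` summary has to describe, here is a minimal plain-Python sketch. The variable names are illustrative only and are not taken from the PR or its tests.

```python
# Plain Python behaviour of zip with more than two arguments.
# All names here are hypothetical, chosen only for illustration.
first = ["a0", "a1"]
second = ["b0", "b1"]
third = ["c0", "c1"]  # a third argument, beyond the first two

rows = list(zip(first, second, third))

# Elements of argument i end up at tuple index i of every produced tuple,
# for any number of arguments, not only for arguments 0 and 1.
assert rows[0] == ("a0", "b0", "c0")
assert [row[2] for row in rows] == third
```

A summary restricted to the first two arguments would presumably not describe how elements of `third` reach index 2 of each tuple.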

DCA looks excellent: we reduce analysis time on dask__dask by 95% and also recover the performance on saltstack__salt that was originally lost in #21888 (note that the performance lost in that PR for ytdl-org__youtube-dl was already recovered in #21941).

@github-actions github-actions Bot added the Python label Jul 1, 2026
@hvitved hvitved force-pushed the python/flow-summaries-improvements branch from 34db1c7 to a5444b5 July 1, 2026 10:06
preservesValue = true
)
input = "Argument[0].WithAnyDictionaryElement" and
output = "ReturnValue" and

Contributor

This seems to change the meaning of the flow summary - previously it went from Argument[0].DictionaryElement[x] to ReturnValue.DictionaryElement[x] for all keys x, and now it goes from Argument[0].WithAnyDictionaryElement to ReturnValue. Can you explain?

Contributor Author

Argument[0].WithAnyDictionaryElement is saying that data must be stored inside a dictionary value with some key, which has the exact same effect as the original flow summary, except that it doesn't compile down to a bunch of read-steps followed by a bunch of identical store-steps, and it is hence much more performant.
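
To picture the difference between the per-key form and the with-content form, here is a toy Python model. It is only a sketch under stated assumptions: the `Step` tuples and the functions `per_key_steps`, `with_content_steps`, and `replay` are hypothetical and are not how the CodeQL dataflow library represents or evaluates summaries.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (kind, key) pairs; a purely illustrative encoding


def per_key_steps(keys: List[str]) -> List[Step]:
    """Toy model of the old shape: read each key from the argument, then
    store the value again under the same key in the return value."""
    steps: List[Step] = []
    for key in keys:
        steps.append(("read", key))
        steps.append(("store", key))
    return steps


def with_content_steps(keys: List[str]) -> List[Step]:
    """Toy model of the new shape: a single value-preserving step for the
    whole dictionary, taken only if it holds content under some key."""
    return [("pass-through", "*")] if keys else []


def replay(steps: List[Step], arg: Dict[str, str]) -> Dict[str, str]:
    """Replay the modelled steps on a concrete dictionary."""
    result: Dict[str, str] = {}
    pending: Dict[str, str] = {}
    for kind, key in steps:
        if kind == "read":
            pending[key] = arg[key]
        elif kind == "store":
            result[key] = pending[key]
        else:  # pass-through keeps every key/value pair as it is
            result = dict(arg)
    return result


tainted = {"user": "src1", "host": "src2", "path": "src3"}
old = per_key_steps(list(tainted))
new = with_content_steps(list(tainted))

# Both shapes produce the same result, but the step counts differ.
assert replay(old, tainted) == replay(new, tainted) == tainted
assert (len(old), len(new)) == (6, 1)
```

In this toy model the per-key form grows with the number of keys while the with-content form stays at one step, which is the intuition behind the performance claim above.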

@owen-mc owen-mc Jul 1, 2026

Contributor

Ah, I think I get it now. I've never come across this mechanism before - Go seems to be one of the few languages not using it. So, if I've understood correctly, if the input is <input>.WithAnyDictionaryElement and the output is <output>, then the MaD machinery will create flow from <input> to <output> as long as there is flow to <input>.AnyDictionaryElement? A little bit like a lookahead in regexes, where it checks for the existence of something without consuming it.
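
For readers unfamiliar with the regex analogy, a minimal Python sketch of a lookahead follows. It only illustrates the regex side of the comparison; whether the analogy matches the MaD semantics exactly is not confirmed in this thread.

```python
import re

text = "key=value"

# The lookahead (?==) requires "=" to follow the word,
# but the "=" itself is not part of the match.
match = re.match(r"\w+(?==)", text)
assert match is not None
assert match.group(0) == "key"
assert text[match.end()] == "="  # the "=" is still unconsumed

# Without a following "=", the same pattern does not match at all.
assert re.match(r"\w+(?==)", "key") is None
```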

@hvitved hvitved added the no-change-note-required This PR does not need a change note label Jul 1, 2026
@hvitved hvitved marked this pull request as ready for review July 1, 2026 12:32
@hvitved hvitved requested a review from a team as a code owner July 1, 2026 12:32
Copilot AI review requested due to automatic review settings July 1, 2026 12:32

Copilot AI left a comment

Contributor

Pull request overview

This PR updates Python dataflow flow summaries to be (mostly) equivalent while significantly improving performance by relying more on existing ContentSet encodings and adding support for summaries that depend on “with content”. It also broadens the builtins.zip modeling by removing the prior limitation that only the first two arguments were considered.

Changes:

  • Rework several stdlib flow summaries to use Any*Element / WithAny*Element-style encodings instead of per-index/per-key expansion.
  • Add infrastructure to encode and consume “with content” in flow summaries and plumb “expects content” through the dataflow internals.
  • Update Python library tests to reflect the new (generalized) behavior, including new expected and known-spurious flows.

Summary per file:

| File | Description |
| --- | --- |
| python/ql/test/library-tests/frameworks/django-orm/testapp/orm_tests.py | Updates ORM test expectations to reflect newly modeled flow through in_bulk().values(). |
| python/ql/test/library-tests/dataflow/coverage/test_builtins.py | Adjusts zip tuple test expectations for generalized argument handling and records a known spurious flow. |
| python/ql/lib/semmle/python/frameworks/Stdlib.qll | Rewrites several stdlib flow summaries to use broader content encodings and “with content” forms. |
| python/ql/lib/semmle/python/dataflow/new/internal/FlowSummaryImpl.qll | Adds encoding helper for “with content” in flow summary representations. |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll | Hooks expectsContent up to flow-summary-specific content expectations. |

Review details

  • Files reviewed: 4/5 changed files
  • Comments generated: 0
  • Review effort level: Low

@hvitved hvitved merged commit 6c3c5ea into github:main Jul 1, 2026
20 checks passed
@hvitved hvitved deleted the python/flow-summaries-improvements branch July 1, 2026 17:43
Labels

no-change-note-required This PR does not need a change note Python
