security(dci): allowlist external predicate symbols (fields + metrics)#251
Conversation
External DCI Social Registry predicate searches compiled sender-supplied CEL against res.partner with no restriction on which symbols a predicate may reference. Because a predicate returns total_count and pageable matches, this turned the search into an oracle for sensitive data one query at a time. The prior fix (#235) denylisted sensitive DCI metric accessors via a regex over the raw predicate text. That was incomplete: - Bypassable: the CEL lexer decodes \. to ., so metric('r\.dci\.dr\.severity', me, arg='Vision') evaded the regex but still resolved to the metric. - Fail-open: new sensitive metrics had to be remembered in a separate module. - It only covered metrics. The same disability data is reachable via plain partner field paths (r.disability_severity_id == 5) and via relation traversal (enrollments.exists(m, m.partner_id.disability_severity_id == 5)), neither of which is a metric. Replace the denylist with a positive, default-deny allowlist enforced on the parsed/resolved CEL AST (robust to source obfuscation and domain rewriting), gated inside compile_expression before translation/execution so the check cannot diverge from what runs: - spp_cel_domain: cel_predicate_guard.collect_referenced_symbols walks the AST for field paths, functions, relation methods and DCI metric accessors; CELService.compile_expression gains a predicate_policy arg enforced via _enforce_predicate_policy. - spp_dci_server_social: external predicates may reference only allowlisted person fields, safe scalar functions (age_years/date/today/now) and r.dci.sr.*/r.dci.ibr.* metrics; disability/CRVS fields & metrics, relations and arbitrary partner fields are denied. Tests: AST walker + enforcement units, escaped-dot/field-path/relation/function regressions. spp_dci_server_social 72/72, spp_cel_domain 616/616, spp_dci_indicators 81/81.
There was a problem hiding this comment.
Code Review
This pull request replaces the regex-based denylist for externally-supplied CEL predicates with a robust, AST-based positive allowlist guard (cel_predicate_guard.py). This guard is integrated into the CEL service compilation flow and applied to the DCI Social Registry search service to prevent sensitive fields or metrics from being used as a side-channel oracle. Feedback on this PR highlights that the current AST symbol collector does not track or resolve relation lambda variables (e.g., m in enrollments.exists(m, ...)), which makes the allow_relations: True policy option fragile. A detailed code suggestion is provided to track active lambda variables during the AST walk and canonicalize them to their relation paths.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def collect_referenced_symbols(ast) -> ReferencedSymbols: | ||
| """Walk a parsed CEL AST and return the symbols it references. | ||
|
|
||
| Args: | ||
| ast: a node produced by ``cel_parser.parse``. | ||
|
|
||
| Returns: | ||
| ReferencedSymbols with: | ||
| - fields: dotted field paths (root ``r``/``me`` stripped), and bare | ||
| identifiers used as fields. | ||
| - functions: names of plain ``name(...)`` calls (excluding ``metric``). | ||
| - methods: flattened names of ``a.b(...)`` calls that are not DCI | ||
| metric accessors (relation methods like | ||
| ``enrollments.exists``, namespaced calls, ...). | ||
| - metrics: canonical DCI metric names (``r.dci.…``) and the decoded | ||
| names of ``metric('…', …)`` calls; ``DYNAMIC_METRIC`` when | ||
| a metric name cannot be determined statically. | ||
| """ | ||
| out = ReferencedSymbols() | ||
| _walk(ast, out) | ||
| return out | ||
|
|
||
|
|
||
| def _flatten_attr(node): | ||
| """Flatten an Attr/Ident chain to a dotted string, or return None. | ||
|
|
||
| ``r.dci.dr.severity`` -> "r.dci.dr.severity"; returns None if the chain is | ||
| not rooted at a plain identifier (e.g. it is built on a call result). | ||
| """ | ||
| parts = [] | ||
| cur = node | ||
| while isinstance(cur, P.Attr): | ||
| parts.append(cur.name) | ||
| cur = cur.obj | ||
| if isinstance(cur, P.Ident): | ||
| parts.append(cur.name) | ||
| return ".".join(reversed(parts)) | ||
| return None | ||
|
|
||
|
|
||
| def _classify_path(path, out): | ||
| """Classify a flattened ``root.a.b`` path as a metric or a field.""" | ||
| head, _, rest = path.partition(".") | ||
| if head in _RECORD_ROOTS: | ||
| if rest: | ||
| first = rest.split(".", 1)[0] | ||
| if first in _DCI_METRIC_ROOTS: | ||
| # Canonicalise me.dci.* to r.dci.* so the allowlist is uniform. | ||
| out.metrics.add("r." + rest) | ||
| else: | ||
| out.fields.add(rest) | ||
| # bare `r` / `me` alone is the record, not a field | ||
| return | ||
| # A dotted path not rooted at the record (e.g. a relation lambda var | ||
| # `m.partner_id.x`): record the whole path so it fails the field allowlist. | ||
| out.fields.add(path) | ||
|
|
||
|
|
||
| def _walk(node, out): # noqa: C901 - explicit per-node dispatch is clearest here | ||
| if node is None: | ||
| return | ||
|
|
||
| if isinstance(node, P.Call): | ||
| _walk_call(node, out) | ||
| return | ||
|
|
||
| if isinstance(node, P.Attr): | ||
| path = _flatten_attr(node) | ||
| if path is not None: | ||
| _classify_path(path, out) | ||
| else: | ||
| # Chain rooted at something other than an identifier (e.g. a call | ||
| # result): walk the base so nested references are still seen. | ||
| _walk(node.obj, out) | ||
| return | ||
|
|
||
| if isinstance(node, P.Ident): | ||
| if node.name not in _RECORD_ROOTS: | ||
| # The translator treats a bare identifier as a field on the root | ||
| # model, so an external predicate could reference a field this way. | ||
| out.fields.add(node.name) | ||
| return | ||
|
|
||
| if isinstance(node, P.Literal): | ||
| return | ||
|
|
||
| # Structural nodes: recurse into every child expression. | ||
| for child in _children(node): | ||
| _walk(child, out) | ||
|
|
||
|
|
||
| def _walk_call(node, out): | ||
| func = node.func | ||
| if isinstance(func, P.Ident): | ||
| if func.name == "metric": | ||
| out.metrics.add(_metric_name(node)) | ||
| # Skip the name literal; still walk the remaining args/kwargs. | ||
| for arg in node.args[1:]: | ||
| _walk(arg, out) | ||
| for v in (node.kwargs or {}).values(): | ||
| _walk(v, out) | ||
| return | ||
| out.functions.add(func.name) | ||
| elif isinstance(func, P.Attr): | ||
| flat = _flatten_attr(func) | ||
| if flat and _is_dci_metric_path(flat): | ||
| head, _, rest = flat.partition(".") | ||
| out.metrics.add("r." + rest) | ||
| else: | ||
| # Relation method (enrollments.exists), namespaced call, etc. | ||
| out.methods.add(flat if flat is not None else "<dynamic-method>") | ||
| if flat is None: | ||
| _walk(func.obj, out) | ||
| # Walk arguments and keyword-argument values of any call. | ||
| for arg in node.args: | ||
| _walk(arg, out) | ||
| for v in (node.kwargs or {}).values(): | ||
| _walk(v, out) | ||
|
|
There was a problem hiding this comment.
The current implementation of collect_referenced_symbols does not track or resolve relation lambda variables (e.g., m in enrollments.exists(m, m.partner_id.x)). Instead, it classifies any dotted path not rooted at r or me as a literal field path (e.g., m.partner_id.x).
This makes the allow_relations: True policy option extremely fragile and practically unusable. If a client uses an arbitrary lambda variable name (like x instead of m), the query will be rejected unless the exact arbitrary name is allowlisted. Conversely, if they use m, it might be allowed only if the server specifically allowlisted m.field, which is highly fragile.
To make this robust, we can track active lambda variables during the AST walk and canonicalize them to their relation paths (e.g., mapping m.partner_id.x to enrollments.partner_id.x). This allows the policy to specify exactly which relation fields are allowed (e.g., enrollments.partner_id.name) regardless of the arbitrary variable name chosen by the client.
def collect_referenced_symbols(ast) -> ReferencedSymbols:
"""Walk a parsed CEL AST and return the symbols it references.
Args:
ast: a node produced by ``cel_parser.parse``.
Returns:
ReferencedSymbols with:
- fields: dotted field paths (root ``r``/``me`` stripped), and bare
identifiers used as fields.
- functions: names of plain ``name(...)`` calls (excluding ``metric``).
- methods: flattened names of ``a.b(...)`` calls that are not DCI
metric accessors (relation methods like
``enrollments.exists``, namespaced calls, ...).
- metrics: canonical DCI metric names (``r.dci.…``) and the decoded
names of ``metric('…', …)`` calls; ``DYNAMIC_METRIC`` when
a metric name cannot be determined statically.
"""
out = ReferencedSymbols()
_walk(ast, out, {})
return out
def _flatten_attr(node):
"""Flatten an Attr/Ident chain to a dotted string, or return None.
``r.dci.dr.severity`` -> "r.dci.dr.severity"; returns None if the chain is
not rooted at a plain identifier (e.g. it is built on a call result).
"""
parts = []
cur = node
while isinstance(cur, P.Attr):
parts.append(cur.name)
cur = cur.obj
if isinstance(cur, P.Ident):
parts.append(cur.name)
return ".".join(reversed(parts))
return None
def _classify_path(path, out, lambda_vars):
"""Classify a flattened ``root.a.b`` path as a metric or a field."""
head, _, rest = path.partition(".")
if head in lambda_vars:
relation_path = lambda_vars[head]
canonical_path = f"{relation_path}.{rest}" if rest else relation_path
out.fields.add(canonical_path)
return
if head in _RECORD_ROOTS:
if rest:
first = rest.split(".", 1)[0]
if first in _DCI_METRIC_ROOTS:
# Canonicalise me.dci.* to r.dci.* so the allowlist is uniform.
out.metrics.add("r." + rest)
else:
out.fields.add(rest)
# bare `r` / `me` alone is the record, not a field
return
# A dotted path not rooted at the record (e.g. a relation lambda var
# `m.partner_id.x`): record the whole path so it fails the field allowlist.
out.fields.add(path)
def _walk(node, out, lambda_vars): # noqa: C901 - explicit per-node dispatch is clearest here
if node is None:
return
if isinstance(node, P.Call):
_walk_call(node, out, lambda_vars)
return
if isinstance(node, P.Attr):
path = _flatten_attr(node)
if path is not None:
_classify_path(path, out, lambda_vars)
else:
# Chain rooted at something other than an identifier (e.g. a call
# result): walk the base so nested references are still seen.
_walk(node.obj, out, lambda_vars)
return
if isinstance(node, P.Ident):
if node.name not in _RECORD_ROOTS:
if node.name in lambda_vars:
out.fields.add(lambda_vars[node.name])
else:
# The translator treats a bare identifier as a field on the root
# model, so an external predicate could reference a field this way.
out.fields.add(node.name)
return
if isinstance(node, P.Literal):
return
# Structural nodes: recurse into every child expression.
for child in _children(node):
_walk(child, out, lambda_vars)
def _walk_call(node, out, lambda_vars):
func = node.func
if isinstance(func, P.Ident):
if func.name == "metric":
out.metrics.add(_metric_name(node))
# Skip the name literal; still walk the remaining args/kwargs.
for arg in node.args[1:]:
_walk(arg, out, lambda_vars)
for v in (node.kwargs or {}).values():
_walk(v, out, lambda_vars)
return
out.functions.add(func.name)
elif isinstance(func, P.Attr):
flat = _flatten_attr(func)
if flat and _is_dci_metric_path(flat):
head, _, rest = flat.partition(".")
out.metrics.add("r." + rest)
else:
# Relation method (enrollments.exists), namespaced call, etc.
out.methods.add(flat if flat is not None else "<dynamic-method>")
# Check if this is a relation macro call that binds a lambda variable
if func.name in ("exists", "exists_one", "all", "map", "filter") and len(node.args) >= 2:
relation_path = _flatten_attr(func.obj)
if relation_path and isinstance(node.args[0], P.Ident):
lambda_var = node.args[0].name
new_lambda_vars = dict(lambda_vars)
new_lambda_vars[lambda_var] = relation_path
_walk(node.args[0], out, lambda_vars)
for arg in node.args[1:]:
_walk(arg, out, new_lambda_vars)
for v in (node.kwargs or {}).values():
_walk(v, out, new_lambda_vars)
return
if flat is None:
_walk(func.obj, out, lambda_vars)
# Walk arguments and keyword-argument values of any call.
for arg in node.args:
_walk(arg, out, lambda_vars)
for v in (node.kwargs or {}).values():
_walk(v, out, lambda_vars)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 19.0 #251 +/- ##
==========================================
+ Coverage 74.71% 74.75% +0.04%
==========================================
Files 1089 1091 +2
Lines 64699 64814 +115
==========================================
+ Hits 48341 48453 +112
- Misses 16358 16361 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
The first commit guarded only query_type="predicate". The structured query_type="expression" path (_parse_expression -> _condition_to_domain) is the same caller-supplied oracle: _condition_to_domain passed any unknown attribute straight through as a raw Odoo field, so a sender could filter on disability_severity_id / has_disability without any CEL and read the value via total_count/pagination. Apply the same positive, default-deny field allowlist at _condition_to_domain (the single chokepoint for both seq and or conditions). The resolved Odoo field is checked, so DCI attribute aliases (surname, sex, ...) that map onto allowed fields still work; idtype-value is unaffected (it only touches reg_ids). Tests: _condition_to_domain rejects unknown/sensitive attributes; e2e expression search on disability_severity_id returns rjct.filter.invalid. spp_dci_server_social 74/74.
Summary
External DCI Social Registry searches accept sender-supplied filters and compile them against
res.partnerwith no restriction on which symbols/attributes a sender may reference. Because a search returnstotal_countand pageable matches, this turned the endpoint into an oracle for sensitive data (disability severity, vital/civil status) one query at a time — even though those values never appear in the response body.The prior fix (#235) denylisted sensitive DCI metric accessors via a regex over the raw predicate text. This PR replaces that with a positive, default-deny allowlist, and closes the same oracle across both sender-controlled query paths.
Why the denylist was insufficient
\.→., sometric('r\.dci\.dr\.severity', me, arg='Vision') == 2evaded the regex but still resolved to the metric.r.disability_severity_id == 5,r.has_disability == true;enrollments.exists(m, m.partner_id.disability_severity_id == 5);expressionquery type —{"seq":[{"attribute":"disability_severity_id","operator":"=","value":5}]}, which_condition_to_domainpassed straight through as a raw field.The fix
A positive, default-deny allowlist applied to both external query paths.
1. CEL
predicatepath — checked on the resolved CEL AST (robust to source obfuscation and to domain rewriting like_smart_op_domain), gated insidecompile_expressionafter variable resolution but before translation/execution, so the check operates on the exact expression that runs (no time-of-check/time-of-use gap).spp_cel_domain/services/cel_predicate_guard.py—collect_referenced_symbolswalks the AST and classifies field paths, function calls, relation/namespaced methods, and DCI metric accessors. DCI accessors are recognised both as expandedmetric('r.dci.…', me)calls and as raw dotted accessors, so the guard is independent of which indicator variables are seeded.spp_cel_domain/models/cel_service.py—compile_expression(…, predicate_policy=None)+_enforce_predicate_policy. Keyword-only, defaultNone→ all other callers (internal eligibility/scoring CEL that legitimately usesr.dci.*accessors) are unaffected.2. Structured
expressionpath —spp_dci_server_social_condition_to_domainnow rejects any attribute whose resolved Odoo field is not in the allowlist (the single chokepoint for bothseqandorconditions).Allowlist (
spp_dci_server_social): person fields (name,given_name,family_name,birthdate,gender,phone,email,city,state_id.name,country_id.code), safe scalar functions (age_years,date,today,now), andr.dci.sr.*/r.dci.ibr.*metrics.idtype-valueis unaffected (it only touchesreg_ids).Scope / non-impact
spp_dci_indicators. The DCI metric accessors, fetchers, resolver, and remote-server calls are untouched. This PR only restricts what an inbound external sender can filter on; OpenSPP's own outbound CEL (program eligibility/scoring referencingr.dci.dr.severity, which fetches from remote DCI servers) is unaffected.Behaviour change (fail-closed)
Disability/CRVS fields & metrics, relation traversal, disability CEL functions, and any other partner field or non-DCI metric are now denied in external searches. A field a legitimate sender needs is a one-line allowlist addition — never a silent leak.
Tests
spp_cel_domain: AST walker units +_enforce_predicate_policyallow/deny units.spp_dci_server_social: regressions for escaped-dot metric bypass, plain partner-field leak (predicate path), disability functions, relation traversal, and theexpression-type disability filter; plus the allow-path for benign person predicates and SR/IBR metrics.All green:
spp_dci_server_social74/74 ·spp_cel_domain616/616 ·spp_dci_indicators81/81. Lint-clean (ruff/ruff-format/semgrep/bandit/OpenSPP hooks).