security(dci): allowlist external predicate symbols (fields + metrics) by gonzalesedwin1123 · Pull Request #251 · OpenSPP/OpenSPP2

gonzalesedwin1123 · 2026-06-29T03:55:28Z

Summary

External DCI Social Registry searches accept sender-supplied filters and compile them against res.partner with no restriction on which symbols/attributes a sender may reference. Because a search returns total_count and pageable matches, this turned the endpoint into an oracle for sensitive data (disability severity, vital/civil status) one query at a time — even though those values never appear in the response body.

The prior fix (#235) denylisted sensitive DCI metric accessors via a regex over the raw predicate text. This PR replaces that with a positive, default-deny allowlist, and closes the same oracle across both sender-controlled query paths.

Why the denylist was insufficient

Bypassable. The CEL lexer decodes \. → ., so metric('r\.dci\.dr\.severity', me, arg='Vision') == 2 evaded the regex but still resolved to the metric.
Fail-open. Every new sensitive metric had to be remembered in a denylist living in a different, optional module.
Predicate-only, and metric-only. The same disability data is reachable without any CEL metric:
- plain partner field paths — r.disability_severity_id == 5, r.has_disability == true;
- relation traversal — enrollments.exists(m, m.partner_id.disability_severity_id == 5);
- the structured expression query type — {"seq":[{"attribute":"disability_severity_id","operator":"=","value":5}]}, which _condition_to_domain passed straight through as a raw field.

The fix

A positive, default-deny allowlist applied to both external query paths.

1. CEL predicate path — checked on the resolved CEL AST (robust to source obfuscation and to domain rewriting like _smart_op_domain), gated inside compile_expression after variable resolution but before translation/execution, so the check operates on the exact expression that runs (no time-of-check/time-of-use gap).

spp_cel_domain/services/cel_predicate_guard.py — collect_referenced_symbols walks the AST and classifies field paths, function calls, relation/namespaced methods, and DCI metric accessors. DCI accessors are recognised both as expanded metric('r.dci.…', me) calls and as raw dotted accessors, so the guard is independent of which indicator variables are seeded.
spp_cel_domain/models/cel_service.py — compile_expression(…, predicate_policy=None) + _enforce_predicate_policy. Keyword-only, default None → all other callers (internal eligibility/scoring CEL that legitimately uses r.dci.* accessors) are unaffected.

2. Structured expression path — spp_dci_server_social _condition_to_domain now rejects any attribute whose resolved Odoo field is not in the allowlist (the single chokepoint for both seq and or conditions).

Allowlist (spp_dci_server_social): person fields (name, given_name, family_name, birthdate, gender, phone, email, city, state_id.name, country_id.code), safe scalar functions (age_years, date, today, now), and r.dci.sr.* / r.dci.ibr.* metrics. idtype-value is unaffected (it only touches reg_ids).

Scope / non-impact

No change to spp_dci_indicators. The DCI metric accessors, fetchers, resolver, and remote-server calls are untouched. This PR only restricts what an inbound external sender can filter on; OpenSPP's own outbound CEL (program eligibility/scoring referencing r.dci.dr.severity, which fetches from remote DCI servers) is unaffected.

Behaviour change (fail-closed)

Disability/CRVS fields & metrics, relation traversal, disability CEL functions, and any other partner field or non-DCI metric are now denied in external searches. A field a legitimate sender needs is a one-line allowlist addition — never a silent leak.

Tests

spp_cel_domain: AST walker units + _enforce_predicate_policy allow/deny units.
spp_dci_server_social: regressions for escaped-dot metric bypass, plain partner-field leak (predicate path), disability functions, relation traversal, and the expression-type disability filter; plus the allow-path for benign person predicates and SR/IBR metrics.

All green: spp_dci_server_social 74/74 · spp_cel_domain 616/616 · spp_dci_indicators 81/81. Lint-clean (ruff/ruff-format/semgrep/bandit/OpenSPP hooks).

External DCI Social Registry predicate searches compiled sender-supplied CEL against res.partner with no restriction on which symbols a predicate may reference. Because a predicate returns total_count and pageable matches, this turned the search into an oracle for sensitive data one query at a time. The prior fix (#235) denylisted sensitive DCI metric accessors via a regex over the raw predicate text. That was incomplete: - Bypassable: the CEL lexer decodes \. to ., so metric('r\.dci\.dr\.severity', me, arg='Vision') evaded the regex but still resolved to the metric. - Fail-open: new sensitive metrics had to be remembered in a separate module. - It only covered metrics. The same disability data is reachable via plain partner field paths (r.disability_severity_id == 5) and via relation traversal (enrollments.exists(m, m.partner_id.disability_severity_id == 5)), neither of which is a metric. Replace the denylist with a positive, default-deny allowlist enforced on the parsed/resolved CEL AST (robust to source obfuscation and domain rewriting), gated inside compile_expression before translation/execution so the check cannot diverge from what runs: - spp_cel_domain: cel_predicate_guard.collect_referenced_symbols walks the AST for field paths, functions, relation methods and DCI metric accessors; CELService.compile_expression gains a predicate_policy arg enforced via _enforce_predicate_policy. - spp_dci_server_social: external predicates may reference only allowlisted person fields, safe scalar functions (age_years/date/today/now) and r.dci.sr.*/r.dci.ibr.* metrics; disability/CRVS fields & metrics, relations and arbitrary partner fields are denied. Tests: AST walker + enforcement units, escaped-dot/field-path/relation/function regressions. spp_dci_server_social 72/72, spp_cel_domain 616/616, spp_dci_indicators 81/81.

gemini-code-assist

Code Review

This pull request replaces the regex-based denylist for externally-supplied CEL predicates with a robust, AST-based positive allowlist guard (cel_predicate_guard.py). This guard is integrated into the CEL service compilation flow and applied to the DCI Social Registry search service to prevent sensitive fields or metrics from being used as a side-channel oracle. Feedback on this PR highlights that the current AST symbol collector does not track or resolve relation lambda variables (e.g., m in enrollments.exists(m, ...)), which makes the allow_relations: True policy option fragile. A detailed code suggestion is provided to track active lambda variables during the AST walk and canonicalize them to their relation paths.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-29T03:57:25Z

+def collect_referenced_symbols(ast) -> ReferencedSymbols:
+    """Walk a parsed CEL AST and return the symbols it references.
+
+    Args:
+        ast: a node produced by ``cel_parser.parse``.
+
+    Returns:
+        ReferencedSymbols with:
+          - fields:    dotted field paths (root ``r``/``me`` stripped), and bare
+                       identifiers used as fields.
+          - functions: names of plain ``name(...)`` calls (excluding ``metric``).
+          - methods:   flattened names of ``a.b(...)`` calls that are not DCI
+                       metric accessors (relation methods like
+                       ``enrollments.exists``, namespaced calls, ...).
+          - metrics:   canonical DCI metric names (``r.dci.…``) and the decoded
+                       names of ``metric('…', …)`` calls; ``DYNAMIC_METRIC`` when
+                       a metric name cannot be determined statically.
+    """
+    out = ReferencedSymbols()
+    _walk(ast, out)
+    return out
+
+
+def _flatten_attr(node):
+    """Flatten an Attr/Ident chain to a dotted string, or return None.
+
+    ``r.dci.dr.severity`` -> "r.dci.dr.severity"; returns None if the chain is
+    not rooted at a plain identifier (e.g. it is built on a call result).
+    """
+    parts = []
+    cur = node
+    while isinstance(cur, P.Attr):
+        parts.append(cur.name)
+        cur = cur.obj
+    if isinstance(cur, P.Ident):
+        parts.append(cur.name)
+        return ".".join(reversed(parts))
+    return None
+
+
+def _classify_path(path, out):
+    """Classify a flattened ``root.a.b`` path as a metric or a field."""
+    head, _, rest = path.partition(".")
+    if head in _RECORD_ROOTS:
+        if rest:
+            first = rest.split(".", 1)[0]
+            if first in _DCI_METRIC_ROOTS:
+                # Canonicalise me.dci.* to r.dci.* so the allowlist is uniform.
+                out.metrics.add("r." + rest)
+            else:
+                out.fields.add(rest)
+        # bare `r` / `me` alone is the record, not a field
+        return
+    # A dotted path not rooted at the record (e.g. a relation lambda var
+    # `m.partner_id.x`): record the whole path so it fails the field allowlist.
+    out.fields.add(path)
+
+
+def _walk(node, out):  # noqa: C901 - explicit per-node dispatch is clearest here
+    if node is None:
+        return
+
+    if isinstance(node, P.Call):
+        _walk_call(node, out)
+        return
+
+    if isinstance(node, P.Attr):
+        path = _flatten_attr(node)
+        if path is not None:
+            _classify_path(path, out)
+        else:
+            # Chain rooted at something other than an identifier (e.g. a call
+            # result): walk the base so nested references are still seen.
+            _walk(node.obj, out)
+        return
+
+    if isinstance(node, P.Ident):
+        if node.name not in _RECORD_ROOTS:
+            # The translator treats a bare identifier as a field on the root
+            # model, so an external predicate could reference a field this way.
+            out.fields.add(node.name)
+        return
+
+    if isinstance(node, P.Literal):
+        return
+
+    # Structural nodes: recurse into every child expression.
+    for child in _children(node):
+        _walk(child, out)
+
+
+def _walk_call(node, out):
+    func = node.func
+    if isinstance(func, P.Ident):
+        if func.name == "metric":
+            out.metrics.add(_metric_name(node))
+            # Skip the name literal; still walk the remaining args/kwargs.
+            for arg in node.args[1:]:
+                _walk(arg, out)
+            for v in (node.kwargs or {}).values():
+                _walk(v, out)
+            return
+        out.functions.add(func.name)
+    elif isinstance(func, P.Attr):
+        flat = _flatten_attr(func)
+        if flat and _is_dci_metric_path(flat):
+            head, _, rest = flat.partition(".")
+            out.metrics.add("r." + rest)
+        else:
+            # Relation method (enrollments.exists), namespaced call, etc.
+            out.methods.add(flat if flat is not None else "<dynamic-method>")
+            if flat is None:
+                _walk(func.obj, out)
+    # Walk arguments and keyword-argument values of any call.
+    for arg in node.args:
+        _walk(arg, out)
+    for v in (node.kwargs or {}).values():
+        _walk(v, out)
+


The current implementation of collect_referenced_symbols does not track or resolve relation lambda variables (e.g., m in enrollments.exists(m, m.partner_id.x)). Instead, it classifies any dotted path not rooted at r or me as a literal field path (e.g., m.partner_id.x).

This makes the allow_relations: True policy option extremely fragile and practically unusable. If a client uses an arbitrary lambda variable name (like x instead of m), the query will be rejected unless the exact arbitrary name is allowlisted. Conversely, if they use m, it might be allowed only if the server specifically allowlisted m.field, which is highly fragile.

To make this robust, we can track active lambda variables during the AST walk and canonicalize them to their relation paths (e.g., mapping m.partner_id.x to enrollments.partner_id.x). This allows the policy to specify exactly which relation fields are allowed (e.g., enrollments.partner_id.name) regardless of the arbitrary variable name chosen by the client.

def collect_referenced_symbols(ast) -> ReferencedSymbols: """Walk a parsed CEL AST and return the symbols it references. Args: ast: a node produced by ``cel_parser.parse``. Returns: ReferencedSymbols with: - fields: dotted field paths (root ``r``/``me`` stripped), and bare identifiers used as fields. - functions: names of plain ``name(...)`` calls (excluding ``metric``). - methods: flattened names of ``a.b(...)`` calls that are not DCI metric accessors (relation methods like ``enrollments.exists``, namespaced calls, ...). - metrics: canonical DCI metric names (``r.dci.…``) and the decoded names of ``metric('…', …)`` calls; ``DYNAMIC_METRIC`` when a metric name cannot be determined statically. """ out = ReferencedSymbols() _walk(ast, out, {}) return out def _flatten_attr(node): """Flatten an Attr/Ident chain to a dotted string, or return None. ``r.dci.dr.severity`` -> "r.dci.dr.severity"; returns None if the chain is not rooted at a plain identifier (e.g. it is built on a call result). """ parts = [] cur = node while isinstance(cur, P.Attr): parts.append(cur.name) cur = cur.obj if isinstance(cur, P.Ident): parts.append(cur.name) return ".".join(reversed(parts)) return None def _classify_path(path, out, lambda_vars): """Classify a flattened ``root.a.b`` path as a metric or a field.""" head, _, rest = path.partition(".") if head in lambda_vars: relation_path = lambda_vars[head] canonical_path = f"{relation_path}.{rest}" if rest else relation_path out.fields.add(canonical_path) return if head in _RECORD_ROOTS: if rest: first = rest.split(".", 1)[0] if first in _DCI_METRIC_ROOTS: # Canonicalise me.dci.* to r.dci.* so the allowlist is uniform. out.metrics.add("r." + rest) else: out.fields.add(rest) # bare `r` / `me` alone is the record, not a field return # A dotted path not rooted at the record (e.g. a relation lambda var # `m.partner_id.x`): record the whole path so it fails the field allowlist. out.fields.add(path) def _walk(node, out, lambda_vars): # noqa: C901 - explicit per-node dispatch is clearest here if node is None: return if isinstance(node, P.Call): _walk_call(node, out, lambda_vars) return if isinstance(node, P.Attr): path = _flatten_attr(node) if path is not None: _classify_path(path, out, lambda_vars) else: # Chain rooted at something other than an identifier (e.g. a call # result): walk the base so nested references are still seen. _walk(node.obj, out, lambda_vars) return if isinstance(node, P.Ident): if node.name not in _RECORD_ROOTS: if node.name in lambda_vars: out.fields.add(lambda_vars[node.name]) else: # The translator treats a bare identifier as a field on the root # model, so an external predicate could reference a field this way. out.fields.add(node.name) return if isinstance(node, P.Literal): return # Structural nodes: recurse into every child expression. for child in _children(node): _walk(child, out, lambda_vars) def _walk_call(node, out, lambda_vars): func = node.func if isinstance(func, P.Ident): if func.name == "metric": out.metrics.add(_metric_name(node)) # Skip the name literal; still walk the remaining args/kwargs. for arg in node.args[1:]: _walk(arg, out, lambda_vars) for v in (node.kwargs or {}).values(): _walk(v, out, lambda_vars) return out.functions.add(func.name) elif isinstance(func, P.Attr): flat = _flatten_attr(func) if flat and _is_dci_metric_path(flat): head, _, rest = flat.partition(".") out.metrics.add("r." + rest) else: # Relation method (enrollments.exists), namespaced call, etc. out.methods.add(flat if flat is not None else "<dynamic-method>") # Check if this is a relation macro call that binds a lambda variable if func.name in ("exists", "exists_one", "all", "map", "filter") and len(node.args) >= 2: relation_path = _flatten_attr(func.obj) if relation_path and isinstance(node.args[0], P.Ident): lambda_var = node.args[0].name new_lambda_vars = dict(lambda_vars) new_lambda_vars[lambda_var] = relation_path _walk(node.args[0], out, lambda_vars) for arg in node.args[1:]: _walk(arg, out, new_lambda_vars) for v in (node.kwargs or {}).values(): _walk(v, out, new_lambda_vars) return if flat is None: _walk(func.obj, out, lambda_vars) # Walk arguments and keyword-argument values of any call. for arg in node.args: _walk(arg, out, lambda_vars) for v in (node.kwargs or {}).values(): _walk(v, out, lambda_vars)

codecov · 2026-06-29T03:59:12Z

Codecov Report

❌ Patch coverage is 94.30894% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.75%. Comparing base (0400977) to head (2fc6521).

Files with missing lines	Patch %	Lines
spp_cel_domain/services/cel_predicate_guard.py	93.54%	6 Missing ⚠️
spp_cel_domain/models/cel_service.py	95.45%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             19.0     #251      +/-   ##
==========================================
+ Coverage   74.71%   74.75%   +0.04%     
==========================================
  Files        1089     1091       +2     
  Lines       64699    64814     +115     
==========================================
+ Hits        48341    48453     +112     
- Misses      16358    16361       +3

Flag	Coverage Δ
spp_analytics	`93.13% <ø> (ø)`
spp_api_v2_change_request	`66.85% <ø> (ø)`
spp_api_v2_cycles	`71.12% <ø> (ø)`
spp_api_v2_data	`64.41% <ø> (ø)`
spp_api_v2_entitlements	`70.19% <ø> (ø)`
spp_api_v2_gis	`84.27% <ø> (ø)`
spp_api_v2_programs	`92.03% <ø> (ø)`
spp_api_v2_simulation	`71.12% <ø> (ø)`
spp_approval	`50.29% <ø> (ø)`
spp_audit_programs	`0.00% <ø> (?)`
spp_base_common	`90.26% <ø> (ø)`
spp_case_cel	`89.01% <ø> (ø)`
spp_case_entitlements	`97.61% <ø> (ø)`
spp_case_programs	`97.14% <ø> (ø)`
spp_cel_domain	`62.78% <93.91%> (+0.84%)`	⬆️
spp_cel_event	`85.23% <ø> (ø)`
spp_dci_server_social	`88.44% <100.00%> (-0.03%)`	⬇️
spp_programs	`65.12% <ø> (ø)`
spp_registry	`86.83% <ø> (ø)`
spp_security	`66.66% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
spp_dci_server_social/services/search_service.py	`90.32% <100.00%> (-0.04%)`	⬇️
spp_cel_domain/models/cel_service.py	`42.47% <95.45%> (+3.33%)`	⬆️
spp_cel_domain/services/cel_predicate_guard.py	`93.54% <93.54%> (ø)`

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The first commit guarded only query_type="predicate". The structured query_type="expression" path (_parse_expression -> _condition_to_domain) is the same caller-supplied oracle: _condition_to_domain passed any unknown attribute straight through as a raw Odoo field, so a sender could filter on disability_severity_id / has_disability without any CEL and read the value via total_count/pagination. Apply the same positive, default-deny field allowlist at _condition_to_domain (the single chokepoint for both seq and or conditions). The resolved Odoo field is checked, so DCI attribute aliases (surname, sex, ...) that map onto allowed fields still work; idtype-value is unaffected (it only touches reg_ids). Tests: _condition_to_domain rejects unknown/sensitive attributes; e2e expression search on disability_severity_id returns rjct.filter.invalid. spp_dci_server_social 74/74.

gemini-code-assist Bot reviewed Jun 29, 2026

View reviewed changes

gonzalesedwin1123 merged commit f04d936 into 19.0 Jun 29, 2026
35 checks passed

gonzalesedwin1123 deleted the security-fixes branch June 29, 2026 09:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

security(dci): allowlist external predicate symbols (fields + metrics)#251

security(dci): allowlist external predicate symbols (fields + metrics)#251
gonzalesedwin1123 merged 2 commits into
19.0from
security-fixes

gonzalesedwin1123 commented Jun 29, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 29, 2026

Uh oh!

codecov Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

gonzalesedwin1123 commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why the denylist was insufficient

The fix

Scope / non-impact

Behaviour change (fail-closed)

Tests

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gonzalesedwin1123 commented Jun 29, 2026 •

edited

Loading

codecov Bot commented Jun 29, 2026 •

edited

Loading