Skip to content

security(dci): allowlist external predicate symbols (fields + metrics)#251

Merged
gonzalesedwin1123 merged 2 commits into
19.0from
security-fixes
Jun 29, 2026
Merged

security(dci): allowlist external predicate symbols (fields + metrics)#251
gonzalesedwin1123 merged 2 commits into
19.0from
security-fixes

Conversation

@gonzalesedwin1123

@gonzalesedwin1123 gonzalesedwin1123 commented Jun 29, 2026

Copy link
Copy Markdown
Member

Summary

External DCI Social Registry searches accept sender-supplied filters and compile them against res.partner with no restriction on which symbols/attributes a sender may reference. Because a search returns total_count and pageable matches, this turned the endpoint into an oracle for sensitive data (disability severity, vital/civil status) one query at a time — even though those values never appear in the response body.

The prior fix (#235) denylisted sensitive DCI metric accessors via a regex over the raw predicate text. This PR replaces that with a positive, default-deny allowlist, and closes the same oracle across both sender-controlled query paths.

Why the denylist was insufficient

  • Bypassable. The CEL lexer decodes \.., so metric('r\.dci\.dr\.severity', me, arg='Vision') == 2 evaded the regex but still resolved to the metric.
  • Fail-open. Every new sensitive metric had to be remembered in a denylist living in a different, optional module.
  • Predicate-only, and metric-only. The same disability data is reachable without any CEL metric:
    • plain partner field paths — r.disability_severity_id == 5, r.has_disability == true;
    • relation traversal — enrollments.exists(m, m.partner_id.disability_severity_id == 5);
    • the structured expression query type — {"seq":[{"attribute":"disability_severity_id","operator":"=","value":5}]}, which _condition_to_domain passed straight through as a raw field.

The fix

A positive, default-deny allowlist applied to both external query paths.

1. CEL predicate path — checked on the resolved CEL AST (robust to source obfuscation and to domain rewriting like _smart_op_domain), gated inside compile_expression after variable resolution but before translation/execution, so the check operates on the exact expression that runs (no time-of-check/time-of-use gap).

  • spp_cel_domain/services/cel_predicate_guard.pycollect_referenced_symbols walks the AST and classifies field paths, function calls, relation/namespaced methods, and DCI metric accessors. DCI accessors are recognised both as expanded metric('r.dci.…', me) calls and as raw dotted accessors, so the guard is independent of which indicator variables are seeded.
  • spp_cel_domain/models/cel_service.pycompile_expression(…, predicate_policy=None) + _enforce_predicate_policy. Keyword-only, default None → all other callers (internal eligibility/scoring CEL that legitimately uses r.dci.* accessors) are unaffected.

2. Structured expression pathspp_dci_server_social _condition_to_domain now rejects any attribute whose resolved Odoo field is not in the allowlist (the single chokepoint for both seq and or conditions).

Allowlist (spp_dci_server_social): person fields (name, given_name, family_name, birthdate, gender, phone, email, city, state_id.name, country_id.code), safe scalar functions (age_years, date, today, now), and r.dci.sr.* / r.dci.ibr.* metrics. idtype-value is unaffected (it only touches reg_ids).

Scope / non-impact

  • No change to spp_dci_indicators. The DCI metric accessors, fetchers, resolver, and remote-server calls are untouched. This PR only restricts what an inbound external sender can filter on; OpenSPP's own outbound CEL (program eligibility/scoring referencing r.dci.dr.severity, which fetches from remote DCI servers) is unaffected.

Behaviour change (fail-closed)

Disability/CRVS fields & metrics, relation traversal, disability CEL functions, and any other partner field or non-DCI metric are now denied in external searches. A field a legitimate sender needs is a one-line allowlist addition — never a silent leak.

Tests

  • spp_cel_domain: AST walker units + _enforce_predicate_policy allow/deny units.
  • spp_dci_server_social: regressions for escaped-dot metric bypass, plain partner-field leak (predicate path), disability functions, relation traversal, and the expression-type disability filter; plus the allow-path for benign person predicates and SR/IBR metrics.

All green: spp_dci_server_social 74/74 · spp_cel_domain 616/616 · spp_dci_indicators 81/81. Lint-clean (ruff/ruff-format/semgrep/bandit/OpenSPP hooks).

External DCI Social Registry predicate searches compiled sender-supplied CEL
against res.partner with no restriction on which symbols a predicate may
reference. Because a predicate returns total_count and pageable matches, this
turned the search into an oracle for sensitive data one query at a time.

The prior fix (#235) denylisted sensitive DCI metric accessors via a regex over
the raw predicate text. That was incomplete:
- Bypassable: the CEL lexer decodes \. to ., so metric('r\.dci\.dr\.severity',
  me, arg='Vision') evaded the regex but still resolved to the metric.
- Fail-open: new sensitive metrics had to be remembered in a separate module.
- It only covered metrics. The same disability data is reachable via plain
  partner field paths (r.disability_severity_id == 5) and via relation
  traversal (enrollments.exists(m, m.partner_id.disability_severity_id == 5)),
  neither of which is a metric.

Replace the denylist with a positive, default-deny allowlist enforced on the
parsed/resolved CEL AST (robust to source obfuscation and domain rewriting),
gated inside compile_expression before translation/execution so the check
cannot diverge from what runs:
- spp_cel_domain: cel_predicate_guard.collect_referenced_symbols walks the AST
  for field paths, functions, relation methods and DCI metric accessors;
  CELService.compile_expression gains a predicate_policy arg enforced via
  _enforce_predicate_policy.
- spp_dci_server_social: external predicates may reference only allowlisted
  person fields, safe scalar functions (age_years/date/today/now) and
  r.dci.sr.*/r.dci.ibr.* metrics; disability/CRVS fields & metrics, relations
  and arbitrary partner fields are denied.

Tests: AST walker + enforcement units, escaped-dot/field-path/relation/function
regressions. spp_dci_server_social 72/72, spp_cel_domain 616/616,
spp_dci_indicators 81/81.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the regex-based denylist for externally-supplied CEL predicates with a robust, AST-based positive allowlist guard (cel_predicate_guard.py). This guard is integrated into the CEL service compilation flow and applied to the DCI Social Registry search service to prevent sensitive fields or metrics from being used as a side-channel oracle. Feedback on this PR highlights that the current AST symbol collector does not track or resolve relation lambda variables (e.g., m in enrollments.exists(m, ...)), which makes the allow_relations: True policy option fragile. A detailed code suggestion is provided to track active lambda variables during the AST walk and canonicalize them to their relation paths.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +51 to +169
def collect_referenced_symbols(ast) -> ReferencedSymbols:
"""Walk a parsed CEL AST and return the symbols it references.

Args:
ast: a node produced by ``cel_parser.parse``.

Returns:
ReferencedSymbols with:
- fields: dotted field paths (root ``r``/``me`` stripped), and bare
identifiers used as fields.
- functions: names of plain ``name(...)`` calls (excluding ``metric``).
- methods: flattened names of ``a.b(...)`` calls that are not DCI
metric accessors (relation methods like
``enrollments.exists``, namespaced calls, ...).
- metrics: canonical DCI metric names (``r.dci.…``) and the decoded
names of ``metric('…', …)`` calls; ``DYNAMIC_METRIC`` when
a metric name cannot be determined statically.
"""
out = ReferencedSymbols()
_walk(ast, out)
return out


def _flatten_attr(node):
"""Flatten an Attr/Ident chain to a dotted string, or return None.

``r.dci.dr.severity`` -> "r.dci.dr.severity"; returns None if the chain is
not rooted at a plain identifier (e.g. it is built on a call result).
"""
parts = []
cur = node
while isinstance(cur, P.Attr):
parts.append(cur.name)
cur = cur.obj
if isinstance(cur, P.Ident):
parts.append(cur.name)
return ".".join(reversed(parts))
return None


def _classify_path(path, out):
"""Classify a flattened ``root.a.b`` path as a metric or a field."""
head, _, rest = path.partition(".")
if head in _RECORD_ROOTS:
if rest:
first = rest.split(".", 1)[0]
if first in _DCI_METRIC_ROOTS:
# Canonicalise me.dci.* to r.dci.* so the allowlist is uniform.
out.metrics.add("r." + rest)
else:
out.fields.add(rest)
# bare `r` / `me` alone is the record, not a field
return
# A dotted path not rooted at the record (e.g. a relation lambda var
# `m.partner_id.x`): record the whole path so it fails the field allowlist.
out.fields.add(path)


def _walk(node, out): # noqa: C901 - explicit per-node dispatch is clearest here
if node is None:
return

if isinstance(node, P.Call):
_walk_call(node, out)
return

if isinstance(node, P.Attr):
path = _flatten_attr(node)
if path is not None:
_classify_path(path, out)
else:
# Chain rooted at something other than an identifier (e.g. a call
# result): walk the base so nested references are still seen.
_walk(node.obj, out)
return

if isinstance(node, P.Ident):
if node.name not in _RECORD_ROOTS:
# The translator treats a bare identifier as a field on the root
# model, so an external predicate could reference a field this way.
out.fields.add(node.name)
return

if isinstance(node, P.Literal):
return

# Structural nodes: recurse into every child expression.
for child in _children(node):
_walk(child, out)


def _walk_call(node, out):
func = node.func
if isinstance(func, P.Ident):
if func.name == "metric":
out.metrics.add(_metric_name(node))
# Skip the name literal; still walk the remaining args/kwargs.
for arg in node.args[1:]:
_walk(arg, out)
for v in (node.kwargs or {}).values():
_walk(v, out)
return
out.functions.add(func.name)
elif isinstance(func, P.Attr):
flat = _flatten_attr(func)
if flat and _is_dci_metric_path(flat):
head, _, rest = flat.partition(".")
out.metrics.add("r." + rest)
else:
# Relation method (enrollments.exists), namespaced call, etc.
out.methods.add(flat if flat is not None else "<dynamic-method>")
if flat is None:
_walk(func.obj, out)
# Walk arguments and keyword-argument values of any call.
for arg in node.args:
_walk(arg, out)
for v in (node.kwargs or {}).values():
_walk(v, out)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation of collect_referenced_symbols does not track or resolve relation lambda variables (e.g., m in enrollments.exists(m, m.partner_id.x)). Instead, it classifies any dotted path not rooted at r or me as a literal field path (e.g., m.partner_id.x).

This makes the allow_relations: True policy option extremely fragile and practically unusable. If a client uses an arbitrary lambda variable name (like x instead of m), the query will be rejected unless the exact arbitrary name is allowlisted. Conversely, if they use m, it might be allowed only if the server specifically allowlisted m.field, which is highly fragile.

To make this robust, we can track active lambda variables during the AST walk and canonicalize them to their relation paths (e.g., mapping m.partner_id.x to enrollments.partner_id.x). This allows the policy to specify exactly which relation fields are allowed (e.g., enrollments.partner_id.name) regardless of the arbitrary variable name chosen by the client.

def collect_referenced_symbols(ast) -> ReferencedSymbols:
    """Walk a parsed CEL AST and return the symbols it references.

    Args:
        ast: a node produced by ``cel_parser.parse``.

    Returns:
        ReferencedSymbols with:
          - fields:    dotted field paths (root ``r``/``me`` stripped), and bare
                       identifiers used as fields.
          - functions: names of plain ``name(...)`` calls (excluding ``metric``).
          - methods:   flattened names of ``a.b(...)`` calls that are not DCI
                       metric accessors (relation methods like
                       ``enrollments.exists``, namespaced calls, ...).
          - metrics:   canonical DCI metric names (``r.dci.…``) and the decoded
                       names of ``metric('…', …)`` calls; ``DYNAMIC_METRIC`` when
                       a metric name cannot be determined statically.
    """
    out = ReferencedSymbols()
    _walk(ast, out, {})
    return out


def _flatten_attr(node):
    """Flatten an Attr/Ident chain to a dotted string, or return None.

    ``r.dci.dr.severity`` -> "r.dci.dr.severity"; returns None if the chain is
    not rooted at a plain identifier (e.g. it is built on a call result).
    """
    parts = []
    cur = node
    while isinstance(cur, P.Attr):
        parts.append(cur.name)
        cur = cur.obj
    if isinstance(cur, P.Ident):
        parts.append(cur.name)
        return ".".join(reversed(parts))
    return None


def _classify_path(path, out, lambda_vars):
    """Classify a flattened ``root.a.b`` path as a metric or a field."""
    head, _, rest = path.partition(".")
    if head in lambda_vars:
        relation_path = lambda_vars[head]
        canonical_path = f"{relation_path}.{rest}" if rest else relation_path
        out.fields.add(canonical_path)
        return

    if head in _RECORD_ROOTS:
        if rest:
            first = rest.split(".", 1)[0]
            if first in _DCI_METRIC_ROOTS:
                # Canonicalise me.dci.* to r.dci.* so the allowlist is uniform.
                out.metrics.add("r." + rest)
            else:
                out.fields.add(rest)
        # bare `r` / `me` alone is the record, not a field
        return
    # A dotted path not rooted at the record (e.g. a relation lambda var
    # `m.partner_id.x`): record the whole path so it fails the field allowlist.
    out.fields.add(path)


def _walk(node, out, lambda_vars):  # noqa: C901 - explicit per-node dispatch is clearest here
    if node is None:
        return

    if isinstance(node, P.Call):
        _walk_call(node, out, lambda_vars)
        return

    if isinstance(node, P.Attr):
        path = _flatten_attr(node)
        if path is not None:
            _classify_path(path, out, lambda_vars)
        else:
            # Chain rooted at something other than an identifier (e.g. a call
            # result): walk the base so nested references are still seen.
            _walk(node.obj, out, lambda_vars)
        return

    if isinstance(node, P.Ident):
        if node.name not in _RECORD_ROOTS:
            if node.name in lambda_vars:
                out.fields.add(lambda_vars[node.name])
            else:
                # The translator treats a bare identifier as a field on the root
                # model, so an external predicate could reference a field this way.
                out.fields.add(node.name)
        return

    if isinstance(node, P.Literal):
        return

    # Structural nodes: recurse into every child expression.
    for child in _children(node):
        _walk(child, out, lambda_vars)


def _walk_call(node, out, lambda_vars):
    func = node.func
    if isinstance(func, P.Ident):
        if func.name == "metric":
            out.metrics.add(_metric_name(node))
            # Skip the name literal; still walk the remaining args/kwargs.
            for arg in node.args[1:]:
                _walk(arg, out, lambda_vars)
            for v in (node.kwargs or {}).values():
                _walk(v, out, lambda_vars)
            return
        out.functions.add(func.name)
    elif isinstance(func, P.Attr):
        flat = _flatten_attr(func)
        if flat and _is_dci_metric_path(flat):
            head, _, rest = flat.partition(".")
            out.metrics.add("r." + rest)
        else:
            # Relation method (enrollments.exists), namespaced call, etc.
            out.methods.add(flat if flat is not None else "<dynamic-method>")
            
            # Check if this is a relation macro call that binds a lambda variable
            if func.name in ("exists", "exists_one", "all", "map", "filter") and len(node.args) >= 2:
                relation_path = _flatten_attr(func.obj)
                if relation_path and isinstance(node.args[0], P.Ident):
                    lambda_var = node.args[0].name
                    new_lambda_vars = dict(lambda_vars)
                    new_lambda_vars[lambda_var] = relation_path
                    
                    _walk(node.args[0], out, lambda_vars)
                    for arg in node.args[1:]:
                        _walk(arg, out, new_lambda_vars)
                    for v in (node.kwargs or {}).values():
                        _walk(v, out, new_lambda_vars)
                    return

            if flat is None:
                _walk(func.obj, out, lambda_vars)
    # Walk arguments and keyword-argument values of any call.
    for arg in node.args:
        _walk(arg, out, lambda_vars)
    for v in (node.kwargs or {}).values():
        _walk(v, out, lambda_vars)

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.30894% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.75%. Comparing base (0400977) to head (2fc6521).

Files with missing lines Patch % Lines
spp_cel_domain/services/cel_predicate_guard.py 93.54% 6 Missing ⚠️
spp_cel_domain/models/cel_service.py 95.45% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             19.0     #251      +/-   ##
==========================================
+ Coverage   74.71%   74.75%   +0.04%     
==========================================
  Files        1089     1091       +2     
  Lines       64699    64814     +115     
==========================================
+ Hits        48341    48453     +112     
- Misses      16358    16361       +3     
Flag Coverage Δ
spp_analytics 93.13% <ø> (ø)
spp_api_v2_change_request 66.85% <ø> (ø)
spp_api_v2_cycles 71.12% <ø> (ø)
spp_api_v2_data 64.41% <ø> (ø)
spp_api_v2_entitlements 70.19% <ø> (ø)
spp_api_v2_gis 84.27% <ø> (ø)
spp_api_v2_programs 92.03% <ø> (ø)
spp_api_v2_simulation 71.12% <ø> (ø)
spp_approval 50.29% <ø> (ø)
spp_audit_programs 0.00% <ø> (?)
spp_base_common 90.26% <ø> (ø)
spp_case_cel 89.01% <ø> (ø)
spp_case_entitlements 97.61% <ø> (ø)
spp_case_programs 97.14% <ø> (ø)
spp_cel_domain 62.78% <93.91%> (+0.84%) ⬆️
spp_cel_event 85.23% <ø> (ø)
spp_dci_server_social 88.44% <100.00%> (-0.03%) ⬇️
spp_programs 65.12% <ø> (ø)
spp_registry 86.83% <ø> (ø)
spp_security 66.66% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
spp_dci_server_social/services/search_service.py 90.32% <100.00%> (-0.04%) ⬇️
spp_cel_domain/models/cel_service.py 42.47% <95.45%> (+3.33%) ⬆️
spp_cel_domain/services/cel_predicate_guard.py 93.54% <93.54%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The first commit guarded only query_type="predicate". The structured
query_type="expression" path (_parse_expression -> _condition_to_domain) is the
same caller-supplied oracle: _condition_to_domain passed any unknown attribute
straight through as a raw Odoo field, so a sender could filter on
disability_severity_id / has_disability without any CEL and read the value via
total_count/pagination.

Apply the same positive, default-deny field allowlist at _condition_to_domain
(the single chokepoint for both seq and or conditions). The resolved Odoo field
is checked, so DCI attribute aliases (surname, sex, ...) that map onto allowed
fields still work; idtype-value is unaffected (it only touches reg_ids).

Tests: _condition_to_domain rejects unknown/sensitive attributes; e2e
expression search on disability_severity_id returns rjct.filter.invalid.
spp_dci_server_social 74/74.
@gonzalesedwin1123 gonzalesedwin1123 merged commit f04d936 into 19.0 Jun 29, 2026
35 checks passed
@gonzalesedwin1123 gonzalesedwin1123 deleted the security-fixes branch June 29, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant