Add transaction_screening_local template (rules + query on local DuckDB)#80

Open
cafzal wants to merge 11 commits into main from add-transaction-screening-local-template

Conversation

@cafzal

@cafzal cafzal commented Jun 3, 2026

Collaborator

What

A beginner-level, rules-based template that runs entirely on a local DuckDB database, with no Snowflake account or Native App required. It triages a transfer ledger: it classifies structuring and large-sender accounts, flags suspects, and builds an investigation set (suspects plus everyone who transacted directly with one) via a relationship self-join.

This is the local-development counterpart to the Snowflake-backed templates: the same ontology, rules, and query workflow, on an engine you can run anywhere.

Reasoning

  • Rules-based: is_structuring, is_large_sender, is_suspect (OR-chained), near_suspect, under_review.
  • Querying / relationship traversal: network aggregates and a suspect → counterparty self-join over a transfers_to relationship (the local way to answer connectivity, without the graph reasoner).
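
For readers who want to see the shape of the rule chain without the relationalai API, here is a rough plain-Python sketch. It is not the template's code. The 10,000 and 50,000 thresholds come from the Dataset section below; the 9,000 lower bound of the structuring band, the sample rows, the per-transfer reading of the flags, and the reading of near_suspect as "sent a transfer to a suspect" are all assumptions made for illustration.

```python
from collections import defaultdict

# 10,000 and 50,000 are taken from the PR description.
# ASSUMPTION: the 9,000 lower bound of the structuring band is a placeholder.
STRUCTURING_LOW = 9_000
REPORTING_THRESHOLD = 10_000
LARGE_TRANSFER = 50_000

# HYPOTHETICAL ledger rows (src, dst, amount); not the template's CSV.
transfers = [
    ("C2001", "C2002", 9_500),
    ("C2002", "C2001", 9_800),
    ("C1001", "C3001", 60_000),
    ("C3002", "C2001", 1_200),
    ("C3003", "C3004", 10_000),
]

# Index outgoing transfers by sender: the plain-Python stand-in for transfers_to.
sent = defaultdict(list)
for src, dst, amount in transfers:
    sent[src].append((dst, amount))

accounts = {acct for src, dst, _ in transfers for acct in (src, dst)}


def is_structuring(acct):
    # ASSUMPTION: one outgoing transfer inside the band is enough to flag.
    return any(STRUCTURING_LOW <= amt < REPORTING_THRESHOLD for _, amt in sent[acct])


def is_large_sender(acct):
    return any(amt > LARGE_TRANSFER for _, amt in sent[acct])


# is_suspect is the OR of the two classification flags.
suspects = {a for a in accounts if is_structuring(a) or is_large_sender(a)}

# near_suspect: self-join of the transfer relation against the suspect set.
near_suspects = {a for a in accounts if any(dst in suspects for dst, _ in sent[a])}

# under_review: suspect or near_suspect.
under_review = suspects | near_suspects

print(sorted(suspects))
print(sorted(under_review))
```

In this toy ledger C3002 is pulled into the review set only because it sent money to C2001, while C3003 stays out: its single transfer of exactly 10,000 sits on the boundary and matches neither flag.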

Requires relationalai >= 1.13

Local DuckDB execution is enabled by a deployment section:

create_config(
    connections={"local": DuckDBConnection(path=":memory:")},
    default_connection="local",
    deployment={"schema": "main", "auto_deploy": True},
)

The programmatic-config deploy path this template relies on was stabilized in 1.12; 1.13 is the floor.
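
A script that depends on this floor could guard against older installs before building the config. The following is a minimal sketch using only the standard library; `meets_floor` is a hypothetical helper, not part of relationalai, and its parsing is deliberately naive (it compares only the leading numeric components and ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def meets_floor(installed, floor=(1, 13)):
    """Return True if the leading numeric parts of `installed` are >= `floor`.

    HYPOTHETICAL helper for illustration; naive about pre-release tags.
    """
    numbers = re.findall(r"\d+", installed)[: len(floor)]
    return tuple(int(n) for n in numbers) >= floor


try:
    installed = version("relationalai")
except PackageNotFoundError:
    installed = None  # package not installed in this environment

if installed is None:
    print("relationalai is not installed")
elif not meets_floor(installed):
    print(f"relationalai {installed} is below the 1.13 floor")
else:
    print(f"relationalai {installed} meets the 1.13 floor")
```

For anything stricter, a real version parser would be a better choice than this regex-based comparison.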

Files

v1/transaction_screening_local/
├── README.md
├── runbook.md                # analyst paste-test walkthrough
├── pyproject.toml            # relationalai==1.13.0
├── transaction_screening_local.py
└── data/transactions.csv     # 75-transfer / 54-account ledger: structuring ring + large sender in a legitimate-traffic majority

Also regenerates the template index (v1/README.md + root README.md) from the template's front matter via scripts/generate_version_indexes.py.

Dataset

A 75-transfer ledger across 54 accounts with a realistic AML base rate: only 6 of 54 accounts (~11%) flag. The suspicious activity is a small, legible subgraph — C2001–C2005 form a structuring ring (transfers of 9,400–9,900, just under the 10,000 threshold) and C1001 makes one 60,000 transfer — embedded in a legitimate C3xxx payment network (payroll, vendor invoices, retail transfers) that never flags. Four near-miss transfers (8,900; exactly 10,000; 49,000; exactly 50,000) sit on the rule boundaries and deliberately stay clean, exercising the sharp edges of the thresholds.
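
The near-miss amounts pin down how the thresholds must behave: exactly 10,000 stays clean, so the structuring band is strictly below 10,000; exactly 50,000 stays clean, so the large-sender test is strictly above 50,000; and 8,900 stays clean while 9,400 flags, so the band's lower bound lies somewhere between the two. The sketch below checks that reading against the amounts quoted above. The `flags` function and its default lower bound of 9,000 are assumptions for illustration, not the template's rules.

```python
def flags(amount, structuring_low=9_000):
    """Return the rule names a single transfer amount would trigger.

    ASSUMPTION: structuring_low=9_000 is a placeholder; the description only
    implies that the real bound is above 8,900 and at most 9,400.
    """
    hits = []
    if structuring_low <= amount < 10_000:  # strictly under the 10,000 threshold
        hits.append("is_structuring")
    if amount > 50_000:  # strictly over 50,000
        hits.append("is_large_sender")
    return hits


near_misses = [8_900, 10_000, 49_000, 50_000]  # amounts that stay clean
flagged = [9_400, 9_900, 60_000]  # amounts from the ring and the large sender

for amount in near_misses + flagged:
    print(amount, flags(amount))
```

Swapping `<` for `<=` or `>` for `>=` in either comparison would flag one of the boundary transfers, which is the failure those rows are there to catch.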

Verification

  • Script runs end-to-end on relationalai 1.13.0 (in-memory DuckDB): 54 accounts, 75 transfers / 870,000 moved, per-account volume, 6 suspects (C1001 + ring C2001–C2005), the ring visible in the self-join, and an 8-account investigation set. py_compile + ruff clean.
  • Runbook paste-tested on the enriched data: a fresh skill-only agent (no access to the script, isolated to the data file) reproduced all five runbook answers exactly — independently re-deriving the model from the skills and cross-checking against the raw CSV.
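
The headline counts above (accounts, transfers, total moved, per-account volume) can be cross-checked against the raw CSV independently of the model. This is a minimal sketch of such a check, assuming the `id, src, dst, amount` column layout and integer amounts; the inline rows are made up so the example is self-contained, and would be replaced by opening `data/transactions.csv`.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL sample rows using the id, src, dst, amount column layout.
# Replace io.StringIO(SAMPLE) with open("data/transactions.csv") for the real file.
SAMPLE = """id,src,dst,amount
T1,C2001,C2002,9500
T2,C2002,C2003,9700
T3,C1001,C3001,60000
T4,C3001,C3002,1200
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Every account that appears on either side of a transfer.
accounts = {r["src"] for r in rows} | {r["dst"] for r in rows}

# ASSUMPTION: amounts are whole numbers; use decimal.Decimal if they are not.
total_moved = sum(int(r["amount"]) for r in rows)

# Outgoing volume per account.
sent_volume = Counter()
for r in rows:
    sent_volume[r["src"]] += int(r["amount"])

print(f"{len(accounts)} accounts, {len(rows)} transfers, {total_moved:,} moved")
for account, volume in sent_volume.most_common():
    print(account, volume)
```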

Notes / open items

  • Local DuckDB execution uses deploy mode, flagged experimental in the package — confirm support stance with the engine team before merging.
  • New local-DuckDB template category (no Snowflake). Confirm portfolio fit; distinct from fraud-detection (multi-reasoner, Snowflake) and commercial_underwriting (rules).
  • The graph reasoner is intentionally not used — not reliable on local DuckDB (details in rai-agent-evals PR #108).

@github-actions

github-actions Bot commented Jun 3, 2026


The docs preview for this pull request has been deployed to Vercel!

✅ Preview: https://relationalai-docs-owwgtwgju-relationalai.vercel.app/build/templates
🔍 Inspect: https://vercel.com/relationalai/relationalai-docs/GFDmKYEPKwSSWo8Td332o47zeMpu

cafzal added 8 commits June 9, 2026 10:07
…et to suspect-or-near-suspect (under_review)
… rate

Grow the ledger from 10 transfers/9 accounts to 75 transfers/54 accounts,
embedding the structuring ring and large sender in a legitimate-traffic
majority (C3xxx). 6 of 54 accounts flag (~11%), a realistic AML base rate.
Add near-miss transfers (8,900 / 10,000 / 49,000 / 50,000) on the threshold
edges that correctly stay clean. Update README and runbook counts to match
(75 transfers / 870,000 moved / 6 suspects / 8 in investigation set).

Bump the local-DuckDB version floor from 1.11 to 1.12. The programmatic-config
deploy path this template uses (create_config with a deployment section) was
stabilized in 1.12 (PyRel #1730). Verified the script runs unchanged on 1.12.0
(75 transfers / 870,000 / 6 suspects / 8 under review).

Bump the local-DuckDB floor from 1.12 to 1.13. Verified the script runs
unchanged on 1.13.0 (75 transfers / 870,000 / 6 suspects / 8 under review).
…ning-local-template

# Conflicts:
#	v1/README.md
@cafzal cafzal marked this pull request as ready for review June 16, 2026 02:00

@jablonskidev jablonskidev left a comment

Contributor


Here is my feedback.

@@ -0,0 +1,181 @@
---
title: "Transaction Screening (Local DuckDB)"
description: "Rules + query fraud-ring triage: structuring and large-sender flags, suspect classification, and one-hop investigation expansion via a relationship self-join."

Contributor


Use words and sentences rather than symbols and fragments.

- Getting Started
---

## What this template is for

Contributor


This section should be a problem statement and motivation (1–2 paragraphs). Focus on the “why” and the value of RelationalAI, not on the technical details of the model or code. Use language that’s accessible to a broad audience.

| Rules / logic (classification flags, chaining) | Optimization solve (`Problem`) |
| Relationship traversal (multi-hop self-joins, connectivity) | GNN training / inference |

## Who this is for

Contributor


Include assumed knowledge.


- Anyone who wants to try RelationalAI without provisioning Snowflake
- Developers prototyping an ontology, rules, and queries before pointing at production data
- Anyone learning the rules + relationship-traversal patterns on a legible dataset with a realistic low base rate of suspicious activity

Contributor


Use words rather than symbols.


## What's included

- **Model**: `Account`, the `transfers_to` relationship, and the classification + expansion rules

Contributor


Use words rather than symbols, throughout the README.

- **Sample data**: a 75-transfer ledger across 54 accounts — a structuring ring and a large sender embedded in a legitimate-traffic majority (6 of 54 accounts flag)
- **Outputs**: printed tables (network overview, per-account volume, suspects, counterparties, investigation set)

## Prerequisites

Contributor



`data/transactions.csv` is a 75-transfer ledger across 54 accounts, with columns `id, src, dst, amount`. Most of it is legitimate small-business traffic (the `C3xxx` accounts — payroll runs, vendor invoices, retail transfers) that never flags. The suspicious activity is a small minority: `C2001–C2005` form a ring that cycles money in amounts just under the $10,000 reporting threshold (structuring), and `C1001` makes one large $60,000 transfer. In all, 6 of 54 accounts flag (~11%) — a realistic AML base rate. A few near-miss transfers (8,900; exactly 10,000; 49,000; exactly 50,000) sit right on the threshold edges and deliberately stay clean.

## Model overview

Contributor


model.where(Account.transfers_to(_other), _other.is_suspect()).define(Account.near_suspect())
```

## Customize this template

Contributor


Rewrite the README in prose (no symbol shorthand or sentence fragments) and
align section structure with sample-template:
- What this template is for: problem statement and motivation, no capability table
- Who this is for: add assumed-knowledge note
- Prerequisites: split into Access and Tools
- Model overview: key entities, identifier, invariants, plus per-concept and
  relationship tables
- Customize this template: Use your own data / Tune parameters / Extend the
  model / Scale up

Front-matter description rewritten as a full sentence; template index
regenerated. How-it-works code snippets unchanged (verbatim from the script).
…k and script

Extend the README review fixes (words, not symbol shorthand) to the other prose
artifacts: drop '+'/'&'/'/' connectives from the runbook prose and chain diagram
and from the script docstring/comments. Keep the prescribed '# Define semantic
model & load data' banner and the runbook's structural chain-ASCII arrows.
Behavior unchanged (comments/docstring only); script runs identically.
@cafzal cafzal requested a review from jablonskidev June 16, 2026 22:42
cafzal added a commit that referenced this pull request Jun 16, 2026
…nners

Mirrors the README issues reviewers flagged on PR #80:
- Reword the front-matter description as one plain sentence (no colon-fragment).
- Lead "What this template is for" with the business problem and value, naming
  the reasoners in a single bold sentence rather than a technical breakdown.
- Add an assumed-knowledge line to "Who this is for".
- Replace arrow/inequality symbols in prose with words (code blocks and the
  data-flow diagram keep their arrows).

Also align the script's stage banners to the multi-reasoner convention so the
prescriptive stage reads as "Stage 3" instead of the single-reasoner
"Model the decision problem" / "Solve and check solution" headings. Comment-only;
logic and output unchanged from the validated run.
cafzal added a commit that referenced this pull request Jun 16, 2026
…sses PR #80 review feedback)

Rewrites the README to the sample-template standard the reviewer is applying to new templates: 'What this template is for' is now a broad-audience problem statement and motivation with a bold reasoning-type sentence; adds assumed knowledge, an Access/Tools Prerequisites split, a Model overview, and Customize subsections; and uses words and sentences rather than symbols and fragments throughout the prose. Also drops trailing commas after the last where(...) condition per the cleanup-template-code convention (a no-op; re-run confirms identical output). Validated: sample output matches the live run line-for-line, all code snippets are verbatim from the script, and the runbook reproduced blind end-to-end.

2 participants