feat(datafabric): surface system fields under prompt v2 with business-field precedence #916
milind-jain-uipath wants to merge 2 commits into
Pull request overview
This PR introduces a new Data Fabric SQL prompt version (v2) that surfaces Data Fabric’s auto-added system/audit fields to the inner SQL agent while encoding “business-field precedence” guidance to avoid the model choosing system columns when a business column better represents the user’s intent.
Changes:
- Added prompt v2 with a dedicated `SYSTEM / AUDIT FIELDS` grounding section and made it the default prompt version.
- Extended the prompt registry with schema-rendering policy flags (include system fields, render descriptions) and updated the builder to follow those flags rather than checking version names.
- Updated schema/context modeling and rendering to tag system fields (`system`) and (for v2) include a `Description` column, while ensuring derived query patterns do not use system fields.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/agent/tools/test_datafabric_prompt_builder.py | Adds/updates tests to verify v1 baseline stability, v2 system-field surfacing + tagging, description-column rendering, query-pattern exclusion, and v2 defaulting. |
| src/uipath_langchain/agent/tools/datafabric_tool/prompts/v2.py | Adds the v2 SQL prompt template including explicit system/audit-field precedence rules and guidance. |
| src/uipath_langchain/agent/tools/datafabric_tool/prompts/registry.py | Registers v2 and adds policy flags to PromptVersion; flips DEFAULT_PROMPT_VERSION to v2. |
| src/uipath_langchain/agent/tools/datafabric_tool/models.py | Adds FieldSchema.is_system_field and appends a system modifier to display_type. |
| src/uipath_langchain/agent/tools/datafabric_tool/datafabric_prompt_builder.py | Gates inclusion of system fields and description rendering via registry flags; excludes system fields from derived query patterns. |
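
The `models.py` row above describes a boolean `is_system_field` on `FieldSchema` and a `system` modifier appended to `display_type`. A minimal sketch of that idea follows. It is an assumption about the shape, not the PR's code: the attribute names `sql_type` and `is_required`, and the comma-separated modifier format, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSchema:
    """Hypothetical stand-in for the model in models.py; illustrative only."""

    name: str
    sql_type: str  # assumed attribute name
    is_required: bool = False  # assumed attribute name
    is_system_field: bool = False  # assumed to mirror the backend's system-field tag

    @property
    def display_type(self) -> str:
        # Modifiers are appended after the base type (format is an assumption).
        modifiers = []
        if self.is_required:
            modifiers.append("required")
        if self.is_system_field:
            modifiers.append("system")
        return ", ".join([self.sql_type, *modifiers])
```

Under these assumptions, a system field such as `CreateTime` would render its type with a trailing `system` tag, while a business field such as `OrderDate` would render the bare type.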

    include_system_fields: bool = False
    """Surface auto-added system/audit fields (Id, CreateTime, UpdateTime,
    CreatedBy, UpdatedBy) in the entity schema table, tagged ``system``."""
    render_field_descriptions: bool = False
    """Render a Description column in the entity schema table."""

addressed
Force-pushed from eaa36bd to 37aa826.
Update to v1. This is not yet being actively used by customers.

    }

    - DEFAULT_PROMPT_VERSION = "v1"
    + DEFAULT_PROMPT_VERSION = "v2"

keep it to v1 and make updates there
addressed

    # Versions absent from this map fall back to the default (legacy) rendering:
    # no system fields, no description column.
    _POLICIES: dict[str, SchemaRenderPolicy] = {

I don't think this is warranted. Ideally this should be transformed on the server side, so a policy may not be needed to render it into the prompt. A policy in this context is basically a construct of how a process is explored and exploited. Read up on RL's policy evaluation and value iteration constructs to see how Claude may have redirected you here.
Got it, addressed.
Surface auto-added system/audit fields (Id, CreateTime, UpdateTime, CreatedBy, UpdatedBy) to the inner Data Fabric SQL agent and render field descriptions in the schema table, while keeping business fields authoritative when concepts overlap.

- v1 template: add a SYSTEM / AUDIT FIELDS step under FIELD SELECTION. When a business field overlaps a system field's concept, prefer the business field, judged by field names and descriptions. System fields are whatever the backend tags is_system_field, not a hardcoded set.
- prompt builder: system fields are surfaced (tagged `system`) and excluded from the derived query patterns; the schema table always includes a Description column. Hidden fields stay filtered.
- models: FieldSchema.is_system_field; display_type appends a `system` modifier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
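
The builder behavior described in this commit message (surface system fields tagged `system`, always render a Description column, keep hidden fields filtered, exclude system fields from derived query patterns) could be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: the `Field` record, the function names, the table layout, and the sample descriptions other than "System built-in field" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical field record; the real FieldSchema likely has more attributes.
    name: str
    type: str
    description: str = ""
    is_system_field: bool = False
    is_hidden_field: bool = False


def render_schema_table(fields: list[Field]) -> str:
    """Render visible fields as a markdown table with a Description column."""
    rows = ["| Field | Type | Description |", "|---|---|---|"]
    for f in fields:
        if f.is_hidden_field:
            continue  # hidden fields stay filtered
        type_label = f"{f.type}, system" if f.is_system_field else f.type
        rows.append(f"| {f.name} | {type_label} | {f.description} |")
    return "\n".join(rows)


def pattern_fields(fields: list[Field]) -> list[str]:
    """Names eligible for derived query patterns: visible business fields only."""
    return [f.name for f in fields if not (f.is_system_field or f.is_hidden_field)]


# Illustrative input; "InternalKey" and the OrderDate description are made up.
fields = [
    Field("OrderDate", "datetime", "Date the order was placed"),
    Field("CreateTime", "datetime", "System built-in field", is_system_field=True),
    Field("InternalKey", "text", is_hidden_field=True),
]
table = render_schema_table(fields)
```

In this sketch the system field still appears in the schema table, so the model can see it, but only `OrderDate` is offered to the query-pattern step.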
Force-pushed from 37aa826 to a94023c.



What
Surfaces the auto-added Data Fabric system/audit fields (`Id`, `CreateTime`, `UpdateTime`, `CreatedBy`, `UpdatedBy`) to the inner SQL agent and renders field descriptions in the schema table, while keeping business fields authoritative when a business field overlaps a system field's concept.

Changes
- `prompts/v1.py`: adds a `SYSTEM / AUDIT FIELDS` step under FIELD SELECTION. When a business (non-system) field overlaps a system field's concept, prefer the business field, judged by field names and descriptions. System fields are whatever the backend tags `is_system_field`, not a hardcoded set.
- `datafabric_prompt_builder.py`: system fields are surfaced (tagged `system`) and excluded from the derived query patterns; the schema table includes a Description column. Hidden fields stay filtered.
- `datafabric_tool/models.py`: `FieldSchema.is_system_field`; `display_type` appends a `system` modifier.

Notes
Folded into the existing `v1` prompt (no separate version / default flip) and applied unconditionally in the builder, with no per-version rendering policy.

Tests
`test_datafabric_prompt_builder.py`: system fields surfaced + tagged + descriptions, query patterns exclude system fields, ECP strategy + domain guidance preserved. Datafabric + circular-import tests pass; mypy and ruff clean.

Local validation: system-field guidance (with vs without)
Data Fabric System-Field Guidance — Validation
A test to check that the prompt's `SYSTEM / AUDIT FIELDS` guidance makes the agent pick the business field instead of a similar system field.
What we are checking
Every entity has auto-added system fields (`Id`, `CreateTime`, `UpdateTime`, `CreatedBy`, `UpdatedBy`). When a business field means almost the same thing as one of these, the agent should use the business field. We check that the guidance makes this happen, and that without the guidance the agent picks the system field instead.
Test entities
Created in Data Fabric (alpha, tenant `entity/a4e`, folder `860c6b05-98ba-44a4-970e-760ad956547d`). Both entities have the same fields and the same rows. The only difference is whether the business fields have descriptions.

| Entity | ID |
|---|---|
| `OrderTest` | `d1ae6723-836a-f111-8fcb-0022482a9c89` |
| `OrderTestNoDesc` | `3d1bae30-836a-f111-8fcb-0022482a9c89` |

Fields

`OrderNumber`, `Id`, `Status`, `Amount`, `OrderDate`, `CreateTime`, `SalesRep`, `CreatedBy`

The system fields (`Id`, `CreateTime`, `UpdateTime`, `CreatedBy`, `UpdatedBy`) are added by the backend with `isSystemField=true`, `isHiddenField=false`, and the same generic description `"System built-in field"`.

Seed data (8 rows)

`OrderDate` is a past date. `CreateTime` is the time the row was inserted (~2026-06-17). They are different on purpose, so a wrong field choice is easy to spot in the answer.
Prompts compared
Both prompts show the system fields (tagged `system`) and show field descriptions. The only difference is whether the `SYSTEM / AUDIT FIELDS` guidance is in the prompt.
The "without guidance" prompt was a temporary local change, picked per run with
an env variable, and removed before merge.
How we ran it
A local agent run used a custom `agent.json` pointing at the two entities.

- Model: `gpt-4.1-2025-04-14`, temperature `0`, engine `basic-v2`
- Question: `Using OrderTest, when was order ORD-1001 created?`. Only the entity name changed; the question stayed the same.

| Arm | Guidance | Entity |
|---|---|---|
| A | without | `OrderTestNoDesc` (no description) |
| B | with | `OrderTestNoDesc` (no description) |
| C | without | `OrderTest` (has description) |

The word "created" is close to the system field name `CreateTime` on purpose.

Results
Arm A: without guidance, on `OrderTestNoDesc` (no description)
`OrderTestNoDesc(user_query="When was order ORD-1001 created?")`
`execute_sql`:
Wrong: it used the system field `CreateTime` (the insert time).

Arm B: with guidance, on `OrderTestNoDesc` (no description)
`OrderTestNoDesc(user_query="When was order ORD-1001 created?")`
`execute_sql`:
Right: it used the business field `OrderDate`, with no description to help.

Arm C: without guidance, on `OrderTest` (has description)
`OrderTest(user_query="When was order ORD-1001 created?")`
`execute_sql`:
Right: the description was enough to pick `OrderDate`, even with no guidance.

Summary
Same question, model, and setup every time. System fields shown in all runs.
Field the agent chose (✅ business `OrderDate` / ❌ system `CreateTime`):

| Guidance | `OrderTestNoDesc` (no description) | `OrderTest` (description) |
|---|---|---|
| Without | `CreateTime` ❌ | `OrderDate` ✅ |
| With | `OrderDate` ✅ | `OrderDate` ✅ (expected) |

(Arm C right, Arm A wrong).
The with-guidance + `OrderTest` cell was not run. Both of its inputs (guidance on, description present) already give the right answer on their own in the other runs.
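
The per-arm verdict above (business `OrderDate` vs system `CreateTime`) could be automated with a small check on the SQL passed to `execute_sql`. The sketch below is one possible way to do that, not part of the PR. The function name is hypothetical, and the two sample queries are illustrative stand-ins (the SQL from the actual runs is not reproduced here); they also assume the order number is stored in `OrderNumber`.

```python
import re


def chosen_field(sql: str, business: str = "OrderDate", system: str = "CreateTime") -> str:
    """Classify a generated SQL string by which of two candidate columns it references."""

    def mentions(column: str) -> bool:
        # Whole-word, case-insensitive match, so "CreateTime" does not match "CreateTimeUtc".
        return re.search(rf"\b{re.escape(column)}\b", sql, flags=re.IGNORECASE) is not None

    uses_business, uses_system = mentions(business), mentions(system)
    if uses_business and not uses_system:
        return "business"
    if uses_system and not uses_business:
        return "system"
    return "ambiguous"  # both columns referenced, or neither


# Illustrative queries only; not the SQL captured in the runs above.
arm_a_like = "SELECT CreateTime FROM OrderTestNoDesc WHERE OrderNumber = 'ORD-1001'"
arm_b_like = "SELECT OrderDate FROM OrderTestNoDesc WHERE OrderNumber = 'ORD-1001'"
```

A string match like this only says which column name appears in the query; it does not confirm the answer shown to the user, so the `OrderDate` / `CreateTime` value difference in the seed data remains the stronger signal.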