
feat(datafabric): surface system fields under prompt v2 with business-field precedence #916

Open
milind-jain-uipath wants to merge 2 commits into main from feat/datafabric-system-fields-prompt-v2

Conversation

@milind-jain-uipath milind-jain-uipath commented Jun 17, 2026

Contributor

What

Surfaces the auto-added Data Fabric system/audit fields (Id, CreateTime, UpdateTime, CreatedBy, UpdatedBy) to the inner SQL agent and renders field descriptions in the schema table, while keeping business fields authoritative when a business field overlaps a system field's concept.

Changes

  • prompts/v1.py: adds a SYSTEM / AUDIT FIELDS step under FIELD SELECTION. When a business (non-system) field overlaps a system field's concept, the agent should prefer the business field, judged by field names and descriptions. System fields are whatever the backend tags as is_system_field, not a hardcoded set.
  • datafabric_prompt_builder.py: system fields are surfaced (tagged system) and excluded from the derived query patterns; the schema table includes a Description column. Hidden fields stay filtered.
  • datafabric_tool/models.py: adds FieldSchema.is_system_field; display_type appends a system modifier.
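As a rough illustration of the models.py change, the sketch below shows how a field schema might carry an is_system_field flag and append a system modifier to its display type. Only FieldSchema, is_system_field, and display_type are named in this PR; the other attributes (name, sql_type, description), the use of a dataclass, the DATETIME type for CreateTime, and the exact modifier format are assumptions made for illustration, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSchema:
    """Hypothetical stand-in for the real FieldSchema model."""

    name: str
    sql_type: str  # assumed attribute name
    description: str = ""
    is_system_field: bool = False  # taken from the backend's tag, not a hardcoded set

    @property
    def display_type(self) -> str:
        # Append a "system" modifier so the schema table marks audit fields.
        modifiers = ["system"] if self.is_system_field else []
        return ", ".join([self.sql_type, *modifiers])


order_date = FieldSchema("OrderDate", "NVARCHAR", "Date the customer placed the order")
# The SQL type of CreateTime is an assumption; the PR does not state it.
create_time = FieldSchema(
    "CreateTime", "DATETIME", "System built-in field", is_system_field=True
)

print(order_date.display_type)
print(create_time.display_type)
```

Keeping the flag on the model lets the prompt builder decide how to tag or filter a field without knowing any field names in advance.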

Notes

Folded into the existing v1 prompt (no separate version / default flip) and applied unconditionally in the builder — no per-version rendering policy.

Tests

test_datafabric_prompt_builder.py: system fields surfaced + tagged + descriptions, query patterns exclude system fields, ECP strategy + domain guidance preserved. Datafabric + circular-import tests pass; mypy and ruff clean.

Local validation — system-field guidance (with vs without)

Data Fabric System-Field Guidance — Validation

A test to check that the prompt's SYSTEM / AUDIT FIELDS guidance makes the
agent pick the business field instead of a similar system field.

What we are checking

Every entity has auto-added system fields (Id, CreateTime, UpdateTime,
CreatedBy, UpdatedBy). When a business field means almost the same thing as
one of these, the agent should use the business field. We check that the
guidance makes this happen, and that without the guidance the agent picks the
system field instead.

Test entities

Created in Data Fabric (alpha, tenant entity/a4e, folder
860c6b05-98ba-44a4-970e-760ad956547d). Both entities have the same fields and
the same rows. The only difference is whether the business fields have
descriptions.

| Entity | Id | Field descriptions |
| --- | --- | --- |
| OrderTest | d1ae6723-836a-f111-8fcb-0022482a9c89 | yes |
| OrderTestNoDesc | 3d1bae30-836a-f111-8fcb-0022482a9c89 | no |

Fields

| Field | SQL type | Description (OrderTest) | Overlaps with |
| --- | --- | --- | --- |
| OrderNumber | NVARCHAR | Human-facing order number shown to customers | Id |
| Status | NVARCHAR | Order status: Open, Shipped, or Cancelled | |
| Amount | DECIMAL | Total order amount in USD | |
| OrderDate | NVARCHAR | Date the customer placed the order | CreateTime |
| SalesRep | NVARCHAR | Name of the sales representative who owns the order | CreatedBy |

The system fields (Id, CreateTime, UpdateTime, CreatedBy, UpdatedBy)
are added by the backend with isSystemField=true, isHiddenField=false, and
the same generic description "System built-in field".
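A minimal sketch of how such field metadata could be turned into the schema table the prompt shows: hidden fields are dropped, system fields are kept and tagged, and a Description column is rendered. The isSystemField and isHiddenField keys come from the text above; the remaining keys (name, sqlType, description), the helper name render_schema_table, the hidden InternalNote field, and the table layout are assumptions, not the real builder.

```python
def render_schema_table(fields: list[dict]) -> str:
    """Render a schema table for the prompt (hypothetical helper)."""
    lines = ["| Field | Type | Description |", "| --- | --- | --- |"]
    for field in fields:
        if field.get("isHiddenField", False):
            continue  # hidden fields stay filtered
        type_label = field["sqlType"]
        if field.get("isSystemField", False):
            type_label += ", system"  # tag system/audit fields
        description = field.get("description", "")
        lines.append(f"| {field['name']} | {type_label} | {description} |")
    return "\n".join(lines)


fields = [
    {"name": "OrderDate", "sqlType": "NVARCHAR",
     "description": "Date the customer placed the order",
     "isSystemField": False, "isHiddenField": False},
    # DATETIME is an assumed type for the system field.
    {"name": "CreateTime", "sqlType": "DATETIME",
     "description": "System built-in field",
     "isSystemField": True, "isHiddenField": False},
    # Made-up hidden field, included only to show the filter.
    {"name": "InternalNote", "sqlType": "NVARCHAR", "description": "",
     "isSystemField": False, "isHiddenField": True},
]

table = render_schema_table(fields)
print(table)
```

Because every system field carries the same generic description, the Description column mainly helps the agent through the business fields, which matches the Arm C result later in this write-up.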

Seed data (8 rows)

OrderDate is a past date. CreateTime is the time the row was inserted
(~2026-06-17). They are different on purpose, so a wrong field choice is easy to
spot in the answer.

| OrderNumber | Status | Amount | OrderDate | SalesRep |
| --- | --- | --- | --- | --- |
| ORD-1001 | Shipped | 1250.00 | 2023-01-15 | Alice Chen |
| ORD-1002 | Open | 875.50 | 2023-03-22 | Bob Martinez |
| ORD-1003 | Cancelled | 430.00 | 2023-05-10 | Alice Chen |
| ORD-1004 | Shipped | 2100.75 | 2023-07-01 | Carla Diaz |
| ORD-1005 | Open | 99.99 | 2024-02-14 | Bob Martinez |
| ORD-1006 | Shipped | 560.25 | 2024-04-30 | Carla Diaz |
| ORD-1007 | Cancelled | 1500.00 | 2024-06-18 | Alice Chen |
| ORD-1008 | Open | 320.40 | 2024-09-09 | Dan Kapoor |

Prompts compared

Both prompts show the system fields (tagged system) and show field
descriptions. The only difference is whether the SYSTEM / AUDIT FIELDS
guidance is in the prompt.

| Prompt | System fields shown | Guidance |
| --- | --- | --- |
| with guidance (shipped) | yes | yes |
| without guidance (control) | yes | no |

The "without guidance" prompt was a temporary local change, picked per run with
an env variable, and removed before merge.

How we ran it

A local agent run used a custom agent.json pointing at the two entities.

  • Model: gpt-4.1-2025-04-14, temperature 0, engine basic-v2
  • Same question each time: "when was order ORD-1001 created?"
  • The input named the entity so the run hit the right one, e.g.
    Using OrderTest, when was order ORD-1001 created?. Only the entity name
    changed; the question stayed the same.
  • Three runs:
    • without guidance, on OrderTestNoDesc (no description)
    • with guidance, on OrderTestNoDesc (no description)
    • without guidance, on OrderTest (has description)

The word "created" is close to the system field name CreateTime on purpose.

Results

Arm A — without guidance, on OrderTestNoDesc (no description)

  1. Tool call → OrderTestNoDesc(user_query="When was order ORD-1001 created?")
  2. SQL → execute_sql:
    SELECT OrderNumber, CreateTime FROM OrderTestNoDesc WHERE OrderNumber = 'ORD-1001' LIMIT 1
  3. Result:
    {'records': [{'CreateTime': '2026-06-17T19:32:02.5744477+00:00', 'OrderNumber': 'ORD-1001'}], 'total_count': 1}
    
  4. Answer:

    Order ORD-1001 was created on 2026-06-17T19:32:02.5744477+00:00.

Wrong — it used the system field CreateTime (the insert time).

Arm B — with guidance, on OrderTestNoDesc (no description)

  1. Tool call → OrderTestNoDesc(user_query="When was order ORD-1001 created?")
  2. SQL → execute_sql:
    SELECT OrderDate FROM OrderTestNoDesc WHERE OrderNumber = 'ORD-1001' LIMIT 1
  3. Result:
    {'records': [{'OrderDate': '2023-01-15'}], 'total_count': 1}
    
  4. Answer:

    Order ORD-1001 was created on 2023-01-15.

Right — it used the business field OrderDate, with no description to help.

Arm C — without guidance, on OrderTest (has description)

  1. Tool call → OrderTest(user_query="When was order ORD-1001 created?")
  2. SQL → execute_sql:
    SELECT OrderDate FROM OrderTest WHERE OrderNumber = 'ORD-1001' LIMIT 1
  3. Result:
    {'records': [{'OrderDate': '2023-01-15'}], 'total_count': 1}
    
  4. Answer:

    Order ORD-1001 was created on 2023-01-15.

Right — the description was enough to pick OrderDate, even with no guidance.
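The difference between the Arm A and Arm B queries can be replayed locally. The sketch below uses an in-memory SQLite table as a stand-in for Data Fabric (an assumption; the real runs went through execute_sql against the entities above), loads two of the seed rows, and runs both SQL statements verbatim. The CreateTime value is the one reported in Arm A; reusing it for the second row is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderTestNoDesc ("
    "OrderNumber TEXT, Status TEXT, Amount REAL, "
    "OrderDate TEXT, SalesRep TEXT, CreateTime TEXT)"
)

# Insert time reported in Arm A; applied to both rows for simplicity.
insert_time = "2026-06-17T19:32:02.5744477+00:00"
seed_rows = [
    ("ORD-1001", "Shipped", 1250.00, "2023-01-15", "Alice Chen"),
    ("ORD-1002", "Open", 875.50, "2023-03-22", "Bob Martinez"),
]
conn.executemany(
    "INSERT INTO OrderTestNoDesc VALUES (?, ?, ?, ?, ?, ?)",
    [(*row, insert_time) for row in seed_rows],
)

# SQL generated without the guidance (Arm A): reads the system field.
arm_a_sql = (
    "SELECT OrderNumber, CreateTime FROM OrderTestNoDesc "
    "WHERE OrderNumber = 'ORD-1001' LIMIT 1"
)
# SQL generated with the guidance (Arm B): reads the business field.
arm_b_sql = (
    "SELECT OrderDate FROM OrderTestNoDesc "
    "WHERE OrderNumber = 'ORD-1001' LIMIT 1"
)

arm_a_row = conn.execute(arm_a_sql).fetchone()
arm_b_row = conn.execute(arm_b_sql).fetchone()
print("Arm A:", arm_a_row)
print("Arm B:", arm_b_row)
```

Both statements are valid and return one row, so only the gap between the seeded OrderDate and the insert time reveals which field the agent chose.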

Summary

Same question, model, and setup every time. System fields shown in all runs.
Field the agent chose (✅ business OrderDate / ❌ system CreateTime):

| Prompt | OrderTestNoDesc (no description) | OrderTest (description) |
| --- | --- | --- |
| without guidance | ❌ CreateTime | ✅ OrderDate |
| with guidance | ✅ OrderDate | ✅ OrderDate (expected) |

  • Without the guidance, the agent only gets it right when a description helps
    (Arm C right, Arm A wrong).
  • With the guidance, the agent gets it right even with no description (Arm B).

The with-guidance + OrderTest cell was not run. Both of its inputs (guidance on,
description present) already give the right answer on their own in the other runs.
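The per-arm verdicts can also be derived mechanically from the generated SQL. The sketch below is a hypothetical checker, not part of the PR: it extracts the selected columns from the three statements recorded above and labels each arm by whether it read a system field. The hardcoded field set is only a convenience for this offline check; the shipped code relies on the backend's is_system_field tag instead.

```python
import re

# Hardcoded here only for the offline check (see note above).
SYSTEM_FIELDS = {"Id", "CreateTime", "UpdateTime", "CreatedBy", "UpdatedBy"}

ARM_SQL = {
    "A": "SELECT OrderNumber, CreateTime FROM OrderTestNoDesc "
         "WHERE OrderNumber = 'ORD-1001' LIMIT 1",
    "B": "SELECT OrderDate FROM OrderTestNoDesc "
         "WHERE OrderNumber = 'ORD-1001' LIMIT 1",
    "C": "SELECT OrderDate FROM OrderTest "
         "WHERE OrderNumber = 'ORD-1001' LIMIT 1",
}


def selected_columns(sql: str) -> list[str]:
    """Return the column names between SELECT and FROM (simple queries only)."""
    match = re.search(r"SELECT\s+(.*?)\s+FROM\s", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError(f"not a simple SELECT statement: {sql!r}")
    return [column.strip() for column in match.group(1).split(",")]


def verdict(sql: str) -> str:
    """Label a query 'system' if it selects any system field, else 'business'."""
    columns = selected_columns(sql)
    return "system" if any(c in SYSTEM_FIELDS for c in columns) else "business"


results = {arm: verdict(sql) for arm, sql in ARM_SQL.items()}
print(results)
```

A regex is enough for these single-table queries; anything with subqueries or aliases would need a real SQL parser.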

Copilot AI review requested due to automatic review settings June 17, 2026 15:20

Copilot AI left a comment


Pull request overview

This PR introduces a new Data Fabric SQL prompt version (v2) that surfaces Data Fabric’s auto-added system/audit fields to the inner SQL agent while encoding “business-field precedence” guidance to avoid the model choosing system columns when a business column better represents the user’s intent.

Changes:

  • Added prompt v2 with a dedicated SYSTEM / AUDIT FIELDS grounding section and made it the default prompt version.
  • Extended the prompt registry with schema-rendering policy flags (include system fields, render descriptions) and updated the builder to follow those flags rather than checking version names.
  • Updated schema/context modeling and rendering to tag system fields (system) and (for v2) include a Description column, while ensuring derived query patterns do not use system fields.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/agent/tools/test_datafabric_prompt_builder.py | Adds/updates tests to verify v1 baseline stability, v2 system-field surfacing + tagging, description-column rendering, query-pattern exclusion, and v2 defaulting. |
| src/uipath_langchain/agent/tools/datafabric_tool/prompts/v2.py | Adds the v2 SQL prompt template including explicit system/audit-field precedence rules and guidance. |
| src/uipath_langchain/agent/tools/datafabric_tool/prompts/registry.py | Registers v2 and adds policy flags to PromptVersion; flips DEFAULT_PROMPT_VERSION to v2. |
| src/uipath_langchain/agent/tools/datafabric_tool/models.py | Adds FieldSchema.is_system_field and appends a system modifier to display_type. |
| src/uipath_langchain/agent/tools/datafabric_tool/datafabric_prompt_builder.py | Gates inclusion of system fields and description rendering via registry flags; excludes system fields from derived query patterns. |

Comment on lines +41 to +45
include_system_fields: bool = False
"""Surface auto-added system/audit fields (Id, CreateTime, UpdateTime,
CreatedBy, UpdatedBy) in the entity schema table, tagged ``system``."""
render_field_descriptions: bool = False
"""Render a Description column in the entity schema table."""

Contributor Author

addressed

@milind-jain-uipath milind-jain-uipath force-pushed the feat/datafabric-system-fields-prompt-v2 branch 3 times, most recently from eaa36bd to 37aa826 Compare June 17, 2026 16:28

Contributor

Update to v1. We are not yet being actively used by customers.

}

- DEFAULT_PROMPT_VERSION = "v1"
+ DEFAULT_PROMPT_VERSION = "v2"

Contributor

keep it to v1 and make updates there

Contributor Author

addressed


# Versions absent from this map fall back to the default (legacy) rendering:
# no system fields, no description column.
_POLICIES: dict[str, SchemaRenderPolicy] = {

Contributor

I don't think this is warranted. Ideally this should be morphed on the server side, and the policy may not be needed to render it into the prompt. A policy in this context is basically a construct of how a process is explored and exploited. Read up on RL's policy evaluation and value iteration constructs to see how Claude may have redirected you here.

Contributor Author

Got it, addressed.

Surface auto-added system/audit fields (Id, CreateTime, UpdateTime, CreatedBy,
UpdatedBy) to the inner Data Fabric SQL agent and render field descriptions in
the schema table, while keeping business fields authoritative when concepts
overlap.

- v1 template: add a SYSTEM / AUDIT FIELDS step under FIELD SELECTION — when a
  business field overlaps a system field's concept, prefer the business field,
  judged by field names and descriptions. System fields are whatever the backend
  tags is_system_field, not a hardcoded set.
- prompt builder: system fields are surfaced (tagged `system`) and excluded from
  the derived query patterns; the schema table always includes a Description
  column. Hidden fields stay filtered.
- models: FieldSchema.is_system_field; display_type appends a `system` modifier.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@milind-jain-uipath milind-jain-uipath force-pushed the feat/datafabric-system-fields-prompt-v2 branch from 37aa826 to a94023c Compare June 18, 2026 04:56
3 participants