feat(mcp): add HugeGraph MCP V1 stable tool surface #368

Open
UIengF wants to merge 53 commits into apache:main from hugegraph:graph-mcp

@UIengF UIengF commented Jun 29, 2026

Summary

This PR adds hugegraph-mcp, a FastMCP-based MCP server for HugeGraph. It is a safe thin adapter for MCP clients/agents to inspect HugeGraph, generate or execute read-only Gremlin, extract graph data from text, and run guarded graph data import/delete or schema preview workflows.

Main Changes

  • Add hugegraph-mcp as a uv workspace member with CLI entrypoint, README/zh README, V1 docs, and MCP CI.
  • Expose V1 tools: inspect_graph_tool, generate_gremlin_tool, execute_gremlin_read_tool, extract_graph_data_tool, import_graph_data_tool, delete_graph_data_tool, design_schema_tool, apply_schema_tool.
  • Keep admin/debug tools registered but disabled by default: execute_gremlin_write_tool, refresh_vid_embeddings_tool.
  • Add unified response envelope, standardized errors, readonly/admin guards, Gremlin read-safety checks, and dry_run -> plan_hash -> confirm write protection.
  • Add HugeGraph-AI thin APIs, pyhugegraph graphspace/auth/logging adjustments, and HugeGraph Codex skills.

Test

  • Add unit tests for MCP tool contracts, envelope/error shape, config parsing, readonly/admin guards, Gremlin safety, graph inspection, graph extraction, import/delete dry-run confirmation, schema validate/dry-run, plan hash checks, and HugeGraph-AI wrappers.
  • Add CI for ruff format/check and MCP pytest on Python 3.10/3.11/3.12.
  • Add a real HugeGraph 1.7.0 write-path integration test for guarded import/delete behavior.

Scope

V1 intentionally excludes GraphRAG QA, SQL/table import, graph data update, and real schema apply. These return FEATURE_DISABLED and can be added later in separate PRs.

imbajin and others added 30 commits June 11, 2025 19:36
This workflow will be triggered when a pull request is opened. It will then post a comment "@codecov-ai-reviewer review" to help with automated AI code reviews.

It will use the `peter-evans/create-or-update-comment` action to create the comment.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yan Chao Mei <1653720237@qq.com>
Co-authored-by: imbajin <jin@apache.org>
fantasy-lotus and others added 23 commits December 2, 2025 15:28
Updated configuration instructions and file paths for MCP.
Reformat 4 test files to pass Ruff Code Quality CI:
- tests/test_error_handling.py
- tests/test_execute_gremlin_read.py
- tests/test_execute_gremlin_write.py
- tests/test_execute_schema_operations.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add design_schema() function in schema_tools.py with best practices documentation
- Add design_schema_tool() MCP tool wrapper in server.py
- Update README.md with new feature description
- Include usage guidelines: when to use, when not to use, and workflow
Catch graph-mcp up with the 11 latest commits on main (LLM fixes, REST
graph-extract API, schema generator persistence, PDF RAG uploads, client
1.7.0 assertions, code-scan refactors). graph-mcp retains its own MCP work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Change-Id: I3ec5e62b4caa3df52bb808c7ed1beead785aff2b
Main changes:

  • Add the hugegraph-mcp workspace member and package entrypoint.
  • Expose the V1 stable MCP tools for graph inspection, Gremlin generation/read execution, graph data extraction/import/delete, schema design/validation/dry-run, and admin-gated debug operations.
  • Add runtime readonly guards, conservative Gremlin read safety checks, unified response envelopes, and dry-run / plan_hash / confirm safety flow for write paths.
  • Add HugeGraph MCP tests and CI workflow coverage.
  • Add HugeGraph-related Codex skills under skills/.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Jun 29, 2026

@imbajin imbajin left a comment

Member


Reviewed the current MCP/client changes at head 8842bb53007102b47d50e4eaabb6e2cc51e9b526. I found several install/runtime safety issues that should be addressed before relying on the new MCP package independently. Local non-live checks passed, so these comments focus on behavioral and packaging gaps rather than test failures.


dependencies = [
    "fastmcp>=2.2.0",
    "hugegraph-python-client",


hugegraph-mcp depends on unconstrained hugegraph-python-client here. When this package is installed outside the current uv workspace, resolution falls back to the PyPI package hugegraph-python-client==0.1.1, not the 1.7.0 client in this PR. The MCP code depends on the 1.7.0 graphspace/auth-routing behavior, so an isolated install can pick an incompatible client.

Please constrain this dependency to the required version, or use the correct published package name, and add an isolated install/import test that does not rely on workspace sources. That test should verify the resolved client version and graphspace routing behavior.
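
A minimal sketch of the isolated version check requested above. It assumes the distribution name `hugegraph-python-client` and the 1.7.0 minimum mentioned in this comment; the helper names are hypothetical and not part of the PR. The installed version is read with `importlib.metadata`, so the check does not rely on workspace sources.

```python
from importlib import metadata
from typing import Optional, Tuple

REQUIRED_MIN = (1, 7, 0)  # assumption: minimum client version the MCP code needs


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric 'X.Y.Z' part of a version string."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that "1.7" compares equal to "1.7.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def client_is_compatible(installed: str, minimum: Tuple[int, ...] = REQUIRED_MIN) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_client_version(dist_name: str = "hugegraph-python-client") -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

In an isolated environment, a test could assert `client_is_compatible(installed_client_version())` after installing only hugegraph-mcp. The graphspace routing check would still need a separate behavioral test.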

errors.append(
    f"vertex {idx} property '{prop_name}' expects {data_type}, got {type(prop_value).__name__}"
)
primary_keys = schema_primary_keys.get(label, [])


For CUSTOMIZE_STRING / CUSTOMIZE_NUMBER vertex labels, dry-run validation should require each vertex payload to include id. This branch only validates primary keys when primary_keys exists; if a custom-id label has no primary keys, a vertex missing id passes validate_graph_payload() and dry-run.

I reproduced this with an id_strategy=CUSTOMIZE_STRING person label with no primary keys: dry-run returned valid, but the generated write query was g.addV('person').property('name','Alice') without property(T.id, ...). The failure is delayed until confirmed execution, which means earlier batch writes can already have succeeded before the custom-id vertex fails. That breaks the expected dry-run/confirm safety chain.
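
One possible shape for the missing check, as a sketch only. The payload layout (`label`, `id`) and the `id_strategies` mapping are assumptions for illustration, and `check_custom_ids` is a hypothetical helper rather than a function in the PR.

```python
CUSTOM_ID_STRATEGIES = {"CUSTOMIZE_STRING", "CUSTOMIZE_NUMBER"}


def check_custom_ids(vertices, id_strategies):
    """Return error strings for custom-id vertices that lack an id.

    vertices: list of dicts such as {"label": "person", "id": "p1", ...}
    id_strategies: mapping of vertex label -> id_strategy string
    Both shapes are assumed here, not taken from the PR.
    """
    errors = []
    for idx, vertex in enumerate(vertices):
        label = vertex.get("label")
        if id_strategies.get(label) not in CUSTOM_ID_STRATEGIES:
            continue  # primary-key and automatic ids are validated elsewhere
        # Treat a missing or empty id as an error; 0 stays a legal numeric id.
        if vertex.get("id") in (None, ""):
            errors.append(
                f"vertex {idx} label '{label}' uses a custom id strategy but has no id"
            )
    return errors
```

Running this during dry-run, independently of whether `primary_keys` is set, would surface the error before any batch write is attempted.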

)
continue

for field in REQUIRED_FIELDS[op_type]:


Schema validate/dry-run can currently return success for schema operations that HugeGraph will reject. The validation mostly checks required fields, duplicate names, and properties / fields references, but it does not validate legal create_property_key.data_type / cardinality values, nor whether vertex/edge primary_keys, nullable_keys, and parent/sub edge-label fields reference live or planned property/edge labels.

I reproduced two invalid inputs that returned valid: true: data_type="NOT_A_TYPE", and a vertex label with primary_keys=["missing"]. Please add these semantic checks plus negative tests, so apply_schema_tool(mode="validate"|"dry_run") does not return a confirmable plan for invalid schema changes.
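
A sketch of the semantic checks described above. The operation type names other than `create_property_key`, the operation dict layout, and the value sets are assumptions for illustration; the authoritative lists of legal data types and cardinalities should come from the HugeGraph server or client.

```python
# Assumption: illustrative value sets, not an authoritative list.
LEGAL_DATA_TYPES = {
    "TEXT", "INT", "LONG", "FLOAT", "DOUBLE",
    "BOOLEAN", "DATE", "BYTE", "BLOB", "UUID",
}
LEGAL_CARDINALITIES = {"SINGLE", "LIST", "SET"}
# Assumption: hypothetical operation type names for label creation.
LABEL_OPS = {"create_vertex_label", "create_edge_label"}


def check_schema_ops(operations, live_property_keys=()):
    """Return error strings for schema operations that would be rejected."""
    # Property keys that already exist plus those planned in this batch.
    known_keys = set(live_property_keys)
    for op in operations:
        if op.get("type") == "create_property_key":
            known_keys.add(op.get("name"))

    errors = []
    for idx, op in enumerate(operations):
        op_type = op.get("type")
        if op_type == "create_property_key":
            data_type = str(op.get("data_type", "")).upper()
            cardinality = str(op.get("cardinality", "SINGLE")).upper()
            if data_type not in LEGAL_DATA_TYPES:
                errors.append(f"operation {idx}: illegal data_type '{data_type}'")
            if cardinality not in LEGAL_CARDINALITIES:
                errors.append(f"operation {idx}: illegal cardinality '{cardinality}'")
        elif op_type in LABEL_OPS:
            for field in ("primary_keys", "nullable_keys"):
                for key in op.get(field) or []:
                    if key not in known_keys:
                        errors.append(
                            f"operation {idx}: {field} references unknown property key '{key}'"
                        )
    return errors
```

Both reproduced inputs (`data_type="NOT_A_TYPE"` and `primary_keys=["missing"]`) yield one error each under this sketch, which makes them natural negative test cases.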

)


@mcp.tool()


This raw Gremlin write tool bypasses the dry_run -> plan_hash -> confirm write-safety chain documented in the README. With HUGEGRAPH_MCP_ADMIN_MODE=true and HUGEGRAPH_MCP_READONLY=false, one MCP call can execute arbitrary write Gremlin, including destructive statements such as drops.

If this is an intentional debug escape hatch, please document clearly in the public safety contract and tests that it is outside the safety chain. Otherwise, it should require a per-call confirmation such as confirm=True, or be gated behind a separate debug-only environment flag, so the implementation does not conflict with the documented rule that user-reachable writes follow the safety chain.
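
A sketch of the per-call confirmation option. The environment variable names come from this comment; the defaults (admin mode off, readonly on), the accepted truthy strings, and the helper names are assumptions.

```python
_TRUTHY = {"1", "true", "yes", "on"}


def _flag(env, name, default):
    """Read a boolean flag from a mapping of environment variables."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def raw_write_allowed(env, confirm):
    """Return (allowed, reason) for one raw Gremlin write call.

    Assumed defaults: admin mode off, readonly on.
    """
    if not _flag(env, "HUGEGRAPH_MCP_ADMIN_MODE", False):
        return False, "admin mode is disabled"
    if _flag(env, "HUGEGRAPH_MCP_READONLY", True):
        return False, "server is read-only"
    if confirm is not True:
        # Only the literal True passes; truthy strings such as "yes" do not.
        return False, "per-call confirm=True is required"
    return True, "ok"
```

In a tool handler this would be called with `os.environ` and the caller's `confirm` argument before any Gremlin is sent. It narrows the escape hatch but does not replace the dry_run -> plan_hash -> confirm chain.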

log_filename = f"{log_filename}.rank{rank}"

os.makedirs(os.path.dirname(log_filename), exist_ok=True)
try:


init_logger(log_output="client.log", stdout_logging=False) now leaves only the NullHandler. os.path.dirname("client.log") returns an empty string, so os.makedirs("") raises OSError; the exception is swallowed and the function returns before creating the RotatingFileHandler.

A plain filename in the current directory is a valid log target, so this should only call makedirs when the dirname is non-empty, then continue creating the file handler.
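
The suggested fix can be sketched as follows. `make_file_handler` is a hypothetical helper, and the rotation sizes are placeholder values rather than the client's real settings.

```python
import os
from logging.handlers import RotatingFileHandler


def make_file_handler(log_filename, max_bytes=1_000_000, backup_count=3):
    """Create a rotating file handler, creating parent directories if any."""
    log_dir = os.path.dirname(log_filename)
    # os.path.dirname("client.log") is "", and os.makedirs("") raises OSError,
    # so only create the directory when there is one to create.
    if log_dir:
        os.makedirs(log_dir, exist_ok=True)
    return RotatingFileHandler(log_filename, maxBytes=max_bytes, backupCount=backup_count)
```

With this guard, a bare filename such as `client.log` and a nested path such as `logs/client.log` both end up with a file handler attached.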

}
if json is not None:
    kwargs["json"] = json
if cfg.password:


When HugeGraph-AI login is enabled, thin_router routes use FastAPI HTTPBearer, but this MCP AI client only sends Basic auth when HUGEGRAPH_PASSWORD is present. There is no Bearer token configuration path.

Please add a token setting such as HUGEGRAPH_AI_TOKEN and send Authorization: Bearer ..., or explicitly document that MCP AI calls require HugeGraph-AI login to be disabled.
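
A sketch of the token path proposed above, assuming the client passes keyword arguments to a `requests`-style call. `build_auth_kwargs` is a hypothetical helper; `HUGEGRAPH_AI_TOKEN` is the setting name suggested in this comment and does not exist in the PR yet.

```python
def build_auth_kwargs(token=None, user=None, password=None):
    """Return auth-related keyword arguments for an HTTP client call.

    A Bearer token takes precedence; Basic auth is the fallback when only
    a password is configured; otherwise no credentials are sent.
    """
    if token:
        return {"headers": {"Authorization": f"Bearer {token}"}}
    if password:
        return {"auth": (user or "", password)}
    return {}
```

The caller could populate `token` from `os.environ.get("HUGEGRAPH_AI_TOKEN")` and merge the result into its existing `kwargs` before issuing the request.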

@UIengF UIengF changed the title feat(mcp): Add HugeGraph MCP serverGraph mcp feat(mcp): add HugeGraph MCP V1 stable tool surface Jun 29, 2026

@VGalaxies VGalaxies left a comment

Contributor


Review summary

  • Blocking: yes
  • Summary: The PR still has blocking correctness and write-safety issues in the new MCP/Thin API surface.
  • Evidence:
    • static review of git diff origin/main...HEAD
    • git diff --check origin/main...HEAD only reports the known blank-line style issue

req.text,
req.example_prompt,
"property_graph",
req.language,


High: /graph-extract passes language as split_type

hugegraph-llm/src/hugegraph_llm/api/thin_api.py:106

Evidence

  • graph_extract_api() passes req.language as the fifth flow argument, but GraphExtractFlow.prepare() expects split_type in that position and rejects anything except document, paragraph, or sentence.

Impact

  • Normal requests with language="zh" or "en" fail before extraction runs, so the MCP graph extraction path returns FLOW_EXECUTION_FAILED.

Requested fix

  • Pass the default split type, e.g. "document", before req.language, or call the flow with explicit keyword arguments; update the thin API test to assert the real flow contract.
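
To illustrate why keyword arguments avoid this class of bug, here is a sketch with a stand-in function. The real `GraphExtractFlow.prepare()` signature is not shown in this review; the stand-in only assumes that `split_type` is the fifth parameter and that `language` follows it. All other parameter names are hypothetical.

```python
ALLOWED_SPLIT_TYPES = {"document", "paragraph", "sentence"}


def prepare(schema, text, example_prompt, extract_type, split_type="document", language="zh"):
    """Hypothetical stand-in for the flow's prepare step."""
    if split_type not in ALLOWED_SPLIT_TYPES:
        raise ValueError(f"unsupported split_type: {split_type!r}")
    return {"extract_type": extract_type, "split_type": split_type, "language": language}


def call_positionally(schema, text, example_prompt, language):
    # Reproduces the reported mistake: language lands in the split_type slot.
    return prepare(schema, text, example_prompt, "property_graph", language)


def call_with_keywords(schema, text, example_prompt, language):
    # Explicit keywords keep each value bound to the intended parameter.
    return prepare(
        schema=schema,
        text=text,
        example_prompt=example_prompt,
        extract_type="property_graph",
        split_type="document",
        language=language,
    )
```

Under these assumptions, `call_positionally(..., "zh")` raises `ValueError`, while `call_with_keywords(..., "zh")` succeeds and keeps `language` intact.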

extra_context: dict[str, Any] = field(default_factory=dict)


def compute_plan_hash(context: PlanContext) -> str:


High: Confirm plan hashes are client-forgeable

hugegraph-mcp/hugegraph_mcp/plan_hash.py:55

Evidence

  • compute_plan_hash() is a plain public SHA-256 over PlanContext, and verify_plan_hash() recomputes it from caller-supplied nonce and expires_at; manage_graph_data.py:221 passes those submitted values directly.

Impact

  • A caller with writes enabled can compute a valid hash and submit confirm=True without first receiving a server-issued dry-run token, bypassing the documented review/confirm safety chain and choosing an arbitrary future expiry.

Requested fix

  • Make confirm tokens server-issued and unforgeable, for example with server-side one-time plan records or an HMAC using a server secret, and enforce a bounded TTL.
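
A minimal sketch of the HMAC option, assuming the plan can be serialized as JSON and the server holds a secret that clients never see. The function names, token fields, and the 300-second TTL are illustrative assumptions, not the PR's `plan_hash.py` API.

```python
import hashlib
import hmac
import json
import secrets
import time

DEFAULT_TTL_SECONDS = 300  # assumption: an example bound, not a project setting


def _message(plan, nonce, expires_at):
    """Canonical bytes covered by the signature."""
    payload = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return f"{payload}|{nonce}|{expires_at}".encode("utf-8")


def issue_token(secret, plan, now=None, ttl=DEFAULT_TTL_SECONDS):
    """Issue a signed confirm token during dry-run (server side only)."""
    now = int(time.time()) if now is None else int(now)
    nonce = secrets.token_hex(16)
    expires_at = now + ttl
    sig = hmac.new(secret, _message(plan, nonce, expires_at), hashlib.sha256).hexdigest()
    return {"nonce": nonce, "expires_at": expires_at, "plan_hash": sig}


def verify_token(secret, plan, token, now=None, max_ttl=DEFAULT_TTL_SECONDS):
    """Check signature and expiry of a token submitted with confirm=True."""
    now = int(time.time()) if now is None else int(now)
    expires_at = int(token["expires_at"])
    # Reject expired tokens and expiries further out than the server allows.
    if expires_at < now or expires_at > now + max_ttl:
        return False
    expected = hmac.new(
        secret, _message(plan, token["nonce"], expires_at), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, str(token["plan_hash"]))
```

Because the signature depends on the server secret, a caller cannot mint a valid token or extend `expires_at` on its own. An HMAC alone does not prevent replay within the TTL; one-time use would still need a server-side record of consumed nonces.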

)


@thin_router.post("/graph-import", status_code=status.HTTP_200_OK, response_model=ThinAPIResponse)


High: Thin write endpoints bypass MCP write controls

hugegraph-llm/src/hugegraph_llm/api/thin_api.py:110

Evidence

  • /graph-import directly schedules FlowName.IMPORT_GRAPH_DATA, and /vid-embeddings/refresh directly schedules FlowName.UPDATE_VID_EMBEDDINGS; the new router is included in app.py under auth that defaults to disabled.

Impact

  • A default HugeGraph-LLM deployment exposes graph import and VID embedding mutation without MCP readonly/admin gating or the dry_run -> plan_hash -> confirm controls.

Requested fix

  • Remove these mutating routes from the public thin router, or require explicit authentication/admin authorization plus the same readonly and confirm/dry-run controls before scheduling mutating flows.

Labels

enhancement, llm, python-client, size:XXL

6 participants