
feat(providers): add Google Vertex AI inference provider#1568

Draft
maxamillion wants to merge 15 commits into NVIDIA:main from maxamillion:vertex-provider


@maxamillion
Collaborator

Summary

Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through the Vertex AI Anthropic Messages endpoint and Gemini/other models through the OpenAI-compatible endpoint.

Related Issue

Changes

Core provider (3 commits):

  • feat(providers): add Google Vertex AI inference provider

    • New providers/google-vertex-ai.yaml provider profile (two credential definitions: service account key + gcloud ADC)
    • crates/openshell-providers/src/providers/vertex.rs: provider discovery from GOOGLE_APPLICATION_CREDENTIALS / GOOGLE_CLOUD_PROJECT / GOOGLE_CLOUD_LOCATION env vars
    • crates/openshell-core/src/inference.rs: VERTEX_AI_PROFILE static, AuthHeader::ServiceAccountJwt/OAuth2Token variants, profile_for dispatch
    • proto/inference.proto: ResolvedRoute fields 8 (model_in_path bool) and 9 (request_path_override optional string)
    • crates/openshell-router/src/config.rs: ResolvedRoute struct additions
    • crates/openshell-server/src/inference.rs: resolve_vertex_ai_route with 4-case dispatch (Anthropic model, Gemini model, explicit publisher override, unknown → OpenAI compat), infer_vertex_publisher for 6 model families, full test suite
    • crates/openshell-router/src/backend.rs: build_provider_url Vertex AI case (URL construction plus body injection of anthropic-beta/anthropic-version for Anthropic Vertex routes)
    • crates/openshell-cli/src/run.rs: read_gcloud_adc, --from-gcloud-adc flag on provider create
    • docs/providers/google-vertex-ai.mdx: user-facing docs for both auth paths
  • refactor(providers): scope vertex-provider branch to vertex ai only

    • Removes non-Vertex changes that crept into the branch
  • fix(providers): address vertex-ai code review findings

    • Replace fragile body injection heuristic (substring match on request_path_override) with semantic check on model_in_path and anthropic_messages protocol
    • Add two negative tests confirming non-Vertex Anthropic and Vertex Gemini routes do NOT inject the Anthropic version header
    • Replace silent let _ = rollback in provider_create with an eprintln warning including manual deletion instructions
    • Improve infer_vertex_publisher doc comment to clarify only "anthropic" is consumed by routing logic today
    • Add tracing::warn! when VERTEX_AI_BASE_URL escape hatch is used with an Anthropic model
    • Fix clippy doc_markdown lints in new test doc comments
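The four-case dispatch described for resolve_vertex_ai_route can be pictured with a minimal sketch. Everything here is an assumption for illustration: the names `VertexRoute`, `infer_publisher`, and `pick_route` are hypothetical, and the prefix table is a guess (the PR mentions six model families without listing them). It only shows the idea that an explicit publisher override wins, "anthropic" selects the Anthropic Messages path, and anything else falls back to the OpenAI-compatible endpoint.

```rust
/// Which upstream request shape a Vertex AI route should use (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum VertexRoute {
    /// Anthropic Messages via the publisher model endpoint.
    AnthropicMessages,
    /// Everything else goes through the OpenAI-compatible endpoint.
    OpenAiCompat,
}

/// Guess the publisher from a model ID prefix.
/// The prefix table below is an assumption, not the PR's actual list.
fn infer_publisher(model: &str) -> Option<&'static str> {
    let id = model.trim().to_ascii_lowercase();
    const PREFIXES: [(&str, &str); 4] = [
        ("claude", "anthropic"),
        ("gemini", "google"),
        ("llama", "meta"),
        ("mistral", "mistralai"),
    ];
    PREFIXES
        .iter()
        .find(|(prefix, _)| id.starts_with(*prefix))
        .map(|(_, publisher)| *publisher)
}

/// An explicit override takes priority over prefix inference.
/// Only "anthropic" changes routing; unknown publishers use OpenAI compat.
fn pick_route(model: &str, publisher_override: Option<&str>) -> VertexRoute {
    let publisher = publisher_override.or_else(|| infer_publisher(model));
    match publisher {
        Some("anthropic") => VertexRoute::AnthropicMessages,
        _ => VertexRoute::OpenAiCompat,
    }
}
```

Under these assumptions, `pick_route("my-custom-model", Some("anthropic"))` selects the Anthropic path even though the model ID matches no prefix.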

Testing

  • mise run pre-commit passes (lint, format, license headers)
  • Unit tests added/updated (cargo test -p openshell-router -p openshell-cli -p openshell-server — all pass)
  • E2E tests added/updated (requires live Vertex AI credentials; not run in CI without secrets)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

Add Google Vertex AI as an inference provider with support for:
- Anthropic Claude models via the native rawPredict endpoint
- All other models (Gemini, Llama, Mistral, etc.) via the OpenAI-compatible endpoint
- Credential bootstrapping from gcloud Application Default Credentials via --from-gcloud-adc
- Automatic publisher inference from model ID prefixes
- VERTEX_AI_BASE_URL escape hatch for custom deployments

New proto fields: model_in_path (bool) and request_path_override (optional string)
on ResolvedRoute, enabling per-route URL construction and body injection.
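To make the model_in_path idea concrete, here is a rough sketch of how the two URL shapes could be assembled. The function name `vertex_url` and its parameters are hypothetical, and the URL layouts follow Google's public Vertex AI documentation as understood at the time of writing; the PR's build_provider_url may differ, and multi-region hosts are not covered.

```rust
/// Build an upstream URL for a Vertex AI route (illustrative sketch only).
///
/// `anthropic` selects the publisher model endpoint, where the model ID is
/// part of the path; otherwise the OpenAI-compatible endpoint is used and
/// the model stays in the request body.
fn vertex_url(project: &str, region: &str, model: &str, anthropic: bool, stream: bool) -> String {
    // Assumption: the `global` location uses the un-prefixed host.
    let host = if region == "global" {
        "aiplatform.googleapis.com".to_string()
    } else {
        format!("{region}-aiplatform.googleapis.com")
    };
    let base = format!("https://{host}/v1/projects/{project}/locations/{region}");
    if anthropic {
        // Streaming calls use streamRawPredict; everything else uses rawPredict.
        let verb = if stream { "streamRawPredict" } else { "rawPredict" };
        format!("{base}/publishers/anthropic/models/{model}:{verb}")
    } else {
        format!("{base}/endpoints/openapi/chat/completions")
    }
}
```

The sketch does no validation, so project, region, and model would need to be checked before interpolation.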

Adds anthropic-version header injection for rawPredict/streamRawPredict requests.

Remove nine non-Vertex AI provider YAML profiles (anthropic, claude,
codex, copilot, google-drive, gitlab, openai, opencode, outlook) that
were bundled into the Vertex AI feature commit. These profiles are
additive catalog expansion for pre-existing provider plugins and will
land in a separate branch.

Restore BUILT_IN_PROFILE_YAMLS to the original three entries plus the
new google-vertex-ai.yaml. Revert test assertions that were adjusted
solely because the catalog entries changed:
- credential_env_vars_are_deduplicated_in_profile_order: back to claude-code
- list_provider_profiles_returns_built_in_profile_categories: back to 4-entry list

- Replace fragile body injection heuristic (substring match on
  request_path_override) with semantic check on model_in_path and
  anthropic_messages protocol; add two negative tests confirming
  non-Vertex Anthropic and Vertex Gemini routes do not inject
  the Anthropic version header
- Replace silent let _ = rollback in provider_create with an
  eprintln warning that includes manual deletion instructions
- Improve infer_vertex_publisher doc comment: clarify only 'anthropic'
  result is consumed by routing logic today
- Add tracing::warn! in resolve_vertex_ai_route when VERTEX_AI_BASE_URL
  escape hatch is used with an Anthropic model
- Fix clippy doc_markdown lints in new test doc comments
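The semantic check that replaces the substring heuristic might look roughly like the following. `Protocol`, `Route`, and `needs_vertex_anthropic_body` are hypothetical stand-ins rather than the PR's types; the sketch only shows that both conditions must hold, which is what the two negative tests exercise.

```rust
/// Wire protocol spoken by a resolved route (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protocol {
    AnthropicMessages,
    OpenAiChatCompletions,
}

/// Minimal stand-in for the fields of a resolved route that matter here.
struct Route {
    protocol: Protocol,
    model_in_path: bool,
}

/// True only for Vertex AI Anthropic routes: the model lives in the URL path
/// AND the route speaks the Anthropic Messages protocol.
fn needs_vertex_anthropic_body(route: &Route) -> bool {
    route.model_in_path && route.protocol == Protocol::AnthropicMessages
}
```

A direct Anthropic route (model in body) and a Vertex Gemini route (OpenAI-compatible protocol) both return false under this check.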
@maxamillion maxamillion requested review from a team, derekwaynecarr and mrunalp as code owners May 26, 2026 16:15
@copy-pr-bot

copy-pr-bot Bot commented May 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@maxamillion maxamillion marked this pull request as draft May 26, 2026 16:21

pub struct VertexAiProvider;

fn discover_with_context(ctx: &dyn DiscoveryContext) -> Option<DiscoveredProvider> {
Collaborator

This is specific to v1 of providers. Going forward, I'd like to support only providers v2, which relies on a provider profile and the discovery capabilities we have for v2 profiles. Is there a specific need to support the "legacy" providers?

Comment on lines +4 to +7
title: "Google Vertex AI"
sidebar-title: "Google Vertex AI"
description: "Configure OpenShell to route inference traffic through Google Vertex AI, including Anthropic Claude and Gemini models."
keywords: "Generative AI, Cybersecurity, AI Agents, Sandboxing, Google Vertex AI, Anthropic Claude, Inference Routing"

maxamillion and others added 6 commits May 26, 2026 11:52
Align Vertex routing, discovery, and credential refresh behavior with
the documented setup, and harden git sync helpers against hook-time
Git environment leakage so pre-commit stays reliable in worktrees.

Co-authored-by: Cursor <cursoragent@cursor.com>

- Add runtime mutual exclusivity guard for --from-gcloud-adc in
  provider_create; clap enforces this at parse time but the guard was
  missing for programmatic callers

- Replace fragile unwrap_or_default + if-chain in read_gcloud_adc with
  an explicit match on Option<&str>, distinguishing service_account,
  authorized_user, unknown type, and missing type field; corrects the
  error message for service account ADC to reference the actual workflow

- Normalize leading slash in build_provider_url (Some(override), false)
  arm; prevents silent URL corruption when override_path lacks a leading
  slash; hoist trim_end_matches('/') before the match to remove 3x
  duplication

- Populate model in non-Vertex ResolvedProviderRoute at construction
  instead of leaving it empty; removes the clone-and-patch in
  verify_provider_endpoint

- Replace allow_fallback: bool in find_provider_api_key with a
  CredentialLookup enum (PreferredOnly / PreferredThenAny) so the call
  site is self-documenting and the negation logic is gone

- Add tests: build_provider_url_override_path_normalizes_missing_leading_slash,
  vertex_ai_body_preserves_client_anthropic_version,
  resolve_vertex_ai_route_google_prefixed_base_url_override,
  resolve_vertex_ai_route_base_url_priority_google_wins
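The explicit match on Option<&str> described for read_gcloud_adc could be sketched as below. `AdcKind` and `classify_adc_type` are hypothetical names, and the error wording is an assumption; only the four-way distinction (service_account, authorized_user, unknown type, missing type field) comes from the commit message.

```rust
/// The one ADC credential kind this sketch accepts (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum AdcKind {
    AuthorizedUser,
}

/// Classify the `type` field of an ADC JSON file.
/// `kind` is None when the field is absent.
fn classify_adc_type(kind: Option<&str>) -> Result<AdcKind, String> {
    match kind {
        Some("authorized_user") => Ok(AdcKind::AuthorizedUser),
        // Assumed wording: point service account users at the key-based flow.
        Some("service_account") => Err(
            "ADC file holds a service account key; supply it as the service account credential instead"
                .to_string(),
        ),
        Some(other) => Err(format!("unsupported ADC credential type `{other}`")),
        None => Err("ADC file has no `type` field".to_string()),
    }
}
```

Compared with unwrap_or_default plus an if-chain, each case gets its own arm, so a missing field and an unknown value no longer share an error path.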
Support global and multi-region Vertex hosts and reject
Anthropic base URL overrides so inference routing keeps the
correct request shaping, headers, and operator guidance.

Signed-off-by: Adam Miller <admiller@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

S-1: add validate_gcp_project_id/validate_gcp_region in openshell-server
to reject malformed project IDs and regions before URL interpolation.

DRY-1: introduce normalize_inference_provider_type in openshell-core as
the single source of truth for inference provider alias resolution.
profile_for delegates to it; normalize_provider_type in openshell-providers
delegates inference cases to core, eliminating the duplicated alias list.

DRY-2: extract VERTEX_AI_CREDENTIAL_KEY_NAMES const into openshell-core.
VERTEX_AI_PROFILE and vertex.rs (discover_with_context + credential_env_vars)
all reference the same constant instead of three independent copies.

R-1: change read_gcloud_adc return type from anyhow::Result to miette::Result
for consistency with the rest of the CLI crate. Remove anyhow dependency
from openshell-cli entirely.

R-2: replace the unreachable unwrap_or_else fallback in body mutation with
expect(); refactor match to map_or to satisfy clippy::option_if_let_else.

C-1: route_headers_for_route now extends profile passthrough headers rather
than replacing them, preserving any future profile-level entries.

C-4: filter empty GOOGLE_APPLICATION_CREDENTIALS before using it as a path,
falling through to the default ADC location instead of PathBuf::from("").

C-5: document the JSON-body invariant above the body mutation block.

C-6: debug_assert in build_provider_url (Some(suffix), true) arm that suffix
does not start with '/', guarding against future misuse of the API.

DRY-3: extract VERTEX_AI_PROVIDER_TYPE const in run.rs; replace 4 hardcoded
string comparisons with the constant.

T-2: add resolve_vertex_ai_route_whitespace_only_project_fails test.
T-3: add read_gcloud_adc_malformed_json_errors test.
T-4: add upsert_cluster_inference_route_vertex_ai_anthropic_sets_model_in_path
test verifying model_in_path=true and request_path_override persistence.
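As an illustration of S-1, a project ID check done before URL interpolation might look like this. `check_project_id` is a hypothetical function, not the PR's validate_gcp_project_id; the rules encoded are Google Cloud's published project ID constraints (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen).

```rust
/// Reject project IDs that could corrupt a URL path (illustrative sketch).
fn check_project_id(id: &str) -> Result<(), String> {
    let bytes = id.as_bytes();
    if !(6..=30).contains(&bytes.len()) {
        return Err(format!("project ID must be 6 to 30 characters, got {}", bytes.len()));
    }
    if !bytes[0].is_ascii_lowercase() {
        return Err("project ID must start with a lowercase letter".to_string());
    }
    if bytes[bytes.len() - 1] == b'-' {
        return Err("project ID must not end with a hyphen".to_string());
    }
    // Anything outside [a-z0-9-] is rejected, including '/', '.', and whitespace.
    let allowed = |b: &u8| b.is_ascii_lowercase() || b.is_ascii_digit() || *b == b'-';
    if !bytes.iter().all(allowed) {
        return Err("project ID may contain only lowercase letters, digits, and hyphens".to_string());
    }
    Ok(())
}
```

Because whitespace is outside the allowed set, a whitespace-only ID fails as well, which is the case the T-2 test targets.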
Reject unsafe Vertex base URL overrides, canonicalize provider
aliases, and drop Anthropic model discovery until the route
contract is correct.

Tighten ADC flag validation and align docs with the supported
behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>

The vertex-provider branch diverged from main on three unrelated PRs:
- PR NVIDIA#1526: OCSF builder macro and shared driver helpers (reverted here
  to match main's macro-based approach)
- PR NVIDIA#1547: Python SDK FileNotFoundError -> SandboxError translation
  (restores user-friendly error messages for missing gateway files)
- PR NVIDIA#1539: bash 3.2-compatible read loop in helm-k3s-local.sh
  (restores mapfile -> while IFS= read for macOS compat)
@maxamillion maxamillion marked this pull request as ready for review May 27, 2026 02:52
@maxamillion maxamillion marked this pull request as draft May 27, 2026 03:28
maxamillion and others added 6 commits May 27, 2026 07:44
Avoid unnecessary String allocation in passthrough header comparison,
validate model IDs for all Vertex routes (not just Anthropic) as
defense-in-depth, document :rawPredict forward-compat in
is_vertex_anthropic_rawpredict_route, and collapse duplicate IPv4/IPv6
match arms in validate_vertex_base_url.

Keep Vertex Claude requests on the rawPredict contract and only upgrade
streaming calls to streamRawPredict.

Mint the initial ADC-backed access token during provider creation so
successful Vertex bootstrap yields an immediately usable provider.

Co-authored-by: Cursor <cursoragent@cursor.com>

The VertexAiProvider plugin (discover_with_context, credential_env_vars)
was the only V1-specific code added by this branch. Vertex AI discovery
now relies entirely on V2 profile-based discovery:

- Credentials are scanned by discover_from_profile() via the
  google-vertex-ai.yaml profile's discovery.credentials list.
- Config keys (project ID, region, base URL, publisher) are scanned
  directly from VERTEX_AI_CONFIG_KEY_NAMES in the V2 path of
  discover_existing_provider_data().

--from-gcloud-adc and --credential flows are unaffected; they never
used the plugin. --from-existing now requires providers_v2_enabled=true
on the gateway, which is the correct V2-only posture.

Remove the V1 registry fallback test for Vertex and update the
config-only credential error test to run with V2 enabled.

When a google-vertex-ai provider is attached to a sandbox,
resolve_provider_environment now derives agent-specific environment
variables from the provider's config and injects them alongside the
credential env vars.

Static flags (always present):
  CLAUDE_CODE_USE_VERTEX=1
  GOOSE_PROVIDER=gcp_vertex_ai

Derived from VERTEX_AI_PROJECT_ID (when set):
  ANTHROPIC_VERTEX_PROJECT_ID
  GCP_PROJECT_ID
  GOOGLE_CLOUD_PROJECT

Derived from VERTEX_AI_REGION (when set):
  CLOUD_ML_REGION
  GCP_LOCATION
  VERTEX_LOCATION

Injected values use entry().or_insert() so explicit credentials take
precedence. Sandbox --env overrides are applied at the process level
after environment installation, so they naturally shadow these values.

Non-Vertex providers are unaffected.
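The precedence rule above can be sketched with a plain HashMap. The function name `inject_vertex_agent_env` and its signature are hypothetical; the variable names are the ones listed in the commit message, and entry().or_insert() is what keeps an explicitly set value from being overwritten.

```rust
use std::collections::HashMap;

/// Add agent-specific variables derived from provider config.
/// Keys already present in `env` (explicit credentials) are left untouched.
fn inject_vertex_agent_env(
    env: &mut HashMap<String, String>,
    project_id: Option<&str>,
    region: Option<&str>,
) {
    // Insert only when the key is absent.
    let mut put = |key: &str, value: &str| {
        env.entry(key.to_string()).or_insert(value.to_string());
    };

    // Static flags, always present.
    put("CLAUDE_CODE_USE_VERTEX", "1");
    put("GOOSE_PROVIDER", "gcp_vertex_ai");

    // Derived from VERTEX_AI_PROJECT_ID when set.
    if let Some(project) = project_id {
        for key in ["ANTHROPIC_VERTEX_PROJECT_ID", "GCP_PROJECT_ID", "GOOGLE_CLOUD_PROJECT"] {
            put(key, project);
        }
    }

    // Derived from VERTEX_AI_REGION when set.
    if let Some(location) = region {
        for key in ["CLOUD_ML_REGION", "GCP_LOCATION", "VERTEX_LOCATION"] {
            put(key, location);
        }
    }
}
```

If the map already holds GOOGLE_CLOUD_PROJECT, that value survives while the other two project variables receive the derived ID.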
Blanket-blocking AF_NETLINK prevented getifaddrs(3) from working inside
sandboxes. glibc and musl both use socket(AF_NETLINK, SOCK_RAW,
NETLINK_ROUTE) internally — there is no /proc fallback. This caused
runtimes such as Node.js, Python, and Go to fail with errors like
"getifaddrs returned an error" at startup.

Replace the unconditional AF_NETLINK domain block with a two-condition
seccomp rule that blocks socket() only when arg0==AF_NETLINK AND
arg2!=0. This allows NETLINK_ROUTE (protocol 0) while keeping every
other netlink protocol (NETLINK_SOCK_DIAG, NETLINK_NETFILTER,
NETLINK_AUDIT, NETLINK_GENERIC, etc.) blocked with EPERM.

Risk remains low: write operations via NETLINK_ROUTE require
CAP_NET_ADMIN which the sandbox does not grant, and the network
namespace scopes all reads to sandbox-local interfaces only.
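The two-condition rule can be modeled as a pure function to make its truth table explicit. This is not a seccomp filter and does not use any seccomp library; `Verdict` and `socket_verdict` are hypothetical names, and the sketch covers only the netlink rule, not any other restriction the sandbox applies. The constants are the Linux values.

```rust
// Linux constants (from <linux/socket.h> and <linux/netlink.h>).
const AF_NETLINK: u64 = 16;
const NETLINK_ROUTE: u64 = 0;
const NETLINK_SOCK_DIAG: u64 = 4;
const NETLINK_AUDIT: u64 = 9;

/// Outcome of the netlink rule for one socket() call.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Allow,
    DenyEperm,
}

/// Model of the rule: deny only when arg0 == AF_NETLINK AND arg2 != 0.
/// The socket type (arg1) plays no part in the decision.
fn socket_verdict(domain: u64, _sock_type: u64, protocol: u64) -> Verdict {
    if domain == AF_NETLINK && protocol != NETLINK_ROUTE {
        Verdict::DenyEperm
    } else {
        Verdict::Allow
    }
}
```

In this model, the socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) call issued by getifaddrs(3) is allowed, while NETLINK_SOCK_DIAG and NETLINK_AUDIT sockets are denied.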
Vertex AI rawPredict encodes the model in the URL path and rejects a
'model' field in the request body with HTTP 400 'Extra inputs are not
permitted'. Anthropic SDK clients (including Claude Code) always include
'model' in the body for the standard Anthropic API format.

Remove any client-supplied 'model' key from the body when the route
targets a Vertex AI Anthropic rawPredict endpoint, complementing the
existing anthropic-beta header stripping fix.
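A simplified picture of the body shaping follows. To stay dependency-free, the top-level JSON object is modeled as a map from key to an already-serialized JSON value, which is an assumption of this sketch; the real code operates on parsed JSON. `shape_vertex_anthropic_body` is a hypothetical name, and "vertex-2023-10-16" is the anthropic_version value documented for Claude on Vertex AI.

```rust
use std::collections::BTreeMap;

/// Top-level JSON object modeled as key -> serialized JSON value (sketch only).
type Body = BTreeMap<String, String>;

/// Adapt a standard Anthropic Messages body for a Vertex AI rawPredict route.
fn shape_vertex_anthropic_body(body: &mut Body) {
    // The model is carried in the URL path, so the body must not repeat it.
    body.remove("model");
    // Add the Vertex version marker unless the client already supplied one.
    body.entry("anthropic_version".to_string())
        .or_insert_with(|| "\"vertex-2023-10-16\"".to_string());
}
```

Other fields such as max_tokens or messages pass through unchanged, and a client-supplied anthropic_version is preserved.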

2 participants