feat(providers): add Google Vertex AI inference provider by maxamillion · Pull Request #1568 · NVIDIA/OpenShell

maxamillion · 2026-05-26T16:15:51Z

Summary

Add Google Vertex AI as a first-class inference provider, supporting both service account (JWT) and gcloud ADC (OAuth2 refresh token) credential flows. Routes Anthropic models through the Vertex AI Anthropic Messages endpoint and Gemini/other models through the OpenAI-compatible endpoint.
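A minimal Rust sketch of the routing decision described above. The prefix table, the `VertexEndpoint` enum and `select_endpoint` are hypothetical. The PR's `infer_vertex_publisher` covers six model families whose exact prefixes are not listed here, so the table below is illustrative only.

```rust
/// Which Vertex AI endpoint family a request is sent to (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum VertexEndpoint {
    /// Anthropic Messages API via `:rawPredict` / `:streamRawPredict`.
    AnthropicMessages,
    /// Vertex AI's OpenAI-compatible chat completions endpoint.
    OpenAiCompatible,
}

/// Illustrative prefix table; the real function's families and prefixes may differ.
fn infer_publisher(model: &str) -> Option<&'static str> {
    let m = model.to_ascii_lowercase();
    const TABLE: &[(&str, &str)] = &[
        ("claude", "anthropic"),
        ("gemini", "google"),
        ("llama", "meta"),
        ("mistral", "mistralai"),
    ];
    TABLE
        .iter()
        .find(|(prefix, _)| m.starts_with(prefix))
        .map(|(_, publisher)| *publisher)
}

/// An explicit publisher override wins; otherwise the publisher is inferred
/// from the model ID. Only "anthropic" changes the endpoint; every other
/// publisher, and any unknown model, falls through to the OpenAI-compatible path.
fn select_endpoint(model: &str, publisher_override: Option<&str>) -> VertexEndpoint {
    match publisher_override.or_else(|| infer_publisher(model)) {
        Some("anthropic") => VertexEndpoint::AnthropicMessages,
        _ => VertexEndpoint::OpenAiCompatible,
    }
}
```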

Related Issue

Changes

Core provider (3 commits):

feat(providers): add Google Vertex AI inference provider
- New providers/google-vertex-ai.yaml provider profile (two credential definitions: service account key + gcloud ADC)
- crates/openshell-providers/src/providers/vertex.rs — provider discovery from GOOGLE_APPLICATION_CREDENTIALS / GOOGLE_CLOUD_PROJECT / GOOGLE_CLOUD_LOCATION env vars
- crates/openshell-core/src/inference.rs — VERTEX_AI_PROFILE static, AuthHeader::ServiceAccountJwt/OAuth2Token variants, profile_for dispatch
- proto/inference.proto — ResolvedRoute fields 8 (model_in_path bool) and 9 (request_path_override optional string)
- crates/openshell-router/src/config.rs — ResolvedRoute struct additions
- crates/openshell-server/src/inference.rs — resolve_vertex_ai_route with 4-case dispatch (Anthropic model, Gemini model, explicit publisher override, unknown → OpenAI compat), infer_vertex_publisher for 6 model families, full test suite
- crates/openshell-router/src/backend.rs — build_provider_url Vertex AI case: URL construction + body injection of anthropic-beta/anthropic-version for Anthropic Vertex routes
- crates/openshell-cli/src/run.rs — read_gcloud_adc, --from-gcloud-adc flag on provider create
- docs/providers/google-vertex-ai.mdx — user-facing docs for both auth paths
refactor(providers): scope vertex-provider branch to vertex ai only
- Removes non-Vertex changes that crept into the branch
fix(providers): address vertex-ai code review findings
- Replace fragile body injection heuristic (substring match on request_path_override) with semantic check on model_in_path and anthropic_messages protocol
- Add two negative tests confirming non-Vertex Anthropic and Vertex Gemini routes do NOT inject the Anthropic version header
- Replace silent let _ = rollback in provider_create with an eprintln warning including manual deletion instructions
- Improve infer_vertex_publisher doc comment to clarify only "anthropic" is consumed by routing logic today
- Add tracing::warn! when VERTEX_AI_BASE_URL escape hatch is used with an Anthropic model
- Fix clippy doc_markdown lints in new test doc comments
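The `build_provider_url` dispatch on the new `request_path_override` and `model_in_path` route fields could look roughly like the sketch below. It also includes the leading-slash normalization and the `debug_assert` that later commits on this branch describe. The signature and the `default_path` parameter are assumptions for illustration, not the crate's actual API.

```rust
/// Sketch: join a base URL with a per-route path, keyed on the
/// (request_path_override, model_in_path) pair. `default_path` is a
/// hypothetical stand-in for the profile's normal request path.
fn build_provider_url(
    base_url: &str,
    model: &str,
    request_path_override: Option<&str>,
    model_in_path: bool,
    default_path: &str,
) -> String {
    // Trim once up front so no arm produces a double slash.
    let base = base_url.trim_end_matches('/');
    match (request_path_override, model_in_path) {
        // Model-in-path routes: the override is a suffix such as ":rawPredict"
        // appended directly after the model ID.
        (Some(suffix), true) => {
            debug_assert!(!suffix.starts_with('/'), "suffix must not start with '/'");
            format!("{base}/{model}{suffix}")
        }
        // Plain path override: tolerate a missing leading slash.
        (Some(path), false) => {
            if path.starts_with('/') {
                format!("{base}{path}")
            } else {
                format!("{base}/{path}")
            }
        }
        (None, true) => format!("{base}/{model}"),
        (None, false) => format!("{base}{default_path}"),
    }
}
```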

Testing

mise run pre-commit passes (lint, format, license headers)
Unit tests added/updated (cargo test -p openshell-router -p openshell-cli -p openshell-server — all pass)
E2E tests added/updated (requires live Vertex AI credentials; not run in CI without secrets)

Checklist

Follows Conventional Commits
Commits are signed off (DCO)

Add Google Vertex AI as an inference provider with support for:
- Anthropic Claude models via the native rawPredict endpoint
- All other models (Gemini, Llama, Mistral, etc.) via the OpenAI-compatible endpoint
- Credential bootstrapping from gcloud Application Default Credentials via --from-gcloud-adc
- Automatic publisher inference from model ID prefixes
- VERTEX_AI_BASE_URL escape hatch for custom deployments

New proto fields: model_in_path (bool) and request_path_override (optional string) on ResolvedRoute, enabling per-route URL construction and body injection. Adds anthropic-version header injection for rawPredict/streamRawPredict requests.
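For reference, here is a sketch of the two URL shapes this commit targets, assuming a regional host. The path layouts follow Google's public Vertex AI documentation for Anthropic `rawPredict`/`streamRawPredict` and for the OpenAI-compatible `endpoints/openapi/chat/completions` endpoint. `vertex_url` is a hypothetical helper, and the PR's code may assemble these URLs differently.

```rust
/// Sketch of the two Vertex AI URL shapes, assuming a regional host.
fn vertex_url(project: &str, region: &str, model: &str, anthropic: bool, stream: bool) -> String {
    let host = format!("https://{region}-aiplatform.googleapis.com");
    let scope = format!("v1/projects/{project}/locations/{region}");
    if anthropic {
        // The model is encoded in the path; streaming uses a different method verb.
        let verb = if stream { "streamRawPredict" } else { "rawPredict" };
        format!("{host}/{scope}/publishers/anthropic/models/{model}:{verb}")
    } else {
        // OpenAI-compatible endpoint: the model travels in the request body instead.
        format!("{host}/{scope}/endpoints/openapi/chat/completions")
    }
}
```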

Remove nine non-Vertex AI provider YAML profiles (anthropic, claude, codex, copilot, google-drive, gitlab, openai, opencode, outlook) that were bundled into the Vertex AI feature commit. These profiles are additive catalog expansion for pre-existing provider plugins and will land in a separate branch.

Restore BUILT_IN_PROFILE_YAMLS to the original three entries plus the new google-vertex-ai.yaml. Revert test assertions that were adjusted solely because the catalog entries changed:
- credential_env_vars_are_deduplicated_in_profile_order: back to claude-code
- list_provider_profiles_returns_built_in_profile_categories: back to the 4-entry list

- Replace fragile body injection heuristic (substring match on request_path_override) with semantic check on model_in_path and anthropic_messages protocol; add two negative tests confirming non-Vertex Anthropic and Vertex Gemini routes do not inject
- Replace silent let _ = rollback in provider_create with an eprintln warning that includes manual deletion instructions
- Improve infer_vertex_publisher doc comment: clarify only 'anthropic' result is consumed by routing logic today
- Add tracing::warn! in resolve_vertex_ai_route when VERTEX_AI_BASE_URL escape hatch is used with an Anthropic model
- Fix clippy doc_markdown lints in new test doc comments
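Here is a sketch of the semantic check that replaces the substring heuristic. Only `model_in_path` and the `anthropic_messages` protocol name come from the commit text. The trimmed-down `Route` struct, the function name and the `"openai_chat_completions"` protocol string are illustrative assumptions.

```rust
/// Hypothetical, trimmed-down route shape; only the fields the check needs.
struct Route<'a> {
    model_in_path: bool,
    protocols: &'a [&'a str],
}

/// Inject Anthropic body fields only when the route encodes the model in its
/// path AND speaks the Anthropic Messages protocol. This replaces matching a
/// substring of the override path.
fn should_inject_anthropic_fields(route: &Route) -> bool {
    route.model_in_path && route.protocols.contains(&"anthropic_messages")
}
```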

copy-pr-bot · 2026-05-26T16:15:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

johntmyers · 2026-05-26T16:47:57Z

+
+pub struct VertexAiProvider;
+
+fn discover_with_context(ctx: &dyn DiscoveryContext) -> Option<DiscoveredProvider> {


This is specific to v1 providers. In general, I'd like to support only providers v2 going forward, which relies on a provider profile and the discovery capabilities we have for v2 profiles. Is there a specific need to support the "legacy" providers?

johntmyers · 2026-05-26T16:52:32Z

+title: "Google Vertex AI"
+sidebar-title: "Google Vertex AI"
+description: "Configure OpenShell to route inference traffic through Google Vertex AI, including Anthropic Claude and Gemini models."
+keywords: "Generative AI, Cybersecurity, AI Agents, Sandboxing, Google Vertex AI, Anthropic Claude, Inference Routing"


Could this be under tutorials? We already have one there: https://docs.nvidia.com/openshell/latest/get-started/tutorials/microsoft-graph-provider-refresh

Align Vertex routing, discovery, and credential refresh behavior with the documented setup, and harden git sync helpers against hook-time Git environment leakage so pre-commit stays reliable in worktrees.

Co-authored-by: Cursor <cursoragent@cursor.com>
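The git-helper hardening is described here only in prose. In Rust, one way to express the idea is to strip Git variables that hooks may export before spawning a nested `git`. The variable list and the `git_command` helper are assumptions, and the actual helpers may be shell scripts rather than Rust.

```rust
use std::process::Command;

/// Illustrative subset of variables Git can export to hook processes; if they
/// leak into a nested `git` call, it targets the hook's repository/index
/// instead of the intended worktree.
const GIT_HOOK_ENV_VARS: &[&str] = &["GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE"];

/// Build (but do not run) a `git` command scoped to `repo` with those
/// variables removed from the child's environment.
fn git_command(repo: &str) -> Command {
    let mut cmd = Command::new("git");
    cmd.arg("-C").arg(repo);
    for var in GIT_HOOK_ENV_VARS {
        cmd.env_remove(var);
    }
    cmd
}
```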

- Add runtime mutual exclusivity guard for --from-gcloud-adc in provider_create; clap enforces this at parse time but the guard was missing for programmatic callers
- Replace fragile unwrap_or_default + if-chain in read_gcloud_adc with an explicit match on Option<&str>, distinguishing service_account, authorized_user, unknown type, and missing type field; corrects the error message for service account ADC to reference the actual workflow
- Normalize leading slash in build_provider_url (Some(override), false) arm; prevents silent URL corruption when override_path lacks a leading slash; hoist trim_end_matches('/') before the match to remove 3x duplication
- Populate model in non-Vertex ResolvedProviderRoute at construction instead of leaving it empty; removes the clone-and-patch in verify_provider_endpoint
- Replace allow_fallback: bool in find_provider_api_key with a CredentialLookup enum (PreferredOnly / PreferredThenAny) so the call site is self-documenting and the negation logic is gone
- Add tests: build_provider_url_override_path_normalizes_missing_leading_slash, vertex_ai_body_preserves_client_anthropic_version, resolve_vertex_ai_route_google_prefixed_base_url_override, resolve_vertex_ai_route_base_url_priority_google_wins
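The `CredentialLookup` refactor can be sketched as follows. The two variants match the commit message. The `find_provider_api_key` signature, the map-based credential store and the "smallest key wins" fallback order are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Self-documenting replacement for an `allow_fallback: bool` parameter.
#[derive(Clone, Copy, Debug)]
enum CredentialLookup {
    /// Only the preferred key names are acceptable.
    PreferredOnly,
    /// Try preferred names first, then accept any remaining credential.
    PreferredThenAny,
}

/// Hypothetical signature; the real function may differ.
fn find_provider_api_key<'a>(
    credentials: &'a HashMap<String, String>,
    preferred: &[&str],
    lookup: CredentialLookup,
) -> Option<&'a str> {
    preferred
        .iter()
        .find_map(move |name| credentials.get(*name))
        .or_else(move || match lookup {
            CredentialLookup::PreferredOnly => None,
            // Deterministic fallback: the lexicographically smallest key.
            CredentialLookup::PreferredThenAny => credentials
                .iter()
                .min_by_key(|(name, _)| *name)
                .map(|(_, value)| value),
        })
        .map(String::as_str)
}
```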

Support global and multi-region Vertex hosts and reject Anthropic base URL overrides so inference routing keeps the correct request shaping, headers, and operator guidance.

Signed-off-by: Adam Miller <admiller@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
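Here is a sketch of host selection for global versus regional locations. The `global` location is served from `aiplatform.googleapis.com`, and regional locations use a `{region}-aiplatform.googleapis.com` prefix. Multi-region hosts are deliberately left out because the PR does not spell them out. `vertex_host` is a hypothetical name.

```rust
/// Sketch: pick the Vertex AI host for a location. Multi-region locations are
/// not modelled here.
fn vertex_host(location: &str) -> String {
    if location == "global" {
        "aiplatform.googleapis.com".to_string()
    } else {
        format!("{location}-aiplatform.googleapis.com")
    }
}
```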

- S-1: add validate_gcp_project_id/validate_gcp_region in openshell-server to reject malformed project IDs and regions before URL interpolation.
- DRY-1: introduce normalize_inference_provider_type in openshell-core as the single source of truth for inference provider alias resolution. profile_for delegates to it; normalize_provider_type in openshell-providers delegates inference cases to core, eliminating the duplicated alias list.
- DRY-2: extract VERTEX_AI_CREDENTIAL_KEY_NAMES const into openshell-core. VERTEX_AI_PROFILE and vertex.rs (discover_with_context + credential_env_vars) all reference the same constant instead of three independent copies.
- R-1: change read_gcloud_adc return type from anyhow::Result to miette::Result for consistency with the rest of the CLI crate. Remove anyhow dependency from openshell-cli entirely.
- R-2: replace the unreachable unwrap_or_else fallback in body mutation with expect(); refactor match to map_or to satisfy clippy::option_if_let_else.
- C-1: route_headers_for_route now extends profile passthrough headers rather than replacing them, preserving any future profile-level entries.
- C-4: filter empty GOOGLE_APPLICATION_CREDENTIALS before using it as a path, falling through to the default ADC location instead of PathBuf::from("").
- C-5: document the JSON-body invariant above the body mutation block.
- C-6: debug_assert in build_provider_url (Some(suffix), true) arm that suffix does not start with '/', guarding against future misuse of the API.
- DRY-3: extract VERTEX_AI_PROVIDER_TYPE const in run.rs; replace 4 hardcoded string comparisons with the constant.
- T-2: add resolve_vertex_ai_route_whitespace_only_project_fails test.
- T-3: add read_gcloud_adc_malformed_json_errors test.
- T-4: add upsert_cluster_inference_route_vertex_ai_anthropic_sets_model_in_path test verifying model_in_path=true and request_path_override persistence.
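Here is a sketch of what `validate_gcp_project_id` and `validate_gcp_region` might check. The project ID rule follows Google's documented format: 6 to 30 characters of lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen. The region rule is a looser heuristic assumed for illustration, because the PR does not state its exact grammar.

```rust
/// Sketch: reject project IDs that do not match Google's documented format
/// before interpolating them into a URL. Legacy domain-scoped IDs
/// (e.g. "example.com:proj") are rejected by this sketch.
fn validate_gcp_project_id(id: &str) -> Result<(), String> {
    let len_ok = (6..=30).contains(&id.len());
    let chars_ok = id
        .bytes()
        .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-');
    let starts_ok = id.bytes().next().is_some_and(|b| b.is_ascii_lowercase());
    let ends_ok = !id.ends_with('-');
    if len_ok && chars_ok && starts_ok && ends_ok {
        Ok(())
    } else {
        Err(format!("invalid GCP project ID: {id:?}"))
    }
}

/// Heuristic region check (an assumption): non-empty, lowercase letters,
/// digits and hyphens, starting with a letter. Accepts "global" and
/// "us-central1"; rejects anything containing '/', '.', or whitespace.
fn validate_gcp_region(region: &str) -> Result<(), String> {
    let ok = region.bytes().next().is_some_and(|b| b.is_ascii_lowercase())
        && region
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-')
        && !region.ends_with('-');
    if ok {
        Ok(())
    } else {
        Err(format!("invalid GCP region: {region:?}"))
    }
}
```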

Reject unsafe Vertex base URL overrides, canonicalize provider aliases, and drop Anthropic model discovery until the route contract is correct. Tighten ADC flag validation and align docs with the supported behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>

The vertex-provider branch diverged from main on three unrelated PRs:
- PR NVIDIA#1526: OCSF builder macro and shared driver helpers (reverted here to match main's macro-based approach)
- PR NVIDIA#1547: Python SDK FileNotFoundError -> SandboxError translation (restores user-friendly error messages for missing gateway files)
- PR NVIDIA#1539: bash 3.2-compatible read loop in helm-k3s-local.sh (restores mapfile -> while IFS= read for macOS compat)

Avoid unnecessary String allocation in passthrough header comparison, validate model IDs for all Vertex routes (not just Anthropic) as defense-in-depth, document :rawPredict forward-compat in is_vertex_anthropic_rawpredict_route, and collapse duplicate IPv4/IPv6 match arms in validate_vertex_base_url.
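The commit mentions IPv4/IPv6 arms in `validate_vertex_base_url` but does not describe the policy. Assuming the goal is to reject IP-literal hosts that point at loopback, unspecified, private or link-local addresses, a sketch might look like the following. `host_is_unsafe_ip` and the exact set of rejected ranges are assumptions. The `IpAddr`-level methods cover both address families, so no duplicated per-family arms are needed for those checks.

```rust
use std::net::IpAddr;

/// Hypothetical policy: true if `host` is an IP literal we refuse to route to.
fn host_is_unsafe_ip(host: &str) -> bool {
    // URL hosts wrap IPv6 literals in brackets, e.g. "[::1]".
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    match bare.parse::<IpAddr>() {
        Ok(ip) => {
            ip.is_loopback()
                || ip.is_unspecified()
                || matches!(ip, IpAddr::V4(v4) if v4.is_private() || v4.is_link_local())
        }
        // Not an IP literal; hostname checks would live elsewhere.
        Err(_) => false,
    }
}
```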

Keep Vertex Claude requests on the rawPredict contract and only upgrade streaming calls to streamRawPredict. Mint the initial ADC-backed access token during provider creation so successful Vertex bootstrap yields an immediately usable provider.

Co-authored-by: Cursor <cursoragent@cursor.com>
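Minting an access token from gcloud ADC means exchanging the `authorized_user` refresh token at Google's OAuth 2.0 token endpoint. The sketch below builds only the form-encoded request body for that exchange and omits the HTTP call. The struct and helper names are assumptions, not the PR's implementation.

```rust
/// Google's OAuth 2.0 token endpoint.
const TOKEN_ENDPOINT: &str = "https://oauth2.googleapis.com/token";

/// Fields of an `authorized_user` ADC file, as written by
/// `gcloud auth application-default login` (hypothetical struct).
struct AuthorizedUser<'a> {
    client_id: &'a str,
    client_secret: &'a str,
    refresh_token: &'a str,
}

/// Percent-encode for application/x-www-form-urlencoded, leaving only
/// RFC 3986 unreserved characters unescaped.
fn form_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => out.push(b as char),
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Body for the refresh-token grant. POST it to TOKEN_ENDPOINT with
/// Content-Type: application/x-www-form-urlencoded; the JSON response
/// carries `access_token` and `expires_in`.
fn refresh_request_body(user: &AuthorizedUser) -> String {
    [
        ("grant_type", "refresh_token"),
        ("client_id", user.client_id),
        ("client_secret", user.client_secret),
        ("refresh_token", user.refresh_token),
    ]
    .iter()
    .map(|(k, v)| format!("{k}={}", form_encode(v)))
    .collect::<Vec<_>>()
    .join("&")
}
```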

The VertexAiProvider plugin (discover_with_context, credential_env_vars) was the only V1-specific code added by this branch. Vertex AI discovery now relies entirely on V2 profile-based discovery:
- Credentials are scanned by discover_from_profile() via the google-vertex-ai.yaml profile's discovery.credentials list.
- Config keys (project ID, region, base URL, publisher) are scanned directly from VERTEX_AI_CONFIG_KEY_NAMES in the V2 path of discover_existing_provider_data().

--from-gcloud-adc and --credential flows are unaffected; they never used the plugin. --from-existing now requires providers_v2_enabled=true on the gateway, which is the correct V2-only posture. Remove the V1 registry fallback test for Vertex and update the config-only credential error test to run with V2 enabled.
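Here is a sketch of scanning config keys directly. The environment lookup is injected so the scan can be tested without mutating process state. Only `VERTEX_AI_PROJECT_ID`, `VERTEX_AI_REGION` and `VERTEX_AI_BASE_URL` appear in this PR's text, so the key list is partial and the publisher key is omitted. `discover_config` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Partial, illustrative key list (the publisher key name is not given in the PR).
const VERTEX_AI_CONFIG_KEY_NAMES: &[&str] =
    &["VERTEX_AI_PROJECT_ID", "VERTEX_AI_REGION", "VERTEX_AI_BASE_URL"];

/// Collect non-blank config values found via `lookup`.
fn discover_config(lookup: impl Fn(&str) -> Option<String>) -> BTreeMap<String, String> {
    VERTEX_AI_CONFIG_KEY_NAMES
        .iter()
        .filter_map(|&key| {
            lookup(key)
                .filter(|v| !v.trim().is_empty())
                .map(|v| (key.to_string(), v))
        })
        .collect()
}
```

In production, the lookup could be `|k| std::env::var(k).ok()`.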

When a google-vertex-ai provider is attached to a sandbox, resolve_provider_environment now derives agent-specific environment variables from the provider's config and injects them alongside the credential env vars.

Static flags (always present):
- CLAUDE_CODE_USE_VERTEX=1
- GOOSE_PROVIDER=gcp_vertex_ai

Derived from VERTEX_AI_PROJECT_ID (when set):
- ANTHROPIC_VERTEX_PROJECT_ID
- GCP_PROJECT_ID
- GOOGLE_CLOUD_PROJECT

Derived from VERTEX_AI_REGION (when set):
- CLOUD_ML_REGION
- GCP_LOCATION
- VERTEX_LOCATION

Injected values use entry().or_insert() so explicit credentials take precedence. Sandbox --env overrides are applied at the process level after environment installation, so they naturally shadow these values. Non-Vertex providers are unaffected.
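The precedence rule (`entry().or_insert()`) can be illustrated with a small sketch. The variable names are taken from the commit message. The function name and signature are hypothetical.

```rust
use std::collections::HashMap;

/// Sketch: derive agent-facing variables from a Vertex provider's config.
/// `or_insert` never overwrites an existing entry, so credential env vars
/// already present in `env` take precedence.
fn add_vertex_agent_env(env: &mut HashMap<String, String>, project_id: Option<&str>, region: Option<&str>) {
    let mut put = |key: &str, value: &str| {
        env.entry(key.to_string()).or_insert(value.to_string());
    };
    // Static flags, always present.
    put("CLAUDE_CODE_USE_VERTEX", "1");
    put("GOOSE_PROVIDER", "gcp_vertex_ai");
    if let Some(project) = project_id {
        for key in ["ANTHROPIC_VERTEX_PROJECT_ID", "GCP_PROJECT_ID", "GOOGLE_CLOUD_PROJECT"] {
            put(key, project);
        }
    }
    if let Some(region) = region {
        for key in ["CLOUD_ML_REGION", "GCP_LOCATION", "VERTEX_LOCATION"] {
            put(key, region);
        }
    }
}
```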

Blanket-blocking AF_NETLINK prevented getifaddrs(3) from working inside sandboxes. glibc and musl both use socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE) internally — there is no /proc fallback. This caused runtimes such as Node.js, Python, and Go to fail with errors like "getifaddrs returned an error" at startup. Replace the unconditional AF_NETLINK domain block with a two-condition seccomp rule that blocks socket() only when arg0==AF_NETLINK AND arg2!=0. This allows NETLINK_ROUTE (protocol 0) while keeping every other netlink protocol (NETLINK_SOCK_DIAG, NETLINK_NETFILTER, NETLINK_AUDIT, NETLINK_GENERIC, etc.) blocked with EPERM. Risk remains low: write operations via NETLINK_ROUTE require CAP_NET_ADMIN which the sandbox does not grant, and the network namespace scopes all reads to sandbox-local interfaces only.
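The seccomp condition reduces to a simple predicate over the `socket(2)` arguments. The sketch below models only that decision: deny when `arg0 == AF_NETLINK` and `arg2 != 0`. It does not install a filter, and `socket_denied` is a hypothetical name. The constants are the standard Linux values.

```rust
// Linux constants from <linux/socket.h> and <linux/netlink.h>.
const AF_NETLINK: i64 = 16;
const NETLINK_ROUTE: i64 = 0;
const NETLINK_SOCK_DIAG: i64 = 4;
const NETLINK_AUDIT: i64 = 9;

/// Models the rule for socket(domain, type, protocol). The real filter
/// expresses the same two conditions as seccomp argument comparisons.
fn socket_denied(domain: i64, _sock_type: i64, protocol: i64) -> bool {
    // Protocol 0 is NETLINK_ROUTE, which getifaddrs(3) needs.
    domain == AF_NETLINK && protocol != NETLINK_ROUTE
}
```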

Vertex AI rawPredict encodes the model in the URL path and rejects a 'model' field in the request body with HTTP 400 'Extra inputs are not permitted'. Anthropic SDK clients (including Claude Code) always include 'model' in the body for the standard Anthropic API format. Remove any client-supplied 'model' key from the body when the route targets a Vertex AI Anthropic rawPredict endpoint, complementing the existing anthropic-beta header stripping fix.
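Here is a sketch of the rawPredict body fix-ups. A `BTreeMap` of raw JSON text stands in for a real JSON object. The sketch removes `model` and inserts `anthropic_version` only when the client did not send one, matching the intent of the `vertex_ai_body_preserves_client_anthropic_version` test named earlier. The `"vertex-2023-10-16"` value is the one Google documents for Claude on Vertex AI. The function and type names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for a JSON object: values are raw JSON text. A real
/// implementation would use a JSON object type (e.g. `serde_json::Map`).
type JsonObject = BTreeMap<String, String>;

/// JSON-encoded `anthropic_version` value for Claude on Vertex AI.
const VERTEX_ANTHROPIC_VERSION: &str = "\"vertex-2023-10-16\"";

/// Drop the client's `model` field (Vertex takes it from the URL path and
/// returns HTTP 400 otherwise) and add `anthropic_version` only if absent.
fn prepare_vertex_anthropic_body(body: &mut JsonObject) {
    body.remove("model");
    body.entry("anthropic_version".to_string())
        .or_insert_with(|| VERTEX_ANTHROPIC_VERSION.to_string());
}
```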

maxamillion added 3 commits May 26, 2026 10:05

maxamillion requested review from a team, derekwaynecarr and mrunalp as code owners May 26, 2026 16:15

maxamillion marked this pull request as draft May 26, 2026 16:21

johntmyers reviewed May 26, 2026

maxamillion and others added 6 commits May 26, 2026 11:52

maxamillion marked this pull request as ready for review May 27, 2026 02:52

maxamillion marked this pull request as draft May 27, 2026 03:28

maxamillion and others added 6 commits May 27, 2026 07:44

maxamillion wants to merge 15 commits into NVIDIA:main from maxamillion:vertex-provider
