Skip to content

feat(grpc,rest): gRPC + grpc-gateway REST surface for public APIs (Phase 2)#616

Closed
lakhansamani wants to merge 1 commit into
feat/service-layerfrom
feat/grpc-rest-surface
Closed

feat(grpc,rest): gRPC + grpc-gateway REST surface for public APIs (Phase 2)#616
lakhansamani wants to merge 1 commit into
feat/service-layerfrom
feat/grpc-rest-surface

Conversation

@lakhansamani
Copy link
Copy Markdown
Contributor

Stacked on top of #615 (which is stacked on #614). Review and merge after Phase 1 is in.

Summary

Lands the v1 public-API surface over gRPC and REST while keeping the existing GraphQL surface untouched. All nine proto services are authored and registered with the gRPC server. MetaService.GetMeta is fully implemented end-to-end as the vertical-slice proof; the other eight services are registered as Unimplemented stubs so gRPC reflection shows the complete API surface and REST `/v1/*` returns 501 for not-yet-migrated ops. Subsequent PRs migrate ops out of `internal/graphql` into `internal/service` one service at a time, swapping out the stubs.

Proto (`proto/authorizer/*/v1`)

  • `meta/v1`, `user/v1`, `session/v1` (+ `magic_link_service`)
  • `verification/v1` (`email_verification`, `password_reset`, `otp_challenge`)
  • `token/v1`, `authz/v1`
  • All response types are RPC-unique (buf STANDARD lint enforces this).
  • `google.api.http` annotations drive REST mapping; no `:verb` paths.
  • `buf.validate.field` on every Request; protovalidate interceptor enforces.
  • Custom options: `required_permissions`, `mcp_tool`, `audit_log`, `public`.

Toolchain (`proto/buf.gen.yaml`)

  • Added grpc-gateway plugin (Go REST stubs alongside gRPC).
  • Added openapiv2 plugin → `gen/openapi/authorizer.swagger.json` (single merged spec, served at `/openapi.json`).

gRPC server (`internal/grpcsrv`)

  • `server.go`: registers all 9 services, gRPC reflection (gated on `--enable-grpc-reflection`), gRPC health checking (`grpc.health.v1.Health`), graceful shutdown.
  • Interceptors: recovery (panic → `codes.Internal`), logging (one structured line per RPC), validate (protovalidate, request → `codes.InvalidArgument`).
  • `handlers/meta.go`: `MetaService.GetMeta` delegates to `service.Meta`.
  • `handlers/stubs.go`: 8 other services return `codes.Unimplemented`.
  • `transport/grpc_metadata.go`: gRPC metadata ↔ `service.RequestMetadata` bridge; honors `grpcgateway-*` headers so REST and gRPC see the same host/IP/UA/auth header.

REST gateway (`internal/gateway`)

  • `mount.go`: serves grpc-gateway via in-process bufconn dial to the same `grpc.Server` — no extra TCP hop, no TLS plumbing.
  • JSONPb marshaler configured with `UseProtoNames=true` so REST payloads match the existing GraphQL snake_case shape.
  • Mounted at `/v1/*` under the existing gin router (shares CORS, security headers, rate limit, logger middleware automatically).
  • `/openapi.json` serves the merged swagger doc.

CLI flags & wiring (`cmd/root.go`, `internal/config/config.go`)

  • `--grpc-port` (default 8081), `--enable-grpc-reflection` (default true)
  • `--grpc-tls-cert` / `--grpc-tls-key` / `--grpc-insecure` (TLS plumbing placeholders; current build runs cleartext for parity with HTTP)
  • `grpcsrv.Server` constructed alongside `http_handlers`; `server.Run` starts HTTP, metrics, AND gRPC listeners with shared graceful shutdown.

Service migration (one op)

  • `internal/service/meta.go`: Meta migrated out of `internal/graphql`.
  • `internal/graphql/meta.go` now delegates (same pattern as SignUp).

Test plan

  • `TestGRPCMeta`: bufconn end-to-end through `MetaServiceClient.GetMeta`
  • `TestRESTMeta`: gateway translates `GET /v1/meta` to the same gRPC call
  • Full SQLite integration suite (65s) still green — no behaviour drift
  • `go build ./...` clean
  • CI green on this stacked PR

What's still stubbed (codes.Unimplemented in this PR)

`UserService` (Create/Get/Update/Delete), `SessionService` (Create/Get/Delete/Validate), `MagicLinkService` (Create), `EmailVerificationService` (Create/Confirm), `PasswordResetService` (Create/Confirm), `OtpChallengeService` (Create/Confirm), `TokenService` (CreateValidation/RevokeRefresh), `AuthzService` (ListMyPermissions). Each gets a follow-up PR that migrates its op(s) from `internal/graphql` to `internal/service` and replaces the stub.

🤖 Generated with Claude Code

…ase 2)

Lands the v1 public-API surface over gRPC and REST while keeping the
existing GraphQL surface untouched. All nine proto services are authored
and registered with the gRPC server. MetaService.GetMeta is fully
implemented end-to-end as the vertical-slice proof; the other eight
services are registered as Unimplemented stubs so gRPC reflection shows
the complete API surface and REST `/v1/*` returns 501 for not-yet-migrated
ops. Subsequent PRs migrate ops out of internal/graphql into internal/service
one service at a time, swapping out the stubs.

Proto (proto/authorizer/*/v1):
  - meta/v1, user/v1, session/v1 (+ magic_link_service)
  - verification/v1 (email_verification, password_reset, otp_challenge)
  - token/v1, authz/v1
  - All response types are RPC-unique (buf STANDARD lint enforces this).
  - google.api.http annotations drive REST mapping; no `:verb` paths.
  - buf.validate.field on every Request; protovalidate interceptor enforces.
  - Custom options: required_permissions, mcp_tool, audit_log, public.

Toolchain (proto/buf.gen.yaml):
  - Added grpc-gateway plugin (Go REST stubs alongside gRPC).
  - Added openapiv2 plugin → gen/openapi/authorizer.swagger.json (single
    merged spec, served at /openapi.json).

gRPC server (internal/grpcsrv):
  - server.go: registers all 9 services, gRPC reflection (gated on
    --enable-grpc-reflection), gRPC health checking (grpc.health.v1.Health),
    graceful shutdown.
  - interceptors: recovery (panic → codes.Internal), logging (one structured
    line per RPC), validate (protovalidate, request → codes.InvalidArgument).
  - handlers/meta.go: MetaService.GetMeta delegates to service.Meta.
  - handlers/stubs.go: UserService, SessionService, MagicLinkService,
    EmailVerificationService, PasswordResetService, OtpChallengeService,
    TokenService, AuthzService all return codes.Unimplemented.
  - transport/grpc_metadata.go: gRPC metadata ↔ service.RequestMetadata
    bridge; honors grpcgateway-* headers so REST and gRPC see the same
    host/IP/UA/auth header.

REST gateway (internal/gateway):
  - mount.go: serves grpc-gateway via in-process bufconn dial to the same
    grpc.Server — no extra TCP hop, no TLS plumbing.
  - JSONPb marshaler configured with UseProtoNames=true so REST payloads
    match the existing GraphQL snake_case shape.
  - Mounted at /v1/* under the existing gin router (shares CORS, security
    headers, rate limit, logger middleware automatically).
  - /openapi.json serves the merged swagger doc.

CLI flags & wiring (cmd/root.go, internal/config/config.go):
  - --grpc-port (default 8081), --enable-grpc-reflection (default true)
  - --grpc-tls-cert / --grpc-tls-key / --grpc-insecure (TLS plumbing
    placeholders; current build runs cleartext for parity with HTTP).
  - grpcsrv.Server constructed alongside http_handlers; server.Run starts
    HTTP, metrics, AND gRPC listeners with shared graceful shutdown.

Service migration (one op):
  - internal/service/meta.go: Meta migrated out of internal/graphql.
  - internal/graphql/meta.go now delegates (same pattern as SignUp).

Tests:
  - TestGRPCMeta: bufconn end-to-end through MetaServiceClient.GetMeta.
  - TestRESTMeta: gateway translates GET /v1/meta to the same gRPC call.
  - Full SQLite integration suite (65s) still green.

Follow-up PRs migrate remaining 17 public ops + replace stubs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

@lakhansamani lakhansamani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Principal-engineer review. This is a strong architectural cut — proto/handler separation is clean, the in-process bufconn gateway is a nice latency-saver, and using Unimplemented* stubs to register the full surface up front is the right call. The big concerns are around TLS not actually being wired, a port-collision default, and secrets in URL paths. Detail below.

Strengths

  • Proto layout: per-package files (meta/user/session/verification/token/authz) with shared types in common/v1, AIP-style resource naming (users/me, sessions/me), oneof for CreateSession.grant, FieldMask for UpdateUser. Request/response uniqueness is preserved per buf STANDARD lint. Custom options for public, audit_log, required_permissions, mcp_tool are well-scoped and forward-compatible.
  • buf toolchain: managed mode + per-module disables for protovalidate/googleapis, merged openapiv2 with json_names_for_fields=false so the swagger uses the same snake_case names. require_unimplemented_servers=false is intentional and matches the embedding strategy in stubs.go.
  • gRPC server: interceptor ordering recovery → logging → validate is correct (recovery outermost). Recovery captures stack to server logs but only surfaces codes.Internal to clients. Validate is built once at startup (protovalidate.New) and reused. Logging chooses level by code (Error for Internal/Unknown/DataLoss, Warn for everything non-OK).
  • REST gateway: in-process bufconn dial avoids a real loopback hop and TLS plumbing for the gateway leg. Mounting under gin means CORS, security headers, rate limit, CSRF, logger, metrics middleware all apply automatically (verified end-to-end in http_routes.go).
  • Transport bridge: honoring grpcgateway-* prefixed headers in MetaFromGRPC is the right pattern, and falling back to :authority for HostURL is a nice touch.
  • Vertical slice: MetaService.GetMeta exercises proto → handler → service → response projection end-to-end, with both TestGRPCMeta (bufconn) and TestRESTMeta (gateway through gin). The migration pattern proven here is repeatable.

Issues

Blocking

  1. TLS flags are dead code (cmd/root.go:112-114, internal/grpcsrv/server.go:53). --grpc-tls-cert, --grpc-tls-key, --grpc-insecure are wired into Config but grpc.NewServer(...) is called with no grpc.Creds option and there is no startup check that refuses to start without TLS material when GRPCInsecure=false. The config comment lies ("When unset and GRPCInsecure is false, the server refuses to start") — there is no such enforcement anywhere. Default behavior: the server boots cleartext gRPC regardless of --grpc-insecure. Either implement the TLS path (credentials.NewServerTLSFromFile, refuse to start without it) or — at minimum — flip --grpc-insecure to true default with a loud log.Warn and update the config doc to say "placeholder, TLS not yet wired". The PR description does call this out as "TLS plumbing placeholders" but the flag UX implies hard enforcement.
  2. Port collision: --grpc-port and --metrics-port both default to 8081 (cmd/root.go:89, 110). The collision check on cmd/root.go:347 only compares HTTP↔Metrics; it does not include GRPC. On a fresh install both ports are 8081 → one of Serve calls fails to bind (silent goroutine error log only). Add --grpc-port to the existing check and pick a different default (e.g. 8082).
  3. gen/openapi/authorizer.swagger.json is not shipped in the production image. Dockerfile only COPYs cmd/, internal/, web/templates, and the built web assets — never gen/openapi/. The handler at http_routes.go:94 reads the file via the relative path gen/openapi/authorizer.swagger.json from CWD. In production /openapi.json will always return 404. Fix options: embed via //go:embed gen/openapi/authorizer.swagger.json (no file-system dependency, no Dockerfile change needed) — strongly preferred. Otherwise add the COPY and use a configurable absolute path.
  4. Secrets in URL paths. Multiple methods route secrets through the path component, which means they land in access logs, ingress logs, proxy logs, OS kernel listen-backlog dumps, browser history, Referer headers:
    • DELETE /v1/refresh-tokens/{refresh_token} (token_service.proto:29)
    • PUT /v1/email-verifications/{token} (email_verification_service.proto:34)
    • PUT /v1/password-resets/{token} (password_reset_service.proto:28)
    • PUT /v1/otp-challenges/{challenge_id} (otp_challenge_service.proto:29) — the challenge_id doc says "today: the email / phone the OTP was sent to", so the user identifier goes in the URL. PII at minimum, enumeration vector at worst.
      Move tokens/OTPs into the request body. RFC 7009 (POST /oauth/revoke) and RFC 6749 (POST /oauth/token) both put credentials in the body for exactly this reason. For revoke this also means switching from DELETE to a custom verb (POST /v1/refresh-tokens:revoke per AIP-136) or POST /v1/refresh-tokens/revocations body-bearing.

Important

  1. Reflection defaults ON in production. --enable-grpc-reflection defaults to true (cmd/root.go:111). Reflection leaks the full API surface (including admin-adjacent methods once they migrate) to anyone who can reach the gRPC port. Default it to off, and have ops opt in for staging/dev. The CLAUDE.md project rules already mark EnableGraphQLIntrospection as a deploy-time concern; treat reflection the same way.
  2. gateway.bufSize = 1 << 20 is fragile. The comment claims "large enough that in-process gateway calls never block". Not true — bufconn blocks once N writer-side bytes exceed the buffer until a reader drains it. A single REST request body larger than 1 MiB will stall the in-process call. The existing HTTP server enforces MaxHeaderBytes: 1 << 20 but does not cap body size, so a 5 MB POST through /v1/sessions is reachable. Either bump bufSize to ~16 MiB or align it with a documented max body cap (gin Limit reader middleware on /v1/*).
  3. Cookies are not extracted in MetaFromGRPC (internal/grpcsrv/transport/grpc_metadata.go). When REST callers hit /v1/sessions/me, the gateway forwards the browser Cookie header as the grpcgateway-cookie metadata key. The transport bridge never reads it, so service methods that look at meta.Cookies (every authenticated path) will see an empty slice. Also: meta.Request is unset for gRPC callers, so signup.go:329's gcShim := &gin.Context{Request: meta.Request} will deref nil the moment any service method invoked via gRPC reaches that code path. This is fine for GetMeta but will explode in PR #4. Recommend: parse Cookie headers into []*http.Cookie in MetaFromGRPC and synthesize a minimal *http.Request (URL, Header, Host) so the gcShim pattern survives until TokenProvider is refactored.
  4. ApplyToGRPC drops all but the first Set-Cookie (grpc_metadata.go:61). The //nolint:staticcheck comment acknowledges this. metadata.Pairs("Set-Cookie", a, b, c) produces a single comma-joined value, which is wrong for cookies (commas are valid in cookie values). The correct pattern is metadata.MD{"set-cookie": values} and grpc.SendHeader once with the full MD. As written, a service method that sets both the session cookie and the MFA cookie will silently lose one through the gateway.
  5. No server-side keepalive / max message size / connection limits. grpc.NewServer is called with only the interceptor chain. No grpc.KeepaliveEnforcementPolicy (clients can pummel with PING frames), no MaxConcurrentStreams, no MaxRecvMsgSize override (defaults to 4 MiB but worth explicit). For a public-facing server these matter; the HTTP listener already sets explicit timeouts — match that here.
  6. Stub services advertise unimplemented methods on the wire. Every service is registered and reflection lists them. A client doing capability discovery will see e.g. CreateUser and try it, getting Unimplemented. That's the trade-off you took (it's documented in stubs.go) and it's defensible — but it does mean a write call to POST /v1/users returns 501 with no audit log, no rate-limit accounting, no observability. Worth adding a single registry/list of "unimplemented surface" so an operator can --disable-unimplemented if they don't want the noise.
  7. grpcsrv.Server.Run calls srv.Serve(lis) for the public TCP listener, but gateway.Handler also calls Serve on the same *grpc.Server against bufconn (server.go:73 vs server.go:108). Multiple Serve calls on a *grpc.Server are supported (s.lis is a map), but the ordering matters: Run schedules the TCP serve in a goroutine after gateway.Handler already started bufconn Serve. If gateway.Handler errors (it shouldn't, but if it does), the public gRPC listener never starts. Not blocking, but the dependency would be more obvious if grpcsrv owned listener-startup ordering and the gateway only got a *grpc.ClientConn (or a Dial(bufnet) callback).
  8. http_routes.go:88: router.Any("/v1/*path", gw) shadowing collision risk — once _authz_ admin GraphQL prefixes (per recent commits) or any future /v1/... non-gateway routes are added under gin, they'll be silently overridden by the gateway catch-all. Add a guard test that no other gin route starts with /v1/.

Nit

  1. The signup.go gcShim pattern is acknowledged with a TODO(grpc): — recommend filing a tracked follow-up before this PR merges so it doesn't ossify across all 8 future migration PRs.
  2. gateway.Handler signature returns (http.Handler, func(), error) — typed Cleanup named return or a small Gateway struct with Handler()/Close() reads cleaner than a bare func.
  3. internal/grpcsrv/server.go:81-83: reflection registration uses the v1 reflection API (default for reflection.Register). gRPC reflection v1alpha is also supported but considered deprecated; depending on client tooling (older grpcurl) consider reflection.RegisterV1. Not blocking.
  4. MetaService GetMeta request type GetMetaRequest{} is empty — perfectly fine, but consider google.protobuf.Empty for clarity. Counter-argument: empty request type future-proofs adding query fields. Either is defensible.
  5. audit_log / required_permissions / public proto options exist but no interceptor reads them (verified — no grep hits in internal/grpcsrv/). The proto comments imply runtime enforcement ("read at runtime by the gRPC server"). Update the comment in annotations.proto to "will be read at runtime in a follow-up PR; currently metadata-only" so future contributors don't think the auth/permission interceptors already exist.
  6. EmitUnpopulated=true + UseProtoNames=true: this DOES align with the existing GraphQL response shape (which always emits selected fields), but it means tokens/secrets in responses (access_token: "" when absent) get rendered as empty strings rather than omitted. Mildly leaky-by-convention; acceptable, but worth a comment in mount.go explaining the trade-off (the existing comment only justifies UseProtoNames, not EmitUnpopulated).
  7. The bufconn goroutine in gateway.Handler swallows the Serve error (_ = grpcSrv.Serve(lis)). At minimum log at warn level on non-ErrServerStopped returns.

Test coverage gaps

The vertical slice for meta is good. Beyond that:

  • No interceptor tests. Recovery (a panicking handler → codes.Internal), validate (a request that fails buf.validate → codes.InvalidArgument), logging (level selection by code). These are pure unit tests and should land in this PR — they protect every future service migration.
  • No stub-returns-Unimplemented test. A single for service := range allServices { call one RPC, expect codes.Unimplemented } would catch a misregistration during a future migration (e.g. someone removes the embedded Unimplemented*Server without replacing it).
  • No health endpoint test. grpc.health.v1.Health is registered; verify HealthCheck returns SERVING for empty service name and that shutdown flips it to NOT_SERVING before GracefulStop returns.
  • No reflection on/off test. Construct a server with EnableGRPCReflection=false and assert reflection dial returns Unimplemented.
  • No gateway-through-full-middleware test. TestRESTMeta mounts the gateway under a bare gin.New(), bypassing CSRF, CORS, rate limit, security headers, ClientCheck. A test that mounts the gateway under the actual production router (i.e. exercises server.NewRouter()) and verifies (a) CSRF blocks POST /v1/sessions without Content-Type: application/json, (b) rate-limit returns 429 after N requests, (c) CORS preflight returns the expected headers — would catch any middleware-order regression.
  • No openapi.json test. Trivial GET test that verifies the file is served (and once it's go:embed-ed per blocking item #3, that it's not byte-empty).
  • No port-collision test. A startup test where --grpc-port == --metrics-port should fail loudly (after fixing blocking item #2).
  • No TLS-required test. After blocking item #1 is fixed, a startup test that with --grpc-insecure=false and no cert/key the server fails to start with a clear error.

Recommendation

Request changes. The blocking items (TLS dead code, port collision default, swagger file missing in image, secrets in URL paths) are all small fixes individually but each is a real production issue. Once those are addressed I'd be happy to approve — the architecture is solid and the migration pattern set up here will scale to the remaining 8 services cleanly.

@lakhansamani
Copy link
Copy Markdown
Contributor Author

Superseded by #620, which consolidates this stack into a single PR against main. All blocking review findings from this PR were addressed in #620; see its body for the per-finding traceback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant