Skip to content

v3rpc: add lightweight request audit logging#21764

Open
icoolchn wants to merge 2 commits into
etcd-io:mainfrom
icoolchn:feature/experimental-log-request-info
Open

v3rpc: add lightweight request audit logging#21764
icoolchn wants to merge 2 commits into
etcd-io:mainfrom
icoolchn:feature/experimental-log-request-info

Conversation

@icoolchn
Copy link
Copy Markdown

@icoolchn icoolchn commented May 19, 2026

Summary

Add --experimental-log-request-info flag to enable lightweight audit logging of all unary gRPC requests at warn level. Logs method, key, client IP, and
duration without expensive proto serialization (proto.Size, .String()).

This fills the gap between:

  • --log-level=debug: logs all requests but with heavy serialization, unsuitable for production
  • --warning-unary-request-duration: only logs slow requests

Key Design

  • O(1) field extraction per request type — no proto.Size() or .String() calls
  • Warn level — visible with default --log-level=info, won't flood Info-level logs
  • Defaults to disabled — zero overhead when off
  • PutRequest — key only, value is never logged (matches existing NewLoggablePutRequest redaction)
  • Stream requests — out of scope for this PR

Test Plan

  • Unit tests for logLightweightRequestInfo with all request types
  • Build passes
  • Manual verification
  • CI passes

icoolchn and others added 2 commits May 19, 2026 16:38
Add a new flag to enable lightweight audit logging of all unary gRPC
requests at warn level. Logs method, key, client IP, and duration
without expensive proto serialization (proto.Size, .String).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: icoolchn <icoolchn@users.noreply.github.com>
Implement logLightweightRequestInfo that extracts only O(1) fields per
request type (key, range_end, lease_id, etc.) without calling
proto.Size() or .String(). When --experimental-log-request-info is
enabled, every unary gRPC request is logged at warn level with method,
key, client IP, and duration.

PutRequest deliberately omits value to match existing redaction in
NewLoggablePutRequest. Stream requests are out of scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: icoolchn <icoolchn@users.noreply.github.com>
@k8s-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: icoolchn
Once this PR has been reviewed and has the lgtm label, please assign fuweid for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Copy Markdown

Hi @icoolchn. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@serathius
Copy link
Copy Markdown
Member

Audit logging pipeline have totally different requirements and expected properties than normal logging. I would start from defining the differences and goals you want to achieve.

@icoolchn
Copy link
Copy Markdown
Author

icoolchn commented May 20, 2026

Audit logging pipeline have totally different requirements and expected properties than normal logging. I would start from defining the differences and goals you want to achieve.

I understand audit logging has different requirements than normal logging. Having defined the differences, I believe etcd's current need is not a full audit pipeline — it's a lightweight request logging path within the existing logging system.

Definitions

  • Normal Logging: Records operational events (server lifecycle, cluster state changes, errors). Audience: operators and developers.
  • Request Logging (this proposal): Records who accessed what key and when, for every qualifying request. Audience: operators answering "who changed this key and when".
  • Audit Logging: Records who accessed what data and when, with requirements that records must not be lost, must not be tampered with, and must be verifiable. Audience: security and compliance teams.

Key distinction: Request Logging vs Audit Logging

Request Logging Audit Logging
Core question "What requests happened?" "Who did what to the data, and can it serve as evidence?"
Completeness Best-effort; missing entries acceptable Must not lose any entry; a single gap breaks the audit chain
Tamper-proof No Required — cryptographically signed, verifiable
Guaranteed delivery No — zap buffer may lose last few entries on crash Required — must block or alert on write failure
Analogy Nginx access log Bank transaction ledger

Request logging focuses on "what happened", audit logging focuses on "can you prove it happened". This proposal is positioned as request logging, not audit logging.

Why lightweight approach

I looked at how other systems handle this:

  1. Full audit pipelines are enterprise features: MySQL, MongoDB, Consul, and Elasticsearch all reserve audit logging for enterprise editions. etcd is fully open-source.
  2. ZooKeeper proves the lightweight approach works: Both ZooKeeper and etcd are distributed coordination systems. ZooKeeper reuses logback with isolated output — viable and sufficient for operational debugging.
  3. Nginx demonstrates access log ≠ audit system: Nginx's access log is an operational tool, not a compliance system, yet it's one of the most useful features for operators.

Proposed approach

Following ZooKeeper's pattern (reusing logback with isolated logger name), I propose reusing etcd's existing logging framework (zap) with a dedicated zap.Logger instance writing to a separate file.

--experimental-log-request-path ''
    Path to the request log file. When set, lightweight request logging
    is enabled and request entries are written to this file in JSON format.
    When empty (default), request logging is disabled with zero overhead.

This follows the same pattern as PR #13236 (--experimental-warning-unary-request-duration): new flag → ServerConfig → gRPC interceptor. The difference is that #13236 changed the threshold for an existing path, while this adds a new lightweight path with output to a separate file.

Would you be open to this direction? I can update the PR accordingly, and add the full design document to the PR description if you'd like more details.


@icoolchn icoolchn closed this May 20, 2026
@icoolchn icoolchn reopened this May 20, 2026
@icoolchn
Copy link
Copy Markdown
Author

Audit logging pipeline have totally different requirements and expected properties than normal logging. I would start from defining the differences and goals you want to achieve.

I've updated my previous comment with a clearer breakdown of request logging vs audit logging, and why I believe the lightweight approach fits etcd's current needs. Would appreciate your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants