
CS-Fasih/OpSecGuard-API


🛡️ OpSecGuard API

Real-Time Log Security Scanner & Secret Redaction Microservice

Features • Architecture • Quick Start • API Reference • Benchmarks • Security


Overview

OpSecGuard API is a lightweight, production-ready microservice that scans real-time application logs for sensitive data leaks (API keys, secrets, credentials, and database connection strings) and redacts them instantly.

Built with Python/FastAPI and engineered for ultra-low latency (< 10ms per log batch), it uses pre-compiled, ReDoS-safe regex patterns with zero external database dependencies.

Features

  • ๐Ÿ” 8 Specialized Detectors: OpenAI keys, AWS keys, Stripe keys, GitHub tokens, Bearer tokens, MongoDB/PostgreSQL URIs, Private keys
  • ๐Ÿงฎ Optional Entropy Detection: Shannon entropy analysis for catching unknown secret formats (configurable, off by default)
  • โšก Ultra-Fast Scanning: Synchronous sequential processing optimized for CPU-bound regex work โ€” no GIL contention
  • ๐Ÿ›ก๏ธ ReDoS-Safe: All patterns use bounded character classes โ€” no catastrophic backtracking
  • ๐Ÿ”„ Batch + Stream Modes: REST batch endpoint + WebSocket streaming for real-time log tailing
  • ๐Ÿณ Docker-Ready: Multi-stage Dockerfile with google-re2 for linear-time regex guarantees
  • ๐Ÿ“Š Built-in Benchmarking: Measure throughput and P50/P95/P99 latency

Architecture

┌─────────────────────────────────────────────────────┐
│                 FastAPI Application                 │
│                                                     │
│  ┌──────────────┐    ┌────────────────────────────┐ │
│  │ POST /batch  │───▶│      Scanning Engine       │ │
│  └──────────────┘    │                            │ │
│  ┌──────────────┐    │  ┌──────────────────────┐  │ │
│  │ WS /stream   │───▶│  │  Regex Detectors     │  │ │
│  └──────────────┘    │  │  (pre-compiled,      │  │ │
│                      │  │   ReDoS-safe)        │  │ │
│  ┌──────────────┐    │  └──────────────────────┘  │ │
│  │ GET /health  │    │  ┌──────────────────────┐  │ │
│  └──────────────┘    │  │  Entropy Detector    │  │ │
│                      │  │  (optional, gated)   │  │ │
│                      │  └──────────────────────┘  │ │
│                      │  ┌──────────────────────┐  │ │
│                      │  │  Redaction Engine    │  │ │
│                      │  │  (single-pass merge) │  │ │
│                      │  └──────────────────────┘  │ │
│                      └────────────────────────────┘ │
│                                                     │
│  Multi-worker Uvicorn (process-level parallelism)   │
└─────────────────────────────────────────────────────┘
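The "single-pass merge" step in the Redaction Engine can be sketched as below, assuming each detector reports `(start, end, tag)` spans for a line. The `redact_spans` name, the span shape, and the overlap policy (the span that starts first wins, longest on ties) are illustrative assumptions rather than the project's `engine.py`.

```python
from typing import List, Tuple

# Hypothetical span shape: (start offset, end offset, replacement tag).
Span = Tuple[int, int, str]


def redact_spans(line: str, spans: List[Span]) -> str:
    """Replace matched spans with their tags, merging overlaps, in one pass."""
    if not spans:
        return line
    # Sort by start offset; for equal starts, the longer span comes first.
    ordered = sorted(spans, key=lambda s: (s[0], -s[1]))
    merged: List[Span] = []
    for start, end, tag in ordered:
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it and keep the earlier tag.
            prev_start, prev_end, prev_tag = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_tag)
        else:
            merged.append((start, end, tag))
    # Build the output left to right, copying the text between redacted spans.
    out: List[str] = []
    cursor = 0
    for start, end, tag in merged:
        out.append(line[cursor:start])
        out.append(tag)
        cursor = end
    out.append(line[cursor:])
    return "".join(out)
```

Merging before writing means a token hit by two detectors is replaced once instead of producing nested or duplicated tags, and the output string is assembled in a single left-to-right pass.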

Key Design Decisions

| Decision | Rationale |
|---|---|
| Synchronous sequential scanning | Python's GIL makes threading counterproductive for CPU-bound regex work; a tight loop avoids thread-pool overhead. |
| Process-level parallelism | `uvicorn --workers N` spawns independent processes, each with its own GIL, giving true multi-core scaling. |
| Gated entropy detector | Per-token `math.log2` computation is expensive, so the detector is disabled by default (`ENABLE_ENTROPY=false`). |
| Bounded regex quantifiers | `[^=:\s]{0,20}` instead of `[^=]*` prevents runaway scanning on long lines. |
| Stripe before OpenAI ordering | Both key families start with `sk` (`sk_live_`/`sk_test_` for Stripe, `sk-` for OpenAI). Stripe patterns match first to prevent misclassification. |
| Password-only URI redaction | Database URIs redact only the password (a captured group), preserving connection details for debugging. |
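The ordering, bounded-quantifier, and password-only decisions can be illustrated with Python's `re` module. This is a sketch under stated assumptions: `DETECTORS`, `find_leaks`, and every pattern below are hypothetical stand-ins rather than the contents of `detectors.py`, and the OpenAI-style pattern is deliberately permissive (`sk[-_]`) so the effect of ordering is visible.

```python
import re
from typing import List, Tuple

# Hypothetical detector table: (leak type, compiled pattern, redaction tag).
# Illustrative patterns only; the project's actual regexes may differ.
# Order matters: Stripe comes before the permissive OpenAI-style pattern,
# which would otherwise also claim "sk_live_..." and "sk_test_..." keys.
DETECTORS = [
    ("Stripe Live Key", re.compile(r"\bsk_live_[0-9A-Za-z]{10,99}"),
     "[REDACTED_STRIPE_LIVE_KEY]"),
    ("Stripe Test Key", re.compile(r"\bsk_test_[0-9A-Za-z]{10,99}"),
     "[REDACTED_STRIPE_TEST_KEY]"),
    ("OpenAI API Key", re.compile(r"\bsk[-_][0-9A-Za-z_-]{16,200}"),
     "[REDACTED_OPENAI_KEY]"),
    # Bounded gap ([^=:\s]{0,20}) between the keyword and the separator.
    ("AWS Secret Key",
     re.compile(r"aws_secret[^=:\s]{0,20}\s{0,3}[=:]\s{0,3}([A-Za-z0-9/+]{40})",
                re.IGNORECASE),
     "[REDACTED_AWS_SECRET_KEY]"),
    # Group 1 captures only the password; user and host stay visible.
    ("PostgreSQL URI",
     re.compile(r"postgres(?:ql)?://[^:/@\s]{1,64}:([^@\s]{1,128})@"),
     "[REDACTED_PASSWORD]"),
]

# (start, end, redaction tag, leak type)
Leak = Tuple[int, int, str, str]


def find_leaks(line: str) -> List[Leak]:
    """Return non-overlapping leak spans; earlier detectors claim text first."""
    found: List[Leak] = []
    for leak_type, pattern, tag in DETECTORS:
        for match in pattern.finditer(line):
            # Use capture group 1 when the pattern defines one, else the whole match.
            start, end = match.span(1) if pattern.groups else match.span()
            if any(start < e and s < end for s, e, _, _ in found):
                continue  # overlaps a span claimed by a higher-priority detector
            found.append((start, end, tag, leak_type))
    return sorted(found)
```

Because later detectors skip text an earlier one already claimed, `sk_live_...` is reported once, as a Stripe key. For the PostgreSQL URI only the password span is returned, so the user, host, and database name survive redaction.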

Quick Start

Local Development

# Clone the repository
git clone https://github.com/CS-Fasih/OpSecGuard-API.git
cd OpSecGuard-API

# Install dependencies
pip install -r requirements.txt

# Start the server
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Run tests
pytest tests/ -v

# Run benchmark (server must be running)
python benchmark.py

Docker

# Build and run
docker compose up --build

# Or manually
docker build -t opsecguard-api .
docker run -p 8000:8000 opsecguard-api

API Reference

POST /v1/scan/batch

Scan a batch of log lines for sensitive data.

Request:

{
  "logs": [
    "2026-05-27 23:15:00 INFO: User login successful for admin",
    "2026-05-27 23:15:02 ERROR: OpenAI initialization failed with key sk-proj-1234567890abcdef"
  ]
}

Response:

{
  "leak_detected": true,
  "leaks_found": [
    {
      "type": "OpenAI API Key",
      "line_index": 1
    }
  ],
  "sanitized_logs": [
    "2026-05-27 23:15:00 INFO: User login successful for admin",
    "2026-05-27 23:15:02 ERROR: OpenAI initialization failed with key [REDACTED_OPENAI_KEY]"
  ],
  "scan_time_ms": 0.245
}
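A minimal Python client for this endpoint, using only the standard library. `build_batch_request`, `scan_batch`, and the `http://localhost:8000` base URL are assumptions for illustration (the port matches the Quick Start command).

```python
import json
import urllib.request
from typing import List

# Assumed base URL: the Quick Start command serves on port 8000.
BASE_URL = "http://localhost:8000"


def build_batch_request(logs: List[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /v1/scan/batch request carrying {"logs": [...]} as JSON."""
    body = json.dumps({"logs": logs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/scan/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scan_batch(logs: List[str], base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send one batch and return the decoded JSON response."""
    request = build_batch_request(logs, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `scan_batch(["ERROR: key sk-proj-1234567890abcdef"])["sanitized_logs"]` should return the redacted lines in the shape of the response above. Batches must stay within MAX_BATCH_SIZE (50,000 lines by default).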

WebSocket /v1/scan/stream

Real-time streaming scan via WebSocket.

Send:

{"log": "ERROR: key sk-proj-1234567890abcdef used"}

or

{"logs": ["line1", "line2"]}

Receive (per line):

{
  "line_index": 0,
  "leak_detected": true,
  "leaks": [{"type": "OpenAI API Key", "line_index": 0}],
  "sanitized_line": "ERROR: key [REDACTED_OPENAI_KEY] used"
}
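How a server might accept both documented message shapes is sketched below. `parse_stream_message` is a hypothetical helper, not the project's `stream.py`, and rejecting any other shape with `ValueError` is an assumed policy. Each returned line would then be scanned and answered with one message carrying its `line_index`.

```python
import json
from typing import List


def parse_stream_message(raw: str) -> List[str]:
    """Normalize a /v1/scan/stream message into a list of log lines.

    Accepts {"log": "..."} or {"logs": ["...", ...]}; anything else is rejected.
    """
    payload = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")
    if isinstance(payload.get("log"), str):
        return [payload["log"]]
    logs = payload.get("logs")
    if isinstance(logs, list) and all(isinstance(line, str) for line in logs):
        return logs
    raise ValueError('expected {"log": str} or {"logs": [str, ...]}')
```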

GET /health

{"status": "healthy", "service": "OpSecGuard API", "version": "1.0.0"}

Supported Detectors

| Detector | Pattern Example | Redaction Tag |
|---|---|---|
| OpenAI API Key | sk-proj-abc123... | [REDACTED_OPENAI_KEY] |
| AWS Access Key ID | AKIAIOSFODNN7EXAMPLE | [REDACTED_AWS_ACCESS_KEY] |
| AWS Secret Key | aws_secret_access_key=... | [REDACTED_AWS_SECRET_KEY] |
| Stripe Live Key | sk_live_abc123... | [REDACTED_STRIPE_LIVE_KEY] |
| Stripe Test Key | sk_test_abc123... | [REDACTED_STRIPE_TEST_KEY] |
| GitHub Token | ghp_abc123... | [REDACTED_GITHUB_TOKEN] |
| Bearer Token | Bearer eyJhbG... | [REDACTED_BEARER_TOKEN] |
| MongoDB URI | mongodb://user:pass@host | Password → [REDACTED_PASSWORD] |
| PostgreSQL URI | postgresql://user:pass@host | Password → [REDACTED_PASSWORD] |
| Private Key | -----BEGIN PRIVATE KEY----- | [REDACTED_PRIVATE_KEY] |
| High-Entropy String* | Random 20+ char tokens | [REDACTED_HIGH_ENTROPY] |

*Requires ENABLE_ENTROPY=true environment variable.
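The entropy detector flags tokens whose Shannon entropy H = −Σ p(c) log₂ p(c), taken over the token's character frequencies, meets ENTROPY_THRESHOLD. A minimal sketch, assuming per-character frequencies and a hypothetical tokenizer regex (`TOKEN_RE`); neither is confirmed to match `entropy.py`.

```python
import math
import re
from collections import Counter
from typing import List

# Defaults mirror ENTROPY_THRESHOLD and ENTROPY_MIN_LENGTH from the Configuration table.
ENTROPY_THRESHOLD = 4.5
ENTROPY_MIN_LENGTH = 20

# Hypothetical tokenizer: runs of characters typical of keys and tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{%d,512}" % ENTROPY_MIN_LENGTH)


def shannon_entropy(token: str) -> float:
    """Shannon entropy in bits per character: -sum(p * log2(p)) over characters."""
    if not token:
        return 0.0
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_tokens(line: str, threshold: float = ENTROPY_THRESHOLD) -> List[str]:
    """Return candidate tokens whose entropy meets or exceeds the threshold."""
    return [t for t in TOKEN_RE.findall(line) if shannon_entropy(t) >= threshold]
```

Under this per-character definition, a token with k distinct characters has at most log₂(k) bits of entropy, so with the default threshold of 4.5 only tokens containing at least 23 distinct characters can be flagged; the 20-character minimum length alone is not sufficient.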

Configuration

All settings are controlled via environment variables:

| Variable | Default | Description |
|---|---|---|
| HOST | 0.0.0.0 | Server bind address |
| PORT | 8000 | Server bind port |
| LOG_LEVEL | info | Logging verbosity |
| MAX_BATCH_SIZE | 50000 | Maximum lines per batch request |
| ENABLE_ENTROPY | false | Toggle the entropy detector (CPU-intensive) |
| ENTROPY_THRESHOLD | 4.5 | Minimum Shannon entropy to flag a token |
| ENTROPY_MIN_LENGTH | 20 | Minimum token length for entropy analysis |
| WORKERS | 4 | Uvicorn worker processes |
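A minimal sketch of reading these variables with the standard library and the defaults listed above. `Settings`, `load_settings`, and the set of truthy strings accepted for ENABLE_ENTROPY are illustrative assumptions, not necessarily what `config.py` does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(value: str) -> bool:
    """Treat common truthy strings ("true", "1", "yes", "on") as True."""
    return value.strip().lower() in {"true", "1", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    host: str = "0.0.0.0"
    port: int = 8000
    log_level: str = "info"
    max_batch_size: int = 50000
    enable_entropy: bool = False
    entropy_threshold: float = 4.5
    entropy_min_length: int = 20
    workers: int = 4


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        host=env.get("HOST", d.host),
        port=int(env.get("PORT", d.port)),
        log_level=env.get("LOG_LEVEL", d.log_level),
        max_batch_size=int(env.get("MAX_BATCH_SIZE", d.max_batch_size)),
        enable_entropy=_env_bool(env.get("ENABLE_ENTROPY", "false")),
        entropy_threshold=float(env.get("ENTROPY_THRESHOLD", d.entropy_threshold)),
        entropy_min_length=int(env.get("ENTROPY_MIN_LENGTH", d.entropy_min_length)),
        workers=int(env.get("WORKERS", d.workers)),
    )
```

Accepting an explicit mapping keeps the loader easy to test; in a running service it falls back to `os.environ`.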

Benchmarks

Run the benchmark suite:

# Start the server
uvicorn app.main:app --host 0.0.0.0 --port 8000

# Run with defaults (10,000 lines, batch size 100)
python benchmark.py

# Custom parameters
python benchmark.py --lines 50000 --batch-size 500 --poison-ratio 0.20
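The P50/P95/P99 figures can be derived from per-request latency samples. The sketch below uses the nearest-rank percentile definition and a hypothetical `latency_summary` helper; both are assumptions, and `benchmark.py` may interpolate differently.

```python
import math
from typing import Dict, Sequence


def latency_summary(latencies_ms: Sequence[float], total_lines: int,
                    wall_time_s: float) -> Dict[str, float]:
    """Summarize per-request latencies with nearest-rank percentiles."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)

    def pct(p: float) -> float:
        # Nearest rank: smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        # Throughput from total lines over wall-clock time, not mean latency.
        "lines_per_sec": total_lines / wall_time_s,
    }
```

Since each request carries a whole batch (100 lines by default), per-request latency and lines per second measure different things, which is why throughput is computed from wall-clock time here.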

Security Considerations

ReDoS Protection

  • All regex patterns use bounded character classes and constrained quantifiers
  • No nested quantifiers (such as (a+)+) or overlapping alternations
  • Production Docker image includes google-re2 for guaranteed linear-time (O(n)) matching
  • Adversarial input tests included in test suite
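One way an application could use google-re2 where it is installed (as in the Docker image) and fall back to Python's `re` elsewhere is sketched below. The `regex_engine` alias, the `ENGINE` flag, and the Bearer pattern are illustrative assumptions, not taken from the project.

```python
# Prefer google-re2 (linear-time matching) when installed; otherwise use the
# standard library. The google-re2 package installs a module named "re2"
# whose API mirrors re for common calls such as compile() and search().
try:
    import re2 as regex_engine
    ENGINE = "re2"
except ImportError:
    import re as regex_engine
    ENGINE = "re"

# A bounded pattern in the style described above: no nested quantifiers.
BEARER = regex_engine.compile(r"Bearer [A-Za-z0-9._~+/=-]{10,512}")


def contains_bearer(line: str) -> bool:
    """Return True if the line contains something shaped like a Bearer token."""
    return BEARER.search(line) is not None
```

RE2 does not support backreferences or lookarounds, so patterns must stay within the subset both engines share for the fallback to behave identically.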

API Security

  • Global exception handler prevents stack trace leaks
  • Batch size limits protect against resource exhaustion
  • No sensitive data is stored; processing is fully stateless
  • CORS middleware configured (restrict allow_origins in production)

Project Structure

OpSecGuard-API/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app, middleware, routes
│   ├── config.py            # Environment-based settings
│   ├── models.py            # Pydantic request/response schemas
│   ├── scanner/
│   │   ├── __init__.py
│   │   ├── detectors.py     # Pre-compiled regex detectors
│   │   ├── entropy.py       # Shannon entropy calculator
│   │   └── engine.py        # Scanning orchestration engine
│   └── routes/
│       ├── __init__.py
│       ├── batch.py         # POST /v1/scan/batch
│       └── stream.py        # WebSocket /v1/scan/stream
├── tests/
│   └── test_scanner.py      # Comprehensive test suite
├── benchmark.py             # Performance benchmarking script
├── Dockerfile               # Multi-stage production image
├── docker-compose.yml       # One-command deployment
├── requirements.txt         # Python dependencies
├── requirements-docker.txt  # Docker-only dependencies (re2)
├── LICENSE
└── README.md

License

MIT

About

OpSecGuard is a self-hosted log security microservice that redacts API keys and credentials from production logs in real time. It processes 30,000+ lines/sec at a 48 ms P99 latency, stopping leaks before they reach Datadog or CloudWatch.
