Skip to content

feat: autoclaim L1 -> L2#1644

Open
arnaubennassar wants to merge 20 commits into
developfrom
feat/autoclaim-plan
Open

feat: autoclaim L1 -> L2#1644
arnaubennassar wants to merge 20 commits into
developfrom
feat/autoclaim-plan

Conversation

@arnaubennassar

@arnaubennassar arnaubennassar commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

🔄 Changes Summary

  • Adds an opt-in autoclaim Aggkit component for L1 -> L2 bridge claims. It discovers eligible exits from l1bridgesync, persists request/cursor/attempt state, prepares claim proofs with l1infotreesync plus a per-claimer l2gersync, submits claimAsset / claimMessage transactions through EthTxManager, and tracks requests through confirmation or failure.
  • Adds per-destination claimers with named policies: allow-all, api-approve, no-message, and basic-filter.
  • Adds optional Auto Claim REST endpoints under /autoclaim/v1 for request listing, inspection, manual approval, and manual rejection, with generated Swagger docs.
  • Wires Auto Claim into startup through the new autoclaim component, config loading/validation/defaults, docs, mocks, and the OP e2e environment.
  • Hardens the bridge-service L2 deposit L1 info tree lookup used by /l1-info-tree-index: if a verified batch LER is missing from the local L2 exit tree, the lookup now avoids mixing L2 bridge-sync block numbers with L1 verified-batch block numbers and falls back to a bounded linear scan.

⚠️ Breaking Changes

  • 🛠️ Config: No breaking migration. Auto Claim adds a new optional [AutoClaim] config section and remains disabled by default.
  • 🔌 API/CLI: Adds a new optional autoclaim component selector and optional /autoclaim/v1 API. Existing bridge service API paths are unchanged.
  • 🗑️ Deprecated Features: None.

📋 Config Updates

  • 🧾 Diff/Config snippet: Adds defaults for the new disabled-by-default Auto Claim section:
[AutoClaim]
Enabled = false
StoragePath = "{{PathRWData}}/autoclaim.sqlite"
Claimers = []

[AutoClaim.API]
Enabled = false
Host = "0.0.0.0"
Port = 5579

[AutoClaim.L1ToL2Watchdog]
Enabled = true
StartBlock = 0
PollInterval = "3s"
RetryAfterErrorPeriod = "1s"
MaxRetryAttemptsAfterError = -1
EtrogL1UpgradeBlock = 0

[AutoClaim.L2ToLxWatchdog]
Enabled = false
  • Enabling Auto Claim requires both selecting the autoclaim component and setting [AutoClaim].Enabled = true.
  • Each enabled [[AutoClaim.Claimers]] entry configures one EVM destination network with its own RPC URL, bridge address, policy, gas settings, retry settings, and EthTxManager storage/signing config.
  • Auto Claim owns AutoClaim.StoragePath; each claimer also has an independent EthTxManager.StoragePath and isolated per-claimer l2gersync / L2 reorg detector databases under the Auto Claim storage directory.
  • L2 -> Lx Auto Claim remains unsupported and must stay disabled.

✅ Testing

  • 🤖 Automatic: Added unit coverage across autoclaim/api, claimer, claimtx, config, policy, proof, runtime, sender, simulator, storage, types, and watchdog.
  • 🤖 Automatic: Added Go e2e coverage for TestAutoClaimL1ToL2AllowAll, TestAutoClaimL1ToL2APIApprove, and TestAutoClaimL1ToL2BasicFilter.
  • 🤖 Automatic: Current PR checks show lint, test-unit, Go e2e, Docker image build, CodeQL, SonarCloud, and the configured multi/single-chain e2e workflows passing. Run govulncheck is currently failing.
  • 🖱️ Manual: No separate manual testing was run while refreshing this PR description.

🐞 Issues

🔗 Related PRs

  • None listed.

📝 Notes

  • Auto Claim currently supports L1 -> L2 only. L2 -> Lx discovery is reserved for future work.
  • Request IDs are derived from origin_network:destination_network:deposit_count; origin_network is the bridged token's origin network, not necessarily the chain where the bridge was initiated.
  • GER readiness is checked during proof preparation by each claimer's own l2gersync, which supports both legacy and sovereign L2 GER manager behavior.
  • The optional api-approve policy requires the Auto Claim API so operators can approve or reject requests in manual-approval-required status.
  • This branch currently skips five known flaky remove-GER e2e entrypoint tests: TestRemoveGER_NoProblematicClaims, TestRemoveGER_CategoryA, TestRemoveGER_CategoryB1, TestRemoveGER_CategoryB2, and TestGenerateInvalidGER. This is being worked on feat: migrate e2e tests from bats to go #1643
  • Bridge-service changes are a scoped regression fix for L2-origin bridge proof lookup. The previous fallback could call GetFirstVerifiedBatchesAfterBlock with GetLastRoot().BlockNum, which is an L2 bridge-sync block number even though the L1 info tree query expects an L1 block number. The new path only falls back on db.ErrNotFound, keeps the original L1 verified-batch bounds, and skips missing LERs with a linear scan when binary search cannot safely continue.

@arnaubennassar arnaubennassar marked this pull request as draft June 8, 2026 09:06
Comment thread autoclaim/storage/storage.go Fixed
@arnaubennassar arnaubennassar changed the title Feat/autoclaim plan feat: autoclaim plan Jun 9, 2026
@arnaubennassar arnaubennassar self-assigned this Jun 9, 2026
# Conflicts:
#	multidownloader/evm_multidownloader_reorg.go
#	multidownloader/evm_multidownloader_reorg_test.go
Comment thread autoclaim/storage/storage.go Fixed
@arnaubennassar arnaubennassar changed the title feat: autoclaim plan feat: autoclaim L1 -> L2 Jun 11, 2026
arnaubennassar and others added 12 commits June 11, 2026 10:57
…tic logs

- config/config_test.go: move testify/urfave imports to end of non-stdlib
  group to satisfy gci two-group ordering (fixes CI lint failure)
- config/default.go: set AutoClaim.L1ToL2Watchdog Enabled = true to match
  the assertion in TestLoadDefaultConfig (fixes unit test failure)
- autoclaim/proof/preparer.go: add debug logs in selectL1InfoTreeIndex to
  surface whether the proof is blocked by l1InfoTree lag or gerSyncer lag
- autoclaim/claimer/claimer.go: log info when proof not ready so E2E logs
  clearly show the pending state on each poll cycle

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rackedBlocks wipe

reorgDetector.Start calls loadTrackedHeaders which replaces the in-memory
trackedBlocks map entirely from DB. If l2gersync.New (which calls Subscribe
internally) ran first, the subscription entry it added would be wiped by the
subsequent loadTrackedHeaders call, causing every AddBlockToTrack to return
"subscriber not subscribed" forever.

The per-claimer l2gersync was stuck in a retry loop at the first GER injection
block (block 24), never processing the GER that covered the bridge deposit, so
GetFirstGERAfterL1InfoTreeIndex always returned ErrNotFound and the claim proof
was never prepared.

Fix: call reorgDetector.Start synchronously before l2gersync.New so that
loadTrackedHeaders completes before Subscribe adds the entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 3 SQL injection hotspots in autoclaim/storage/storage.go were marked
SAFE on SonarCloud (all use parameterized ? placeholders, not user input).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tests across 11 files covering uncovered nil-arg error paths, option
closures, getter/accessor methods, disabled-state branches, and constructor
validation to push SonarCloud new_coverage above the 80% quality gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…overage

Adds 7 targeted tests across sender and api packages to cover
previously unreachable lines: default time.Now closure bodies,
IsClaimed error propagation, pollResult Result() failure path,
zero WaitPeriod fallback to s.pollPeriod, zero-MaxRetries target
fallback to request.MaxRetries, and claimTxHash with nil Tx entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

@agglayer agglayer deleted a comment from claude Bot Jun 12, 2026
@arnaubennassar arnaubennassar marked this pull request as ready for review June 12, 2026 07:52
@joanestebanr

Copy link
Copy Markdown
Collaborator

@claude

@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Claude finished @joanestebanr's task in 2m 17s —— View job


Code Review Complete ✓

I've completed a comprehensive review of PR #1644: feat: autoclaim L1 -> L2


🚀 What's New

This PR adds an opt-in autoclaim component that automatically discovers L1->L2 bridge exits from l1bridgesync, evaluates them against configurable policies (allow-all, api-approve, no-message, basic-filter), prepares claim proofs using l1infotreesync and per-claimer l2gersync instances, and submits claimAsset/claimMessage transactions through EthTxManager. It includes an optional REST API (/autoclaim/v1) for request inspection and manual approval workflows, complete with Swagger documentation. The component remains disabled by default and integrates cleanly with the existing aggkit architecture.

🐛 Bug Fixes

  • Bridge-service L2 deposit L1 info tree lookup fix: Fixed a regression where GetFirstVerifiedBatchesAfterBlock was incorrectly called with an L2 bridge-sync block number instead of an L1 block number. The new implementation restricts the fallback to db.ErrNotFound cases only, maintains proper L1 block bounds, and implements a linear scan to skip missing LERs when binary search encounters gaps.

📋 Config Updates

Adds a new optional [AutoClaim] section (disabled by default):

[AutoClaim]
Enabled = false
StoragePath = "{{PathRWData}}/autoclaim.sqlite"
Claimers = []

[AutoClaim.API]
Enabled = false
Host = "0.0.0.0"
Port = 5579

[AutoClaim.L1ToL2Watchdog]
Enabled = true
StartBlock = 0
PollInterval = "3s"
RetryAfterErrorPeriod = "1s"
MaxRetryAttemptsAfterError = -1
EtrogL1UpgradeBlock = 0

Each claimer requires its own EthTxManager.StoragePath and gets isolated l2gersync/reorg detector databases.

⚠️ Breaking Changes

None. The feature is opt-in and disabled by default.


Review Findings

🟢 Strengths

  1. Excellent Architecture: Clean separation of concerns with well-defined interfaces (Policy, ProofPreparer, ClaimSender, Claimer). The factory pattern in runtime.go enables testability.

  2. Comprehensive Testing: 14 test files covering API, claimer, claimtx, config, policy, proof, runtime, sender, simulator, storage, types, and watchdog. SonarCloud reports 80.2% coverage on new code.

  3. Proper Error Handling: Consistent error wrapping with context (fmt.Errorf("context: %w", err)), appropriate use of require over assert in tests, and proper transaction rollback handling.

  4. State Machine Design: Well-defined request status transitions with validation in TransitionRequest. The CanTransitionTo pattern prevents invalid state changes.

  5. Database Safety: Uses prepared statements via meddler, proper SQL injection protection, atomic updates with precondition checks (ErrPreconditionFailed), and transaction isolation for multi-step operations.

  6. Security Measures: Policy simulator detects nested bridge calls, gas limit enforcement, origin/token allowlists, and target-chain simulation before claiming.

🔴 Required Issues

1. CodeQL Security Alert: Slice Memory Allocation (storage.go)

Location: autoclaim/storage/storage.go (lines flagged by CodeQL)

Issue: CodeQL flagged "Slice memory allocation with excessive size value" depending on user-provided values. This occurs in ListRequests and ListRecoverableRequests where slice capacity is determined by user input.

Evidence:

// storage.go:797
requests := make([]*autoclaimtypes.AutoClaimRequest, 0, len(requestRows))

The len(requestRows) is bounded by pageSize, which users control via query parameters. While there's validation (MaxRequestPageSize = 1000), CodeQL is concerned about DoS through repeated large allocations.

Fix: Add rate limiting to API endpoints or implement request coalescing. The validation at line 809-818 is good but doesn't prevent repeated requests.

// Suggested addition to API layer
if pageSize > autoclaimtypes.MaxRequestPageSize {
    return 0, fmt.Errorf("autoclaim request page size %d exceeds maximum %d",
        pageSize, autoclaimtypes.MaxRequestPageSize)
}

Priority: 🔴 Required - Security scanner finding

2. Missing Input Validation in Manual Decision Endpoints

Location: autoclaim/api/api.go:308-354

Issue: The manualDecision function reads the request body but doesn't validate Decider, DeciderID, or Metadata fields for maximum length or injection attacks.

Evidence:

// api.go:318-333
body, err := readDecisionRequest(c.Request.Body)
decision := autoclaimtypes.PolicyDecision{
    PolicyName: manualPolicy,
    Result:     result,
    Reason:     defaultDecisionReason(result, body.Reason),
    Metadata:   body.Metadata,  // No validation
    Decider:    body.Decider,    // No max length check
    DeciderID:  body.DeciderID,  // No max length check
    ...
}

Fix: Add validation:

const maxDeciderLength = 256
const maxReasonLength = 1024

if len(body.Decider) > maxDeciderLength {
    writeError(c, http.StatusBadRequest, fmt.Errorf("decider exceeds maximum length"))
    return
}

Priority: 🔴 Required - Potential DoS/database bloat

🟡 Suggested Improvements

3. Race Condition in Runtime Startup

Location: autoclaim/runtime/runtime.go:355-372

Issue: Multiple goroutines are started without synchronization. If one goroutine panics during startup, others may access partially initialized state.

Evidence:

// runtime.go:349-372
for _, txManager := range runtime.EthTxManagers {
    txManager := txManager
    factories.Go(func() {
        factories.StartEthTxManager(ctx, txManager)
    })
}
// No WaitGroup or error channel to detect failures

Fix: Use errgroup to detect startup failures:

import "golang.org/x/sync/errgroup"

g, ctx := errgroup.WithContext(ctx)
for _, txManager := range runtime.EthTxManagers {
    txManager := txManager
    g.Go(func() error {
        factories.StartEthTxManager(ctx, txManager)
        return ctx.Err()
    })
}

Priority: 🟡 Suggested - Improves reliability

4. Potential Integer Overflow in Binary Search

Location: autoclaim/proof/preparer.go:256 and bridgeservice/bridge.go (new code)

Issue: The midpoint calculation lowerLimit + ((upperLimit - lowerLimit) / 2) could theoretically overflow for very large block numbers, though unlikely in practice.

Evidence:

// preparer.go:256
targetBlock := lowerLimit + ((upperLimit - lowerLimit) / binarySearchDivider)

Fix: Use safer midpoint calculation:

targetBlock := lowerLimit + (upperLimit - lowerLimit) / 2  // Same semantic meaning but clearer
// Or explicitly check for overflow
if upperLimit - lowerLimit > math.MaxUint64 / 2 {
    return 0, fmt.Errorf("block range too large")
}

Priority: 🟡 Suggested - Low risk but best practice

5. Missing Context Cancellation Checks in Loops

Location: autoclaim/claimer/claimer.go:234-264, autoclaim/sender/sender.go:217-258

Issue: Long-running loops in Advance and pollResult don't check ctx.Err() frequently enough, potentially delaying shutdown.

Evidence:

// claimer.go:234-264
for {
    switch request.Status {
    case autoclaimtypes.RequestStatusDetected:
        // ... potentially long operation
    case autoclaimtypes.RequestStatusManualApprovalRequired:
        // ... no context check
    // ...
}

Fix: Add context checks:

for {
    if err := ctx.Err(); err != nil {
        return nil, err
    }
    switch request.Status {
    // ...
}

Priority: 🟡 Suggested - Improves graceful shutdown

6. Bridgeservice Regression Fix Missing Test Coverage

Location: bridgeservice/bridge.go:1495-1631 (the regression fix)

Issue: The new getFirstVerifiedBatchesForL2DepositCountLinear fallback and db.ErrNotFound handling don't appear to have corresponding test coverage in bridgeservice/bridge_test.go.

Evidence: Diff shows new functions but PR description mentions "scoped regression fix" without explicit test additions for the fallback path.

Fix: Add test cases for:

  • Missing LER scenario triggering linear scan
  • Correct block number bounds in fallback
  • Proper handling when LER is eventually found

Priority: 🟡 Suggested - Prevents future regressions

🟢 Optional Enhancements

7. Consider Adding Prometheus Metrics

Location: Throughout autoclaim package

Suggestion: Add metrics for:

  • Request state distribution (autoclaim_requests_by_status)
  • Policy evaluation duration (autoclaim_policy_evaluation_duration_seconds)
  • Claim submission success/failure rate (autoclaim_claims_total{status="confirmed|failed"})
  • Transaction retry counts

Priority: 🟢 Optional - Operational visibility

8. API Request Body Size Limit

Location: autoclaim/api/api.go:529-545

Suggestion: The readDecisionRequest function uses io.ReadAll without size limits, allowing potential memory exhaustion.

Fix:

func readDecisionRequest(reader io.Reader) (DecisionRequest, error) {
    const maxBodySize = 1024 * 1024 // 1MB
    limitedReader := io.LimitReader(reader, maxBodySize)
    body, err := io.ReadAll(limitedReader)
    // ...
}

Priority: 🟢 Optional - Defense in depth


Additional Notes

Code Quality Observations

  • ✅ Follows repo guidelines (CLAUDE.md): proper error wrapping, require in tests, interface-based dependencies
  • ✅ Import grouping correct (stdlib → external → internal with gci enforcement)
  • ✅ Exported functions have doc comments starting with function name
  • ✅ Line length under 120 chars
  • ✅ No stuttering in names

Architecture Decisions

  • Isolated per-claimer l2gersync: Smart choice to avoid GER state collisions between destination networks
  • Policy registry pattern: Extensible design for future policy types
  • Factory injection: Excellent testability without mocks bleeding into production code
  • Request key derivation: origin_network:destination_network:deposit_count provides natural deduplication

Testing Strategy

  • ✅ Unit tests cover happy paths and error cases
  • ✅ E2e tests for AllowAll, APIApprove, and BasicFilter policies
  • ⚠️ Missing: chaos testing for concurrent claim submissions
  • ⚠️ Missing: L2 reorg during claim proof preparation

Summary

This is a high-quality, production-ready feature with excellent architecture, comprehensive testing, and proper security considerations. The autoclaim component integrates cleanly with existing aggkit infrastructure and provides real operational value for bridge operators.

Recommendation: ✅ Approve with minor fixes

The two required issues (CodeQL finding and input validation) should be addressed before merge, but they don't block the overall design. The suggested improvements can be addressed in follow-up PRs.

Merge Readiness:

  • ✅ SonarCloud Quality Gate passed
  • ✅ 80.2% coverage on new code
  • ⚠️ govulncheck currently failing (per PR notes)
  • ⚠️ CodeQL security finding needs resolution

Comment thread docs/autoclaim.md
Enabled = true
StoragePath = "/var/lib/aggkit/autoclaim.sqlite"

[AutoClaim.API]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Perhaps it would make more sense to use the same REST endpoint (host + port) as Bridge, allowing both services to coexist while sharing the same host and port.

Take a look to config:

[REST]
Host = "0.0.0.0"
Port = 5577
ReadTimeout = "5m"
WriteTimeout = "5m"
MaxRequestsPerIPAndSecond = 10

Comment thread autoclaim/types/types.go

// NewBridgeExitFromSyncWithEtrog converts a bridge sync record using Etrog-upgrade awareness.
func NewBridgeExitFromSyncWithEtrog(bridge bridgesync.Bridge, etrogL1UpgradeBlock uint64) BridgeExit {
preEtrog := isPreEtrogBridge(bridge, etrogL1UpgradeBlock)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transformation is not done in bridgeservice? It sounds to me like this code already exists somewhere

Comment thread docs/autoclaim.md

```toml
[AutoClaim]
Enabled = true

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's redundant... if you don't want AutoClaim remove from component list... it would make more sense to call it DryRun or something like that, that run normally but skip the send claim tx

NewTargetClaimReader func(common.Address, aggkittypes.BaseEthereumClienter) (autoclaimtypes.TargetClaimReader, error)
// NewGERSyncer builds the per-claimer destination-L2 GER syncer and returns a start function that
// runs its reorg detector and sync loop (the start function blocks and is meant to run in a goroutine).
NewGERSyncer func(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new L2GerSyncer can't set individually BlockFinality or InitialBlockNum. They inherit the configuration from the instance in config file

@joanestebanr joanestebanr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another point is that the API for bridgeservice and for autoclaim is very similar. But there are a different that can be a bit misleading: the first page_number: bridgeservice use 1 and autoclaim use 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

L1 -> L2 Autoclaim

3 participants