Skip to content

Implement bootnode health checks and related configurations#22

Open
anunay-xin wants to merge 2 commits into
masterfrom
aj/bootnode-health-stats
Open

Implement bootnode health checks and related configurations#22
anunay-xin wants to merge 2 commits into
masterfrom
aj/bootnode-health-stats

Conversation

@anunay-xin

@anunay-xin anunay-xin commented Jun 26, 2026

Copy link
Copy Markdown
  • Added environment variables for bootnode health checks in docker-compose.yml.
  • Updated start-wizard.sh to export bootnode network configuration.
  • Enhanced main.go to initialize bootnode health checking if enabled.
  • Introduced new API endpoints for bootnode health status and checks.
  • Updated README with bootnode health configuration details.
  • Modified Go modules to include necessary dependencies for bootnode functionality.

Summary by CodeRabbit

  • New Features

    • Added bootnode health monitoring to the app, including a new dashboard panel with live status, response times, filtering, sorting, and manual refresh.
    • Added support for selecting network-specific bootnode checks and viewing periodic health updates in the main interface.
  • Bug Fixes

    • Improved bootnode-related configuration handling and defaults so health checks can be enabled consistently across environments.
    • Expanded backend API support so bootnode status can be viewed and checked on demand.

- Added environment variables for bootnode health checks in `docker-compose.yml`.
- Updated `start-wizard.sh` to export bootnode network configuration.
- Enhanced `main.go` to initialize bootnode health checking if enabled.
- Introduced new API endpoints for bootnode health status and checks.
- Updated README with bootnode health configuration details.
- Modified Go modules to include necessary dependencies for bootnode functionality.
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds end-to-end bootnode health monitoring: a discv4 UDP ping client, an RLPx TCP dial client, a probe aggregation model, a periodic BootnodeChecker service, two new REST API endpoints (GET /v2/bootnodes/health, POST /v2/bootnodes/check), and a BootnodesPanel React component with polling hook. Config, docker-compose, and start-wizard are extended with bootnode env vars; Go toolchain is bumped to 1.25.

Changes

Bootnode Health Monitoring

Layer / File(s) Summary
Probe result model and aggregation
backend/internal/probe/probe.go, backend/internal/probe/probe_test.go
Defines Outcome, Attempt, Summary types, Aggregate function (minSuccess-based health classification with RTT averaging), and LocalSummary; validated by three unit tests.
discv4 node types, parsing, and list loader
backend/internal/discv4/node.go, backend/internal/discv4/node_test.go, backend/internal/discv4/load.go
Implements NodeID/Node types, ParseNode (full enode:// URL and incomplete hex form), hex decode helpers, LoadNodesFromURL with HTTP fetch and parseNodes, and exported mainnet/testnet URL constants.
discv4 UDP client
backend/internal/discv4/client.go, backend/internal/discv4/client_integration_test.go
NewClient binds ephemeral UDP4, opens enode DB, starts discovery listening; Ping measures RTT via discv4 table; integration test pings a live bootnode.
RLPx TCP dial client and error classification
backend/internal/rlpx/classify.go, backend/internal/rlpx/classify_test.go, backend/internal/rlpx/client.go
classifyDial maps xdp2p.DiscReason values to OK/Warning/Fail outcomes; Client wraps a minimal xdp2p.Server (MaxPeers:1, NoDiscovery) and Dial performs a timed RLPx handshake with connection-refused/timeout normalization.
BootnodeChecker service and bootnode loader
backend/internal/service/bootnode_checker.go, backend/internal/service/bootnode_load.go, backend/internal/service/bootnode_load_test.go
BootnodeChecker runs parallel UDP+TCP probes over worker goroutines, aggregates per-node BootnodeStatus, stores BootnodeHealthReport snapshots under a mutex; LoadBootnodes resolves network to URL or devnet fallback.
Config extension and main.go wiring
backend/config/config.go, backend/main.go
Adds EnableBootnodeHealth, BootnodeNetwork, BootnodeInterval, BootnodeTimeout, BootnodeParallel to Config with env-var parsing; main.go conditionally constructs and starts BootnodeChecker and registers handlers; adds POST to CORS allowed methods.
Bootnode API handler and router
backend/internal/api/bootnodes.go, backend/internal/api/handlers.go, backend/internal/api/router.go
BootnodeHandler exposes Health (GET, no-cache snapshot) and Check (POST, admin-secret-gated, async CheckNow); bootnodeGuard returns 503 when checker is nil; routes registered under /v2/bootnodes/*.
Frontend types and polling hook
frontend/src/types.ts, frontend/src/hooks/useBootnodePolling.ts
Defines BootnodeStatus and BootnodeHealthReport interfaces; useBootnodePolling polls /v2/bootnodes/health, disables on 503, and triggerCheck POSTs a check and waits up to 120 s for checkedAt to change.
BootnodesPanel UI and App integration
frontend/src/components/BootnodesPanel.tsx, frontend/src/App.tsx
Sortable, filterable, paginated panel with status badges, RTT columns, enode tooltips, error-detail modal, persisted collapsed state; wired into App via useBootnodePolling.
Infrastructure and docs
backend/Dockerfile, backend/go.mod, docker-compose.yml, backend/.env_sample, start-wizard.sh, backend/README.md
Go toolchain bumped to 1.25; docker-compose and .env_sample extended with bootnode env vars; start-wizard.sh exports BOOTNODE_NETWORK; README documents new env vars and probe dependencies.

Sequence Diagram(s)

sequenceDiagram
  participant Browser as Browser
  participant useBootnodePolling as useBootnodePolling
  participant API as GET /v2/bootnodes/health
  participant BootnodeChecker as BootnodeChecker
  participant discv4Client as discv4.Client (UDP)
  participant rlpxClient as rlpx.Client (TCP)

  Browser->>useBootnodePolling: mount / interval tick
  useBootnodePolling->>API: GET /v2/bootnodes/health
  API->>BootnodeChecker: Snapshot()
  BootnodeChecker-->>API: BootnodeHealthReport
  API-->>useBootnodePolling: 200 JSON
  useBootnodePolling-->>Browser: report state update → BootnodesPanel render

  Browser->>useBootnodePolling: onCheckNow click
  useBootnodePolling->>API: POST /v2/bootnodes/check (x-api-secret)
  API->>BootnodeChecker: CheckNow() async
  BootnodeChecker->>discv4Client: Ping(node) × N workers
  BootnodeChecker->>rlpxClient: Dial(node, timeout) × N workers
  discv4Client-->>BootnodeChecker: RTT / ErrTimeout
  rlpxClient-->>BootnodeChecker: DialOutcome
  BootnodeChecker->>BootnodeChecker: storeReport(snapshot)
  useBootnodePolling->>API: poll GET /v2/bootnodes/health until checkedAt changes
  API-->>useBootnodePolling: updated BootnodeHealthReport
  useBootnodePolling-->>Browser: checking=false, updated panel
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • wanwiset25
  • gzliudan

Poem

🐇 A rabbit hopped through enode land,
Pinging nodes with a tiny hand.
UDP and TCP in tow,
Watching bootnodes' health aglow.
The panel sorts by RTT fine—
Every node in health divine! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding bootnode health checks and related configuration updates.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aj/bootnode-health-stats

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@anunay-xin anunay-xin marked this pull request as ready for review June 26, 2026 13:24

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/README.md (1)

18-18: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Update Go prerequisite to match bumped toolchain.

README lists Go 1.23+ but Dockerfile and go.mod were bumped to 1.25. Align the prerequisite to Go 1.25+ to prevent developer environment mismatches.

📝 Suggested fix
- - Go 1.23+
+ - Go 1.25+
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/README.md` at line 18, The prerequisite listed in the README is out
of sync with the bumped Go toolchain. Update the Go version requirement in the
README so it matches the version used by the project’s toolchain and module
settings, aligning it with the `Dockerfile` and `go.mod` changes to `Go 1.25+`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/internal/api/bootnodes.go`:
- Around line 22-24: The BootnodeHandler.Health endpoint currently always
returns 200 with h.checker.Snapshot(), even when the checker has not produced a
real report and Snapshot() is still zero-valued. Update Health to detect whether
a successful snapshot exists from BootnodeChecker/checkAll state and return 503
(or an explicit error response) until a valid report is available, using the
BootnodeHandler and checker snapshot flow as the place to enforce the failure
state.
- Around line 39-45: The BootnodeHandler.checkAuth path currently allows POST
/v2/bootnodes/check when adminSecret is empty, which leaves the admin-only check
endpoint exposed. Update checkAuth to reject requests when h.adminSecret is
unset (return a 403/503-style failure instead of true), or gate the
BootnodesCheck route registration so it is not wired without a configured
secret. Use BootnodeHandler, checkAuth, and BootnodesCheck to locate the fix.

In `@backend/internal/discv4/client_integration_test.go`:
- Around line 5-22: Update the integration test to match the current Client.Ping
signature: in TestPingLiveBootnode, stop passing the timeout argument and call
Ping with only the *Node returned by ParseNode. Use the Client.Ping method in
backend/internal/discv4/client.go as the reference for the expected call shape,
and keep the rest of the test flow unchanged so it compiles under the
integration tag.

In `@backend/internal/discv4/client.go`:
- Around line 56-70: `BOOTNODE_TIMEOUT` is still not being applied to discv4
ping probes because `Client.Ping` in `backend/internal/discv4/client.go` uses
the discovery stack’s fixed timeout path. Update the `Ping` API (and its call
sites such as the UDP path from `bootnode_checker.go`) to accept and honor a
caller-provided timeout, and make sure the underlying `c.tab.Ping`/pong wait
uses that value instead of the built-in 500 ms limit.

In `@backend/internal/discv4/node.go`:
- Around line 26-35: `Node.Endpoint()` and `Node.TCPEndpoint()` are rendering
host:port manually, which breaks IPv6 formatting; update both methods to use
`net.JoinHostPort` so `Node` emits bracketed IPv6 endpoints consistently. Keep
the existing TCP-vs-UDP selection logic in `Endpoint()`, but change the final
string construction in these methods so callers like `bootnode_checker` receive
valid endpoint strings for both IPv4 and IPv6.

In `@backend/internal/rlpx/classify_test.go`:
- Around line 9-33: The TestClassifyDial table is not asserting the expected
error text because the contains field is never checked, so wrong disconnect
messages can still pass. Update TestClassifyDial to validate classifyDial’s
Message against each row’s contains value for the non-empty cases, using the
classifyDial helper and the existing tt.contains column to ensure the returned
substring matches the expected disconnect reason.

In `@backend/internal/service/bootnode_checker.go`:
- Around line 205-258: The probe helpers are not respecting the configured
timeout as a real per-node budget: probeUDP ignores timeout entirely, and
probeTCP can exceed it because the minimum 1-second floor is applied per
attempt. Update probeUDP and probeTCP in bootnode_checker.go so the timeout is
divided across attempts and enforced as a total budget, with backoff/counting
included, while keeping the existing attempt aggregation via probe.Aggregate and
the local-init fast paths in workerClients.
- Around line 126-136: `checkAll()` leaves `c.report` unchanged when
`loadNodes()` fails or returns no nodes, so `Snapshot()` keeps serving stale
health data. Update `checkAll()` to explicitly clear or replace the current
snapshot in the `loadNodes()` error and empty-list branches, using the existing
`c.report`/`Snapshot()` flow so `GET /v2/bootnodes/health` reflects the current
failure or empty state instead of the previous success.

In `@backend/internal/service/bootnode_load.go`:
- Around line 37-48: `LoadBootnodes` is re-fetching bootnode data on every
health check, which ties each probe cycle to `discv4.LoadNodesFromURL` and makes
transient network/GitHub failures abort checks. Update `LoadBootnodes` and the
caller in `bootnode_checker` to use a cached parsed bootnode list, refreshing it
on startup or behind a long TTL instead of downloading on every interval. Keep
the network-specific selection logic (`BootnodesURLForNetwork`,
`loadDevnetBootnodes`) but decouple it from per-check fetching.

In `@frontend/src/App.tsx`:
- Around line 162-172: Bootnode polling is being disabled by the
Boolean(API_URL) gate in App’s useBootnodePolling setup, which blocks the
default same-origin case where API_URL falls back to ''. Update the App
component so useBootnodePolling still runs when API_URL is empty/same-origin,
and only disable it for truly invalid or intentionally off configurations. Keep
the existing API_URL handling consistent with the rest of the file and preserve
the polling flow for /v2/bootnodes/* in the default setup.

In `@frontend/src/components/BootnodesPanel.tsx`:
- Around line 331-344: The new action buttons in BootnodesPanel remove the
native focus outline with focus:outline-none but do not provide a replacement,
so keyboard focus becomes invisible. Update the affected button styles in the
copy/close controls and the other button groups in this component (the modal,
error, collapse, and pagination actions) to use a visible focus state, such as a
focus ring or outlined border, consistent with the existing button styling.
- Around line 759-769: The sortable table headers in BootnodesPanel are only
clickable on the <th>, which makes them unreachable by keyboard. Move the
sorting action into a वास्तविक button inside the header cell in the
BOOTNODE_COLUMNS render loop, and keep the tooltip behavior on the cell if
needed. Make the button focusable/activatable by keyboard and expose the current
sort state with aria-sort on the corresponding header based on the active sort
key and direction. Ensure handleSort and SortIcon still drive the same sorting
logic and visual state.

---

Outside diff comments:
In `@backend/README.md`:
- Line 18: The prerequisite listed in the README is out of sync with the bumped
Go toolchain. Update the Go version requirement in the README so it matches the
version used by the project’s toolchain and module settings, aligning it with
the `Dockerfile` and `go.mod` changes to `Go 1.25+`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14be5205-1349-4bef-8c1e-f96c2f9176ec

📥 Commits

Reviewing files that changed from the base of the PR and between e886051 and d8467fa.

⛔ Files ignored due to path filters (1)
  • backend/go.sum is excluded by !**/*.sum
📒 Files selected for processing (28)
  • backend/.env_sample
  • backend/Dockerfile
  • backend/README.md
  • backend/config/config.go
  • backend/go.mod
  • backend/internal/api/bootnodes.go
  • backend/internal/api/handlers.go
  • backend/internal/api/router.go
  • backend/internal/discv4/client.go
  • backend/internal/discv4/client_integration_test.go
  • backend/internal/discv4/load.go
  • backend/internal/discv4/node.go
  • backend/internal/discv4/node_test.go
  • backend/internal/probe/probe.go
  • backend/internal/probe/probe_test.go
  • backend/internal/rlpx/classify.go
  • backend/internal/rlpx/classify_test.go
  • backend/internal/rlpx/client.go
  • backend/internal/service/bootnode_checker.go
  • backend/internal/service/bootnode_load.go
  • backend/internal/service/bootnode_load_test.go
  • backend/main.go
  • docker-compose.yml
  • frontend/src/App.tsx
  • frontend/src/components/BootnodesPanel.tsx
  • frontend/src/hooks/useBootnodePolling.ts
  • frontend/src/types.ts
  • start-wizard.sh

Comment on lines +22 to +24
func (h *BootnodeHandler) Health(c *gin.Context) {
c.Header("Cache-Control", "no-store")
c.JSON(http.StatusOK, h.checker.Snapshot())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Return a failure state until the checker has produced a real report.

checkAll() exits without storing a report when bootnode loading fails or no nodes are available, so Snapshot() stays zero-valued. This handler still answers 200, which makes clients treat the feature as available and render an empty result instead of the disabled/error path. Please return 503 (or an explicit error payload) until a successful snapshot exists.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/api/bootnodes.go` around lines 22 - 24, The
BootnodeHandler.Health endpoint currently always returns 200 with
h.checker.Snapshot(), even when the checker has not produced a real report and
Snapshot() is still zero-valued. Update Health to detect whether a successful
snapshot exists from BootnodeChecker/checkAll state and return 503 (or an
explicit error response) until a valid report is available, using the
BootnodeHandler and checker snapshot flow as the place to enforce the failure
state.

Comment on lines +39 to +45
func (h *BootnodeHandler) checkAuth(c *gin.Context) bool {
if h.adminSecret == "" {
return true
}
secret := c.GetHeader("x-api-secret")
if secret != h.adminSecret {
c.JSON(http.StatusUnauthorized, gin.H{

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Keep the manual check endpoint closed when no admin secret is configured.

This is labeled admin-only, but checkAuth returns true when ADMIN_SECRET is unset. That leaves POST /v2/bootnodes/check callable by anyone, and each request can trigger outbound UDP/TCP probes. Return 403/503 here when no admin secret is configured, or avoid wiring BootnodesCheck at all in that case.

Possible fix
 func (h *BootnodeHandler) checkAuth(c *gin.Context) bool {
 	if h.adminSecret == "" {
-		return true
+		c.JSON(http.StatusServiceUnavailable, gin.H{
+			"error":   "admin secret not configured",
+			"message": "manual bootnode checks are disabled until ADMIN_SECRET is set",
+		})
+		return false
 	}
 	secret := c.GetHeader("x-api-secret")
 	if secret != h.adminSecret {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (h *BootnodeHandler) checkAuth(c *gin.Context) bool {
if h.adminSecret == "" {
return true
}
secret := c.GetHeader("x-api-secret")
if secret != h.adminSecret {
c.JSON(http.StatusUnauthorized, gin.H{
func (h *BootnodeHandler) checkAuth(c *gin.Context) bool {
if h.adminSecret == "" {
c.JSON(http.StatusServiceUnavailable, gin.H{
"error": "admin secret not configured",
"message": "manual bootnode checks are disabled until ADMIN_SECRET is set",
})
return false
}
secret := c.GetHeader("x-api-secret")
if secret != h.adminSecret {
c.JSON(http.StatusUnauthorized, gin.H{
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/api/bootnodes.go` around lines 39 - 45, The
BootnodeHandler.checkAuth path currently allows POST /v2/bootnodes/check when
adminSecret is empty, which leaves the admin-only check endpoint exposed. Update
checkAuth to reject requests when h.adminSecret is unset (return a 403/503-style
failure instead of true), or gate the BootnodesCheck route registration so it is
not wired without a configured secret. Use BootnodeHandler, checkAuth, and
BootnodesCheck to locate the fix.

Comment on lines +5 to +22
import (
"testing"
"time"
)

func TestPingLiveBootnode(t *testing.T) {
enode := "enode://efc710050d65cda5d563fc6c9520986a69bbed81036b427a5ad7a2079bc8ba421e78d291a9388a1bd10ff5d9afdc2a095900528f5c45942104d818feda37cf7a@154.12.116.170:30301"
n, err := ParseNode(enode)
if err != nil {
t.Fatal(err)
}
client, err := NewClient()
if err != nil {
t.Fatal(err)
}
defer client.Close()

rtt, err := client.Ping(n, 5*time.Second)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Update the integration test to the current Ping signature.

Client.Ping in backend/internal/discv4/client.go:58-70 only accepts *Node, so this file no longer compiles under go test -tags=integration.

Suggested fix
 import (
 	"testing"
-	"time"
 )
@@
-	rtt, err := client.Ping(n, 5*time.Second)
+	rtt, err := client.Ping(n)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import (
"testing"
"time"
)
func TestPingLiveBootnode(t *testing.T) {
enode := "enode://efc710050d65cda5d563fc6c9520986a69bbed81036b427a5ad7a2079bc8ba421e78d291a9388a1bd10ff5d9afdc2a095900528f5c45942104d818feda37cf7a@154.12.116.170:30301"
n, err := ParseNode(enode)
if err != nil {
t.Fatal(err)
}
client, err := NewClient()
if err != nil {
t.Fatal(err)
}
defer client.Close()
rtt, err := client.Ping(n, 5*time.Second)
import (
"testing"
)
func TestPingLiveBootnode(t *testing.T) {
enode := "enode://efc710050d65cda5d563fc6c9520986a69bbed81036b427a5ad7a2079bc8ba421e78d291a9388a1bd10ff5d9afdc2a095900528f5c45942104d818feda37cf7a@154.12.116.170:30301"
n, err := ParseNode(enode)
if err != nil {
t.Fatal(err)
}
client, err := NewClient()
if err != nil {
t.Fatal(err)
}
defer client.Close()
rtt, err := client.Ping(n)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/discv4/client_integration_test.go` around lines 5 - 22,
Update the integration test to match the current Client.Ping signature: in
TestPingLiveBootnode, stop passing the timeout argument and call Ping with only
the *Node returned by ParseNode. Use the Client.Ping method in
backend/internal/discv4/client.go as the reference for the expected call shape,
and keep the rest of the test flow unchanged so it compiles under the
integration tag.

Comment on lines +56 to +70
// Ping sends a single pingXDC to n and returns the RTT on success.
// The internal discovery timeout (500 ms) governs how long to wait for a pong.
func (c *Client) Ping(n *Node) (time.Duration, error) {
if err := n.validate(); err != nil {
return 0, err
}
target, err := enode.ParseV4(n.Enode())
if err != nil {
return 0, err
}
start := time.Now()
if err := c.tab.Ping(target); err != nil {
return 0, err
}
return time.Since(start), nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

BOOTNODE_TIMEOUT is still ignored by discv4 probes. backend/internal/service/bootnode_checker.go:205-223 passes a timeout through the UDP path, but backend/internal/discv4/client.go only exposes Ping(n *Node) and still relies on the discovery stack’s fixed internal timeout. That leaves UDP health checks capped at the old limit even when the configured timeout is higher.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/discv4/client.go` around lines 56 - 70, `BOOTNODE_TIMEOUT`
is still not being applied to discv4 ping probes because `Client.Ping` in
`backend/internal/discv4/client.go` uses the discovery stack’s fixed timeout
path. Update the `Ping` API (and its call sites such as the UDP path from
`bootnode_checker.go`) to accept and honor a caller-provided timeout, and make
sure the underlying `c.tab.Ping`/pong wait uses that value instead of the
built-in 500 ms limit.

Comment on lines +26 to +35
func (n *Node) Endpoint() string {
if n.UDP == n.TCP {
return fmt.Sprintf("%s:%d", n.IP, n.TCP)
}
return fmt.Sprintf("%s:%d", n.IP, n.UDP)
}

func (n *Node) TCPEndpoint() string {
return fmt.Sprintf("%s:%d", n.IP, n.TCP)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Use net.JoinHostPort for endpoint rendering.

Endpoint() and TCPEndpoint() currently build host:port with %s:%d, so an IPv6 node becomes 2001:db8::1:30303 instead of [2001:db8::1]:30303. ParseNode() and Enode() already handle IPv6, and backend/internal/service/bootnode_checker.go:165-172 surfaces these strings directly, so any IPv6 bootnode will be exposed with a broken endpoint.

Suggested fix
 func (n *Node) Endpoint() string {
-	if n.UDP == n.TCP {
-		return fmt.Sprintf("%s:%d", n.IP, n.TCP)
-	}
-	return fmt.Sprintf("%s:%d", n.IP, n.UDP)
+	port := n.UDP
+	if n.UDP == n.TCP {
+		port = n.TCP
+	}
+	return net.JoinHostPort(n.IP.String(), strconv.Itoa(int(port)))
 }
 
 func (n *Node) TCPEndpoint() string {
-	return fmt.Sprintf("%s:%d", n.IP, n.TCP)
+	return net.JoinHostPort(n.IP.String(), strconv.Itoa(int(n.TCP)))
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (n *Node) Endpoint() string {
if n.UDP == n.TCP {
return fmt.Sprintf("%s:%d", n.IP, n.TCP)
}
return fmt.Sprintf("%s:%d", n.IP, n.UDP)
}
func (n *Node) TCPEndpoint() string {
return fmt.Sprintf("%s:%d", n.IP, n.TCP)
}
func (n *Node) Endpoint() string {
port := n.UDP
if n.UDP == n.TCP {
port = n.TCP
}
return net.JoinHostPort(n.IP.String(), strconv.Itoa(int(port)))
}
func (n *Node) TCPEndpoint() string {
return net.JoinHostPort(n.IP.String(), strconv.Itoa(int(n.TCP)))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/discv4/node.go` around lines 26 - 35, `Node.Endpoint()` and
`Node.TCPEndpoint()` are rendering host:port manually, which breaks IPv6
formatting; update both methods to use `net.JoinHostPort` so `Node` emits
bracketed IPv6 endpoints consistently. Keep the existing TCP-vs-UDP selection
logic in `Endpoint()`, but change the final string construction in these methods
so callers like `bootnode_checker` receive valid endpoint strings for both IPv4
and IPv6.

Comment on lines +205 to +258
func probeUDP(wc workerClients, n *discv4.Node, timeout time.Duration, attempts, minSuccess int) probe.Summary {
if wc.udpInit != nil {
return probe.LocalSummary(wc.udpInit)
}

tries := make([]probe.Attempt, 0, attempts)
for i := 0; i < attempts; i++ {
if i > 0 {
time.Sleep(150 * time.Millisecond)
}
rtt, err := wc.udp.Ping(n)
if err == nil {
tries = append(tries, probe.Attempt{Outcome: probe.OutcomeOK, RTT: rtt})
continue
}
tries = append(tries, probe.Attempt{Outcome: probe.OutcomeFail, Message: err.Error()})
}
return probe.Aggregate(tries, minSuccess)
}

func probeTCP(wc workerClients, n *discv4.Node, timeout time.Duration, attempts, minSuccess int) probe.Summary {
if wc.tcpInit != nil {
return probe.LocalSummary(wc.tcpInit)
}
perProbe := timeout / time.Duration(attempts)
if perProbe < time.Second {
perProbe = time.Second
}

tries := make([]probe.Attempt, 0, attempts)
for i := 0; i < attempts; i++ {
if i > 0 {
time.Sleep(150 * time.Millisecond)
}
out := wc.tcp.Dial(n, perProbe)
switch out.Outcome {
case rlpx.OutcomeOK:
tries = append(tries, probe.Attempt{
Outcome: probe.OutcomeOK,
RTT: time.Duration(out.RTT) * time.Millisecond,
})
case rlpx.OutcomeWarning:
tries = append(tries, probe.Attempt{
Outcome: probe.OutcomeWarning,
Message: out.Message,
})
default:
tries = append(tries, probe.Attempt{
Outcome: probe.OutcomeFail,
Message: out.Message,
})
}
}
return probe.Aggregate(tries, minSuccess)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Honor the configured timeout as an actual probe budget.

probeUDP never uses timeout at all, and probeTCP can expand a smaller timeout to at least attempts * 1s plus backoff because the 1-second floor is applied per attempt. That means the checker timeout does not actually bound per-node probe duration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/service/bootnode_checker.go` around lines 205 - 258, The
probe helpers are not respecting the configured timeout as a real per-node
budget: probeUDP ignores timeout entirely, and probeTCP can exceed it because
the minimum 1-second floor is applied per attempt. Update probeUDP and probeTCP
in bootnode_checker.go so the timeout is divided across attempts and enforced as
a total budget, with backoff/counting included, while keeping the existing
attempt aggregation via probe.Aggregate and the local-init fast paths in
workerClients.

Comment on lines +37 to +48
func LoadBootnodes(cfg *config.Config) ([]*discv4.Node, error) {
if !cfg.EnableBootnodeHealth {
return nil, fmt.Errorf("bootnode health disabled")
}
if strings.EqualFold(strings.TrimSpace(cfg.BootnodeNetwork), "devnet") {
return loadDevnetBootnodes()
}
url, err := BootnodesURLForNetwork(cfg.BootnodeNetwork)
if err != nil {
return nil, err
}
return discv4.LoadNodesFromURL(url)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Avoid coupling every health pass to a fresh GitHub fetch.

backend/internal/service/bootnode_checker.go:111-134 calls this loader on every check cycle, and LoadBootnodes always falls through to discv4.LoadNodesFromURL. That means a transient GitHub/raw failure or rate limit aborts the whole probe round before any existing node is checked. Cache the parsed list and refresh it on startup or a long TTL instead of re-downloading it every interval.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/internal/service/bootnode_load.go` around lines 37 - 48,
`LoadBootnodes` is re-fetching bootnode data on every health check, which ties
each probe cycle to `discv4.LoadNodesFromURL` and makes transient network/GitHub
failures abort checks. Update `LoadBootnodes` and the caller in
`bootnode_checker` to use a cached parsed bootnode list, refreshing it on
startup or behind a long TTL instead of downloading on every interval. Keep the
network-specific selection logic (`BootnodesURLForNetwork`,
`loadDevnetBootnodes`) but decouple it from per-check fetching.

Comment thread frontend/src/App.tsx
Comment on lines +162 to +172
const {
report: bootnodeReport,
loading: bootnodeLoading,
checking: bootnodeChecking,
error: bootnodeError,
disabled: bootnodeDisabled,
triggerCheck: triggerBootnodeCheck,
} = useBootnodePolling({
apiUrl: API_URL,
enabled: Boolean(API_URL),
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don't disable bootnode polling for the same-origin default.

API_URL already falls back to '' elsewhere in this file, so this gate prevents /v2/bootnodes/* polling in the default same-origin setup while the panel still renders.

Suggested fix
   } = useBootnodePolling({
     apiUrl: API_URL,
-    enabled: Boolean(API_URL),
   });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const {
report: bootnodeReport,
loading: bootnodeLoading,
checking: bootnodeChecking,
error: bootnodeError,
disabled: bootnodeDisabled,
triggerCheck: triggerBootnodeCheck,
} = useBootnodePolling({
apiUrl: API_URL,
enabled: Boolean(API_URL),
});
const {
report: bootnodeReport,
loading: bootnodeLoading,
checking: bootnodeChecking,
error: bootnodeError,
disabled: bootnodeDisabled,
triggerCheck: triggerBootnodeCheck,
} = useBootnodePolling({
apiUrl: API_URL,
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/App.tsx` around lines 162 - 172, Bootnode polling is being
disabled by the Boolean(API_URL) gate in App’s useBootnodePolling setup, which
blocks the default same-origin case where API_URL falls back to ''. Update the
App component so useBootnodePolling still runs when API_URL is
empty/same-origin, and only disable it for truly invalid or intentionally off
configurations. Keep the existing API_URL handling consistent with the rest of
the file and preserve the polling flow for /v2/bootnodes/* in the default setup.

Comment on lines +331 to +344
<button
type="button"
onClick={copy}
className="text-xs px-3 py-1 rounded-full focus:outline-none"
style={{ background: '#f0f4f8', color: '#6b7280', border: 'none', cursor: 'pointer' }}
>
Copy
</button>
<button
type="button"
onClick={onClose}
className="text-xs px-3 py-1 rounded-full focus:outline-none"
style={{ background: '#242c6d', color: '#fff', border: 'none', cursor: 'pointer' }}
>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Restore a visible focus style on the new actions.

These controls remove the native outline with focus:outline-none but never add a replacement, so keyboard users lose track of the active button.

Suggested fix
- className="text-xs px-3 py-1 rounded-full transition-colors focus:outline-none"
+ className="text-xs px-3 py-1 rounded-full transition-colors focus:outline-none focus-visible:ring-2 focus-visible:ring-offset-2 focus-visible:ring-blue-600"

Apply the same pattern to the modal, error, collapse, and pagination buttons in this component.

Also applies to: 399-414, 631-661, 720-749

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/BootnodesPanel.tsx` around lines 331 - 344, The new
action buttons in BootnodesPanel remove the native focus outline with
focus:outline-none but do not provide a replacement, so keyboard focus becomes
invisible. Update the affected button styles in the copy/close controls and the
other button groups in this component (the modal, error, collapse, and
pagination actions) to use a visible focus state, such as a focus ring or
outlined border, consistent with the existing button styling.

Comment on lines +759 to +769
{BOOTNODE_COLUMNS.map((col) => (
<th
key={col.key}
className={`px-3 py-2 text-${col.align}${col.sortable ? ' cursor-pointer' : ''}`}
onClick={col.sortable ? () => handleSort(col.key) : undefined}
onMouseEnter={(e) => showHeaderTooltip(e, col.tooltip)}
onMouseLeave={hideTooltip}
onMouseMove={moveTooltip}
>
{col.label}
{col.sortable && <SortIcon colKey={col.key} />}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Make the sortable headers keyboard-operable.

The new sort controls live on <th> click handlers, which are not focusable and cannot be activated from the keyboard. Put a real button inside the header cell and expose the current sort via aria-sort.

Suggested fix
- <th
-   key={col.key}
-   className={`px-3 py-2 text-${col.align}${col.sortable ? ' cursor-pointer' : ''}`}
-   onClick={col.sortable ? () => handleSort(col.key) : undefined}
-   onMouseEnter={(e) => showHeaderTooltip(e, col.tooltip)}
-   onMouseLeave={hideTooltip}
-   onMouseMove={moveTooltip}
- >
-   {col.label}
-   {col.sortable && <SortIcon colKey={col.key} />}
- </th>
+ <th
+   key={col.key}
+   scope="col"
+   aria-sort={
+     activeSort.key === col.key
+       ? activeSort.dir === 'asc' ? 'ascending' : 'descending'
+       : 'none'
+   }
+   className={col.align === 'right' ? 'px-3 py-2 text-right' : 'px-3 py-2 text-left'}
+   onMouseEnter={(e) => showHeaderTooltip(e, col.tooltip)}
+   onMouseLeave={hideTooltip}
+   onMouseMove={moveTooltip}
+ >
+   <button
+     type="button"
+     onClick={() => handleSort(col.key)}
+     className="w-full text-inherit focus:outline-none focus-visible:ring-2 focus-visible:ring-offset-2 focus-visible:ring-blue-600"
+   >
+     {col.label}
+     {col.sortable && <SortIcon colKey={col.key} />}
+   </button>
+ </th>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@frontend/src/components/BootnodesPanel.tsx` around lines 759 - 769, The
sortable table headers in BootnodesPanel are only clickable on the <th>, which
makes them unreachable by keyboard. Move the sorting action into a वास्तविक
button inside the header cell in the BOOTNODE_COLUMNS render loop, and keep the
tooltip behavior on the cell if needed. Make the button focusable/activatable by
keyboard and expose the current sort state with aria-sort on the corresponding
header based on the active sort key and direction. Ensure handleSort and
SortIcon still drive the same sorting logic and visual state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants