perf: speed up manifest JSON rendering #879

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/manifest-json-rendering
Jun 2, 2026
Conversation

@He-Pin He-Pin commented May 30, 2026

Motivation

std.manifestJson, std.manifestJsonMinified, and std.manifestJsonEx routed through java.io.StringWriter, paying StringBuffer synchronization per write/flush on the hot manifestation path. Source-built jrsonnet comparisons showed sjsonnet trailing on object-heavy manifest workloads.

Modification

  • Add StringBuilderWriter: an unsynchronized Writer over a StringBuilder.
  • Add package-private FastMaterializeJsonRenderer backed by StringBuilderWriter; route the three std.manifestJson* builtins through it. Public MaterializeJsonRenderer ABI/shape unchanged.
  • Use an in-place codepoint sort for sortedVisibleKeyNames / maybeSortKeys (avoids .sorted boxing).
  • Fix codepoint comparison for raw surrogate prefixes; UnicodeHandlingTests extended.
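As a rough illustration of the first bullet, an unsynchronized `Writer` over a `StringBuilder` might look like the sketch below. The class name `StringBuilderWriterSketch` and its constructor parameter are hypothetical and are not taken from the PR; the actual `StringBuilderWriter` in sjsonnet may differ in detail.

```java
import java.io.Writer;

/**
 * Hypothetical sketch, not the PR's code: a Writer that appends straight into
 * a StringBuilder, so no monitor is taken on write or flush.
 */
final class StringBuilderWriterSketch extends Writer {
    private final StringBuilder sb;

    StringBuilderWriterSketch(int initialCapacity) {
        // Pre-sizing avoids early buffer growth for small and medium outputs.
        this.sb = new StringBuilder(initialCapacity);
    }

    @Override public void write(int c) { sb.append((char) c); }

    @Override public void write(char[] buf, int off, int len) { sb.append(buf, off, len); }

    @Override public void write(String s) { sb.append(s); }

    @Override public void write(String s, int off, int len) { sb.append(s, off, off + len); }

    // Nothing is buffered elsewhere, so flush and close are no-ops.
    @Override public void flush() {}

    @Override public void close() {}

    @Override public String toString() { return sb.toString(); }
}
```

Because it still extends `java.io.Writer`, a class like this can be handed to any renderer that expects a `Writer`, which is one way the public `StringWriter`-based API could stay unchanged while the builtins use the faster path.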

Result

Scala Native hyperfine on kube-prometheus, jrsonnet HEAD 2d7eed05:

| Workload (native) | Before | After | Δ |
|---|---:|---:|---:|
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms | −9.3% |
| manifestJsonEx, sjsonnet | | 5.09 ± 1.01 ms | new |

Test plan

  • ./mill __.reformat
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — 518/518 pass

This PR is the base for the stacked follow-ups #875 (TomlRenderer reuses StringBuilderWriter), and the independent #876/#877/#878.

Motivation:
std.manifestJson* still contributed to the local Scala Native gap versus source-built jrsonnet, especially in real-world object-heavy rendering.

Modification:
Add an internal StringBuilder-backed FastMaterializeJsonRenderer for std.manifestJson, std.manifestJsonMinified, and std.manifestJsonEx while preserving the public MaterializeJsonRenderer StringWriter API. Reuse an in-place codepoint key sorter backed by java.util.Arrays.sort, and fix raw-surrogate prefix ordering in compareStringsByCodepoint.
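To make the key-ordering point concrete, here is a minimal sketch of comparing strings by Unicode code point and sorting a key array with `java.util.Arrays.sort`. The class and method names are hypothetical, and the treatment of unpaired surrogates (ordered by their raw code unit value, as returned by `String.codePointAt`) is an assumption rather than a description of the PR's `compareStringsByCodepoint`.

```java
import java.util.Arrays;

/** Hypothetical sketch of code point ordering; not the PR's implementation. */
final class CodepointOrderSketch {

    /** Compares two strings code point by code point rather than by UTF-16 code unit. */
    static int compareByCodepoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            // codePointAt returns the full supplementary code point for a valid
            // surrogate pair, and the lone code unit for an unpaired surrogate.
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** Sorts the given array itself instead of building a new sorted collection. */
    static void sortKeysInPlace(String[] keys) {
        Arrays.sort(keys, CodepointOrderSketch::compareByCodepoint);
    }
}
```

The difference from `String.compareTo` shows up when a character in the range U+E000 to U+FFFF is compared with a supplementary character: the UTF-16 code unit order places the surrogate pair first, while code point order places it last.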

Result:
Full validation passed: ./mill --no-server --ticker false --color false __.reformat and ./mill --no-server --ticker false --color false -j 1 __.test reported 451/451 tests passing. JMH regression benchmarks: manifestJsonEx 0.055 ms/op, realistic2 43.596 ms/op, gen_big_object 0.842 ms/op. Direct hyperfine comparison against source-built jrsonnet: manifestJsonEx, sjsonnet-native 5.090 ms vs jrsonnet 4.075 ms; kube-prometheus, sjsonnet-native 143.738 ms vs jrsonnet 97.385 ms.
@stephenamar-db stephenamar-db merged commit 6dc684f into databricks:master Jun 2, 2026
10 checks passed
stephenamar-db pushed a commit that referenced this pull request Jun 3, 2026
## Motivation

`std.manifestTomlEx` had three sources of avoidable overhead on the hot
manifestation path:

1. **Synchronized writer.** `TomlRenderer` and `ManifestModule.evalRhs`
rendered into a `java.io.StringWriter`, whose backing `StringBuffer`
pays a monitor enter/exit on every `write`/`flush`. The
`FastMaterializeJsonRenderer` already uses the unsynchronized
`StringBuilderWriter` (#874); TOML did not.
2. **Redundant field lookups in `renderTableInternal`.** Each key's
`Val.Obj.value(k)` was resolved twice — once to classify scalar vs
section, then again to render or recurse. The cache deduplicates the
result, but the lookup itself still costs.
3. **Wasted indexing work.** `visibleKeyNames` was iterated and each key
binary-searched back into `sortedVisibleKeyNames` —
`sortedVisibleKeyNames` can be iterated directly, skipping `O(n log n)`
compares per table.

## Modification

Two commits:

- **`perf: use unsynchronized StringBuilderWriter in TomlRenderer`** —
Swap `TomlRenderer` and the `manifestTomlEx` render path in
`ManifestModule` from `java.io.StringWriter` to the package-private
`StringBuilderWriter`. `std.deepJoin` keeps `StringWriter` (separate
concern).
- **`perf: cache resolved field values and skip binary search in
renderTableInternal`** — Resolve each field once into a `resolved:
Array[Val]` during section classification and reuse it during
render/recurse; iterate `sortedVisibleKeyNames` directly (removes the
now-unused `sortedKeyIndex` binary search); hoist `childIndent =
cumulatedIndent + indent` out of the section loop (was an identical
allocation per sibling section); pre-size the output
`StringBuilderWriter` to 1 KiB so small/medium outputs skip the first ~6
doublings.
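
The resolve-once idea from the second commit can be sketched roughly as follows. This is a simplified stand-in, not sjsonnet code: a `Function<String, Object>` plays the role of the `Val.Obj.value(k)` lookup, a nested `Map` stands in for a section, and the output format is deliberately not valid TOML quoting. All names here are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of resolving each field once and reusing the result. */
final class ResolveOnceSketch {

    /** Emits scalar fields first, then section headers, with one lookup per key. */
    static String render(String[] sortedKeys, Function<String, Object> lookup) {
        Object[] resolved = new Object[sortedKeys.length];
        boolean[] isSection = new boolean[sortedKeys.length];

        // Pass 1: resolve each field exactly once while classifying it.
        for (int i = 0; i < sortedKeys.length; i++) {
            Object value = lookup.apply(sortedKeys[i]);
            resolved[i] = value;
            isSection[i] = value instanceof Map;
        }

        // Pre-sized buffer, mirroring the 1 KiB initial capacity mentioned above.
        StringBuilder out = new StringBuilder(1024);

        // Pass 2: walk the already sorted keys directly and reuse resolved values.
        for (int i = 0; i < sortedKeys.length; i++) {
            if (!isSection[i]) {
                out.append(sortedKeys[i]).append(" = ").append(resolved[i]).append('\n');
            }
        }
        for (int i = 0; i < sortedKeys.length; i++) {
            if (isSection[i]) {
                out.append('[').append(sortedKeys[i]).append("]\n");
            }
        }
        return out.toString();
    }
}
```

Indexing `resolved` by position in the sorted key array is what makes a separate key-to-index search unnecessary in this sketch.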

Output is byte-identical (verified at 1,228,186 bytes on the benchmark
workload).

## Result

Scala Native, hyperfine A/B against `master` (`fc292fa6`). Workload:
object comprehension over 8000 small tables → ~1.2 MB TOML output
(render-dominated). Four interleaved-order passes, `--warmup 10
--min-runs 100 --shell=none`:

| pass | order | before mean | after mean | before min | after min | **min ratio** |
|---|---|---:|---:|---:|---:|---:|
| 1 | before → after | 59.4 ± 2.7 ms | 53.2 ± 23.4 ms | 55.4 ms | 43.8 ms | **1.27×** |
| 2 | after → before | 64.1 ± 7.7 ms | 51.8 ± 12.2 ms | 56.4 ms | 43.7 ms | **1.29×** |
| 3 | before → after | 64.1 ± 8.1 ms | 53.2 ± 14.3 ms | 56.4 ms | 42.0 ms | **1.34×** |
| 4 | after → before | 63.3 ± 14.3 ms | 49.2 ± 3.7 ms | 57.2 ms | 42.8 ms | **1.34×** |
Mean is noisy on the host (1.12× – 1.29×), but **after is faster in
every one of the 4 passes** and the **min values are tight at
~1.27–1.34× faster** (best observed: 42.0 ms vs 56.4 ms, ~25.5%
reduction). Output byte-identical, 1,228,186 bytes both sides.
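
For readers checking the figures, the min ratio and the percentage reduction are two views of the same pair of timings. The helper below is a hypothetical illustration of that arithmetic, using the best observed pair quoted above (56.4 ms before, 42.0 ms after).

```java
/** Hypothetical helper showing how the speedup and reduction figures relate. */
final class MinRatioSketch {

    /** Speedup factor: how many times faster "after" is than "before". */
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    /** Percentage of wall time removed relative to "before". */
    static double reductionPercent(double beforeMs, double afterMs) {
        return (1.0 - afterMs / beforeMs) * 100.0;
    }
}
```

With these definitions, `speedup(56.4, 42.0)` is about 1.34 and `reductionPercent(56.4, 42.0)` is about 25.5.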

For comparison, the StringBuilderWriter swap alone (commit 1) measures
~1.08–1.14× min; the cache + binary-search elimination + childIndent
hoist (commit 2) lifts that to ~1.27–1.34× min.

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — 519/519 pass
- [x] Scala Native A/B hyperfine — 4 interleaved-order passes, all
positive; output byte-identical

---

> Rebased onto current `master` (`fc292fa6`). The companion commit
"speed up manifest JSON rendering" was merged separately as #879, so
this PR now contains only the TomlRenderer / ManifestModule changes.