Skip to content

perf: use unsynchronized StringBuilderWriter in std.deepJoin#889

Merged
stephenamar-db merged 1 commit into
databricks:masterfrom
He-Pin:perf/deepjoin-stringbuilder-writer
Jun 3, 2026
Merged

perf: use unsynchronized StringBuilderWriter in std.deepJoin#889
stephenamar-db merged 1 commit into
databricks:masterfrom
He-Pin:perf/deepjoin-stringbuilder-writer

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Jun 3, 2026

Motivation

std.deepJoin writes each Val.Str chunk into a java.io.StringWriter inside a tight loop. StringWriter's backing StringBuffer pays a monitor enter/exit on every write/append call, which on a typical deepJoin walk over a deeply nested array can be hundreds of thousands of synchronized writes — wasted overhead in single-threaded jsonnet evaluation.

TomlRenderer and FastMaterializeJsonRenderer already use the unsynchronized package-private StringBuilderWriter for the same reason (#874, #875). std.deepJoin was explicitly left as a follow-up in #875's description ("std.deepJoin keeps StringWriter (separate concern)") — this PR is that follow-up.

Modification

Single change in ManifestModule.scala: swap the new StringWriter() in DeepJoin.evalRhs for new StringBuilderWriter(). No other code changes; output is byte-identical.

Result

Scala Native, hyperfine A/B against master (fc292fa6). Workload: a 50,000-row array of 10 pre-allocated strings → 2 MB of deepJoin output, render-dominated. Four interleaved-order passes, --warmup 10 --min-runs 100 --shell=none:

pass order before mean after mean before min after min min ratio
1 before → after 35.1 ± 16.5 ms 32.2 ± 19.1 ms 23.1 ms 18.7 ms 1.24×
2 after → before 43.7 ± 30.6 ms 29.9 ± 25.3 ms 25.7 ms 20.3 ms 1.27×
3 before → after 30.3 ± 8.5 ms 29.5 ± 7.1 ms 24.6 ms 20.8 ms 1.18×
4 after → before 32.6 ± 7.6 ms 28.0 ± 6.8 ms 24.0 ms 20.7 ms 1.16×

After is faster in every one of the 4 passes; mean is noisy on the host but min values are tight at 1.16–1.27× faster (best observed 18.7 vs 23.1 ms, ~19% reduction). Output byte-identical (2,000,000 bytes both sides).

Test plan

  • ./mill __.reformat
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — 519/519 pass
  • Scala Native A/B hyperfine — 4 interleaved-order passes, all positive; output byte-identical

Independent of #875; can land in either order. After both land, the import java.io.StringWriter in ManifestModule.scala can be removed in a small cleanup.

Motivation:

`std.deepJoin` writes each Val.Str chunk into a `java.io.StringWriter` inside a
tight loop. StringWriter's backing `StringBuffer` pays a monitor enter/exit on
every `write`/`append` call, which on a typical deepJoin walk over a deeply
nested array can be hundreds of thousands of synchronized writes.

`TomlRenderer` and `FastMaterializeJsonRenderer` already use the unsynchronized
package-private `StringBuilderWriter` for the same reason (see databricks#874, databricks#875).
deepJoin was explicitly left as a follow-up in databricks#875's description ("std.deepJoin
keeps StringWriter (separate concern)") — this is that follow-up.

Modification:

Swap the single `new StringWriter()` in `DeepJoin.evalRhs` for
`new StringBuilderWriter()`. No other changes; output is byte-identical.

Result:

Scala Native hyperfine, A/B against master (`fc292fa6`). Workload: a 50000-row
array of 10-string rows → 2 MB of deepJoin output, render-dominated. Four
interleaved-order passes (`--warmup 10 --min-runs 100 --shell=none`):

| pass | order | before mean | after mean | before min | after min | min ratio |
|---|---|---:|---:|---:|---:|---:|
| 1 | before → after | 35.1 ± 16.5 ms | 32.2 ± 19.1 ms | 23.1 ms | 18.7 ms | 1.24x |
| 2 | after → before | 43.7 ± 30.6 ms | 29.9 ± 25.3 ms | 25.7 ms | 20.3 ms | 1.27x |
| 3 | before → after | 30.3 ±  8.5 ms | 29.5 ±  7.1 ms | 24.6 ms | 20.8 ms | 1.18x |
| 4 | after → before | 32.6 ±  7.6 ms | 28.0 ±  6.8 ms | 24.0 ms | 20.7 ms | 1.16x |

After is faster in every one of the 4 passes; min values are tight at
1.16-1.27x faster. Output byte-identical (2,000,000 bytes both sides).
@stephenamar-db stephenamar-db merged commit e4d44ce into databricks:master Jun 3, 2026
5 checks passed
@He-Pin He-Pin deleted the perf/deepjoin-stringbuilder-writer branch June 3, 2026 19:26
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request Jun 4, 2026
… stdlib

Motivation:
Continues the synchronous StringWriter elimination started by PR databricks#875
(TomlRenderer), databricks#889 (std.deepJoin), and the existing commit in this PR
(escapeStringJson/Python). Remaining hot paths still allocate a
java.io.StringWriter with its synchronized StringBuffer.

Modification:
- StringBuilderWriter: expose getBuilder for direct StringBuilder access
  (YamlRenderer trims trailing spaces via length/charAt/setLength).
- Renderer / PythonRenderer: default `out` constructor parameter changes
  from java.io.StringWriter to StringBuilderWriter. Format.scala's
  complex-type Renderer paths and Val.Obj.renderString automatically
  benefit via the new default.
- YamlRenderer: full visitor type-parameter migration from StringWriter
  to StringBuilderWriter; outBuffer switches from StringBuffer to
  StringBuilder (same length/setLength/charAt interface).
- PrettyYamlRenderer: default `out` parameter to StringBuilderWriter;
  also drops a dead StringWriter allocation in the quoted-string branch
  (the writer's result was never consumed — `quotedStr` is used instead).
- std.escapeStringXML: replace internal StringWriter with
  StringBuilderWriter, pre-sized to input length + 16.
- bench: MaterializerBenchmark switches its renderWith helper to
  StringBuilderWriter to match the new constructor signatures.

Result:
- Eliminates StringBuffer monitor acquisition on every char/string write
  across all remaining renderers (Yaml/PrettyYaml/Python/Json default
  paths) and std.escapeStringXML.
- Removes one dead StringWriter allocation per quoted-string in
  PrettyYamlRenderer.
- All JVM tests pass (sjsonnet.jvm[3.3.7].test).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants