PCF-DCP v1.0: specification + reference implementation (pcf-dcp)#16

Merged
kduma merged 4 commits into master from claude/dynamic-nonlinear-partitions-vSXIn
Jun 8, 2026

kduma commented Jun 6, 2026

Contributor

Summary

Adds the PCF-DCP profile — Dynamic Container Partition — to PCF v1.0: a new
partition type (0xAAAC0001) whose bytes are an arena of inner partitions
that can grow, shrink, be edited in the middle, and share/deduplicate extents,
all without relocating neighbours. Layered strictly above PCF exactly as
pcf-sig and pfs-ms are: a DCP file is always a conforming PCF v1.0 file, and
a generic PCF reader sees one opaque partition.

This PR now contains both the specification (specs/PCF-DCP-spec-v1.0.txt)
and the new reference implementation, tooling, and CI/release wiring.

What's inside an arena

[ DCP Header (24 B) | data extents | Fragment Tables | Inner Table Block(s) ]
  • DCP Header: "PDCP" magic, profile version, inner_table_offset, arena_used (bump pointer).
  • Inner Table Block: a chain of reused PCF Table Blocks (74 B + 141 B entries), byte-for-byte the same format as the top-level table, listing inner partitions. Two entry fields are reinterpreted: start_offset → the partition's Fragment Table, max_length = used_bytes.
  • Fragment Table: per inner partition, 9-byte block headers + 18-byte Fragment Entries naming extents (offset, length, kind, flags). Logical content = concatenation of DATA extents.
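
As a rough illustration of "logical content = concatenation of DATA extents", here is a minimal sketch in Rust. It is not the pcf-dcp API: the `Fragment` and `Kind` types, the field widths, and the `reconstruct` function are all assumptions made for this example, and the on-disk 18-byte entry encoding is not modelled.

```rust
use std::convert::TryFrom;

/// Kind of a fragment entry. Only `Data` contributes to logical content;
/// `Other` is a hypothetical placeholder for any non-DATA kind.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Kind {
    Data,
    Other,
}

/// One extent named by a Fragment Entry (field widths are assumptions).
#[derive(Clone, Copy, Debug)]
pub struct Fragment {
    pub offset: u64, // arena-relative offset of the extent
    pub length: u64, // extent length in bytes
    pub kind: Kind,
    pub flags: u8,
}

/// Rebuild an inner partition's logical content by concatenating its DATA
/// extents in table order. Returns `None` if any extent lies outside the arena.
pub fn reconstruct(arena: &[u8], fragments: &[Fragment]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for f in fragments.iter().filter(|f| f.kind == Kind::Data) {
        let start = usize::try_from(f.offset).ok()?;
        let len = usize::try_from(f.length).ok()?;
        let end = start.checked_add(len)?;
        out.extend_from_slice(arena.get(start..end)?);
    }
    Some(out)
}
```

Extents may appear in any physical order in the arena; only the order of entries in the Fragment Table determines the reconstructed bytes.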

Each inner partition's data_hash covers its logical content, so
fragmentation, dedup, compaction, and promotion all leave the hash — and any
PCF-SIG signature over it — unchanged.
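
To make the invariance concrete, a small sketch: two arenas with different physical layouts hash identically once the hash is taken over the reconstructed content. The FNV-1a function here is only a stand-in, since this description does not say which algorithm data_hash uses, and `logical_hash` is a hypothetical helper, not part of pcf-dcp.

```rust
/// Stand-in 64-bit FNV-1a hash. The real data_hash algorithm is defined by
/// the PCF spec and is NOT assumed to be FNV; this is for illustration only.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hash the logical content: the concatenation of `(offset, length)` extents
/// taken from `arena`, in list order. Panics if an extent is out of range.
pub fn logical_hash(arena: &[u8], extents: &[(usize, usize)]) -> u64 {
    let mut content = Vec::new();
    for &(off, len) in extents {
        content.extend_from_slice(&arena[off..off + len]);
    }
    fnv1a(&content)
}
```

For example, `logical_hash(b"abcdef", &[(0, 6)])` and `logical_hash(b"defXXabc", &[(5, 3), (0, 3)])` both hash the bytes `abcdef`, so they return the same value even though the second arena stores the content fragmented and out of order.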

New crate reference/PCF-DCP-v1.0 (pcf-dcp)

  • arena: in-memory model (byte pool + fragment lists); content-defined deduplication (intra- and cross-partition), copy-on-write edits (append / insert / overwrite / delete / truncate via fragment splitting), mark-and-sweep compaction that normalises the SHARED flag, and a canonical serialiser reproducing the spec's Section 17 layout.
  • reader: DcpReader over pcf::Container, so trailer-mode host files read transparently; full DCP-aware verify (PCF integrity, inner table_hash, reconstruction length + data_hash, no nesting, file-wide uid uniqueness).
  • writer: whole-file model emitting a fresh canonical PCF image; promotion (dynamic→fixed, a MOVE preserving uid + data_hash), demotion, dedup, defrag, optional trailer finalisation.
  • dcp CLI: info / dedup / defrag / promote / demote (--trailer); every mutating command re-verifies before writing.
  • example gen_testvector, README, and testdata/canonical.bin.
  • 34 tests: byte-exact 700-byte Section 17 vector, spec conformance, round-trips, error paths.
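
For readers unfamiliar with "edits via fragment splitting", the idea can be sketched as follows. This is a simplified model, not the crate's arena: `Extent`, `Inner`, and `insert` are hypothetical names, and flags, dedup, and the on-disk format are left out.

```rust
/// A (pool offset, length) pair naming a run of bytes in the pool.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Extent {
    pub offset: usize,
    pub length: usize,
}

/// Simplified inner partition: a byte pool plus an ordered fragment list.
pub struct Inner {
    pub pool: Vec<u8>,
    pub frags: Vec<Extent>,
}

impl Inner {
    /// Logical content: the fragments' bytes concatenated in list order.
    pub fn content(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for e in &self.frags {
            out.extend_from_slice(&self.pool[e.offset..e.offset + e.length]);
        }
        out
    }

    /// Insert `bytes` at logical position `at` without moving any existing
    /// pool bytes. Returns false if `at` is past the end of the content.
    pub fn insert(&mut self, at: usize, bytes: &[u8]) -> bool {
        let total: usize = self.frags.iter().map(|e| e.length).sum();
        if at > total {
            return false;
        }
        if bytes.is_empty() {
            return true;
        }
        // New bytes are appended at the end of the pool (bump allocation).
        let new = Extent { offset: self.pool.len(), length: bytes.len() };
        self.pool.extend_from_slice(bytes);

        let mut pos = 0; // logical position where fragment i starts
        for i in 0..self.frags.len() {
            let e = self.frags[i];
            if at == pos {
                // Boundary hit: no split needed.
                self.frags.insert(i, new);
                return true;
            }
            if at < pos + e.length {
                // Split fragment i into head and tail around the new extent.
                let head = at - pos;
                self.frags[i] = Extent { offset: e.offset, length: head };
                self.frags.insert(i + 1, new);
                let tail = Extent { offset: e.offset + head, length: e.length - head };
                self.frags.insert(i + 2, tail);
                return true;
            }
            pos += e.length;
        }
        self.frags.push(new); // at == total: plain append
        true
    }
}
```

Inserting `", "` at position 5 of a single 10-byte fragment holding `helloworld` leaves the first 10 pool bytes untouched and yields three fragments whose concatenation is `hello, world`.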

Tooling

pcf-debug gains a DcpContainerDecoder plugin (renders DCP Header, inner
table with reinterpreted fields, fragment tables with SHARED flags, and an
extent summary) plus a decode_dcp test, mirroring the PCF-SIG decoder.
pcf-debug now depends on pcf-dcp.

Spec change worth a look

The Section 17 hex dump had profile_version_minor = 01 at file offset
0x00F0, contradicting Section 6, the field label, and the constant (all 0).
No hash covers that byte, so the file is still 700 bytes and every hash
verifies — it was a typo. The reference generator emits the correct 00, and
the dump is corrected to match byte-for-byte. Also adds Section 2.2
"Compatibility with the PCF File Trailer".

CI / release

  • Dedicated pcf-dcp CI job (fmt / clippy / build / test + a 700-byte test-vector assertion), mirroring the pcf-debug / pcf-compact jobs.
  • release.yml publishes pcf-dcp before pcf-debug (which now depends on it).
  • release-prepare.yml bumps + pins pcf-dcp in lockstep (0.0.8).

Verification (local)

  • cargo build / test / clippy / fmt --workspace: all green.
  • cargo run -p pcf-dcp --example gen_testvector: 700 bytes, 0 diffs vs the corrected Section 17 dump.
  • cargo publish -p pcf-dcp --dry-run: packages and verifies against the published pcf 0.0.8.

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu

claude added 3 commits June 6, 2026 17:41
Introduce specs/PCF-DCP-spec-v1.0.txt, an application-level profile over
PCF v1.0 that adds inner partitions which can grow, shrink, and be mutated
in the middle without relocating neighbours.

Key elements:
- DCP_CONTAINER partition type 0xAAAC0001; arena addressed by relative
  offsets, with a 24-byte DCP Header (bump-pointer allocator, derived free
  space).
- Inner partitions listed by reused PCF Table Blocks/Entries; content
  described by per-partition Fragment Tables of variable-length extents
  (18-byte fixed Fragment Entry).
- data_hash committed over logical (reconstructed) content, making it
  invariant under fragmentation, sharing, relocation, and promotion.
- Promotion invariant: the six fields preserved by promotion/demotion equal
  the fields a PCF-SIG signature protects; inner partitions are signable in
  place via a reader-side uid-scope extension.
- Optional deduplication via shared extents, with a per-extent SHARED flag
  enforcing safe copy-on-write while allowing cheap in-place edits of private
  extents; sharing-preserving mark-and-sweep compaction.
- Byte-exact 700-byte test vector (verified to parse as valid PCF v1.0 and to
  round-trip its embedded hex dump) demonstrating fragmentation and a shared
  extent.
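
The SHARED-flag rule can be sketched like this: a shared extent must be copied before it is modified, while a private extent can be rewritten where it sits. This is an illustrative model only; the `SHARED` bit value, the `Fragment` struct, and `overwrite` are assumptions, not the spec's encoding or the crate's API.

```rust
/// Hypothetical bit value for the per-extent SHARED flag.
pub const SHARED: u8 = 0x01;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fragment {
    pub offset: usize,
    pub length: usize,
    pub flags: u8,
}

/// Overwrite all bytes of one fragment with `data` (same length required).
/// Shared extents are never modified: the new bytes go to the end of the
/// pool and the fragment is repointed at them as a private extent.
pub fn overwrite(pool: &mut Vec<u8>, frag: &mut Fragment, data: &[u8]) -> bool {
    if data.len() != frag.length {
        return false;
    }
    if (frag.flags & SHARED) != 0 {
        // Copy-on-write: other fragments may still reference the old bytes.
        frag.offset = pool.len();
        frag.flags &= !SHARED;
        pool.extend_from_slice(data);
    } else {
        // Private extent: cheap in-place edit, no allocation.
        pool[frag.offset..frag.offset + frag.length].copy_from_slice(data);
    }
    true
}
```

After a copy-on-write, the other referencing fragment may be the extent's only remaining user while still carrying the SHARED flag. That is harmless but conservative, and is the kind of stale flag a compaction pass can normalise.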

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu

Implement the reference reader/writer for the PCF-DCP profile (Dynamic
Container Partition), layered strictly above PCF v1.0 exactly as pcf-sig
and pfs-ms are. A DCP container is one opaque PCF partition whose bytes are
an arena: a 24-byte DCP Header, a chain of reused PCF Table Blocks listing
inner partitions, a Fragment Table per inner partition, and the data
extents those fragments name. Inner content is the concatenation of DATA
extents, and each inner data_hash covers that logical content, so
fragmentation, deduplication, compaction, and promotion all leave the hash
(and any PCF-SIG signature over it) unchanged.

New crate reference/PCF-DCP-v1.0 (pcf-dcp):
- arena: in-memory model with a byte pool plus fragment lists; content-
  defined deduplication (intra- and cross-partition), copy-on-write edits
  (append/insert/overwrite/delete/truncate via fragment splitting),
  mark-and-sweep compaction that normalises the SHARED flag, and a
  canonical serialiser that reproduces the spec's Section 17 layout.
- reader: DcpReader over pcf::Container, so trailer-mode host files read
  transparently; full DCP-aware verify (PCF integrity, inner table_hash,
  reconstruction length + data_hash, no nesting, file-wide uid uniqueness).
- writer: whole-file model emitting a fresh canonical PCF image; promotion
  (dynamic->fixed, a MOVE preserving uid + data_hash), demotion, dedup,
  defrag, and optional trailer-mode finalisation.
- vector + example + dcp CLI (info/dedup/defrag/promote/demote, --trailer);
  every mutating command re-verifies.
- tests: byte-exact 700-byte Section 17 vector, spec conformance,
  round-trips, and error paths (34 tests). testdata/canonical.bin committed.

pcf-debug: add a DcpContainerDecoder plugin (renders DCP Header, inner
table, fragment tables with SHARED flags, extent summary) and a decode_dcp
test, mirroring the PCF-SIG decoder. pcf-debug now depends on pcf-dcp.

Wiring: add the crate to the workspace; add a dedicated pcf-dcp CI job
(fmt/clippy/build/test + 700-byte vector assertion); publish pcf-dcp before
pcf-debug in release.yml; bump/pin pcf-dcp in release-prepare.yml.

Spec: add Section 2.2 "Compatibility with the PCF File Trailer". Fix the
Section 17 hex dump's profile_version_minor byte at file offset 0x00F0
(01 -> 00): the field is semantically 0 for v1.0 (matching Section 6, the
field label, and the const), and no hash covers it, so the file is still
700 bytes and all hashes verify. The reference generator now reproduces the
corrected dump byte-for-byte.

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu

kduma changed the title from "Add PCF-DCP v1.0 spec: dynamic container partition profile" to "PCF-DCP v1.0: specification + reference implementation (pcf-dcp)" Jun 7, 2026

The DcpContainerDecoder showed a DCP container's structure and the metadata
of its inner partitions, but never interpreted their content. Add a generic
recursion mechanism so container decoders can have what they hold decoded:

- New optional `PartitionDecoder::children(meta, data) -> Vec<DecodedChild>`
  (default empty) plus `DecoderRegistry::children` (first-match, mirrors
  `decode`). `DecodedChild` carries a sub-partition's type/uid/label and its
  reconstructed logical content.
- `decode_recursive` / `attach_inner_decodes` in lib.rs decode a partition,
  then decode each child recursively and nest the results under a "decoded
  inner partitions" group (child titled `content[label] -> FORMAT`, with the
  child's own warnings preserved as a sub-group). Guarded by MAX_DECODE_DEPTH.
  build_report and the `decode` subcommand (filter_decode) use it, so nesting
  applies to text, HTML, and forced-decoder output alike.
- DcpContainerDecoder implements `children` via pcf_dcp::Arena (parse + per
  inner content reconstruction); malformed arenas or non-reconstructable
  inners are skipped defensively.
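
The shape of this mechanism, reduced to a toy: a trait with a default-empty `children` method, a first-match registry, and a depth-guarded recursion. The signatures are simplified guesses rather than pcf-debug's real ones (no `meta` argument, no warnings, no field tree), the `MAX_DECODE_DEPTH` value is an assumption, and `Nest` / `Raw` are made-up decoders for the example.

```rust
/// Recursion guard; the actual value used in pcf-debug is an assumption here.
pub const MAX_DECODE_DEPTH: usize = 8;

/// A sub-partition handed back by a container decoder (trimmed down).
pub struct DecodedChild {
    pub label: String,
    pub data: Vec<u8>,
}

/// Result tree: the format that matched, plus decoded inner partitions.
pub struct Decoded {
    pub format: String,
    pub children: Vec<Decoded>,
}

impl Decoded {
    /// Number of levels in the tree (a leaf has depth 1).
    pub fn depth(&self) -> usize {
        1 + self.children.iter().map(|c| c.depth()).max().unwrap_or(0)
    }
}

pub trait PartitionDecoder {
    fn name(&self) -> &'static str;
    fn matches(&self, data: &[u8]) -> bool;
    /// Default: leaf decoders report no children.
    fn children(&self, _data: &[u8]) -> Vec<DecodedChild> {
        Vec::new()
    }
}

/// Toy container: anything starting with '[' holds the rest as one child.
pub struct Nest;
impl PartitionDecoder for Nest {
    fn name(&self) -> &'static str { "NEST" }
    fn matches(&self, data: &[u8]) -> bool { data.first() == Some(&b'[') }
    fn children(&self, data: &[u8]) -> Vec<DecodedChild> {
        match data.get(1..) {
            Some(rest) => vec![DecodedChild { label: "inner".to_string(), data: rest.to_vec() }],
            None => Vec::new(),
        }
    }
}

/// Toy fallback: matches everything, has no children.
pub struct Raw;
impl PartitionDecoder for Raw {
    fn name(&self) -> &'static str { "RAW" }
    fn matches(&self, _data: &[u8]) -> bool { true }
}

pub fn registry() -> Vec<Box<dyn PartitionDecoder>> {
    vec![Box::new(Nest), Box::new(Raw)]
}

/// Decode with the first matching decoder, then decode each child one level
/// deeper. Children are not expanded once MAX_DECODE_DEPTH is reached.
pub fn decode_recursive(
    reg: &[Box<dyn PartitionDecoder>],
    data: &[u8],
    depth: usize,
) -> Option<Decoded> {
    let dec = reg.iter().find(|d| d.matches(data))?;
    let mut node = Decoded { format: dec.name().to_string(), children: Vec::new() };
    if depth < MAX_DECODE_DEPTH {
        for child in dec.children(data) {
            if let Some(sub) = decode_recursive(reg, &child.data, depth + 1) {
                node.children.push(sub);
            }
        }
    }
    Some(node)
}
```

Because the recursion only talks to the trait, the driver needs no knowledge of any particular container format, which matches the profile-agnostic design described here.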

The mechanism is profile-agnostic: lib.rs gains no pcf-dcp dependency, and any
future container-like decoder gets recursion for free. Renderers and the
`Report.decoded` key are unchanged — the nested group flows through the
existing field-tree renderer.

Tests: nested decode of the canonical vector (content[A]/content[B] -> RAW),
routing of a recognizable inner format (PFS_NODE) through the registry, and
that leaf partitions report no children. Existing decode_dcp assertions still
hold (the new group uses a distinct name prefix; inner warnings nest under the
child, not the container).

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu

kduma merged commit d1e87c3 into master Jun 8, 2026
52 checks passed