PCF-DCP v1.0: specification + reference implementation (pcf-dcp) #16
Merged
Introduce specs/PCF-DCP-spec-v1.0.txt, an application-level profile over PCF v1.0 that adds inner partitions which can grow, shrink, and be mutated in the middle without relocating neighbours.

Key elements:
- DCP_CONTAINER partition type 0xAAAC0001; arena addressed by relative offsets, with a 24-byte DCP Header (bump-pointer allocator, derived free space).
- Inner partitions listed by reused PCF Table Blocks/Entries; content described by per-partition Fragment Tables of variable-length extents (18-byte fixed Fragment Entry).
- data_hash committed over logical (reconstructed) content, making it invariant under fragmentation, sharing, relocation, and promotion.
- Promotion invariant: the six fields preserved by promotion/demotion equal the fields a PCF-SIG signature protects; inner partitions are signable in place via a reader-side uid-scope extension.
- Optional deduplication via shared extents, with a per-extent SHARED flag enforcing safe copy-on-write while allowing cheap in-place edits of private extents; sharing-preserving mark-and-sweep compaction.
- Byte-exact 700-byte test vector (verified to parse as valid PCF v1.0 and to round-trip its embedded hex dump) demonstrating fragmentation and a shared extent.

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu
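The "bump-pointer allocator, derived free space" wording above can be illustrated with a minimal sketch. It assumes only what the commit message states (a 24-byte header and an `arena_used` bump pointer); the `BumpArena` type, its `capacity` field, and the method names are hypothetical and are not taken from the spec or the crate.

```rust
/// Hypothetical allocator state. Only `arena_used` is named in the PR;
/// `capacity` and the method names are illustrative assumptions.
#[derive(Debug)]
struct BumpArena {
    capacity: u64,   // total bytes the arena may occupy (assumed input)
    arena_used: u64, // bump pointer: relative offset of the first unallocated byte
}

impl BumpArena {
    fn new(capacity: u64, header_len: u64) -> Self {
        assert!(header_len <= capacity, "header must fit in the arena");
        // The header sits at the start of the arena, so allocation begins after it.
        BumpArena { capacity, arena_used: header_len }
    }

    /// Free space is derived from the bump pointer, never stored separately.
    fn free(&self) -> u64 {
        self.capacity - self.arena_used
    }

    /// Reserve `len` bytes and return their arena-relative offset,
    /// or `None` if they do not fit.
    fn alloc(&mut self, len: u64) -> Option<u64> {
        let end = self.arena_used.checked_add(len)?;
        if end > self.capacity {
            return None;
        }
        let offset = self.arena_used;
        self.arena_used = end;
        Some(offset)
    }
}

fn main() {
    let mut arena = BumpArena::new(128, 24);
    let a = arena.alloc(40).unwrap();
    let b = arena.alloc(40).unwrap();
    println!("a={a} b={b} free={}", arena.free());
}
```

Because nothing is ever freed in place, space released by edits would only be reclaimed by a later compaction pass, which is consistent with the mark-and-sweep compaction the commit describes.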
…inear-partitions-vSXIn
Implement the reference reader/writer for the PCF-DCP profile (Dynamic Container Partition), layered strictly above PCF v1.0 exactly as pcf-sig and pfs-ms are. A DCP container is one opaque PCF partition whose bytes are an arena: a 24-byte DCP Header, a chain of reused PCF Table Blocks listing inner partitions, a Fragment Table per inner partition, and the data extents those fragments name. Inner content is the concatenation of DATA extents, and each inner data_hash covers that logical content, so fragmentation, deduplication, compaction, and promotion all leave the hash (and any PCF-SIG signature over it) unchanged.

New crate reference/PCF-DCP-v1.0 (pcf-dcp):
- arena: in-memory model with a byte pool plus fragment lists; content-defined deduplication (intra- and cross-partition), copy-on-write edits (append/insert/overwrite/delete/truncate via fragment splitting), mark-and-sweep compaction that normalises the SHARED flag, and a canonical serialiser that reproduces the spec's Section 17 layout.
- reader: DcpReader over pcf::Container, so trailer-mode host files read transparently; full DCP-aware verify (PCF integrity, inner table_hash, reconstruction length + data_hash, no nesting, file-wide uid uniqueness).
- writer: whole-file model emitting a fresh canonical PCF image; promotion (dynamic->fixed, a MOVE preserving uid + data_hash), demotion, dedup, defrag, and optional trailer-mode finalisation.
- vector + example + dcp CLI (info/dedup/defrag/promote/demote, --trailer); every mutating command re-verifies.
- tests: byte-exact 700-byte Section 17 vector, spec conformance, round-trips, and error paths (34 tests). testdata/canonical.bin committed.

pcf-debug: add a DcpContainerDecoder plugin (renders DCP Header, inner table, fragment tables with SHARED flags, extent summary) and a decode_dcp test, mirroring the PCF-SIG decoder. pcf-debug now depends on pcf-dcp.

Wiring: add the crate to the workspace; add a dedicated pcf-dcp CI job (fmt/clippy/build/test + 700-byte vector assertion); publish pcf-dcp before pcf-debug in release.yml; bump/pin pcf-dcp in release-prepare.yml.

Spec: add Section 2.2 "Compatibility with the PCF File Trailer". Fix the Section 17 hex dump's profile_version_minor byte at file offset 0x00F0 (01 -> 00): the field is semantically 0 for v1.0 (matching Section 6, the field label, and the const), and no hash covers it, so the file is still 700 bytes and all hashes verify. The reference generator now reproduces the corrected dump byte-for-byte.

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu
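The copy-on-write edits "via fragment splitting" mentioned for the arena module can be sketched roughly as follows. This is not the crate's code: `Extent`, `overwrite`, and `content` are hypothetical names, the `shared` bool stands in for the per-extent SHARED flag, and the sketch assumes the written range lies within the existing logical content. Private extents are patched in place; a shared extent is split so that only the overwritten bytes move to a fresh private extent.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Extent {
    offset: usize, // position in the byte pool
    length: usize,
    shared: bool,  // stand-in for the SHARED flag
}

/// Logical content: the concatenation of the extents, in order.
fn content(pool: &[u8], frags: &[Extent]) -> Vec<u8> {
    frags
        .iter()
        .flat_map(|e| pool[e.offset..e.offset + e.length].iter().copied())
        .collect()
}

/// Overwrite logical bytes `pos..pos + data.len()`.
/// Assumes the range is inside the existing content; bytes past the end are ignored.
fn overwrite(pool: &mut Vec<u8>, frags: &mut Vec<Extent>, pos: usize, data: &[u8]) {
    let end = pos + data.len();
    let mut out = Vec::with_capacity(frags.len() + 2);
    let mut start = 0; // logical start of the current extent
    for e in std::mem::take(frags) {
        let e_end = start + e.length;
        let (lo, hi) = (pos.max(start), end.min(e_end));
        if lo >= hi {
            out.push(e); // no overlap with the written range
        } else {
            let src = &data[lo - pos..hi - pos];
            if !e.shared {
                // Private extent: cheap in-place edit.
                let at = e.offset + (lo - start);
                pool[at..at + src.len()].copy_from_slice(src);
                out.push(e);
            } else {
                // Shared extent: keep untouched prefix/suffix, copy only the edited part.
                if lo > start {
                    out.push(Extent { offset: e.offset, length: lo - start, shared: true });
                }
                let new_offset = pool.len();
                pool.extend_from_slice(src);
                out.push(Extent { offset: new_offset, length: src.len(), shared: false });
                if hi < e_end {
                    out.push(Extent { offset: e.offset + (hi - start), length: e_end - hi, shared: true });
                }
            }
        }
        start = e_end;
    }
    *frags = out;
}

fn main() {
    let mut pool = b"HELLOWORLD".to_vec();
    let mut a = vec![
        Extent { offset: 0, length: 5, shared: true },
        Extent { offset: 5, length: 5, shared: false },
    ];
    let b = vec![Extent { offset: 0, length: 5, shared: true }];
    overwrite(&mut pool, &mut a, 3, b"PME");
    println!("{}", String::from_utf8_lossy(&content(&pool, &a)));
    println!("{}", String::from_utf8_lossy(&content(&pool, &b)));
}
```

In this sketch the surviving prefix keeps its `shared` marker even if the other referent later disappears; the commit message assigns that clean-up to compaction, which "normalises the SHARED flag".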
The DcpContainerDecoder showed a DCP container's structure and the metadata of its inner partitions, but never interpreted their content. Add a generic recursion mechanism so container decoders can have what they hold decoded:
- New optional `PartitionDecoder::children(meta, data) -> Vec<DecodedChild>` (default empty) plus `DecoderRegistry::children` (first-match, mirrors `decode`). `DecodedChild` carries a sub-partition's type/uid/label and its reconstructed logical content.
- `decode_recursive` / `attach_inner_decodes` in lib.rs decode a partition, then decode each child recursively and nest the results under a "decoded inner partitions" group (child titled `content[label] -> FORMAT`, with the child's own warnings preserved as a sub-group). Guarded by MAX_DECODE_DEPTH. build_report and the `decode` subcommand (filter_decode) use it, so nesting applies to text, HTML, and forced-decoder output alike.
- DcpContainerDecoder implements `children` via pcf_dcp::Arena (parse + per inner content reconstruction); malformed arenas or non-reconstructable inners are skipped defensively.

The mechanism is profile-agnostic: lib.rs gains no pcf-dcp dependency, and any future container-like decoder gets recursion for free. Renderers and the `Report.decoded` key are unchanged; the nested group flows through the existing field-tree renderer.

Tests: nested decode of the canonical vector (content[A]/content[B] -> RAW), routing of a recognizable inner format (PFS_NODE) through the registry, and that leaf partitions report no children. Existing decode_dcp assertions still hold (the new group uses a distinct name prefix; inner warnings nest under the child, not the container).

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu
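The shape of that recursion (an optional `children` hook with an empty default, first-match lookup, and a depth guard) might look like the simplified sketch below. It is not the pcf-debug API: the real `children` takes `(meta, data)` and `DecodedChild` carries type/uid/label, whereas here the signatures are reduced to raw bytes, the `matches` method, the `Registry`/`Decoded` types, the toy `Container`/`Leaf` formats, and the value of `MAX_DECODE_DEPTH` are all assumptions made for illustration.

```rust
const MAX_DECODE_DEPTH: usize = 8; // assumed value; the PR names the constant only

struct DecodedChild { label: String, data: Vec<u8> }
struct Decoded { title: String, children: Vec<Decoded> }

trait PartitionDecoder {
    fn matches(&self, data: &[u8]) -> bool; // hypothetical selection hook
    fn decode(&self, data: &[u8]) -> String; // returns a format name in this sketch
    /// Default: a leaf decoder holds nothing to recurse into.
    fn children(&self, _data: &[u8]) -> Vec<DecodedChild> { Vec::new() }
}

struct Registry { decoders: Vec<Box<dyn PartitionDecoder>> }

fn decode_recursive(reg: &Registry, label: &str, data: &[u8], depth: usize) -> Decoded {
    // First matching decoder wins; unknown content falls back to RAW.
    let dec = match reg.decoders.iter().find(|d| d.matches(data)) {
        Some(d) => d,
        None => return Decoded { title: format!("content[{label}] -> RAW"), children: vec![] },
    };
    let mut node = Decoded { title: format!("content[{label}] -> {}", dec.decode(data)), children: vec![] };
    if depth < MAX_DECODE_DEPTH {
        for child in dec.children(data) {
            node.children.push(decode_recursive(reg, &child.label, &child.data, depth + 1));
        }
    }
    node
}

/// Toy container format (hypothetical): b'C', then repeated [len: u8][len bytes].
struct Container;
impl PartitionDecoder for Container {
    fn matches(&self, d: &[u8]) -> bool { d.first() == Some(&b'C') }
    fn decode(&self, _d: &[u8]) -> String { "CONTAINER".to_string() }
    fn children(&self, d: &[u8]) -> Vec<DecodedChild> {
        let mut out = Vec::new();
        let mut rest = d.get(1..).unwrap_or_default();
        while let Some((&len, tail)) = rest.split_first() {
            let len = len as usize;
            if tail.len() < len { break; } // malformed tail: skip defensively
            let label = out.len().to_string();
            out.push(DecodedChild { label, data: tail[..len].to_vec() });
            rest = &tail[len..];
        }
        out
    }
}

/// Toy leaf format (hypothetical): anything starting with b'L'.
struct Leaf;
impl PartitionDecoder for Leaf {
    fn matches(&self, d: &[u8]) -> bool { d.first() == Some(&b'L') }
    fn decode(&self, _d: &[u8]) -> String { "LEAF".to_string() }
}

fn print_tree(node: &Decoded, indent: usize) {
    println!("{}{}", "  ".repeat(indent), node.title);
    for c in &node.children { print_tree(c, indent + 1); }
}

fn main() {
    let reg = Registry { decoders: vec![Box::new(Container), Box::new(Leaf)] };
    let data = [b'C', 1, b'L', 2, b'?', b'?', 3, b'C', 1, b'L'];
    print_tree(&decode_recursive(&reg, "root", &data, 0), 0);
}
```

Keeping `children` on the trait with an empty default is what lets the recursion driver stay ignorant of any particular profile: only decoders that actually contain sub-partitions override it.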
Summary
Adds the PCF-DCP profile (Dynamic Container Partition) to PCF v1.0: a new partition type (`0xAAAC0001`) whose bytes are an arena of inner partitions that can grow, shrink, be edited in the middle, and share/deduplicate extents, all without relocating neighbours. Layered strictly above PCF exactly as `pcf-sig` and `pfs-ms` are: a DCP file is always a conforming PCF v1.0 file, and a generic PCF reader sees one opaque partition.

This PR now contains both the specification (`specs/PCF-DCP-spec-v1.0.txt`) and the new reference implementation, tooling, and CI/release wiring.
What's inside an arena
- DCP Header (24 bytes): `"PDCP"` magic, profile version, `inner_table_offset`, `arena_used` (bump pointer).
- Inner table (reused PCF Table Blocks/Entries): `start_offset` → the partition's Fragment Table, `max_length` = `used_bytes`.
- Fragment Table entries: `(offset, length, kind, flags)`. Logical content = concatenation of DATA extents.

Each inner partition's `data_hash` covers its logical content, so fragmentation, dedup, compaction, and promotion all leave the hash (and any PCF-SIG signature over it) unchanged.
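To make the invariance concrete, here is a minimal sketch of reconstruction as "concatenate the DATA extents". The `Fragment` fields mirror the `(offset, length, kind, flags)` tuple above, but the Rust types, the `Kind::Other` variant (standing in for any non-DATA kind), and the `reconstruct` function are hypothetical and not the crate's API. The hash function itself is left out; the point is that `data_hash` would be computed over the bytes returned here, which do not depend on how the content is laid out in the arena.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Data,
    Other, // hypothetical placeholder for any non-DATA kind
}

#[allow(dead_code)] // `flags` is carried but not interpreted in this sketch
struct Fragment {
    offset: usize, // arena-relative
    length: usize,
    kind: Kind,
    flags: u8,
}

/// Rebuild an inner partition's logical content.
/// Returns `None` if any DATA extent falls outside the arena.
fn reconstruct(arena: &[u8], frags: &[Fragment]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for f in frags.iter().filter(|f| f.kind == Kind::Data) {
        let end = f.offset.checked_add(f.length)?;
        out.extend_from_slice(arena.get(f.offset..end)?);
    }
    Some(out)
}

fn main() {
    // Layout 1: one contiguous extent.
    let arena1 = b"hello world";
    let frags1 = [Fragment { offset: 0, length: 11, kind: Kind::Data, flags: 0 }];

    // Layout 2: same logical content, fragmented and stored out of order.
    let arena2 = b"world??hello ";
    let frags2 = [
        Fragment { offset: 7, length: 6, kind: Kind::Data, flags: 0 },
        Fragment { offset: 5, length: 2, kind: Kind::Other, flags: 0 },
        Fragment { offset: 0, length: 5, kind: Kind::Data, flags: 0 },
    ];

    let a = reconstruct(arena1, &frags1).unwrap();
    let b = reconstruct(arena2, &frags2).unwrap();
    println!("same logical content: {}", a == b);
}
```

Any hash taken over `a` and `b` is necessarily equal, which is why a signature over `data_hash` survives defragmentation or deduplication in this model.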
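New crate `reference/PCF-DCP-v1.0` (`pcf-dcp`)
- arena: in-memory model with a byte pool plus fragment lists; content-defined deduplication (intra- and cross-partition); copy-on-write edits (append/insert/overwrite/delete/truncate via fragment splitting); mark-and-sweep compaction that normalises the `SHARED` flag; and a canonical serialiser reproducing the spec's Section 17 layout.
- reader: `DcpReader` over `pcf::Container`, so trailer-mode host files read transparently; full DCP-aware `verify` (PCF integrity, inner `table_hash`, reconstruction length + `data_hash`, no nesting, file-wide uid uniqueness).
- writer: whole-file model emitting a fresh canonical PCF image; promotion (dynamic→fixed, a MOVE preserving uid + `data_hash`), demotion, dedup, defrag, optional trailer finalisation.
- `dcp` CLI: `info` / `dedup` / `defrag` / `promote` / `demote` (`--trailer`); every mutating command re-verifies before writing.
- Tests, the `gen_testvector` example, README, and `testdata/canonical.bin`.

Tooling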
`pcf-debug` gains a `DcpContainerDecoder` plugin (renders DCP Header, inner table with reinterpreted fields, fragment tables with `SHARED` flags, and an extent summary) plus a `decode_dcp` test, mirroring the PCF-SIG decoder. `pcf-debug` now depends on `pcf-dcp`.

Spec change worth a look
The Section 17 hex dump had `profile_version_minor = 01` at file offset `0x00F0`, contradicting Section 6, the field label, and the constant (all `0`). No hash covers that byte, so the file is still 700 bytes and every hash verifies; it was a typo. The reference generator emits the correct `00`, and the dump is corrected to match byte-for-byte. Also adds Section 2.2 "Compatibility with the PCF File Trailer".
CI / release
- Dedicated `pcf-dcp` CI job (fmt / clippy / build / test + a 700-byte test-vector assertion), mirroring the `pcf-debug` / `pcf-compact` jobs.
- `release.yml` publishes `pcf-dcp` before `pcf-debug` (which now depends on it).
- `release-prepare.yml` bumps + pins `pcf-dcp` in lockstep (`0.0.8`).

Verification (local)
- `cargo build / test / clippy / fmt --workspace`: all green.
- `cargo run -p pcf-dcp --example gen_testvector` → 700 bytes, 0 diffs vs the corrected Section 17 dump.
- `cargo publish -p pcf-dcp --dry-run`: packages and verifies against the published `pcf 0.0.8`.

https://claude.ai/code/session_01XzcjWWbNiuNX9ZywevfbQu