feat(build): migrate project build and docs support #20
Thanks @ChaomingZhangCN, @Eyizoha for your previous contributions to paimon-cpp. This migration PR carries that work forward into Apache Paimon C++.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR bootstraps the C++ Paimon repository with a full build system (CMake), CI helper scripts, third-party dependencies (xxHash + CRoaring), and a comprehensive Sphinx documentation site (user guide + API reference).
Changes:
- Add vendored third-party libraries (xxHash, CRoaring) and wire them into the top-level CMake build.
- Introduce offline dependency download metadata/scripts (third_party/versions.txt, download_dependencies.sh).
- Add Sphinx documentation (site structure, theme overrides, user guide pages, API pages) plus doc build tooling.
Reviewed changes
Copilot reviewed 64 out of 70 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| third_party/xxhash/xxhash.c | Adds xxHash implementation translation unit for static build integration. |
| third_party/xxhash/CMakeLists.txt | Defines static library target for xxHash. |
| third_party/versions.txt | Centralizes third-party version pins + tarball URLs/checksums. |
| third_party/roaring_bitmap/CMakeLists.txt | Defines static library target for CRoaring and local compile flags. |
| third_party/download_dependencies.sh | Adds offline dependency downloader with SHA-256 verification. |
| docs/source/user_guide/write.rst | New user guide content for write path concepts and contracts. |
| docs/source/user_guide/snapshot.rst | New snapshot format documentation. |
| docs/source/user_guide/schema.rst | New schema format + evolution notes. |
| docs/source/user_guide/read.rst | New read path + schema evolution behavior docs. |
| docs/source/user_guide/primary_key_table.rst | New PK table conceptual documentation. |
| docs/source/user_guide/prefetch.rst | New prefetch design documentation plus diagram reference. |
| docs/source/user_guide/manifest.rst | New manifest format documentation. |
| docs/source/user_guide/global_index.rst | New global index overview and configuration docs. |
| docs/source/user_guide/data_types.rst | New Java-to-Arrow type mapping documentation. |
| docs/source/user_guide/compaction.rst | New compaction behavior and tuning documentation. |
| docs/source/user_guide/commit.rst | New commit path documentation. |
| docs/source/user_guide/clean.rst | New cleanup feature documentation. |
| docs/source/user_guide/catalog.rst | New catalog overview documentation. |
| docs/source/user_guide/arrow.rst | New Arrow / memory format documentation. |
| docs/source/user_guide/append_only_table.rst | New append-only table overview documentation. |
| docs/source/user_guide.rst | Adds user guide toctree structure. |
| docs/source/index.rst | Adds documentation landing page and site entry toctree. |
| docs/source/getting_started.rst | Adds getting-started section and toctree. |
| docs/source/examples/write_commit_scan_read.rst | Adds example index page for read/write demo. |
| docs/source/examples/index.rst | Adds examples toctree. |
| docs/source/examples/clean.rst | Adds cleanup example index page. |
| docs/source/documentations.rst | Adds doc build instructions (Doxygen + Sphinx). |
| docs/source/conf.py | Adds Sphinx configuration (theme/extensions/context). |
| docs/source/building.rst | Adds build instructions and dependency source guidance. |
| docs/source/build_system.rst | Adds CMake integration guidance for downstream consumers. |
| docs/source/basic_concepts.rst | Adds basic concepts documentation (layout/consistency). |
| docs/source/api/write.rst | Adds API reference page for write-related classes. |
| docs/source/api/scan.rst | Adds API reference page for scan-related classes. |
| docs/source/api/read.rst | Adds API reference page for read-related classes. |
| docs/source/api/predicate.rst | Adds API reference page for predicates. |
| docs/source/api/memory.rst | Adds API reference page for memory utilities/types. |
| docs/source/api/io.rst | Adds API reference page for IO utilities/types. |
| docs/source/api/global_index.rst | Adds API reference page for global index types. |
| docs/source/api/file_system.rst | Adds API reference page for filesystem interfaces. |
| docs/source/api/file_index.rst | Adds API reference page for file index interfaces. |
| docs/source/api/file_format.rst | Adds API reference page for file formats. |
| docs/source/api/executor.rst | Adds API reference page for executor interfaces. |
| docs/source/api/defs.rst | Adds API reference page for options/definitions. |
| docs/source/api/data_types.rst | Adds API reference page for core data types. |
| docs/source/api/commit.rst | Adds API reference page for commit interfaces/types. |
| docs/source/api/clean.rst | Adds API reference page for orphan cleanup. |
| docs/source/api/catalog.rst | Adds API reference page for catalog interfaces. |
| docs/source/api.rst | Adds top-level API reference toctree. |
| docs/source/_static/versions.json | Adds doc version-switcher metadata. |
| docs/source/_static/theme_overrides.css | Adds theme customizations for Sphinx site. |
| docs/source/_static/prefetch.svg | Adds prefetch architecture diagram. |
| docs/requirements.txt | Adds Python requirements for building docs. |
| docs/make.bat | Adds Windows Sphinx build entrypoint. |
| docs/code-style.md | Updates coding conventions with fixed-width integer guidance. |
| docs/Makefile | Adds Makefile build entrypoint for Sphinx docs. |
| docs/.gitignore | Ignores documentation build output. |
| ci/scripts/setup_ccache.sh | CI helper to configure ccache env vars. |
| ci/scripts/build_paimon.sh | CI helper to configure/build/test with Ninja + optional sanitizers. |
| PaimonConfig.cmake.in | Adds CMake package config template for installed targets. |
| NOTICE | Updates third-party notices (xxHash, CRoaring) and formatting. |
| LICENSE | Adds license attributions for new third-party code. |
| CMakeLists.txt | Adds root CMake build, options, toolchain integration, installs, and subdirs. |
| .github/.rat-excludes | Adds RAT exclude patterns for generated/third_party content. |
| .gitattributes | Adds Git LFS patterns for large binaries/test data. |
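
The table describes third_party/download_dependencies.sh as verifying downloads with SHA-256. A minimal sketch of such a check follows, assuming sha256sum (or shasum on macOS) is available. The helper name verify_sha256 is hypothetical and is not taken from the script in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# This illustrates the idea only; it is not the logic of download_dependencies.sh.
verify_sha256() {
  local file="$1" expected="$2" actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    # Fallback for systems that ship shasum instead of sha256sum.
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file: got $actual, want $expected" >&2
    return 1
  fi
}
```

A caller would typically run this right after each download and abort on a non-zero status, so a corrupted or tampered tarball never reaches the build.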
Comments suppressed due to low confidence (8)
third_party/roaring_bitmap/CMakeLists.txt:1 - add_compile_options() applies to the directory scope and can unintentionally affect other targets when this directory is added via add_subdirectory(). Prefer scoping the warning suppression to only the roaring_bitmap target (e.g., via target_compile_options(roaring_bitmap PRIVATE ...)) to avoid leaking build flags.
third_party/versions.txt:1 - The mirror prefix variable is named THIRDPARTY_MIRROR_URL here, but the build docs refer to PAIMON_THIRDPARTY_MIRROR_URL. As-is, users setting PAIMON_THIRDPARTY_MIRROR_URL won’t affect downloads. Align the variable name (or source PAIMON_THIRDPARTY_MIRROR_URL into THIRDPARTY_MIRROR_URL) so the documented env var actually works.
third_party/versions.txt:1 - The mirror prefix variable is named THIRDPARTY_MIRROR_URL here, but the build docs refer to PAIMON_THIRDPARTY_MIRROR_URL. As-is, users setting PAIMON_THIRDPARTY_MIRROR_URL won’t affect downloads. Align the variable name (or source PAIMON_THIRDPARTY_MIRROR_URL into THIRDPARTY_MIRROR_URL) so the documented env var actually works.
docs/source/basic_concepts.rst:1 - This statement conflicts with the presence of a full compaction user guide (docs/source/user_guide/compaction.rst) and other docs describing compaction behavior. Update this note to reflect the actual current capability (or clarify which compaction modes are unsupported) to avoid misleading users.
docs/source/user_guide/prefetch.rst:1 - These headings use the same top-level underline style ("=") as the document title, which makes them peer sections rather than subsections of "Prefetch" and can produce an odd ToC structure in Sphinx. Use a lower-level adornment (e.g., "-" and "~") for subsections to keep the hierarchy consistent.
docs/source/user_guide/prefetch.rst:1 - These headings use the same top-level underline style ("=") as the document title, which makes them peer sections rather than subsections of "Prefetch" and can produce an odd ToC structure in Sphinx. Use a lower-level adornment (e.g., "-" and "~") for subsections to keep the hierarchy consistent.
docs/source/user_guide/catalog.rst:1 - Fix grammar: 'support one types of metastores filesystem metastore' is ungrammatical; consider rephrasing to 'supports one type of metastore: filesystem (default)' for clarity.
docs/source/conf.py:1 - html_sidebars keys are docnames (source file basenames). In this tree, the landing page is index.rst (with a label named 'implementations'), so the 'implementations' entry likely won’t apply. If the intent is to hide sidebars on the landing page, use 'index' as the key (and remove/adjust 'status' unless a status.rst exists).
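
For the THIRDPARTY_MIRROR_URL / PAIMON_THIRDPARTY_MIRROR_URL mismatch noted in the suppressed comments, one option is to let the documented name take precedence and keep the existing name as a fallback. The sketch below assumes the download script reads THIRDPARTY_MIRROR_URL from the environment; resolve_mirror_url is a hypothetical helper, and the actual variable handling in third_party/versions.txt is not shown in this conversation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefer the documented PAIMON_THIRDPARTY_MIRROR_URL,
# fall back to the existing THIRDPARTY_MIRROR_URL, else use an empty prefix.
resolve_mirror_url() {
  printf '%s' "${PAIMON_THIRDPARTY_MIRROR_URL:-${THIRDPARTY_MIRROR_URL:-}}"
}

# Publish the resolved value under the name the download logic is assumed to read.
THIRDPARTY_MIRROR_URL="$(resolve_mirror_url)"
export THIRDPARTY_MIRROR_URL
```

With this in place, setting either variable affects downloads, and the documented one wins when both are set.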
    mkdir ${build_dir}
    pushd ${build_dir}

    popd

    rm -rf ${build_dir}

        INSTALL_DESTINATION lib/cmake/Paimon)

    install(FILES "${CMAKE_CURRENT_BINARY_DIR}/PaimonConfig.cmake"
                  "${CMAKE_CURRENT_BINARY_DIR}/PaimonConfigVersion.cmake"
            DESTINATION lib/cmake/Paimon)

    # Main library
    add_library(paimon_shared SHARED IMPORTED)
    set_target_properties(paimon_shared PROPERTIES
        IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon.so"
        INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
    )
    add_library(paimon_static STATIC IMPORTED)
    set_target_properties(paimon_static PROPERTIES
        IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon.a"
        INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
    )

    # paimon_parquet_file_format
    add_library(paimon_parquet_file_format_shared SHARED IMPORTED)
    set_target_properties(paimon_parquet_file_format_shared PROPERTIES
        IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_parquet_file_format.so"
        INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
    )
    add_library(paimon_parquet_file_format_static STATIC IMPORTED)
    set_target_properties(paimon_parquet_file_format_static PROPERTIES
        IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_parquet_file_format.a"
        INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
    )

    # paimon_orc_file_format
    add_library(paimon_orc_file_format_shared SHARED IMPORTED)
    set_target_properties(paimon_orc_file_format_shared PROPERTIES
        IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_orc_file_format.so"
        INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
    )
    add_library(paimon_orc_file_format_static STATIC IMPORTED)
    set_target_properties(paimon_orc_file_format_static PROPERTIES
        IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_orc_file_format.a"
        INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
Force-pushed from 92b7f7e to 11e95a0
Thanks for carrying this forward! Happy to see the work continuing under Apache Paimon.
leaves12138 left a comment
Thanks for adding the build and docs scaffolding. I think the install/package support needs to be fixed before merging: the generated package config hard-codes the build-time install prefix and Linux library suffixes, and it manually advertises static/shared/optional imported targets even when those artifacts were not installed for the selected build options. This can make find_package(Paimon) resolve to stale paths or expose targets that downstream consumers cannot link. Please switch this to a relocatable CMake package using @PACKAGE_INIT@ / PACKAGE_PREFIX_DIR and exported installed targets (for example via install(EXPORT ...)), and only expose targets that are actually installed.
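
One way to catch the hard-coded prefix symptom described above is a small post-install check that scans the generated config file for the absolute install prefix. The sketch below is only an illustration: config_is_relocatable is a hypothetical helper, and the check does not replace the @PACKAGE_INIT@ / install(EXPORT ...) change the reviewer asks for.

```shell
#!/usr/bin/env bash
# Hypothetical check: a relocatable package config should not embed the
# absolute install prefix that was used at build time.
config_is_relocatable() {
  local config_file="$1" install_prefix="$2"
  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 2
  fi
  # -F treats the prefix as a fixed string, so regex characters in paths are safe.
  # Matching lines are printed to stderr to show where the prefix leaked in.
  if grep -nF -- "$install_prefix" "$config_file" >&2; then
    echo "hard-coded prefix found in $config_file" >&2
    return 1
  fi
}
```

For example, after installing to a staging prefix, a CI step could call config_is_relocatable on the installed PaimonConfig.cmake with that prefix and fail the job on a non-zero status.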
Force-pushed from 11e95a0 to 0df2b78
leaves12138 left a comment
Thanks for the update. I re-ran a configure smoke test on the current head and the root CMake build still cannot configure: CMakeLists.txt adds several subdirectories whose CMakeLists.txt files or directories are not present in this PR, so cmake -S . -B ... stops during configuration before anything can be built. The earlier comments on ci/scripts/build_paimon.sh are also still unresolved (mkdir ${build_dir}, pushd ${build_dir}, cmake ... ${source_dir}, and rm -rf ${build_dir} are still unquoted, and mkdir still lacks -p). Please fix these before merging.
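
The quoting and mkdir -p points can be illustrated with a small wrapper. This is a sketch of the pattern only, not the contents of ci/scripts/build_paimon.sh; run_in_build_dir is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustrative only: shows the quoting and mkdir -p pattern under discussion.
run_in_build_dir() {
  local build_dir="$1"
  shift
  mkdir -p "${build_dir}"            # -p: no error if the directory already exists
  pushd "${build_dir}" >/dev/null || return 1
  "$@"                               # run the given command inside the build dir
  local status=$?
  popd >/dev/null || return 1
  rm -rf "${build_dir}"              # quoted, so a path with spaces stays one argument
  return "${status}"
}
```

A call such as run_in_build_dir "/tmp/paimon build" cmake "${source_dir}" then behaves the same whether or not the path contains spaces, whereas an unquoted rm -rf ${build_dir} would split the path into separate arguments.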
    config_summary_message()

    add_subdirectory(src/paimon)
This root CMake file does not configure on the current PR head. It adds src/paimon and several nested directories, but those directories either do not exist or do not contain a CMakeLists.txt in this PR. A local smoke test with cmake -S . -B /tmp/paimon-cpp-pr20-cmake-smoke -DPAIMON_ENABLE_AVRO=OFF -DPAIMON_ENABLE_ORC=OFF -DPAIMON_BUILD_TESTS=OFF fails immediately at this add_subdirectory block. Please add the missing subdirectory CMake files / correct the paths, and make sure a clean configure passes.
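
A missing subdirectory like this can be detected before running CMake at all. The sketch below lists add_subdirectory() arguments whose directory has no CMakeLists.txt. It is a simplification under stated assumptions: it only handles literal, unquoted paths with one call per line, and missing_subdirs is a hypothetical helper, not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical pre-configure check: print every add_subdirectory() path in
# <source_dir>/CMakeLists.txt that does not contain its own CMakeLists.txt.
# Simplification: literal, unquoted paths only, one call per line.
missing_subdirs() {
  local source_dir="$1"
  sed -n 's/^[[:space:]]*add_subdirectory([[:space:]]*\([^[:space:])]*\).*/\1/p' \
      "${source_dir}/CMakeLists.txt" |
    while IFS= read -r subdir; do
      [ -f "${source_dir}/${subdir}/CMakeLists.txt" ] || echo "${subdir}"
    done
}
```

An empty result means every referenced subdirectory is present; any printed path points at a directory that would stop the configure step.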
leaves12138 left a comment
Thanks for the update. The quoting / mkdir -p issues in ci/scripts/build_paimon.sh are fixed now, but the PR still cannot build from a clean checkout because the repository root no longer contains CMakeLists.txt. The new build script runs cmake ... "${source_dir}", so it fails immediately with The source directory ... does not appear to contain CMakeLists.txt. Please keep/add the top-level CMakeLists.txt (or change the build entrypoint to the actual CMake source directory) and verify bash ci/scripts/build_paimon.sh <checkout> false false Debug succeeds from a clean checkout before merging.
leaves12138 left a comment
Thanks for the update. The previous build-script quoting issue is gone because ci/scripts/build_paimon.sh is no longer in this PR, but the package/example/docs pieces are still inconsistent with the current tree: this PR adds PaimonConfig.cmake.in, BuildUtils.cmake, examples/CMakeLists.txt, and docs that instruct users to find_package(Paimon), while it does not add any top-level CMake/install/export path that can generate or install PaimonConfig.cmake, PaimonTargets.cmake, or the referenced Paimon::* targets. In a clean checkout, the repository root has no CMakeLists.txt, and the examples cannot resolve Paimon from this PR. Please either include the actual package generation/install/export build entrypoint in the same PR, or remove/defer the package config, examples, and docs that depend on it until the build system lands.
Purpose
Linked issue: close #xxx
Migrate documentation, ccache CI helper, examples, package-config scaffolding, and selected third-party source support from the Alibaba-origin C++ repository.
This PR includes:
- PaimonConfig.cmake.in
- .gitattributes, .github/.rat-excludes
- ci/
- docs/
- examples/
- xxhash and roaring_bitmap
- third_party/download_dependencies.sh and third_party/versions.txt
- LICENSE and NOTICE updates for migrated third-party code

This PR intentionally defers the root CMakeLists.txt and ci/scripts/build_paimon.sh migrations until the required source subdirectory CMake files are migrated together in a later batch.

This PR intentionally does not include third_party/boost/boost_1_66_0.tar.gz. It also skips third_party/lance/, third_party/lumina/, and third_party/jindosdk-nextarch/ because those need separate third-party license declaration review before migration.

Tests
- git diff --check HEAD~1..HEAD
- cmake-format --check examples/CMakeLists.txt
- clang-format --dry-run --Werror examples/clean_demo.cpp examples/read_write_demo.cpp
- python3 /home/jinli.zjw/.codex/skills/paimon-cpp-migrate/scripts/check_migration_batch.py --files examples/CMakeLists.txt examples/clean_demo.cpp examples/read_write_demo.cpp
- python3 /home/jinli.zjw/.codex/skills/paimon-cpp-migrate/scripts/analyze_external_contributors.py --files examples/CMakeLists.txt examples/clean_demo.cpp examples/read_write_demo.cpp

Full root CMake configure is intentionally deferred because this PR no longer includes the root CMakeLists.txt.

API and Format
No public API or storage format changes.

Documentation
Adds the migrated Sphinx documentation tree under docs/, updates docs/code-style.md, and updates build-system examples to use exported CMake package targets.

Generative AI tooling
Migrate-by: OpenAI Codex