feat(build): migrate project build and docs support #20

Open
zjw1111 wants to merge 4 commits into apache:main from zjw1111:migrate/build-support

zjw1111 (Contributor) commented May 26, 2026

Purpose

Linked issue: close #xxx

Migrate documentation, ccache CI helper, examples, package-config scaffolding, and selected third-party source support from the Alibaba-origin C++ repository.

This PR includes:

  • package config template: PaimonConfig.cmake.in
  • repository metadata: .gitattributes, .github/.rat-excludes
  • CI ccache setup helper under ci/
  • Sphinx documentation under docs/
  • example build and source files under examples/
  • third-party support files for xxhash and roaring_bitmap
  • third_party/download_dependencies.sh and third_party/versions.txt
  • corresponding LICENSE and NOTICE updates for migrated third-party code

This PR intentionally defers the root CMakeLists.txt and ci/scripts/build_paimon.sh migrations until the required source subdirectory CMake files are migrated together in a later batch.

This PR intentionally does not include third_party/boost/boost_1_66_0.tar.gz. It also skips third_party/lance/, third_party/lumina/, and third_party/jindosdk-nextarch/ because those need separate third-party license declaration review before migration.

Tests

  • git diff --check HEAD~1..HEAD
  • cmake-format --check examples/CMakeLists.txt
  • clang-format --dry-run --Werror examples/clean_demo.cpp examples/read_write_demo.cpp
  • python3 /home/jinli.zjw/.codex/skills/paimon-cpp-migrate/scripts/check_migration_batch.py --files examples/CMakeLists.txt examples/clean_demo.cpp examples/read_write_demo.cpp
  • python3 /home/jinli.zjw/.codex/skills/paimon-cpp-migrate/scripts/analyze_external_contributors.py --files examples/CMakeLists.txt examples/clean_demo.cpp examples/read_write_demo.cpp

Full root CMake configure is intentionally deferred because this PR no longer includes the root CMakeLists.txt.

API and Format

No public API or storage format changes.

Documentation

Adds the migrated Sphinx documentation tree under docs/, updates docs/code-style.md, and updates build-system examples to use exported CMake package targets.

Generative AI tooling

Migrate-by: OpenAI Codex

Copilot AI review requested due to automatic review settings May 26, 2026 09:18

zjw1111 (Contributor, Author) commented May 26, 2026

Thanks @ChaomingZhangCN, @Eyizoha for your previous contributions to paimon-cpp. This migration PR carries that work forward into Apache Paimon C++.

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR bootstraps the C++ Paimon repository with a full build system (CMake), CI helper scripts, third-party dependencies (xxHash + CRoaring), and a comprehensive Sphinx documentation site (user guide + API reference).

Changes:

  • Add vendored third-party libraries (xxHash, CRoaring) and wire them into the top-level CMake build.
  • Introduce offline dependency download metadata/scripts (third_party/versions.txt, download_dependencies.sh).
  • Add Sphinx documentation (site structure, theme overrides, user guide pages, API pages) plus doc build tooling.

Reviewed changes

Copilot reviewed 64 out of 70 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| third_party/xxhash/xxhash.c | Adds xxHash implementation translation unit for static build integration. |
| third_party/xxhash/CMakeLists.txt | Defines static library target for xxHash. |
| third_party/versions.txt | Centralizes third-party version pins + tarball URLs/checksums. |
| third_party/roaring_bitmap/CMakeLists.txt | Defines static library target for CRoaring and local compile flags. |
| third_party/download_dependencies.sh | Adds offline dependency downloader with SHA-256 verification. |
| docs/source/user_guide/write.rst | New user guide content for write path concepts and contracts. |
| docs/source/user_guide/snapshot.rst | New snapshot format documentation. |
| docs/source/user_guide/schema.rst | New schema format + evolution notes. |
| docs/source/user_guide/read.rst | New read path + schema evolution behavior docs. |
| docs/source/user_guide/primary_key_table.rst | New PK table conceptual documentation. |
| docs/source/user_guide/prefetch.rst | New prefetch design documentation plus diagram reference. |
| docs/source/user_guide/manifest.rst | New manifest format documentation. |
| docs/source/user_guide/global_index.rst | New global index overview and configuration docs. |
| docs/source/user_guide/data_types.rst | New Java-to-Arrow type mapping documentation. |
| docs/source/user_guide/compaction.rst | New compaction behavior and tuning documentation. |
| docs/source/user_guide/commit.rst | New commit path documentation. |
| docs/source/user_guide/clean.rst | New cleanup feature documentation. |
| docs/source/user_guide/catalog.rst | New catalog overview documentation. |
| docs/source/user_guide/arrow.rst | New Arrow / memory format documentation. |
| docs/source/user_guide/append_only_table.rst | New append-only table overview documentation. |
| docs/source/user_guide.rst | Adds user guide toctree structure. |
| docs/source/index.rst | Adds documentation landing page and site entry toctree. |
| docs/source/getting_started.rst | Adds getting-started section and toctree. |
| docs/source/examples/write_commit_scan_read.rst | Adds example index page for read/write demo. |
| docs/source/examples/index.rst | Adds examples toctree. |
| docs/source/examples/clean.rst | Adds cleanup example index page. |
| docs/source/documentations.rst | Adds doc build instructions (Doxygen + Sphinx). |
| docs/source/conf.py | Adds Sphinx configuration (theme/extensions/context). |
| docs/source/building.rst | Adds build instructions and dependency source guidance. |
| docs/source/build_system.rst | Adds CMake integration guidance for downstream consumers. |
| docs/source/basic_concepts.rst | Adds basic concepts documentation (layout/consistency). |
| docs/source/api/write.rst | Adds API reference page for write-related classes. |
| docs/source/api/scan.rst | Adds API reference page for scan-related classes. |
| docs/source/api/read.rst | Adds API reference page for read-related classes. |
| docs/source/api/predicate.rst | Adds API reference page for predicates. |
| docs/source/api/memory.rst | Adds API reference page for memory utilities/types. |
| docs/source/api/io.rst | Adds API reference page for IO utilities/types. |
| docs/source/api/global_index.rst | Adds API reference page for global index types. |
| docs/source/api/file_system.rst | Adds API reference page for filesystem interfaces. |
| docs/source/api/file_index.rst | Adds API reference page for file index interfaces. |
| docs/source/api/file_format.rst | Adds API reference page for file formats. |
| docs/source/api/executor.rst | Adds API reference page for executor interfaces. |
| docs/source/api/defs.rst | Adds API reference page for options/definitions. |
| docs/source/api/data_types.rst | Adds API reference page for core data types. |
| docs/source/api/commit.rst | Adds API reference page for commit interfaces/types. |
| docs/source/api/clean.rst | Adds API reference page for orphan cleanup. |
| docs/source/api/catalog.rst | Adds API reference page for catalog interfaces. |
| docs/source/api.rst | Adds top-level API reference toctree. |
| docs/source/_static/versions.json | Adds doc version-switcher metadata. |
| docs/source/_static/theme_overrides.css | Adds theme customizations for Sphinx site. |
| docs/source/_static/prefetch.svg | Adds prefetch architecture diagram. |
| docs/requirements.txt | Adds Python requirements for building docs. |
| docs/make.bat | Adds Windows Sphinx build entrypoint. |
| docs/code-style.md | Updates coding conventions with fixed-width integer guidance. |
| docs/Makefile | Adds Makefile build entrypoint for Sphinx docs. |
| docs/.gitignore | Ignores documentation build output. |
| ci/scripts/setup_ccache.sh | CI helper to configure ccache env vars. |
| ci/scripts/build_paimon.sh | CI helper to configure/build/test with Ninja + optional sanitizers. |
| PaimonConfig.cmake.in | Adds CMake package config template for installed targets. |
| NOTICE | Updates third-party notices (xxHash, CRoaring) and formatting. |
| LICENSE | Adds license attributions for new third-party code. |
| CMakeLists.txt | Adds root CMake build, options, toolchain integration, installs, and subdirs. |
| .github/.rat-excludes | Adds RAT exclude patterns for generated/third_party content. |
| .gitattributes | Adds Git LFS patterns for large binaries/test data. |

Comments suppressed due to low confidence (8)

third_party/roaring_bitmap/CMakeLists.txt:1

  • add_compile_options() applies to the directory scope and can unintentionally affect other targets when this directory is added via add_subdirectory(). Prefer scoping the warning suppression to only the roaring_bitmap target (e.g., via target_compile_options(roaring_bitmap PRIVATE ...)) to avoid leaking build flags.

third_party/versions.txt:1

  • The mirror prefix variable is named THIRDPARTY_MIRROR_URL here, but the build docs refer to PAIMON_THIRDPARTY_MIRROR_URL. As-is, users setting PAIMON_THIRDPARTY_MIRROR_URL won’t affect downloads. Align the variable name (or source PAIMON_THIRDPARTY_MIRROR_URL into THIRDPARTY_MIRROR_URL) so the documented env var actually works.

third_party/versions.txt:1

  • The mirror prefix variable is named THIRDPARTY_MIRROR_URL here, but the build docs refer to PAIMON_THIRDPARTY_MIRROR_URL. As-is, users setting PAIMON_THIRDPARTY_MIRROR_URL won’t affect downloads. Align the variable name (or source PAIMON_THIRDPARTY_MIRROR_URL into THIRDPARTY_MIRROR_URL) so the documented env var actually works.

docs/source/basic_concepts.rst:1

  • This statement conflicts with the presence of a full compaction user guide (docs/source/user_guide/compaction.rst) and other docs describing compaction behavior. Update this note to reflect the actual current capability (or clarify which compaction modes are unsupported) to avoid misleading users.

docs/source/user_guide/prefetch.rst:1

  • These headings use the same top-level underline style ("=") as the document title, which makes them peer sections rather than subsections of "Prefetch" and can produce an odd ToC structure in Sphinx. Use a lower-level adornment (e.g., "-" and "~") for subsections to keep the hierarchy consistent.

docs/source/user_guide/prefetch.rst:1

  • These headings use the same top-level underline style ("=") as the document title, which makes them peer sections rather than subsections of "Prefetch" and can produce an odd ToC structure in Sphinx. Use a lower-level adornment (e.g., "-" and "~") for subsections to keep the hierarchy consistent.

docs/source/user_guide/catalog.rst:1

  • Fix grammar: 'support one types of metastores filesystem metastore' is ungrammatical; consider rephrasing to 'supports one type of metastore: filesystem (default)' for clarity.

docs/source/conf.py:1

  • html_sidebars keys are docnames (source file basenames). In this tree, the landing page is index.rst (with a label named 'implementations'), so the 'implementations' entry likely won’t apply. If the intent is to hide sidebars on the landing page, use 'index' as the key (and remove/adjust 'status' unless a status.rst exists).
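
For the mirror-prefix comment on third_party/versions.txt, one possible way to "source PAIMON_THIRDPARTY_MIRROR_URL into THIRDPARTY_MIRROR_URL" is a fallback expansion like the sketch below. The helper name resolve_mirror_url is hypothetical, and letting the documented PAIMON_ name take precedence is an assumption, not something this PR specifies.

```shell
# Hypothetical helper (not part of this PR): pick the mirror prefix.
# Assumption: the documented PAIMON_THIRDPARTY_MIRROR_URL wins when both
# variables are set; the legacy name is used only as a fallback.
resolve_mirror_url() {
    # ${a:-b} expands to b when a is unset or empty.
    printf '%s' "${PAIMON_THIRDPARTY_MIRROR_URL:-${THIRDPARTY_MIRROR_URL:-}}"
}

# Later code can keep reading THIRDPARTY_MIRROR_URL unchanged.
THIRDPARTY_MIRROR_URL="$(resolve_mirror_url)"
```

With this in place, a user who exports only PAIMON_THIRDPARTY_MIRROR_URL would still influence any code that reads THIRDPARTY_MIRROR_URL.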

Comment thread ci/scripts/build_paimon.sh Outdated
Comment on lines +37 to +38
mkdir ${build_dir}
pushd ${build_dir}
Comment thread ci/scripts/build_paimon.sh Outdated

popd

rm -rf ${build_dir}
Comment thread CMakeLists.txt Outdated
Comment on lines +438 to +442
INSTALL_DESTINATION lib/cmake/Paimon)

install(FILES "${CMAKE_CURRENT_BINARY_DIR}/PaimonConfig.cmake"
"${CMAKE_CURRENT_BINARY_DIR}/PaimonConfigVersion.cmake"
DESTINATION lib/cmake/Paimon)
Comment thread PaimonConfig.cmake.in Outdated
Comment on lines +18 to +51
# Main library
add_library(paimon_shared SHARED IMPORTED)
set_target_properties(paimon_shared PROPERTIES
IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon.so"
INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
)
add_library(paimon_static STATIC IMPORTED)
set_target_properties(paimon_static PROPERTIES
IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon.a"
INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
)

# paimon_parquet_file_format
add_library(paimon_parquet_file_format_shared SHARED IMPORTED)
set_target_properties(paimon_parquet_file_format_shared PROPERTIES
IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_parquet_file_format.so"
INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
)
add_library(paimon_parquet_file_format_static STATIC IMPORTED)
set_target_properties(paimon_parquet_file_format_static PROPERTIES
IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_parquet_file_format.a"
INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
)

# paimon_orc_file_format
add_library(paimon_orc_file_format_shared SHARED IMPORTED)
set_target_properties(paimon_orc_file_format_shared PROPERTIES
IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_orc_file_format.so"
INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
)
add_library(paimon_orc_file_format_static STATIC IMPORTED)
set_target_properties(paimon_orc_file_format_static PROPERTIES
IMPORTED_LOCATION "@CMAKE_INSTALL_PREFIX@/@CMAKE_INSTALL_LIBDIR@/libpaimon_orc_file_format.a"
INTERFACE_INCLUDE_DIRECTORIES "@CMAKE_INSTALL_PREFIX@/include"
@zjw1111 zjw1111 force-pushed the migrate/build-support branch from 92b7f7e to 11e95a0 Compare May 26, 2026 09:31

@ChaomingZhangCN

> Thanks @ChaomingZhangCN, @Eyizoha for your previous contributions to paimon-cpp. This migration PR carries that work forward into Apache Paimon C++.

Thanks for carrying this forward! Happy to see the work continuing under Apache Paimon.

leaves12138 left a comment

Thanks for adding the build and docs scaffolding. I think the install/package support needs to be fixed before merging: the generated package config hard-codes the build-time install prefix and Linux library suffixes, and it manually advertises static/shared/optional imported targets even when those artifacts were not installed for the selected build options. This can make find_package(Paimon) resolve to stale paths or expose targets that downstream consumers cannot link. Please switch this to a relocatable CMake package using @PACKAGE_INIT@ / PACKAGE_PREFIX_DIR and exported installed targets (for example via install(EXPORT ...)), and only expose targets that are actually installed.
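
A rough way to spot the hard-coded-prefix problem described above is to search the generated config file for the install prefix. The sketch below is only a heuristic under stated assumptions: the function name config_is_relocatable is hypothetical, and the check assumes a non-system prefix, since a config installed under a system prefix such as /usr may legitimately mention that path.

```shell
# Hypothetical check (not part of this PR): report whether a generated
# package config file embeds the absolute install prefix.
config_is_relocatable() {
    local config_file="$1"
    local install_prefix="$2"
    if [ ! -f "${config_file}" ]; then
        echo "error: ${config_file} not found" >&2
        return 2
    fi
    # grep -F matches the prefix literally; -q suppresses output.
    if grep -qF -- "${install_prefix}" "${config_file}"; then
        echo "not relocatable: ${config_file} embeds ${install_prefix}" >&2
        return 1
    fi
    return 0
}
```

A config built on @PACKAGE_INIT@ and PACKAGE_PREFIX_DIR would normally pass this check, while one that expands @CMAKE_INSTALL_PREFIX@ into IMPORTED_LOCATION would fail it.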

@zjw1111 zjw1111 force-pushed the migrate/build-support branch from 11e95a0 to 0df2b78 Compare May 26, 2026 13:47

leaves12138 left a comment

Thanks for the update. I re-ran a configure smoke test on the current head and the root CMake build still cannot configure: CMakeLists.txt adds several subdirectories whose CMakeLists.txt files or directories are not present in this PR, so cmake -S . -B ... stops during configuration before anything can be built. The earlier comments on ci/scripts/build_paimon.sh are also still unresolved (mkdir ${build_dir}, pushd ${build_dir}, cmake ... ${source_dir}, and rm -rf ${build_dir} are still unquoted, and mkdir still lacks -p). Please fix these before merging.
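
The quoting and mkdir -p points above can be sketched as follows. This is a minimal illustration, not the actual contents of ci/scripts/build_paimon.sh: the helper name run_in_build_dir is hypothetical, and a subshell with cd stands in for the pushd/popd pair.

```shell
# Hypothetical helper (not the real build script): run a command inside a
# scratch build directory, then remove the directory.
run_in_build_dir() {
    local build_dir="$1"
    shift
    # -p creates missing parents and does not fail if the directory exists.
    mkdir -p "${build_dir}" || return 1
    # Quoted expansions keep a path containing spaces as a single argument.
    local status=0
    ( cd "${build_dir}" && "$@" ) || status=$?
    # ${var:?} makes the shell stop with an error if build_dir is empty.
    rm -rf "${build_dir:?}"
    return "${status}"
}
```

Under these assumptions, a call such as run_in_build_dir "${build_dir}" cmake "${source_dir}" would behave the same whether or not either path contains spaces.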

Comment thread CMakeLists.txt Outdated

config_summary_message()

add_subdirectory(src/paimon)

This root CMake file does not configure on the current PR head. It adds src/paimon and several nested directories, but those directories either do not exist or do not contain a CMakeLists.txt in this PR. A local smoke test with cmake -S . -B /tmp/paimon-cpp-pr20-cmake-smoke -DPAIMON_ENABLE_AVRO=OFF -DPAIMON_ENABLE_ORC=OFF -DPAIMON_BUILD_TESTS=OFF fails immediately at this add_subdirectory block. Please add the missing subdirectory CMake files / correct the paths, and make sure a clean configure passes.

leaves12138 left a comment

Thanks for the update. The quoting / mkdir -p issues in ci/scripts/build_paimon.sh are fixed now, but the PR still cannot build from a clean checkout because the repository root no longer contains CMakeLists.txt. The new build script runs cmake ... "${source_dir}", so it fails immediately with "The source directory ... does not appear to contain CMakeLists.txt." Please keep/add the top-level CMakeLists.txt (or change the build entrypoint to the actual CMake source directory) and verify that bash ci/scripts/build_paimon.sh <checkout> false false Debug succeeds from a clean checkout before merging.
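
One way to surface the failure mode above earlier is a pre-flight check before cmake is invoked. The sketch below is illustrative only; the function name require_cmake_source_dir is hypothetical and nothing like it is confirmed to exist in the build script.

```shell
# Hypothetical pre-flight check (not part of this PR): fail with a clear
# message when the source directory has no top-level CMakeLists.txt.
require_cmake_source_dir() {
    local source_dir="$1"
    if [ ! -f "${source_dir}/CMakeLists.txt" ]; then
        echo "error: ${source_dir} does not contain CMakeLists.txt" >&2
        return 1
    fi
    return 0
}
```

Calling this with the checkout path before configuring would turn the CMake error into an explicit, script-level message.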

leaves12138 left a comment

Thanks for the update. The previous build-script quoting issue is gone because ci/scripts/build_paimon.sh is no longer in this PR, but the package/example/docs pieces are still inconsistent with the current tree: this PR adds PaimonConfig.cmake.in, BuildUtils.cmake, examples/CMakeLists.txt, and docs that instruct users to find_package(Paimon), while it does not add any top-level CMake/install/export path that can generate or install PaimonConfig.cmake, PaimonTargets.cmake, or the referenced Paimon::* targets. In a clean checkout, the repository root has no CMakeLists.txt, and the examples cannot resolve Paimon from this PR. Please either include the actual package generation/install/export build entrypoint in the same PR, or remove/defer the package config, examples, and docs that depend on it until the build system lands.

4 participants