feat: Unicode LIKE/upper()/lower() via statically-linked ICU #31
Merged
5 commits
58a1317 feat: Unicode LIKE/upper()/lower() via statically-linked ICU (arv)
88bf726 deps/icu.js: drop dead Windows code path (arv)
0b728ec icu.js: validate ICU headers, fail loudly instead of silent dynamic l… (arv)
64664af icu.js: link ICU dynamically on glibc Linux (non-PIC static archives) (arv)
6f8f244 download.sh: note why SQLITE_ENABLE_ICU is not in DEFINES (arv)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| 'use strict'; | ||
|
|
||
| // === | ||
| // ICU discovery helper for node-gyp. | ||
| // | ||
| // Defining SQLITE_ENABLE_ICU compiles SQLite's bundled ICU extension (already | ||
| // present in the amalgamation, guarded by #ifdef SQLITE_ENABLE_ICU) and | ||
| // auto-registers Unicode-aware LIKE/upper()/lower()/REGEXP on every | ||
| // connection. That code calls into ICU. | ||
| // | ||
| // We prefer STATIC linking so the prebuilt .node binaries stay self-contained: | ||
| // zero-cache ships them via prebuild-install onto runtime images (e.g. Alpine) | ||
| // that do not have ICU installed, and a dynamic NEEDED libicu*.so.<ver> would | ||
| // fail to load there. Static linking is only possible where the ICU archives | ||
| // are -fPIC, which holds on macOS (Homebrew) and Alpine (musl). Debian/Ubuntu | ||
| // (glibc) ship non-PIC static archives, so there we link ICU dynamically | ||
| // against the system .so (the consumer must have ICU installed at runtime). | ||
| // See `useStatic` below. | ||
| // | ||
| // Usage: | ||
| // node icu.js include -> the ICU include directory (for #include <unicode/...>) | ||
| // node icu.js libs -> newline-separated linker inputs (static archive | ||
| // paths or -L/-l flags), then the C++ runtime / | ||
| // system libraries ICU depends on. | ||
| // | ||
| // Discovery order: pkg-config (Linux/Alpine) -> Homebrew icu4c (macOS) -> | ||
| // common system locations. Set ICU_ROOT to override (expects ICU_ROOT/lib and | ||
| // ICU_ROOT/include). | ||
| // | ||
| // ICU is not enabled on Windows (see deps/sqlite3.gyp), so this script only | ||
| // ever runs on macOS and Linux. | ||
| // === | ||
|
|
||
| const {execSync} = require('child_process'); | ||
| const fs = require('fs'); | ||
| const path = require('path'); | ||
|
|
||
| const isMac = process.platform === 'darwin'; | ||
| const isLinux = process.platform === 'linux'; | ||
| const isAlpine = isLinux && fs.existsSync('/etc/alpine-release'); | ||
|
|
||
| // We static-link ICU only where the static archives are position-independent | ||
| // (-fPIC) and can therefore be linked into a shared object (the .node): | ||
| // * macOS — Homebrew's icu4c archives are PIC. | ||
| // * Alpine — musl builds everything PIC, so icu-static is PIC. | ||
| // Debian/Ubuntu (glibc) ship NON-PIC static archives (libicu*.a), which fail to | ||
| // link into a shared object ("recompile with -fPIC"), so on glibc Linux we link | ||
| // ICU dynamically against the distro .so instead — those consumers must have | ||
| // ICU installed at runtime. ICU_ALLOW_DYNAMIC=1 forces dynamic everywhere as a | ||
| // local-dev escape hatch. | ||
| const useStatic = | ||
| (isMac || isAlpine) && process.env.ICU_ALLOW_DYNAMIC !== '1'; | ||
|
|
||
| function run(cmd) { | ||
| try { | ||
| return execSync(cmd, {stdio: ['ignore', 'pipe', 'ignore']}).toString().trim(); | ||
| } catch { | ||
| return ''; | ||
| } | ||
| } | ||
|
|
||
| function firstDir(candidates) { | ||
| return candidates.find(p => p && fs.existsSync(p)) || ''; | ||
| } | ||
|
|
||
| // An include dir is only useful if the ICU headers are actually under it | ||
| // (<dir>/unicode/utypes.h). Validating this lets us reject a misconfigured | ||
| // pkg-config .pc and fall through to another discovery method, instead of | ||
| // emitting a bogus path that fails later with a confusing missing-header error. | ||
| function hasIcuHeaders(dir) { | ||
| return !!dir && fs.existsSync(path.join(dir, 'unicode', 'utypes.h')); | ||
| } | ||
|
|
||
| function fail(message) { | ||
| process.stderr.write(`deps/icu.js: ${message}\n`); | ||
| process.exit(1); | ||
| } | ||
|
|
||
| // Locate the ICU lib and include directories. Returns {libDir, includeDir}. | ||
| function locate() { | ||
| if (process.env.ICU_ROOT) { | ||
| const root = process.env.ICU_ROOT; | ||
| return {libDir: path.join(root, 'lib'), includeDir: path.join(root, 'include')}; | ||
| } | ||
|
|
||
| // pkg-config (Debian's libicu-dev and Alpine's icu-dev ship icu-i18n.pc). | ||
| // Require both the lib dir and the actual ICU headers before trusting it. | ||
| const pcLibDir = run('pkg-config --variable=libdir icu-i18n'); | ||
| const pcIncDir = run('pkg-config --variable=includedir icu-i18n'); | ||
| if (pcLibDir && fs.existsSync(pcLibDir) && hasIcuHeaders(pcIncDir)) { | ||
| return {libDir: pcLibDir, includeDir: pcIncDir}; | ||
| } | ||
|
|
||
| // Homebrew icu4c (macOS, keg-only so not on default search paths). | ||
| if (isMac) { | ||
| let prefix = run('brew --prefix icu4c'); | ||
| if (!prefix || !fs.existsSync(prefix)) { | ||
| prefix = firstDir(['/opt/homebrew/opt/icu4c', '/usr/local/opt/icu4c']); | ||
| } | ||
| if (prefix) { | ||
| return {libDir: path.join(prefix, 'lib'), includeDir: path.join(prefix, 'include')}; | ||
| } | ||
| } | ||
|
|
||
| // Common system locations (Debian multiarch, Alpine, manual installs). | ||
| const libDir = firstDir([ | ||
| '/usr/lib/x86_64-linux-gnu', | ||
| '/usr/lib/aarch64-linux-gnu', | ||
| '/usr/lib/arm-linux-gnueabihf', | ||
| '/usr/lib', | ||
| '/usr/local/lib', | ||
| ]); | ||
| const includeDir = ['/usr/include', '/usr/local/include'].find(hasIcuHeaders) || ''; | ||
| return {libDir, includeDir}; | ||
| } | ||
|
|
||
| // ICU static archives, in dependency order (i18n -> uc -> data). | ||
| const ARCHIVE_NAMES = ['libicui18n', 'libicuuc', 'libicudata']; | ||
|
|
||
| // Full paths to the ICU static archives, so the linker pulls them in | ||
| // statically and the resulting binary stays self-contained. | ||
| function staticLibInputs(loc) { | ||
| return ARCHIVE_NAMES.map(name => { | ||
| const full = loc.libDir && path.join(loc.libDir, name + '.a'); | ||
| if (full && fs.existsSync(full)) { | ||
| return full; | ||
| } | ||
| // On the static platforms (macOS, Alpine) a missing archive is fatal: we | ||
| // must not silently produce a dynamically-linked binary, since zero-cache | ||
| // ships these prebuilds onto images (e.g. Alpine) that have no ICU. | ||
| fail( | ||
| `static ICU archive ${name}.a not found in ${loc.libDir || '(unknown library dir)'}.\n` + | ||
| ` This platform links ICU statically to stay self-contained, so the build is aborting\n` + | ||
| ` rather than linking ICU dynamically. Install the static ICU libraries (icu-dev +\n` + | ||
| ` icu-static on Alpine, icu4c via Homebrew on macOS), or set ICU_ALLOW_DYNAMIC=1 to\n` + | ||
| ` allow a dynamic fallback for local development.`, | ||
| ); | ||
| return null; // unreachable; fail() exits | ||
| }); | ||
| } | ||
|
|
||
| // Ordinary -l flags, resolved against the system ICU shared libraries. Used on | ||
| // glibc Linux (Debian/Ubuntu), whose static archives are not -fPIC and so can't | ||
| // be linked into a shared object; the consumer must have ICU at runtime. | ||
| function dynamicLibInputs(loc) { | ||
| const out = []; | ||
| if (loc.libDir) { | ||
| out.push('-L' + loc.libDir); | ||
| } | ||
| for (const name of ARCHIVE_NAMES) { | ||
| out.push('-l' + name.replace(/^lib/, '')); | ||
| } | ||
| return out; | ||
| } | ||
|
|
||
| function libsOutput(loc) { | ||
| const out = useStatic ? staticLibInputs(loc) : dynamicLibInputs(loc); | ||
| // C++ runtime + system libraries that ICU depends on. | ||
| if (isMac) { | ||
| out.push('-lc++'); | ||
| } else { | ||
| out.push('-lstdc++', '-lm', '-lpthread', '-ldl'); | ||
| } | ||
| return out; | ||
| } | ||
|
|
||
| const mode = process.argv[2]; | ||
| const loc = locate(); | ||
|
|
||
| if (mode === 'include') { | ||
| if (!hasIcuHeaders(loc.includeDir)) { | ||
| fail( | ||
| `could not find the ICU headers (unicode/utypes.h) in ${loc.includeDir || '(unknown include dir)'}.\n` + | ||
| ` Install the ICU development package (libicu-dev on Debian, icu-dev on Alpine,\n` + | ||
| ` icu4c via Homebrew on macOS), or set ICU_ROOT to an ICU install prefix.`, | ||
| ); | ||
| } | ||
| process.stdout.write(loc.includeDir); | ||
| } else { | ||
| process.stdout.write(libsOutput(loc).join('\n')); | ||
| } |
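
To make the static-versus-dynamic decision in the file above easier to review, here is a minimal sketch that restates it as pure functions. The names `linkMode` and `linkerInputs` are hypothetical and do not appear in the PR. Unlike `icu.js`, the sketch does not touch the filesystem, does not abort when an archive is missing, and assumes the library directory is already known.

```javascript
'use strict';

// Hypothetical restatement of `useStatic`: static only where the ICU
// archives are known to be PIC (macOS, Alpine), unless the
// ICU_ALLOW_DYNAMIC=1 escape hatch is set.
function linkMode({platform, isAlpine = false, allowDynamic = false}) {
  const staticCapable =
    platform === 'darwin' || (platform === 'linux' && isAlpine);
  return staticCapable && !allowDynamic ? 'static' : 'dynamic';
}

// Same dependency order as in icu.js: i18n -> uc -> data.
const ARCHIVE_NAMES = ['libicui18n', 'libicuuc', 'libicudata'];

// Build the linker inputs for a given mode. Assumption: libDir is a
// non-empty, existing directory (icu.js checks this; the sketch does not).
function linkerInputs(mode, libDir, platform) {
  const icu =
    mode === 'static'
      ? ARCHIVE_NAMES.map(name => `${libDir}/${name}.a`)
      : [`-L${libDir}`, ...ARCHIVE_NAMES.map(name => '-l' + name.replace(/^lib/, ''))];
  // C++ runtime and system libraries that ICU depends on.
  const runtime =
    platform === 'darwin' ? ['-lc++'] : ['-lstdc++', '-lm', '-lpthread', '-ldl'];
  return [...icu, ...runtime];
}

// Example: glibc Linux resolves to dynamic linking against the distro .so.
const mode = linkMode({platform: 'linux'});
console.log(linkerInputs(mode, '/usr/lib/x86_64-linux-gnu', 'linux').join('\n'));
```

Keeping the decision separate from discovery, as sketched here, would also make the platform matrix unit-testable without ICU installed; whether that refactor is worth it for a build helper is a judgment call.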
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| 'use strict'; | ||
| const os = require('os'); | ||
| const Database = require('../.'); | ||
|
|
||
| // ICU is linked on macOS and Linux (statically on macOS and Alpine, dynamically | ||
| // on glibc Linux; see deps/icu.js and binding.gyp), which makes | ||
| // LIKE/lower()/upper() Unicode-aware. It is intentionally NOT linked on Windows | ||
| // (static ICU there is impractical to build in CI), so those builds keep | ||
| // SQLite's ASCII-only behavior. | ||
| const isWindows = os.platform().startsWith('win'); | ||
| const itWindows = isWindows ? it : it.skip; | ||
|
|
||
| describe('ICU Unicode support', function () { | ||
| beforeEach(function () { | ||
| this.db = new Database(util.next()); | ||
| }); | ||
| afterEach(function () { | ||
| this.db.close(); | ||
| }); | ||
|
|
||
| const evalScalar = function (db, expr) { | ||
| return db.prepare(`SELECT ${expr} AS v`).pluck().get(); | ||
| }; | ||
|
|
||
| util.itUnix('case-folds non-ASCII characters when ICU is enabled', function () { | ||
| expect(evalScalar(this.db, "lower('Ä')")).to.equal('ä'); | ||
| expect(evalScalar(this.db, "upper('ß')")).to.equal('SS'); | ||
| // SQLite's LIKE is provided by ICU here, so it folds case across Unicode. | ||
| expect(evalScalar(this.db, "'Ä' LIKE 'ä'")).to.equal(1); | ||
| expect(evalScalar(this.db, "'ПРИВЕТ' LIKE 'привет'")).to.equal(1); | ||
| // Distinct characters still do not match. | ||
| expect(evalScalar(this.db, "'Ä' LIKE 'å'")).to.equal(0); | ||
| }); | ||
|
|
||
| itWindows('leaves non-ASCII characters unchanged when ICU is disabled', function () { | ||
| expect(evalScalar(this.db, "lower('Ä')")).to.equal('Ä'); | ||
| expect(evalScalar(this.db, "'Ä' LIKE 'ä'")).to.equal(0); | ||
| // ASCII case-insensitivity still works without ICU. | ||
| expect(evalScalar(this.db, "'ABC' LIKE 'abc'")).to.equal(1); | ||
| }); | ||
| }); |
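
The two test cases above differ only in which case-folding rule is in effect. The sketch below illustrates that difference in plain JavaScript, without SQLite: `asciiLower` mimics the ASCII-only folding of SQLite's built-in functions, and `String.prototype.toLowerCase` stands in for ICU's Unicode folding. This is an approximation for illustration only, since ICU's actual folding rules are not identical to JavaScript's. The helper names are hypothetical, and the comparison covers only wildcard-free patterns, where LIKE reduces to case-insensitive equality.

```javascript
'use strict';

// ASCII-only folding: only A-Z are mapped, everything else passes through.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, c => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Stand-in for Unicode-aware folding (an approximation of what ICU does).
function unicodeLower(s) {
  return s.toLowerCase();
}

// LIKE without % or _ wildcards is a case-insensitive equality check.
function likeNoWildcards(a, b, fold) {
  return fold(a) === fold(b);
}

const A_UMLAUT_UPPER = '\u00C4'; // Ä
const A_UMLAUT_LOWER = '\u00E4'; // ä

console.log(likeNoWildcards(A_UMLAUT_UPPER, A_UMLAUT_LOWER, asciiLower));
console.log(likeNoWildcards(A_UMLAUT_UPPER, A_UMLAUT_LOWER, unicodeLower));
console.log(likeNoWildcards('ABC', 'abc', asciiLower));
```

The first comparison fails and the second succeeds, which mirrors the Windows and Unix expectations in the test file; the third shows that ASCII case-insensitivity holds either way.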