
fix: safe opcache_reset #2073

Open
AlliBalliBaba wants to merge 50 commits into main from fix/opcache-safe-reset

@AlliBalliBaba
Contributor

@AlliBalliBaba AlliBalliBaba commented Dec 14, 2025

Idea to fix #1737

Just a WIP, to test in CI.

@alexandre-daubois
Member

Hi @AlliBalliBaba, any news on this side? 🙂
I think I stumbled across the very same issue with #2265.

@AlliBalliBaba
Contributor Author

I experimented a lot with juggling thread states, but I'm not sure it's possible to make a reset fully safe from our side alone.

There are 2 main race conditions:

  • A thread parses a script without having restarted after an opcache_reset has been triggered.
  • Multiple threads trigger an opcache_reset simultaneously.

The first race condition is partly mitigated by the changes in this PR. The second one would need to be fixed in php-src somehow, or would require a hook to reject opcache_reset calls.
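For the second race, a hook of the kind mentioned above would have to turn away a reset while another one is still running. A minimal Go sketch of that idea using `atomic.Bool.CompareAndSwap`; `resetGuard` and `TryReset` are hypothetical names, not FrankenPHP or php-src APIs, and a guard like this only sees resets that are actually routed through it:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// resetGuard is a hypothetical helper (not part of FrankenPHP or php-src):
// it lets exactly one caller perform a reset at a time and rejects callers
// that arrive while a reset is still running.
type resetGuard struct {
	inProgress atomic.Bool
	performed  atomic.Int64 // number of resets that actually ran
}

// TryReset runs doReset only if no other reset is in flight.
// It returns false when the call was rejected.
func (g *resetGuard) TryReset(doReset func()) bool {
	if !g.inProgress.CompareAndSwap(false, true) {
		return false // another caller is already resetting
	}
	defer g.inProgress.Store(false)
	doReset()
	g.performed.Add(1)
	return true
}

func main() {
	var g resetGuard
	entered := make(chan struct{})
	release := make(chan struct{})
	done := make(chan bool)

	// The first reset stays "running" until it is released.
	go func() {
		done <- g.TryReset(func() {
			close(entered)
			<-release
		})
	}()

	<-entered
	second := g.TryReset(func() {}) // arrives while the first reset is running
	close(release)
	first := <-done

	fmt.Println(first, second, g.performed.Load()) // true false 1
}
```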

@withinboredom
Member

Can we try doing what we do with the environment functions and simply override/replace the existing one:

  1. take a global write lock (read lock taken before executing a php script)
  2. do the real reset
  3. release the lock
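A rough Go sketch of this locking scheme, with `sync.RWMutex` standing in for the global lock. One wrinkle: opcache_reset() is called from inside a running script that already holds the read lock, so taking the write lock right there would deadlock. The sketch therefore assumes the reset is only flagged and then performed after the calling script returns. All names (`executeScript`, `requestReset`, `realReset`) are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical globals: every script holds the read side of opcacheLock while
// it runs, and the real reset takes the write side.
var (
	opcacheLock  sync.RWMutex
	resetPending atomic.Bool
	resetsDone   atomic.Int64
)

// realReset is a placeholder for calling the actual opcache reset.
func realReset() { resetsDone.Add(1) }

// requestReset is what an overridden opcache_reset() would do. It only sets a
// flag: taking the write lock here would deadlock, because the calling script
// still holds a read lock.
func requestReset() { resetPending.Store(true) }

// executeScript runs one script under the read lock and, once that lock has
// been released, performs a pending reset under the write lock.
func executeScript(run func()) {
	func() {
		opcacheLock.RLock()
		defer opcacheLock.RUnlock()
		run()
	}()

	if resetPending.CompareAndSwap(true, false) {
		opcacheLock.Lock()   // 1. take the global write lock (waits for running scripts)
		realReset()          // 2. do the real reset
		opcacheLock.Unlock() // 3. release the lock
	}
}

func main() {
	var wg sync.WaitGroup
	var scripts atomic.Int64
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			executeScript(func() {
				scripts.Add(1)
				if i%10 == 0 {
					requestReset() // this script calls opcache_reset()
				}
			})
		}(i)
	}
	wg.Wait()
	fmt.Println(scripts.Load(), resetsDone.Load() >= 1, resetPending.Load()) // 50 true false
}
```

Go's `RWMutex` blocks new readers once a writer is waiting, so a steady stream of requests cannot starve the reset. The model assumes every script eventually returns; a long-running worker script would hold its read lock indefinitely.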

@AlliBalliBaba
Contributor Author

Yeah, I tried that as well. But an opcache reset can also happen for other reasons, for example after opcache_invalidate has been called repeatedly and memory is filling up.

But maybe overwriting the user function would still be a good first step 👍

@henderkes
Contributor

> Can we try doing what we do with the environment functions and simply override/replace the existing one:
>
>   1. take a global write lock (read lock taken before executing a php script)
>   2. do the real reset
>   3. release the lock

I tried this and it kind of works, but workers are an issue. I only got it working by locking new requests, draining all workers, doing the reset, restarting all workers, and then unlocking.

@AlliBalliBaba
Contributor Author

AlliBalliBaba commented Mar 13, 2026

Oh, I just realized that overriding opcache_reset and restarting workers is also exactly what this branch does. @henderkes, do you hard-link against the opcache extension in your solution? When hard-linking and calling the reset directly, this can probably be done much more cleanly.
Simply calling the original reset function still makes some Docker image builds fail with segfaults.

@henderkes
Contributor

Yes, I hard-linked because opcache is always active in 8.5+ (which is where I test). I'll tidy up my changes and merge them on top of this branch when I get to it.

@AlliBalliBaba
Contributor Author

Maybe we can make it a requirement for other PHP versions as well; I'm not sure how many installations run without it.
The solution would still be the same: all threads need to stop completely, but we would force a flush instead of letting PHP schedule one.

@henderkes
Contributor

henderkes commented Mar 13, 2026

You already had logic in the branch to check whether opcache exists, so I only had to pull in a minor change here. Let's see if tests pass.

Edit: well, apparently not. I only tested with 8.5 locally and guess what's passing x)

Okay, I think I get the issue now. Hard-linking to opcache won't help, because the init chain for shared extensions differs from the one for static extensions. That's why it works for me locally but not in CI on PHP < 8.5. We need to wait until the extension is properly loaded before actually calling the opcache reset.
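One way to express "wait until the extension is loaded" is a readiness gate that opens exactly once. A minimal Go sketch, assuming FrankenPHP has some hook that runs after the opcache extension has finished starting up (that hook is not shown, and `readyGate` is a hypothetical name):

```go
package main

import (
	"fmt"
	"sync"
)

// readyGate is hypothetical: it holds back opcache resets until the
// extension has signalled that its startup is complete.
type readyGate struct {
	once   sync.Once
	loaded chan struct{}
}

func newReadyGate() *readyGate {
	return &readyGate{loaded: make(chan struct{})}
}

// MarkLoaded would be called from whatever hook runs after the extension has
// started. Calling it more than once is harmless.
func (g *readyGate) MarkLoaded() {
	g.once.Do(func() { close(g.loaded) })
}

// ResetWhenLoaded blocks until the extension is loaded, then runs reset.
func (g *readyGate) ResetWhenLoaded(reset func()) {
	<-g.loaded
	reset()
}

// TryReset runs reset only if the extension is already loaded.
func (g *readyGate) TryReset(reset func()) bool {
	select {
	case <-g.loaded:
		reset()
		return true
	default:
		return false // too early: the extension is not initialized yet
	}
}

func main() {
	g := newReadyGate()
	resets := 0
	reset := func() { resets++ }

	fmt.Println(g.TryReset(reset)) // false: not loaded yet

	go g.MarkLoaded()        // startup finishes on another goroutine
	g.ResetWhenLoaded(reset) // waits for MarkLoaded, then resets
	fmt.Println(g.TryReset(reset), resets) // true 2
}
```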

@henderkes henderkes force-pushed the fix/opcache-safe-reset branch 2 times, most recently from e54cd96 to 5c8e340 on March 13, 2026 13:52
@henderkes henderkes force-pushed the fix/opcache-safe-reset branch from 5c8e340 to 1d75824 on March 13, 2026 13:56
@henderkes henderkes force-pushed the fix/opcache-safe-reset branch from e7bd25a to 0d87765 on March 13, 2026 15:44
@henderkes
Contributor

Alright, we're down to PHP 8.2 segfaulting. I'm not terribly keen to find out why.

@henderkes
Contributor

Down to only debian 8.2 failing...

@AlliBalliBaba
Contributor Author

@henderkes do you remember what exactly fixed 8.2 for you before? I think I re-introduced the failure for some 8.2 versions while fixing a different race.

@henderkes
Contributor

> @henderkes do you remember what exactly fixed 8.2 for you before? I think I re-introduced the failure for some 8.2 versions while fixing a different race.

It turned out that the failures were unrelated to the branch changes. I merged main, so if it's still happening now, it's something different.

Comment thread frankenphp.c
@henderkes
Contributor

    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=11&work=11": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1779: failed to call server Get "http://localhost:9080/opcache_reset.php": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=22&work=22": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=26&work=26": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=21&work=21": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=20&work=20": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=23&work=23": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=27&work=27": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=25&work=25": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1790: failed to call server Get "http://localhost:9080/sleep.php?sleep=24&work=24": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1779: failed to call server Get "http://localhost:9080/opcache_reset.php": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    caddy_test.go:1779: failed to call server Get "http://localhost:9080/opcache_reset.php": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Not sure why... I've run the exact same Docker container tests 100 times locally and didn't run into the failure a single time.

henderkes pushed a commit to henderkes/frankenphp that referenced this pull request May 30, 2026
Replace the ad-hoc opcache test with the canonical concurrency test and
fixtures from PR php#2073 (caddy TestOpcacheReset, opcache_reset.php,
require.php): 500 mixed sleep.php / opcache_reset.php requests across 20
workers. Verified passing against this implementation with opcache enabled
and every opcache_reset() routed through the thread-safe coordinator.

https://claude.ai/code/session_01K4jwAnp9mA9ApgaFiJDf6d
@Juoper

Juoper commented Jun 1, 2026

Hey, quick question: it would be really great if we could get this PR merged soonish, because it's blocking a lot of our team members. Is there any estimate of how long this will take to fix, or can we assist in some way?

@Juoper

Juoper commented Jun 1, 2026

So I even checked out this branch and built the Docker image for it locally, but I still run into zend_mm_heap corrupted errors.
Is there a nice and easy way to help you investigate the issue?

@henderkes
Contributor

zend_mm_heap corrupted is generally caused by a bug in upstream php-src or by an extension that is not fully ZTS-compatible. This PR would fix segfaults rather than Zend memory manager logic errors.

@AlliBalliBaba
Contributor Author

@Juoper can you also try building from this branch: #2364

Development

Successfully merging this pull request may close these issues.

FrankenPHP crashes with zend_mm_heap corrupted while running some wordpress sites

7 participants