riscv_fpu: IEEE 754 rounding and exception conformance for the scalar FPU by SolAstrius · Pull Request #239 · LekKit/RVVM

SolAstrius · 2026-06-19T14:48:51Z

Summary

Bring RVVM's scalar F/D floating-point rounding and exception behaviour up to IEEE 754 / RISC-V conformance.

This began as the #204 RMM (roundTiesToAway) fix and grew to close the rounding and flag gaps the conformance suite exposed.

What's fixed

RMM rounding computed via exact error-free transforms, and the instruction's static rm field honoured (not just frm) for all arithmetic ops.
FP-op dispatch no longer size-optimised (func_opt_size dropped) — roughly 2x interpreter FP throughput at zero size cost, since the FP path is not JIT'd.
The FMA family: rounding mode honoured (including the fnmadd operand-negation fix), RMM exact ties via an error-free FMA, and the underflow flag set by true IEEE after-rounding tininess.
fmul ties-away near the underflow boundary, where the Dekker product error itself underflows.
sqrt(-0.0) returns -0.0 without raising invalid.
A spurious INEXACT flag leaked by integer rounding, removed.

Everything stays within the existing fpu_lib wrapper discipline — no raw host FP, no host-fenv-only tricks — so it remains compatible with a software-fenv / jittable softfloat path.

Validation

MPFR-oracle bare-metal harness (rvvm-hal examples/rmm-test): 43652/0 over the five OP-FP ops and the four FMA ops, each under both dynamic frm=RMM and the static ,rmm suffix, including subnormal and underflow-boundary ties.
Together with the canonical-NaN changes (riscv_fpu: canonicalize NaN results and mal-boxed narrow operands #240), riscv-arch-test (ACT4, Spike reference) F/D/I/M reaches 260/260.

Notes

Commits are split one-fix-per-commit for review. No functional change outside src/cpu/riscv_fpu.{c,h} and src/util/fpu_lib.{c,h}.

SolAstrius · 2026-06-20T00:17:41Z

@purplesyringa please help us with your wisdom 🙏

purplesyringa · 2026-06-21T19:40:17Z

+    if (unlikely(rmm || (rm != 0x07 && eff != frm))) {
+        const uint32_t host = fpu_get_rounding_mode();
+        fpu_set_rounding_mode(rmm ? FPU_LIB_ROUND_NE : eff);
+        riscv_emulate_f_opc_op_impl(vm, insn, rmm);


Does it actually make sense to handle RMM in riscv_fpu? This feels like something that is better integrated into fpu_lib, where the rest of math-heavy logic lives. This arbitrary separation confused me.

purplesyringa · 2026-06-21T19:47:05Z

    }
-    if (fpu_is_negative32(f)) {
-        // Raise invalid flag, return canonical NaN
+    if (fpu_is_negative32(f) && (fpu_bit_f32_to_u32(f) << 1) != 0) {


Maybe add a #define FPU_LIB_FPxx_NEGATIVE_ZERO to fpu_lib to make the intent of this comparison clearer?

purplesyringa · 2026-06-21T19:58:46Z

+    // unchanged: the round-to-nearest path below adds +/-0.5, which is inexact once
+    // 0.5 underflows a large value's ULP and would raise a spurious INEXACT -- e.g.
+    // on an out-of-range fcvt-to-int, leaving NX wrongly set alongside NV.
+    if (likely(!fpu_is_fractional32(f))) {


I'm not sure what the intention of this if is: if it's just to avoid raising wrong exceptions, you're already preventing that with the fpu_get_exception logic below. I don't necessarily think adding a fast path here is wrong, but it seems out of scope unless I'm missing something.

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

…low boundary Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

purplesyringa · 2026-06-21T20:02:11Z

            break;
    }
-    return f;
+    fpu_set_exceptions(exc);


Have you benchmarked this? If this is a measurable slow-down, here's an alternative to the "remove fractional if" advice: maybe look at fpu_exponent32(f) to detect which group it belongs to and choose the path appropriately, so that exceptions are never raised:

f is large enough that it's already necessarily whole and thus doesn't need rounding.

f is in a range where adding 0.5 is always exact.

f is small enough that its rounded value can be clearly determined to be 0 (or maybe 1, depending on the rounding mode; I trust you can check this case-by-case yourself).

Hmm, nevermind, I don't think there's actually necessarily a range with exact addition like that. e.g. adding 0.1 to 1111.10001010101 (in binary) will cause an inexact result even though the exponent is quite small.

If exceptions are slow, maybe this is a good reason to manipulate the mantissa manually? We have some similar code in fpu_is_fractional32, and it doesn't seem too complex as to be clearly worse than setting exceptions.

Alternatively, maybe always rounding to zero and making the rounding decision based on the fractional part is a good idea.

purplesyringa · 2026-06-21T20:24:21Z

        case FPU_LIB_ROUND_NE:
        case FPU_LIB_ROUND_MM:
-            return fpu_add32(f, fpu_bit_u32_to_f32(0x3F000000U | s));
+            r = fpu_add32(f, fpu_bit_u32_to_f32(0x3F000000U | s));


Now that I think about it, I don't think this logic actually ever worked? If this addition is inexact, then it absolutely can round in the wrong direction. Consider f = prev(0.5), where prev denotes the float just before 0.5. Then:

f = 0.0111111111111111111111111_2

0.5 = 0.1_2

f + 0.5 = 0.1111111111111111111111111_2, which is inexact, so if the rounding mode in the environment is round-to-nearest, this rounds to 1. But round(f) should've been 0, not 1.

Maybe you can fix this by temporarily changing the active rounding mode to flooring, but at this point maybe inspecting the fractional part by hand is more efficient.

purplesyringa · 2026-06-21T20:38:52Z

+ * in RNE. funct3 == rm only carries a rounding mode on rounding-capable ops, so
+ * this never misfires on fsgnj/fcmp/fclass/fmv.


I'm not sure what this means? e.g. fsgnj.s has 0 in bits 12-14, so this code will read rm = 0 and temporarily switch the rounding mode to whatever 0 represents, unless I'm missing something. Shouldn't the mode change be gated to only run on specific instructions?

purplesyringa · 2026-06-21T20:41:42Z

+ * range and exact. Flag-isolated, as the residual machinery can raise spurious
+ * exceptions.
+ */
+static forceinline fpu_f32_t riscv_rmm_div_apply_f32(fpu_f32_t n, fpu_f32_t a, fpu_f32_t b)


nit: why is this named *_apply_* when the corresponding fixup functions for addition and multiplication don't have _apply in the middle?

purplesyringa · 2026-06-21T20:56:09Z

+        fpu_set_rounding_mode(host);
+    } else if (rm != 0x07 && eff != frm) {
+        // Static host-native mode that differs from frm.
+        const uint32_t host = fpu_get_rounding_mode();


This feels like a repetition of the logic in riscv_emulate_f_opc_op. Can the two be merged together?

purplesyringa · 2026-06-21T21:10:03Z

LGTM re: sqrt(-0.0), fnmadd fix, dropping func_opt_size. NACK re: INEXACT in rounding, honoring rm. Will review the FMA UF logic tomorrow. I feel like RMM is a little too complex to review and should be a separate patch, there's plenty of complexity here without it.

purplesyringa · 2026-06-22T06:07:58Z

+// Underflow flag, RISC-V semantics: UF iff the result is tiny *after rounding* and
+// inexact. A subnormal result is unambiguously tiny; a normal result above the
+// smallest normal is not. The only hard case is a result of exactly the smallest
+// normal: some hosts (e.g. aarch64) detect tininess *before* rounding and raise a


some hosts (e.g. aarch64) detect tininess before rounding

Is this an issue for any other arithmetic operations?

purplesyringa · 2026-06-22T06:11:19Z

+    if (!(exc & FPU_LIB_FLAG_NX)) {
+        return;  // an exact smallest-normal result never underflows
+    }
+    const fpu_f64_t e = fpu_add64(fpu_mul64(fpu_fcvt_f32_to_f64(a), fpu_fcvt_f32_to_f64(b)),


If this is exact (which I think it always is, maybe modulo NaN propagation due to conversions? but I think it becomes canonical in this case and it's still fine), why does fpu_fma32 uses more complex math? I think it makes sense to update fpu_fma32 together with this patch.

I've submitted a couple more comments which highlight issues with this function, but let me provide more general feedback here.

If f64->f32 conversion sets flags correctly, then this line should be used to implement fma_f32, regardless of whether the native architecture supports FMA, so that we always get the correct flags without special handling.

If the conversion doesn't always set flags correctly, then it should be updated to do so, and I think it's still better to use this line.

purplesyringa · 2026-06-22T06:21:56Z

+// tiny iff that scaled magnitude is below the scaled smallest normal (2^-822).
+static forceinline void riscv_fma_fixup_uf64(fpu_f64_t r, fpu_f64_t a, fpu_f64_t b, fpu_f64_t c, uint32_t eff)
+{
+    if (likely((fpu_bit_f64_to_u64(r) & 0x7FFFFFFFFFFFFFFFULL) != 0x0010000000000000ULL)) {


Please add a comment saying this is 2^-1022.

purplesyringa · 2026-06-22T06:22:02Z

+        return;  // subnormal UF is genuine; a larger normal result has none
+    }
+    const uint32_t exc = fpu_get_exceptions();
+    if (!(exc & FPU_LIB_FLAG_NX)) {


I need a recap on how exceptions work. Are they ever reset implicitly? This condition tries to check whether NX was set by the previous fma operation, but I think in reality it also checks whether it has previously ever been set and not reset.

Honestly I think this condition can be just removed? It seems to be just a fast path, but accessing exceptions is slow anyway.

purplesyringa · 2026-06-22T06:22:07Z

+    const fpu_f64_t rs = fpu_fma64(fpu_mul64(a, S), b, fpu_mul64(c, S));  // (a*b+c) * 2^200
+    fpu_set_rounding_mode(host);
+    const bool tiny = (fpu_bit_f64_to_u64(rs) & 0x7FFFFFFFFFFFFFFFULL) < 0x0C90000000000000ULL;  // |rs| < 2^-822
+    fpu_set_exceptions(tiny ? (exc | FPU_LIB_FLAG_UF) : (exc & ~FPU_LIB_FLAG_UF));


same concern re: exceptions, this unsets UF which I think may have been set by a different operation than FMA.

purplesyringa · 2026-06-22T06:31:27Z

+    if (!(exc & FPU_LIB_FLAG_NX)) {
+        return;  // an exact smallest-normal result never underflows
+    }
+    const fpu_f64_t S = fpu_bit_u64_to_f64(0x4C70000000000000ULL);  // 2^200


2^200 feels a bit arbitrary, it took me a while to figure out where it came from. Maybe change this to 2^52?

purplesyringa · 2026-06-22T06:34:56Z

+    }
+    const fpu_f64_t S = fpu_bit_u64_to_f64(0x4C70000000000000ULL);  // 2^200
+    const uint32_t host = fpu_get_rounding_mode();
+    fpu_set_rounding_mode(eff == 0x04 ? FPU_LIB_ROUND_NE : eff);


Why not just call this before resetting the rounding mode back to host in riscv_fma_round_f64?

purplesyringa · 2026-06-22T06:43:40Z

+// smallest-normal f64 result bounds |a|,|b| (a tiny product forces small operands),
+// so a*2^200 cannot overflow, and fma(a*2^200, b, c*2^200) = (a*b+c)*2^200 lands in


Yeah, but we don't have a tiny product, we have a tiny FMA. "a*b+c is tiny" doesn't imply "a*b is tiny". You can probably make an argument from precision, but I don't quite see why this works.

We know that a*b+c rounds to 2^-1022 exactly. We don't know what the rounding mode is, but we know that at least one of a*b and c must include a set bit at 2^-1022 or below -- otherwise, the sum will either produce a greater value or 0.

If the set bit is in c, the maximum value of |c| is reached when it's the lowest bits of the mantissa, e.g. c = 2^-1022 * (2^52 + (2^52 - 1)), providing |c| <= 2^-969 - 2^-1022. By triangle inequality this also gives |a*b| <= 2^-969.

If the set bit is in a*b, things are a little trickier: since this is FMA, a*b is computed exactly and thus has more than 52 bits of precision. Estimating it as 104 bits gives us |a*b| <= 2^-917 - 2^-1022 and thus |c| <= 2^-917.

Taking |a*b|, |c| <= 2^-917 already demonstrates that computing c * 2^200 is safe. As long as b is non-zero, |b| >= 2^-1074, hence |a| <= 2^157 and computing a * 2^200 is also safe.

If b is zero, a is unlimited. Such an FMA is always exact, so the NX check passes, but if we remove it (see the relevant comment), it still works out: if a * 2^200 overflows to Infinity, we get rs = NaN, and tiny ends up being reset because +NaN is greater than any finite number.

A similar explanation needs to be included in the code.

purplesyringa · 2026-06-22T08:10:13Z

Reviewed FMA UF handling.

LekKit · 2026-06-22T16:28:34Z

Stupidly, github silently merged #239 together with #240 without asking

SolAstrius mentioned this pull request Jun 19, 2026

RMM rounding mode doesn't work correctly #204

Open

SolAstrius force-pushed the fix/fpu-gaps branch from 813c005 to 96202bb Compare June 21, 2026 19:27

SolAstrius mentioned this pull request Jun 21, 2026

riscv_fpu: canonicalize NaN results and mal-boxed narrow operands pufit/RVVM#26

Closed

SolAstrius changed the title ~~riscv_fpu: IEEE 754 conformance for the scalar FPU~~ riscv_fpu: IEEE 754 rounding and exception conformance for the scalar FPU Jun 21, 2026

SolAstrius mentioned this pull request Jun 21, 2026

riscv_fpu: canonicalize NaN results and mal-boxed narrow operands #240

Merged

purplesyringa reviewed Jun 21, 2026

View reviewed changes

SolAstrius added 9 commits June 21, 2026 21:59

riscv_fpu: correct RMM rounding to roundTiesToAway

685255b

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

riscv_fpu: honor the static rounding-mode field

9c16bcc

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

riscv_fpu: don't size-optimize the FP op dispatch

77187a2

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

riscv_fpu: honor the rounding mode in the FMA ops

ee70714

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

fpu_lib: sqrt(-0.0) returns -0.0 without raising invalid

e7687bd

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

riscv_fpu: set the FMA underflow flag by IEEE after-rounding tininess

0468b7d

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

riscv_fpu: round the FMA family to nearest, ties-away under RMM

2fd9ef5

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

riscv_fpu: round fmul to nearest, ties-away under RMM near the underf…

81d257c

…low boundary Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

fpu_lib: don't leak a spurious INEXACT from integer rounding

75639de

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>

SolAstrius force-pushed the fix/fpu-gaps branch from 96202bb to 75639de Compare June 21, 2026 20:00

purplesyringa reviewed Jun 21, 2026

View reviewed changes

purplesyringa reviewed Jun 22, 2026

View reviewed changes

LekKit merged commit 75639de into LekKit:staging Jun 22, 2026
15 checks passed

SolAstrius mentioned this pull request Jun 22, 2026

riscv_fpu: pre-approved scalar FPU conformance fixes #241

Open

		* in RNE. funct3 == rm only carries a rounding mode on rounding-capable ops, so
		* this never misfires on fsgnj/fcmp/fclass/fmv.

		// smallest-normal f64 result bounds \|a\|,\|b\| (a tiny product forces small operands),
		// so a2^200 cannot overflow, and fma(a2^200, b, c2^200) = (ab+c)*2^200 lands in

Conversation

SolAstrius commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's fixed

Validation

Notes

Uh oh!

SolAstrius commented Jun 20, 2026

Uh oh!

purplesyringa Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

purplesyringa Jun 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

purplesyringa commented Jun 21, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

purplesyringa Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

purplesyringa Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

purplesyringa Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

purplesyringa commented Jun 22, 2026

Uh oh!

Uh oh!

LekKit commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SolAstrius commented Jun 19, 2026 •

edited

Loading

purplesyringa Jun 21, 2026 •

edited

Loading

purplesyringa Jun 21, 2026 •

edited

Loading

purplesyringa Jun 22, 2026 •

edited

Loading

purplesyringa Jun 22, 2026 •

edited

Loading

purplesyringa Jun 22, 2026 •

edited

Loading