Skip to content

Fix and rework binary_to_float/list_to_float#2246

Merged
bettio merged 3 commits intoatomvm:release-0.7from
bettio:fix-to-float-functions
Mar 30, 2026
Merged

Fix and rework binary_to_float/list_to_float#2246
bettio merged 3 commits intoatomvm:release-0.7from
bettio:fix-to-float-functions

Conversation

@bettio
Copy link
Copy Markdown
Collaborator

@bettio bettio commented Mar 28, 2026

Fix binary_to_float/1 and list_to_float/1 overflow bug and replace
the locale-dependent sscanf-based parser with strict, locale-
independent float parsing.

  • Fix binary_to_float(<<"1.0e999">>) returning inf instead of
    raising badarg, matching Erlang/OTP behavior.
  • Replace sscanf with unlocalized_strtod()/unlocalized_strtof(),
    which validate strict bare float format and use a 3-strategy
    locale-independent approach (strtod_l/strtof_l / uselocale /
    fallback with prepare_locale_retry_buf() helper).
  • Add unlocalized_strto_avm_float() inline dispatcher that selects
    the right variant based on AVM_USE_SINGLE_PRECISION.
  • Rename unlocalized_snprintf.{c,h} to unlocalized.{c,h} and add
    unlocalized_validate_bare_float_format(), unlocalized_strtod(),
    and unlocalized_strtof() sharing the same cached C locale
    infrastructure.
  • Guard double-only code behind UNLOCALIZED_ENABLE_DOUBLE_API /
    FLOAT_UTILS_ENABLE_DOUBLE_API and float-only code behind
    UNLOCALIZED_ENABLE_FLOAT_API / FLOAT_UTILS_ENABLE_FLOAT_API,
    so no double code leaks into a float-only build and vice versa.
  • Rename double_format_t to float_format_t and DoubleFormat*
    to FloatFormat* since the enum belongs to the float_utils
    module, not the double type.
  • Reject inputs that sscanf wrongly accepted: hex floats, inf,
    nan, whitespace, missing decimal point, missing digits around
    dot, and commas as decimal separator (per binary_to_float doesn't correctly handle scientific notation with no decimal places specified, such as "1e0" erlang/otp#9061).
  • Merge bin2float and list2float test modules into a single
    string2float test and expand coverage from 9 to 39 cases
    including overflow, underflow, special values, format violations,
    and edge cases.

These changes are made under both the "Apache 2.0" and the "GNU Lesser
General Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@petermm
Copy link
Copy Markdown
Contributor

petermm commented Mar 29, 2026

usual caveats, I'm just the messenger;-)

PR Review: Add locale-independent float parsing

Commit: 65324dffAdd locale-independent float parsing
Author: Davide Bettio
Reviewed: 2026-03-29

Summary

Replaces sscanf-based float parsing in binary_to_float/list_to_float with a proper locale-independent implementation using a 3-strategy approach (strtod_l / uselocale / fallback with separator retry). Renames unlocalized_snprintf.{c,h}unlocalized.{c,h}, adds unlocalized_strtod() and validate_bare_float_format(). Comprehensive tests added.

Overall assessment: Good direction — approve with changes. Two correctness bugs must be fixed before merge.


🔴 Must-Fix Issues

1. unlocalized_strtod() can reject valid finite subnormal numbers

File: src/libAtomVM/unlocalized.c:435-437

strtod() may set ERANGE for underflow to a non-zero subnormal, not just underflow to 0.0. This means valid finite float values (very small denormals) could be incorrectly rejected as badarg.

Current code:

/* ERANGE with val != 0 means overflow. ERANGE with val == 0 is
 * underflow to zero, which is allowed (matches OTP behavior). */
if (errno == ERANGE && val != 0.0) {
    return -1;
}

Suggested fix:

-    /* ERANGE with val != 0 means overflow. ERANGE with val == 0 is
-     * underflow to zero, which is allowed (matches OTP behavior). */
-    if (errno == ERANGE && val != 0.0) {
-        return -1;
-    }
+    /* isfinite() already rejects overflow (±inf). Underflow — whether
+     * flushed to zero or to a finite subnormal — is allowed, matching
+     * OTP behavior. No additional errno check needed. */

The isfinite() check above already catches overflow (±inf). Removing the ERANGE check lets valid subnormals through while still rejecting overflow.

Erlang/OTP verification — these all succeed on OTP but would raise badarg on AtomVM with the current code:

1> binary_to_float(<<"1.0e-320">>).   %% subnormal, strtod sets ERANGE
1.0e-320
2> binary_to_float(<<"5.0e-324">>).   %% smallest subnormal
5.0e-324
3> binary_to_float(<<"2.0e-308">>).   %% just below normal minimum
2.0e-308
4> erlang:list_to_float("1.0e-320").  %% same via list_to_float
1.0e-320

2. Off-by-one in max length check rejects valid 255-byte inputs

File: src/libAtomVM/unlocalized.c:411-412

With PARSE_FLOAT_BUF_SIZE == 256, the check rejects len == 255, but the buffer char tmp[256] can safely hold 255 bytes + NUL. This contradicts the header doc ("max 255").

Current code:

if (len == 0 || len >= PARSE_FLOAT_BUF_SIZE - 1) {
    return -1;
}

Suggested fix:

-    if (len == 0 || len >= PARSE_FLOAT_BUF_SIZE - 1) {
+    if (len == 0 || len > PARSE_FLOAT_BUF_SIZE - 1) {
         return -1;
     }

🟡 Recommended Improvements

3. Strategy 3 strtod_c(): reset errno before retry

File: src/libAtomVM/unlocalized.c:326-337

When Strategy 3 retries strtod() with the locale separator, it doesn't reset errno. A failed first attempt can leak ERANGE into the retry result.

Current code:

    val = strtod(retry_buf, endp);

    /* Adjust endp back to the original buffer. */
    size_t consumed = (size_t) (*endp - retry_buf);

Suggested fix:

+    errno = 0;
     val = strtod(retry_buf, endp);

     /* Adjust endp back to the original buffer. */
     size_t consumed = (size_t) (*endp - retry_buf);

Also consider computing dot_pos once before the retry for clarity:

+    const char *dot = strchr(buf, '.');   /* guaranteed non-NULL by validation */
+    size_t dot_pos = (size_t)(dot - buf);
+
+    errno = 0;
     val = strtod(retry_buf, endp);

     /* Adjust endp back to the original buffer. */
     size_t consumed = (size_t) (*endp - retry_buf);
-    /* Account for the separator length difference. */
-    if (consumed > (size_t) (strchr(buf, '.') - buf)) {
+    if (consumed > dot_pos) {
         consumed -= (seplen - 1);
     }
     *endp = (char *) (buf + consumed);

4. retry_buf sizing uses magic number

File: src/libAtomVM/unlocalized.c:309-310

The comment mentions MB_LEN_MAX but the code uses 16. Consider using the actual constant:

-    char retry_buf[PARSE_FLOAT_BUF_SIZE + 16];
+    char retry_buf[PARSE_FLOAT_BUF_SIZE + MB_LEN_MAX];

5. float_cmp/2 in tests doesn't catch sign bugs

Files: tests/erlang_tests/bin2float.erl:96, tests/erlang_tests/list2float.erl:96

The test for "-1.0" compares against 1.0 using abs(), so a parser returning +1.0 instead of -1.0 would still pass:

F4 = bin2float(<<"-1.0">>),
...
float_cmp(F4, 1.0) * 8    %% abs(abs(-1.0) - 1.0) < eps → true ✓
                           %% abs(abs(+1.0) - 1.0) < eps → true ✓ (BUG NOT CAUGHT)

Suggested fix — compare against -1.0 directly and use a sign-aware comparison:

-        float_cmp(F4, ?MODULE:id(1.0)) * 8 +
+        float_cmp(F4, ?MODULE:id(-1.0)) * 8 +

And update float_cmp to be sign-aware:

float_cmp(F1, F2) ->
    case abs(F1 - F2) < 0.00000000001 of
        true -> 1;
        false -> 0
    end.

6. Tests don't validate locale-independence

The main purpose of this commit is locale-independent parsing, but no test runs under a non-C locale. Consider adding a C-level unit test that forces LC_NUMERIC to a comma-decimal locale (e.g., de_DE.UTF-8) and verifies correct parsing, or a conditional Erlang test.


7. Silent fallback when newlocale() fails

File: src/libAtomVM/unlocalized.c (strategies 1 & 2)

If newlocale() returns 0, the code silently falls back to locale-dependent strtod()/vsnprintf(), defeating the purpose. Consider at minimum logging a warning, or falling through to strategy 3 logic.


✅ Positive Observations

  • Strategy 3 endp pointer arithmetic is correct under the stated preconditions (validate_bare_float_format guarantees exactly one .).
  • validate_bare_float_format() correctly matches current Erlang/OTP binary_to_float/list_to_float semantics (decimal only, mandatory dot, etc.).
  • Thread safety is well-handled: pthread_once() for SMP, simple flag for non-SMP builds.
  • Test coverage is comprehensive — 7 valid cases and 23 badarg cases covering signs, exponents, whitespace, hex floats, and various malformed inputs.
  • Expected value arithmetic (15871 = 511 + 30 × 512) is correct.
  • Clean separation of validation from parsing is a good design.

@bettio bettio force-pushed the fix-to-float-functions branch 2 times, most recently from 375f10e to 36b9348 Compare March 29, 2026 15:59
@petermm
Copy link
Copy Markdown
Contributor

petermm commented Mar 29, 2026

Root cause: Test B10 expect_bin_badarg(<<"1,0">>) — OTP currently accepts "," as a decimal separator (binary_to_float(<<"1,0">>) → 1.0). The test expects badarg, losing exactly 512. The commit intentionally rejects commas (per erlang/otp#9061 future direction), but the CI runs on BEAM where commas still work.

@bettio bettio force-pushed the fix-to-float-functions branch 2 times, most recently from 21ca1c2 to eb4000b Compare March 30, 2026 12:06
bettio added 3 commits March 30, 2026 14:12
`binary_to_float(<<"1.0e999">>)` returned inf instead of raising
`badarg`. Add `isfinite()` check after `sscanf()` to reject non-finite
results, matching Erlang/OTP behavior.

Signed-off-by: Davide Bettio <davide@uninstall.it>
Replace sscanf-based float parsing in `binary_to_float`/`list_to_float`
with a proper implementation that validates strict bare float format
and uses locale-independent `strtod`/`strtof`.

Rename `unlocalized_snprintf.{c,h}` to `unlocalized.{c,h}` and add
`unlocalized_strtod()`, `unlocalized_strtof()`, and
`unlocalized_validate_bare_float_format()`. An inline dispatcher
`unlocalized_strto_avm_float()` selects the right variant based on
`AVM_USE_SINGLE_PRECISION`. The three locale strategies (`strtod_l` /
`uselocale` / fallback with separator retry) are shared via a
`prepare_locale_retry_buf()` helper.

Refactor `float_utils` to follow the same pattern: guard
`double_write_to_ascii_buf` and Grisu3 internals behind
`FLOAT_UTILS_ENABLE_DOUBLE_API`, and 32-bit support behind
`FLOAT_UTILS_ENABLE_FLOAT_API`, so no `double` code leaks into a
float-only build and vice versa. Rename `double_format_t` to
`float_format_t` since the enum belongs to the `float_utils` module,
not the `double` type.

The bare float format `[+|-]DIGITS.DIGITS[(e|E)[+|-]DIGITS]` matches
Erlang/OTP's `binary_to_float`/`list_to_float`, except commas are not
accepted as decimal separators (aligning with OTP's future
direction per erlang/otp#9061).

Signed-off-by: Davide Bettio <davide@uninstall.it>
binary_to_float and list_to_float share the same parsing code, so
there is no need for two near-identical test modules. Merge them
into a single string2float test that covers binary_to_float
exhaustively and adds a few basic list_to_float smoke tests.

Signed-off-by: Davide Bettio <davide@uninstall.it>
@bettio bettio force-pushed the fix-to-float-functions branch from eb4000b to c1d2e6c Compare March 30, 2026 12:13
Copy link
Copy Markdown
Contributor

@petermm petermm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amp is happy!

@bettio bettio merged commit 72d635d into atomvm:release-0.7 Mar 30, 2026
222 of 227 checks passed
bettio added a commit that referenced this pull request Mar 30, 2026
Merge fixes, features, and optimizations from release-0.7, including:
- Add signature-driven code loader (#2229)
- Add UART support on generic_unix via POSIX termios NIFs (#2243)
- Add erlang node/1 BIF (#2225)
- Add erts_internal:cmp_term/2 NIF and fix map type ordering (#2226)
- Add short option to float_to_binary/list (#2240)
- Add locale-independent float parsing and fix overflow (#2246)
- JIT: add RISC-V 64-bit backend (#2231)
- JIT: add DWARF debug information support (#1910)
- JIT x86: instruction encoding optimizations (#2234)
- JIT: remove redundant AND when untagging integers (#2235)
- Fix bug in bs_match get_tail handling (#2242)
- Fix binary encoding of int:24 and others (#2230)
- Fix cancel_timer/1 spec and documentation (#2244)
- Resurrect opcodes emitted with no_bs_create_bin (#2245)
- Fix flaky tests related to GitHub DNS Resolver (#2232)
- Add git guide (#2106)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants