Fix and rework binary_to_float/list_to_float by bettio · Pull Request #2246 · atomvm/AtomVM

bettio · 2026-03-28T23:13:33Z

Fix binary_to_float/1 and list_to_float/1 overflow bug and replace
the locale-dependent sscanf-based parser with strict, locale-
independent float parsing.

Fix binary_to_float(<<"1.0e999">>) returning inf instead of
raising badarg, matching Erlang/OTP behavior.
Replace sscanf with unlocalized_strtod()/unlocalized_strtof(),
which validate strict bare float format and use a 3-strategy
locale-independent approach (strtod_l/strtof_l / uselocale /
fallback with prepare_locale_retry_buf() helper).
Add unlocalized_strto_avm_float() inline dispatcher that selects
the right variant based on AVM_USE_SINGLE_PRECISION.
Rename unlocalized_snprintf.{c,h} to unlocalized.{c,h} and add
unlocalized_validate_bare_float_format(), unlocalized_strtod(),
and unlocalized_strtof() sharing the same cached C locale
infrastructure.
Guard double-only code behind UNLOCALIZED_ENABLE_DOUBLE_API /
FLOAT_UTILS_ENABLE_DOUBLE_API and float-only code behind
UNLOCALIZED_ENABLE_FLOAT_API / FLOAT_UTILS_ENABLE_FLOAT_API,
so no double code leaks into a float-only build and vice versa.
Rename double_format_t to float_format_t and DoubleFormat*
to FloatFormat* since the enum belongs to the float_utils
module, not the double type.
Reject inputs that sscanf wrongly accepted: hex floats, inf,
nan, whitespace, missing decimal point, missing digits around
dot, and commas as decimal separator (per binary_to_float doesn't correctly handle scientific notation with no decimal places specified, such as "1e0" erlang/otp#9061).
Merge bin2float and list2float test modules into a single
string2float test and expand coverage from 9 to 39 cases
including overflow, underflow, special values, format violations,
and edge cases.

These changes are made under both the "Apache 2.0" and the "GNU Lesser
General Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

petermm · 2026-03-29T11:40:09Z

usual caveats, I'm just the messenger;-)

PR Review: Add locale-independent float parsing

Commit: 65324dff — Add locale-independent float parsing
Author: Davide Bettio
Reviewed: 2026-03-29

Summary

Replaces sscanf-based float parsing in binary_to_float/list_to_float with a proper locale-independent implementation using a 3-strategy approach (strtod_l / uselocale / fallback with separator retry). Renames unlocalized_snprintf.{c,h} → unlocalized.{c,h}, adds unlocalized_strtod() and validate_bare_float_format(). Comprehensive tests added.

Overall assessment: Good direction — approve with changes. Two correctness bugs must be fixed before merge.

🔴 Must-Fix Issues

1. `unlocalized_strtod()` can reject valid finite subnormal numbers

File: src/libAtomVM/unlocalized.c:435-437

strtod() may set ERANGE for underflow to a non-zero subnormal, not just underflow to 0.0. This means valid finite float values (very small denormals) could be incorrectly rejected as badarg.

Current code:

/* ERANGE with val != 0 means overflow. ERANGE with val == 0 is
 * underflow to zero, which is allowed (matches OTP behavior). */
if (errno == ERANGE && val != 0.0) {
    return -1;
}

Suggested fix:

-    /* ERANGE with val != 0 means overflow. ERANGE with val == 0 is
-     * underflow to zero, which is allowed (matches OTP behavior). */
-    if (errno == ERANGE && val != 0.0) {
-        return -1;
-    }
+    /* isfinite() already rejects overflow (±inf). Underflow — whether
+     * flushed to zero or to a finite subnormal — is allowed, matching
+     * OTP behavior. No additional errno check needed. */

The isfinite() check above already catches overflow (±inf). Removing the ERANGE check lets valid subnormals through while still rejecting overflow.

Erlang/OTP verification — these all succeed on OTP but would raise badarg on AtomVM with the current code:

1> binary_to_float(<<"1.0e-320">>).   %% subnormal, strtod sets ERANGE
1.0e-320
2> binary_to_float(<<"5.0e-324">>).   %% smallest subnormal
5.0e-324
3> binary_to_float(<<"2.0e-308">>).   %% just below normal minimum
2.0e-308
4> erlang:list_to_float("1.0e-320").  %% same via list_to_float
1.0e-320

2. Off-by-one in max length check rejects valid 255-byte inputs

File: src/libAtomVM/unlocalized.c:411-412

With PARSE_FLOAT_BUF_SIZE == 256, the check rejects len == 255, but the buffer char tmp[256] can safely hold 255 bytes + NUL. This contradicts the header doc ("max 255").

Current code:

if (len == 0 || len >= PARSE_FLOAT_BUF_SIZE - 1) {
    return -1;
}

Suggested fix:

-    if (len == 0 || len >= PARSE_FLOAT_BUF_SIZE - 1) {
+    if (len == 0 || len > PARSE_FLOAT_BUF_SIZE - 1) {
         return -1;
     }

🟡 Recommended Improvements

3. Strategy 3 `strtod_c()`: reset `errno` before retry

File: src/libAtomVM/unlocalized.c:326-337

When Strategy 3 retries strtod() with the locale separator, it doesn't reset errno. A failed first attempt can leak ERANGE into the retry result.

Current code:

    val = strtod(retry_buf, endp);

    /* Adjust endp back to the original buffer. */
    size_t consumed = (size_t) (*endp - retry_buf);

Suggested fix:

+    errno = 0;
     val = strtod(retry_buf, endp);

     /* Adjust endp back to the original buffer. */
     size_t consumed = (size_t) (*endp - retry_buf);

Also consider computing dot_pos once before the retry for clarity:

+    const char *dot = strchr(buf, '.');   /* guaranteed non-NULL by validation */
+    size_t dot_pos = (size_t)(dot - buf);
+
+    errno = 0;
     val = strtod(retry_buf, endp);

     /* Adjust endp back to the original buffer. */
     size_t consumed = (size_t) (*endp - retry_buf);
-    /* Account for the separator length difference. */
-    if (consumed > (size_t) (strchr(buf, '.') - buf)) {
+    if (consumed > dot_pos) {
         consumed -= (seplen - 1);
     }
     *endp = (char *) (buf + consumed);

4. `retry_buf` sizing uses magic number

File: src/libAtomVM/unlocalized.c:309-310

The comment mentions MB_LEN_MAX but the code uses 16. Consider using the actual constant:

-    char retry_buf[PARSE_FLOAT_BUF_SIZE + 16];
+    char retry_buf[PARSE_FLOAT_BUF_SIZE + MB_LEN_MAX];

5. `float_cmp/2` in tests doesn't catch sign bugs

Files: tests/erlang_tests/bin2float.erl:96, tests/erlang_tests/list2float.erl:96

The test for "-1.0" compares against 1.0 using abs(), so a parser returning +1.0 instead of -1.0 would still pass:

F4 = bin2float(<<"-1.0">>),
...
float_cmp(F4, 1.0) * 8    %% abs(abs(-1.0) - 1.0) < eps → true ✓
                           %% abs(abs(+1.0) - 1.0) < eps → true ✓ (BUG NOT CAUGHT)

Suggested fix — compare against -1.0 directly and use a sign-aware comparison:

-        float_cmp(F4, ?MODULE:id(1.0)) * 8 +
+        float_cmp(F4, ?MODULE:id(-1.0)) * 8 +

And update float_cmp to be sign-aware:

float_cmp(F1, F2) ->
    case abs(F1 - F2) < 0.00000000001 of
        true -> 1;
        false -> 0
    end.

6. Tests don't validate locale-independence

The main purpose of this commit is locale-independent parsing, but no test runs under a non-C locale. Consider adding a C-level unit test that forces LC_NUMERIC to a comma-decimal locale (e.g., de_DE.UTF-8) and verifies correct parsing, or a conditional Erlang test.

7. Silent fallback when `newlocale()` fails

File: src/libAtomVM/unlocalized.c (strategies 1 & 2)

If newlocale() returns 0, the code silently falls back to locale-dependent strtod()/vsnprintf(), defeating the purpose. Consider at minimum logging a warning, or falling through to strategy 3 logic.

✅ Positive Observations

Strategy 3 endp pointer arithmetic is correct under the stated preconditions (validate_bare_float_format guarantees exactly one .).
validate_bare_float_format() correctly matches current Erlang/OTP binary_to_float/list_to_float semantics (decimal only, mandatory dot, etc.).
Thread safety is well-handled: pthread_once() for SMP, simple flag for non-SMP builds.
Test coverage is comprehensive — 7 valid cases and 23 badarg cases covering signs, exponents, whitespace, hex floats, and various malformed inputs.
Expected value arithmetic (15871 = 511 + 30 × 512) is correct.
Clean separation of validation from parsing is a good design.

petermm · 2026-03-29T16:16:23Z

Root cause: Test B10 expect_bin_badarg(<<"1,0">>) — OTP currently accepts "," as a decimal separator (binary_to_float(<<"1,0">>) → 1.0). The test expects badarg, losing exactly 512. The commit intentionally rejects commas (per erlang/otp#9061 future direction), but the CI runs on BEAM where commas still work.

`binary_to_float(<<"1.0e999">>)` returned inf instead of raising `badarg`. Add `isfinite()` check after `sscanf()` to reject non-finite results, matching Erlang/OTP behavior. Signed-off-by: Davide Bettio <davide@uninstall.it>

Replace sscanf-based float parsing in `binary_to_float`/`list_to_float` with a proper implementation that validates strict bare float format and uses locale-independent `strtod`/`strtof`. Rename `unlocalized_snprintf.{c,h}` to `unlocalized.{c,h}` and add `unlocalized_strtod()`, `unlocalized_strtof()`, and `unlocalized_validate_bare_float_format()`. An inline dispatcher `unlocalized_strto_avm_float()` selects the right variant based on `AVM_USE_SINGLE_PRECISION`. The three locale strategies (`strtod_l` / `uselocale` / fallback with separator retry) are shared via a `prepare_locale_retry_buf()` helper. Refactor `float_utils` to follow the same pattern: guard `double_write_to_ascii_buf` and Grisu3 internals behind `FLOAT_UTILS_ENABLE_DOUBLE_API`, and 32-bit support behind `FLOAT_UTILS_ENABLE_FLOAT_API`, so no `double` code leaks into a float-only build and vice versa. Rename `double_format_t` to `float_format_t` since the enum belongs to the `float_utils` module, not the `double` type. The bare float format `[+|-]DIGITS.DIGITS[(e|E)[+|-]DIGITS]` matches Erlang/OTP's `binary_to_float`/`list_to_float`, except commas are not accepted as decimal separators (aligning with OTP's future direction per erlang/otp#9061). Signed-off-by: Davide Bettio <davide@uninstall.it>

binary_to_float and list_to_float share the same parsing code, so there is no need for two near-identical test modules. Merge them into a single string2float test that covers binary_to_float exhaustively and adds a few basic list_to_float smoke tests. Signed-off-by: Davide Bettio <davide@uninstall.it>

petermm

Amp is happy!

Merge fixes, features, and optimizations from release-0.7, including: - Add signature-driven code loader (#2229) - Add UART support on generic_unix via POSIX termios NIFs (#2243) - Add erlang node/1 BIF (#2225) - Add erts_internal:cmp_term/2 NIF and fix map type ordering (#2226) - Add short option to float_to_binary/list (#2240) - Add locale-independent float parsing and fix overflow (#2246) - JIT: add RISC-V 64-bit backend (#2231) - JIT: add DWARF debug information support (#1910) - JIT x86: instruction encoding optimizations (#2234) - JIT: remove redundant AND when untagging integers (#2235) - Fix bug in bs_match get_tail handling (#2242) - Fix binary encoding of int:24 and others (#2230) - Fix cancel_timer/1 spec and documentation (#2244) - Resurrect opcodes emitted with no_bs_create_bin (#2245) - Fix flaky tests related to GitHub DNS Resolver (#2232) - Add git guide (#2106)

bettio force-pushed the fix-to-float-functions branch 2 times, most recently from 375f10e to 36b9348 Compare March 29, 2026 15:59

bettio force-pushed the fix-to-float-functions branch 2 times, most recently from 21ca1c2 to eb4000b Compare March 30, 2026 12:06

bettio added 3 commits March 30, 2026 14:12

Fix binary_to_float returning inf for overflow

bbf9544

`binary_to_float(<<"1.0e999">>)` returned inf instead of raising `badarg`. Add `isfinite()` check after `sscanf()` to reject non-finite results, matching Erlang/OTP behavior. Signed-off-by: Davide Bettio <davide@uninstall.it>

bettio force-pushed the fix-to-float-functions branch from eb4000b to c1d2e6c Compare March 30, 2026 12:13

petermm approved these changes Mar 30, 2026

View reviewed changes

bettio merged commit 72d635d into atomvm:release-0.7 Mar 30, 2026
222 of 227 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix and rework binary_to_float/list_to_float#2246

Fix and rework binary_to_float/list_to_float#2246
bettio merged 3 commits intoatomvm:release-0.7from
bettio:fix-to-float-functions

bettio commented Mar 28, 2026 •

edited

Loading

Uh oh!

petermm commented Mar 29, 2026 •

edited

Loading

Uh oh!

petermm commented Mar 29, 2026

Uh oh!

petermm left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bettio commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

petermm commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Add locale-independent float parsing

Summary

🔴 Must-Fix Issues

1. unlocalized_strtod() can reject valid finite subnormal numbers

2. Off-by-one in max length check rejects valid 255-byte inputs

🟡 Recommended Improvements

3. Strategy 3 strtod_c(): reset errno before retry

4. retry_buf sizing uses magic number

5. float_cmp/2 in tests doesn't catch sign bugs

6. Tests don't validate locale-independence

7. Silent fallback when newlocale() fails

✅ Positive Observations

Uh oh!

petermm commented Mar 29, 2026

Uh oh!

petermm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bettio commented Mar 28, 2026 •

edited

Loading

petermm commented Mar 29, 2026 •

edited

Loading

1. `unlocalized_strtod()` can reject valid finite subnormal numbers

3. Strategy 3 `strtod_c()`: reset `errno` before retry

4. `retry_buf` sizing uses magic number

5. `float_cmp/2` in tests doesn't catch sign bugs

7. Silent fallback when `newlocale()` fails