
Efficient uniform int-to-float conversion#93

Merged
joshbainbridge merged 1 commit into AcademySoftwareFoundation:main from
mr-matthew-jones:integer-float-conversion
Apr 20, 2026

Conversation

@mr-matthew-jones
Contributor

Conversion from an integer value across the full range of representable values to floating point values within the range of [0, 1) is a key part of QMC algorithms, as most calculations are done using integer arithmetic, but the resulting output often needs to be floating point.

The current implementation uses a standard conversion to float followed by a division by 2^32. However, the operation uses the default rounding mode (round to nearest) and therefore may round either up or down. This produces uneven probabilities across the final distribution of values within the representable range. Rounding up also means a value of exactly 1 may be generated. Due to this, a min operation is used to clamp all values to the last representable number before 1, which also adds bias.
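For illustration, the biased conversion described above might look like the following sketch (the function name `naive_uniform_float` is an assumption, not code from this repository):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the conversion described above: convert with the default
// round-to-nearest mode, divide by 2^32, then clamp below 1.
float naive_uniform_float(uint32_t x) {
    // float(0xFFFFFFFFu) rounds *up* to 2^32, so the product reaches
    // exactly 1.0f and must be clamped, which biases the top of the range.
    return std::min(float(x) * 0x1p-32f, 0x1.fffffep-1f);
}
```

Here `0x1.fffffep-1f` is the largest float below 1 (0.99999994f); without the `std::min`, an input of `0xFFFFFFFFu` would produce exactly 1.0f.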

The high-quality mapping presented in 'Quasi-Monte Carlo Algorithms (not only) for Graphics Software' by Keller, Wächter, and Binder provides an optimal distribution, but it is computationally more expensive and often costs more than the creation of the QMC point.

This patch implements a new method that is simpler and more efficient, while providing identical results to the Keller et al. method. A simple bitwise shift and mask operation is applied to the input integer to ensure that the value is rounded down. This guarantees that the probability of each output is equal to the density of float representations, and constant in each power of two. For example, every float in [0.5, 1.0) has a 2^-24 probability and is produced by exactly 256 input values.

(Issue 84)
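A minimal sketch of the rounded-down mapping described above, assuming one plausible realization (keep the top 24 bits so the conversion to float is exact, then scale by 2^-24; the name `uniform_float` is hypothetical):

```cpp
#include <cstdint>

// Truncate to the top 24 bits before converting: a 24-bit integer is
// exactly representable as a float, and multiplying by the power of two
// 2^-24 is also exact, so the result is always rounded down.
float uniform_float(uint32_t x) {
    return float(x >> 8) * 0x1p-24f;  // exact in both steps
}
```

With this sketch, `uniform_float(0u)` is 0.0f, `uniform_float(0x80000000u)` is 0.5f, and `uniform_float(0xFFFFFFFFu)` is the largest float below 1, with no clamp needed; each output in [0.5, 1.0) is hit by exactly 256 of the 2^32 inputs.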

@joshbainbridge
Collaborator

Thank you @mr-matthew-jones. This is great work. Just tagging #84 here so that GH knows to link that ticket.

Collaborator

@joshbainbridge joshbainbridge left a comment


This is such a great write up of the method. Thank you for both the work to find it, and then explain the reasoning so clearly.

If we get consensus that this is the right method, I'd be happy for this to be merged. My only comment was on the unit test cost.

Comment thread src/tests/float.cpp
@joshbainbridge
Collaborator

joshbainbridge commented Apr 13, 2026

One other comment, it looks like the commit is currently DCO signed-off, thank you for that, but not cryptographically signed with a GPG key. We technically need both.

Details on signing are here and here. Let me know if you need a hand setting that up, happy to help in any way possible!

Comment thread src/tests/float.cpp Outdated
@toxieainc

@mr-matthew-jones Awesome observation/investigation!!

Tiny sidenote that on some platforms (e.g. CUDA), rounding is encoded into (most of) the math opcodes themselves, so there the round down version could be even more efficient (most likely will not matter in practice though).

@mr-matthew-jones mr-matthew-jones force-pushed the integer-float-conversion branch 2 times, most recently from 0e2b9c9 to 7b0922a on April 14, 2026 at 10:28
@mr-matthew-jones mr-matthew-jones marked this pull request as ready for review on April 14, 2026 at 10:31
@mr-matthew-jones mr-matthew-jones force-pushed the integer-float-conversion branch 3 times, most recently from 77aa9cd to bf7a9e6 on April 14, 2026 at 14:21
@toxieainc

toxieainc commented Apr 15, 2026

Tried out the original idea of rounding down in CUDA, and yes,
__uint2float_rz/rd(value) * floatOneOverTwoPow32 does do the same (as expected). And saves some opcodes. ;)
(btw: Same 2 implementations also work fine when dealing with 64bit in/outputs (also as expected, but just wanted to confirm))

@joshbainbridge
Collaborator

Tried out the original idea of rounding down in CUDA, and yes, __uint2float_rz/rd(value) * floatOneOverTwoPow32 does do the same (as expected). And saves some opcodes. ;) (btw: Same 2 implementations also work fine when dealing with 64bit in/outputs (also as expected, but just wanted to confirm))

Thanks @toxieainc! Do you think we should have an ifdef here to call __uint2float_rd directly, or is that not worth it / would NVCC be smart enough to do that for us?

@toxieainc

I don't think the compiler will do a transform from some integer bit fiddling operations to an intrinsic. That sounds like too much effort to analyze.

@joshbainbridge
Collaborator

I don't think the compiler will do a transform from some integer bit fiddling operations to an intrinsic. That sounds like too much effort to analyze.

Okay cool. Thinking about this more, there's probably a few places we could add some CUDA intrinsics. I think it would be best to get this merged as is. I can then look at doing another pass and possibly push up a separate PR with CUDA specifics.
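For illustration, such a CUDA-specific guard could hypothetically look like the sketch below (the function name and the exact fallback are assumptions, not code from this PR; `__uint2float_rd` is the CUDA round-down conversion intrinsic mentioned above):

```cpp
#include <cstdint>

// Hypothetical sketch: use the hardware round-down conversion on CUDA
// device code, and a shift-based exact conversion elsewhere. Both map
// the full uint32_t range into [0, 1) with rounding toward zero.
inline float uniform_float(uint32_t x) {
#if defined(__CUDA_ARCH__)
    // Rounding mode encoded directly in the conversion opcode.
    return __uint2float_rd(x) * 0x1p-32f;
#else
    return float(x >> 8) * 0x1p-24f;  // exact: 24-bit value times 2^-24
#endif
}
```

On the host path shown here, `uniform_float(0xFFFFFFFFu)` stays strictly below 1 without any clamp.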

@joshbainbridge
Collaborator

@mr-matthew-jones I think we are good to go with this! Thank you for addressing the feedback. Can I just ask that you add a note to the CHANGELOG.md file, and then we can get this into main.

@mr-matthew-jones mr-matthew-jones force-pushed the integer-float-conversion branch from bf7a9e6 to 1500a33 on April 20, 2026 at 13:56
Conversion from an integer value across the full range of representable
values to floating point values within the range of [0, 1) is a key
part of QMC algorithms, as most calculations are done using integer
arithmetic, but the resulting output often needs to be floating point.

The current implementation uses a standard conversion to float followed
by a division by 2^32. However, the operation uses the default rounding
mode (round to nearest) and therefore may round either up or down. This
produces uneven probabilities across the final distribution of values
within the representable range. Rounding up also means a value of
exactly 1 may be generated. Due to this, a min operation is used to
clamp all values to the last representable number before 1, which also
adds bias.

The high-quality mapping presented in 'Quasi-Monte Carlo Algorithms
(not only) for Graphics Software' by Keller, Wächter, and Binder
provides an optimal distribution, but it is computationally
more expensive and often costs more than the creation of the
QMC point.

This patch implements a new method that is simpler and more efficient,
while providing identical results to the Keller et al method. A simple
bitwise shift and mask operation is applied to the input integer to
ensure that the value is rounded down. This guarantees that the
probability of each output is equal to the density of float
representations, and constant in each power of two. For example,
every float in [0.5, 1.0) has a 2^-24 probability and is produced
by exactly 256 input values.

(Issue 84)

Signed-off-by: Matthew Jones <mrmatthewjones@icloud.com>
@mr-matthew-jones mr-matthew-jones force-pushed the integer-float-conversion branch from 1500a33 to 27a61d2 on April 20, 2026 at 14:17
@joshbainbridge joshbainbridge merged commit 21b65e8 into AcademySoftwareFoundation:main Apr 20, 2026
50 checks passed


Development

Successfully merging this pull request may close these issues.

Add efficient method for precise integer to floating point conversion

4 participants