Document NaN Comparison #1233

@ntjohnson1

Describe the bug
Under IEEE 754, NaN < <float> and NaN > <float> should both yield false.
pyarrow.compute behaves this way, but datafusion returns true for NaN > 1.0.

To Reproduce
I reproduced this on datafusion versions 48 and 49.

import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
import datafusion as dfn
from datafusion import col


def py_arrow_example() -> None:
    nan = pa.array([np.nan], type=pa.float64())
    not_nan = pa.array([1.0], type=pa.float64())
    print(f"Less: {pc.less(nan, not_nan)}\nGreater: {pc.greater(nan, not_nan)}")


def datafusion_example() -> None:
    table = pa.table({"a": [np.nan], "b": [1.0]})
    ctx = dfn.SessionContext()
    df = ctx.from_arrow(table)
    result = df.select(
        (col("a") < col("b")).alias("less"),
        (col("a") > col("b")).alias("greater"),
    )
    print(result)


if __name__ == "__main__":
    py_arrow_example()
    datafusion_example()

Output

Less: [
  false
]
Greater: [
  false
]
DataFrame()
+-------+---------+
| less  | greater |
+-------+---------+
| false | true    |
+-------+---------+

Expected behavior
Both comparisons should return false, or the documentation should clearly describe how NaN is handled in comparisons.
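
One possible explanation, which I have not confirmed against the datafusion source: the comparison may follow the IEEE 754 totalOrder predicate, which places a positive NaN above every other value, instead of the usual IEEE 754 comparison, where any ordered comparison involving NaN is false. The sketch below is plain Python and does not call datafusion. The helper names total_order_key and compare_both_ways are hypothetical and exist only for this illustration.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a float64 to an int whose ordering follows IEEE 754 totalOrder."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # For values with the sign bit set, flip the remaining 63 bits so that
    # a larger magnitude sorts lower (e.g. -2.0 sorts below -1.0).
    if bits < 0:
        bits ^= 0x7FFF_FFFF_FFFF_FFFF
    return bits


def compare_both_ways(a: float, b: float) -> dict:
    """Return (less, greater) under IEEE 754 comparison and under total order."""
    return {
        # Python's float operators use the ordinary IEEE 754 comparison.
        "ieee": (a < b, a > b),
        # Assumption: this mimics an engine that compares floats by total order.
        "total_order": (
            total_order_key(a) < total_order_key(b),
            total_order_key(a) > total_order_key(b),
        ),
    }


if __name__ == "__main__":
    print(compare_both_ways(math.nan, 1.0))
```

With a positive NaN on the left, the "ieee" pair is false for both comparisons, matching the pyarrow output above, while the "total_order" pair is false for less and true for greater, matching the datafusion output above. If that is the cause, documenting the ordering would resolve the confusion.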

Labels: bug
