Describe the bug
pip recently switched to installing datafusion with version string '35.0.0'. Compared to a previous installation of version '34.0.0', creating an external table from hive-partitioned Parquet data following the [documented instructions](https://arrow.apache.org/datafusion/user-guide/sql/ddl.html) no longer works: the partition columns show up as columns of the table, but the columns from the Parquet files themselves do not appear.
To Reproduce
# prepare fake data
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
table = pa.Table.from_pandas(data)
os.mkdir("fake=0")
pq.write_table(table, "./fake=0/data.parquet")
# load into datafusion
import datafusion as df
ctx = df.SessionContext()
ctx.sql("""
CREATE EXTERNAL TABLE data
STORED AS PARQUET
PARTITIONED BY (fake)
LOCATION './*/data.parquet'
""")
The loaded data is missing col1 and col2:
>>> ctx.sql("SELECT * FROM data")
DataFrame()
+------+
| fake |
+------+
| 0 |
| 0 |
+------+
>>> ctx.sql("SELECT table_name, column_name FROM information_schema.columns")
DataFrame()
+------------+-------------+
| table_name | column_name |
+------------+-------------+
| data | fake |
+------------+-------------+
Expected behavior
The same steps with DataFusion 34.0.0 produce the following output:
>>> ctx.sql("SELECT * FROM data")
DataFrame()
+------+------+------+
| col1 | col2 | fake |
+------+------+------+
| 1 | 3 | 0 |
| 2 | 4 | 0 |
+------+------+------+
>>> ctx.sql("SELECT table_name, column_name FROM information_schema.columns")
DataFrame()
+------------+-------------+
| table_name | column_name |
+------------+-------------+
| data | col1 |
| data | col2 |
| data | fake |
+------------+-------------+
Additional context
Operating system: Rocky 8
Python version: 3.10.11
DataFusion version: 35.0.0, recently installed via pip
pyarrow version: 15.0.0