Describe the bug
pip recently switched to installing datafusion with version string '35.0.0'. Compared to a previous installation of version '34.0.0', creating an external table from hive-partitioned Parquet data following the [documented instructions](https://arrow.apache.org/datafusion/user-guide/sql/ddl.html) no longer works: the partition columns show up as columns of the table, but the columns from the Parquet files themselves do not appear.
To Reproduce
# prepare fake data
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
table = pa.Table.from_pandas(data)
os.mkdir("fake=0")
pq.write_table(table, "./fake=0/data.parquet")
# load into datafusion
import datafusion as df
ctx = df.SessionContext()
ctx.sql("""
CREATE EXTERNAL TABLE data
STORED AS PARQUET
PARTITIONED BY (fake)
LOCATION './*/data.parquet'
""")
The loaded data is missing col1 and col2:
>>> ctx.sql("SELECT * FROM data")
DataFrame()
+------+
| fake |
+------+
| 0 |
| 0 |
+------+
>>> ctx.sql("SELECT table_name, column_name FROM information_schema.columns")
DataFrame()
+------------+-------------+
| table_name | column_name |
+------------+-------------+
| data | fake |
+------------+-------------+
Expected behavior
The same steps with DataFusion 34.0.0 produce the following output:
>>> ctx.sql("SELECT * FROM data")
DataFrame()
+------+------+------+
| col1 | col2 | fake |
+------+------+------+
| 1 | 3 | 0 |
| 2 | 4 | 0 |
+------+------+------+
>>> ctx.sql("SELECT table_name, column_name FROM information_schema.columns")
DataFrame()
+------------+-------------+
| table_name | column_name |
+------------+-------------+
| data | col1 |
| data | col2 |
| data | fake |
+------------+-------------+
Additional context
Operating system: Rocky 8
Python version: 3.10.11
DataFusion version: 35.0.0, recently installed via pip
pyarrow version: 15.0.0