We would like to know under which license this dataset is released.
Additionally, we would like to know if we could host the data files at the Hugging Face Hub:
- currently the data files are only hosted at archive.org
- we would like to change their format from JSON to JSON-lines, so that the files can be loaded incrementally (some of them are large and need a lot of RAM)
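
To illustrate the format change we have in mind, here is a minimal sketch using only the Python standard library. It assumes each data file holds a single top-level JSON array of records, which we have not verified for every file. The function names `json_to_jsonl` and `iter_jsonl` are hypothetical, chosen only for this example.

```python
import json
from typing import Any, Iterator


def json_to_jsonl(src_path: str, dst_path: str) -> int:
    """Convert a JSON file to JSON-lines and return the number of records.

    Assumption: the source file contains one top-level JSON array.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)  # one-time full load, done only at conversion
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # One compact JSON document per line
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield records one at a time, keeping only one line in memory."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)
```

The conversion itself still needs enough RAM to load the original file once, but afterwards consumers can iterate over `iter_jsonl(path)` without loading the whole file.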