Temporal Generalization: A Reality Check

Authors: Divyam Madaan, Sumit Chopra, Kyunghyun Cho

TL;DR: The key to temporal generalization is not to design new algorithms, but to identify the reasonable assumptions about how the data generating process evolves over time.

Abstract

Machine learning (ML) models often struggle to maintain performance under distribution shifts, leading to inaccurate predictions on unseen future data. In this work, we investigate whether and under what conditions models can achieve such generalization when relying solely on past data. We explore two primary approaches: convex combinations of past model parameters (parameter interpolation) and explicit extrapolation beyond the convex hull of past parameters (parameter extrapolation). We benchmark several methods within these categories on a diverse set of temporal tasks, including language modeling, news summarization, news tag prediction, academic paper categorization, satellite image-based land use classification over time, and historical yearbook photo gender prediction. Our empirical findings show that none of the evaluated methods consistently outperforms the simple baseline of using the latest available model parameters in all scenarios. In the absence of access to future data or robust assumptions about the underlying data-generating process, these results underscore the inherent difficulties of generalizing and extrapolating to future data and warrant caution when evaluating claims of such generalization.

(Left) Performance degrades over time. The widening gap between a stale model trained once in January 2012 (red) and a model updated monthly (green) illustrates this decay. Evaluation uses data from March 2012 onward. (Right) Temporal generalization framework. At each sequential learning stage ($S_t$ on the y-axis), a model $f_{\theta_t}$ is initialized with $\theta_{t-1}$ from the previous stage and trained on data $\mathcal{D}_t$ (orange). This yields a sequence of historical parameters, which is used to estimate future parameters $\widetilde{\theta}_{t+\delta}$. The estimated parameters are then evaluated on future data $\mathcal{D}_{t+\delta}$ (blue) for $\delta > 0$ (x-axis).
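To make the two families from the abstract concrete, here is a minimal sketch, not the repository's implementation. It treats each historical parameter set as a flat list of floats. The function names `interpolate` and `extrapolate_linear`, the toy values, and the choice of a first-order (linear) extrapolation rule are all assumptions made for illustration; the methods actually benchmarked live in the repository code.

```python
from typing import List, Sequence

Vector = List[float]


def interpolate(history: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Convex combination of past parameters (stays inside their convex hull)."""
    if len(history) != len(weights):
        raise ValueError("need exactly one weight per historical parameter vector")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(history[0])
    return [sum(w * theta[i] for w, theta in zip(weights, history)) for i in range(dim)]


def extrapolate_linear(history: Sequence[Vector], delta: int = 1) -> Vector:
    """Hypothetical first-order rule: repeat the most recent parameter step delta times."""
    if len(history) < 2:
        raise ValueError("need at least two historical parameter vectors")
    prev, last = history[-2], history[-1]
    return [l + delta * (l - p) for p, l in zip(prev, last)]


# Toy history theta_1, theta_2, theta_3 (illustrative values only).
history = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0]]

latest = history[-1]                                    # the "latest parameters" baseline
averaged = interpolate(history, [0.25, 0.25, 0.5])      # inside the convex hull
extrapolated = extrapolate_linear(history, delta=1)     # outside the convex hull
```

Putting all the weight on the final stage makes `interpolate` return the latest parameters, so the baseline from the abstract is a special case of interpolation. Extrapolation, by contrast, can leave the region spanned by past parameters.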

🚀 Getting Started

Prerequisites

To set up the environment and install the necessary dependencies, run the following command:

```
uv venv --python 3.11 && uv pip install -r requirements.txt
```

📚 Datasets Overview and Instructions

1. NewsRoom

The NewsRoom dataset is used for news summarization and language modeling.

Download: Obtain the dataset from the official Newsroom website.
Processing: After downloading, process the dataset with the create_monthly_dataset.py script.
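What create_monthly_dataset.py actually does is defined by the script itself. As a rough illustration of what a monthly split involves, the sketch below groups records into `YYYY-MM` buckets. The record layout, the `date` field name, the ISO date format, and the helper `group_by_month` are hypothetical and may not match the NewsRoom files or the repository script.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List


def group_by_month(records: Iterable[dict], date_field: str = "date") -> Dict[str, List[dict]]:
    """Bucket records by calendar month, keyed as 'YYYY-MM' in chronological order."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        # Assumes ISO-formatted dates such as "2012-01-15" (an illustrative choice).
        stamp = datetime.fromisoformat(record[date_field])
        buckets[stamp.strftime("%Y-%m")].append(record)
    return dict(sorted(buckets.items()))


# Hypothetical records; real NewsRoom entries carry more fields.
records = [
    {"date": "2012-01-15", "summary": "a"},
    {"date": "2012-02-03", "summary": "b"},
    {"date": "2012-01-30", "summary": "c"},
]
monthly = group_by_month(records)
```

Each bucket could then serve as the data for one learning stage, in line with the monthly fine-tuning described under Model Training and Evaluation.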

2. Wild-Time

Installation: Install the wildtime package using pip:
```
pip install wildtime==1.1.3
```
Dataset Access: The Wild-Time datasets can be downloaded and prepared using the scripts provided in the wildtime package.

Model Training and Evaluation

1. NewsRoom

Training: From the project root, run time_vectors/experiment_scripts/finetune_month_models.sh (monthly fine-tuning on NewsRoom splits):
```
cd time_vectors/experiment_scripts
bash finetune_month_models.sh
```
Evaluation: After training, run time_vectors/experiment_scripts/run_eval_month_summ.sh:
```
cd time_vectors/experiment_scripts
bash run_eval_month_summ.sh
```

2. Wild-Time Datasets

Each dataset has a script in wilds_time/eval-stream/ that trains every method with multiple seeds and evaluates parameter aggregation strategies:

```
cd wilds_time

bash eval-stream/yearbook.sh
bash eval-stream/huffpost.sh
bash eval-stream/fmow.sh
bash eval-stream/arxiv.sh
```

Contributing

We'd love to accept your contributions to this project. Please feel free to open an issue, or submit a pull request as necessary. If you have implementations of this repository in other ML frameworks, please reach out so we may highlight them here.

Acknowledgements

The code is based on Time vectors and Wild-Time. We thank the authors for their amazing work and for releasing their code.

License

This codebase is released under the MIT License.

📌 Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

```
@inproceedings{
madaan2026temporal,
title={Temporal Generalization: A Reality Check},
author={Divyam Madaan and Sumit Chopra and Kyunghyun Cho},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=Wz0ILlbh9U}
}
```
