This fall, the KernelBench team will continue to maintain and improve the repo. This issue serves as a roadmap and a living document that we will keep updating. If you have concrete feature requests, please post them below or, ideally, open an issue on the repo.

We have a fantastic group of Stanford undergrads, @AffectionateCurry @nathanjpaek @pythonomar22 @Marsella8, as core maintainers, with @ethanboneh on RL framework integration. We very much welcome community contributions in these directions (we will do our best to review the PRs). Thank you to @alexzhang13 @hqjenny for the feedback.

Goal & Motivation

KernelBench has quickly become the standard for evaluating LLM kernel-generation capabilities. As many in the community have pointed out, and as we found in our own follow-up work, there are aspects of the benchmark that could be improved to make it a more valuable tool for the community. We already started on this over the summer with KernelBench v0.1, by @AffectionateCurry @nataliakokoromyti @anneouyang.

Ultimately, we want to make KernelBench easy (push-button eval), usable (easy to integrate), and referenceable (comparable across various approaches).
Overall Milestones
Milestone 1: By October (the SF GPU MODE hackathon), resolve all outstanding PRs and issues (or at least give each one an answer)
Milestone 2: Integrations with community projects, both for future research directions (RL, evolutionary search, more languages) and to let people experiment with various approaches
Milestone 3: Create a Referenceable, Reproducible Pipeline
We hope to have an update/announcement by early December / NeurIPS.
Below are the concrete goals and (temporary) assignments. We will try our best to realize all of these features, but we make no guarantees. Community contributions are very welcome!
Milestone 1: Improve KernelBench itself
Collect community feedback on how we can improve KernelBench
Go through all the current issues & PRs on the KernelBench repo: reply to each issue, and make a decision on each PR (merge, close, or abandon)
Write up a benchmarking guide + blog post to explain and showcase the differences between do_bench, NCU profiling, and CUDA events (Benchmarking guide #106); a minimal timing sketch follows this list
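To make the comparison concrete, here is a minimal timing sketch, assuming a CUDA-capable GPU with PyTorch and Triton installed, and using a plain matmul as a stand-in for a generated kernel. It illustrates why the numbers can disagree, it is not the guide itself: `do_bench` warms up, flushes the L2 cache between repetitions, and aggregates many runs, while a single CUDA event pair times one (typically warm-cache) launch. NCU is different again: it is a command-line profiler (e.g. `ncu python bench.py`) that replays kernels and reports hardware counters.

```python
# Minimal sketch (assumes a CUDA GPU, PyTorch, and Triton installed).
# A plain matmul stands in for a generated kernel under evaluation.
import torch
import triton.testing

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

def workload():
    return a @ b

# Method 1: triton.testing.do_bench -- warms up, flushes the L2 cache
# between repetitions, and aggregates timings over many runs.
ms_do_bench = triton.testing.do_bench(workload)

# Method 2: one pair of CUDA events -- times a single launch, usually
# with a warm cache, so it can report a noticeably different number.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
workload()                      # warmup
torch.cuda.synchronize()
start.record()
workload()
end.record()
torch.cuda.synchronize()        # wait until both events are recorded
ms_events = start.elapsed_time(end)

print(f"do_bench: {ms_do_bench:.3f} ms | cuda events: {ms_events:.3f} ms")
```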
Milestone 2: Framework Integration
DSL support (for NVIDIA hardware).
Alternative hardware platform support.
RL and search framework integration. See #73 for details; a hedged reward-interface sketch follows this list.
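To give a flavor of what the RL integration could expose, here is a hedged sketch of a reward function over an eval result. `EvalResult` and the shaping below are our illustration, not an actual KernelBench API: the pipeline it assumes (compile, check correctness against the PyTorch reference, time both) matches the benchmark's structure, but every name here is hypothetical.

```python
# Hypothetical reward shaping for RL on kernel generation. None of these
# names are KernelBench APIs; EvalResult stands in for the output of a
# compile -> correctness-check -> timing pipeline.
from dataclasses import dataclass

@dataclass
class EvalResult:
    compiled: bool          # did the generated kernel build?
    correct: bool           # did it match the PyTorch reference output?
    ref_time_ms: float      # reference (eager PyTorch) runtime
    kernel_time_ms: float   # generated kernel runtime

def reward(result: EvalResult) -> float:
    """Zero reward unless the kernel compiles and is correct; then speedup."""
    if not (result.compiled and result.correct):
        return 0.0
    return result.ref_time_ms / result.kernel_time_ms
```

Gating the reward on correctness avoids rewarding fast-but-wrong kernels; frameworks mostly differ in how they clip or scale the speedup term.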
Milestone 3: Referenceable, Reproducible Pipeline
Make KernelBench an actual standard; led by @pythonomar22 @AffectionateCurry. A sketch of the kind of self-describing result record this requires follows below.
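One concrete ingredient of a referenceable, reproducible pipeline is a result record that pins down everything affecting timing. The schema below is purely illustrative; the field names and example values are ours, not the repo's actual output format.

```python
# Illustrative result record for a reproducible run; field names are
# hypothetical, not KernelBench's actual output format.
from dataclasses import dataclass, asdict
import json

@dataclass
class RunRecord:
    level: int                   # KernelBench problem level
    problem: str                 # problem identifier
    gpu: str                     # hardware the run was timed on
    torch_version: str
    cuda_version: str
    timing_method: str           # e.g. "do_bench", "cuda_events", "ncu"
    num_correctness_trials: int  # random-input trials vs. the reference
    correct: bool
    ref_time_ms: float
    kernel_time_ms: float
    seed: int

record = RunRecord(
    level=1, problem="<problem_name>", gpu="NVIDIA L40S",
    torch_version="2.5.0", cuda_version="12.4",
    timing_method="do_bench", num_correctness_trials=5,
    correct=True, ref_time_ms=1.20, kernel_time_ms=0.95, seed=0,
)
print(json.dumps(asdict(record), indent=2))  # ready to archive and compare
```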