Releases: CSHVienna/LLMScholarBench
v2.0.0-rc.1
Submission artifact for KDD 2026 (Datasets and Benchmarks Track). This release corresponds to the code and plotting scripts used in the submitted manuscript and may change after rebuttal and acceptance.
v1.0.0
This release provides the full codebase for systematically querying large language models and evaluating their outputs in the context of scholar recommendation. It includes the LLMCaller module for structured, reproducible LLM querying and the Auditor module for systematic evaluation of model responses. The evaluation measures consistency and factuality, and provides a descriptive analysis of biases with respect to gender, ethnicity, and popularity.
This release corresponds to the paper "Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations" (arXiv:2506.00074) and reflects the methods and experiments reported there.
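
As an illustration, here is a minimal sketch of how the two modules might be combined. The module names LLMCaller and Auditor come from this release, but the import path, constructor arguments, and method names below are hypothetical assumptions; check the repository's documentation for the actual API.

```python
# Hypothetical usage sketch -- LLMCaller and Auditor are named in this release,
# but the import path, arguments, and methods shown here are assumptions,
# not the repository's documented API.
from llmscholarbench import LLMCaller, Auditor  # hypothetical import path

# Query a model repeatedly with the same structured prompt so that
# response consistency can be measured downstream.
caller = LLMCaller(model="gpt-4o", temperature=0.0)  # hypothetical parameters
responses = [
    caller.query("List the five most influential scholars in network science.")
    for _ in range(10)
]

# Audit the collected responses: consistency across repeated queries,
# factuality of the named scholars, and descriptive bias statistics
# with respect to gender, ethnicity, and popularity.
auditor = Auditor(responses)
report = auditor.evaluate(dimensions=["consistency", "factuality", "bias"])
print(report)
```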