
evals metadata in builtin JUnit reporter #22

Open
giovanni-guidini wants to merge 1 commit into main from gio/explore-junit-report-annotations

Conversation

@giovanni-guidini

These changes annotate the `testTask` in a way that the built-in JUnit reporter can expose the evals metadata, similar to what is done for the JSON reporter.

The formatting is discussed at length [here](https://www.notion.so/sentry/Evals-Schema-for-JUnit-XML-2098b10e4b5d80609d62f5beffc3de26?source=copy_link). JUnit is a very pervasive format, well established in the industry.

As an example, generating the JUnit report for the tests in this repo, we get the following for one of the tests:

```xml
<testcase classname="src/ai-sdk-integration.test.ts" name="Tool Argument Validation &gt; What&apos;s the weather in Seattle in Celsius?" time="0.004360291">
  <properties>
    <property name="evals.scores.score_0.value" value="1" />
    <property name="evals.scores.score_0.type" value="float" />
    <property name="evals.scores.score_0.metadata.rationale" value="All expected tools were called" />
    <property name="evals.toolCalls.0.id" value="call_9999" />
    <property name="evals.toolCalls.0.name" value="getWeather" />
    <property name="evals.toolCalls.0.arguments.location" value="Seattle" />
    <property name="evals.toolCalls.0.arguments.units" value="celsius" />
    <property name="evals.toolCalls.0.result.temperature" value="18" />
    <property name="evals.toolCalls.0.result.condition" value="partly cloudy" />
    <property name="evals.toolCalls.0.status" value="completed" />
    <property name="evals.toolCalls.0.type" value="function" />
  </properties>
</testcase>
```
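The dotted property names above come from flattening the nested evals data (scores, tool calls) into `evals.`-prefixed paths, since JUnit properties are flat string key/value pairs. Here is a minimal sketch of that flattening in TypeScript; the helper name `flattenEvals` is illustrative, not the PR's actual code:

```ts
// Flatten nested evals data into the dotted "evals.*" keys shown above.
// Arrays contribute numeric segments (e.g. "toolCalls.0"); leaves are stringified.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function flattenEvals(value: Json, prefix = "evals"): Record<string, string> {
  const out: Record<string, string> = {};
  if (value !== null && typeof value === "object") {
    const entries: Array<readonly [string, Json]> = Array.isArray(value)
      ? value.map((v, i) => [String(i), v] as const)
      : Object.entries(value);
    for (const [key, child] of entries) {
      Object.assign(out, flattenEvals(child, `${prefix}.${key}`));
    }
  } else {
    out[prefix] = String(value);
  }
  return out;
}

// Yields { "evals.toolCalls.0.name": "getWeather",
//          "evals.toolCalls.0.arguments.location": "Seattle", ... }
flattenEvals({
  toolCalls: [
    { name: "getWeather", arguments: { location: "Seattle", units: "celsius" } },
  ],
});
```

Note that the score keys in the example read `score_0` rather than a bare array index, so scores are presumably keyed by name before flattening.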


codecov Bot commented Jul 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.43%. Comparing base (c0945ef) to head (22b06c7).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
+ Coverage   84.50%   86.43%   +1.93%
==========================================
  Files           4        4
  Lines         400      457      +57
  Branches      115      134      +19
==========================================
+ Hits          338      395      +57
  Misses         62       62
```
| Flag | Coverage Δ |
| --- | --- |
| unittests | 86.43% <100.00%> (+1.93%) ⬆️ |

Flags with carried forward coverage won't be shown.



giovanni-guidini requested a review from dcramer on Jul 18, 2025 at 15:43
```ts
  name: "Factuality",
  score: 0.9,
  metadata: {
    llm_judge: "gemini_2.5pro",
```
A member commented:

should we consider just using the term `model` here?

aside: I think metadata might be entirely arbitrary in scorers

