vLLM/SGLang: comparison report by podkidyshev · Pull Request #904 · NVIDIA/cloudai

podkidyshev · 2026-05-27T21:49:13Z

Summary

Enable comparison reporting for vLLM and SGLang

Test Plan

Automated CI
Manual runs

Additional Notes

coderabbitai · 2026-05-27T21:49:20Z

📝 Walkthrough

Walkthrough

This PR introduces a shared LLM serving benchmark comparison report framework. It refactors ComparisonReport to handle internal TestRun grouping, extracts diff utilities, creates an abstract LLMServingComparisonReport base class for standardized metric extraction and visualization, implements vLLM and SGLang variants, registers them, and validates via integration tests.

Changes

LLM Serving Comparison Reports

Layer / File(s)	Summary
Diff utility refactoring `src/cloudai/report_generator/util.py`	New helpers `default_test_run_comparison_values` and `diff_comparison_values` extract and compare per-run values, replacing raw set-based diffing with repr-based comparison; `diff_test_runs` return type widened to `dict[str, list[object]]`.
ComparisonReport grouping refactoring `src/cloudai/report_generator/comparison_report.py`, `src/cloudai/report_generator/groups.py`	`ComparisonReport` now groups TestRuns internally via `group_test_runs()`, `get_group_value()`, `group_name()`, `create_group()` using the new diff utilities instead of `TestRunsGrouper`; `generate()` and `get_bokeh_html()` call the internal method to produce `cmp_groups`.
LLMServingComparisonReport abstract framework `src/cloudai/workloads/common/llm_serving_report.py`	Abstract base class defines workload-specific contract (`can_handle`, `parse_results`, `benchmark_cmd_args`, `parse_accuracy`), implements metric extraction (`extract_data_as_df` for latency/success/throughput/quality), and provides Rich table (`create_tables`) and Bokeh chart (`_create_metric_bar_chart`) rendering.
vLLM comparison report implementation `src/cloudai/workloads/vllm/vllm_comparison_report.py`, `src/cloudai/workloads/vllm/__init__.py`	`VLLMComparisonReport` concrete class routes vLLM test detection, accesses benchmark command args, parses results via `VLLMBenchReportGenerationStrategy`, computes semantic accuracy, outputs `vllm_comparison.html`; exported via package `__all__`.
SGLang comparison report implementation `src/cloudai/workloads/sglang/sglang_comparison_report.py`, `src/cloudai/workloads/sglang/__init__.py`	`SGLangComparisonReport` concrete class routes SGLang test detection, accesses benchmark command args, parses results via `SGLangBenchReportGenerationStrategy`, computes semantic accuracy, outputs `sglang_comparison.html`; exported via package `__all__`.
Registry integration `src/cloudai/registration.py`	Register `"vllm_comparison"` and `"sglang_comparison"` scenario reports with their corresponding classes and enabled `ComparisonReportConfig`.
Test updates and integration test `tests/test_init.py`, `tests/workloads/common/test_llm_serving_report.py`	Registry tests verify the new comparison reports are registered; integration test synthesizes benchmark/accuracy JSON, loads TestRuns into `VLLMComparisonReport`, generates diff-formatted comparison tables and Bokeh charts, and asserts `vllm_comparison.html` output.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/cloudai#890: The new vLLM and SGLang comparison reports consume semantic-eval accuracy metrics and parsing hooks (parse_semantic_accuracy) that were introduced in that PR via LLMServingReportGenerationStrategy and workload-specific parsing/command-generation changes.

Suggested labels

feature

Suggested reviewers

srivatsankrishnan
jeffnvidia

🐰 A Framework Unfolds!
TestRuns now gather in groups with grace,
LLM benchmarks side by side show pace,
vLLM, SGLang—each their own domain,
Rich tables, Bokeh charts explain the gain! ✨

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding comparison report functionality for vLLM and SGLang workloads.
Description check	✅ Passed	The description is directly related to the changeset, explaining that it enables comparison reporting for vLLM and SGLang with a clear test plan.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch ipod/llm-comparison

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/cloudai/report_generator/util.py`:
- Around line 175-183: diff_comparison_values currently uses repr(v) to detect
differences which produces false positives (e.g., 1 vs 1.0 or dicts with
different insertion order); replace that with a canonicalization + equality
check: add a small helper (inside diff_comparison_values) named
canonicalize(value) that (a) for Mapping or list/tuple returns json.dumps(value,
sort_keys=True, default=str) to normalize structural content/order, (b) for
numeric types normalizes to a single numeric form (e.g., float(value) or
Decimal) so 1 and 1.0 compare equal, and (c) otherwise returns the value itself
(or str(value) as a last resort); then compute canonical_forms =
[canonicalize(v) for v in all_values] and use len(set(canonical_forms)) > 1 to
decide diffs and assign diff[key] = all_values as before (refer to
diff_comparison_values, values_by_run, all_values).

In `@src/cloudai/workloads/common/llm_serving_report.py`:
- Line 176: The calls to extract_data_as_df are duplicated in create_tables and
create_charts causing repeated parsing; modify the class to compute and cache
extracted DataFrames once and reuse them: e.g., add a lightweight cache
attribute (like self._df_cache) keyed by the test run object or an immutable id
(referencing extract_data_as_df, create_tables, create_charts, and
group.items/item.tr), populate it the first time you call extract_data_as_df for
a given tr, and have both create_tables and create_charts read from that cache
(or accept an injected precomputed list of DataFrames) instead of calling
extract_data_as_df again.

In `@src/cloudai/workloads/vllm/vllm_comparison_report.py`:
- Around line 24-25: The current import in vllm_comparison_report.py pulls
VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, and VllmTestDefinition from
the package root (cloudai.workloads.vllm) which creates a circular init
dependency; change those to direct imports from the submodules that actually
define them (use relative/submodule imports) so VllmBenchCmdArgs,
VLLMBenchReportGenerationStrategy, and VllmTestDefinition are imported from
their defining modules instead of from cloudai.workloads.vllm.

In `@tests/workloads/common/test_llm_serving_report.py`:
- Around line 40-137: Add a second test mirroring
test_llm_comparison_report_generates_html that exercises the SGLang path
end-to-end: create two cloudai.core.TestRun instances for SGLang (use
SGLangTestDefinition, SGLangCmdArgs, SGLangBenchCmdArgs,
SGLangSemanticEvalCmdArgs), write corresponding bench and quality JSON results
to slurm_system.output_path under each test run (use the SGLANG_BENCH_JSON_FILE
and SGLANG_GSM8K_JSON_FILE constants or their equivalents), instantiate
SGLangComparisonReport with a TestScenario containing those runs, call
load_test_runs(), create_tables(group_test_runs()) and assert key table cells
(latency/throughput/quality/success) similar to the vLLM test, call
report.generate(), and finally assert that (slurm_system.output_path /
"sglang_comparison.html").exists() to ensure HTML was produced.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 6a508bb1-229e-4702-b1ad-bdb49888fef9

📥 Commits

Reviewing files that changed from the base of the PR and between 0738c4b and 727552b.

📒 Files selected for processing (12)

src/cloudai/registration.py
src/cloudai/report_generator/comparison_report.py
src/cloudai/report_generator/groups.py
src/cloudai/report_generator/util.py
src/cloudai/workloads/common/llm_serving_report.py
src/cloudai/workloads/sglang/__init__.py
src/cloudai/workloads/sglang/sglang_comparison_report.py
src/cloudai/workloads/vllm/__init__.py
src/cloudai/workloads/vllm/vllm_comparison_report.py
tests/test_init.py
tests/workloads/common/test_llm_serving.py
tests/workloads/common/test_llm_serving_report.py

coderabbitai · 2026-05-28T16:24:43Z

+def diff_comparison_values(values_by_run: list[Mapping[str, object]]) -> dict[str, list[object]]:
+    """Return value differences across comparable TestRun value dictionaries."""
+    all_keys = set().union(*[values.keys() for values in values_by_run])

    diff = {}
    for key in all_keys:
-        all_values = [d[key] for d in dicts]
-        if len(set(all_values)) > 1:
+        all_values = [values.get(key) for values in values_by_run]
+        if len({repr(v) for v in all_values}) > 1:
            diff[key] = all_values


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash python - <<'PY' samples = [ [1, 1.0], [{"a": 1, "b": 2}, {"b": 2, "a": 1}], ] for vals in samples: repr_diff = len({repr(v) for v in vals}) > 1 eq_diff = any(v != vals[0] for v in vals[1:]) print(f"values={vals} | repr_diff={repr_diff} | eq_diff={eq_diff}") PY

Repository: NVIDIA/cloudai

Length of output: 184

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '150,210p' src/cloudai/report_generator/util.py | nl -ba | sed -n '1,120p'

Repository: NVIDIA/cloudai

Length of output: 101

🏁 Script executed:

#!/bin/bash set -euo pipefail awk 'NR>=150 && NR<=210 {printf "%d:%s\n", NR, $0}' src/cloudai/report_generator/util.py

Repository: NVIDIA/cloudai

Length of output: 2536

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "diff_comparison_values|diff_test_runs\(" -S src/cloudai/report_generator

Repository: NVIDIA/cloudai

Length of output: 729

🏁 Script executed:

#!/bin/bash set -euo pipefail for f in src/cloudai/report_generator/groups.py src/cloudai/report_generator/comparison_report.py; do echo "---- $f ----" awk 'NR>=1 && NR<=220 {print FNR ":" $0}' "$f" done

Repository: NVIDIA/cloudai

Length of output: 12747

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "def comparison_values\(" -S src/cloudai/report_generator

Repository: NVIDIA/cloudai

Length of output: 178

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "class TestRun|def model_dump\\(|cmd_args|extra_env_vars" -S src/cloudai | head -n 200

Repository: NVIDIA/cloudai

Length of output: 25850

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "class TestRun|class TestDefinition|extra_env_vars|bench_cmd_args|model_dump\\(" -S src/cloudai/core src/cloudai | head -n 200

Repository: NVIDIA/cloudai

Length of output: 9527

🏁 Script executed:

#!/bin/bash set -euo pipefail for f in src/cloudai/_core/test_scenario.py src/cloudai/models/workload.py; do echo "---- $f (relevant sections) ----" done

Repository: NVIDIA/cloudai

Length of output: 184

🏁 Script executed:

#!/bin/bash set -euo pipefail wc -l src/cloudai/_core/test_scenario.py src/cloudai/models/workload.py

Repository: NVIDIA/cloudai

Length of output: 148

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '60,220p' src/cloudai/_core/test_scenario.py | nl -ba | head -n 200

Repository: NVIDIA/cloudai

Length of output: 101

🏁 Script executed:

#!/bin/bash set -euo pipefail start=60; end=220 awk -v s=$start -v e=$end 'NR>=s && NR<=e {print NR ":" $0}' src/cloudai/_core/test_scenario.py

Repository: NVIDIA/cloudai

Length of output: 5757

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,180p' src/cloudai/models/workload.py | nl -ba | head -n 220

Repository: NVIDIA/cloudai

Length of output: 101

🏁 Script executed:

#!/bin/bash set -euo pipefail awk 'NR>=1 && NR<=190 {print NR ":" $0}' src/cloudai/models/workload.py

Repository: NVIDIA/cloudai

Length of output: 6076

Fix false diffs caused by repr-based comparison in diff_comparison_values() (src/cloudai/report_generator/util.py, lines 175-183).

diff_comparison_values() currently decides whether values differ via len({repr(v) for v in all_values}) > 1, which can mark semantically equal values as different (e.g., 1 vs 1.0, or dicts equal by contents but with different insertion order). These diffs then flow into comparison group item names.

💡 Suggested fix

def diff_comparison_values(values_by_run: list[Mapping[str, object]]) -> dict[str, list[object]]: """Return value differences across comparable TestRun value dictionaries.""" - all_keys = set().union(*[values.keys() for values in values_by_run]) + all_keys = set().union(*(values.keys() for values in values_by_run)) diff = {} for key in all_keys: all_values = [values.get(key) for values in values_by_run] - if len({repr(v) for v in all_values}) > 1: + if any(v != all_values[0] for v in all_values[1:]): diff[key] = all_values

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/report_generator/util.py` around lines 175 - 183, diff_comparison_values currently uses repr(v) to detect differences which produces false positives (e.g., 1 vs 1.0 or dicts with different insertion order); replace that with a canonicalization + equality check: add a small helper (inside diff_comparison_values) named canonicalize(value) that (a) for Mapping or list/tuple returns json.dumps(value, sort_keys=True, default=str) to normalize structural content/order, (b) for numeric types normalizes to a single numeric form (e.g., float(value) or Decimal) so 1 and 1.0 compare equal, and (c) otherwise returns the value itself (or str(value) as a last resort); then compute canonical_forms = [canonicalize(v) for v in all_values] and use len(set(canonical_forms)) > 1 to decide diffs and assign diff[key] = all_values as before (refer to diff_comparison_values, values_by_run, all_values).

coderabbitai · 2026-05-28T16:24:43Z

+    ) -> list[rich.table.Table]:
+        tables: list[rich.table.Table] = []
+        for group in cmp_groups:
+            extracted_dfs = [self.extract_data_as_df(item.tr) for item in group.items]


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider caching or memoizing extract_data_as_df results.

Both create_tables (line 176) and create_charts (line 268) call extract_data_as_df for the same test runs. If the parent class calls both methods during report generation, benchmark results are parsed twice per test run. Since benchmark result files are immutable, consider caching the extraction or computing it once and passing to both methods.

This is a minor optimization and can be deferred if parsing overhead is negligible.

Also applies to: 268-268

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/workloads/common/llm_serving_report.py` at line 176, The calls to extract_data_as_df are duplicated in create_tables and create_charts causing repeated parsing; modify the class to compute and cache extracted DataFrames once and reuse them: e.g., add a lightweight cache attribute (like self._df_cache) keyed by the test run object or an immutable id (referencing extract_data_as_df, create_tables, create_charts, and group.items/item.tr), populate it the first time you call extract_data_as_df for a given tr, and have both create_tables and create_charts read from that cache (or accept an injected precomputed list of DataFrames) instead of calling extract_data_as_df again.

coderabbitai · 2026-05-28T16:24:43Z

+from cloudai.workloads.vllm import VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, VllmTestDefinition
+


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify package self-import coupling and potential cycle edges. rg -n -C2 'from cloudai\.workloads\.vllm import|from \.vllm_comparison_report import' src/cloudai/workloads/vllm/__init__.py src/cloudai/workloads/vllm/vllm_comparison_report.py

Repository: NVIDIA/cloudai

Length of output: 889

Avoid package-root self-import in vLLM comparison module.

vllm_comparison_report.py imports VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, and VllmTestDefinition from cloudai.workloads.vllm, while src/cloudai/workloads/vllm/__init__.py imports VLLMComparisonReport from .vllm_comparison_report, creating fragile circular initialization-order coupling. Import these symbols directly from their defining submodules (use relative/submodule imports) instead of the package root.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cloudai/workloads/vllm/vllm_comparison_report.py` around lines 24 - 25, The current import in vllm_comparison_report.py pulls VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, and VllmTestDefinition from the package root (cloudai.workloads.vllm) which creates a circular init dependency; change those to direct imports from the submodules that actually define them (use relative/submodule imports) so VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, and VllmTestDefinition are imported from their defining modules instead of from cloudai.workloads.vllm.

coderabbitai · 2026-05-28T16:24:43Z

+def test_llm_comparison_report_generates_html(slurm_system: cloudai.systems.slurm.SlurmSystem) -> None:
+    tr1 = cloudai.core.TestRun(
+        name="vllm-8",
+        test=VllmTestDefinition(
+            name="vllm",
+            description="vLLM benchmark",
+            test_template_name="vllm",
+            cmd_args=VllmCmdArgs(docker_image_url="nvcr.io/nvidia/vllm:latest", model="Qwen/Qwen3-0.6B"),
+            bench_cmd_args=VllmBenchCmdArgs(max_concurrency=8),
+            semantic_eval_cmd_args=VllmSemanticEvalCmdArgs(),
+        ),
+        num_nodes=1,
+        nodes=[],
+    )
+    tr2 = cloudai.core.TestRun(
+        name="vllm-16",
+        test=VllmTestDefinition(
+            name="vllm",
+            description="vLLM benchmark",
+            test_template_name="vllm",
+            cmd_args=VllmCmdArgs(docker_image_url="nvcr.io/nvidia/vllm:latest", model="Qwen/Qwen3-0.6B"),
+            bench_cmd_args=VllmBenchCmdArgs(max_concurrency=16),
+            semantic_eval_cmd_args=VllmSemanticEvalCmdArgs(),
+        ),
+        num_nodes=1,
+        nodes=[],
+    )
+    _write_result(
+        slurm_system.output_path / tr1.name / "0",
+        VLLM_BENCH_JSON_FILE,
+        json.dumps(
+            {
+                "num_prompts": 30,
+                "completed": 27,
+                "mean_ttft_ms": 100.0,
+                "median_ttft_ms": 90.0,
+                "p99_ttft_ms": 150.0,
+                "mean_tpot_ms": 10.0,
+                "median_tpot_ms": 9.0,
+                "p99_tpot_ms": 15.0,
+                "output_throughput": 1200.0,
+                "max_concurrency": 8,
+            }
+        ),
+    )
+    _write_result(slurm_system.output_path / tr1.name / "0", VLLM_GSM8K_JSON_FILE, '{"accuracy": 0.81}')
+    _write_result(
+        slurm_system.output_path / tr2.name / "0",
+        VLLM_BENCH_JSON_FILE,
+        json.dumps(
+            {
+                "num_prompts": 30,
+                "completed": 30,
+                "mean_ttft_ms": 80.0,
+                "median_ttft_ms": 70.0,
+                "p99_ttft_ms": 130.0,
+                "mean_tpot_ms": 8.0,
+                "median_tpot_ms": 7.0,
+                "p99_tpot_ms": 13.0,
+                "output_throughput": 1800.0,
+                "max_concurrency": 16,
+            }
+        ),
+    )
+    _write_result(slurm_system.output_path / tr2.name / "0", VLLM_GSM8K_JSON_FILE, '{"accuracy": 0.9}')
+
+    report = VLLMComparisonReport(
+        slurm_system,
+        cloudai.core.TestScenario(name="vllm-comparison", test_runs=[tr1, tr2]),
+        slurm_system.output_path,
+        cloudai.report_generator.comparison_report.ComparisonReportConfig(enable=True, group_by=[]),
+    )
+
+    report.load_test_runs()
+    assert len(report.trs) == 2
+    tables = report.create_tables(report.group_test_runs())
+    latency_table = tables[0]
+    success_table = tables[1]
+    throughput_table = tables[2]
+    quality_table = tables[3]
+
+    assert "bench_cmd_args.max_concurrency=8" in str(latency_table.columns[1].header)
+    assert "bench_cmd_args.max_concurrency=16" in str(latency_table.columns[2].header)
+    assert list(latency_table.columns[0].cells)[:3] == ["Mean TTFT (ms)", "Median TTFT (ms)", "P99 TTFT (ms)"]
+    assert list(success_table.columns[0].cells) == ["Successful Prompts", "Successful Prompts (%)"]
+    assert list(success_table.columns[3].cells) == [
+        cloudai.report_generator.comparison_report.ComparisonReport._format_diff_cell(27.0, 30.0),
+        cloudai.report_generator.comparison_report.ComparisonReport._format_diff_cell(90.0, 100.0),
+    ]
+    throughput_row = list(throughput_table.columns[0].cells).index("TPS/GPU")
+    assert list(throughput_table.columns[1].cells)[throughput_row] == "150.0"
+    assert list(quality_table.columns[0].cells) == ["Accuracy"]
+    assert list(quality_table.columns[1].cells) == ["0.81"]
+    assert list(quality_table.columns[2].cells) == ["0.9"]
+
+    report.generate()
+
+    assert (slurm_system.output_path / "vllm_comparison.html").exists()


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add a symmetric integration test for SGLangComparisonReport.

This test fully validates the vLLM path, but the new SGLang comparison implementation is only covered indirectly via registry checks. Please add a parallel end-to-end SGLang report generation test to protect the new workload-specific parsing and output path.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/workloads/common/test_llm_serving_report.py` around lines 40 - 137, Add a second test mirroring test_llm_comparison_report_generates_html that exercises the SGLang path end-to-end: create two cloudai.core.TestRun instances for SGLang (use SGLangTestDefinition, SGLangCmdArgs, SGLangBenchCmdArgs, SGLangSemanticEvalCmdArgs), write corresponding bench and quality JSON results to slurm_system.output_path under each test run (use the SGLANG_BENCH_JSON_FILE and SGLANG_GSM8K_JSON_FILE constants or their equivalents), instantiate SGLangComparisonReport with a TestScenario containing those runs, call load_test_runs(), create_tables(group_test_runs()) and assert key table cells (latency/throughput/quality/success) similar to the vLLM test, call report.generate(), and finally assert that (slurm_system.output_path / "sglang_comparison.html").exists() to ensure HTML was produced.

comparison reports for vllm and sglang

bf13531

podkidyshev self-assigned this May 27, 2026

podkidyshev added the enhancement New feature or request label May 27, 2026

podkidyshev added 6 commits May 27, 2026 14:54

use updated import style

243dc0d

one more

3f10c13

minor refactor

634e39a

refactor base comp-reporting code to avoid hacky getattr

ba30fa6

better code than getattr

abe9ac0

remove callables as args in reporting

2596726

podkidyshev force-pushed the ipod/llm-comparison branch from bd9012d to 2596726 Compare May 28, 2026 15:58

fix copyright

727552b

podkidyshev marked this pull request as ready for review May 28, 2026 16:12

podkidyshev requested review from jeffnvidia and srivatsankrishnan as code owners May 28, 2026 16:12

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vLLM/SGLang: comparison report#904

vLLM/SGLang: comparison report#904
podkidyshev wants to merge 8 commits into
mainfrom
ipod/llm-comparison

podkidyshev commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		from cloudai.workloads.vllm import VllmBenchCmdArgs, VLLMBenchReportGenerationStrategy, VllmTestDefinition

Conversation

podkidyshev commented May 27, 2026

Summary

Test Plan

Additional Notes

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 27, 2026 •

edited

Loading