Releases: kubernetes-sigs/gateway-api-inference-extension
v1.4.0-rc.2
RC Highlights
- v1.4.0-rc.2 is available for community testing before the final v1.4.0 release
- fixes the release-branch quickstart vLLM image tags so they stay aligned with main while keeping release-branch IfNotPresent pull policy
- bumps the ./conformance nested Go module to Gateway API v1.5.0
What's Changed
- [release-1.4] fix(release): sync quickstart vllm images by @danehans in #2522
- [release-1.4] chore(conformance): bump gateway-api to v1.5.0 by @danehans in #2520
Full Changelog: v1.4.0-rc.1...v1.4.0-rc.2
v1.4.0-rc.1
RC Highlights
- v1.4.0-rc.1 is available for community testing before the final v1.4.0 release
- standalone chart work landed and is included in release artifacts
- conformance was split into its own Go module
- InferencePool / Helm / gRPC-related improvements landed, including appProtocol, FailOpen, and ALPN h2
- significant ongoing work landed in flow control, BBR, predicted latency, and datalayer internals
What's Changed
- cleanup: resolve technical debt and link tracking issues by @LukeAVanDrie in #2083
- Removing dead code that throws an err when no match is found by @kfswain in #2088
- cleanup: rename integration test utilities to remove _test suffix by @LukeAVanDrie in #2084
- Fixed targetPorts copy error by @capri-xiyue in #2092
- Add PR write permissions to label checker GHA, as it cannot add label… by @kfswain in #2094
- clean unused interface by @nirrozenbaum in #2098
- prefill aware prefix plugin by @ahg-g in #2104
- Removing perm-restricted GHA by @kfswain in #2105
- Updating vllm versions and fixing git commit sign by @kfswain in #2108
- Standardize inferencepool Helm templates and drop unnecessary tpl by @tsj-30 in #1989
- feat(bbr): add configuration flags for metrics auth and secure serving by @jpekmez in #2112
- chore(deps): bump github.com/prometheus/prometheus from 0.308.1 to 0.309.0 by @dependabot[bot] in #2090
- fix both error propogation and priority band fullness by @wseaton in #2103
- Datalayer refactoring: HTTP datasource and client by @irar2 in #2120
- Add v1 conformance report for alibabacloud ack gateway by @delavet in #2007
- changed httproute creation to be behind a flag. by @nirrozenbaum in #2118
- Rename part two by @shmuelk in #1968
- rename of experimental http route creation section in helm by @nirrozenbaum in #2123
- add scoring preference to scorer interface. by @nirrozenbaum in #2119
- feat: make epp-standalone be its own chart by @capri-xiyue in #2122
- fix: [Flow Control]: Optionally disable endpoint subset filtering while dispatching requests by @aishukamal in #2126
- fix: add update helm dependency by @zetxqx in #2135
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 by @dependabot[bot] in #2138
- chore(deps): bump github.com/prometheus/prometheus from 0.309.0 to 0.309.1 by @dependabot[bot] in #2136
- chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 by @dependabot[bot] in #2137
- Rename part three by @shmuelk in #2124
- fixed latest guide to use httproute creation in via the helm chart by @nirrozenbaum in #2141
- Removed duplicated field in log message by @shmuelk in #2142
- Update the metrics used by the dashboard by @learner0810 in #2139
- registry: switch to fine-grained leasing for flow lifecycle by @LukeAVanDrie in #2127
- Increase default FlowGCTimeout to 1h to prevent premature GC by @LukeAVanDrie in #2143
- update bbr quickstart guide with latest functionality by @nirrozenbaum in #2150
- Separate conformance tests modules from main tests by @rikatz in #1994
- feat: Add concurrency saturation detector by @LukeAVanDrie in #2062
- feat: epp standalone helm chart included in release to docker by @capri-xiyue in #2148
- Fix indention error for latency predictor by @liu-cong in #2158
- Removing alpha status in GH landing page by @kfswain in #2132
- docs: added epp standalone user guide by @capri-xiyue in #2147
- Add tracing entry span with W3C propagation to EPP handler by @sallyom in #2057
- feat(docs): enable content tab linking in mkdocs by @AvineshTripathi in #2176
- update bbr label filtering to align with best practices by @nirrozenbaum in #2178
- updated kgateway section in bbr quickstart guide by @nirrozenbaum in #2179
- move logging util to common pkg by @nirrozenbaum in #2180
- Interfaces towards pluggable BBR framework (initial PR) by @davidbreitgand in #2121
- feat(api): Add appProtocol to InferencePool API for gRPC support by @zetxqx in #2162
- docs: reference right manifest file by @sats-23 in #2186
- test: add hermetic coverage for standalone mode by @LukeAVanDrie in #2175
- Add support for video/audio formats for multimodal inputs by @rahulgurnani in #2181
- fix identation bug in quickstart by @nirrozenbaum in #2182
- refactor(flowcontrol): Migrate Fairness Policies to EPP Plugin System by @LukeAVanDrie in #2031
- [Conformance] copy pkgs from gateway-api to enable upgrade to gateway-api v1.4.0 by @zetxqx in #2159
- controller: extend flow lease scope to fix orphaned queues #1982 by @LukeAVanDrie in #2131
- rename slo-aware-router to predicted-latency by @kaushikmitr in #2183
- Better encapsulate data layer set up and validation. by @elevran in #2185
- test: added latency predictor converage for inferencepool and added convera… by @capri-xiyue in #2187
- cleanup: refactor multiple include into one file by @capri-xiyue in #2191
- feat: Allow request control plugins to return ext_proc dynamic metadata by @fcfort in #2156
- Moving the scheduling component pluggable interface and types to the common framework pkg by @ahg-g in #2192
- Update troubleshooting guide to include remediation for incorrect pre… by @BenjaminBraunDev in #2040
- Add flowcontrol queue length in bytes metric by @RyanRosario in #2044
- Moved the epp/plugins pkg to be under the new framework pkg by @ahg-g in #2194
- Move framework interfaces under epp/framework/interface by @ahg-g in #2195
- feat: added a local mode in verify helm script by @capri-xiyue in #2196
- [Flow Control] ...
v1.3.1
Fixes
This patch release cherry-picks a few fixes for:
#2321
#2300
#2316
v1.3.0
Noteworthy
LoRA Syncer
This release and future releases will not ship a lora-syncer image, because we are deprecating that feature. Similar functionality will still exist in the form of the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.
In the next release, the lora syncer code will be removed from the codebase.
Flow Control
Flow Control continues to evolve with the addition of Scale from/to Zero support. This allows requests to be sent to an EPP with no model serving endpoints behind it, while the EPP emits metrics that the autoscaler can use to scale up the pool.
In upcoming releases we will continue working toward enabling this feature by default.
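As a rough illustration of the scale-from-zero behaviour described above, the following Python sketch holds requests while no endpoints exist and exposes the queue depth as a number an autoscaler could poll. The class `HoldingQueue`, its method names, and the endpoint address in the usage note are hypothetical and chosen only for illustration; they are not the EPP's actual types, metrics, or dispatch logic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class HoldingQueue:
    """Hypothetical stand-in for an EPP-side queue; not the project's API."""

    endpoints: list = field(default_factory=list)  # ready model-server addresses
    pending: deque = field(default_factory=deque)  # requests waiting for capacity

    def submit(self, request: str):
        """Dispatch at once if an endpoint exists, otherwise hold the request."""
        if self.endpoints:
            return (self.endpoints[0], request)
        self.pending.append(request)
        return None

    def queue_depth(self) -> int:
        """Value an autoscaler could read to decide to scale up from zero."""
        return len(self.pending)

    def add_endpoint(self, address: str):
        """Register a new endpoint and drain every held request to it, in order."""
        self.endpoints.append(address)
        drained = [(address, request) for request in self.pending]
        self.pending.clear()
        return drained
```

With an empty pool, `submit` returns `None` and `queue_depth()` grows; once something calls `add_endpoint("10.0.0.1:8000")`, the held requests are returned in arrival order for dispatch. A real implementation would also need timeouts and limits on how much can be held, which this sketch leaves out.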
Standalone EPP
This functionality allows the EPP to be deployed as a proxy contained entirely within a single pod. This is achieved by running the EPP as a sidecar container of the Envoy proxy. This feature was developed for batch inference scenarios and is currently considered experimental.
v1.3.1-rc.1
This patch release cherry-picks a few fixes for:
#2321
#2300
#2316
Full Changelog: v1.3.0...v1.3.1-rc.1
v1.3.0
Noteworthy
LoRA Syncer
This release and future releases will not ship a lora-syncer image, because we are deprecating that feature. Similar functionality will still exist in the form of the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.
In the next release, the lora syncer code will be removed from the codebase.
Flow Control
Flow Control continues to evolve with the addition of Scale from/to Zero support. This allows requests to be sent to an EPP with no model serving endpoints behind it, while the EPP emits metrics that the autoscaler can use to scale up the pool.
In upcoming releases we will continue working toward enabling this feature by default.
Standalone EPP
This functionality allows the EPP to be deployed as a proxy contained entirely within a single pod. This is achieved by running the EPP as a sidecar container of the Envoy proxy. This feature was developed for batch inference scenarios and is currently considered experimental.
Fix(es)
- We improved the functionality of the approximate prefix cache scorer when working with the llm-d P/D setup
What's Changed
- Added crd validation ci workflow. by @bexxmodd in #1879
- chore: bump sim version by @nirrozenbaum in #1890
- feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
- Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
- refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
- Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
- chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
- chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
- chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
- enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
- feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
- SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
- Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
- fix: fixed helm chart by @capri-xiyue in #1907
- docs: add Kgateway BBR documentation by @howardjohn in #1908
- Implement EPP Plugins by datalayer objects by @elevran in #1901
- feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
- docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
- fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
- fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
- Define and register plugin factories for datalayer by @elevran in #1911
- fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
- Move AllPodsPredicate to datastore package by @elevran in #1939
- Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
- feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
- fix: CI golangci-lint errors by @shmuelk in #1948
- Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
- Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
- Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
- fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
- fix config load error when picker is set before the scoerer w/o weight. by @zetxqx in #1958
- add kaushikmitr as appoved of slo aware routing plugin by @kaushikmitr in #1956
- refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
- feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
- Run tests with two data layer implementations by @irar2 in #1930
- Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
- feat(metrics): add scheduler attempt counter by @googs1025 in #1931
- chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
- generalize latest release quickstart by @nirrozenbaum in #1966
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
- chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
- chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
- refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
- chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
- feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
- feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
- Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
- Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
- test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
- test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
- [chore]Bump vLLM Image Tags by @Frapschen in #1733
- Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
...
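The entry "Execute prepare data plugins in topological order of data dependencies" (#1878) in the list above names a standard technique: run each plugin only after the plugins that produce its inputs. A minimal Python sketch of that idea follows, assuming each plugin declares the data keys it produces and consumes. The function `order_plugins`, the dictionary layout, and the plugin and key names in the usage note are hypothetical and do not reflect the project's Go interfaces.

```python
from graphlib import TopologicalSorter


def order_plugins(plugins):
    """Return plugin names so that every producer runs before its consumers.

    plugins maps a plugin name to a (produces, consumes) pair of sets of
    data keys. This layout is an assumption made for illustration.
    """
    # Map each data key to the plugin that produces it.
    producer_of = {}
    for name, (produces, _) in plugins.items():
        for key in produces:
            producer_of[key] = name

    # Build node -> predecessors, the shape TopologicalSorter expects.
    graph = {name: set() for name in plugins}
    for name, (_, consumes) in plugins.items():
        for key in consumes:
            if key not in producer_of:
                raise ValueError(f"no plugin produces {key!r} needed by {name!r}")
            if producer_of[key] != name:
                graph[name].add(producer_of[key])

    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())
```

For example, given a plugin `prefix` that produces `prefix`, a plugin `latency` that consumes `prefix` and produces `latency`, and a plugin `scorer-input` that consumes both, the function returns `prefix`, then `latency`, then `scorer-input`.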
v1.3.0-rc.3
RC diff
- Helm fixes
- Scale from zero fixes
What's Changed
- Added crd validation ci workflow. by @bexxmodd in #1879
- chore: bump sim version by @nirrozenbaum in #1890
- feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
- Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
- refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
- Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
- chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
- chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
- chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
- enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
- feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
- SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
- Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
- fix: fixed helm chart by @capri-xiyue in #1907
- docs: add Kgateway BBR documentation by @howardjohn in #1908
- Implement EPP Plugins by datalayer objects by @elevran in #1901
- feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
- docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
- fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
- fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
- Define and register plugin factories for datalayer by @elevran in #1911
- fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
- Move AllPodsPredicate to datastore package by @elevran in #1939
- Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
- feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
- fix: CI golangci-lint errors by @shmuelk in #1948
- Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
- Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
- Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
- fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
- fix config load error when picker is set before the scoerer w/o weight. by @zetxqx in #1958
- add kaushikmitr as appoved of slo aware routing plugin by @kaushikmitr in #1956
- refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
- feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
- Run tests with two data layer implementations by @irar2 in #1930
- Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
- feat(metrics): add scheduler attempt counter by @googs1025 in #1931
- chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
- generalize latest release quickstart by @nirrozenbaum in #1966
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
- chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
- chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
- refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
- chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
- feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
- feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
- Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
- Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
- test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
- test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
- [chore]Bump vLLM Image Tags by @Frapschen in #1733
- Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
- Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
- BBR multi lora guide by @davidbreitgand in #1940
- [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
- Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
- Define and implement command line parsing with Options struct by @elevran in #1984
- fix(inferenceModelRewrites): conditionally skip watching InferenceModelRewrite and InferenceObjective by @zetxqx in #1967
- Add e2e test for multiport InferencePool enhancement by @RyanRosario in #1885
- chore(deps): bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from 1.38.0 to 1.39.0 by @dependabot[bot] in #1997
- flowcontrol: refactor registry config to support dynamic priority...
v1.3.0-rc.2
Fixes in this RC
- Issue with standalone EPP fixed
- Issue with approx prefix not working in the P/D scenario
Noteworthy
LoRA Syncer
This release and future releases will not ship a lora-syncer image, because we are deprecating that feature. Similar functionality will still exist in the form of the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.
In the next release, the lora syncer code will be removed from the codebase.
What's Changed
- Added crd validation ci workflow. by @bexxmodd in #1879
- chore: bump sim version by @nirrozenbaum in #1890
- feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
- Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
- refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
- Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
- chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
- chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
- chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
- enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
- feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
- SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
- Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
- fix: fixed helm chart by @capri-xiyue in #1907
- docs: add Kgateway BBR documentation by @howardjohn in #1908
- Implement EPP Plugins by datalayer objects by @elevran in #1901
- feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
- docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
- fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
- fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
- Define and register plugin factories for datalayer by @elevran in #1911
- fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
- Move AllPodsPredicate to datastore package by @elevran in #1939
- Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
- feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
- fix: CI golangci-lint errors by @shmuelk in #1948
- Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
- Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
- Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
- fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
- fix config load error when picker is set before the scoerer w/o weight. by @zetxqx in #1958
- add kaushikmitr as appoved of slo aware routing plugin by @kaushikmitr in #1956
- refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
- feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
- Run tests with two data layer implementations by @irar2 in #1930
- Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
- feat(metrics): add scheduler attempt counter by @googs1025 in #1931
- chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
- generalize latest release quickstart by @nirrozenbaum in #1966
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
- chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
- chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
- refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
- chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
- feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
- feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
- Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
- Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
- test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
- test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
- [chore]Bump vLLM Image Tags by @Frapschen in #1733
- Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
- Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
- BBR multi lora guide by @davidbreitgand in #1940
- [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
- Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
- Define and implement command line parsing with Options struct by @elevran in ...
v1.3.0-rc.1
Noteworthy
LoRA Syncer
This release and future releases will not ship a lora-syncer image, because we are deprecating that feature. Similar functionality will still exist in the form of the file system resolver. For model servers that do not yet support this form of LoRA management but do support the discrete LoRA management endpoints that the lora-syncer uses, the old images will be kept indefinitely and can still be used.
In the next release, the lora syncer code will be removed from the codebase.
What's Changed
- Added crd validation ci workflow. by @bexxmodd in #1879
- chore: bump sim version by @nirrozenbaum in #1890
- feat(conformance): add conformance test for verifying x-gateway-destination-endpoint-served by @zetxqx in #1862
- Add deprecation notice on metrics port in runner and datastore by @elevran in #1886
- refactor: Flatten Flow Control inter-flow policy plugin directory structure by @LukeAVanDrie in #1841
- Execute prepare data plugins in topological order of data dependencies by @rahulgurnani in #1878
- chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 by @dependabot[bot] in #1896
- chore(deps): bump google.golang.org/grpc from 1.76.0 to 1.77.0 by @dependabot[bot] in #1897
- chore(deps): bump github.com/prometheus/common from 0.67.2 to 0.67.4 by @dependabot[bot] in #1895
- enhance bbr helm chart to generalize cmd-line args by @nirrozenbaum in #1900
- feat: Add totalRunningRequests metric for latency predictor by @BenjaminBraunDev in #1899
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 by @dependabot[bot] in #1898
- SLO Aware Routing Sidecar + Plugin EPP Integration and Helm Deployment by @BenjaminBraunDev in #1839
- Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc by @ezrasilvera in #1905
- fix: fixed helm chart by @capri-xiyue in #1907
- docs: add Kgateway BBR documentation by @howardjohn in #1908
- Implement EPP Plugins by datalayer objects by @elevran in #1901
- feat: Implement Model Rewrite and Traffic Splitting Logic by @zetxqx in #1820
- docs: Updated quickstart to use stable Istio release 1.28.0 by @atharva-310 in #1902
- fix(release): correctly update lora-syncer and epp image tags across RC and final releases by @googs1025 in #1916
- fix: sort InferenceModelRewrite lists by (Namespace, Name) in tests by @googs1025 in #1917
- Define and register plugin factories for datalayer by @elevran in #1911
- fix: Properly install the InferenceModelRewrite CRD using kustomize by @shmuelk in #1934
- Move AllPodsPredicate to datastore package by @elevran in #1939
- Add automatic TLS certificate reloading for EPP by @pierDipi in #1765
- feat(modelRewrite): Add metrics for InferenceModelRewrite decisions by @zetxqx in #1938
- fix: CI golangci-lint errors by @shmuelk in #1948
- Update inference perf chart to match upstream chart + Add Prefix Cache Github Actions by @rlakhtakia in #1949
- Standardize plugins.TypedName field name from 'tn' to 'typedName' by @rohithnarasimha in #1918
- Update inference perf chart to use new hf token structure. by @rlakhtakia in #1955
- fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by @BenjaminBraunDev in #1929
- fix config load error when picker is set before the scoerer w/o weight. by @zetxqx in #1958
- add kaushikmitr as appoved of slo aware routing plugin by @kaushikmitr in #1956
- refactor: [Scale from Zero] Introduce PodLocator by @LukeAVanDrie in #1950
- feat: add config validation in predicted-latency-scorer plugin by @googs1025 in #1904
- Run tests with two data layer implementations by @irar2 in #1930
- Rename PodInfo struct to EndpointMetadata to better reflect its purpose by @shmuelk in #1866
- feat(metrics): add scheduler attempt counter by @googs1025 in #1931
- chore: update released quickstart to v1.2.1 by @nirrozenbaum in #1941
- generalize latest release quickstart by @nirrozenbaum in #1966
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #1971
- chore(deps): bump golang.org/x/sync from 0.18.0 to 0.19.0 by @dependabot[bot] in #1972
- chore(deps): bump go.opentelemetry.io/otel/sdk from 1.38.0 to 1.39.0 by @dependabot[bot] in #1975
- refactor: Standardize config loading and system default injection by @LukeAVanDrie in #1953
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #1974
- chore(deps): bump go.opentelemetry.io/otel/exporters/stdout/stdouttrace from 1.38.0 to 1.39.0 by @dependabot[bot] in #1973
- feat: Enable Scale-from-Zero with Flow Control enabled by @LukeAVanDrie in #1952
- feature: (helm) support custom volumes and volumeMounts for epp by @delavet in #1945
- Use spf13/pflag instead of Go's standard flag package by @elevran in #1979
- Extend textual configuration support with the Datalayer's configuration by @shmuelk in #1914
- test/integration: introduce robust harness and migrate BBR suite by @LukeAVanDrie in #1959
- test/bbr: fix startup race condition and IPv6 address formatting by @LukeAVanDrie in #1987
- [chore]Bump vLLM Image Tags by @Frapschen in #1733
- Add Prefill Heavy E2E Test to Github Actions by @rlakhtakia in #1894
- Add decode heavy benchmark e2e test to github actions. by @rlakhtakia in #1893
- BBR multi lora guide by @davidbreitgand in #1940
- [feat] Add running requests scorer and tests by @BenjaminBraunDev in #1957
- Implement PrepareDataPlugin for prefix cache match plugin by @rahulgurnani in #1942
- Define and implement command line parsing with Options struct by @elevran in #1984
- fix(inferenceModelRewrites): condition...
v1.2.1
v1.2.0
What's Changed
- Add openai api link for request format by @learner0810 in #1757
- Docs: Fix incorrect stream_options value in Observability example by @aman4433 in #1758
- Docs: Bumps Quickstart to Use Kgateway v2.2.0-main by @danehans in #1761
- Docs: Updates Latest/Main Quickstart by @danehans in #1747
- Docs: Versioned Quickstart Install All CRDs by @danehans in #1762
- chore: fixed meeting link by @nirrozenbaum in #1734
- Add Produces and Consumes methods to Plugin by @rahulgurnani in #1754
- Docs: Removes Agentgateway Docs by @danehans in #1771
- Record EPP NormalizedTimePerOutputToken metric on streaming mode by @dharaneeshvrd in #1706
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.2 by @dependabot[bot] in #1776
- chore(deps): bump github.com/prometheus/prometheus from 0.307.1 to 0.307.2 by @dependabot[bot] in #1774
- fix tracing configuration in helm epp-deployment template by @sallyom in #1777
- Fix for kustomization missing path for inferencepoolimport.yaml. by @bexxmodd in #1782
- fix inferenceobjective api types link by @learner0810 in #1739
- update release quickstart to use v1.1.0 by @nirrozenbaum in #1785
- [metrics]: Allow EPP to register metrics from extension by @JeffLuoo in #1787
- feat (reports): add infrastructure to run NGF conformance tests and i… by @sindhushiv in #1788
- Add Install Gateway section in Getting Started Latest guide by @dharaneeshvrd in #1759
- quickstart cleanup by @nirrozenbaum in #1805
- fix(release): update quickstart guide version automatically by @AvineshTripathi in #1803
- chore(deps): bump github.com/prometheus/prometheus from 0.307.2 to 0.307.3 by @dependabot[bot] in #1809
- chore(deps): bump github.com/prometheus/common from 0.67.1 to 0.67.2 by @dependabot[bot] in #1807
- logging cleanup of scheduler pkg by @nirrozenbaum in #1806
- chore(deps): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 by @dependabot[bot] in #1808
- allow overriding the runner's containing executable name by @elevran in #1813
- quickstart numbering by @nirrozenbaum in #1819
- [SLO Routing] Add Latency Predictor sidecars and EPP tools by @BenjaminBraunDev in #1791
- update inferencepool helm chart flags to be map instead of an array by @nirrozenbaum in #1818
- feat: Configure LRUCacheSize using the numGPUBlocks for approximate prefix cache by @zetxqx in #1748
- don't use cluster scope permissions when metrics auth is disabled by @nirrozenbaum in #1804
- Add benchmarking folder by @rlakhtakia in #1689
- Add prompt_cached_tokens metrics from each response. by @zetxqx in #1814
- hotfix to helm chart. missing quotes by @nirrozenbaum in #1825
- Correct the InferencePoolResolvedRefsCondition conformance tests. by @zetxqx in #1756
- Adjust default scorer weights to favor more prefix cache affinity by @liu-cong in #1827
- refactor: Flatten Flow Control queue plugin directory structure by @LukeAVanDrie in #1824
- Update docs on prefix cache plugin related metrics by @liu-cong in #1828
- Add prefix cache aware benchmarking config by @rlakhtakia in #1822
- feat: add validation and fallback for prefix cache config fields by @googs1025 in #1846
- chore(deps): bump github.com/envoyproxy/go-control-plane/envoy from 1.35.0 to 1.36.0 by @dependabot[bot] in #1844
- chore(deps): bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #1845
- Improvements to the E2E Test utilities by @shmuelk in #1853
- Conformance: Adds Data Parallelism Test by @danehans in #1769
- fix incorrect interface input parameter names by @googs1025 in #1865
- docs: Adding the Gateway inference support documentation for Nginx Ga… by @sindhushiv in #1789
- helm support for sidecar injection in EPP by @capri-xiyue in #1821
- Helm: Adds istio as a provider-scoped value for the inferencepool Chart by @danehans in #1831
- refactor: Improve Flow Control queue contracts for clarity and correctness by @LukeAVanDrie in #1836
- fix training server indentation bug and test yaml to build script by @kaushikmitr in #1854
- Validate datalayer with additional testing by @elevran in #1857
- Add PrepareData and Admission control plugins by @rahulgurnani in #1796
- feat(api): Introduce InferenceModelRewrite API by @zetxqx in #1816
- Add owners files to subsections by @kfswain in #1874
- Additional data layer tests by @irar2 in #1876
- chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1873
- feat: Extend the text based configuration to include feature flags and the SaturationDetector's configuration by @shmuelk in #1492
- refactor bbr main as a prep for pluggability by @nirrozenbaum in #1867
- use a dispatch ticker to dispatch requests periodly in ShardProcessor… by @delavet in #1850
- feat(conformance): add responseReceived plugin to support verifying destination endpoint. by @zetxqx in #1855
- some cleanup in runner and config loading + deprecation notes by @nirrozenbaum in #1880
- fix bbr dockerfile post build by @nirrozenbaum in #1881
- add shmuelk as code reviewer by @nirrozenbaum in #1882
- SLO Aware Routing Plugins Only by @BenjaminBraunDev in #1849
- Upload prefill and decode heavy benchmarking configs by @rlakhtakia in #1848
- Update outdated documentation for monitoring config of GKE by @JeffLuoo in #1837
- Enable EPP to support endpoint discovery using pod selector by @c...