OLS-2882: Add spec files to the projects for AI-assisted development #1536

Open
joshuawilson wants to merge 2 commits into openshift:main from joshuawilson:spec

Conversation

@joshuawilson joshuawilson commented Apr 20, 2026

Description

Initial set of spec files to enable Agentic SDLC.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up dependent library

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

openshift-ci-robot commented Apr 20, 2026

@joshuawilson: This pull request references OLS-2882 which is a valid jira issue.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 20, 2026
@openshift-ci openshift-ci Bot requested review from blublinsky and bparees April 20, 2026 03:13

openshift-ci Bot commented Apr 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joshuawilson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

3. The operator is fully event-driven. It does not use periodic/timer-based reconciliation. All changes are detected via Kubernetes watches on owned resources and annotated external resources.
4. The operator selects between two mutually exclusive backend implementations at startup via the `--use-lcore` flag: AppServer (legacy, direct LLM proxy) or LCore (new, agent-based with Llama Stack). Both implement the same Lightspeed API surface.

### Component Inventory
Contributor

Author

We can, but the spec files are specific to the repo.
We could create a higher-level set of specs that covers all repos and Konflux.

Contributor

See the comment below.

Author

The spec's Component Inventory currently lists the operands, the things the operator deploys and manages. Konflux components are the build artifacts, the container images that get built in the CI/CD pipeline.

Those are two different concerns. The build/image inventory is useful context, but it belongs somewhere other than the Component Inventory section, which describes runtime behavior. It would fit better as a reference in the how/project-structure.md spec, under something like "Container Images", mapping each logical component to its image name and build source. Such a mapping would answer "which Konflux component produces which image that the operator deploys", but that is a convenience reference, not a behavioral rule.

Comment thread .ai/spec/what/lcore.md Outdated
1. The Llama Stack database name is hardcoded by the Llama Stack project and must not be changed.
2. Llama Stack Generic mode cannot be mixed with legacy provider-specific fields (deploymentName, projectID, url, apiVersion).
3. The Lightspeed Stack always connects to Llama Stack via localhost, even in server mode (they share a pod).
4. Vector database IDs are sanitized from RAG image names if indexID is not explicitly provided.
Contributor

@blublinsky blublinsky Apr 30, 2026

This whole thing is going to be removed in this sprint

Author

It's not exactly fair to hold this up on something that didn't get merged, but I'll remove it since I'm updating it.

|---|---|---|---|
| `--use-lcore` | bool | `false` | Select LCore backend instead of AppServer |
| `--lcore-server` | bool | `true` | LCore server mode (two containers) vs library mode (one container) |
| `--namespace` | string | `WATCH_NAMESPACE` env or `openshift-lightspeed` | Operator namespace |
Contributor

LCore is going away this sprint.

Author

fixed
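
For reference, the three flags in the table quoted above could be parsed with Go's standard `flag` package roughly as follows. This is a minimal sketch, not the operator's actual code: the `Options` struct and the `parseOptions` and `defaultNamespace` helpers are hypothetical names, and the table reflects the revision under review (the LCore flags are slated for removal, per this thread).

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Options mirrors the three flags from the quoted table. The struct and
// helper names are hypothetical, not taken from the operator source.
type Options struct {
	UseLCore    bool
	LCoreServer bool
	Namespace   string
}

// defaultNamespace returns WATCH_NAMESPACE when it is set, otherwise
// "openshift-lightspeed", matching the default column of the table.
func defaultNamespace(getenv func(string) string) string {
	if ns := getenv("WATCH_NAMESPACE"); ns != "" {
		return ns
	}
	return "openshift-lightspeed"
}

// parseOptions uses a private FlagSet so it can be exercised in tests
// without touching the process-wide flag state.
func parseOptions(args []string, getenv func(string) string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.BoolVar(&o.UseLCore, "use-lcore", false, "select LCore backend instead of AppServer")
	fs.BoolVar(&o.LCoreServer, "lcore-server", true, "LCore server mode vs library mode")
	fs.StringVar(&o.Namespace, "namespace", defaultNamespace(getenv), "operator namespace")
	err := fs.Parse(args)
	return o, err
}

func main() {
	opts, err := parseOptions(os.Args[1:], os.Getenv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

Passing the environment lookup in as a function keeps the `WATCH_NAMESPACE` fallback testable without mutating the real environment.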

Comment thread .ai/spec/what/system-overview.md Outdated
| OLS-2322 | Streamline OLSConfig CR deployment configuration |
| OLS-2323 | Extend OLSConfig CR to report specific deployment errors |
| OLS-2325 | Create type-safe log-level definition in the operator CR |
| OLS-2140 | Remove time-based operator reconciliation (completed -- now fully event-driven) |
Contributor

Maybe add a “delivery map” subsection here: one short table or bullet list that maps repo components → Konflux application names (or “see Konflux UI → ols app → components”), with a disclaimer that the operator repo spec describes operator-managed workloads and Konflux may list additional CI/catalog components. That answers JoaoFula’s question without duplicating Konflux in every spec.

Author

That is something you can add to the spec as a separate PR.

Comment thread .ai/spec/what/system-overview.md Outdated

### Operator Role

1. The operator manages exactly one OLSConfig CR per cluster, named "cluster". CRs with any other name must be ignored.
Contributor

OLSConfig is treated as a singleton per cluster: the operator only reconciles the cluster-scoped instance named cluster. Any other OLSConfig objects are ignored. Reconciled workloads are created in the openshift-lightspeed namespace.

Author

Do you want that added?
I replaced the line with that text. If you wanted something else, put it in quotes.
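
The singleton rule discussed in this thread amounts to a name check before any reconciliation work. A minimal sketch, assuming a plain string comparison is all that is needed (the `shouldReconcile` helper is a hypothetical name, not the operator's code):

```go
package main

import "fmt"

// singletonName is the only OLSConfig name the operator reconciles,
// per the rule quoted in the thread above.
const singletonName = "cluster"

// shouldReconcile reports whether an OLSConfig with the given name should be
// processed. Any other name is ignored. The function name is hypothetical.
func shouldReconcile(name string) bool {
	return name == singletonName
}

func main() {
	for _, name := range []string{"cluster", "my-config"} {
		fmt.Printf("%s: reconcile=%v\n", name, shouldReconcile(name))
	}
}
```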

Comment thread .ai/spec/what/system-overview.md Outdated
### Operator Role

1. The operator manages exactly one OLSConfig CR per cluster, named "cluster". CRs with any other name must be ignored.
2. The operator deploys and manages four components: an application backend (AppServer or LCore), a PostgreSQL database, a Console UI plugin, and operator-level monitoring/networking resources.
Contributor

This mixes external resources with the operator's own infrastructure.

Author

How is this wrong?
What do you think it should say?

Contributor

Resource change management is different. See below.

Comment thread .ai/spec/what/system-overview.md Outdated
1. The operator manages exactly one OLSConfig CR per cluster, named "cluster". CRs with any other name must be ignored.
2. The operator deploys and manages four components: an application backend (AppServer or LCore), a PostgreSQL database, a Console UI plugin, and operator-level monitoring/networking resources.
3. The operator is fully event-driven. It does not use periodic/timer-based reconciliation. All changes are detected via Kubernetes watches on owned resources and annotated external resources.
4. The operator selects between two mutually exclusive backend implementations at startup via the `--use-lcore` flag: AppServer (legacy, direct LLM proxy) or LCore (new, agent-based with Llama Stack). Both implement the same Lightspeed API surface.
Contributor

This is going away this sprint.

Comment thread .ai/spec/what/system-overview.md Outdated
6. Console UI Plugin: OpenShift console extension that provides the Lightspeed chat interface. Integrates via ConsolePlugin CR and proxies requests to the backend.
7. AppServer backend: Python/FastAPI application that handles LLM queries, RAG retrieval, conversation management, and tool execution. Talks to LLM providers directly.
8. LCore backend: Dual-container deployment (Llama Stack + Lightspeed Stack) that provides the same API but routes through Llama Stack for LLM communication, enabling agent-based tool use and provider abstraction.
9. Operator-level resources: ServiceMonitor for operator metrics, NetworkPolicy restricting operator pod access.
Contributor

@blublinsky blublinsky Apr 30, 2026

I would suggest separating external resources from operator-level resources (observability support), and adding a cross-reference here to the specific docs.

Author

I don't understand what you want here. What is external?
What cross-reference are you looking for?

Contributor

For change detection, the operator distinguishes owned resources (created by the operator) from external resources (provided by the user) and handles each differently:

  1. For owned resources (described above), change detection is based on the resource version.
  2. For external resources, change detection is implemented using the watchers described in external resources.

Comment thread .ai/spec/what/system-overview.md Outdated
6. Console UI Plugin: OpenShift console extension that provides the Lightspeed chat interface. Integrates via ConsolePlugin CR and proxies requests to the backend.
7. AppServer backend: Python/FastAPI application that handles LLM queries, RAG retrieval, conversation management, and tool execution. Talks to LLM providers directly.
8. LCore backend: Dual-container deployment (Llama Stack + Lightspeed Stack) that provides the same API but routes through Llama Stack for LLM communication, enabling agent-based tool use and provider abstraction.
9. Operator-level resources: ServiceMonitor for operator metrics, NetworkPolicy restricting operator pod access.
Contributor

For change detection, the operator distinguishes owned resources (created by the operator) from external resources (provided by the user) and handles each differently:

  1. For owned resources (described above), change detection is based on the resource version.
  2. For external resources, change detection is implemented using the watchers described in external resources.

Comment thread .ai/spec/what/system-overview.md Outdated
### Operator Role

1. The operator manages exactly one OLSConfig CR per cluster, named "cluster". CRs with any other name must be ignored.
2. The operator deploys and manages four components: an application backend (AppServer or LCore), a PostgreSQL database, a Console UI plugin, and operator-level monitoring/networking resources.
Contributor

Resource change management is different. See below.

3. Compare content hashes (proxy CA cert hash) via annotations
4. If any differ: update spec + annotations, call RestartX() function
- RestartX() sets `ols.openshift.io/force-reload` annotation to `time.Now().Format(time.RFC3339Nano)`
- This triggers a rolling restart by changing the pod template
Contributor

For change detection, the operator distinguishes owned resources (created by the operator) from external resources (provided by the user) and handles each differently:

  1. For owned resources (described above), change detection is based on the resource version.
  2. For external resources, change detection is implemented using the watchers described in external resources.

Two-layer spec structure under .ai/spec/:
- what/ (10 files): behavioral rules for system-overview, CRD API,
  reconciliation, app-server, postgres, console-ui, TLS, security,
  resource-lifecycle, and observability
- how/ (4 files): architecture specs for project-structure,
  reconciliation, deployment-generation, and config-generation

Includes comprehensive testing section in project-structure covering
unit tests (envtest + Ginkgo) and E2E test suite (12 test areas,
Makefile targets, required environment variables).

Specs are optimized for AI agent consumption and document the
operator thoroughly enough to enable a from-scratch rewrite.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@joshuawilson
Author

I have also expanded the reference to E2E tests. To be clear, the spec is about behavior and not implementation so it should not list the tests.

openshift-ci Bot commented May 7, 2026

@joshuawilson: all tests passed!

@JoaoFula
Contributor

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 11, 2026

4 participants