Working PR for ADR-191: Refactor DVC pipeline into proper OOP design patterns#202
Draft
jodavis wants to merge 11 commits into
Draft
Working PR for ADR-191: Refactor DVC pipeline into proper OOP design patterns#202jodavis wants to merge 11 commits into
jodavis wants to merge 11 commits into
Conversation
Test Results401 tests ±0 401 ✅ ±0 1m 59s ⏱️ -46s Results for commit b835493. ± Comparison against base commit b877808. This pull request removes 3 and adds 2 tests. Note that renamed tests count towards both.♻️ This comment has been updated with latest results. |
Owner
Author
|
@copilot rebase this PR branch on main so it gets the latest fixes. Make sure scripts/validate passes with no warnings, errors, or test failures. |
Contributor
|
@jodavis I'm unable to start working on this because of repository rules that prevent me from pushing to the branch:
See the documentation for more details. |
- Add EnableWindowsTargeting=true to WPF and Console E2E host projects so they build on Linux without WPF SDK installed - Fix PersistSettingsTests to use a custom controlled env var instead of %LocalAppData%, which doesn't expand on Linux, causing InputSettingsPath and ResolvedSettingsPath to be identical and conflicting Moq setups
…core/ - sample.py: Sample(ABC), SampleWithPath(Sample,ABC), TextSample, AudioSample, SampleSpectrogram, SampleTokens dataclasses with all spec fields. - manifest.py: Manifest[S] generic with O(1) by_id/by_content_hash lookups; ManifestStore.read() dispatches on sample_type; write() emits version-1 JSON. TextSample JSON omits path/parent_content_hash/applied_values; Spectrogram and Tokens JSON omits applied_values and includes parent_id per spec. - pyproject.toml: minimal pytest config so plain `pytest` from ml/ works. - Ambiguity resolutions: mutable dataclasses (no frozen=True); seed not a default field (callers pass seed=0 explicitly, matching spec's "PhraseVariator sets seed=0"). https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
1. Manifest.__init__: raise ValueError on duplicate sample ids (UUIDs must
be unique); keep first occurrence for duplicate content_hash (documented
contract — deterministic stages can produce identical hashes).
2. ManifestStore.read(): use data.get("sample_type") and raise ValueError
explicitly when the key is absent, matching the defensive pattern used
for "version".
3. ManifestStore.write(): validate that all samples share the same type
before writing; a mixed-type Manifest would produce a JSON file whose
sample_type header mismatches later entries, causing read() to KeyError.
Adds four new tests covering each guard.
https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
Tests use pytest's tmp_path fixture; tempfile was never called. https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
TextSample now derives its id from content_hash via __post_init__, so callers no longer pass id= explicitly. This means two TextSamples with the same content always share the same id, enforcing the deduplication invariant at the type level. Update ManifestStore._deserialise to omit id= when constructing TextSample, and update tests to match: duplicate-id and duplicate-content-hash tests now use AudioSample (where id and content_hash are independent) to exercise those code paths. https://claude.ai/code/session_01DLVaMxWuhm75L3YaMdVTFT
…-sample-class-hierarchy-and ADR-221: Implement Sample class hierarchy and ManifestStore with JSON round-trip serialisation
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Working PR for ADR-191: Refactor DVC pipeline into proper OOP design patterns