Skip to content

Make location lookups resilient against non-conformant data (#26)#358

Draft
morissette wants to merge 4 commits intoCodeForPhilly:mainfrom
morissette:resilient-location-lookups
Draft

Make location lookups resilient against non-conformant data (#26)#358
morissette wants to merge 4 commits intoCodeForPhilly:mainfrom
morissette:resilient-location-lookups

Conversation

@morissette
Copy link
Contributor

Closes #26

Summary

  • Case-insensitive matching: All lookups now use COLLATE NOCASE so "montgomery", "MONTGOMERY", and "Montgomery" all return the same results.
  • Whitespace trimming: Filter values are trimmed before the query, so "Montgomery " and " Montgomery" work correctly.
  • Fuzzy prefix matching: New lookupFuzzy() method uses SQL LIKE with a % suffix so abbreviations like "Mont." match "Montgomery" and "Montour". Trailing periods are stripped before the wildcard is appended.
  • SQL injection guard: Column names and filter keys are now validated against an allowlist of known locations table columns (zipCode, countyName, countyFips, stateAbbreviation) before being interpolated into SQL. Filter values were already parameterized.
  • Error handling: Replaced silent e.printStackTrace() with a structured log + RuntimeException so failures surface instead of returning empty results unexpectedly.
  • Resource management: Replaced manual null-guarded try/finally blocks with try-with-resources for Connection, PreparedStatement, and ResultSet.
  • Tests: Added 11 @QuarkusTest tests covering exact match, case-insensitive variants, whitespace trimming, empty result, and all fuzzy matching cases. Also fixed a pre-existing test port conflict (quarkus.http.test-port=0) that was causing all existing tests to be silently skipped in local dev when another service occupied port 8081. All 25 tests pass.

Changes applied to both library-api and builder-api copies of LocationService.

Manual testing note

No checks currently call LocationService, so there is no end-to-end path to exercise this through the screener UI. Residence checks would be the natural place to wire this in — for example, a "lives in county X" check that looks up whether a ZIP code falls within a given county. That work is out of scope for this PR but would be needed to validate the feature manually.

Test plan

  • cd library-api && mvn test — all 25 tests pass
  • Add a residence check that calls LocationService.lookup() or LocationService.lookupFuzzy() and verify results in the screener UI

🤖 Generated with Claude Code

morissette and others added 4 commits March 11, 2026 06:27
…ForPhilly#26)

- lookup(): use COLLATE NOCASE so "montgomery" and "MONTGOMERY" both
  match "Montgomery" without requiring exact case from callers
- lookup(): trim leading/trailing whitespace from all filter values
  before querying, so "Montgomery " works as well as "Montgomery"
- lookupFuzzy(): new method for prefix/abbreviation matching; strips
  trailing periods then appends % for LIKE COLLATE NOCASE, so
  "Mont." matches both "Montgomery" and "Montour"
- Extract shared JDBC execution into private lookupWithOperator() to
  eliminate duplication between lookup() and lookupFuzzy()
- Both changes applied identically to library-api and builder-api

Tests:
- Add LocationServiceTest (11 tests) covering exact match, case
  insensitivity, whitespace trimming, fuzzy prefix, fuzzy abbreviation,
  and empty-result cases
- Add src/test/resources/application.properties with
  quarkus.http.test-port=0 so tests don't conflict with running
  services on port 8081 (fixes pre-existing issue where all 14
  existing tests were being skipped in local dev)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Errors should never pass silently:
- Replace e.printStackTrace() with a proper Logger and re-throw as
  RuntimeException so callers can distinguish "no results" from
  "query failed". Silent swallowing made DB failures invisible.

Flat is better than nested:
- Replace try/finally/null-guard pattern with try-with-resources.
  Connection, PreparedStatement, and ResultSet are all AutoCloseable;
  the old 3-level nesting with manual null checks is now a single
  try-with-resources block.

Explicit is better than implicit / Readability counts:
- Rename lookupWithOperator -> query; the old name described the
  mechanism rather than the intent.
- Move COLLATE NOCASE out of the operator string and into the SQL
  builder so the operator param is a plain SQL operator ("=" or
  "LIKE"), not an opaque fragment embedding a JDBC placeholder.
- Rename normalised -> patterns in lookupFuzzy; the variable holds
  LIKE patterns, not normalised data.

Beautiful is better than ugly:
- Replace inline java.util.LinkedHashMap with a proper import.
- Add Logger as a named constant rather than inline creation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Filter values were already parameterized, but column names and filter
keys were interpolated directly into the SQL string. Validate both
against an allowlist of known locations-table columns before building
the query.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix ALLOWED_COLUMNS: remove non-existent 'stateName', add 'countyFips'
  which is a real column in the locations schema
- Guard against empty filters before building SQL (would produce invalid
  syntax and a cryptic SQLException)
- Move ALLOWED_COLUMNS constant to top of class per Java convention
- Import Set properly instead of inlining java.util.Set
- Make getDbConnection() package-private (only called via Arc handle
  within the same class; not part of the public API)
- Remove redundant inline comment in lookupFuzzy (covered by Javadoc)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@morissette morissette requested a review from prestoncabe as a code owner March 11, 2026 10:48
@prestoncabe prestoncabe marked this pull request as draft March 12, 2026 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make state and county lookups resilient against non-conformant data

1 participant