Skip to content

feat: add connection pooling to fix db connection exhaustion (#536)#816

Draft
ismisepaul wants to merge 26 commits intodevfrom
dev#536
Draft

feat: add connection pooling to fix db connection exhaustion (#536)#816
ismisepaul wants to merge 26 commits intodevfrom
dev#536

Conversation

@ismisepaul
Copy link
Copy Markdown
Member

@ismisepaul ismisepaul commented Mar 29, 2026

Note: This PR supersedes draft #800 (same work; head branch is now OWASP/SecurityShepherd:dev#536 instead of a fork). You can close #800 in favor of this PR.

Summary

Fixes #536 — automated attack tools (e.g. ZAP spider/fuzzer) exhaust all MySQL connections, causing multi-hour outages.

  • HikariCP connection pooling for MySQL/MariaDB with configurable pool size, connection timeouts, idle timeouts, and max lifetime — connections are bounded and reused instead of created per-request
  • Singleton MongoClient pattern for MongoDB, leveraging the driver's built-in connection pooling
  • DatabaseLifecycleListener for clean pool initialization on startup and shutdown on context destroy — no more leaked connections requiring manual Tomcat restarts
  • Documentation for database configuration, capacity planning, and troubleshooting
  • Tests for ConnectionPool, MongoDatabase singleton, and lifecycle listener

Changes

File Change
ConnectionPool.java New — HikariCP pool manager with core + per-challenge pools
Database.java Refactored to delegate all connections through ConnectionPool
MongoDatabase.java Refactored to singleton pattern with double-checked locking
DatabaseLifecycleListener.java New — ServletContextListener for pool lifecycle
web.xml Registered lifecycle listener
pom.xml Added HikariCP dependency
docs/database-configuration.md New — pool config reference, sizing guide, troubleshooting
docs/development-workflow.md New — dev cycle guide
docs/testing.md New — test running guide
CONTRIBUTING.md Links to new docs
*.properties.example New — example config files for MySQL and MongoDB
*Test.java New/updated tests for pool and singleton patterns

Default Pool Configuration

Setting Default Purpose
pool.maximumPoolSize 10 Max connections per pool
pool.minimumIdle 2 Min idle connections maintained
pool.connectionTimeout 30s Max wait for a connection before failing
pool.idleTimeout 10min Idle connection cleanup
pool.maxLifetime 30min Max connection lifetime

All values are configurable via database.properties.

Test plan

  • Unit tests pass (mvn test with Docker db containers running)
  • Application starts cleanly with pool initialization logged
  • Application shuts down cleanly with pool shutdown logged
  • Under load (e.g. ZAP fuzzer), connections are bounded and reused — no connection exhaustion
  • Challenge connections work correctly with per-schema pools
  • MongoDB operations work correctly with singleton client
  • Custom pool sizes in database.properties are respected

Discussion (from #800 comments)

The following was posted on PR #800 so reviewers do not need to open that draft for context.

Challenge pool sizing

After reviewing the challenge connection architecture:

Per-challenge pools are the right design — Each challenge uses different DB credentials scoped to its own schema (26 challenge properties files, each with unique username/password). This is intentional: it prevents a SQL injection in one challenge from accessing another challenge's data. A single shared pool with setCatalog() would require a single DB user with access to all schemas, defeating that isolation.

Challenge pools should be smaller than core — The connection exhaustion problem (#536) is caused by fuzzers/ZAP hammering core app endpoints (auth, scoreboard, module lookups via Getter/Setter), not challenge endpoints. Challenges are low-traffic by nature. Keeping pooling for challenges still helps (avoids TCP handshake + auth per request), but with minIdle=0 there is zero cost when a challenge is not in use, and connections spin up on demand when a student starts working.

Defaults in ConnectionPool.java for challenge vs core:

Setting Core pool Challenge pool
maxPoolSize 10 3
minIdle 2 0
idleTimeout 10 min 2 min

With 26 challenges, worst case is 26 × 3 = 78 connections if every challenge is under active use simultaneously; in practice with minIdle=0 and a 2-minute idle timeout, idle challenge pools hold zero connections.

Connection leaks and follow-up PRs

Pooling exposes pre-existing connection leaks in Getter.java (~32 methods) and Setter.java (~24 methods). Without try-with-resources, connections borrowed from the pool may not be returned — the core pool (max 10) can exhaust within minutes under normal use.

Required follow-up: convert Database.get*Connection() usage to try-with-resources and remove manual Database.closeConnection(); add tests alongside fixes. Priority: Getter.java (every request) > Setter.java (admin / user ops) > challenge servlets (isolated pools, lower risk).

Status: Getter.java fixes are in #817 (stacked on this PR). Setter.java and servlets are tracked separately (e.g. #815).

ismisepaul and others added 17 commits March 29, 2026 00:03
Replace per-request DriverManager connections with HikariCP connection
pooling for MySQL/MariaDB and singleton MongoClient for MongoDB. This
prevents automated tools (e.g. ZAP spider/fuzzer) from exhausting all
database connections and causing multi-hour outages.

- Add ConnectionPool using HikariCP with configurable pool size, timeouts
- Refactor Database.java to delegate all connections through the pool
- Refactor MongoDatabase.java to use singleton pattern with internal pooling
- Add DatabaseLifecycleListener for clean startup/shutdown of pools
- Add connection pool tests and MongoDB singleton tests
- Add database configuration, testing, and development workflow docs
- Add example properties files for MySQL and MongoDB configuration

Made-with: Cursor
- Add missing imports (Arrays, FileInputStream) in MongoDatabase.java
  lost during rebase conflict resolution
- Wrap ConnectionPool init failure as SQLException to maintain API
  compatibility with callers that catch SQLException
- Convert ConnectionPoolTest and DatabaseLifecycleListenerTest from
  JUnit 4 to JUnit 5 so surefire discovers and runs them
- Add mongo_challenge_test.properties test resource for MongoDatabaseTest
- Apply Google Java Format to all modified files

Made-with: Cursor
The pinned v3.6.0 action uses Node 12 (deprecated, scheduled for
removal Sep 2026) and may download an incompatible google-java-format
version for the current ubuntu-latest JDK. Update to v4 (pinned to
c1134eb) which uses Node 20 and auto-downloads the latest compatible
GJF release. Also update checkout to v4.

Made-with: Cursor
Made-with: Cursor
- Change allowMultiQueries=yes to allowMultiQueries=true in Database.java
  (HikariCP requires strict boolean values, not yes/no)
- Catch RuntimeException in ConnectionPoolIT.testChallengeConnection
  (HikariCP throws PoolInitializationException which extends RuntimeException)

Made-with: Cursor
ConnectionPool: read DriverType from properties with URL-based fallback
for mariadb/mysql drivers. Fixes Tomcat classloader issue where JDBC 4
auto-registration doesn't work for webapp-scoped drivers.

Setup: fix XOR logic for db host/port validation — the original condition
rejected valid input when both fields were provided. Extract validation
into testable validateHostPort() method.

Add SetupTest with 7 tests covering all host/port combinations.

Made-with: Cursor
Challenge pools use per-schema credentials by design (isolation between
challenges), but don't need the same sizing as the core pool. Set
challenge pools to max 3 / min idle 0 / 2min idle timeout so they hold
zero connections when unused.
Replace the silent-pass testChallengeConnection with tests that verify:
- Challenge connections are valid and usable
- Same credentials reuse the same pool
- Different schemas create separate pools
- Challenge pools use correct sizing (max 3, min idle 0, 2min timeout)
- Max connections are enforced at pool limit
- Shutdown clears all challenge pools
- Concurrent challenge connections succeed and share one pool

Also adds getChallengePoolCount() and getChallengePool() accessors to
ConnectionPool for test observability.
Configures HikariCP's leakDetectionThreshold to log a warning when a
connection is held for longer than 60 seconds without being closed.
Helps identify connection leaks that could lead to pool exhaustion.
Documents the MariaDB init failure caused by Docker caching an image
before the DELIMITER conversion script runs. Includes the fix
(--no-cache rebuild + volume cleanup) in both AGENTS.md and
docs/database-configuration.md.
Documents the setup.jsp flow required after docker compose up,
including how to get the auth token and the correct database hostname
(container name, not localhost).
The setup servlet uses 'dbauth' (not 'authToken') and requires MongoDB
params (mhost, mport) even when not using mongo challenges. Documents
the correct curl command and all parameter names for both agents and
human contributors.
Python load test that boots the stack, configures the platform,
registers 20 users, and simulates 17 normal users browsing alongside
3 aggressive users doing rapid automated scanning. Monitors DB
connections and app responsiveness, reports pass/fail based on
connection bounds and response times.
ismisepaul and others added 8 commits March 29, 2026 22:22
…#817)

* fix: return core DB connections to pool from Getter methods

- Use try-with-resources for all Getter paths that used manual closeConnection
- Keep legacy ResultSet APIs (getClassInfo all classes, getPlayersByClass, getAdmins) unchanged
- Add ConnectionPool.getCoreActiveConnections for tests
- Add GetterCorePoolLeakIT to assert bounded pool usage under repeated authUser

* style: format Getter.java with Google Java Format

* fix: cache isInstalled(), tune pool for stability under load

Setup.isInstalled() was called on every HTTP request via SetupFilter,
each time reading database.properties from disk and borrowing a core
pool connection just to check non-null. Under concurrent load this
exhausted the pool and cascaded into a full app lockup.

Cache with volatile Boolean + double-checked locking so the check
runs once, then returns constant-time on all subsequent requests.
resetInstalledCache() called after setup completes so the first
post-setup request re-evaluates. Warm the cache at startup from
DatabaseLifecycleListener.contextInitialized().

Pool tuning:
- maxPoolSize 10 → 20 (supports realistic classroom concurrency)
- connectionTimeout 30s → 5s (fail fast under overload instead of
  blocking Tomcat threads for 30s each)
- minIdle 2 → 5 (reduce cold-start latency)

Known limitation: authUser holds a DB connection during Argon2
password verification (~100-200ms). This limits throughput under
high concurrency. Follow-up will release connection before hashing.

Load test updated with --target and --concurrency flags for targeted
per-class/per-method testing instead of broad soak only.

* fix: only cache isInstalled()=true, not transient false

If the DB is temporarily unreachable during the first isInstalled()
call, caching false permanently locks the app into "not installed"
state for the JVM lifetime. Only cache the true (terminal) state;
leave installedCached=null on failure so subsequent requests retry.

* chore: update pool docs, harden SetupIT, remove stale TODO

Address Copilot review feedback on PR #817:

- Update database.properties.example and docs/database-configuration.md
  to reflect new pool defaults (maxPoolSize=20, minIdle=5,
  connectionTimeout=5000)
- Add assumeTrue guard in SetupIT cache tests so they skip instead of
  false-passing when the database is unavailable
- Remove stale TODO in Getter.authUser (try-with-resources answers it)
…) (#821)

* fix: release DB connection before Argon2 verification in authUser

Split authUser into three phases so the connection is only held
during actual DB work, not during CPU-bound Argon2 hashing:

1. Fetch user record into local variables (~1-5ms DB hold)
2. Verify password with Argon2 (no connection held, ~100-200ms)
3. Re-acquire connection briefly for userBadLoginReset if needed

Previously the entire method ran inside a single try-with-resources
block, holding a pool connection for ~200ms per login attempt. Under
20 concurrent logins this exhausted the pool and blocked unrelated
endpoints (challenge lookups, scoreboards, module status).

Auth decision flow and side effects are identical to previous
implementation; only connection scope changes. No schema changes or
stored procedure modifications required.

Closes #819

Made-with: Cursor

* fix: null-safe loginType check, remove unused variable

- Use "login".equals(loginType) to prevent NPE if loginType is null
- Remove unused userFound variable left over from refactor

Made-with: Cursor

* fix: check suspension and login type before Argon2 verification

Move loginType and suspendedUntil checks before the expensive Argon2
hash so suspended and SSO users are rejected cheaply. Also adds a null
guard on suspendedUntil to prevent NPE when the column is NULL.

* test: add integration tests for authUser fail-fast checks

Add GetterAuthIT with JUnit 5 tests verifying that:
- suspended users are rejected before Argon2 verification
- SSO users cannot authenticate via password login

These cover the fail-fast checks moved before Argon2 in the
previous commit.
When pushing to a feature branch with an open PR, both push and
pull_request triggers fire. The concurrency group ensures only one
run proceeds per PR/branch, cancelling the duplicate.
Removed Trello link from the README.
…es (#826)

* fix: migrate all integration tests from JUnit 4 to JUnit 5 (#825)

37 IT files were using JUnit 4 annotations (org.junit.Test, @before,
@BeforeClass, etc.) but the project only has junit-jupiter-engine —
no junit-vintage-engine. The JUnit Platform silently skipped them all,
reporting 0 tests run with no warning.

Migrated all IT files to JUnit 5: updated imports, annotations,
and converted @test(expected=...) to assertThrows().

Closes #825

* style: apply google-java-format to SetterIT.java

* fix: resolve 8 integration test failures revealed by JUnit 5 migration

- CountdownHandler.saveCountdowns: fix copy-paste bug where startTime
  was saved as lockTime, corrupting DB state
- CountdownHandler: add forceReload() so cached static state can be
  invalidated when tests reset the database
- TestProperties.executeSql: call forceReload() after DB reset to
  prevent stale CountdownHandler state across test classes
- EnableModuleBlock: fix error message casing ("received" → "Received")
- XxeChallenge1IT: use XxeChallenge1OldWebService (the XML-based
  challenge endpoint) instead of XxeLesson (which blocks challenge
  file content with its "Wrong File" check)

* fix: add missing module-open check to XxeChallenge1OldWebService

XxeChallenge1OldWebService was processing requests even when the module
was closed, unlike its sibling XxeChallenge1 which properly checks
Getter.isModuleOpen(). This caused XxeChallenge1IT
testLevelWhenUnsafeLevelsAreDisabled to fail.
… SetupIT

assertEquals and assumeTrue were used but not imported after the merge
conflict resolution dropped them. Also fix assertTrue(String, boolean)
to assertTrue(boolean, String) for JUnit 5 compatibility.
@ismisepaul
Copy link
Copy Markdown
Member Author

Integration test failures: connection pool exhaustion

The test failures are caused by connection leaks that exhaust HikariCP's pool (max 20 connections, 5s timeout). Three categories of leak:

1. TestProperties.executeSql() — never closes its connection

Every test class calling this in @BeforeAll leaks a connection:

// src/test/java/testUtils/TestProperties.java:50
Connection databaseConnection = Database.getDatabaseConnection(null, true);
// ... used but never closed

2. Three Getter methods return ResultSet without closing the Connection

  • getClassInfo(ApplicationRoot) (line 614)
  • getPlayersByClass(ApplicationRoot, classId) (line 1759)
  • getAdmins(ApplicationRoot) (line 2285)

Each caller leaks a connection since there's no way to close it through the returned ResultSet.

3. All ~39 Setter methods close connections inside try but not finally

Any exception between acquiring and closing leaks the connection:

try {
    Connection conn = Database.getCoreConnection(ApplicationRoot);
    // ... if anything throws here, conn leaks
    Database.closeConnection(conn);  // only reached on success
} catch (SQLException e) { ... }

Fix

  • TestProperties.executeSql(): wrap in try-with-resources
  • Getter ResultSet methods: refactor to process results internally and close the connection, or pass the connection from a try-with-resources block in the caller
  • Setter methods: convert all to try (Connection conn = ...) { }

These leaks were invisible before pooling since each connection was just opened/GC'd independently. With HikariCP, unreturned connections stay checked out permanently, so ~10-15 leaked connections exhaust the pool and every subsequent test times out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Automated attack tools starve all db connections

2 participants