Bloodhound v2 – CI/CD, Validation, and Operational Documentation Updates by mmccla1n · Pull Request #2 · codeplatoon-devops/Bloodhound

mmccla1n · 2026-03-16T16:56:24Z

This pull request introduces updates to the Bloodhound v2 project related to CI/CD workflows, operational tooling, validation infrastructure, and documentation.

Changes include updates to GitHub Actions workflows such as introducing a dedicated manual operations workflow (bloodhound_ops.yml), adding a concurrency guard to prevent overlapping executions, expanding Lambda execution logging, streaming CloudWatch logs into CI output, and masking sensitive environment variables in logs. GitHub Actions authentication was updated to use AWS OIDC role assumption with a bootstrap script for configuring the IAM provider and role, eliminating the need for static AWS credentials in CI.

A validation harness was added to support controlled teardown testing using Terraform-created resources, with safeguards requiring validation source identification, explicit target IDs, and validation tags on resources. Additional operational configuration options were introduced to define teardown limits and execution conditions, including deletion caps, simulation mode, Terraform deployment guard checks, and AWS account verification.

Execution observability was expanded through CI summaries and improved Lambda logging visibility. Repository documentation was also updated to reflect the current project architecture, operational workflows, validation processes, and configuration system.

The validation workflow option has been temporarily removed from CI workflow inputs while stabilization continues, though validation tooling remains available locally via repository scripts.

These updates provide expanded CI/CD workflow capabilities, validation tooling for teardown operations, OIDC-based authentication for CI, improved execution logging, and updated operational documentation.

…wn controls

Summary
- Replace v1 script with v2 package architecture (scanner/budget/whitelist/teardown).
- Slack reporting: scan summary (per region + totals, including 0 counts for scanned types), budget summary, teardown plan/results, and dedicated whitelisted resources list.
- Whitelist: tag-based keep rule (default bloodhound:keep=true) plus optional KEEP_RESOURCE_IDS.
- Teardown: dry-run by default; apply-mode gated by APPLY_CHANGES and supports simulate mode (TEARDOWN_SIMULATE) plus safety rails (TEARDOWN_TARGET_IDS, TEARDOWN_ALLOW_ALL).
- Budgeting: 7-month cohort spend tracking and month-end projection via Cost Explorer.

Operational
- Add lambda handler entrypoint (lambda_function.lambda_handler) and local runner (run_local.py).
- Add env.example and .env auto-loading for local runs.
- Add .gitignore to prevent committing secrets/venvs/build zips.
- Update requirements to resolve urllib3/botocore conflict.
- Add v2 GitHub Actions workflow (invoke_lambda_v2.yml).
- Add v2 plan doc and split Slack setup into SLACK_SETUP.md.

Notes
- v1 is preserved separately under versions/v1_0/ outside this repo directory; v2 deletes/terminations require explicit env flags.
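The whitelist rule described in the summary above can be sketched roughly as follows. This is a minimal illustration rather than the project's code: the function names, the shape of the tag dictionary, and the comma-separated format assumed for KEEP_RESOURCE_IDS are assumptions. Only the default tag bloodhound:keep=true and the variable name KEEP_RESOURCE_IDS come from the commit message.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Default keep rule named in the commit message: tag bloodhound:keep=true.
DEFAULT_KEEP_TAG = ("bloodhound:keep", "true")


def parse_keep_ids(raw: str) -> set[str]:
    """Parse a KEEP_RESOURCE_IDS-style value (assumed to be comma-separated)."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_whitelisted(
    resource_id: str,
    tags: Mapping[str, str],
    keep_ids: Iterable[str] = (),
    keep_tag: tuple[str, str] = DEFAULT_KEEP_TAG,
) -> bool:
    """Return True if the resource should be kept and never torn down."""
    key, expected = keep_tag
    # Comparing the tag value case-insensitively is a choice of this sketch.
    if tags.get(key, "").strip().lower() == expected.lower():
        return True
    # Fall back to the optional explicit ID list.
    return resource_id in set(keep_ids)
```

A caller would typically evaluate this check before a resource is ever added to a teardown plan, so that a kept resource cannot be deleted regardless of the apply or simulate flags.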

- Move Lambda entrypoint into handlers/ and update Terraform handler + build pipeline
- Move docs into docs/ and link from README
- Move local runner + AWS helper JSON into tools/
- Remove empty scripts directory
- Keep functionality unchanged (only paths/organization)

… the demo, feel free to use this just update with your aws profile, it builds 8 ec2, 2 rds and half get whitelisted, cleaned up readme

Mmc/bloodhound v2

…re for infra automation

…pendency split, and Slack manifest infrastructure

Key changes:
- Separate runtime and development dependencies
  - requirements.txt now contains only Lambda runtime packages
  - requirements-dev.txt added for local development/testing dependencies
- Document Lambda dependency strategy
  - add docs/lambda_packaging.md explaining:
    - why boto3 should not be bundled
    - Lambda runtime dependency behavior
    - packaging workflow
    - future Docker-based packaging
- Add Lambda packaging flow documentation and diagrams
- Introduce Slack manifest-based configuration
  - infra/slack/bloodhound_v2_manifest.json becomes Slack app source of truth
  - add infra/slack/README.md documenting manifest structure and change policy
- Update docs/SLACK_SETUP.md to support manifest-based setup with manual fallback
- Update main README with improved onboarding flow and Slack setup references
- Improve infra documentation
  - clarify Lambda environment variables and secret handling
  - document Terraform build behavior
- Introduce Lambda alias infrastructure for versioned deployments and safe rollback
  - infra/alias.tf
  - lambda_alias_version_override variable
- Improve Terraform packaging pipeline documentation
- Update .gitignore and repo structure for build artifacts
- Prepare repo for future deterministic Docker-based Lambda packaging

No infrastructure behavior changes yet; Terraform deployment remains ZIP-based. Docker packaging planned for future phase.

- added terraform aws account guard to prevent deploy to wrong account
- added lifecycle.prevent_destroy to lambda iam role and policy
- confirmed terraform plan safe (1 add, 2 update, 0 destroy)

slack integration
- wired /seek and /seek_destroy to lambda function url
- verified slack -> lambda -> aws scan flow
- confirmed async lambda invocation working
- slack messages returning scan + budget + teardown plan

validation
- ran /seek from slack
- confirmed lambda invocation in cloudwatch logs
- lambda run time ~14s, memory usage normal
- dry-run teardown confirmed (no deletes)

docs
- added slack validation doc (watching cloudwatch logs)
- added safe operations guide for teardown controls
- added future infra hardening notes

status
phase 3 complete (slack wired)
phase 4 validation in progress

…eployment guard

Infrastructure safety
- Add Terraform apply-mode guard preventing deployment when APPLY_CHANGES=true unless allow_apply_mode=true
- Add deletion cap via TEARDOWN_MAX_DELETE_COUNT to prevent large accidental teardown operations
- Update lambda.tf and variables.tf with documented safety logic and comments

Runtime safety
- Extend TeardownConfig with max_delete_count
- Add executor guard to abort teardown when plan exceeds deletion limit
- Preserve simulate-mode protections and dry-run behavior

Operational validation
- Add docs/validate_teardown.md with full controlled teardown test procedure
- Update Slack/Lambda validation documentation and common failure scenarios
- Document CLI methods for verifying Lambda environment variables and CloudWatch logs

Configuration updates
- Update env.example and terraform.tfvars.example to include TEARDOWN_MAX_DELETE_COUNT
- Clarify apply-mode behavior and Terraform deployment guard

Documentation improvements
- Expand README teardown controls and safety model
- Update infra README with destructive-mode deployment guard
- Improve V2_PLAN and validation guides for operational clarity

These changes introduce defense-in-depth protections for Bloodhound teardown operations while providing reproducible validation workflows for engineers.
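The executor guard under "Runtime safety" might look something like the sketch below. TeardownConfig and max_delete_count are named in the commit; everything else here (the other fields, the default cap, the exception class, and the function name) is hypothetical and only illustrates the idea of aborting before any deletion when the plan exceeds the cap.

```python
from __future__ import annotations

from dataclasses import dataclass


class TeardownAborted(Exception):
    """Raised when a teardown plan violates a safety limit."""


@dataclass(frozen=True)
class TeardownConfig:
    # Only max_delete_count is named in the commit message. The other fields
    # are hypothetical stand-ins for APPLY_CHANGES / TEARDOWN_SIMULATE.
    apply_changes: bool = False
    simulate: bool = False
    max_delete_count: int = 5  # hypothetical default


def enforce_delete_cap(plan: list[str], config: TeardownConfig) -> list[str]:
    """Return the plan unchanged, or abort if it exceeds the deletion cap."""
    if len(plan) > config.max_delete_count:
        raise TeardownAborted(
            f"plan has {len(plan)} deletions, cap is {config.max_delete_count}"
        )
    return plan
```

Raising instead of silently truncating the plan keeps the failure visible to the operator, which fits the defense-in-depth intent stated in the commit.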

…entation

- Add automated validation tooling:
  - tools/run_validation_workflow.sh
  - tools/smoke_test_lambda.sh
  - tools/validate_teardown.sh
  - tools/show_validation_history.sh
- Implement controlled teardown validation pipeline
- Add Terraform validation resource:
  - infra/test_resource.tf
- Introduce validation logging and history tracking
- Add configuration system documentation:
  - docs/configuration_system.md
  - docs/run_validation.md
- Update teardown validation documentation
- Improve README with configuration safety warning and documentation index
- Rename architecture document to docs/bloodhound_v2_plan.md
- Update env.example with safety guard configuration
- Add Terraform variables for validation resources and safety controls

This commit introduces a full validation framework for Bloodhound v2 including smoke testing, controlled teardown verification, configuration safety guards, and documentation for operational workflows.

…ove teardown validation tooling

Core changes
- Standardized Slack command routing to maintain internal modes and
- Added support for preview mode while preserving existing destructive flow
- Ensured Lambda worker receives correct execution flags (apply_changes / simulate)
- Fixed Slack command handler logic and improved safety gating for destructive operations

Slack integration
- Updated Slack manifest to include , , , , and
- Aligned manifest URLs with Lambda Function URL endpoint
- Updated Slack command documentation and operational guidance

Infrastructure
- Updated Terraform outputs and test resource configuration
- Improved Lambda smoke test tooling

Validation & tooling
- Added automated validation workflow scripts
- Improved teardown validation scripts and history utilities
- Added operational docs for Slack app usage and troubleshooting

Documentation
- Updated Slack setup documentation
- Updated configuration system documentation
- Updated validation workflow documentation
- Added troubleshooting and operational runbooks

- Combined header and mode_text generation into a single decision block
- Added explicit DRY RUN banner to prevent operator confusion

…face

- Implement /v2_status Slack command for Bloodhound system status
- Add system health indicator (🟢 🟡 🔴) based on teardown configuration and safety limits
- Improve Slack report formatting (section dividers, vertical service/action lists, Top Regions Affected)
- Update Slack manifest to include /v2_status
- Update validation scripts and teardown tooling references
- Synchronize documentation across README and docs/* with full v2 command set

Commands now supported:
/v2_seek
/v2_seek_destroy_plan
/v2_seek_destroy CONFIRM
/v2_status

Mmc/bloodhound v2

…safety improvements (WIP)

Summary
-------
Refactors Bloodhound pipeline structure to separate event handling and operational services. This keeps the orchestration layer clean and improves maintainability of scan, budget, and teardown logic.

Major Changes
-------------
• Introduced new architecture layers
  - handlers/: event interpretation (Slack, validation harness, scheduled runs)
  - services/: operational pipeline logic (scan, budget, status, teardown)
• Extracted pipeline logic from app.py into service modules:
  - scan_service.py
  - budget_service.py
  - status_service.py
  - teardown_service.py
• Added handler modules:
  - slack_handler.py
  - validation_handler.py
  - scheduled_handler.py
• execute_pipeline() now acts as a clean orchestrator coordinating services.

Validation Safety Improvements
------------------------------
• Added validation-mode safeguards to ensure destructive testing can only affect validation resources.
• Validation runs now enforce:
  - target ID filtering
  - validation tag checks
  - restricted teardown scope

Validation Workflow (WIP)
-------------------------
Validation pipeline currently under active testing:

Terraform -> create validation instance
Validation script -> capture instance ID
Lambda invocation -> validation mode
Teardown restricted via TEARDOWN_TARGET_IDS
Script verifies instance deletion

Status
------
Validation harness still in progress. Destructive validation behavior being verified before finalizing CI/CD integration.
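To make the validation-mode safeguards above concrete, here is a rough sketch of a scope filter that combines target ID filtering with a validation tag check. The tag name bloodhound:validation, the function name, and the data shapes are assumptions made for illustration; the commit only states that both checks are enforced.

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Hypothetical tag marking resources created for validation runs.
VALIDATION_TAG = ("bloodhound:validation", "true")


def restrict_to_validation_scope(
    candidates: Mapping[str, Mapping[str, str]],
    target_ids: Sequence[str],
) -> list[str]:
    """Return only IDs that are explicit targets AND carry the validation tag.

    `candidates` maps resource ID -> tag dictionary.
    """
    if not target_ids:
        # Refuse to run a destructive validation without explicit targets.
        raise ValueError("validation runs require explicit target IDs")
    key, value = VALIDATION_TAG
    targets = set(target_ids)
    return sorted(
        rid
        for rid, tags in candidates.items()
        if rid in targets and tags.get(key) == value
    )
```

Requiring both conditions means a mistyped target ID cannot pull in a production resource, and a validation-tagged resource is not deleted unless it was explicitly named.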

- Add explicit event routing in app.py for slack_command, validation, and scheduled sources
- Harden validation_handler with source checks and target_ids enforcement
- Document validation harness architecture and payload model
- Update README to reflect Lambda → app.run() → pipeline execution flow
- Clarify dual execution paths (Slack operator vs validation harness)
- Align documentation with v2 slash commands and current validation workflow
- Fix outdated doc references and legacy command notes
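The explicit event routing mentioned above could be sketched as a small dispatch table. The three source names come from the commit; the payload key "source", the handler function names, and their placeholder return values are assumptions, since the actual payload model is documented elsewhere in the repository.

```python
from typing import Any, Callable, Dict, Mapping

Event = Mapping[str, Any]


# Placeholder handlers; the real ones live in the handlers/ modules.
def handle_slack_command(event: Event) -> str:
    return "slack_command"


def handle_validation(event: Event) -> str:
    return "validation"


def handle_scheduled(event: Event) -> str:
    return "scheduled"


ROUTES: Dict[str, Callable[[Event], str]] = {
    "slack_command": handle_slack_command,
    "validation": handle_validation,
    "scheduled": handle_scheduled,
}


def route_event(event: Event) -> str:
    """Dispatch on the event source and reject anything unrecognized."""
    source = event.get("source")  # assumed payload key
    handler = ROUTES.get(source)
    if handler is None:
        raise ValueError(f"unsupported event source: {source!r}")
    return handler(event)
```

Rejecting unknown sources, rather than falling through to a default pipeline, keeps the Slack operator path and the validation harness path from being triggered by accident.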

Engineering notes: Changes made while validating the Bloodhound teardown workflow and debugging Lambda packaging behavior.

Changes:
- Add jq validation check to ensure Lambda response success
- Add scheduled_handler entrypoint for scheduled scans
- Improve Terraform Lambda packaging triggers and debug visibility
- Exclude __pycache__ and .pyc files from Lambda bundle
- Add AWS CLI '--cli-binary-format raw-in-base64-out' to Lambda invocation workflow
- Add Terraform + Lambda troubleshooting documentation

Validation: Pipeline verified using run_validation_workflow.sh with successful EC2 teardown validation.

This commit introduces the Bloodhound teardown validation system along with several reliability improvements.

Key updates:
- added strict bash mode (set -euo pipefail) to prevent silent script failures
- added Lambda execution metric validation before checking AWS resources
- replaced fixed sleep with a loop that waits until EC2 is fully terminated
- added workflow logging using RUN_ID for each validation run
- added log cleanup to keep only the last 3 validation logs
- limited Lambda rebuilds to actual code changes
- updated troubleshooting documentation for the build pipeline
- confirmed full teardown validation workflow working end-to-end

Validation workflow test:
1. smoke test checks Lambda configuration
2. Terraform creates a disposable EC2 instance
3. Lambda teardown deletes the instance
4. execution metrics are verified
5. EC2 termination is confirmed

Result: PASS

This validation workflow ensures Bloodhound safely deletes targeted resources.
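The change from a fixed sleep to a wait loop, listed in the key updates above, can be illustrated with a small polling helper. The project's version lives in a bash script; this is a Python restatement of the same idea, with a hypothetical function name and hypothetical default timings. The state lookup is passed in as a callable so the sketch does not depend on any AWS call.

```python
import time
from typing import Callable


def wait_until_terminated(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until get_state() returns "terminated" or the timeout elapses.

    Returns True on termination, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == "terminated":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Compared with a fixed sleep, the loop returns as soon as the instance reaches the terminated state and fails explicitly when it never does. In boto3, the EC2 client's built-in `instance_terminated` waiter covers the same need.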

Mmc/bloodhound v2

- Add structured CI log groups for improved debugging
- Add Lambda error detection and StatusCode validation
- Stream CloudWatch Lambda logs into GitHub Actions output
- Document GitHub automation in docs/github_actions.md
- Update README with GitHub Actions workflow references
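As an illustration of the Lambda error detection and StatusCode validation listed above, the following sketch checks a synchronous invoke result. It assumes the response dictionary has the shape returned by a Lambda Invoke call (a `StatusCode` field, plus `FunctionError` when the function itself failed) and that the payload has already been read into bytes. The function name is hypothetical, and the CI workflow may perform the equivalent check with the AWS CLI instead.

```python
import json
from typing import Any, Mapping


def check_invoke_response(response: Mapping[str, Any], payload: bytes) -> Any:
    """Validate a synchronous Lambda invoke result and return the parsed payload."""
    status = response.get("StatusCode")
    if status != 200:
        raise RuntimeError(f"unexpected StatusCode: {status}")
    # FunctionError is only present when the function raised or timed out,
    # even though the invoke call itself returned StatusCode 200.
    error_kind = response.get("FunctionError")
    if error_kind:
        detail = payload.decode("utf-8", errors="replace")
        raise RuntimeError(f"Lambda function error ({error_kind}): {detail}")
    return json.loads(payload)
```

Checking `FunctionError` separately matters because a failed function can still produce a successful HTTP-level invoke, which would otherwise let a CI job pass on a broken run.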

Mmc/bloodhound v2

… assumption

- Add scripts/bootstrap_github_oidc.sh to configure GitHub OIDC provider and IAM role
- Detect AWS account ID dynamically using STS
- Add IAM resource tagging for governance and ownership tracking
- Add cleanup trap to remove temporary IAM policy artifacts (trust-policy.json, lambda-policy.json)
- Update GitHub Actions workflow to use OIDC role assumption
- Document GitHub automation and OIDC bootstrap process in README

Add GitHub OIDC bootstrap script and switch CI authentication to role…

- Restrict OIDC subject to repo:*/Bloodhound:ref:refs/heads/main
- Update trust policy automatically if role exists
- Improve documentation and security comments
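For context, a trust policy carrying the subject restriction above might be generated along these lines. This is a sketch under assumptions: the function name and the placeholder account ID are hypothetical, and the actual bootstrap script writes its policy from shell. The subject pattern is the one quoted in the commit; because it contains a wildcard, the condition uses `StringLike` rather than `StringEquals`.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def build_trust_policy(account_id: str, subject_pattern: str) -> dict:
    """Build an IAM trust policy allowing GitHub Actions OIDC role assumption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Audience must be STS; subject is limited to the given pattern.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": subject_pattern},
                },
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real one.
    policy = build_trust_policy("123456789012", "repo:*/Bloodhound:ref:refs/heads/main")
    print(json.dumps(policy, indent=2))
```

Note that the `repo:*/Bloodhound` pattern matches a repository named Bloodhound under any owner, which is what lets forks assume the role; narrowing the owner segment would tighten the trust boundary.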

- Allow forks of Bloodhound repo to assume role

added debug statements for GA to check output, temporary add

Bloodhound: add Terraform support for validation workflow

- Removed validation option from workflow_dispatch inputs
- Validation pipeline still exists but requires CI hardening
- Will be re-enabled prior to GA once validation workflow stabilizes

Bloodhound: temporarily disable validation workflow in CI

Mmc/bloodhound v2

…ution troubleshooting

Expanded Lambda packaging documentation and troubleshooting guidance for the Bloodhound Lambda deployment.

Changes include:
- Added explanation of build environment vs Lambda runtime differences
- Documented common dependency resolution failures during packaging
- Added guidance on avoiding transitive dependency pinning
- Expanded Docker-based packaging section for future deterministic builds
- Added troubleshooting section covering pip dependency conflicts
- Linked packaging documentation with Terraform troubleshooting guide

These updates were added after encountering a real dependency conflict between botocore and a manually pinned urllib3 version during Lambda packaging.

The documentation now explains:
- how Lambda packages are built locally
- why boto3 should not be bundled
- how dependency conflicts occur
- recommended dependency management practices
- the long-term plan for Docker-based packaging

This improves maintainability of the infrastructure documentation and provides engineers with clear debugging guidance for Lambda packaging failures.

…ocumentation

* Document script-driven build process (build_lambda.sh)
* Introduce layered build directory model (.build/deps, src, lambda_pkg)
* Clarify Terraform triggers and packaging flow
* Improve troubleshooting for archive/build edge cases
* Add guardrails for modifying build pipeline

Ensures documentation reflects deterministic, cache-aware Lambda packaging architecture

- routed scheduled events through dedicated handler instead of run()
- added run_scheduled_scan() as explicit scheduled entrypoint
- removed scheduled flow from generic run() path
- updated lambda router to distinguish scheduled vs default invocations
- aligned documentation to reflect actual execution model

This change addresses the suspected recursion issue in scheduled Lambda executions. Validation pending via Terraform deploy, GHA, and Slack testing.

… path

- routed scheduled events through dedicated handler instead of run()
- added validate_scheduler mode to simulate EventBridge scheduled trigger in GHA
- standardized CloudWatch logging across lambda entrypoint
- added request_id tracing for improved log visibility

Validation:
- manual scan/status verified via GHA and CLI
- WIP on scheduler path validation via validate_scheduler mode
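The request_id tracing mentioned above could be approximated as follows. The Lambda context object exposes an `aws_request_id` attribute; the logger name, the function name, and the log line format are assumptions of this sketch, not the project's actual logging layout.

```python
import logging

logger = logging.getLogger("bloodhound")  # hypothetical logger name


def format_log_line(context: object, message: str) -> str:
    """Prefix a message with the Lambda request ID so log lines can be correlated."""
    # Fall back to "unknown" for local runs, where no Lambda context exists.
    request_id = getattr(context, "aws_request_id", "unknown")
    line = f"[request_id={request_id}] {message}"
    logger.info(line)
    return line
```

With every entrypoint log line carrying the same prefix, filtering CloudWatch output by a single request ID shows one invocation end to end, which helps when diagnosing a suspected recursion in scheduled runs.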

Mmc/bloodhound v2..fix(lambda): prevent scheduled recursion and add scheduler validation path

…tional and infrastructure guides

Move Lambda packaging out of Terraform and into scripts/build_lambda.sh.

Key changes:
- Introduced scripts/build_lambda.sh to build the Lambda deployment package
- Default build mode uses AWS SAM Docker image for Amazon Linux compatibility
- Added optional local build mode for faster development
- Terraform terraform_data.build_lambda_pkg now invokes the build script
- archive_file continues to package .build/lambda_pkg into the deployment zip
- Added structured build logging and package visibility for debugging

Benefits:
- Ensures dependencies match the AWS Lambda runtime environment
- Keeps Terraform focused strictly on infrastructure
- Produces deterministic and reproducible Lambda packages
- Improves debugging when diagnosing Lambda import errors
- Enables future CI/CD integration

Docker builds are now the default to ensure production-safe artifacts.

Mmc/bloodhound v2

updated doc

…mbda build system

- corrected Lambda packaging documentation to reflect real .build structure
- removed outdated deps/src build directory references
- documented final Lambda artifact (.build/bloodhound_lambda_v2.zip)
- clarified Docker build environment using SAM build container (public.ecr.aws/sam/build-python3.10)
- added Python packaging metadata explanation (*.dist-info, bin/)
- improved Lambda packaging troubleshooting guidance
- moved Terraform bootstrap import documentation to infra/README.md
- removed Terraform bootstrap section from Slack documentation
- clarified safe operations and teardown validation documentation
- ensured infrastructure docs accurately reflect current build and deployment pipeline

…g artifacts

- add comprehensive quick demo guide covering 8 operational scenarios
- document local execution, Slack commands, teardown planning, and controlled deletion
- add GitHub Actions automation and manual operations walkthroughs
- document CloudWatch log inspection and teardown validation workflow
- add architecture overview documentation
- include demo artifacts (screenshots and PDF walkthroughs)

Mmc/bloodhound v2

docs: expose quick_demo guide in README and features documentation

moved screenshot pics

corrected text output

Corrected path to render pdfs correctly

corrected doc

tsmith4014 and others added 30 commits December 20, 2025 18:36

updated main readme post root reorg

05babb7

updated diagram, added .sh resource script for building resources for…

53446be

… the demo, feel free to use this just update with your aws profile, it builds 8 ec2, 2 rds and half get whitelisted, cleaned up readme

plan update

df713fd

Merge pull request #1 from mmccla1n/mmc/bloodhound_v2

eb82444

Mmc/bloodhound v2

infra(slack): add Bloodhound-V2 Slack app manifest and update gitigno…

24f89cb

…re for infra automation

Improve teardown plan Slack messaging and safety visibility

f08e4b0

- Combined header and mode_text generation into a single decision block
- Added explicit DRY RUN banner to prevent operator confusion

Merge pull request #2 from mmccla1n/mmc/bloodhound_v2

073fe9e

Mmc/bloodhound v2

Merge pull request #3 from mmccla1n/mmc/bloodhound_v2

ffc34bd

Mmc/bloodhound v2

updatineg docs

53c90ba

updating documentatino. add feautures file

e91e61e

Merge pull request #4 from mmccla1n/mmc/bloodhound_v2

d608c0e

Mmc/bloodhound v2

Merge pull request #5 from mmccla1n/mmc/bloodhound_v2

f33c065

Add GitHub OIDC bootstrap script and switch CI authentication to role…

- Allow forks of Bloodhound repo to assume role

1ee101b

- Restrict OIDC subject to repo:*/Bloodhound:ref:refs/heads/main
- Update trust policy automatically if role exists
- Improve documentation and security comments

Merge pull request #6 from mmccla1n/mmc/bloodhound_v2

f4b66e7

- Allow forks of Bloodhound repo to assume role

added debug statements for GA to check output, temporary add

c375c7e

Merge pull request #7 from mmccla1n/mmc/bloodhound_v2

ff2acab

added debug statements for GA to check output, temporary add

mmccla1n added 30 commits March 14, 2026 23:04

Merge pull request #20 from mmccla1n/mmc/bloodhound_v2

2c5cb7a

Bloodhound: add Terraform support for validation workflow

Bloodhound: temporarily disable validation workflow in CI

2d3fbee

- Removed validation option from workflow_dispatch inputs
- Validation pipeline still exists but requires CI hardening
- Will be re-enabled prior to GA once validation workflow stabilizes

Merge pull request #21 from mmccla1n/mmc/bloodhound_v2

34128a9

Bloodhound: temporarily disable validation workflow in CI

docs: update GitHub automation documentation and architecture flow

742c78a

update document

e79e6fc

Merge pull request #22 from mmccla1n/mmc/bloodhound_v2

6ca9b87

Mmc/bloodhound v2

Merge pull request #23 from mmccla1n/mmc/bloodhound_v2

baefba7

Mmc/bloodhound v2..fix(lambda): prevent scheduled recursion and add scheduler validation path

docs: update and synchronize Bloodhound v2 documentation across opera…

3beb650

…tional and infrastructure guides

Merge pull request #24 from mmccla1n/mmc/bloodhound_v2

1d85f68

Mmc/bloodhound v2

updated doc

4bede85

Merge pull request #25 from mmccla1n/mmc/bloodhound_v2

ccfd545

updated doc

update docs

6b2b93b

Merge pull request #26 from mmccla1n/mmc/bloodhound_v2

9246466

Mmc/bloodhound v2

docs: expose quick_demo guide in README and features documentation

7c3bffe

Merge pull request #27 from mmccla1n/mmc/bloodhound_v2

b5c584b

docs: expose quick_demo guide in README and features documentation

moved screenshot pics

101119f

Merge pull request #28 from mmccla1n/mmc/bloodhound_v2

cdc5d5a

moved screenshot pics

corrected text output

cbbbf43

Merge pull request #29 from mmccla1n/mmc/bloodhound_v2

fb8466b

corrected text output

Corrected path to render pdfs correctly

9c43d16

Merge pull request #30 from mmccla1n/mmc/bloodhound_v2

441e847

Corrected path to render pdfs correctly

corrected doc

190beb1

Merge pull request #31 from mmccla1n/mmc/bloodhound_v2

89d0460

corrected doc

mmccla1n wants to merge 86 commits into codeplatoon-devops:main from mmccla1n:main


2 participants
