Skip to content

Latest commit

 

History

History
254 lines (198 loc) · 9.03 KB

File metadata and controls

254 lines (198 loc) · 9.03 KB

Handoff Document: Path Normalization Bug Fix

Date: October 4, 2025
Issue: #12 (S0-AG-02: Fix Patch Path Normalization)
PR: #11 (feat: Complete local dev workflow with git hooks)
Branch: agent-dev (worktree at /opt/andela/genai/patchpro-bot-agent-dev)
Status: ROOT CAUSE IDENTIFIED, FIX READY TO TEST


🎯 THE PROBLEM

Symptom: Patches generated by PatchPro have wrong file paths and git apply fails.

Root Cause: When analysis runs in async/multiprocessing mode (via git post-commit hook), the path normalization logic produces wrong paths:

  • Expected: "file": "src/patchpro_bot/analyzer.py"
  • Actual: "file": "analyzer.py"

Why Sync Works But Async Fails:

  • Sync execution (with --sync flag): Produces correct paths ✅
  • Async execution (via git hook with --async): Produces wrong paths ❌
  • Manual tests of the git command: Always return correct paths ✅
  • The bug ONLY manifests in actual multiprocessing execution

🔍 TECHNICAL DETAILS

The Failing Code Path

File: src/patchpro_bot/analyzer.py
Function: RuffNormalizer._normalize_file_path() (lines ~232-268)
Also affects: SemgrepNormalizer._normalize_file_path() (duplicate implementation)

Current Implementation (has debug code, user has reset it):

def _normalize_file_path(self, file_path: str) -> str:
    """Normalize file path to be relative to git root."""
    path_obj = Path(file_path)
    if not path_obj.is_absolute():
        return file_path
    
    try:
        file_dir = path_obj.parent  # e.g., ".../src/patchpro_bot"
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=file_dir,  # ← PROBLEM: In async, this returns file_dir instead of git root!
            capture_output=True,
            text=True,
            check=True
        )
        git_root = Path(result.stdout.strip())
        # In async: git_root = ".../src/patchpro_bot" (WRONG)
        # Should be: git_root = ".../patchpro-bot-agent-dev" (CORRECT)
        
        rel_path = path_obj.resolve().relative_to(git_root)
        return str(rel_path)  # Returns "analyzer.py" instead of "src/patchpro_bot/analyzer.py"
    except (subprocess.CalledProcessError, FileNotFoundError, ValueError):
        return str(path_obj).lstrip('/')

The Paradox:

  • Debug logs showed git returning WRONG path in async: /opt/.../src/patchpro_bot
  • Manual tests from same directory returned CORRECT path: /opt/.../patchpro-bot-agent-dev
  • This suggests the multiprocessing environment has different state (env vars, git config, etc.)

The Proposed Fix

Remove the cwd=file_dir parameter:

result = subprocess.run(
    ["git", "rev-parse", "--show-toplevel"],
    # NO cwd parameter - use current working directory
    capture_output=True,
    text=True,
    check=True
)

Status: User has RESET the file, removing all debug code. The fix needs to be cleanly implemented.


📊 MODULE FLOW

Git Hook → CLI (analyze_commit) → Multiprocessing → Analysis Pipeline → Ruff/Semgrep Execution
                                                                              ↓
                                                             Returns absolute paths from tools
                                                                              ↓
FindingsAnalyzer → RuffNormalizer.normalize() → _convert_ruff_finding() → _normalize_file_path()
                                                                              ↓
                                                                  ⚠️ BUG OCCURS HERE
                                                                              ↓
                                                         Wrong relative path calculation
                                                                              ↓
                                    Save findings.json with wrong paths → LLM reads wrong paths
                                                                              ↓
                                                         Generate patches with wrong paths
                                                                              ↓
                                                              git apply fails ❌

🛠️ WHAT NEEDS TO BE DONE NEXT

Immediate Next Steps

  1. Implement Clean Fix (no debug code):

    # In both RuffNormalizer and SemgrepNormalizer
    # Remove cwd=file_dir parameter from git rev-parse call
  2. Test in Real Async Execution:

    cd /opt/andela/genai/patchpro-bot-agent-dev
    rm -rf .patchpro/  # Clean slate
    
    # Make a trivial change to trigger hook
    echo "# Test" >> CONTRIBUTING.md
    git add CONTRIBUTING.md
    git commit -m "test: verify path normalization fix [S0-AG-02]"
    
    # Wait for background analysis to complete (check status.json)
    sleep 5
    
    # Check if paths are correct
    cat .patchpro/findings.json | jq '.findings[0].location.file'
    # Should show: "src/patchpro_bot/..." not just "analyzer.py"
  3. If Fix Works:

    • Clean up any remaining debug commits (squash if needed)
    • Update Issue #12 with final solution
    • Push agent-dev branch to upstream
    • Update PR #11 description with Fixes #12
  4. If Fix Doesn't Work:

    • The problem is more complex than cwd parameter
    • May need to investigate:
      • Git worktree configuration
      • Environment variables (GIT_DIR, GIT_WORK_TREE)
      • Multiprocessing spawn vs fork method
      • Current working directory when process spawns

📝 IMPORTANT CONTEXT

Repository Setup

  • Main repo: /opt/andela/genai/patchpro-bot
  • Worktree: /opt/andela/genai/patchpro-bot-agent-dev (branch: agent-dev)
  • Python: 3.12.4 in .venv
  • Install: uv pip install -e . (editable install)
  • Shell: Fish (not bash - important for commands!)

Git Hook

  • Location: .git/hooks/post-commit
  • Triggers: On commits with Python files
  • Command: analyze-commit --async --with-llm
  • Runs via: nohup in background with multiprocessing.Process()

Issue Tracking

  • Issue #12: S0-AG-02 - Fix Patch Path Normalization
  • Label: pod:agent-core
  • Created: October 4, 2025
  • Status: In Progress
  • URL: #12

PR Status

  • PR #11: feat: Complete local dev workflow with git hooks
  • Branch: agent-dev → foundation/acceptance-driven-mvp
  • URL: #11
  • Needs: Update description with Fixes #12 once issue resolved

Files to Modify

  1. src/patchpro_bot/analyzer.py:
    • RuffNormalizer._normalize_file_path() (~line 232)
    • SemgrepNormalizer._normalize_file_path() (~line 388)
    • Both have same bug, need same fix

Test Files

  • Findings: .patchpro/findings.json
  • Raw ruff: .patchpro/ruff.json
  • Patches: .patchpro/patch_001.diff
  • Status: .patchpro/status.json

🎓 LESSONS LEARNED

  1. Manual tests don't reproduce multiprocessing issues: The bug only appears in actual async execution
  2. Git worktree is normal: Not the cause of the problem
  3. Debug logging was valuable: Confirmed git returns wrong path in async (via /tmp/patchpro_debug.log)
  4. User correctly stopped premature fixes: Always test hypothesis in actual execution before implementing
  5. The codebase was polluted: ~12 debug commits that need cleanup

🚨 USER PREFERENCES

  • Shell: Fish - use and for chaining, not &&
  • Python package manager: uv not pip
  • Issue-driven development: Create/reference issues for all work
  • Commit format: Conventional commits with issue reference [S0-AG-02]
  • No assumptions: Always verify before implementing "drastic changes"

✅ VERIFICATION CHECKLIST

Once fix is implemented, verify:

  • Sync execution still works (analyze-commit --sync)
  • Async execution produces correct paths (analyze-commit --async)
  • Findings have full relative paths: src/patchpro_bot/analyzer.py
  • Patches have correct paths in headers
  • git apply --check succeeds on generated patches
  • No debug code or logging left in production code
  • Issue #12 updated with solution
  • PR #11 updated with Fixes #12
  • Clean git history (squash debug commits if needed)

💡 QUICK START FOR NEXT ASSISTANT

# Navigate to worktree
cd /opt/andela/genai/patchpro-bot-agent-dev

# Check current state
git status
git log --oneline -5

# View the buggy function
cat src/patchpro_bot/analyzer.py | grep -A 50 "_normalize_file_path"

# Test current behavior (if not already tested)
rm -rf .patchpro/
echo "# Test" >> CONTRIBUTING.md
git add CONTRIBUTING.md
git commit -m "test: reproduce path normalization bug [S0-AG-02]"
sleep 5
cat .patchpro/findings.json | jq '.findings[0].location.file'

# Expected: "analyzer.py" (WRONG)
# Goal: "src/patchpro_bot/analyzer.py" (CORRECT)

Good luck! The solution is close - just needs clean implementation and real async testing.