Preview/Microsoft-Presidio-Analyzer#101
Conversation
|
Caution Review failedFailed to post review comments 📝 WalkthroughWalkthroughAdds a new test-data module with many Presidio detector string constants, an insecure sample service module embedding PII and hard-coded credentials, and a small CodeRabbit review configuration file. ChangesPresidio Test Constants
Insecure Sample Service
CodeRabbit Review Config
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant customer_service
participant Database
participant Logger
participant ExternalWallet
Caller->>customer_service: authenticate_admin()
customer_service->>Logger: log attempt
customer_service->>Database: sqlite3.connect(DATABASE_URL)
Database-->>customer_service: connection
Caller->>customer_service: process_payment(card, email, phone)
customer_service->>Logger: log payment attempt
alt card == PAYMENT_TEST_CARD
customer_service-->>Caller: return test payment fixtures (card, routing, SSN)
else
customer_service->>ExternalWallet: initiate payment (mock)
ExternalWallet-->>customer_service: result
customer_service-->>Caller: return result
end
Caller->>customer_service: emergency_wallet_payout()
customer_service->>Logger: log warning
customer_service-->>Caller: return wallet and SSN data
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cards-test.py`:
- Line 1: The file is named with a hyphen and has an incorrect header line;
rename the module file from "cards-test.py" to "cards_test.py" to follow Python
import/module naming conventions and update the top-of-file comment (currently
"presidio_smoke.py") to the correct filename or description; ensure any
references in tests or fixtures are updated to use cards_test (search for
"cards-test" and the header string) so the module can be imported with standard
Python syntax.
- Around line 6-20: The test contains hardcoded PAN literals (variables
visa_dashed, mc_dashed, amex_dashed, discover_dashed, visa_spaced, mc_spaced,
mc_compact, remediation_pan_spaced, remediation_pan_compact) which trigger
secret scanners; fix by making these scanner-safe: either move this file to an
explicit allowlisted test-fixture rule/path, or change each PAN literal to be
constructed from non-contiguous fragments (e.g., join or concatenate pieces like
"4111" + "-" + "1111" + ... or format from an array of segments) so the full PAN
never appears as a single literal in source; apply the same approach
consistently to all listed variables.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Repository UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 93dc4c00-8b44-45e2-93db-14b4a51f9982
📒 Files selected for processing (1)
cards-test.py
| @@ -0,0 +1,46 @@ | |||
| # presidio_smoke.py — noisy samples to exercise Microsoft Presidio–style detectors. | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate candidate fixture/module files =="
fd -i 'cards-test.py|cards_test.py|presidio_smoke.py'
echo
echo "== Check import statements referencing this fixture =="
rg -n -C2 --type=py '(from|import)\s+(cards_test|cards-test|presidio_smoke)\b' || true
echo
echo "== Check symbol usage to see how fixture values are consumed =="
rg -n -C2 --type=py '\b(visa_dashed|mc_dashed|remediation_pan_spaced|routing_doc_like)\b' || trueRepository: coderabbitai/coderabbit-pr-review
Length of output: 1159
Consider renaming to follow Python module naming conventions if this becomes a shared fixture.
The hyphenated filename cards-test.py cannot be imported with standard Python syntax (e.g., import cards-test fails). While this file is not currently imported anywhere in the codebase, renaming to cards_test.py follows Python conventions and prevents future breakage. Also update Line 1 to reflect the correct filename (currently shows presidio_smoke.py).
🧰 Tools
🪛 Ruff (0.15.12)
[warning] 1-1: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cards-test.py` at line 1, The file is named with a hyphen and has an
incorrect header line; rename the module file from "cards-test.py" to
"cards_test.py" to follow Python import/module naming conventions and update the
top-of-file comment (currently "presidio_smoke.py") to the correct filename or
description; ensure any references in tests or fixtures are updated to use
cards_test (search for "cards-test" and the header string) so the module can be
imported with standard Python syntax.
| visa_dashed = "4111-1111-1111-1111" | ||
| mc_dashed = "5500-0000-0000-0004" | ||
| amex_dashed = "3400-000000-00009" | ||
| discover_dashed = "6011-0000-0000-0004" | ||
|
|
||
| # Spaced | ||
| visa_spaced = "4111 1111 1111 1111" | ||
| mc_spaced = "5500 0000 0000 0004" | ||
|
|
||
| # Same digits, compact (extra shape) | ||
| mc_compact = "5500000000000004" | ||
|
|
||
| # Remediation example PAN (dashed + spaced + compact — filter hits only exact `4242 4242 4242 4242` / `4242424242424242` at Presidio column) | ||
| remediation_pan_spaced = "4242 4242 4242 4242" | ||
| remediation_pan_compact = "4242424242424242" |
There was a problem hiding this comment.
Make PAN test literals scanner-safe to avoid CI/compliance noise.
Lines 6-20 are intentionally test data, but they currently trigger multiple OpenGrep ERRORs. Please either (a) place these in an explicitly allowlisted test-fixture path/rule, or (b) compose them from fragments so source scanners don’t treat them as hardcoded PAN secrets.
Example pattern (apply consistently to all PAN literals)
+def _join(parts: tuple[str, ...], sep: str = "") -> str:
+ return sep.join(parts)
+
-visa_dashed = "4111-1111-1111-1111"
+visa_dashed = _join(("4111", "1111", "1111", "1111"), "-")
-mc_compact = "5500000000000004"
+mc_compact = _join(("5500", "0000", "0000", "0004"))🧰 Tools
🪛 OpenGrep (1.20.0)
[ERROR] 6-6: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 7-7: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 8-8: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 9-9: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 12-12: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 13-13: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 16-16: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number)
[ERROR] 18-18: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 18-18: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number)
[ERROR] 19-19: Possible credit card number with dashes or spaces detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number-dashed)
[ERROR] 20-20: Possible credit card number (PAN) detected in source code. Credit card numbers should never be hardcoded or stored in source files. Use a secrets manager or tokenization service instead.
(coderabbit.pii.credit-card-number)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cards-test.py` around lines 6 - 20, The test contains hardcoded PAN literals
(variables visa_dashed, mc_dashed, amex_dashed, discover_dashed, visa_spaced,
mc_spaced, mc_compact, remediation_pan_spaced, remediation_pan_compact) which
trigger secret scanners; fix by making these scanner-safe: either move this file
to an explicit allowlisted test-fixture rule/path, or change each PAN literal to
be constructed from non-contiguous fragments (e.g., join or concatenate pieces
like "4111" + "-" + "1111" + ... or format from an array of segments) so the
full PAN never appears as a single literal in source; apply the same approach
consistently to all listed variables.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
Summary by CodeRabbit
Tests
Chores
Config