Skip to content

Skip "dc/base/" prefix for custom data instances.#2025

Open
SandeepTuniki wants to merge 2 commits into
masterfrom
aggregation-cleanup
Open

Skip "dc/base/" prefix for custom data instances.#2025
SandeepTuniki wants to merge 2 commits into
masterfrom
aggregation-cleanup

Conversation

@SandeepTuniki
Copy link
Copy Markdown
Contributor

No description provided.

@SandeepTuniki SandeepTuniki requested review from gmechali and vish-cs May 21, 2026 17:15
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an is_base_dc configuration to the ingestion helper to support non-base Data Commons environments, updating SQL query generation and class constructors accordingly. Feedback identifies critical SQL injection vulnerabilities in the query construction and recommends refactoring redundant conditional logic into class constructors to improve code maintainability and reduce duplication.

Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment on lines +180 to +183
prefix = "dc/base/" if self.is_base_dc else ""
provenances = [f"'{prefix}{name}'" for name in import_names]
provenance_filter = f" AND provenance IN ({', '.join(provenances)})"
gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The import_names are interpolated directly into the SQL query, which is vulnerable to SQL injection if an import name contains single quotes. Additionally, using the properties defined in the constructor simplifies the logic and reduces duplication.

Suggested change
prefix = "dc/base/" if self.is_base_dc else ""
provenances = [f"'{prefix}{name}'" for name in import_names]
provenance_filter = f" AND provenance IN ({', '.join(provenances)})"
gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs'
# Escape single quotes to prevent SQL injection.
safe_names = [name.replace("'", "''") for name in import_names]
provenances = [f"'{self.prefix}{name}'" for name in safe_names]
provenance_filter = f" AND provenance IN ({', '.join(provenances)})"

Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
Comment thread import-automation/workflow/ingestion-helper/aggregation_utils.py
class ProvenanceSummaryGenerator:
"""Contains the SQL queries to generate ProvenanceSummary in the Cache table."""
def __init__(self, executor: BigQueryExecutor) -> None:
def __init__(self, executor: BigQueryExecutor, is_base_dc: bool = True) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we default this to false? So we can only set the environment variable for our instance and not the DCP ones.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vish-cs prefers this to be set to true by default and have it set to false in terraform for DCP ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants