Skip "dc/base/" prefix for custom data instances.#2025
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces an is_base_dc configuration to the ingestion helper to support non-base Data Commons environments, updating SQL query generation and class constructors accordingly. Feedback identifies critical SQL injection vulnerabilities in the query construction and recommends refactoring redundant conditional logic into class constructors to improve code maintainability and reduce duplication.
| prefix = "dc/base/" if self.is_base_dc else "" | ||
| provenances = [f"'{prefix}{name}'" for name in import_names] | ||
| provenance_filter = f" AND provenance IN ({', '.join(provenances)})" | ||
| gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs' |
There was a problem hiding this comment.
The import_names are interpolated directly into the SQL query, which is vulnerable to SQL injection if an import name contains single quotes. Additionally, using the properties defined in the constructor simplifies the logic and reduces duplication.
| prefix = "dc/base/" if self.is_base_dc else "" | |
| provenances = [f"'{prefix}{name}'" for name in import_names] | |
| provenance_filter = f" AND provenance IN ({', '.join(provenances)})" | |
| gen_graphs_prov = 'dc/base/GeneratedGraphs' if self.is_base_dc else 'GeneratedGraphs' | |
| # Escape single quotes to prevent SQL injection. | |
| safe_names = [name.replace("'", "''") for name in import_names] | |
| provenances = [f"'{self.prefix}{name}'" for name in safe_names] | |
| provenance_filter = f" AND provenance IN ({', '.join(provenances)})" |
| class ProvenanceSummaryGenerator: | ||
| """Contains the SQL queries to generate ProvenanceSummary in the Cache table.""" | ||
| def __init__(self, executor: BigQueryExecutor) -> None: | ||
| def __init__(self, executor: BigQueryExecutor, is_base_dc: bool = True) -> None: |
There was a problem hiding this comment.
Can we default this to false? So we can only set the environment variable for our instance and not the DCP ones.
There was a problem hiding this comment.
@vish-cs prefers this to be set to true by default and have it set to false in terraform for DCP ones.
No description provided.