
feat(gcp): switch GCP cloud connector to OIDC + WIF (matches Nullify backend)#42

Merged
vik-nullify merged 2 commits into main from feat/gcp-oidc-integration on Apr 19, 2026

@tim-thacker-nullify
Member

Claude

Summary

The Nullify backend mints OIDC JWTs and exchanges them via Google STS as subject_token_type=urn:ietf:params:oauth:token-type:jwt. The customer-facing terraform module was creating an AWS-typed WIF provider (aws { account_id = ... } block, attribute_mapping keyed on assertion.arn/assertion.account). Google STS rejects the JWT against an AWS provider on the subject-token-type check — so today every customer install fails at Verify.
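
To make the failure mode concrete, the exchange described above can be sketched as the parameters sent to Google STS. This is an illustrative sketch, not Nullify's backend code: the function name and the project number, pool ID, and provider ID arguments are placeholders. The endpoint and parameter names are those of Google's STS token-exchange API. The provider named in `audience` must accept the given `subject_token_type`, which an AWS-typed provider does not for a JWT.

```shell
# Prints the form parameters for POST https://sts.googleapis.com/v1/token.
# $1 = project number, $2 = pool ID, $3 = provider ID (all hypothetical values).
# The JWT itself (subject_token) is supplied separately by the caller.
sts_exchange_params() {
  audience="//iam.googleapis.com/projects/$1/locations/global/workloadIdentityPools/$2/providers/$3"
  printf '%s\n' \
    "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
    "audience=${audience}" \
    "scope=https://www.googleapis.com/auth/cloud-platform" \
    "requested_token_type=urn:ietf:params:oauth:token-type:access_token" \
    "subject_token_type=urn:ietf:params:oauth:token-type:jwt"
}

# Example (not executed here): pass each line to curl with --data-urlencode,
# plus --data-urlencode "subject_token=${JWT}", against the endpoint above.
```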

This PR rewrites the WIF provider to OIDC, with nullify_oidc_issuer_uri and nullify_tenant_id as the two new required inputs (replacing nullify_aws_principal_arn and nullify_aws_account_id). The pool's attribute_condition pins trust to the customer's specific Nullify tenant id, so even if Nullify's signing key were stolen, an attacker could not mint a token accepted by another tenant's provider.
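
As a rough illustration of the provider shape this paragraph describes, the gcloud equivalent might look like the sketch below. The function name, its positional arguments, and the `GCLOUD` override (used only so the command can be stubbed) are assumptions for illustration; the real install.sh may differ. The CEL condition uses single quotes here for shell convenience.

```shell
# Creates an OIDC WIF provider that trusts one Nullify tenant only.
# $1 = host project ID, $2 = pool ID, $3 = provider ID (hypothetical arguments).
# Reads NULLIFY_OIDC_ISSUER_URI and NULLIFY_TENANT_ID from the environment.
create_oidc_provider() {
  "${GCLOUD:-gcloud}" iam workload-identity-pools providers create-oidc "$3" \
    --project="$1" \
    --location=global \
    --workload-identity-pool="$2" \
    --issuer-uri="${NULLIFY_OIDC_ISSUER_URI}" \
    --attribute-mapping="google.subject=assertion.sub,attribute.tenant_id=assertion.tenant_id" \
    --attribute-condition="assertion.tenant_id == '${NULLIFY_TENANT_ID}'"
}
```

A token whose `tenant_id` claim differs from the configured tenant fails the attribute condition and is rejected at exchange time.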

Changes

  • modules/nullify-gcp-integration/main.tf:
    • Replace aws { account_id = var.nullify_aws_account_id } with oidc { issuer_uri = var.nullify_oidc_issuer_uri }.
    • Replace attribute_mapping to map google.subject = assertion.sub and attribute.tenant_id = assertion.tenant_id.
    • Replace attribute_condition with assertion.tenant_id == \"${var.nullify_tenant_id}\".
    • Update the SA-impersonation principalSet to bind by attribute.tenant_id/${var.nullify_tenant_id} instead of attribute.aws_role/....
    • Drop local.nullify_aws_role_name.
    • Remove roles/viewer from local.predefined_viewer_roles (per the internal architecture doc — it grants data-plane reads like compute.instances.getSerialPortOutput and cloudbuild.builds.get that leak secrets; the granular per-service viewers + roles/cloudasset.viewer cover the same surface without those).
  • New modules/nullify-gcp-integration/apis.tf enables the prerequisite APIs on the host project (iam, iamcredentials, sts, cloudresourcemanager, cloudasset, serviceusage) so that the first terraform apply against a fresh project doesn't 403 at pool-creation time. It sets disable_on_destroy = false so that terraform destroy doesn't break unrelated resources. The WIF pool, provider, and SA all declare depends_on the API enables.
  • modules/nullify-gcp-integration/variables.tf and terraform/variables.tf:
    • Drop nullify_aws_principal_arn, nullify_aws_account_id.
    • Add nullify_oidc_issuer_uri (validation: https://... no trailing slash) and nullify_tenant_id (validation: 1-100 chars [A-Za-z0-9_-]).
    • Add validation on host_project_id against the GCP project-ID regex.
  • modules/nullify-gcp-integration/outputs.tf: update workload_identity_provider reference (nullify_aws → nullify_oidc).
  • terraform.tfvars.example, examples/{organization,folder,single-project}/main.tf: drop AWS vars, add OIDC vars.
  • terraform/README.md: rewrite — drop stale tenant_external_id row, add Prerequisites (installer IAM roles + APIs) and Troubleshooting sections (covering org-policy WIF allowlist, missing roles/iam.workloadIdentityPoolAdmin, issuer-URL mismatch, tenant-id mismatch, per-project PERMISSION_DENIED, iam.disableServiceAccountCreation).
  • scripts/install.sh: switch to gcloud iam workload-identity-pools providers create-oidc with the new attribute mapping/condition; enable APIs up front; replace NULLIFY_AWS_PRINCIPAL_ARN/NULLIFY_AWS_ACCOUNT_ID env vars with NULLIFY_OIDC_ISSUER_URI/NULLIFY_TENANT_ID; update principalSet to bind by tenant id; remove roles/viewer from the granted roles.
  • scripts/uninstall.sh: same role-list trim, default PROVIDER_ID updated to nullify-oidc.
  • docs/permissions.md: update Trust model section (OIDC instead of AWS), drop roles/viewer row.
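
The input validations listed above could be mirrored in a pre-flight shell check along these lines. This is a sketch under assumptions: the function names are hypothetical, and the project-ID pattern is the commonly documented GCP rule (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending in a hyphen), which may not match the module's exact regex.

```shell
# Issuer URI must be https:// with something after the scheme and no trailing slash.
valid_issuer_uri() {
  case "$1" in
    https://*/) return 1 ;;
    https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Tenant ID: 1-100 characters from [A-Za-z0-9_-].
valid_tenant_id() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_-]{1,100}$'
}

# Project ID: assumed pattern, see the note above.
valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}
```

Failing fast on these inputs gives the installer a readable error instead of a later API rejection.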

Breaking changes

This rewrites the input contract for the module. Per an end-to-end audit, no customer can have completed onboarding on the AWS-WIF version (Google STS rejects every Verify), so no existing installs need a migration or rollback path. Any sandbox installs must be removed with terraform destroy and re-applied with the new variables.

Test plan

  • terraform fmt -check -recursive clean
  • terraform validate clean for the root module + every example dir
  • shellcheck clean for install.sh + uninstall.sh
  • Apply in a sandbox GCP project against Nullify staging (nullify_oidc_issuer_uri = "https://gcp.dev.nullify.ai", real tenant id), paste outputs into staging UI, click Verify → expect green per project
  • Trigger a scan via POST /context/cloud-integration/gcp/scan/start and confirm aws s3 ls shows per-service latest.json for compute, iam, vpc
  • terraform destroy cleanly removes all resources

Coordination

  • Tag this commit as gcp-v1.0.0 after merge — the monorepo PR's UI trust-instructions block links to the cloud-connector repo at this tag.
  • Public-docs PR (Nullify-Platform/public-docs) lands the customer-facing install guide that references this module.

…backend)

The Nullify backend mints OIDC JWTs (subject_token_type=urn:ietf:params:oauth:token-type:jwt,
signed by the platform oidc-gcp Lambda's RSA key in SSM, with `tenant_id` as a
custom claim) and exchanges them via Google STS. The customer-facing terraform
module was creating an AWS-typed WIF provider, which Google STS rejects on the
subject token type — every customer install failed at Verify.

This PR rewrites the WIF provider to OIDC, using `nullify_oidc_issuer_uri` and
`nullify_tenant_id` as the new required inputs. The pool's `attribute_condition`
pins trust to the customer's specific Nullify tenant id, so even if Nullify's
signing key were stolen, an attacker could not mint a token accepted by another
tenant's provider. The IAM binding moves from `attribute.aws_role/...` to
`attribute.tenant_id/...` for the same reason.
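
For illustration, the tenant-scoped member string behind that binding could be built as sketched below. The function names, arguments, and the `GCLOUD` override are hypothetical, and `roles/iam.workloadIdentityUser` is assumed here as the usual role for service-account impersonation through WIF; the module may grant a different one.

```shell
# Prints the principalSet that matches every identity in the pool whose
# mapped attribute.tenant_id equals the given tenant.
# $1 = project number, $2 = pool ID, $3 = tenant ID (hypothetical arguments).
tenant_principal_set() {
  printf 'principalSet://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/attribute.tenant_id/%s\n' \
    "$1" "$2" "$3"
}

# Lets that principalSet impersonate the service account.
# $1 = service account email, then the same three arguments as above.
bind_tenant_to_sa() {
  "${GCLOUD:-gcloud}" iam service-accounts add-iam-policy-binding "$1" \
    --role="roles/iam.workloadIdentityUser" \
    --member="$(tenant_principal_set "$2" "$3" "$4")"
}
```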

Other changes:
- New `apis.tf` enables prerequisite Google Cloud APIs (iam, iamcredentials,
  sts, cloudresourcemanager, cloudasset, serviceusage) on the host project so
  the first `terraform apply` against a fresh project doesn't 403 at
  pool-creation time. `disable_on_destroy = false` so we don't break unrelated
  resources on `terraform destroy`.
- Removes `roles/viewer` from the granted roles. Per the internal architecture
  doc, it grants data-plane reads (compute.instances.getSerialPortOutput,
  cloudbuild.builds.get) that leak secrets; granular per-service viewer roles
  + roles/cloudasset.viewer cover the same surface without those.
- README: drops the stale `tenant_external_id` reference, adds Prerequisites
  (installer IAM roles + APIs) and Troubleshooting sections.
- gcloud install.sh: switches to `create-oidc`, enables APIs up front, drops
  AWS env vars in favour of NULLIFY_OIDC_ISSUER_URI + NULLIFY_TENANT_ID.
- host_project_id now validated against the GCP project-ID regex.
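
A minimal sketch of the "enable APIs up front" step, assuming the short names above map to `<name>.googleapis.com` service names. The function name, the `REQUIRED_APIS` variable, and the `GCLOUD` override are illustrative and not taken from install.sh.

```shell
# Short API names as listed in the commit message.
REQUIRED_APIS="iam iamcredentials sts cloudresourcemanager cloudasset serviceusage"

# Enables every prerequisite API on the host project in one call.
# $1 = host project ID.
enable_required_apis() {
  services=""
  for api in $REQUIRED_APIS; do
    services="$services ${api}.googleapis.com"
  done
  # Word splitting of $services is intentional: one argument per API.
  # shellcheck disable=SC2086
  "${GCLOUD:-gcloud}" services enable $services --project="$1"
}
```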

Verified locally:
- terraform fmt -check / terraform validate clean for the module + every
  example dir
- shellcheck clean for install.sh + uninstall.sh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tim-thacker-nullify added the major label (Major version updates, breaking changes) on Apr 19, 2026
- main.tf: drop stale "Nullify AWS principal" section header; WIF
  provider now says "OIDC Provider trusting Nullify's OIDC issuer"
  matching the code below.
- main.tf: fix broken reference to non-existent
  `modules/nullify-gcp-integration/README.md` — point at the real file
  `../../docs/permissions.md` instead.
- install.sh: wrap the first IAM call after `gcloud services enable` in
  a 5-attempt retry with 10s backoff. The IAM API's "enabled" state can
  take 10-30s to propagate on a fresh host project, and the previous
  straight-through invocation would fail with a cryptic 403 on the first
  install for customers.
- README.md: add a Prerequisites bullet clarifying that
  `scope = "projects"` requires `roles/resourcemanager.projectIamAdmin`
  on EVERY project in `project_ids`, not just `host_project_id`. Without
  this the predefined-role bindings fail on each sibling project.
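
The retry described for install.sh can be sketched as a small helper. This is an assumed shape rather than the script's actual code: the function name is hypothetical and the delay is fixed between attempts.

```shell
# usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS have failed.
retry() {
  max="$1"
  delay="$2"
  attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt}/${max} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

With the values from the commit message, the first IAM call would be wrapped as `retry 5 10 gcloud iam workload-identity-pools create ...`, where the exact command is whichever IAM call the script makes first.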

These are all nits individually, but each is the kind of thing that
derails a live customer install demo.

Verified:
- `terraform fmt -check -recursive` clean
- `terraform validate` clean on module + every example
- `shellcheck scripts/install.sh` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vik-nullify marked this pull request as ready for review April 19, 2026 05:40
@vik-nullify merged commit 8afd801 into main Apr 19, 2026
8 checks passed

Labels

major Major version updates (breaking changes)


2 participants