
Your Terraform State File Is a Secrets Dump. Treat It Like One.

Patrick Putman · March 20, 2025 · 11 min read

I want to show you something. Run this against any Terraform state file you have locally:

cat terraform.tfstate | jq '.. | strings | select(length > 20)' | sort -u

Look at what comes back. Database passwords, private key material, service account JSON, API tokens, OAuth client secrets. Everything Terraform touched at creation time is in that file, in plaintext, forever — including things you marked sensitive = true.

That last part surprises people. sensitive = true in Terraform suppresses the value in plan and apply output. It does not encrypt it in state. The state file has everything.
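You can see this without touching real infrastructure. Below is a hypothetical, heavily trimmed state entry (the real format carries more fields); the point is that the sensitive marker sits next to the value rather than replacing it:

```shell
# Hypothetical, trimmed state entry -- enough to show that a value marked
# sensitive = true is stored verbatim next to its "sensitive" marker.
cat > demo.tfstate <<'EOF'
{
  "resources": [{
    "type": "aws_db_instance",
    "instances": [{
      "attributes": { "password": "hunter2-plaintext" },
      "sensitive_attributes": [[{"type": "get_attr", "value": "password"}]]
    }]
  }]
}
EOF

# The marker only affects CLI rendering; the raw value is sitting right here:
grep -o '"password": "[^"]*"' demo.tfstate
```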

What's actually in your state

Terraform state is a complete snapshot of every resource attribute at the time of last apply. Not just the attributes you configured — every attribute the provider returned, including ones the provider generated and you never explicitly set.

A google_sql_database_instance resource in state will contain the database connection string. An aws_db_instance will contain the password in plaintext if you used a password input. An acme_certificate resource will contain the private key. A google_service_account_key resource will contain the key JSON.
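As a concrete sketch (arguments trimmed, names hypothetical), the password argument below is written to state verbatim on the first apply and stays there:

```hcl
resource "aws_db_instance" "main" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"
  password          = var.db_password  # stored in state in plaintext after apply
}
```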

Here's the actual state entry for a GCS bucket HMAC key:

{
  "type": "google_storage_hmac_key",
  "instances": [{
    "attributes": {
      "access_id": "GOOG1EXAMPLEACCESSID",
      "secret": "HBoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
      "service_account_email": "my-sa@my-project.iam.gserviceaccount.com",
      "state": "ACTIVE"
    }
  }]
}

That secret field is a live credential. Anyone with read access to your state file has that credential.

The backend security problem

The default backend is local — a terraform.tfstate file on disk. That file almost always ends up in version control. I've found state files in public GitHub repos. I've found them committed to internal repos where the git history goes back years and the secrets are long-rotated but the history isn't. Never use the local backend for anything beyond a personal experiment.
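Even after you move to a remote backend, keep the local artifacts out of git; init-time state copies, crash logs, and variable files can all carry secrets. A typical .gitignore fragment:

```
# Terraform files that can contain secrets or machine-local paths
*.tfstate
*.tfstate.*
*.tfvars
*.tfvars.json
.terraform/
crash.log
```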

GCS backend (GCP):

terraform {
  backend "gcs" {
    bucket = "my-tfstate-prod"
    prefix = "terraform/state"
  }
}

The bucket needs:

  • Uniform bucket-level access (disable ACLs)
  • No allUsers or allAuthenticatedUsers bindings
  • Versioning enabled (you want rollback capability and an audit trail)
  • Object retention or soft delete if your compliance requires it
  • CMEK encryption if you need key management control

gsutil mb -l US-CENTRAL1 gs://my-tfstate-prod
gsutil versioning set on gs://my-tfstate-prod
gsutil ubla set on gs://my-tfstate-prod

# Encrypt with a customer-managed key: authorize the service agent,
# set the bucket default, then re-encrypt any existing objects
gsutil kms authorize -k projects/my-project/locations/us-central1/keyRings/tfstate/cryptoKeys/tfstate
gsutil kms encryption -k projects/my-project/locations/us-central1/keyRings/tfstate/cryptoKeys/tfstate gs://my-tfstate-prod
gsutil rewrite -k projects/my-project/locations/us-central1/keyRings/tfstate/cryptoKeys/tfstate \
  gs://my-tfstate-prod/**
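Encrypting the bucket only helps if read access is equally tight. A sketch of a minimal IAM split with gcloud (service account names are hypothetical):

```shell
# Apply SA can read and write state objects
gcloud storage buckets add-iam-policy-binding gs://my-tfstate-prod \
  --member="serviceAccount:tf-apply-prod@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Plan SA only needs read
gcloud storage buckets add-iam-policy-binding gs://my-tfstate-prod \
  --member="serviceAccount:tf-plan-prod@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```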

S3 backend (AWS):

terraform {
  backend "s3" {
    bucket         = "my-tfstate-prod"
    key            = "terraform/state"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/..."
    dynamodb_table = "tfstate-lock"
  }
}

The S3 bucket needs versioning, server-side encryption (encrypt = true), public access blocked, and a bucket policy that explicitly denies s3:GetObject to everyone except the specific IAM roles your Terraform service accounts use. The DynamoDB table is for state locking — without it, concurrent applies can corrupt state.
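Creating the lock table is a one-time step. Terraform's S3 backend requires the partition key to be a string named LockID; a sketch with the AWS CLI (table name matches the backend config above):

```shell
aws dynamodb create-table \
  --table-name tfstate-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
```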

State locking matters more than most teams think

Without locking, two concurrent terraform apply runs will read the same state, compute different diffs, and write back conflicting state. The result is state corruption — resources that exist in the cloud but aren't in state (orphaned), or state entries for resources that were deleted. Recovery means manually editing state, which is how people accidentally delete things.

The GCS backend handles locking natively through the GCS API — it creates a lock object alongside the state, so no extra configuration is needed. The S3 backend has no built-in locking and relies on the DynamoDB table.

If you're using Terraform Cloud or HCP Terraform, locking is built in and enforced.

Separate state per environment, always

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── backend.tf    # backend bucket: my-tfstate-dev
│   └── prod/
│       ├── main.tf
│       └── backend.tf    # backend bucket: my-tfstate-prod

Separate state files for dev and prod means:

  • A compromised dev environment can't read prod state
  • Dev applies don't lock prod state
  • You can give broader access to dev state for debugging without touching prod
  • State corruption in dev doesn't affect prod

Never use workspaces as a substitute for separate state backends for prod vs non-prod. Workspaces store state in the same backend with a prefix — the same service account that reads dev state can read prod state. They're useful for managing multiple identical environments (like per-region deployments), not for security isolation.
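Concretely, with the GCS backend shown earlier, workspace state files land side by side under the same prefix in the same bucket, so one credential reads them all (paths illustrative):

```
gs://my-tfstate-prod/terraform/state/default.tfstate
gs://my-tfstate-prod/terraform/state/dev.tfstate
gs://my-tfstate-prod/terraform/state/prod.tfstate
```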

What the plan output leaks

terraform plan

Plan output shows what will change — and "what will change" sometimes means displaying sensitive values that are being rotated or replaced. A database password rotation in Terraform will show:

~ password = (sensitive value)

That (sensitive value) masking is a display feature, not encryption. A saved plan file (terraform plan -out=plan.tfplan) contains the raw values, and terraform show -json on it will print them in plaintext. If you're running Terraform in a GitHub Actions workflow and logging full output or debug logs, sensitive values can appear in logs.

Fix: redirect plan output to a file and only log the exit code in CI, or use Terraform Cloud's log filtering. Never pass sensitive values directly as -var flags — they'll appear in shell history and process listings.

# Bad — password visible in process list and runner logs
- run: terraform apply -var="db_password=${{ secrets.DB_PASSWORD }}"

# Better — use a var file or environment variables
- run: |
    export TF_VAR_db_password="${{ secrets.DB_PASSWORD }}"
    terraform apply

Environment variables prefixed with TF_VAR_ are picked up by Terraform without appearing in the command string. They still go into process memory, but they won't show up in ps aux or in runner log output of the command itself.

Module version pinning

Unpinned module sources are the IaC equivalent of unpinned GitHub Actions:

# Bad — ">= 1.0" floats to whatever the newest matching release is
module "vpc" {
  source  = "terraform-google-modules/network/google"
  version = ">= 1.0"
}

# Good — pinned to a specific version
module "vpc" {
  source  = "terraform-google-modules/network/google"
  version = "9.3.0"
}

# For git sources, pin to a commit SHA
module "my-module" {
  source = "git::https://github.com/myorg/tf-modules.git//vpc?ref=abc123def456"
}

A compromised module registry entry or a repo with a moved tag can push malicious code into your Terraform run, which then gets applied to your infrastructure with your Terraform service account's permissions. Treat module pinning the same way you treat action pinning — version tags are mutable, commit SHAs are not.
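A tag is just a mutable pointer, and you can watch it resolve to a commit before pinning. A self-contained sketch using a throwaway local repo in place of a real module repository:

```shell
# Create a throwaway repo standing in for a module repository
git init -q demo-module && cd demo-module
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial"
git tag v1.0.0  # a tag: mutable, can be moved or re-pushed later

# Resolve the tag to the commit SHA it points at right now;
# the SHA on the left is what belongs in ?ref= for a git source
git ls-remote . refs/tags/v1.0.0
```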

Provider version constraints

Same principle for providers:

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"   # allows 5.x but not 6.x
    }
  }
  required_version = ">= 1.6.0"
}

The ~> operator (pessimistic constraint) allows patch and minor updates within a major version. Commit your .terraform.lock.hcl file — it records the exact provider versions and their checksums, making your deploys reproducible and catching unexpected provider changes.
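The lock file only records checksums for the platforms it has seen. If developers are on macOS and CI runs on Linux, pre-populate all of them so init can verify providers everywhere:

```shell
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64 \
  -platform=windows_amd64
```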

Drift detection

Terraform doesn't automatically detect when someone changes infrastructure outside of Terraform — a manual console change, an API call, an auto-scaling event. That drift matters because:

  1. The next terraform apply will revert it (potentially breaking things)
  2. Drift can be how attackers persist access — add an IAM binding manually, Terraform doesn't see it until plan runs

Run terraform plan in CI on a schedule, not just on apply:

name: Drift Detection
on:
  schedule:
    - cron: '0 8 * * *'  # 08:00 daily (cron schedules run in UTC)

jobs:
  drift:
    runs-on: ubuntu-latest
    environment: prod-plan
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ vars.WIF_PROVIDER_PROD }}
          service_account: "tf-plan-prod@my-project.iam.gserviceaccount.com"
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: |
          set +e  # Actions' default shell runs bash -e; exit code 2 would abort before we read it
          terraform plan -detailed-exitcode -no-color
          EXIT_CODE=$?
          if [ "$EXIT_CODE" -eq 2 ]; then
            echo "::warning::Drift detected in production infrastructure"
            exit 1
          fi
          exit $EXIT_CODE

Exit code 2 from terraform plan -detailed-exitcode means changes are pending (0 means no changes, 1 means the plan itself errored). A failed daily drift check becomes a Slack alert or GitHub notification that someone changed infrastructure outside the normal process.

What not to put in Terraform

Some things shouldn't go through Terraform state at all:

  • Database passwords for existing databases — use Secret Manager or Parameter Store, reference with a data source, never manage the secret value itself with Terraform
  • TLS private keys — generate outside Terraform if you need long-term keys; if you use Terraform's tls_private_key resource, the private key is in state in plaintext
  • Service account keys — use Workload Identity Federation instead; if you must create a key, immediately store it in Secret Manager and accept that it's in state

For secrets that legitimately need to go through Terraform (initial database setup, for example), use a remote backend with encryption, limit who can read state with strict IAM, and rotate the credential after the initial deploy so the state value is stale even if it leaks.
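A sketch of the data-source pattern on GCP (secret and resource names hypothetical). One caveat worth knowing: the value a data source reads is still recorded in state, so the backend encryption and IAM controls above still matter:

```hcl
# Look up the secret at plan/apply time instead of managing its value
data "google_secret_manager_secret_version" "db_password" {
  secret = "db-password"  # reads the latest enabled version by default
}

resource "google_sql_user" "app" {
  name     = "app"
  instance = google_sql_database_instance.main.name
  password = data.google_secret_manager_secret_version.db_password.secret_data
}
```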

The state security checklist

  • Remote backend with encryption (GCS + CMEK, S3 + KMS)
  • Versioning and soft delete on the state bucket
  • State bucket IAM: plan SA gets objectViewer, apply SA gets objectAdmin, nothing else
  • Separate state backends per environment (not workspaces)
  • State locking enabled
  • .terraform.lock.hcl committed
  • Module and provider versions pinned
  • Sensitive values not managed as Terraform resources where avoidable
  • Drift detection running on a schedule
  • terraform.tfstate and *.tfvars in .gitignore
