Starting an IaC Repository with GitHub and Terraform

There are only two hard things in Computer Science: cache invalidation, naming things, and off by one errors.

  • Loosely attributed to Phil Karlton

Let's start off with a bit of a hot take - Terraform isn't particularly hard to learn. It does use its own configuration language (HCL), but most people don't struggle with learning the code.

Infrastructure-as-Code (IaC) isn't about the programming language - it's about establishing a body of discipline around managing infrastructure. Tools like Ansible and Terraform simply facilitate the practice.

Instead of focusing on programmatically elegant tricks here, let's focus on assembling a "starter kit" of sorts for building this practice. The managed resources in this example are intentionally simple, to shift focus to the structure, naming, and release management aspects of Infrastructure-as-Code.

IaC Starter Kit

Repositories (Structure and Naming)

Start a GitHub repository with some basic documentation before contributing code:

  • README.md should describe what the project is for and how it is structured: how the software works.
  • USAGE.md should describe how to consume resources within the project and how release management works.
  • CONTRIBUTING.md should describe how to contribute to the codebase: the branch and merge workflows and rules of conduct go here.
  • CHANGELOG.md should be created based on the Keep a Changelog standard.
  • .gitignore should make sure that any temporary files created by tools, like __pycache__ or Terraform lock files, don't accidentally get committed to the repository.
  • markdownlint.json and any other linting rules - automated code QC is a good thing
  • img/ should be created to contain rendered images for documentation. Use illustrations to make the repository easy to understand!
  • dwg/ should be created to contain unrendered diagrams, e.g. svg, d2
  • doc/ may be created for any automatically rendered documentation, e.g. ReadTheDocs

Once these are created, start mapping out what loose structures should be included in the repository. Here are some examples:

  • conf.d/ for any flat file configurations that may get deployed
    • Make subdirectories for any machine targets
  • roles/ for any Ansible roles. Since this is IaC, breaking this down into roles instead of one giant pile will be simpler
    • Within each role:
      • templates/ should contain any Jinja2 templates. Ansible will auto-detect this folder by name, and it simplifies structure quite a bit.
      • requirements.txt should contain any software prerequisites for the Ansible playbooks. This facilitates CI/CD tooling with virtual environments, in addition to better documenting software dependencies.
      • Playbooks and truth files, of course
  • terraform/ for any Terraform code
    • modules/ for any Terraform re-usable modules
    • accounts/ for any Terraform tenants, e.g. AWS accounts, Cloudflare accounts, or other unrelated resources to keep them separate and organized
  • python/ for any Python code
  • js/ for any JavaScript
  • ...and so on.

Now that the raw structure is somewhat laid out, we can shift focus to the structure of each Terraform account's subdirectory (/terraform/accounts/{{ account_type }}_{{ account_id }}_{{ account_name }}). Here's what I've seen lead to a maintainable code base:

  • /terraform/accounts/cloudflare_12345_engyak_co
    • templates/ for any gotmpl templates
    • provider.tf should declare any Terraform pre-requisites, e.g. the Cloudflare provider minimum version
    • vars.tf should declare any input variables. In my experience, this is a good place for module inputs, but not as useful for actual infrastructure declarations
    • locals.tf should declare any Don't Repeat Yourself (DRY) variables. I typically use them for consistent resource names and IDs. There are a lot of opinions about vars versus locals, but there are a few key differences:
      • vars should actually be variable (non-static multiples of a resource)
      • locals can render and iterate on an input, e.g. with for_each loops
    • backend.tf should indicate where terraform.tfstate is stored and how state locking is handled. Normally, this points to an S3 bucket and provides authorization for it
    • data.tf should have any external data sources. This example doesn't need any, but AWS IAM policy documents and S3 bucket policies fit this category. Essentially, any block declared with data instead of resource goes here
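
As a rough illustration of the vars/locals/backend split above, here is a minimal sketch. Every name in it is an assumption made for the example rather than something from the original project: the zone_id variable, the pages_target and cname_records locals, and the S3 bucket, key, and region are all placeholders.

```hcl
# vars.tf: an input that genuinely varies between uses (hypothetical name)
variable "zone_id" {
  description = "Cloudflare zone ID for engyak.co"
  type        = string
}

# locals.tf: values defined once and reused (DRY)
locals {
  pages_target = "blog-engyak-co.pages.dev"

  # Hypothetical map of record names to their proxy setting
  cname_records = {
    blog        = { proxied = false }
    "engyak.co" = { proxied = true }
  }
}

# backend.tf: remote state in S3 (bucket, key, and region are placeholders)
terraform {
  backend "s3" {
    bucket       = "example-terraform-state"
    key          = "cloudflare/engyak_co/terraform.tfstate"
    region       = "us-east-1"
    use_lockfile = true # S3-native state locking, Terraform 1.10 and later
  }
}

# A local map can drive for_each to stamp out one record per entry
resource "cloudflare_record" "cname" {
  for_each = local.cname_records

  zone_id = var.zone_id
  name    = each.key
  content = local.pages_target
  type    = "CNAME"
  ttl     = 1
  proxied = each.value.proxied
}
```

A loop like this trades explicitness for brevity, so it is worth weighing against how easily a reviewer can read the resulting plan.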

Now that all that's out of the way, we're able to actually create resources. Things can be a lot more free-form here, because the definition of related resources can vary greatly based on who's doing the work.

My personal preference is to maintain small, easily readable files that function independently wherever possible. In this example, we'll use one file for each DNS zone. Here's /terraform/accounts/cloudflare_youwish_engyak_co/engyak.co.tf:

resource "cloudflare_record" "engyak_co_blog" {
  content = "blog-engyak-co.pages.dev"
  name    = "blog"
  proxied = false
  ttl     = 1
  type    = "CNAME"
  zone_id = "redacted"
}

resource "cloudflare_record" "engyak_co_root" {
  content = "blog-engyak-co.pages.dev"
  name    = "engyak.co"
  proxied = true
  ttl     = 1
  type    = "CNAME"
  zone_id = "redacted"
}

resource "cloudflare_record" "engyak_co_uri_blog" {
  name     = "engyak.co"
  priority = 1
  proxied  = false
  ttl      = 1
  type     = "URI"
  zone_id  = "redacted"
  data {
    target = "blog.engyak.co"
    weight = 1
  }
}

These resources are built according to the provider in provider.tf:

terraform {
  required_providers {
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
  }
}

provider "cloudflare" {
}

Always consult the provider's documentation on how to use their resources.

Actions (Release Management)

The biggest advantage a Git repository has for Infrastructure-as-Code is its versioning capability, but the ability to control the release of changes can really take things to the next level.

First, I'd recommend starting out with a branch management plan. It can start simple, like:

  • Don't allow any commits directly to main (GitHub branch protection rules, plus written guidance in CONTRIBUTING.md)
  • Only allow code to be pushed to main via a successful pull request (GitHub branch protection rules do this as well)
    • At least 1 approving peer review
    • All testing must PASS (more on this later)
  • All prospective changes must start as a diverging branch (or fork, but forking is much more advanced) that is up-to-date with main
  • Outline appropriate change windows, if applicable
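
The first two rules can themselves be kept in code. This is a sketch only, assuming the integrations/github Terraform provider is already configured with credentials and an owner. The repository name (iac-starter-kit) is hypothetical, and the status check name ("Terraform Plan") would need to match whatever check the repository actually reports.

```hcl
terraform {
  required_providers {
    github = {
      source  = "integrations/github"
      version = "~> 6.0"
    }
  }
}

# Hypothetical repository name; replace with the real one
data "github_repository" "iac" {
  name = "iac-starter-kit"
}

resource "github_branch_protection" "main" {
  repository_id = data.github_repository.iac.node_id
  pattern       = "main"

  # Apply the rules to administrators as well
  enforce_admins = true

  # Changes reach main only through a pull request with one approval
  required_pull_request_reviews {
    required_approving_review_count = 1
  }

  # Require passing checks on a branch that is up-to-date with main
  required_status_checks {
    strict   = true
    contexts = ["Terraform Plan"] # assumed check name
  }
}
```

Managing the rules this way means a change to branch protection goes through the same review process as any other change.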

At this point, the rules are in place, but none of it actually controls release. GitHub doesn't have credentials to release changes; ideally no users should either. The objective here is to prevent all direct changes to infrastructure. This can be achieved with AWS IAM roles, Cloudflare RBAC, or an equivalent. Take away the keys!

GitHub Actions provides a (usually free or cheap) amnesic container service to run ephemeral code from source control. This is going to be the foundation for this example moving forward, but other providers like GitLab and Atlassian have equivalents as well. If the source control provider doesn't have a built-in service, plenty of other CI tools exist to fill that gap, like Jenkins and Concourse.

For a Terraform pipeline, there should be two Actions per account:

  • terraform plan: This will test your code for validity, and also explain any potential impacts the change might have
  • terraform apply: This will implement tested changes. This Action should be restricted to the main branch!

Here's an example plan Action. I named it based on {{ event trigger }}: {{ provider }} {{ action }} to keep things organized.

---
name: 'On-Commit: Cloudflare Terraform Plan'

on:
  push:

permissions:
  contents: read

jobs:
  plan:
    name: 'Terraform Plan'
    env:
      CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: 'Terraform Setup'
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '>= 1.10.5'
      - name: 'Terraform Plan'
        run: |
          terraform init
          terraform validate
          terraform plan -input=false
        working-directory: terraform/accounts/cloudflare_youwish_engyak_co/

Here's a rundown on how the testing works:

  • We use the env directive to expose CLOUDFLARE_API_TOKEN (specified in the cloudflare provider as the way to pass credentials)
  • We use actions/checkout@v4 (or latest version) to load a copy of the commit that triggered the workflow into the Actions runner
  • We use hashicorp/setup-terraform@v3. Previous Actions runners shipped with Terraform, but the base image didn't update this package frequently enough. Now it doesn't ship with the image - but this tool lets us restrict and control software versions as part of the pipeline. This lets us hold back releases if breaking changes occur with Terraform without having to monkey around with internals - it's a much better system.
  • The Terraform Plan step is where most of the work gets done. We initialize and validate the configuration, then run the plan in non-interactive mode (-input=false). The working-directory key points the step at our account directory.

This will now run every time code is committed to the repository, and it'll display any expected changes every time code is contributed. If it fails, it will produce an error and (ideally) notify engineers/developers on where to fix it.

Note: terraform validate and terraform plan do not catch all problems; they only test that the configuration is valid and preview the expected changes. Resource conflicts and API idiosyncrasies can pass this step and only surface on apply!

Now, we can finally start releasing changes:

---
name: 'Cron-Demand: Cloudflare Terraform Apply'

on:
  # workflow_dispatch has no branch filter; the if: on the apply step enforces main
  workflow_dispatch:
  schedule:
    - cron: "15 4,5 * * *"

permissions:
  contents: read

jobs:
  apply:
    name: 'Terraform Apply'
    env:
      CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: 'Terraform Setup'
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: '>= 1.10.5'
      - name: 'Terraform Plan'
        id: tf_plan
        run: |
          terraform init -input=false
          terraform validate
          terraform plan -input=false -detailed-exitcode
        # Let the job continue so the apply step can inspect the plan's exit code
        continue-on-error: true
        working-directory: terraform/accounts/cloudflare_youwish_engyak_co/
      - name: 'Terraform Apply'
        # The setup-terraform wrapper exposes the last exit code as a step output
        if: github.ref == 'refs/heads/main' && steps.tf_plan.outputs.exitcode == '2'
        run: |
          terraform apply -input=false -auto-approve
        working-directory: terraform/accounts/cloudflare_youwish_engyak_co/

This Action will run either daily at 04:15 and 05:15 UTC or when executed manually. We've established a "change window", and there are a few more complexities added to this workflow to implement change safety:

  • detailed-exitcode and id: tf_plan allow us to "catch" the results of terraform plan. A return code of 0 means no changes are required, 1 means the plan failed, and 2 means changes are required.
  • if: conditionals restrict the dangerous parts of the workflow to only execute when the branch is main and plan is valid and expects changes.

Terraform Starter Kit

This template should act as a foundational "starter kit" for establishing an effective, robust, mature Infrastructure-as-Code practice. I've found that it's easier to modify and improve an existing process than to start anew - the objective here is to get engineers past that "writer's block."

Happy coding!