Manage Linux patching with Ansible and Netbox!

Apr 7, 2024 · 6 min read · Linux Programmability Netbox Ansible ·

Patching all of my random experiments took too much of my free time, so I automated it

This is a pretty cheesy thing to do, but over the years it became more and more time-consuming to maintain all the different deployed workloads and infrastructure.

Requirements

With all system design, it's best to consider all relevant needs ahead of time. Given that this is a home lab, I decided to adopt an intentionally aggressive, but theoretically viable in production approach:

Nightly patching
Nightly reboots
No exempt packages
Distribution-agnostic, it should patch multiple distributions at once
This workflow should execute consistently from-code

Iteration 1: Ansible with Jenkins

The earliest implementation I built here had the least refinement by far. Here I tied Jenkins to an internal repository:

To leverage this, I started out with an INI inventory, but it quickly became problematic. I wanted a hierarchy, with each distribution potentially fitting multiple categories. This became pretty messy pretty quickly, so I moved to a YAML Inventory:

 1debian_machines:
 2  hosts:
 3    hostname_1:
 4      ansible_host: "1.1.1.1"
 5    hostname_2:
 6      ansible_host: "2.2.2.2"
 7ubuntu_machines:
 8  hosts:
 9    hostname_3:
10      ansible_host: "3.3.3.3"
11apt_updates:
12  children:
13    debian_machines:
14    ubuntu_machines:
15nameservers:
16  children:
17    hostname_2:
18    hostname_3:

This allowed me to simplify my playbooks and inventory by making "groups of groups", and avoid crazy stuff like taking down all nodes for an application at once. We'll use nameservers: as an example here:

 1---
 2- name: "Reboot APT Machines, except DNS"
 3  hosts: apt_updates,!nameservers
 4  tasks:
 5    - name: "Ansible Self-Test!"
 6      ansible.builtin.ping:
 7    - name: "Reboot Apt Machines!"
 8      ansible.builtin.reboot:
 9- name: "Reboot nameservers"
10  hosts: nameservers
11  serial: 1
12  tasks:
13    - name: "Ansible Self-Test!"
14      ansible.builtin.ping:
15    - name: "Reboot nameservers serially!"
16      ansible.builtin.reboot:

The serial: 1 key instructs the Ansible controller to only execute this playbook on one machine at a time, so DNS continuity is preserved.

Retrospective

I had several issues with this approach, but to my surprise, Linux patching and actual Ansible issues haven't cropped up at all. With most mainstream distributions, the QC must be good enough to patch nightly like this.

I did have issues with inventory management, however. To update the Ansible inventory, I could deploy as-code, which was nice, but it was still clunky. If I deployed 5 Alpine images in a day, I want them to automatically be added to my inventory for maximum laziness.

I also quickly discovered that maintaining Jenkins was labor-intensive. It's a truly powerful engine, and great if you need all the extra features, but there aren't many low-friction ways to automate all the required maintenance, particularly around plugins. I was able to update Jenkins itself with a package manager, but it seems like every few days I had to patch plugins (manually).

Iteration 2: Ansible, Netbox, GitHub Actions

I'll be up-front - for parameterized builds, GitHub Actions is less capable.

It has some pretty big upsides, however:

You don't have to maintain the GUI at all
Logging is excellent
Itegration with GitHub is excellent
Pipelines are YAML defined in their own repository
Status badges in Markdown (we don't need some stinkin' badges!)

This workflow has been much smoother to operate. Since the deployment workflow already updates Netbox, all machines are added to the "maintenance loop after first boot.

I was really surprised at how little work was required to convert these CI pipelines. This was naive of me - ease of conversion is the entire point of CI pipelines, but it's still mind-boggling to realize how effective it is at times.

To make this work, I first needed to create a CI process in .github/workflows on my Lab repository:

 1name: "Nightly: @0100 Update Linux Machines"
 2
 3on:
 4  schedule:
 5    - cron: "0 9 * * *"
 6
 7permissions:
 8  contents: read
 9
10jobs:
11  build:
12    runs-on: self-hosted
13    steps:
14    - uses: actions/checkout@v3
15    - name: Execute Ansible Nightly Job
16      run: |
17        python3 -m venv .
18        source bin/activate
19        python3 -m pip install --upgrade pip
20        python3 -m pip install -r requirements.txt
21        python3 --version
22        ansible --version
23        export NETBOX_TOKEN=${{ secrets.NETBOX_TOKEN }}
24        export NETBOX_API=${{ vars.NETBOX_URL }}
25        ansible-galaxy collection install netbox.netbox
26        ansible-inventory -i local.netbox.netbox.nb_inventory.yml --graph
27        ansible-playbook -i local.netbox.netbox.nb_inventory.yml lab-nightly.yml

This executes on a GitHub Self-Hosted runner in my lab with a Python Virtual Environment. The workflow will run a clean build, every time - by wiping out the workspace prior to each execution. No configuration artifacts are left behind.

With GitHub Actions, all processes are listed alphabetically, you can't do folders and trees to keep it more organized. I developed a naming convention:

1{{ workflow_type }}: {{ description}}

To keep things sane.

From there, we need a way to point to Netbox as an inventory source. This requires a few files:

requirements.txt is the Python 3 Pip inventory - since things are running in a virtual environment, it will only use python packages in this list.

 1###### Requirements without Version Specifiers ######
 2pytz
 3netaddr
 4django
 5jinja2
 6requests
 7pynetbox
 8
 9###### Requirements with Version Specifiers ######
10ansible >= 8.4.0              # Mostly just don't use old Ansible (e.g. v2, v3)

The next step is to build an inventory file. This has to be named specifically for the plugin to work - local.netbox.netbox.nb_inventory.yml:

1plugin: netbox.netbox.nb_inventory
2validate_certs: False
3config_context: True
4group_by:
5  - tags
6device_query_filters:
7  - has_primary_ip: 'true'

Not featured here - The API Endpoint and API Token directives are handled by GitHub Actions Secrets, and therefore don't need to be in this file.

This file is pretty straightforward. It indicates that we should use Netbox tags to develop our inventory, and we can assign multiple tags in the netbox application to each Virtual Machine. I also added the has_primary_ip directive - if a machine doesn't get an IP address for some reason, it won't try to reach that VM and patch it, causing late night failures.

Here's a preview of the Netbox application with these tags:

Refactoring the Ansible playbooks was hilariously easy. The Netbox inventory plugin prepends the group_by field onto the group, so all I had to do in each playbook was prepend tags_ to each name. Here's an example:

 1---
 2- name: "Apt Machines"
 3  hosts: tags_lab_apt_updates
 4  tasks:
 5    - name: "Ansible Self-Test!"
 6      ansible.builtin.ping:
 7    - name: "Update Apt!"
 8      ansible.builtin.apt:
 9        name: "*"
10        state: latest
11        update_cache: true
12- name: "Apk Machines"
13  hosts: tags_lab_apk_updates
14  tasks:
15    - name: "Ansible Self-Test!"
16      ansible.builtin.ping:
17    - name: "Update Apk!"
18      ansible.builtin.apt:
19        available: true
20        upgrade: true
21        update_cache: true

After that, the CI tooling just takes care of it all for me!

Retrospective

I'm going to stick with this method for a while. Netbox tagging makes inventory management much more intuitive, and I can develop tag "pre-sets" in my deployment pipeline to correctly categorize all the stuff I deploy. Since it's effectively documentation, I have an easy place to put data I'll need to find later for those "what was I thinking?" moments.

I'll be honest - behind that, I haven't really given it much thought. This approach requires zero attention to continue, and it happens while I sleep. I haven't gotten any problems from it, and it allows me to focus my free time on things that are more important.

10/10 would recommend.