VM Deployment Pipelines with Proxmox

Decoupled approaches to deploying IaaS workloads are the way of the future.

Here, we'll try to construct a VM deployment pipeline leveraging GitHub Actions and Ansible's community modules.

Proxmox Setup

  • Not featured here: loading a VM ISO. The process is particular to each Proxmox deployment, but it's a prerequisite for the steps that follow.
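
As a quick sketch anyway (not part of this walkthrough): on a stock node, ISOs dropped into local storage's ISO directory show up in the creation wizard. The Debian URL and version here are illustrative only:

# On the Proxmox node
cd /var/lib/vz/template/iso
wget https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-12.6.0-amd64-netinst.iso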

Let's create a VM named deb12.6-template:

First creation screen

I set aside a separate VM ID range for templates so they automatically sort together in the UI, which keeps them easy to spot.

Second creation screen

Third creation screen

Note: Paravirtualized hardware is still the optimal choice, just as with vSphere - but in this case, VirtIO supplies the driver code.

Fourth creation screen

Note: SSD Emulation and qemu-agent are required for virtual disk reclamation with QEMU. This is particularly important in my lab.
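
For reference, the reclamation itself is driven from inside the guest once it's built; a sketch for Debian, assuming Discard is also enabled on the virtual disk:

# Inside the guest: trim mounted filesystems once, then on a schedule
fstrim -av
systemctl enable --now fstrim.timer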

Fifth creation screen

In this installation, I'm using paravirtualized network adapters and have separated my management (vmbr0) and data (vmbr1) planes.

Debian Linux Setup

I'll skip the Linux installer steps for brevity; Debian's installer is excellent and easy to use.

At a high level, we'll want to do some preparatory steps before declaring this a usable base image:

  • Create users
    • Recommended approach: Create a bootstrap user, then shred it
      • Leave the bootstrap user with an SSH key on the base image
      • After creation, build a takeover playbook that installs the latest and greatest user table, sssd, SSH keys, PAM configuration - anything with confidential cryptographic material that should not be left unencrypted on the hypervisor (a minimal sketch follows after this list)
      • This adds less to the VM deployment time than you might think
  • Install packages
    • This is just a list of some basics that I prefer to add to each machine. It's more network-centric; anything more comprehensive should be part of a build playbook specific to whatever's being deployed.
    • Note: This is an Ansible playbook, and therefore it needs Ansible to run (apt install ansible)
---
- name: "Debian machine prep"
  hosts: localhost
  tasks:
  - name: "Install standard packages"
    ansible.builtin.apt:
      pkg:
        - 'curl'
        - 'dnsutils'
        - 'diffutils'
        - 'ethtool'
        - 'git'
        - 'mtr'
        - 'net-tools'
        - 'netcat-traditional'
        - 'python3-requests'
        - 'python3-jinja2'
        - 'tcpdump'
        - 'telnet'
        - 'traceroute'
        - 'qemu-guest-agent'
        - 'vim'
        - 'wget'
  • Clean up the disk. This makes the base image more compact - each clone inherits any wasted space, so consider it a 10-20x savings in disk usage. I leave this as a file on the base image named reset_vm.sh:
#!/bin/bash

# Clean Apt
apt clean

# Cleaning logs.
if [ -f /var/log/audit/audit.log ]; then
  cat /dev/null > /var/log/audit/audit.log
fi
if [ -f /var/log/wtmp ]; then
  cat /dev/null > /var/log/wtmp
fi
if [ -f /var/log/lastlog ]; then
  cat /dev/null > /var/log/lastlog
fi

# Cleaning udev rules.
if [ -f /etc/udev/rules.d/70-persistent-net.rules ]; then
  rm /etc/udev/rules.d/70-persistent-net.rules
fi

# Cleaning the /tmp directories
rm -rf /tmp/*
rm -rf /var/tmp/*

# Cleaning the SSH host keys
rm -f /etc/ssh/ssh_host_*

# Cleaning the machine-id
truncate -s 0 /etc/machine-id
rm /var/lib/dbus/machine-id
ln -s /etc/machine-id /var/lib/dbus/machine-id

# Cleaning the shell history
unset HISTFILE
history -cw
echo > ~/.bash_history
rm -fr /root/.bash_history

# Truncating hostname, hosts, resolv.conf and setting hostname to localhost
truncate -s 0 /etc/{hostname,hosts,resolv.conf}
hostnamectl set-hostname localhost

# Clean cloud-init - deprecated because cloud-init isn't currently used
# cloud-init clean -s -l

# Force a filesystem sync
sync
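
As promised above, a minimal sketch of the takeover play. Every name here is hypothetical (bootstrap user bootstrap, replacement admin ops, inventory group new_vms), and it assumes the bootstrap user has sudo and the ansible.posix collection is installed; a real version would also lay down sssd, PAM configuration, and the rest of the confidential material:

---
# Connect as the baked-in bootstrap user, create the real account,
# then remove bootstrap in a second play that connects as the new
# user (a user can't cleanly delete itself mid-session).
- name: "Install the real account"
  hosts: new_vms
  remote_user: bootstrap
  become: true
  tasks:
    - name: "Create the replacement admin user"
      ansible.builtin.user:
        name: ops
        groups: sudo
        append: true
        shell: /bin/bash
    - name: "Install the replacement SSH key"
      ansible.posix.authorized_key:
        user: ops
        key: "{{ lookup('file', 'files/ops.pub') }}"

- name: "Shred the bootstrap user"
  hosts: new_vms
  remote_user: ops
  become: true
  tasks:
    - name: "Remove the bootstrap user, its home directory, and keys"
      ansible.builtin.user:
        name: bootstrap
        state: absent
        remove: true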

Shut down the virtual machine. I prefer to start it back up and shut it down again from the hypervisor to ensure that qemu-guest-agent is working properly.
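
Optionally, the powered-off VM can be converted into a proper Proxmox template from the node's shell (the VM ID 9000 here is hypothetical, taken from my template ID range):

# Marks the VM read-only and usable as a clone source
qm template 9000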

Deployment Pipeline

First, we will want to create an API token under "Datacenter -> Permissions -> API Tokens":

Proxmox API token screen
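
The same token can also be minted from the node's shell; a sketch with hypothetical names (API user automation@pam, token ID gh-actions):

# Prints the token secret exactly once - record it immediately
pveum user token add automation@pam gh-actions --privsep 0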

There are some oddities to keep in mind with the proxmoxer-based Ansible modules:

  • api_user is required by the API client and is formatted as {{ user }}@domain
  • api_token_id is not the same as the output from the command; it's what you put into the "Token ID" field.
    • {{ api_user }}!{{ api_token_id }} should form the combined credential presented to the API, and must match the created token.

If you attempt to use the output from the API creation screen as api_user or api_token_id, the API returns a 401 Invalid user without much explanation of what's actually wrong.
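
A quick way to sanity-check all three pieces before loading them into secrets is a direct API call (every value below is a placeholder):

curl -k https://proxmox.example.com:8006/api2/json/version \
  -H 'Authorization: PVEAPIToken=automation@pam!gh-actions=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'

A 200 with a version payload means the user, token ID, and secret were combined correctly; a 401 means revisit the formatting above.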

Here's the pipeline. GitHub's primary job is to set up the Python/Ansible environment and translate the workflow inputs into something Ansible can properly digest.

I also added some cat steps - these let the GitHub Actions log record the intended build parameters until the Netbox registration completes.

---
name: "On-Demand: Build VM on Proxmox"

on:
  workflow_dispatch:
    inputs:
      machine_name:
        description: "Machine Name"
        required: true
        default: "examplename"
      machine_id:
        description: "VM ID (can't re-use)"
        required: true
      template:
        description: "VM Template Name"
        required: true
        type: choice
        options:
          - deb12.6-template
        default: "deb12.6-template"
      hardware_cpus:
        description: "VM vCPU Count"
        required: true
        default: "1"
      hardware_memory:
        description: "VM Memory Allocation (in MB)"
        required: true
        default: "512"

permissions:
  contents: read

jobs:
  build:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Create Variable YAML File
        run: |
          cat <<EOF > roles/proxmox_kvm/parameters.yaml
          ---
            vm_data:
              name: "${{ github.event.inputs.machine_name }}"
              id: ${{ github.event.inputs.machine_id }}
              template: "${{ github.event.inputs.template }}"
              node: node
              hardware:
                cpus: ${{ github.event.inputs.hardware_cpus }}
                memory: ${{ github.event.inputs.hardware_memory }}
                storage: ssd-tier
                format: qcow2
          EOF
      - name: Build VM
        run: |
          cd roles/proxmox_kvm/
          cat parameters.yaml
          python3 -m venv .
          source bin/activate
          python3 -m pip install --upgrade pip
          python3 -m pip install -r requirements.txt
          python3 --version
          ansible --version

          export PAPIUSER="${{ secrets.PAPIUSER }}"
          export PAPI_TOKEN="${{ secrets.PAPI_TOKEN }}"
          export PAPI_SECRET="${{ secrets.PAPI_SECRET }}"
          export PHOSTNAME="${{ secrets.PHOSTNAME }}"
          export NETBOX_TOKEN="${{ secrets.NETBOX_TOKEN }}"
          export NETBOX_URL="${{ secrets.NETBOX_URL }}"
          export NETBOX_CLUSTER="${{ secrets.NETBOX_CLUSTER_PROX }}"
          ansible-playbook build_vm_prox.yml

In addition, the workflow needs a requirements.txt to set up the venv; it belongs in the role folder (roles/proxmox_kvm as above):

###### Requirements without Version Specifiers ######
pytz
netaddr
django
jinja2
requests
pynetbox

###### Requirements with Version Specifiers ######
ansible >= 8.4.0              # Mostly just don't use old Ansible (e.g. v2, v3)
proxmoxer >= 2.0.0
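
pip doesn't cover the Ansible collections the playbook uses; those come from ansible-galaxy. A sketch of a matching requirements.yml (a hypothetical file, with contents inferred from the modules used below):

collections:
  - name: community.general
  - name: netbox.netbox

Install it with ansible-galaxy collection install -r requirements.yml.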

This Ansible playbook also integrates Netbox, as my vSphere workflow did, and uses a common schema to simplify code re-use. There are a few quirks with the Proxmox playbooks:

  • There's no module to grab VM Guest network information, but the API provides it, so I can get it with uri
  • Proxmox has a nasty habit of breaking Ansible with JSON keys that include -. The best way to fix it is with a debug action: {{ prox_network_result.json.data | replace('-','_') }}
  • Proxmox's VM copy needs a timeout configured, and it announces completion before the VM is ready for further actions. I added an ansible.builtin.pause step before starting the VM, and another after (to allow it to boot).
---
- name: "Build VM on Proxmox"
  hosts: localhost
  gather_facts: true
  # Before executing ensure that the prerequisites are installed
  # `ansible-galaxy collection install netbox.netbox`
  # `python3 -m pip install aiohttp pynetbox`
  # We start with a pre-check playbook, if it fails, we don't want to
  # make changes
  any_errors_fatal: true
  vars_files:
    - "parameters.yaml"

  tasks:
    - name: "Debug"
      ansible.builtin.debug:
        msg: '{{ vm_data }}'
    - name: "Test connectivity and authentication"
      community.general.proxmox_node_info:
        api_host: '{{ lookup("env", "PHOSTNAME") }}'
        api_user: '{{ lookup("env", "PAPIUSER") }}'
        api_token_id: '{{ lookup("env", "PAPI_TOKEN") }}'
        api_token_secret: '{{ lookup("env", "PAPI_SECRET") }}'
      register: prox_node_result
    - name: "Display Node Data"
      ansible.builtin.debug:
        msg: '{{ prox_node_result }}'
    - name: "Build the VM"
      community.general.proxmox_kvm:
        api_host: '{{ lookup("env", "PHOSTNAME") }}'
        api_user: '{{ lookup("env", "PAPIUSER") }}'
        api_token_id: '{{ lookup("env", "PAPI_TOKEN") }}'
        api_token_secret: '{{ lookup("env", "PAPI_SECRET") }}'
        name: '{{ vm_data.name }}'
        node: '{{ vm_data.node }}'
        storage: '{{ vm_data.hardware.storage }}'
        newid: '{{ vm_data.id }}'
        clone: '{{ vm_data.template }}'
        format: '{{ vm_data.hardware.format }}'
        timeout: 500
        state: present
    - name: "Wait for the VM to fully register"
      ansible.builtin.pause:
        seconds: 15
    - name: "Start the VM"
      community.general.proxmox_kvm:
        api_host: '{{ lookup("env", "PHOSTNAME") }}'
        api_user: '{{ lookup("env", "PAPIUSER") }}'
        api_token_id: '{{ lookup("env", "PAPI_TOKEN") }}'
        api_token_secret: '{{ lookup("env", "PAPI_SECRET") }}'
        name: '{{ vm_data.name }}'
        state: started
    - name: "Wait for the VM to fully boot"
      ansible.builtin.pause:
        seconds: 45
    - name: "Get VM information"
      community.general.proxmox_vm_info:
        api_host: '{{ lookup("env", "PHOSTNAME") }}'
        api_user: '{{ lookup("env", "PAPIUSER") }}'
        api_token_id: '{{ lookup("env", "PAPI_TOKEN") }}'
        api_token_secret: '{{ lookup("env", "PAPI_SECRET") }}'
        vmid: '{{ vm_data.id }}'
      register: prox_vm_result
    - name: "Report the VM!"
      ansible.builtin.debug:
        var: prox_vm_result
    - name: "Fetch VM Networking information"
      ansible.builtin.uri:
        url: 'https://{{ lookup("env", "PHOSTNAME") }}:8006/api2/json/nodes/{{ vm_data.node }}/qemu/{{ vm_data.id }}/agent/network-get-interfaces'
        method: 'GET'
        headers:
          Content-Type: 'application/json'
          Authorization: 'PVEAPIToken={{ lookup("env", "PAPIUSER") }}!{{ lookup("env", "PAPI_TOKEN") }}={{ lookup("env", "PAPI_SECRET") }}'
        validate_certs: false
      register: prox_network_result
    - name: "Refactor Network Information"
      ansible.builtin.debug:
        msg: "{{ prox_network_result.json.data | replace('-','_') }}"
      register: prox_network_result_modified
    - name: "Register the VM in Netbox!"
      netbox.netbox.netbox_virtual_machine:
        netbox_token: '{{ lookup("env", "NETBOX_TOKEN") }}'
        netbox_url: '{{ lookup("env", "NETBOX_URL") }}'
        validate_certs: false
        data:
          cluster: '{{ lookup("env", "NETBOX_CLUSTER") }}'
          name: '{{ vm_data.name }}'
          description: 'Built by the GH Actions Pipeline!'
          local_context_data: '{{ prox_vm_result }}'
          memory: '{{ vm_data.hardware.memory }}'
          vcpus: '{{ vm_data.hardware.cpus }}'
    - name: "Configure VM Interface in Netbox!"
      netbox.netbox.netbox_vm_interface:
        netbox_token: '{{ lookup("env", "NETBOX_TOKEN") }}'
        netbox_url: '{{ lookup("env", "NETBOX_URL") }}'
        validate_certs: false
        data:
          name: '{{ vm_data.name }}_intf_{{ item.hardware_address | replace(":", "") | safe }}'
          virtual_machine: '{{ vm_data.name }}'
          vrf: 'Campus'
          mac_address: '{{ item.hardware_address }}'
      with_items: '{{ prox_network_result_modified.msg.result }}'
      when: item.hardware_address != '00:00:00:00:00:00'
    - name: "Reserve IP"
      netbox.netbox.netbox_ip_address:
        netbox_token: '{{ lookup("env", "NETBOX_TOKEN") }}'
        netbox_url: '{{ lookup("env", "NETBOX_URL") }}'
        validate_certs: false
        data:
          address: '{{ item.ip_addresses[0].ip_address }}/{{ item.ip_addresses[0].prefix }}'
          vrf: 'Campus'
          assigned_object:
            virtual_machine: '{{ vm_data.name }}'
        state: present
      with_items: '{{ prox_network_result_modified.msg.result }}'
      when: item.hardware_address != '00:00:00:00:00:00'
    - name: "Finalize the VM in Netbox!"
      netbox.netbox.netbox_virtual_machine:
        netbox_token: '{{ lookup("env", "NETBOX_TOKEN") }}'
        netbox_url: '{{ lookup("env", "NETBOX_URL") }}'
        validate_certs: false
        data:
          cluster: '{{ lookup("env", "NETBOX_CLUSTER") }}'
          tags:
            - 'lab_debian_machines'
            - 'lab_linux_machines'
            - 'lab_apt_updates'
          name: '{{ vm_data.name }}'
          primary_ip4:
            address: '{{ item.ip_addresses[0].ip_address }}/{{ item.ip_addresses[0].prefix }}'
            vrf: "Campus"
      with_items: '{{ prox_network_result_modified.msg.result }}'
      when: item.hardware_address != '00:00:00:00:00:00'
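
For completeness, the workflow can also be dispatched from a terminal with the GitHub CLI (the workflow file name here is hypothetical):

gh workflow run build-vm-prox.yml \
  -f machine_name=test01 \
  -f machine_id=201 \
  -f template=deb12.6-template \
  -f hardware_cpus=1 \
  -f hardware_memory=512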

Conclusion

Overall, the Proxmox API and playbooks are quite a bit simpler to use than the VMware ones. The proxmoxer-based modules are relatively feature-complete compared to vmware_rest, and for the few gaps I did find (examples not in this post), I could always fall back on Ansible's comprehensive Linux foundation to fill them. It's a refreshing change.