Sunday, October 23, 2022

Track Certificate Expiration with Jenkins and Python 3!

CI/CD tools aren't just for automatically deploying apps! Jenkins excels at enabling an engineer to automatically execute and test code, but it has a hidden superpower: automating boring and labor-intensive IT tasks (removing toil).

Let's take a common and relatable IT problem, whether you're a DevOps engineer, an Agilista, or even a "normal" systems engineer. Tracking certificate expiration is not an enjoyable task, and it often involves either manual checking or (more commonly) an outage to discover that a certificate has expired.

This solution will have several major elements:

  • An inventory of hosts that present TLS certificates
  • A Python 3 script (leveraging OpenSSL) to open up TLS connections and fetch certificates
  • A Jenkins pipeline to execute that script against that inventory daily, emailing the results


In full transparency, this example runs in a home lab. It would be naive to assume that building an inventory is trivial for any enterprise, but here are some potential approaches to doing it at scale:

  • Write a Python script to ingest DNS zone files, then loop through the records with curl to see which hosts listen on port 443
  • Fetch a report from a vulnerability scanner (Retina, Qualys, Nexpose)
  • Search PKI issuance reports (if available)
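For the zone-file approach, the port check itself is small. The sketch below swaps curl for a plain TCP connection attempt using Python's standard socket module; the function names listens_on and sweep are hypothetical, and parsing the zone files into a list of hostnames is assumed to happen elsewhere.

```python
import socket


def listens_on(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


def sweep(hostnames, port: int = 443):
    """Build inventory entries (hypothetical helper) for hosts accepting connections."""
    return [{"fqdn": name, "port": port} for name in hostnames if listens_on(name, port)]
```

The output of sweep already matches the inventory layout described below, so it could be dumped straight to a file.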

We also want to write our inventory file in a way that's friendly to our execution approach. Python is dynamically typed, and most IT automation is fine with that - we're not doing any hardcore programming for most of it. The vast majority of IT automation involves sending and processing files and I/O.

Python will let a variable hold any data type you assign to it, so it's useful to map out what we want. Here are the relevant data types, along with the symbols JSON uses to signify them (if applicable):

  • String (""): This is a type that encapsulates a series of text characters
  • Integer (no wrapping): A signed whole number. It can be positive or negative, but has no decimal point (decimal points are their own unique flavor of complexity in computer programming)
  • List ([]): To a dyed-in-the-wool software developer, this will look similar to an array. Lists are indexed by an integer and can contain any data type, including the dictionaries described below
    • Python has a neat trick where a for loop returns each list item instead of its index, which saves a great deal of code
    • Python can sort a list in place by calling the .sort() method on it
  • Dictionary ({}): This is an advanced construct, and provides an engineer with a great deal of capability (at the expense of performance, and code simplicity in some cases)
    • dicts store entries as key-value pairs, and the key is usually a string
    • Python can add to a dictionary by assigning a value to a new key, e.g. dictionary["newkey"] = "b33f"
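A quick tour of these types in plain Python, using made-up values (the hostname and the port list are placeholders):

```python
hostname = "www.example.com"   # str: a series of text characters
port = 443                     # int: signed whole number
ports = [8443, 443, 636]       # list: indexed by integer position

# A for loop hands back each item directly, no index bookkeeping needed
for p in ports:
    print(p)

ports.sort()                   # sorts the list in place
print(ports)                   # [443, 636, 8443]

entry = {"fqdn": hostname}     # dict: key-value pairs, keys usually strings
entry["port"] = port           # assigning to a new key adds it
print(entry)                   # {'fqdn': 'www.example.com', 'port': 443}
```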

When planning software functionality, we always want to use the right tool for the job. Downstream APIs (e.g. OpenSSL) expect a particular format for each parameter (e.g. a TCP port should be an integer), so documentation research is a must at this phase. Here's my logic for this file:

  • I want to iterate through the entries easily, without addressing indexes, and I want it to be fast, so the top-level data in the inventory should be a list ([])
  • I want to avoid accidentally reading the wrong field, so each individual entry should be a dictionary ({}) with the following keys: fqdn, port
    • fqdn should store a string
    • port should store an integer


[ { "fqdn": "", "port": 443 }, { "fqdn": "", "port": 443 } ]
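A minimal sketch of loading and sanity-checking that inventory, assuming it is stored as the JSON shown above. The original script lists ruamel.yaml as a dependency; this sketch uses the standard library json module instead, and load_inventory is a hypothetical name. Checking types up front means a mistyped key or a quoted port fails immediately rather than deep inside a TLS call.

```python
import json


def load_inventory(text: str) -> list:
    """Parse the inventory JSON and confirm each entry has the expected types."""
    inventory = json.loads(text)
    if not isinstance(inventory, list):
        raise ValueError("inventory must be a JSON list")
    for entry in inventory:
        if not isinstance(entry, dict):
            raise ValueError(f"each entry must be an object: {entry!r}")
        # Keys are addressed by name, so a typo fails loudly here instead of later
        if not isinstance(entry.get("fqdn"), str):
            raise ValueError(f"fqdn must be a string: {entry!r}")
        port = entry.get("port")
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(port, int) or isinstance(port, bool):
            raise ValueError(f"port must be an integer: {entry!r}")
    return inventory
```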

Python Code

Here's a copy of my code. To execute it, the following pip packages need to be installed:

  • fqdn
  • pyOpenSSL (imported as OpenSSL)
  • ruamel.yaml

The standard library's datetime module does quite a bit of heavy lifting here. The Python community has already solved most of the truly difficult work, so interpreting expiration dates is a simple comparison operation.
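As an illustration of that comparison, here is a sketch assuming the expiry timestamp arrives as bytes in ASN.1 YYYYMMDDhhmmssZ form, which is the format pyOpenSSL's X509.get_notAfter() returns. The helper names and the 30-day warning window are hypothetical choices, not taken from the original script.

```python
from datetime import datetime, timedelta, timezone


def parse_not_after(not_after: bytes) -> datetime:
    """Convert an ASN.1 time such as b"20221123120000Z" into an aware UTC datetime."""
    parsed = datetime.strptime(not_after.decode("ascii"), "%Y%m%d%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def classify(not_after: bytes, now: datetime, warn_days: int = 30) -> str:
    """Label a certificate as expired, expiring soon, or ok (warn_days is hypothetical)."""
    expires = parse_not_after(not_after)
    if expires <= now:
        return "expired"
    if expires - now <= timedelta(days=warn_days):
        return "expiring soon"
    return "ok"


now = datetime.now(timezone.utc)
print(classify(b"20221123120000Z", now))
```

Passing now in as a parameter keeps the function easy to test against fixed dates.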

I rely heavily on functions to keep this code maintainable. The script is only 167 lines long, and most of that length exists for readability.

Another point of note: when writing Python to execute in a pipeline, it helps to be Perl levels of dramatic when crashing. Jenkins doesn't evaluate a build step's output by default, only its exit status, so the easiest way to signal a problem is calling sys.exit("") with an error message, which prints the message to stderr and exits with status 1. This is why the script deliberately crashes at the end if the list of errors isn't empty.
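A minimal sketch of that fail-loudly pattern, assuming findings are collected as a list of strings (report_and_exit is a hypothetical name). A Jenkins Execute shell build step fails when its command exits non-zero, which is what triggers the failure email.

```python
import sys


def report_and_exit(errors: list) -> None:
    """Print findings, then exit non-zero so Jenkins marks the build as failed."""
    if not errors:
        print("All certificates OK")
        return
    for line in errors:
        print(line)
    # sys.exit() with a string prints it to stderr and exits with status 1
    sys.exit(f"{len(errors)} certificate problem(s) found")
```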


This configuration example should provide a basic level of functionality. Jenkins has a lot of capability, so this tooling can be endlessly tweaked to your needs.

First, let's set up an SMTP server. With a default installation, the settings are under Dashboard -> Manage Jenkins -> Configure System:

The Advanced settings let you configure SMTP authentication and the port, if applicable. If you use Gmail, you can use an app password, which keeps MFA on your account and avoids password proliferation.

Now, let's set up a Freestyle project:

With Jenkins, all things should be executed from source code. This is the way.

We want to run this daily, irrespective of source code changes. This requires a slight deviation from the usual Poll SCM approach, using the Build periodically trigger with a cron-style schedule:

As always, amnesic workspaces are best:

Python and the inventory file simplify the Jenkins configuration as well. Just execute the script as-is:

The final step is to add a Post-Build Action to email if there is a failure:

It really is this simple. Jenkins will now execute daily and email you a list of expired and soon-to-expire certificates!

Lessons Learned

I'm going to improve this code. Here are some of my ideas:

I'm continually amazed at what the open source community can achieve with this level of simplicity. Would you consider this approach out of reach or too challenging?

Saturday, October 15, 2022

Gathering and Using Data from Cisco NX-OS with Ansible Modules


Reliably executing repetitive tasks with automation is easy (after the work is done).

Given enough work, self-built automation can be easy to consume. The engineers building it need to focus on reliability and repeatability, but occasionally there's an opportunity to save time and simplify lives directly.

Information gathering with Ansible is a powerful tool: checking one network node takes roughly the same effort as checking two, or even one hundred. Here's a quick and easy way to get started.

Ansible Inventory

Ansible likes to know where each managed node lives, and provides the inventory capability to organize similar devices for remote management. Not all network automation endpoints use the inventory feature, so ensure that you read the published documentation first. 

Note: The easiest way to check inventory dependency is to look for module parameters in the playbook named hostname, username, or password. If they exist, that module probably does not use inventory.

Ansible supports two formats for an on-controller inventory: INI (Windows-like) and YAML (Linux-like). Here's an example in YAML, which I personally find easier to read:

        ansible_host: ""
        ansible_host: ""
        ansible_user: "admin"

We have a little bit to unpack here:

  • The first hierarchical tier is for groups, which can contain other groups if you use the children: directive (see nxos_all as an example)
  • vars: will specify variables to commonly use across all members of that group
  • ansible_host is used to specify an address - and is useful with dual stack environments (or ones that don't have DNS)
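To make the children: and vars: relationship concrete, here is a toy Python model of a group hierarchy. It is not how Ansible implements inventory; the host names and documentation-range addresses (192.0.2.0/24) are hypothetical, and variable precedence is simplified to parent group vars, then child group vars, then host vars.

```python
# Toy model of an inventory (hypothetical names and addresses), not Ansible's implementation
INVENTORY = {
    "nxos_all": {
        "children": ["nxos_machines"],
        "vars": {"ansible_user": "admin"},
    },
    "nxos_machines": {
        "hosts": {
            "nx-1": {"ansible_host": "192.0.2.11"},
            "nx-2": {"ansible_host": "192.0.2.12"},
        },
    },
}


def hosts_in(group, inherited=None):
    """Return {host: merged vars} for a group, descending into child groups."""
    node = INVENTORY[group]
    # Start from the parent's vars, then layer this group's vars on top
    merged = {**(inherited or {}), **node.get("vars", {})}
    result = {}
    for name, host_vars in node.get("hosts", {}).items():
        # Host-level variables win over group-level ones
        result[name] = {**merged, **host_vars}
    for child in node.get("children", []):
        result.update(hosts_in(child, merged))
    return result


print(hosts_in("nxos_all"))
```

Targeting nxos_all reaches both switches through the child group, and both pick up ansible_user from the parent's vars.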

Ansible Facts

Ansible stores all of its runtime variables for a given playbook as facts. This is held as a Python dict at runtime by Ansible Engine, and the debug: module allows an engineer to print the output to stdout:

- hosts: localhost
  connection: local
  tasks:
    - name: "Print it!"
      debug:
        var: lookup('ansible.builtin.env', 'PATH')
    - name: "Print it, but with msg!"
      debug:
        msg:
          - "The system environment PATH is: {{ lookup('ansible.builtin.env', 'PATH') }}"
          - "Wise engineers don't use this feature to print passwords"

Running this playbook will produce the following:

ansible-playbook debug.yml 
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [localhost] *************************************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************************************
ok: [localhost]

TASK [Print it!] *************************************************************************************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "lookup('ansible.builtin.env', 'PATH')": "/root/.vscode-server/bin/d045a5eda657f4d7b676dedbfa7aab8207f8a075/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
}

TASK [Print it, but with msg!] ***********************************************************************************************************************************************************************************************************************************************************
ok: [localhost] => {
    "msg": [
        "The system environment PATH is: /root/.vscode-server/bin/d045a5eda657f4d7b676dedbfa7aab8207f8a075/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "Wise engineers don't use this feature to print passwords"
    ]
}

PLAY RECAP *******************************************************************************************************************************************************************************************************************************************************************************
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0 

msg: is effective for formatted output, while var: is considerably simpler when dumping a large dictionary. var: does not require Jinja formatting, which can keep playbooks simpler.

Let's apply this to a Cisco NX-OS node. We can register the output of the nxos_facts module.

Note: The example provided below is the "new way", where Network modules follow the Ansible rules. If using older versions of Ansible (Ansible 2), the following may not be fully available!

First, we need to update the Ansible inventory. We will be using the API method to collect data, and it requires multiple new variables:

  • ansible_network_os: Instructs Ansible on what module to use for that system
  • ansible_connection: Instructs Ansible on what transport to use (HTTP API, SSH)
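  • ansible_httpapi_use_ssl: Instructs Ansible to use HTTPS
  • ansible_httpapi_validate_certs: Instructs Ansible whether to validate the device's HTTPS certificate (disabled here for the lab)
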
        ansible_host: ""
        ansible_host: ""
        ansible_user: "admin"
        ansible_network_os: 'cisco.nxos.nxos'
        ansible_connection: ansible.netcommon.httpapi
        ansible_httpapi_password: ''
        ansible_httpapi_use_ssl: 'yes'
        ansible_httpapi_validate_certs: 'no'

The updated inventory allows us to run extremely simple playbooks to gather data:

- hosts: nxos_machines
  tasks:
    - name: "Gather facts via NXAPI"
      cisco.nxos.nxos_facts:
        gather_subset: 'min'
        gather_network_resources:
          - 'interfaces'
      register: nxos_facts_gathered
    - name: "Print it!"
      debug:
        var: nxos_facts_gathered

ansible-playbook debug_nxos_facts.yml 

PLAY [nxos_machines] *************************************************************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************************************************************************************
[WARNING]: Ignoring timeout(10) for cisco.nxos.nxos_facts
ok: [nx-1]

TASK [Gather facts via NXAPI] ****************************************************************************************************************************************************************************************************************************************************************************************
ok: [nx-1]

TASK [Print it!] *****************************************************************************************************************************************************************************************************************************************************************************************************
ok: [nx-1] => {
    "nxos_facts_gathered": {
        "ansible_facts": {
            "ansible_net_api": "nxapi",
            "ansible_net_gather_network_resources": [
            "ansible_net_gather_subset": [
            "ansible_net_hostname": "AnsLabN9k-1",
            "ansible_net_image": "bootflash:///nxos.9.3.8.bin",
            "ansible_net_license_hostid": "",
            "ansible_net_model": "Nexus9000 C9300v",
            "ansible_net_platform": "N9K-C9300v",
            "ansible_net_python_version": "3.9.2",
            "ansible_net_serialnum": "",
            "ansible_net_system": "nxos",
            "ansible_net_version": "9.3(8)",
            "ansible_network_resources": {
                "interfaces": [
                        "name": "Ethernet1/1"
                        "name": "mgmt0"
        "changed": false,
        "failed": false

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************************************************************************************
nx-1                       : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
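Because the registered result is just a nested dictionary, it is easy to post-process. The sketch below assumes the result has been exported as JSON and loaded into Python (how it gets exported is out of scope here); the sample data copies a few fields from the output above, and summarize is a hypothetical helper.

```python
# Sample data shaped like the registered nxos_facts_gathered variable above,
# trimmed to a few fields copied from the printed output
nxos_facts_gathered = {
    "ansible_facts": {
        "ansible_net_hostname": "AnsLabN9k-1",
        "ansible_net_version": "9.3(8)",
        "ansible_network_resources": {
            "interfaces": [{"name": "Ethernet1/1"}, {"name": "mgmt0"}],
        },
    },
    "changed": False,
    "failed": False,
}


def summarize(result: dict) -> str:
    """Build a one-line summary from a registered nxos_facts result."""
    facts = result["ansible_facts"]
    interfaces = facts["ansible_network_resources"].get("interfaces", [])
    names = [intf["name"] for intf in interfaces]
    return f'{facts["ansible_net_hostname"]} ({facts["ansible_net_version"]}): {", ".join(names)}'


print(summarize(nxos_facts_gathered))  # AnsLabN9k-1 (9.3(8)): Ethernet1/1, mgmt0
```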

Ansible's Inventory feature enables us to scale per node without any additional code - the previous playbook will execute once on every inventory object in the group, which allows an engineer to thoroughly test a playbook on lab resources with some level of separation.

Deliberate automation design will bear fruit here, as safety is key when developing and testing automation. As with my previous automation-centric posts, thorough testing of automation for reliability is a social responsibility when creating tools. 

Establishing a separate CI/CD tooling set to target a lab (or CML, as in this case!) enables us to add additional safeguards against accidental changes, such as ACLs/Firewall policies preventing access from Test CI/CD -> Production network assets. Tools like CML take it even further by allowing an engineer to spin up amnesic NOS instances to run code against.

Here's an applicable instance. Cisco recently disclosed a vulnerability in Cisco Fabric Services, and most environments don't need that service running. Disabling it is an aggressive fix, but with Ansible we can check for the service, disable it only if it's running, and then check again afterwards. This illustrates the value of idempotency: repeated executions are safe because each run only changes what isn't already in the desired state.
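The check-then-change logic behind idempotency can be sketched in a few lines of Python. The switch state here is a hypothetical dictionary standing in for the device, and ensure_disabled and the fabric_services key are illustrative names, not NX-OS or Ansible APIs. Ansible modules report the same distinction through changed: true or false.

```python
def ensure_disabled(features: dict, name: str) -> bool:
    """Disable a feature only if it's enabled. Returns True if a change was made."""
    if not features.get(name, False):
        return False  # already in the desired state, nothing to do
    features[name] = False
    return True


# Hypothetical device state; a real playbook would read this from the switch
switch = {"fabric_services": True, "nxapi": True}
print(ensure_disabled(switch, "fabric_services"))  # True: first run makes the change
print(ensure_disabled(switch, "fabric_services"))  # False: second run is a no-op
print(switch)  # {'fabric_services': False, 'nxapi': True}
```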
