VMware NSX-T and Ansible

What is the point of all this software-defined infrastructure if you don't use it?

In prior examples, it's a fairly straightforward path to SDN when deploying NSX Data Center, allowing a VI admin or network engineer to deploy virtual network resources via a GUI.

This isn't the end of an effort, but the start of a journey. Once the API is available, deployment of services on top of a virtual cloud network become easier


Ansible and NSX

First, we need our CI tooling to be capable of leveraging VMware's NSX-T Community module:

1ansible-galaxy collection install git+https://github.com/vmware/ansible-for-nsxt

**Note: This requires Ansible 3.0 or higher to leverage the "Galaxy install from git+https" feature. This software package is not hosted on Ansible Galaxy

Building the playbook

To an Ansible engineer, this part might be a bit bothersome. Since we're interacting with an appliance, there are several differences compared to canonical Ansible playbooks:

  • Inventory: Playbooks for network providers specify targets inside of their module, not using Ansible's inventory. If an inventory is used, the playbook will execute once on every inventory host, targeting the same destination device - not good.
  • Credentials are stored internally to the leveraged module
  • Any parameters that would constitute the thing that the playbook should build

Now that we have the required software, the next step would be to map out what to build. Playbooks typically have documentation on what parameters it will accept to perform work. Example here.

Let's take what will probably be the most common deployment to automate - network segments:

Executing the Playbook

When writing playbooks that will be frequently re-used, I like to leverage Jinja in the playbook, denoted by {{}} to morph to whatever need I have at the moment. Ansible supports loading variables with the --extra-vars "@{{ filename }}" statement:

1ansible-playbook create_segments.yml --extra-vars "@segment_vars.yml"  

Credential Management

I've glossed over a particularly important aspect of automation here - what credentials do we use?

Generally, I prefer to perform "hands-free" execution of aspects like this, so the provided playbook is designed to leverage the "Credentials as Environment Variables" feature in Jenkins automatically.

What I think of the Module

ansible-for-nsxt is a community maintained module, so expectations for the automation features should be set at the price paid for the software. I've tested this platform quite a bit, and the issues that I encountered appeared to be code obfuscation.

Digging deeper into the Ansible modules themselves, the biggest reason for this is NSX-T's declarative API - it makes more sense to not code those things at all and simply leverage the API. This resonates with me quite a bit!

VMware's community also has built testing into the repository, which seems to indicate that that testing is automated!

I do have two complaints about this module:

  • No maintainer for Ansible Galaxy means that Ansible 2 users (Red Hat shops) will have a difficult time installing the software from GitHub
  • Not all modules fully support IPv6 yet

Lessons Learned

Testing Idempotency

When looking to leverage automation at work, change safety is often provided as the primary reason not to do it.

It's okay not to trust automation with your production network, that's what testing is for.

When implementing an automation play like this in a production network, we need to evaluate whether or not it's safe to execute.

The first aspect to test when planning to automate a play will be idempotency, or how executing a play should consistently cause the desired state defined in the playbook, and not to impact services unnecessarily (ex. by deleting and re-creating something).

Idempotency is extremely important with infrastructure, we can't afford downtime as easily as other IT professions. The good news is that it's pretty easy to achieve:

1ansible-playbook create_segments.yml --extra-vars "@segment_vars.yml"  
2ansible-playbook create_segments.yml --extra-vars "@segment_vars.yml"`

The second executed playbook should return a status of "ok" instead of "changed".

If this test is passed, we'll evaluate the next aspect of idempotency by executing the playbook, then changing the segment's configuration in some way (GUI, API, Ansible playbook), and re-executing to ensure that the delta was detected by Ansible and can be resolved by it. I just create a second segment_vars_deviation.yml file and execute thusly:

1ansible-playbook create_segments.yml --extra-vars "@segment_vars.yml"  
2ansible-playbook create_segments.yml --extra-vars "@segment_vars_deviation.yml"  
3ansible-playbook create_segments.yml --extra-vars "@segment_vars.yml"`

This test should return "changed" for all three attempts.

More Testing

These two tests have extremely good coverage, indicating a high level of change safety for almost no effort. Additional tests to execute for a play like this can be mind-mapped or brain-stormed, and then coded from there. Here are some examples:

  • Check Tier-0 routes to ensure that the prefix built populated
  • Check Looking Glass to ensure the prefix is reachable everywhere
  • Check vCenter for the port-group created