Saturday, February 4, 2023

Deploy vSphere VMs with Ansible!

In a previous post, we covered how to create a virtual machine from a VM template in vSphere using Python and the REST API as an example of service-agnostic methods to invoke infrastructure resources.

VMware's PowerCLI is a fantastic tool, but it's not for me. PowerCLI's functionality lives in PowerShell, and tweaking it or porting it to other languages isn't supported. Technology professionals with more than a few years of experience are leery of code portability issues; today's standard hardware architectures are the result of consolidation efforts that caused some painful transitions. Change-safety features like idempotency and context awareness are important to me; I prefer the approach of "check parameters, check destructiveness, execute, test."

Engineers develop language preferences as a normal part of their career progression. Bash/Zsh/Python/Perl and JSON/YAML/XML feel more intuitive to me as universal formats for storing artifacts and execution code. BASIC-style languages like PowerShell/PowerCLI are not for everybody.

It's possible to use Ansible to manage and deploy Virtual Machines now - the previously covered Python code leverages the same RESTful API as the Ansible modules. The Ansible modules shift responsibility for code maintenance away from internal teams. Let's not forget the cost of writing and maintaining custom code:

Let's take a look. We need to install the Ansible "Collection" to leverage the REST API (this also requires the Python package aiohttp):

ansible-galaxy collection install vmware.vmware_rest
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Installing 'vmware.vmware_rest:2.2.0' to '/root/.ansible/collections/ansible_collections/vmware/vmware_rest'
Downloading to /root/.ansible/tmp/ansible-local-346454pqg9aobo/tmpi1o29dho
vmware.vmware_rest (2.2.0) was installed successfully
Installing 'cloud.common:2.1.2' to '/root/.ansible/collections/ansible_collections/cloud/common'
Downloading to /root/.ansible/tmp/ansible-local-346454pqg9aobo/tmpi1o29dho
cloud.common (2.1.2) was installed successfully

First, let's plan where automation will deploy workloads. Good work starts with good data - Ansible combines with a popular template tool (Jinja), enabling automation engineers to compile "pretty" reports from variables we collect during the execution phase:

The playbook helps any planning effort by reporting on the infrastructure consistently. At work, I'd recommend using a CI tool to periodically render these artifacts as an HTML page (Jinja can do that as well) and publishing it to a web server for other teams to reference.

Let's use that information to deploy a virtual machine:

Setting the session_timeout parameter correctly in the playbook is critical. VM deployment times depend on the storage backing, and the default timeout is 300 seconds. Deployment timeouts need to be tuned to the equipment the VMs live on - I set it to 1200 seconds to accommodate spinning disks.
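If you would rather reason about that number than guess, one rough approach is to estimate how long a full copy of the template takes and then add headroom. The sketch below is a hypothetical rule of thumb and does not come from the vmware_rest documentation. The template size, the copy throughput, and the 1.5x margin are all assumptions, so replace them with measurements from your own storage. Copy offload such as VAAI or vSAN can also make deployments much faster than this estimate suggests.

```python
import math


def suggest_session_timeout(template_gib: float, copy_mib_per_s: float,
                            margin: float = 1.5, floor: int = 300) -> int:
    """Rough session_timeout in seconds (hypothetical heuristic).

    Estimates a full, non-offloaded copy of the template at the given
    throughput, multiplies by a safety margin, and never returns less
    than the 300-second default.
    """
    if template_gib <= 0 or copy_mib_per_s <= 0:
        raise ValueError("size and throughput must be positive")
    copy_seconds = template_gib * 1024 / copy_mib_per_s
    return max(floor, math.ceil(copy_seconds * margin))


# A 40 GiB template copied at roughly 50 MiB/s (illustrative numbers only)
print(suggest_session_timeout(40, 50))  # 1229
```

With those illustrative inputs, the estimate lands close to the 1200 seconds used above. The result would go into each module's session_timeout parameter.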

This playbook is parameterized, with variables identified by double curly brackets: {{ variable_name }}. Ansible playbook examples don't always document what input the code expects, a consistent issue with community-generated code.

Ansible supports variable injection from an external file or the command line; the best way to manage these input variables for self-service is to use file-based inputs. CI tools ship with APIs that allow you to invoke a pipeline with a JSON payload, and this method gives you an easy way to convert back and forth. I have provided an example JSON document to show what format the playbook wants.
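Here is a sketch of checking such a document before handing it to Ansible. The Python below parses a JSON parameter file and fails early when a field is missing or empty. The field names (id, name, datastore, folder, cluster) are assumptions borrowed from the Python tool's suggestions output further down this page, and the playbook's actual variable names may differ.

```python
import json

# Hypothetical field names, mirroring the suggestions output of the Python tool;
# adjust them to whatever variables the playbook actually references.
REQUIRED_FIELDS = ("id", "name", "datastore", "folder", "cluster")


def load_parameters(text: str) -> dict:
    """Parse a JSON parameter document and reject missing or empty fields."""
    params = json.loads(text)
    if not isinstance(params, dict):
        raise ValueError("the parameter document must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not params.get(field)]
    if missing:
        raise ValueError("missing or empty fields: " + ", ".join(missing))
    return params


example = """{
    "id": "05886bd8-7389-49e1-a53f-29353cd70186",
    "name": "Example",
    "datastore": "datastore-1023",
    "folder": "group-v1100",
    "cluster": "domain-c1008"
}"""
print(load_parameters(example)["name"])  # Example
```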

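To pass extra variables to Ansible, use the --extra-vars (-e) option. Prefixing the file name with @ tells Ansible to load the variables from that file:

ansible-playbook build_vm.yml --extra-vars "@parameters.json"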
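When a CI job receives the JSON payload through its API, a thin wrapper can write the payload to disk and call the playbook with the same --extra-vars "@file" form. This is a hypothetical Python sketch that uses only the standard library. The function names and the parameters.json file name are illustrative.

```python
import json
import subprocess
from pathlib import Path


def write_payload(payload: dict, directory: str) -> str:
    """Store the pipeline's JSON payload as a file Ansible can load."""
    path = Path(directory) / "parameters.json"
    path.write_text(json.dumps(payload, indent=2))
    return str(path)


def build_ansible_command(playbook: str, params_file: str) -> list[str]:
    """Build the argv; the leading @ makes Ansible read variables from the file."""
    return ["ansible-playbook", playbook, "--extra-vars", f"@{params_file}"]


def run_playbook(playbook: str, payload: dict, directory: str) -> int:
    """Write the payload, run ansible-playbook, and return its exit code."""
    command = build_ansible_command(playbook, write_payload(payload, directory))
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with the @ prefix. A call such as run_playbook("build_vm.yml", payload, workspace_dir) assumes ansible-playbook is on the PATH.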

Jenkins has a parameter injection feature that makes this consumable as an end user as well:

Consumers perceive infrastructure's quality with a heavy bias towards convenience. We should construct easy ways to provision resources and enable creativity to boost positive perception of infrastructure services. Ansible's vmware_rest collection combines with an execution engine like Jenkins to create an open method for deploying Virtual Machines, a pivotal step towards making infrastructure appealing.

Thursday, January 5, 2023

Why Automate? VM Deployment with vSphere's REST API

VMware expanded its RESTful APIs with the vSphere 7.0 release train. Migrating away from the older SOAP-based APIs (the ones browsed through the Managed Object Browser, or MOB) is a welcome change. REST clients are a powerful tool for automating processes, but it's important to follow reliable practices when interacting with infrastructure.
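To show what interacting with the REST API looks like at the HTTP level, here is a minimal Python sketch of authentication that uses only the standard library. It assumes the /api/session endpoint available in vSphere 7.0 U2 and later, plus the vmware-api-session-id header. Older releases used /rest/com/vmware/cis/session instead, so check both against your vCenter's API Explorer. The host name and credentials are placeholders, and the sketch only builds the request without sending it.

```python
import base64
import urllib.request


def build_session_request(vcenter: str, username: str, password: str) -> urllib.request.Request:
    """POST /api/session with HTTP basic auth; vCenter answers with a session token."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{vcenter}/api/session",
        method="POST",
        headers={"Authorization": f"Basic {credentials}"},
    )


def session_headers(session_id: str) -> dict:
    """Headers for follow-up calls: the token travels in vmware-api-session-id."""
    return {"vmware-api-session-id": session_id, "Content-Type": "application/json"}


request = build_session_request("vcenter.example.com", "user", "pass")
print(request.get_method(), request.full_url)  # POST https://vcenter.example.com/api/session
```

You can send the request with urllib.request.urlopen(request), passing an SSL context that trusts your vCenter's certificate. As I understand the reference, the response is the token as a JSON-encoded string, and json.loads turns it back into the value that session_headers expects.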

The neat thing about published and documented REST APIs is that you're no longer restricted to a specific toolset to automate - both a blessing and a curse.

In the following section, I describe how I prefer to develop reliable infrastructure automation with integrated checking. The code is here, and the guide on how to use it is here.

How Much Automation?

The industry often provides persuasive guidance that automation should be a conscious choice at work (usually to sell automation products). The title of this blog article reflects this trend as well - imagine the difference in Google Search hits for "How much automation is good for me and my company". This "peer pressure" alienates people who haven't researched automation much before they can resolve any underlying concerns.

The issues I hear are typically from infrastructure engineers, and the concerns raised with automation are common and valid. That doesn't mean we're all off the hook, though - the issues raised are all solvable:

  • How Safe: Automation safety is a huge concern - and rightly so. Taking the wrong action rapidly produces disastrous results. This is the highest priority when developing an automation practice, which is why peer review fundamentals and executing proofs are so important.
  • Which Things: Discussions like "This highly custom and variable process can't be automated" indicate that a company's IT Architecture doesn't fit well with automation approaches - prioritizing repetitive solutions on automatable infrastructure should buy companies time to redesign themselves. This is particularly true in networking, where automation models and tools may require hardware upgrades to implement. Infrastructure has always moved slowly, but we stand to benefit from a redesign incorporating everything we've learned since the last time around (~2014 for most).
  • How Much: Do you think you'd know the difference between a company with one engineer coding it all with Python, a company that broadly uses Chef/Puppet/Ansible, or one that uses commercial automation tools exclusively?
    • Going "hard mode" may be the right choice, or it might not - the better choice is whichever one makes IT work fluidly with the business.
      • A company that typically accepts tools as-is and doesn't modify them much (commercial farming, manufacturing as examples) would benefit more from vendor-provided automation
      • A startup trying to rush an initial product to market is going to "code it all".
      • Most of us live somewhere in the middle.

Automation Requirements

Let's assume the organization has agreed on an appropriate level of automation. I frame this as a critical ability for my home lab: creating disposable virtual machines so I don't suffer the consequences of my mistakes. We start by constructing a plan, or requirements, for our code:

  • Deploy a vSphere Virtual Machine
    • Without OVFTool. OVFTool deployments are slow, even when initiated from tools like vRealize Orchestrator within the data center
    • Leverage host-storage copying optimization if available (vSAN, NFS Server-Side Copy, VAAI, etc)
    • Build a VM from a central image repository
    • Perform best-effort checking prior to deployment, in case the VM can't be built
    • Validate that the VM is successfully deployed
    • Return a pointer to enable further customization
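The "best-effort checking" requirement can be sketched as a preflight function that compares the requested placement with inventory the tool has already discovered. In this hypothetical Python sketch, the inventory dict holds sets of known IDs plus existing VM names, and it is assumed to have been filled in earlier from vCenter. An empty result means it's reasonably safe to proceed.

```python
def preflight(params: dict, inventory: dict) -> list[str]:
    """Best-effort pre-deployment checks; returns a list of problems (empty means go)."""
    problems = []
    # Every referenced object must exist in the discovered inventory.
    for field in ("id", "datastore", "folder", "cluster"):
        value = params.get(field)
        if value not in inventory.get(field, set()):
            problems.append(f"{field} {value!r} was not found in vCenter")
    # Refuse to collide with an existing VM: a cheap destructiveness check.
    if params.get("name") in inventory.get("vm_names", set()):
        problems.append(f"a VM named {params['name']!r} already exists")
    return problems


inventory = {
    "id": {"05886bd8-7389-49e1-a53f-29353cd70186"},
    "datastore": {"datastore-1023"},
    "folder": {"group-v1100"},
    "cluster": {"domain-c1008"},
    "vm_names": {"existing-vm"},
}
request = {"id": "05886bd8-7389-49e1-a53f-29353cd70186", "name": "Example",
           "datastore": "datastore-1023", "folder": "group-v1100", "cluster": "domain-c1008"}
print(preflight(request, inventory))  # []
```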

The final requirements may be present in a canned solution, but companies with multiple unlinked vCenters/clusters quickly face unique challenges propagating a VM template to each compute resource in their administrative domain. VMware's Content Library feature enables enterprises to create a "build stack" that production clusters can subscribe to (and sync) for standardized images. The Publish Once, Sync Everywhere approach simplifies administration, in part because more complex automation doesn't need to be developed to perform these tasks.
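To illustrate the "send this document to this IP" shape of a Content Library deployment, the hypothetical sketch below builds the deploy call for a VM template library item without sending it. The path /api/vcenter/vm-template/library-items/{id}?action=deploy and the name, placement, disk_storage, and vm_home_storage fields reflect my reading of the vSphere Automation API reference for 7.0 U2 and later. Verify them in your vCenter's API Explorer. The IDs reuse values from the suggestions output further down this page.

```python
import json
import urllib.request


def build_deploy_request(vcenter: str, session_id: str, library_item: str, name: str,
                         datastore: str, folder: str, cluster: str) -> urllib.request.Request:
    """Build the POST that deploys a VM template from a content library item."""
    spec = {
        "name": name,
        "placement": {"folder": folder, "cluster": cluster},
        # Put both the VM home files and the virtual disks on the chosen datastore.
        "vm_home_storage": {"datastore": datastore},
        "disk_storage": {"datastore": datastore},
    }
    url = (f"https://{vcenter}/api/vcenter/vm-template/library-items/"
           f"{library_item}?action=deploy")
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode(),
        method="POST",
        headers={"vmware-api-session-id": session_id, "Content-Type": "application/json"},
    )


request = build_deploy_request("vcenter.example.com", "session-token",
                               "05886bd8-7389-49e1-a53f-29353cd70186", "Example",
                               "datastore-1023", "group-v1100", "domain-c1008")
print(json.loads(request.data)["placement"])  # {'folder': 'group-v1100', 'cluster': 'domain-c1008'}
```

As I understand the reference, a successful deploy responds with the new VM's identifier, which is the "pointer" the requirements ask for.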

Note: This approach requires a specific type of library item (a VM template) so that the template populates automatically in vSphere. Clicking "Clone as Template to Library" completes this conversion in one step:

Note: This post assumes that a usable template is already available for use in a content library.

Reviewing Canned Offerings

Always explore the easy path before the difficult one. In this case, Ansible and VMware's REST modules do not yet support template deployments - nearly everything after the deployment is covered.

When compared to Ansible or other idempotent commercial automation tools, self-authored API code has some heavy lifting to do.

  • Ensure it's safe to "do": This is difficult for a developer; it requires a level of infrastructure understanding not normally present.
  • Implement the "do" thing: With RESTful APIs, this part isn't particularly difficult. Implicitly trusting an endpoint's API reduces most code to "send this document to this IP", resulting in small code bases and little development effort (a handful of PowerCLI or Python lines)
  • Ensure the "do" was executed correctly: It's notably straightforward to verify with a REST call after the fact, once the correct checks are
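The three responsibilities above (check, do, verify) can be wired together in a small guard. This Python sketch illustrates the pattern and is not the linked code. The checks, action, and verify callables are hypothetical stand-ins for real REST calls. The action never runs if any check reports a problem.

```python
from typing import Callable, Iterable


def guarded_run(checks: Iterable[Callable[[], list[str]]],
                action: Callable[[], str],
                verify: Callable[[str], bool]) -> str:
    """Run every check, then the action, then confirm the result independently."""
    problems = [problem for check in checks for problem in check()]
    if problems:
        # Ensure it's safe to "do": stop before touching anything.
        raise RuntimeError("refusing to run: " + "; ".join(problems))
    result = action()  # Implement the "do" thing.
    if not verify(result):
        # Ensure the "do" was executed correctly.
        raise RuntimeError(f"action returned {result!r} but verification failed")
    return result


deployed = guarded_run(
    checks=[lambda: [], lambda: []],               # e.g. parameter and collision checks
    action=lambda: "vm-1234",                      # e.g. the deploy call's returned VM ID
    verify=lambda vm_id: vm_id.startswith("vm-"),  # e.g. a follow-up GET on the VM
)
print(deployed)  # vm-1234
```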

Developing the Code

If self-developing code is worthwhile, we need to keep a few things in mind. Anyone with formal software development experience will be familiar with this routine.

Begin by developing an outline/pseudocode: getting a team to agree on the "what" with actual code may be feasible for experienced developers, but the rest of us could use a little help. Flow diagrams are well supported in tools like Visio and OmniGraffle, and coding with a software diagram is like having a map to guide you while writing code. The C4 Model helps by defining personas and interaction types with a common framework:

Develop the tests before the actions. Data structures and formatting magically lose their ability to change after users start consuming automation, and testing will improve the diagrams and data structures before the code is actually usable. 

Develop the user interactions, documentation, and formatting after the tests. This would include any use of argparse, json.loads/dumps, or environment variables. Users and their inputs should be on the diagram created and relatively easy to code. CI/CD tooling should also come into play here.
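As an example of that input layer, the Python sketch below reads the parameter document path from the command line with argparse. It takes vCenter credentials from environment variables, which is one way a CI tool's credential injection can hand them over. The flag and variable names (--params, --vcenter, VCENTER_HOST, VCENTER_USER, VCENTER_PASSWORD) are hypothetical.

```python
import argparse
import json
import os
from typing import Mapping, Optional, Sequence


def parse_inputs(argv: Optional[Sequence[str]] = None,
                 environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect user inputs: parameters from a JSON file, credentials from the environment."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Deploy a VM from a content library template")
    parser.add_argument("--params", required=True, help="path to a JSON parameter document")
    parser.add_argument("--vcenter", default=environ.get("VCENTER_HOST"),
                        help="vCenter host name (defaults to $VCENTER_HOST)")
    args = parser.parse_args(argv)

    with open(args.params, encoding="utf-8") as handle:
        params = json.load(handle)

    user, password = environ.get("VCENTER_USER"), environ.get("VCENTER_PASSWORD")
    if not (args.vcenter and user and password):
        # Keep secrets out of argv; fail with a clear message instead.
        raise SystemExit("set --vcenter (or VCENTER_HOST), VCENTER_USER and VCENTER_PASSWORD")
    return {"vcenter": args.vcenter, "user": user, "password": password, "params": params}
```

Calling parse_inputs() with no arguments reads sys.argv and os.environ. Passing both explicitly, as a test would, keeps the function easy to exercise without a real shell or CI job.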

Develop the action code. This should be a single-digit percentage of the overall effort!

The Python code to deploy from a content library is here.

Maintaining the code

Most of the effort to automate exists in maintaining code, not creating it. It takes considerably more skill to improve on yesterday's effort and to perform routine work like:

  • Steering/Directional choices for the code 
  • Documentation
  • Logging any reported issues
  • Prioritizing issues
  • Allocating resources to develop solutions to functional issues
  • Allocating resources to any problems that may occur operationally

Making responsibilities clear is a critical part of any automation solution - a common approach is to establish a RACI chart or to formally elect a maintainer to decide product direction more centrally in community projects.

In short, if you're in the maintenance phase, congratulations, you made it! 99% of the work is ahead of you!

TL;DR, Let's use the code!

Now that we covered the development practice, let's cover how to use this code with Jenkins!

We create a new "Freestyle Project", and set input parameters:

Configure Git Hooks (but not "Poll SCM", because this work is to be performed on-demand):

Configure Credential Injection (vCenter Credentials):

Finally, command execution:

This will create a new button - Build with Parameters:

Executing with a properly formatted, short JSON file then results in a new VM build. The tool also supports simple CLI invocation and will suggest resources for each field (except name, which is up to the user):

Fetching vSphere Details...
{
    "id": {
        "description": "The Content Library object to clone",
        "suggestions": {
            "05886bd8-7389-49e1-a53f-29353cd70186": {
                "name": "debian11-base",
                "guest_OS": "DEBIAN_11_64"
            },
            "ac6d7f50-0f50-4a1d-b9b5-0e6326f95bb2": {
                "name": "suse15.4-base",
                "guest_OS": "SLES_15_64"
            },
            "a26bb731-3ef6-4045-bf7f-3c0514fa343f": {
                "name": "ubuntu-22.04-base",
                "guest_OS": "UBUNTU_64"
            }
        }
    },
    "name": "Example",
    "datastore": {
        "description": "The vSphere datastore to put virtual disks on",
        "suggestions": {
            "datastore-1023": {
                "name": "datastore1"
            }
        }
    },
    "folder": {
        "description": "The vCenter folder to place the VM into",
        "suggestions": {
            "group-v1002": "vm",
            "group-v1095": "Infrastructure",
            "group-v1096": "Services",
            "group-v1097": "NSX",
            "group-v1098": "Management Plane",
            "group-v1099": "Edges",
            "group-v1100": "Experiments",
            "group-v1101": "Templates",
            "group-v1102": "Monitoring",
            "group-v1103": "Routing",
            "group-v1104": "CI-CD Pipeline",
            "group-v2001": "vCLS",
            "group-v26010": "Security"
        }
    },
    "cluster": {
        "description": "The vSphere compute cluster to put the VM into",
        "suggestions": {
            "domain-c1008": "cluster01"
        }
    }
}
Operation Complete!

As a footnote, I've built this "suggestions" tool into multiple projects by now - my major issue with most DevOps toolsets is that they don't effectively document ingress/egress schemas, leaving users of their code "stuck" if they didn't write it. Even when using a tool like Ansible, I would probably create accompanying tools to "discover" or "suggest" inputs to help the user along.
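The idea behind the suggestions output is simple enough to sketch: pair every input field with a description and the candidate values discovered from vCenter. This hypothetical Python version assumes the library items, datastores, folders, and clusters have already been fetched into plain lists of dicts. It reproduces the shape of the output shown above but is not the original tool's code.

```python
import json


def build_suggestions(library_items: list, datastores: list, folders: list,
                      clusters: list, name: str = "") -> dict:
    """Describe each input field and offer discovered candidates (name is left to the user)."""
    return {
        "id": {
            "description": "The Content Library object to clone",
            "suggestions": {item["id"]: {"name": item["name"], "guest_OS": item["guest_OS"]}
                            for item in library_items},
        },
        "name": name,
        "datastore": {
            "description": "The vSphere datastore to put virtual disks on",
            "suggestions": {ds["id"]: {"name": ds["name"]} for ds in datastores},
        },
        "folder": {
            "description": "The vCenter folder to place the VM into",
            "suggestions": {folder["id"]: folder["name"] for folder in folders},
        },
        "cluster": {
            "description": "The vSphere compute cluster to put the VM into",
            "suggestions": {cluster["id"]: cluster["name"] for cluster in clusters},
        },
    }


doc = build_suggestions(
    [{"id": "05886bd8-7389-49e1-a53f-29353cd70186", "name": "debian11-base",
      "guest_OS": "DEBIAN_11_64"}],
    [{"id": "datastore-1023", "name": "datastore1"}],
    [{"id": "group-v1100", "name": "Experiments"}],
    [{"id": "domain-c1008", "name": "cluster01"}],
    name="Example",
)
print(json.dumps(doc, indent=4))
```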
