Sunday, September 19, 2021

Get an A with VMware Avi / NSX ALB (and keep it that way with SemVer!)

Cryptographic security is an important aspect of hosting any business-critical service.

When hosting a public service secured by TLS, it is important to strike a balance between compatibility (the Availability aspect of CIA) and strong cryptography (the Integrity/Authentication and Confidentiality aspects of CIA). To illustrate, let's look at the CIA model:

In this case, we need to balance backward compatibility with good-quality cryptography - here's a brief and probably soon-to-be-dated overview of what we ought to use and why.


Protocol Versions

This part is fairly easy, as older protocols are worse, right?

TLS 1.3

As a protocol, TLS 1.3 has quite a few great improvements and is fundamentally simpler to manage, with fewer knobs and dials. There is one major concern with TLS 1.3 currently: security tooling in the large enterprise hasn't caught up with the protocol yet, as new ciphers like ChaCha20 don't have hardware-assisted lanes for decryption. Here are some of the new capabilities you'll like:
  • Simplified crypto sets: TLS 1.3 deprecates a ton of less-than-secure crypto. TLS 1.2 supports up to 356 cipher suites (37 of which were newly added in TLS 1.2 itself). This is a mess - TLS 1.3 supports five.
    • Note: The designers of TLS 1.3 achieved this by removing the key exchange method from the cipher suite; forward secrecy parameters must be selected separately.
  • Simplified handshake: TLS 1.3 connections require fewer round-trips, and session resumption features allow a 0-RTT handshake.
  • AEAD Support: AEAD ciphers provide both integrity and confidentiality. AES Galois/Counter Mode (GCM) and ChaCha20-Poly1305 serve this purpose.
  • Forward Secrecy: If a cipher suite doesn't have PFS (I disagree with "perfect") support, it means that someone who captures your network traffic can decrypt it later if the private keys are acquired. PFS support is mandatory in TLS 1.3.
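To make the smaller cipher set concrete, Python's ssl module (3.7+, built against OpenSSL 1.1.1 or newer) can pin a context to TLS 1.3 and enumerate the remaining suites. This is a generic sketch, not Avi-specific:

```python
import ssl

# Build a client context and refuse anything older than TLS 1.3.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# The TLS 1.3 suites use "TLS_" names and are few in number;
# all of them are AEAD (AES-GCM or ChaCha20-Poly1305).
tls13_suites = [c["name"] for c in ctx.get_ciphers()
                if c["name"].startswith("TLS_")]
print(tls13_suites)
```

Note how little there is to configure: protocol version up, and a handful of AEAD suites fall out.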

Here are some of the things you can do to mitigate the risk if you're in a large enterprise that performs decryption:
  • Use a load balancer - since this is about a load balancer, you can protect your customers' traffic in transit by performing SSL/TLS bridging. Set the LB-to-server (serverssl) profile to a high-efficiency cipher suite (TLS 1.2 + AES-CBC) that your decryption tooling can handle, while keeping strong cryptography on the client side.

TLS 1.2

TLS 1.2 is like the Toyota Corolla of TLS: it's run forever, and not everyone maintains it properly.

It can still perform well if properly configured and maintained - we'll go into more detail on how in the next section. The practices outlined here are good for all versions of TLS.

Generally, TLS 1.0 and 1.1 should not be used. Some operating systems (Windows XP, Android 4, and below) were disturbingly slow to adopt TLS 1.2, so if these are part of your customer base, beware.


Cipher Suites

This information is much more likely to be dated. I'll try to keep this short:


  • (AEAD) AES-GCM: This is usually my all-around cipher. It's decently fast and supports partial acceleration with hardware ADCs / CPUs. AES is generally pretty fast, so it's a good balance of performance and confidentiality. I don't personally think it's worth running anything but 256-bit on modern hardware.
  • (AEAD) ChaCha20: Designed by Daniel J. Bernstein and popularized by Google, this newer cipher is still "being proven" but generally trusted by the public, and it's fast despite a lack of hardware acceleration.
  • AES-CBC: This was the "advanced" cipher for confidentiality before AES-GCM. AES (standardized in 2001) motivated users to move off suites like DES and RC4 by being both more performant and stronger. As with AES-GCM, I prefer not to use anything but 256-bit on modern hardware.
  • Everything else: This is the "don't bother" bucket: RC4, DES, 3DES


Hashing

Generally, AEAD provides an advantage here - SHA-3 isn't generally available yet, but SHA-2 variants should be the only thing used. The more bits the better!
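To make the "more bits" point concrete, here's a quick look at SHA-2 digest sizes with Python's hashlib (illustrative only - in practice your TLS stack picks the hash via the negotiated cipher suite):

```python
import hashlib

msg = b"example record"
for algo in ("sha256", "sha384", "sha512"):
    digest = hashlib.new(algo, msg).hexdigest()
    # Each hex character encodes 4 bits of the digest.
    print(algo, len(digest) * 4, "bits")
```

These correspond to the SHA256/SHA384 suffixes you'll see in cipher suite names.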

Forward Secrecy

  • ECDHE (Elliptic Curve Diffie-Hellman Ephemeral): This should be mandatory with TLS 1.2 unless you have customers with old Android phones or Windows XP.
  • TLS 1.3 lets you select multiple PFS algorithms that are EC-based.

Matters of Practice

Before we move into the Avi-specific configuration, I have a recommendation that is true for all platforms:
Cryptography practices change over time - and some of these changes break compatibility. Semantic versioning (SemVer) provides the capability to express three scales of change:
  • Major Changes: The first number in a version. Since the SemVer specification is focused on APIs, I'll be more concrete here: this is what you'd iterate if you're removing cipher suites or negotiation parameters in a way that might break existing clients.
  • Minor Changes: This category would be for tuning and adding support for something new that won't break compatibility. Examples here would be cipher order preference changes or adding new ciphers.
  • Patch Changes: This won't be used much in this case - here's where we'd document a fix to a previous change, like correcting a mistake in cipher order preference.
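The three scales of change above map onto a version string in a straightforward way. A minimal sketch (the function name is my own, not from the SemVer spec):

```python
def bump(version: str, scale: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string.

    major: breaking change (e.g. removing cipher suites)
    minor: compatible addition (e.g. new ciphers, order tuning)
    patch: fix to a previous change (e.g. a cipher-order mistake)
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if scale == "major":
        return f"{major + 1}.0.0"
    if scale == "minor":
        return f"{major}.{minor + 1}.0"
    if scale == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown scale: {scale}")

print(bump("1.4.2", "major"))  # -> 2.0.0
```

Stamping a version like this on each SSL profile makes it obvious to consumers whether an update is safe to take blindly.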

Let's do it!

Let's move into an example leveraging NSX ALB (Avi Vantage). Here, I'll be creating a "first version," but the practices are the same. First, navigate to Templates -> Security -> SSL/TLS Profile:

Note: I really like this about Avi Vantage, even if I'm not using it here. The security scores here are accurate, albeit capped out - VMware is probably doing this to encourage use of AEAD ciphers:
...but, I'm somewhat old-school. I like using Apache-style cipher strings because they can apply to anything, and everything runs TLS eventually. Here are the cipher strings I'm using - the first is for TLS 1.2, the second for TLS 1.3.
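If you want to sanity-check a cipher string of this style before pasting it into a profile, Python's ssl module accepts the same OpenSSL syntax. The string below is a hypothetical illustration (ECDHE-only key exchange, AEAD-only ciphers), not my production string:

```python
import ssl

# Illustrative OpenSSL/Apache-style cipher string: ECDHE key exchange
# only, AEAD ciphers only (AES-GCM and ChaCha20-Poly1305).
CIPHER_STRING = "ECDHE+AESGCM:ECDHE+CHACHA20"

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.set_ciphers(CIPHER_STRING)  # raises ssl.SSLError if the string is invalid

enabled = [c["name"] for c in ctx.get_ciphers()]
print(enabled)
```

An invalid string fails immediately here, which is a faster feedback loop than waiting for a client handshake to break.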

One gripe I have here is that Avi doesn't offer the "What If" analysis that F5's TM-OS does (14+ only). Conversely, applying this profile is much easier. To do this, open the virtual service and navigate to the bottom right:

That's it! Later on, we'll provide examples of coverage reporting for these profiles. In a production-like deployment, these services should be managed with release strategies, with versioning applied as outlined above.

Friday, September 17, 2021

Static IPv4/IPv6 Addresses - Debian 11

Here's how to set both static IPv4 and IPv6 addressing on Debian 11. The inet6 stanza is the new portion compared to a typical IPv4-only configuration.

First, edit /etc/network/interfaces

auto lo
iface lo inet loopback

# The primary network interface
auto ens192
allow-hotplug ens192
iface ens192 inet static
    address {{ ipv4.address }}
    gateway {{ ipv4.gateway }}

iface ens192 inet6 static
    address {{ ipv6.address }}
    gateway {{ ipv6.gateway }}

Then, restart your networking stack:
systemctl restart networking
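Before restarting networking (especially over SSH), it's worth sanity-checking that each address and its gateway actually share a subnet. A small sketch using the stdlib ipaddress module, with documentation prefixes standing in for real values:

```python
import ipaddress

def sanity_check(address: str, gateway: str) -> bool:
    """Return True if the gateway falls inside the address's subnet.

    `address` must include the prefix length, e.g. "192.0.2.10/24".
    """
    iface = ipaddress.ip_interface(address)
    gw = ipaddress.ip_address(gateway)
    return gw in iface.network

# Documentation prefixes (RFC 5737 / RFC 3849), not real addresses:
print(sanity_check("192.0.2.10/24", "192.0.2.1"))    # IPv4 pair
print(sanity_check("2001:db8::10/64", "2001:db8::1"))  # IPv6 pair
```

A False here usually means a typo in the prefix length or gateway, which would otherwise leave the box unreachable after the restart.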

Friday, September 10, 2021

VMware NSX ALB (Avi Networks) and NSX-T Integration, Installation

Note: I created a common baseline for pre-requisites in this previous post. We'll be following VMware's Avi + NSX-T Design guide.

This will be a complete re-install. Avi Vantage appears to develop some tight-coupling issues when using the same vCenter for both Layer 2 and NSX-T deployments - not an issue that most people will typically have. Let's start with the OVA deployment:

Initial setup here will be very different compared to a typical vCenter standalone or read-only deployment, and the setup wizard should be followed only minimally:

With a more "standard" deployment methodology, the Avi Service Engines will run on their own Tier-1 router, leveraging Source-NAT (a misnomer, since it's a TCP proxy) for "one-arm load balancing":

To perform this, we'll need to add two segments to the ALB Tier-1: one for management and one for vIPs. I have created the following NSX-T segments, with DHCP running on the management segment:
Note: I used underscores in this segment name; in my own testing, both "." and "/" are illegal characters. Avi's NSX-T Cloud Connector will report "No Transport Nodes Found" if it cannot match the segment name due to these characters.
Note: If you configure an NSX-T cloud and discover this issue, you will need to delete and re-add the cloud after fixing the names!
Note: IPv6 is being used, but I will not share my globally routable prefixes.
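To make the naming rule from the notes above concrete, here's a trivial pre-flight check. The illegal-character set reflects my own testing, not documented Avi behavior, and the segment names are made up:

```python
# Characters that broke segment discovery in my testing.
ILLEGAL_CHARS = {".", "/"}

def segment_name_ok(name: str) -> bool:
    """Return True if an NSX-T segment name avoids the problem characters."""
    return not any(ch in name for ch in ILLEGAL_CHARS)

print(segment_name_ok("ALB_Management_Segment"))  # -> True
print(segment_name_ok("ALB.VIP/Segment"))         # -> False
```

Checking names up front beats deleting and re-adding the cloud after the connector fails.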

First off, let's create NSX-T Manager and vCenter Credentials:
There is one thing that needs to be created on vCenter as well - a content library. Just create a blank one and label it accordingly, then proceed with the following steps:
Click Save, and get ready to wait. The Avi controller has automated quite a few steps, and it will take a while to run. If you want to track any issues in NSX ALB, navigate to Operations -> Events -> Show Internal:
Once the NSX Cloud is reporting as "Complete" under Infrastructure -> Dashboard, we need to specify some additional data to ensure that the service engines will deploy. To do this, we navigate to Infrastructure -> Cloud Resources -> Service Engine Groups, and select the Cloud:
Then let's build a Service Engine Group. This will be the compute resource attached to our vIPs. Here I configured a naming convention and a compute target - and it can automatically drop SEs into a specific folder.
The next step here is to configure the built-in IPAM. Let's add an IP range under Infrastructure -> Cloud Resources -> Networks by editing the appropriate network ID. Note that you will need to select the NSX-T cloud to see the correct network:
Those of you who have been LTM admins will appreciate this. Avi SEs also perform "Auto Last Hop," so you can reach a vIP without a default route, but monitors (health checks) will fail without one. The spot to configure custom routes is under Infrastructure -> Cloud Resources -> Routing:

Finally, let's verify that the NSX-T Cloud is fully configured. An interesting thing I saw here is that Avi 21 shows an unconfigured or "In Progress" cloud as green now, so we'll have to mouse over the cloud status to check in on it. 
Everything is now configured (at least in terms of infrastructure), but Avi will not deploy Service Engines until there's something to do! So let's give it something:
Let's define a pool (back-end server resources):

Let's set an HTTP-to-HTTPS redirect as well:

Finally, let's make sure that the correct SE group is selected:
And that's it! You're up and running with Avi Vantage 21! After a few minutes, you should see deployed service engines:
The service I configured is also now up. In this case, I'm using Hyperglass, and I can leverage the load-balanced vIP to check what the route advertisement from Avi looks like. As you can see, it's advertising a multipath BGP host route:

Friday, September 3, 2021

vCenter - File system `/storage/log` is low on storage space

After a recent VCSA reboot, I was seeing the infamous `no healthy upstream` error from vCenter.

The first place to check for issues like this is VMware's Virtual Appliance Management Interface (VAMI), located by default via HTTPS on port 5480. An administrator can use the appliance root password for this particular interface.

When reviewing this issue with the VAMI, I saw the following error:

Now, VCSA by design automatically rotates most logs available on the appliance using the open-source tool logrotate, but nothing in this directory appears to be managed:

root@vcenter [ / ]# grep \/storage\/log /etc/logrotate.d/*

I'd say this particular log partition is going to need some manual cleanup every now and then. To open up the CLI, SSH into vCenter and execute the following command:
Command> shell
Shell access is granted to root

First, let's get an idea of how full the disks are:
Note: The -m switch converts units into Megabytes
root@vcenter [ ~ ]# df -m
Filesystem 1M-blocks Used Available Use% Mounted on
devtmpfs 5982 0 5982 0% /dev
tmpfs 5993 1 5992 1% /dev/shm
tmpfs 5993 2 5992 1% /run
tmpfs 5993 0 5993 0% /sys/fs/cgroup
/dev/sda3 46988 7199 37374 17% /
tmpfs 5993 5 5988 1% /tmp
/dev/mapper/dblog_vg-dblog 15047 185 14080 2% /storage/dblog
/dev/mapper/vtsdb_vg-vtsdb 10008 68 9412 1% /storage/vtsdb
/dev/mapper/vtsdblog_vg-vtsdblog 4968 36 4661 1% /storage/vtsdblog
/dev/sda2 120 30 82 27% /boot
/dev/mapper/log_vg-log 10008 9475 6 100% /storage/log
/dev/mapper/core_vg-core 25063 45 23723 1% /storage/core
/dev/mapper/db_vg-db 10008 507 8974 6% /storage/db
/dev/mapper/updatemgr_vg-updatemgr 100273 1953 93185 3% /storage/updatemgr
/dev/mapper/netdump_vg-netdump 985 3 915 1% /storage/netdump
/dev/mapper/lifecycle_vg-lifecycle 100273 3364 91775 4% /storage/lifecycle
/dev/mapper/autodeploy_vg-autodeploy 10008 37 9444 1% /storage/autodeploy
/dev/mapper/imagebuilder_vg-imagebuilder 10008 37 9444 1% /storage/imagebuilder
/dev/mapper/seat_vg-seat 10008 1185 8295 13% /storage/seat
/dev/mapper/archive_vg-archive 50133 16373 31185 35% /storage/archive

The log partition is definitely full. To take an inventory of disk usage, we'll use the du utility with the s (summarize) and m (megabytes) switches, then pass the output to sort with the n (numeric) and r (reverse) switches to list the largest directories first.
root@vcenter [ / ]# du -sm /storage/log/vmware/* | sort -n -r
2578 /storage/log/vmware/eam
2286 /storage/log/vmware/lookupsvc
785 /storage/log/vmware/sso
781 /storage/log/vmware/vsphere-ui
530 /storage/log/vmware/vmware-updatemgr

Examining these folders further, quite a few of these logs are old and never rotated. VMware provides guidance on what is and isn't safe to delete. Generally, Linux applications have issues with files being deleted out from under them, but already-rotated logs are no longer held open and can be safely removed. If this is a production system, I'd recommend calling VMware GSS instead of taking it upon yourself. The above command (du -sm * | sort -nr) can be used in any working directory to see what is filling up the logs the most. Here are a few examples of what I deleted to make room:
rm -rf /storage/log/vmware/eam/web/localhost-2020-*
rm -rf /storage/log/vmware/eam/web/localhost_access.2020*
rm -rf /storage/log/vmware/eam/web/catalina-2020*

From here, I like to verify that space is cleared:
root@vcenter [ / ]# df -m | grep \/storage\/log
/dev/mapper/log_vg-log 10008 5793 3688 62% /storage/log

Catalina and Tomcat are names for the same thing. This software package proxies inbound HTTP requests to specific Java applications, allowing developers to build code without having to construct a soup-to-nuts HTTP server. Other similar (but more recent) projects include Python's Flask.

With HTTP proxies and servers, it is useful to keep comprehensive records indicating "who did what", both for security reasons ("whodunit") and for debugging. As a result, Tomcat is a serious log hog wherever it exists, and it almost never cleans up old logs. This is why I evaluated the change as relatively safe.

If this was not an appliance, I would have added a logrotate spec to automatically delete old files from this directory, but it is not recommended to alter VCSA in this way.
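For reference, on a system where you do control log rotation, a spec along these lines (the path and retention values are hypothetical, matching the eam/web files deleted above) would handle this cleanup automatically. Again: don't add this to a VCSA.

```text
/storage/log/vmware/eam/web/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```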
