Wednesday, August 25, 2021

VMworld 2021 is right around the corner! Here are my top 10 sessions!

VMworld 2021 is online this year

I'll really miss some of the sessions and exploration we've had in past years in person, but I think VMware made the right call this year. We can expect to see a fundamental shift with online conventions - and this will need some unique strategy compared to previous years.

The Basics

I attended my first VMworld in 2016, and to describe it as information overload would be an understatement. It's only been a few years, but here's what I have to say to new VMworld attendees:
  • Give yourself time between sessions: it's too easy to switch between video streams at home - but it's a trap. Your brain needs time to process new information, and normally stretching your legs and walking around would help with that. After a particularly heavy session, get away from your keyboard and give yourself time to think. It's like college: if you take too many classes, you will perform less effectively than if you had capped your class time.
  • Talk to people: The Orbital Jigsaw Discord server can serve as a water cooler of sorts here - remember that you always can learn more with others than on your own.
  • Be kind to your mind: I'm mentioning it twice, and I don't care. Trying to absorb everything will be stressful, and the single most important thing you can do is take care of yourself. Don't skip meals, don't skip time with the kids, don't skip out on rest.
VMware has provided a lot more content in the breakout sessions this year, and it's because we can't do stuff like the fun run. Here are my sessions of interest:

Fundamentally Important Sessions

At its core - I'd like to break out sessions that would be of critical importance, aforementioned biases notwithstanding:
  • Enhance Data Center Network Design with NSX and VMware Cloud Foundation [NET1789]
    • Nimish Desai is an extremely colorful presenter. In my first VMworld, I was actually wandering around the halls and heard yelling from one of the auditoriums, and decided to wander in and take a look. It turns out he was asking some questions about OSPF and I answered one right and ended up with some trucker cap he'd glued a marketing-noncompliant NSX logo onto and didn't leave the auditorium for about 3 hours. This was on NSX-V Fundamentals - for a director he is an extremely capable teacher and presenter.
    • I consider this session (it has gone by other names in past years, but it's basically NSX fundamentals) a foundation for just about everything VMware and SDN.
  • NSX-T Design, Performance and Sizing for Stateful Services [NET1212]
    • This one has to be good. My other favorite presenter on NSX has always been Samuel Kommu. He specializes in flaying whatever SDN platform crosses his desk within an inch of its life, and then squeezing a little bit more than that out of it. He was the first engineer to get NSX-V past 40 Gigabits/s. Nicolas Michel is a capable engineer on the newer NSX-T team (they appear to be based out of EMEA) and is a total Linux and Open Source guy too. NSX-T is based almost completely on open source software, and his team is working to recreate the old NSX functionality with F/OSS.
    • In this case, we're visiting how to build out the stateful back-end (Tier-1) services, essentially the bits that make a network "smart". NSX-T has some highly unique next-gen scaling capabilities for these service types. Packet inspection devices are the bottleneck in nearly all modern enterprise networks, and this session should present a fresh perspective on solving this problem!
  • Extreme Performance Series: vSphere Advanced Performance Boot Camp [MCL2033]
    • This class every year is basically required for anyone interested in their VCAP (DCV) as it handles the most important subject for virtualization - getting the absolute most value out of your equipment. It is a Tech+ pass session but probably justifies it by itself. If you're having trouble putting together the in-book subjects while studying for VCAP/VCP, this is where you want to go.

Interesting Sessions

  • Apply SRE’s Golden Signals for Monitoring Toward Network Operations [NET1088]
    • The title more or less says it all, this would be step 4 after a round-trip of fundamentals. The first thing I try to do when encountering a new technology is to make it reliable, and this is a logical progression.
  • (Tech+)Future-Proof Your Network with IPv6, Platform Security and Compliance [EDG1024]
    • If you haven't guessed, IPv6 is coming and you can't avoid it. With that out of the way, VMware's Networking and Security Business Unit (NSBU) has covered significant ground getting the rest of the company IPv6-ready. This is a Tech+ session primarily focused on SD-WAN, so if you're interested in how an enterprise can become IPv6-ready, this is where to start.
  • (Tech+)NSX-T Reference Designs for vSphere with Tanzu [NET1426]
    • NSX-T's hidden superpower is actually container networking. It's designed from the ground up with two Container Plugins - Antrea and NCP - that support container networking without complex Flannel/IPTables configurations simply to get stuff to work.
  • Getting Started with NSX Infrastructure as Code [NET2272]
    • I'll be blunt here, I've made several series of blog posts on this already, but NSX-T is a complicated animal, and it's important to build it right. In my opinion, the best way to do this is to prototype your deployment repeatedly until it's as close to perfect as you can get it.
    • There are two major paths to automate NSX-T here:
      • The platform: Ansible/Terraform helps us here to maintain configured state. In a previous life I crushed concrete cylinders to see if they're strong enough, this is like that but digital (and safer!)
      • The services: vRealize Automation / vCloud Director provides services on top of the base networking we provide, it is important to understand how people consume networks we build.
  • NSX-T and Infrastructure as Code [CODE2741]
    • Yes, this will take more than one session to absorb. VMware understands that - Nicolas Michel is front-ending this one too, he's working on a YouTube channel called vPackets to capture some of this automation knowledge.

Telecom Sessions

I'm breaking this out because "there are dozens of us!" 

Apparently, VMware thinks there are more of us than that - and is diving head-first into the breach. VMware has developed a robust hosting and automation suite of services to help accelerate telecommunication delivery.

I'm hoping this will possibly transform smaller ISPs into more of an Edge model, where the telecom provides the pipe and "stuff" on top of it as an additional revenue source. It'd be pretty exciting - even if you don't have a 4-post rack and some cooling, you could loan some cycles from a colocation space as needed. Despite most complaints, telecommunications companies have a few strengths here, namely:
  • Drive. Telecom engineers do what they do to connect people to information - regardless of how one will often complain about how their internet sucks, these guys are out there working nonstop to help make things just that little bit better.
  • Connectivity. While this ought to be a given, do you as a customer want to deal with the stress of relocating your server farm while down-sizing offices due to COVID?
  • Connectivity (people). Believe it or not, running cable in every major city will build up quite the Rolodex. If anyone can find a viable physical space to fit your equipment/services, it'd be the telecom company.
Before I go too far, there is a ton of sensationalism on "The Edge!(tm)" All this really means is what I've explained here - your telecommunications provider would be empowered to deploy distributed compute stacks regionally to fit your (low latency? more like cost-effective!) workload needs. This is especially important in Alaska, where reaching out to the data center the "next town over" is a microwave relay system reaching hundreds of miles.

There's also quite a bit of misinformation on 5G, which fits into my top priority session in this category:

  • A Tour of the Heart of the 5G Network with Nokia and VMware [EDG1935]
    • You probably haven't heard of Nokia Networks. It doesn't matter: attend this session if you're interested in 5G. The architecture changes from 4G to 5G are myriad, the organization maintaining the standards (3GPP) made dramatic improvements in terms of technical design, and this will give you a bird's eye view.
    • Nokia Networks is a name to track in the future, VMware's NSX-T platform and Nokia's new SR-Linux platform are going to take the data center by storm. Nokia's recent interest in Open Source has culminated in a telecommunication grade workload based on Linux - and they seem to have thought of everything, model-based configuration, automated testing in a container pipeline, the sky is the limit!
  • Demystifying Performance: Meeting Stringent Latency Requirements for RAN [EDG2872]
    • I still groan every time someone states that it's "impossible to virtualize x because of latency!" We wouldn't have a connected Alaska today if we felt that wasn't a good enough reason to try. These guys succeeded.
I look forward to seeing you all there! I'll try my best to be reachable via Twitter @engyak907 and in the Orbital Jigsaw server when I can.

Sunday, August 22, 2021

Managing DNS Servers with Ansible and Jenkins (Unbound, BIND)

DNS is a vital component of all computer networks. Also known as the "Internet Yellow Pages," this service is consumed by every household.

DNS services are typically deployed in several patterns to support users and systems:

  • DNS Forwarder: This deployment method is the most common. Everybody needs name resolution - caching and forwarding DNS results can save you bandwidth and improve localized performance. Most appliances can do this out of the box, and if they don't, try it out! It's really easy and will help you learn how DNS works.
    • Use case: You don't have your own domain and use computers.
  • Managed Public DNS: A significant majority of public domains are managed this way. You pay a third-party provider to manage the authoritative registration of public DNS records.
    • Use case: You have a business and own a domain, but don't have any internal resources that you need to resolve.
    • Use case: You have a business and own a domain, but don't want to manage publicly resolvable nameservers
  • Private/Internal Nameserver: This deployment method is typically enterprise-specific, but is also required for home labs and all manner of weird experiments. Since it's not on the internet, we can violate any and all manner of Internet conventions.
    • The first component here is a recursive nameserver, because even if you run your own authoritative server, you still need a server for recursive lookups.
    • Authoritative zones: For any given domain, keep a zone file to resolve against. This will include name-to-record (forward) objects and record-to-name (reverse) objects in separate files.
    • A method to change everything above, this has a high benefit:effort ratio.
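The forward/reverse pairing described above can be derived mechanically. Here is a minimal sketch, assuming Python's standard `ipaddress` module; the helper name `ptr_record` is hypothetical, and the hostnames are borrowed from the example zone later in this post:

```python
import ipaddress


def ptr_record(hostname: str, address: str) -> str:
    """Build the reverse (PTR) record that pairs with a forward A/AAAA record."""
    ip = ipaddress.ip_address(address)
    # reverse_pointer yields e.g. "1.1.1.10.in-addr.arpa" for 10.1.1.1
    return f"{ip.reverse_pointer}. IN PTR {hostname}."


# Forward data (name -> address); the reverse records belong in a separate zone file.
forward = {
    "johnnyfive.engyak.net": "10.1.1.1",
    "duncanidaho.engyak.net": "10.2.2.2",
}
for name, addr in forward.items():
    print(ptr_record(name, addr))
```

The same call handles IPv6 addresses, producing the nibble-format `ip6.arpa` name.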
For this post, we'll build the structure to have an internal nameserver managed completely from source control. This is surprisingly easy to get started - performing this work with abstraction is a welcome convenience, but not initially necessary as zone files are typically very simple and the application (Bind 9 or Unbound) is only one service.

To perform this, we'll follow this procedure:

  • Install the service - in this case, we'll use CentOS for Bind9 (my old setup), and Debian 11 for Unbound (because Debian 11 is new).
  • Extract the configuration file, and then export it into source control.
  • Create zone files, and then export them into source control.
  • Automate delivery from source control to what we'll now call the "DNS Worker Node"

Bind9

dnf install bind
find / -name 'named.conf'
cat /etc/named.conf
Example named configuration file (credit where it's due: the vast majority of this configuration has been provided by CentOS and Bind9 - I set the forwarders, allow-query, listen-on, and zone directives):
options {
        listen-on { any; };
        listen-on-v6 { any; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        secroots-file   "/var/named/data/named.secroots";
        recursing-file  "/var/named/data/named.recursing";
        allow-query { 10.0.0.0/8; 127.0.0.1; 2000::/3; };
        forwarders { 1.1.1.1; 9.9.9.9; };
        /*
         - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
         - If you are building a RECURSIVE (caching) DNS server, you need to enable
           recursion.
         - If your recursive DNS server has a public IP address, you MUST enable access
           control to limit queries to your legitimate users. Failing to do so will
           cause your server to become part of large scale DNS amplification
           attacks. Implementing BCP38 within your network would greatly
           reduce such attack surface
        */
        recursion yes;

        dnssec-enable yes;
        dnssec-validation yes;

        managed-keys-directory "/var/named/dynamic";

        pid-file "/run/named/named.pid";
        session-keyfile "/run/named/session.key";

        /* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
        include "/etc/crypto-policies/back-ends/bind.config";
        
};

zone "engyak.net" in {
        allow-transfer { any; };
        file "/etc/named/engyak.net.zone";
        type master;
};
Then, let's build a zone file in source control. Please note that there are additional conventions that should be followed when creating new DNS zone records, this is just an example file that will run!
$TTL 2d
@               SOA             ns.engyak.net. hostmaster.engyak.net.  (
                                1      ; serial
                                3600            ; refresh
                                600             ; retry
                                608400          ; expiry
                                3600 ) ;
;
;
engyak.net.     IN NS           ns.engyak.net.
ns              IN A            10.0.0.1
johnnyfive      IN A            10.1.1.1
duncanidaho     IN A            10.2.2.2
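One convention worth calling out for a zone file like the one above is the SOA serial: a common practice is a date-based `YYYYMMDDnn` value that only ever increases, so anything transferring the zone can tell it has changed. This is a minimal sketch of that convention, not part of the original setup; `next_serial` is a hypothetical helper name:

```python
from datetime import date
from typing import Optional


def next_serial(current: int, today: Optional[date] = None) -> int:
    """Return the next YYYYMMDDnn serial, always greater than `current`."""
    today = today or date.today()
    base = int(today.strftime("%Y%m%d")) * 100
    # First change of the day starts at nn=00; later changes simply increment.
    # More than 100 changes in one day would spill into the next day's range.
    return base if current < base else current + 1
```

Starting from the example's serial of `1`, the first call on 2021-08-22 would return `2021082200`, and a second change the same day would return `2021082201`.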
Copy the named.conf contents into a new source code repository or your existing one, preferably in an organized fashion. Ansible playbook execution is very straightforward. I'd recommend building this in source control as well - see above note about potential process improvements
---
- hosts: ns.engyak.net
  tasks:
    - name: "Update DNS Zones!"
      copy:
        src: zonefiles/engyak.net
        dest: /etc/named/engyak.net.zone
        mode: "0644"
    - name: "Update DNS Config!"
      copy:
        src: conf.d/ns.engyak.net/named.conf
        dest: /etc/named.conf
        mode: "0640"
    - name: "Restart Named!"
      service:
        name: "named"
        state: "restarted"

Any time you run this playbook, it will copy a fresh configuration and zone file to the node, then restart Bind9.

As a cherry on top, let's make this process smart - if we want to automatically deploy changes to DNS from source control, we need a CI tool like Jenkins. Start off by creating a new Freestyle project to "Watch SCM" - yes, this isn't a real repository.




That's it - add entries, live long, and prosper! Since the Ansible playbook and supporting files are fetched via source control, the only setup required on a DNS worker node is to establish a relationship between it and the CI tool, ex. SSH authentication.

Unbound

Unbound is a newer DNS server project and has quite a few interesting properties. I've been using BIND for well over a decade - and Unbound aims to change a few things, notably:
Oddly enough, there is no features list for this software package, but pretty much everything else is impressively documented. Let's start the installation:
apt install unbound
cat /usr/share/doc/unbound/examples/unbound.conf

Unbound can use the same zonefile format as BIND, so we only need to create a new config file to migrate things over. Note: This is not a production-ready configuration, it's just enough to get me started. 

As I learn more about Unbound, I'll be using source control to implement changes / implement a rollback - an important benefit when making lots of mistakes!


# The server clause sets the main parameters.
server:
        verbosity: 1
        num-threads: 2
        interface: 0.0.0.0
        interface: ::0
        port: 53
        prefer-ip4: no
        edns-buffer-size: 1232

        # Maximum UDP response size (not applied to TCP response).
        # Suggested values are 512 to 4096. Default is 4096. 65536 disables it.
        max-udp-size: 4096
        msg-buffer-size: 65552
        udp-connect: yes
        unknown-server-time-limit: 376

        do-ip4: yes
        do-ip6: yes
        do-udp: yes
        do-tcp: yes

        # control which clients are allowed to make (recursive) queries
        # to this server. Specify classless netblocks with /size and action.
        # By default everything is refused, except for localhost.
        access-control: 10.0.0.0/8 allow
        access-control: 127.0.0.0/8 allow

        private-domain: "engyak.net"
        caps-exempt: "engyak.net"
        domain-insecure: "engyak.net"

        private-address: 10.0.0.0/8

        # cipher setting for TLSv1.2
        tls-ciphers: "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256"
        # cipher setting for TLSv1.3
        tls-ciphersuites: "TLS_AES_128_GCM_SHA256:TLS_AES_128_CCM_8_SHA256:TLS_AES_128_CCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256"

# Python config section. To enable:
# o use --with-pythonmodule to configure before compiling.
# o list python in the module-config string (above) to enable.
#   It can be at the start, it gets validated results, or just before
#   the iterator and process before DNSSEC validation.
# o and give a python-script to run.
python:
        # Script file to load
        # python-script: "/etc/unbound/ubmodule-tst.py"

# Dynamic library config section. To enable:
# o use --with-dynlibmodule to configure before compiling.
# o list dynlib in the module-config string (above) to enable.
#   It can be placed anywhere, the dynlib module is only a very thin wrapper
#   to load modules dynamically.
# o and give a dynlib-file to run. If more than one dynlib entry is listed in
#   the module-config then you need one dynlib-file per instance.
dynlib:
        # Script file to load
        # dynlib-file: "/etc/unbound/dynlib.so"

# Remote control config section.
remote-control:
        # Enable remote control with unbound-control(8) here.
        # set up the keys and certificates with unbound-control-setup.
        control-enable: no

# Authority zones
# The data for these zones is kept locally, from a file or downloaded.
# The data can be served to downstream clients, or used instead of the
# upstream (which saves a lookup to the upstream).  The first example
# has a copy of the root for local usage.  The second serves example.org
# authoritatively.  zonefile: reads from file (and writes to it if you also
# download it), primary: fetches with AXFR and IXFR, or url to zonefile.
# With allow-notify: you can give additional (apart from primaries) sources of
# notifies.
forward-zone:
      name: "."
      forward-addr: 1.1.1.1
      forward-addr: 9.9.9.9
auth-zone:
      name: "engyak.net"
      for-downstream: yes
      for-upstream: yes
      zonefile: "engyak.net.zone"

To automate file delivery here, we'll use a (similar) playbook for Unbound. The Jenkins configuration will not need to be modified, because the playbook will automatically be re-executed.

---
- hosts: ns.engyak.net
  tasks:
    - name: "Update DNS Zones!"
      copy:
        src: zonefiles/engyak.net
        dest: /etc/unbound/engyak.net.zone
        mode: "0644"
    - name: "Update DNS Config!"
      copy:
        src: conf.d/ns.engyak.net/unbound.conf
        dest: /etc/unbound/unbound.conf
        mode: "0640"
    - name: "Restart Unbound!"
      service:
        name: "unbound"
        state: "restarted"

Some Thoughts

This method of building DNS records from a source of truth does replace the master-slave (sorry guys, BIND's terms are not my own!) relationship older name servers will typically use. Personally, I like this method of propagation.

The biggest upside here is that a DNS worker node being unavailable does not prevent an engineer from adding/modifying records as long as recursive name servers support multiple resolvers.

It is eventually consistent, as the orchestrator will update every worker node for you. This may be slower or faster, depending on TTL.

The Ansible playbook I used here will kill your DNS node if you push it into an invalid configuration, so this is probably not production-worthy without additional work.
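One piece of that additional work could be a pre-flight check: validate the new configuration and only restart the service when the checker exits cleanly. The sketch below assumes the stock checker binaries (`named-checkconf`, `named-checkzone`, `unbound-checkconf`) are installed on the node; `apply_if_valid` and the `restart_named` callback in the comments are hypothetical names, not part of the playbooks above:

```python
import subprocess
from typing import Callable, Sequence


def apply_if_valid(check_cmd: Sequence[str], apply: Callable[[], None]) -> bool:
    """Run `check_cmd`; call `apply` only when the checker exits 0."""
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Leave the running service alone and surface the checker's complaint.
        print(f"validation failed, not restarting:\n{result.stdout}{result.stderr}")
        return False
    apply()
    return True


# Example wiring (restart_named is a hypothetical callback you would supply):
# apply_if_valid(["named-checkconf", "/etc/named.conf"], restart_named)
# apply_if_valid(["named-checkzone", "engyak.net", "/etc/named/engyak.net.zone"], restart_named)
```

If you'd rather stay inside Ansible, the `copy` module's `validate` parameter (for example `validate: named-checkconf %s`) is a similar guard: the file is only put in place when the check passes.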

If you would rather purchase a platform instead of building this capability with F/OSS components, this is basically how Infoblox Grid works.

It'd be really neat to abstract software-specific constructs, which can be done with Python and Jinja2 (or just Ansible and Jinja2!)
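As a rough illustration of that abstraction, here is a sketch that renders a zone file from software-neutral record data. It uses plain Python f-strings instead of Jinja2 purely to stay within the standard library; `render_zone` and the `Record` tuple are hypothetical names, and the SOA timers are copied from the example zone file earlier in this post:

```python
from typing import Iterable, Tuple

Record = Tuple[str, str, str]  # (name, record type, value)


def render_zone(origin: str, serial: int, records: Iterable[Record]) -> str:
    """Render a minimal BIND-style zone file from neutral record data."""
    lines = [
        "$TTL 2d",
        f"@ IN SOA ns.{origin}. hostmaster.{origin}. ( {serial} 3600 600 608400 3600 )",
        f"@ IN NS ns.{origin}.",
    ]
    # One line per record; names are relative to the zone origin.
    lines += [f"{name} IN {rtype} {value}" for name, rtype, value in records]
    return "\n".join(lines) + "\n"


print(render_zone("engyak.net", 1, [("ns", "A", "10.0.0.1"), ("johnnyfive", "A", "10.1.1.1")]))
```

Because Unbound can read the same zone file format, one renderer like this could feed both servers; only the server configuration files would need their own templates.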

VyOS and other Linux builds unable to use `vmxnet3` or "VMware Paravirtual SCSI" adapter on vSphere

Have you seen this selector when building machines on vSphere? This causes some fairly common issues in NOS VMs, as most don't really kn...