Sunday, December 29, 2019

Securing Dual-Stack (IPv4,IPv6) Endpoints with NSX-T

I have mentioned in a previous blog post that I'm not using any ACLs on my tunnel broker VM.

This is usually pretty bad practice, but again, we can apply those protections from outside the VM - I'm using this setup to prove out how NSX-T can provide value in exactly this situation.

Solution Overview

VyOS is a fantastic platform with a rich feature set that any network engineer can put to good use. There's a lot of good stuff - here I'm using it as a tunnel broker, but it also brings these other features:


  • Configuration versioning: Any network platform with in-built configuration versioning (and its cousin, the wonderful "commit review" capability) gets a favorable vote in my book
  • API/CLI: The two have feature parity, and it's source-control friendly, as I have already shown
  • IPv6: You do not need an IPv4 management plane for this platform to work
  • All routing protocols except IS-IS
  • All VPN functionality except VPNv4 (although EdgeOS, Ubiquiti's fork, has that, so it shouldn't take long). This includes WireGuard, OpenVPN, and SIT, as I used in this previous example
  • Full IPv6 support, including DHCPv6, RA, SLAAC, OSPFv3, MP-BGP, etc. The only thing missing is 6to4 for completely native IPv6 deployments
It'd be fair to say that VyOS is a fantastically capable router which, like a Cisco ISR or any other traditional router, does have some downsides.

What's Missing - or What Could Be Easier

Just as a caveat, I do think we'll see this a lot with virtualized routing and switching. 

VyOS has always had a bit of a problem with firewalling. I've been using it since it was simply Vyatta, prior to Brocade's acquisition, and the primary focus of the platform has always been high-quality routing and switching. Functions like NAT and firewalling are disabled by default and have an extremely obtuse, Layer-4 centric interface for creating new rules. This gets messy pretty quickly, as the rules themselves consume significant configuration space and have to be carefully stacked to apply correctly. This interface is manageable but becomes difficult at scale.

Of course, if it were my entire job to manage firewall policies, I'd automate baseline generation and change modifications - the platform is pretty friendly for that. That automation isn't necessarily maintainable if it isn't placed somewhere easily discoverable by other engineers, and it definitely doesn't resemble the "single pane of glass" I'd rather have when running a network.
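As a rough sketch of what that automation could look like: VyOS ships an HTTP API with CLI feature parity, so a thin Python wrapper can generate a baseline ruleset instead of hand-typing it. Everything below is illustrative - the router hostname, API key, ruleset name, and rules are placeholders, and the /configure payload format matches the VyOS 1.2-era API, so check your version's documentation:

import json
import requests

VYOS_URL = "https://tunnel-rtr.lab.example/configure"   # hypothetical router
API_KEY = "REPLACE_ME"                                   # set service https api keys ...

def vyos_set(path):
    """Push a single 'set' operation through the VyOS HTTP API."""
    resp = requests.post(
        VYOS_URL,
        data={"data": json.dumps({"op": "set", "path": path}), "key": API_KEY},
        verify=False,  # lab only - use real certificates in production
    )
    resp.raise_for_status()
    return resp.json()

# A generated IPv6 baseline: default drop, allow return traffic and ICMPv6.
baseline = [
    ["firewall", "ipv6-name", "WAN-IN", "default-action", "drop"],
    ["firewall", "ipv6-name", "WAN-IN", "rule", "10", "action", "accept"],
    ["firewall", "ipv6-name", "WAN-IN", "rule", "10", "state", "established", "enable"],
    ["firewall", "ipv6-name", "WAN-IN", "rule", "10", "state", "related", "enable"],
    ["firewall", "ipv6-name", "WAN-IN", "rule", "20", "action", "accept"],
    ["firewall", "ipv6-name", "WAN-IN", "rule", "20", "protocol", "icmpv6"],
]
for path in baseline:
    vyos_set(path)

Wrapping that in source control gets you part of the way there, but it still isn't the centrally audited policy store I'm after.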

What I'd like to see is a way to intuitively and centrally implement a set of firewall security policies against this device, in a way that can be centrally audited, managed, and maintained. Keep in mind - the auditing aspect is critically important, as any security control that isn't periodically reviewed may not necessarily be effective.

Fortunately, VMWare's NSX (or as it was previously known, vShield) has been doing this for quite some time. There are some advantages to this:
  • Distributed Firewall enforces traffic at the VM's NIC, but is not controlled by the VM. This means that you don't have to automatically trust the workload to secure it.
  • VM Guest Firewalling CPU/NIC costs don't impact the guest's allocation. This blade has two edges:
    • VM Guests don't need firewall resources factored into their workload, as it's not their problem. This allows for easy onboarding, as the application you're protecting doesn't have to be refactored.
    • VM Hosts need CPU to be over-provisioned, as this will be taken out of the host resources at a high priority. That being said, if you're going down the full VMWare Cloud Foundation / Software-Defined Data Center (VCF/SDDC) path, it is important to re-think host overhead anyway, as other components such as vSAN and HA do the same thing!

Securing Workloads

First - we need to ensure that the IPv6 tunnel endpoint VM is on a machine that is eligible for Distributed Firewalling. From the NSX-T homepage, click on the VM Inventory:

Then we select the IPv6 tunnel VM:
From here, let's verify those tags, as we'll be using them in our security policies:

We also need to add some IP Sets - this is the NSX-T construct that handles non-VM or non-Container addressing for external entities. Technically, East-West Firewalling shouldn't always be used for this, but IPv6 tunnel brokering is an edge case: (IP Sets guide here)
From here, you want to add the IP Sets to a group via tag membership - a topic I will cover later as it's vitally important to get right with NSX-T:
We also want to do the same with our virtual machines:

We're all set to start applying policies to it! Navigate over to Security -> East-West Firewalling -> Distributed Firewall:
Add these policies. I have obfuscated my actual addresses under groups for privacy reasons.
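For reference, roughly the same group and policy can be pushed through the NSX-T Policy API instead of the UI. Treat this as a sketch only - the manager hostname, credentials, group/policy IDs, tag value, and the "scope|tag" condition format are my assumptions based on the 2.5-era Policy API, so verify them against your manager's API documentation:

import requests

NSX = "https://nsx-mgr.lab.example"          # hypothetical manager
session = requests.Session()
session.auth = ("admin", "REPLACE_ME")       # lab credentials
session.verify = False                       # lab only

# Group whose membership is driven by a VM tag.
group = {
    "display_name": "grp-ipv6-tunnel-endpoints",
    "expression": [{
        "resource_type": "Condition",
        "member_type": "VirtualMachine",
        "key": "Tag",
        "operator": "EQUALS",
        "value": "role|ipv6-tunnel",         # assumed scope|tag format
    }],
}
resp = session.patch(
    f"{NSX}/policy/api/v1/infra/domains/default/groups/grp-ipv6-tunnel-endpoints",
    json=group,
)
resp.raise_for_status()

# Distributed Firewall policy scoped ("Applied to") that same group.
policy = {
    "display_name": "ipv6-tunnel-inbound",
    "category": "Application",
    "rules": [{
        "resource_type": "Rule",
        "display_name": "allow-sit-from-broker",
        "source_groups": ["/infra/domains/default/groups/grp-tunnel-broker"],
        "destination_groups": ["/infra/domains/default/groups/grp-ipv6-tunnel-endpoints"],
        "services": ["ANY"],
        "scope": ["/infra/domains/default/groups/grp-ipv6-tunnel-endpoints"],
        "action": "ALLOW",
    }],
}
resp = session.patch(
    f"{NSX}/policy/api/v1/infra/domains/default/security-policies/ipv6-tunnel-inbound",
    json=policy,
)
resp.raise_for_status()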

That's about it! If you want to add more tunnel nodes, you'd simply apply the tag to any relevant VM with NSX Manager, and all policies are automatically inherited.

Some Recommendations

  • If you haven't deployed a micro-segmentation platform, the #1 thing to remember is that distributed firewalling, because it captures all lateral traffic, generates a TON of logs, all of which happen to be invaluable troubleshooting data. I'd recommend rolling out vRealize Log Insight + Network Insight (vRLI/vRNI) to help here, but an ELK stack will probably work just fine in a pinch.
  • Have a tag plan! Retroactive refactoring of tags is a pretty miserable task, so try and get it at least well organized the first time.
  • Have a naming convention for all of the objects listed above! I'll write a skeleton later on and place on this blog, along with tagging strategies.
  • Make sure to set "Applied to" whenever possible, as this will prevent your changes from negatively affecting other data center tenants.
  • Try to use North-South firewalling (tier-0 and tier-1 edges ONLY) for traffic that leaves the data center. East-West wasn't really designed for that.
  • Try to use North-South firewalling, period. If a data center tenant (or their workload) is not globally trusted, assign that entity its own tier-1, making it really easy to wall off from the rest of the network. This is probably the easiest thing to do in NSX-T, and generates the most value!

Saturday, November 23, 2019

IPv6 Up and Running - Address Planning Basics and using a Tunnel Broker

First things first - let's cover some IPv6 basics.

What's Different

Many aspects of IPv6 are actually much easier than most people would expect - since there's such a large addressing space, entire fields of work simply go away.

Custom CIDR / Subnetting

Remember how you had to do binary math, and use your crystal ball to guess how many hosts would be on any given subnet? Well, every place you used CIDR masks from /29 to /19 for individual subnets gets replaced with a /64.

A great deal of functionality - such as RA and DHCPv6 - breaks if you use a subnet mask longer than /64 for generic devices. When setting up any host-facing network, you need to remember only four masks:
  • /64: Use this everywhere
  • /126: Use like a /30, but ONLY when interconnecting network devices. You're not saving space by trying to use this for hosts.
  • /127: Use like a /31, but with even more flakey vendor support. This is more space efficient, but you need to verify that ALL of your equipment supports it, or deal with a really fragmented point-to-point prefix.
  • /128: Loopbacks


NAT

You don't need it, because it's IPv4 duct tape. Prepare yourself for a simpler life without it.

Private Addressing

IPv6 does take a different approach here - there are TWO "private" allocations:
  • Link-local addressing (fe80::/10): This addressing allocation is used on a per-segment basis, and pretty much just exists so that every IPv6 speaker will always have an IP address, allowing routing protocols to work on unnumbered interfaces, for example.
  • ULA (fc00::/7): Unique local addresses are on the "should not be routed" list, and should not be used, generally speaking. You have to use prefix translation (NPTv6) to make them globally reachable, a feature that isn't well supported. I use this in my spine-and-leaf fabric examples to avoid revealing my publicly allocated prefix, and only in my lab.
Instead, IPv6 architecture focuses on the inverse - allocating prefixes you CAN use. Right now the planet (i.e. Earth, not kidding) has the Global (hehehe) allocation of 2000::/3. All IPv6 prefixes are allocated out of this block by providers, using large allocations to ensure easy summarization.


DHCP

DHCPv6 is not mandatory, as SLAAC/RA configuration can provide any client device with the default gateway and DNS servers. For enterprise applications, however, it is recommended to use DHCPv6 so you don't unintentionally disclose any information encoded into your IP by SLAAC, and so that your neighbor tables aren't murdered by SLAAC privacy extensions. More here.


DNS

DNS actually isn't all that different anymore, but still deserves mention for a few reasons.

The first reason why I think it deserves mention is because, as an application, its IPv6 journey was extremely well designed. 
  • IPv6 constructs are available regardless of which "stack" you're running: Global DNS servers have a new(ish) record type, AAAA, that indicates that IPv6 is available for a service, and any DNS server should serve AAAA records even if the query arrives over IPv4. This is useful in situations where exposing your DNS server over IPv6 would add attack surface, like Microsoft's Active Directory servers. It also helps make your migration strategy a bit smoother, as you implement the IPv6 stack progressively throughout your network.
Second, if you don't have AAAA resolving, IPv6 won't do much for you.
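As a quick sanity check, here's a tiny illustrative snippet that confirms a name actually returns AAAA records from your resolver (the hostnames are just examples):

import socket

def has_aaaa(name: str) -> bool:
    """Return True if the name resolves to at least one IPv6 address."""
    try:
        return bool(socket.getaddrinfo(name, None, socket.AF_INET6))
    except socket.gaierror:
        return False

for host in ("www.google.com", "ipv6.he.net"):
    print(host, "->", "AAAA present" if has_aaaa(host) else "no AAAA")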

IPv6 Address Planning

IPv6 address planning is fundamentally different for the reasons listed above, but I do have some general guidelines that help establish a good starting point:
  • /48 and /56 are good site prefixes: Since we are using 8x the space in our FIB for each route, allocate a /48 or /56 depending on size per site, but don't do anything weird like allocating a /63 or a /62 to save space. Keep your sites consistent. A  /56 is the IPv6 equivalent of a /16 in IPv4 - you'll almost always be right allocating at this length.
  • Allocate the last 2 /64s in your prefix for point-to-point prefixes and loopbacks, respectively. It just keeps address fragmentation less messy, and you can summarize the /64s at your backbone to ensure that traceroute "just works".
  • You have lots of space, so leave gaps between sites. If you get a /48, you have 256 /56 sites to play with. You can block out entire regions or sites in a myriad of ways to help your routing table "make sense".
Here's how I did it (/48 allocated to me, prefix is masked):
  • ffff:ffff:ffff:ffff::/64: Loopbacks
  • ffff:ffff:ffff:fffd::/64: Point-to-point links
  • ffff:ffff:ffff:e::/49: Allocated to NSX-T, because I don't have multiple sites in my lab. Don't do this in the real world, this is for various (messy) experiments with address summarization.
  • ffff:ffff:ffff:b::/49: Allocated to the underlay fabric. See above.
  • ffff:ffff:ffff:a::/64: Home campus network. This is where Pinterest, and other meatspace activities live.
I'm actually not using much else - I'm allocating large because IPv6 address shortening makes it easier to type (P.S. IPv4 address shortening works too, but there are fewer opportunities - try and ping 1.1). Allocating properly would look something like this:
  • ffff:ffff:ffff::/56 for Site A (Maybe a headquarters location?)
  • ffff:ffff:ffff:0100::/56 for Site B (Satellite office near HQ?)
  • ffff:ffff:ffff:0800::/56 for Site C (in another geographic region or state?)
  • ffff:ffff:ffff:1000::/56 for Site D (HQ in another country?)
Hopefully this is helpful - when in doubt, whiteboard it out.
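If you'd rather not whiteboard, Python's ipaddress module will do the carving for you. A small sketch using the 2001:db8::/48 documentation prefix as a stand-in for your real allocation:

import ipaddress

allocation = ipaddress.ip_network("2001:db8::/48")

# Every /56 "site" available inside the /48 - there are 256 of them.
sites = list(allocation.subnets(new_prefix=56))
print(f"{len(sites)} sites available, e.g. {sites[0]} and {sites[8]}")

# Leave gaps: hand out every 8th /56 so regions stay summarizable and can grow.
plan = {f"site-{chr(ord('a') + i)}": sites[i * 8] for i in range(4)}
for name, prefix in plan.items():
    # Each site still holds 256 host-facing /64 segments.
    print(name, prefix, "->", prefix.num_addresses // 2**64, "/64s")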

Well that's nice, but I'd like to actually do something!

Let's go through the process of selecting a tunnel broker (this assumes you do not have native IPv6 connectivity - if you did, this would already be done):

Step 1: Use Wikipedia's Cheat Sheet to select the best tunnel broker for you. Since I'm in the United States, I selected Hurricane Electric. I am biased by their educational outreach and certification program. I cannot recommend enough taking a crack at their Sage certification.
Step 2: Sign up using the links provided in the cheat sheet. If possible, ask for a /48 for maximum productivity.
Step 3: Establish a tunnel - I have provided a VyOS template here, but a great deal of networking equipment supports SIT tunneling, so it's not particularly difficult to set up. Keep in mind that there's no firewall enabled here - I wouldn't recommend that in general, but I'm handling it elsewhere. (A rough API-driven version of this step follows the list.)
Step 4: Start experimenting!
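For Step 3, here's roughly what driving that VyOS template through the HTTP API could look like. The IPv4 endpoints and IPv6 addresses are documentation-range placeholders (substitute what Hurricane Electric assigns you), and the local-ip/remote-ip keywords are VyOS 1.2-era syntax that newer releases have renamed, so adapt to your version:

import json
import requests

VYOS_URL = "https://tunnel-rtr.lab.example/configure"   # hypothetical router
API_KEY = "REPLACE_ME"

def vyos_set(path):
    """Apply one 'set' operation via the VyOS HTTP API."""
    resp = requests.post(
        VYOS_URL,
        data={"data": json.dumps({"op": "set", "path": path}), "key": API_KEY},
        verify=False,  # lab only
    )
    resp.raise_for_status()

# SIT tunnel to the broker, plus a default IPv6 route pointing back at it.
tunnel = [
    ["interfaces", "tunnel", "tun0", "encapsulation", "sit"],
    ["interfaces", "tunnel", "tun0", "local-ip", "192.0.2.10"],      # your WAN IPv4
    ["interfaces", "tunnel", "tun0", "remote-ip", "203.0.113.1"],    # broker's IPv4
    ["interfaces", "tunnel", "tun0", "address", "2001:db8:0:1::2/64"],
    ["interfaces", "tunnel", "tun0", "description", "HE tunnel broker"],
    ["protocols", "static", "route6", "::/0", "next-hop", "2001:db8:0:1::1"],
]
for path in tunnel:
    vyos_set(path)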

Saturday, October 26, 2019

Anycast Stateless Services with NSX-T, Implementation

First off, let's cover what's been built so far:
To set up an anycast vIP in NSX-T after standing up your base infrastructure (already depicted and configured), all you have to do is stand up a load balanced vIP at multiple sites. NSX-T takes care of the rest. Here's how:
Create a new load balancing pool.

Create a new load balancer:
Create a new virtual server:
If your Tier-1 gateways have the following configured, you should see a new /32 in your routing table:
Repeat the process for creating a new load balancer and virtual server on your second Tier-1 interface, pinned to a completely separate Tier-0. If multipath is enabled, you should see entries like this in your routing table:

It really is that easy. This process can be repeated for load balancers, and (when eventually supported) multisite network segments.
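If you'd rather script the per-site objects than click through them twice, here's a rough Policy API sketch. The pool members, Tier-1 paths, vIP, and the default application profile path are all placeholder assumptions from a 2.5-era setup - double-check the endpoints and field names in your environment before using anything like this:

import requests

NSX = "https://nsx-mgr.lab.example"      # hypothetical manager
session = requests.Session()
session.auth = ("admin", "REPLACE_ME")
session.verify = False                   # lab only

def patch_obj(path, body):
    resp = session.patch(f"{NSX}/policy/api/v1{path}", json=body)
    resp.raise_for_status()

# One shared pool definition, reused at every site (per the caveat below).
patch_obj("/infra/lb-pools/web-pool", {
    "display_name": "web-pool",
    "algorithm": "ROUND_ROBIN",
    "members": [
        {"ip_address": "10.0.10.11", "port": "443"},
        {"ip_address": "10.0.10.12", "port": "443"},
    ],
})

# Per-site load balancer + virtual server, each pinned to a different Tier-1.
for site, tier1 in (("a", "tier1-site-a"), ("b", "tier1-site-b")):
    patch_obj(f"/infra/lb-services/lb-{site}", {
        "display_name": f"lb-{site}",
        "connectivity_path": f"/infra/tier-1s/{tier1}",
        "size": "SMALL",
    })
    patch_obj(f"/infra/lb-virtual-servers/web-vip-{site}", {
        "display_name": f"web-vip-{site}",
        "ip_address": "192.0.2.80",      # the shared anycast vIP
        "ports": ["443"],
        "lb_service_path": f"/infra/lb-services/lb-{site}",
        "pool_path": "/infra/lb-pools/web-pool",
        "application_profile_path": "/infra/lb-app-profiles/default-tcp-lb-app-profile",
    })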

A few caveats:

  • State isn't carried through: if you're using a stateful service, use your routing protocols (AS-PATH is an easy one) to ensure that devices consistently forward to the same load balancer
  • Anycast isn't load balancing: This is easy to address here, as NSX-T can do both - but anycast alone won't protect your servers from overload unless you actually use a load balancer.
  • Use the same server pool: It was (hopefully) apparent that I used the same pool everywhere. Try to keep regional configurations consistent, to ensure that new additions aren't missed for a pool. Server pools should be configured on a per region or per transport zone basis.
Some additional light reading on anycast implementations:

Saturday, October 19, 2019

Anycast Stateless Services with NSX-T, the Theory

Before getting started, let's briefly cover the different IP message types, coupled with a "day in the life of a datagram," as it were.

Unicast

One source, one well-defined destination. Most network traffic falls into this category.

Mayfly perspective:
Source device originates packet, and fires it to whatever route (yes, hosts, VMs and containers can have a routing table) matches based on the destination.
The next-hop router, if reachable, forwards the packet and decrements the time-to-live (TTL) field by 1. Rinse and repeat until the destination is reached. Note: the TTL field is 8 bits, so if a message needs over 255 hops, it won't make it. (we're looking at YOU, Mars!) Pretty boring, but boring is good.

Multicast

One source, many specific destinations. This has a moderate efficiency gain over bandwidth-constrained links when routed.

In most cases, if a group pruning protocol, e.g. IGMP, MLD, is not running, multicast traffic "floods" and distributes all messages across all ports. The most common application for multicast is as a discovery or routing protocol.

Mayfly perspective:
Source device originates packet and the next layer 2 device replicates the packet to all multicast destinations (if IGMP/MLD is not doing its job, this becomes a flood, and forwards on all ports, which removes the forwarding efficiency) and then stops.
If multicast routing is enabled, traffic will forward just like it did with unicast, and have a moderate increase in efficiency. This is at the expense of traffic control. Since all multicast traffic is inherently stateless, there's no way to manage bandwidth consumption, fully eliminating the efficiency gain in many cases. If you're running routed multicast, I'd highly recommend using BGP to prune the multicast table... to help with some of this.

Broadcast

One source, ALL destinations. This is usually the least efficient traffic type and is part of why most networks don't have one all-encompassing VLAN, but instead use a number of subnetworks. With some exceptions, this traffic type is used exclusively when a source doesn't know how to get to a destination, e.g. ARP.

Mayfly Perspective:
Source device originates packet and the next layer 2 device floods on all ports but the origin (unless it's a hub). This traffic is subsequently dropped by all layer 3 forwarding devices unless a broadcast helper address is configured.

Anycast

Unicast with a twist. Addresses (or networks) are advertised by multiple nodes, all capable of providing a service, enabling an end device to speak to the nearest available node.

Mayfly Perspective:
Source device originates the packet and forwards it out whatever interface the routing table chooses. The next Layer 3 device forwards the traffic toward the available node with the most favorable routing protocol metric.
There's a lot to unpack here. Let's focus on the main points re: Anycast:
  • It DOES forward to the nearest available node and, if configured correctly, will fall back to less-preferred nodes as a backup.
  • It DOES NOT load balance traffic in any meaningful way.
  • It DOES NOT retain state
This is a pretty big deal-breaker, but keep in mind that we have more tools - these gaps are entirely solvable. The only things you need to provide to make an anycast service are:
  • A load balancer
  • A load balancer that provides stateful services, or one that will synchronize state.
  • A load balancer
NSX-T conveniently provides the above with fully integrated routing and switching (we set up BGP, the routing protocol of the internet, before), and adds micro-segmentation firewalling to boot. I'll cover more of that in the next post.

Before we go much further, it is critically important that we understand something very fundamental.


I know it sounds dramatic, but VMWare's concept of a "transport zone" seems to imply that universal reachability via a PORTABLE SUBNET is the primary goal. In NSX-V, this was delivered as a Universal Distributed Logical Router (UDLR), and it does not appear to be fully implemented in NSX-T. As network designers, we should instead plan for universal reachability leveraging the Anycast model, e.g. "Will the nearest NSX-T Edge please stand up," wherever possible.

Hopefully, it is clear by now, but Anycast isn't a specific IP message type, but instead a design for network reachability. It's commonly Unicast, but can be multicast if an implementation is carefully designed. The core principle for Anycast is to provide the shortest path to an asset, to the best knowledge of the network routing protocol.

More on the practical side of this post, but common Anycast applications include:
  • DNS
  • Application load balancers
  • Content Delivery Networks (CDNs)
Coming soon - how to do this with NSX-T!

Saturday, October 12, 2019

BGP Graceful Restart, some inter-platform oddities, what to do with it

Since most of NSX-T runs in a firewall mode of sorts, it's probably worthwhile to discuss one of the less well-known routing protocol features - Graceful Restart.

As published for BGP, IETF RFC 4724 outlines a mechanism for "preserving forwarding traffic during a BGP restart." This definition may be a little misleading, but that's mostly because of HOW the industry is leveraging Graceful Restart. Here are a few of the "normal use-cases" for BGP GR:

Cisco Non-Stop Forwarding and other similar technologies:
Cisco has developed another standard - NSF - that applies industry-generic methods for executing a BGP restart with forwarding continuity, with a twist. In many cases, multi-supervisor redundancy is a popular way of keeping your high-availability numbers up, with either a chassis switch running multiple supervisor modules or multiple devices bonded into a virtual chassis. In theory, these implementations get better availability numbers because they'll keep the main IP address reachable during software upgrades or system failures.
In my experience, this is great in campus applications, where client devices don't really have any routing/switching capability (like a cell phone) and where availability expectations are somewhat low (99%-99.99% uptime). However, in higher availability situations or ones running extensive routing protocol functionality, this appears to fall apart somewhat, where the caveats start to break the paradigm:

  • ISSU caveats: You have to CONSTANTLY upgrade your routers because ISSU is typically only supported across 1 or 2 minor releases. If you have a "cold" cutover, i.e. with a major version upgrade, you'll see a pretty extensive outage (5-30 minutes long depending on hardware)
  • Older implementations of a multi-supervisor chassis tend to have configuration sync issues, so you need to CONSTANTLY test your failover capability (I mean, you should do that anyway...)

Just my 2 cents. But here's where Graceful Restart does its job: during a supervisor failover, the IP address of the routing protocol speaker is shared between supervisors, and when establishing a routing protocol adjacency, the speakers negotiate GR capability along with tunable timers. Since the IP doesn't change, the highest-availability action is to continue forwarding to the "dead" address until the adjacency is re-established, ensuring sub-second availability for a dynamic routing protocol speaker (except in the case of updating your gear...)
Most firewall implementations are either Active-Active or Active-Standby, with shared IP addresses and session state tables. Well-designed firewall platforms use a generic method for sharing the state table, which includes (ideally) the session table, routing table, etc. ensuring that mismatched software versions do not introduce a disproportionate outage. The primary downside to this approach is that you don't have a good way to test your forwarding path (beyond Layer 2) so you should TEST OFTEN.

Now let's cover where you should NOT use Graceful Restart:
Any situation where the routing protocol speaker does not have a backup supervisor or any state mechanism. Easy, right?

NOPE. You have to enable Graceful Restart on speakers that have an adjacent firewall (or NSX-T Tier-0 gateway) to support the downstream failover.

RFC 4724 effectively defines two roles for Graceful Restart, commonly referred to as GR-Capable (the restarting speaker) and GR-Aware (the helper). Intuitively, GR-Capable speakers should be stateful network devices, such as multi-supervisor chassis, firewalls, or NSX-T edges, and GR-Aware devices should be stateless network devices, such as Layer 3 switches.
The catch, however, is that not all devices support GR Awareness. For example, it IS supported in IOS 12, but with caveats on which hardware has the capability.

So why does this matter? Well, Cisco illustrated it well in this NANOG presentation by stating that if an NSF-Capable advertising device fails, but there is no backup device sharing that same IP address, all traffic is dropped until the GR timers expire. Ouch. This is especially bad given some defaults:

  • RFC 8538 Recommendation: 180 seconds
  • Palo Alto: 120 seconds
  • Cisco: 240-300 seconds
  • VMWare NSX-T: 600 seconds?!?!?!?

Now that's pretty weird. If we fetch from VMWare's VVD 5.0.1, it says the following:
NSXT-VISDN-038 Do not enable Graceful Restart between BGP neighbors. Avoids loss of traffic. Graceful Restart maintains the forwarding table which in turn will forward packets to a down neighbor even after the BGP timers have expired causing loss of traffic. 
Coupled with the recommendation for Tier-0 to be active-active (remember, as I stated before, stateless devices do NOT need GR):

Oddly, it did not warn me about needing to restart the session. Let's find out why:

bgp-rrc-l0#show ip bgp summary
BGP router identifier, local AS number 65000
BGP table version is 84, main routing table version 84
7 network entries using 819 bytes of memory
11 path entries using 572 bytes of memory
14/6 BGP path/bestpath attribute entries using 1960 bytes of memory
2 BGP AS-PATH entries using 48 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 3399 total bytes of memory
BGP activity 102/93 prefixes, 264/247 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
                4 65000  143031  142962       84    0    0 14w1d           2
                4 65000  143036  142962       84    0    0 14w1d           1
                4 64900  330104  280526       84    0    0 1d17h           1
                4 65001  178250  174230       84    0    0 1w0d            3
FD00:6::240     4 65000  310833  578924       84    0    0 14w1d           0
FD00:6::241     4 65000  301493  578924       84    0    0 14w1d           1

Note that for GR to be modified, the BGP session must re-start, so if this was a production environment with equipment that supports GR (*sigh*) you would want to get into the leaf switch and perform a hard restart of the BGP peering.

VMWare's VVD recommendation here is pretty sound, as with most devices the GR checkbox is a global one, so you'd want to buffer between GR/Non-GR with a dedicated router (it's just a VM in NSX's case!), keeping in mind most leaf switches will have GR enabled by default.
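If you want to audit what NSX-T itself has configured, the Policy API exposes this per BGP neighbor. A short sketch - the manager hostname, Tier-0 and locale-services IDs are placeholders, and the graceful_restart_mode field name is my reading of the 2.5-era API, so verify it against your version:

import requests

NSX = "https://nsx-mgr.lab.example"      # hypothetical manager
TIER0 = "tier0-gw"                       # your Tier-0 ID
LOCALE = "default"                       # your locale-services ID

session = requests.Session()
session.auth = ("admin", "REPLACE_ME")
session.verify = False                   # lab only

url = f"{NSX}/policy/api/v1/infra/tier-0s/{TIER0}/locale-services/{LOCALE}/bgp/neighbors"
resp = session.get(url)
resp.raise_for_status()

for neighbor in resp.json().get("results", []):
    # Expected values are along the lines of DISABLE, HELPER_ONLY, or GR_AND_HELPER.
    print(
        neighbor.get("neighbor_address"),
        "AS", neighbor.get("remote_as_num"),
        "->", neighbor.get("graceful_restart_mode", "not set on neighbor"),
    )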

Oddly enough, Cisco's Nexus 9000 platform (their flagship data center switches) defaults to graceful restart capable. My recommendations (to pile on with the VVD) for this platform are below, with a rough configuration-push sketch after the list:

  • Set BGP timers to 4/12
  • Set GR timers to 120/120 or lower (they're fast switches, so I chose 30/30)
  • Under BGP, configure graceful-restart-helper to make the device GR-Aware instead of GR-Capable
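Here's a rough sketch of pushing those settings with Netmiko. The switch details and BGP AS number are placeholders, and the exact graceful-restart knobs vary by NX-OS release, so confirm the commands on your platform before trusting this:

from netmiko import ConnectHandler

# Hypothetical lab leaf switch - substitute your own management details.
n9k = {
    "device_type": "cisco_nxos",
    "host": "leaf1.lab.example",
    "username": "admin",
    "password": "REPLACE_ME",
}

# BGP timers of 4/12, tighter GR timers, and helper-only behavior per the list above.
config_commands = [
    "router bgp 65000",                          # assumed local AS
    "  timers bgp 4 12",
    "  graceful-restart restart-time 120",
    "  graceful-restart stalepath-time 120",
    "  graceful-restart-helper",
]

with ConnectHandler(**n9k) as conn:
    print(conn.send_config_set(config_commands))
    conn.save_config()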
Obviously, the VVD will adequately protect your infrastructure from issues like this, but I think it's unlikely you'll have NSX-T as the only firewall in your entire datacenter.

Saturday, October 5, 2019

NSX-T 2.5 Getting Started, Part 2 - Service Configuration!

Now that the primary infrastructure components for NSX-T are in place, it is possible to build out the actual functions that NSX-T is designed to provide.

A friendly suggestion: make sure your fabric is healthy before doing this:
NSX-T differs from NSX-V quite a bit here. Irregular topologies between edge routers aren't supported, and you have to design any virtual network deployments in a two-tier topology that somewhat resembles Cisco's Aggregation-Access model, but in REVERSE.

The top tier of this network - or, as VMWare calls it in their design guide, Tier-0 - is primarily a route aggregation layer; the logical routers here perform tasks such as:
  • Firewalling
  • Dynamic Routing to Physical Network
  • Route Summarization
  • ECMP
The second logical tier, Tier-1, is automatically and dynamically connected to Tier-0 routers via /31s generated from a prefix of your choosing. This logical router will experience a much higher frequency of change, performing tasks like:
  • Layer 2 segment termination/default gateway
  • Load Balancing
  • Firewalling
  • VPN Termination
  • NAT
  • Policy-Based Forwarding
Before implementing said network design, I prefer to write out a network diagram.

Let's start with configuring the Tier-0 gateway:
We'll configure the Tier-0 router to redistribute pretty much everything:
Configure the uplink interface:
Oddly enough, we have spotted a new addition with 2.5 in the wild - the automatic inclusion of prefix-lists!
We also want to configure route summarization, as the switches in my lab are pretty ancient (WS-3560-24TS-E). I'd recommend doing this anyway in production, as it will limit the impact of widespread changes. To pull that off, you *should* reserve the following prefixes, even if they seem excessive:
  • A /16 for Virtual Network Services per transport zone
  • A /16 for NSX-T Internals, allocating /19s to each tier-0 cluster, as outlined in our diagram.
I did so below, and it makes route aggregation or summarization EASY.
Now, we configure BGP Neighbors:
At this point, we want to save and test the configuration. It'll take a while for NSX-T to provision the services listed here, but once it's up, you'll see:
Check for advertised routes. Only routes that exist are aggregated, so you should only see the summaries that have live contributors:

As a downside, I have prefix filtering in place to prevent my lab from stomping on the vital Pinterest and Netflix networks, so I had to add the new prefixes to that:
That was quite a journey! Fortunately, Tier-1 gateway configuration is MUCH simpler, initially. Most of the work performed on a Tier-1 Gateway is Day 1/Day 2, where you add/remove network entities as you need them:
Let's add a segment to test advertisements. I STRONGLY RECOMMEND WRITING A NAMING CONVENTION HERE. This is one big difference between NSX-V and NSX-T, where you don't have a massive UUID in the port group obfuscating what you have. Name this something obvious and readable - your future self will thank you.
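For what it's worth, segments can also be created through the Policy API, which makes it easy to enforce that naming convention in code rather than by habit alone. This is a sketch with placeholder IDs - your Tier-1 path, transport zone path, and naming scheme will differ:

import requests

NSX = "https://nsx-mgr.lab.example"    # hypothetical manager
session = requests.Session()
session.auth = ("admin", "REPLACE_ME")
session.verify = False                 # lab only

# Naming convention enforced up front: <env>-<tier1>-<function>-<subnet>.
segment_id = "lab-t1payload-web-10-0-20-0"

segment = {
    "display_name": segment_id,
    "connectivity_path": "/infra/tier-1s/t1-payload",
    "transport_zone_path": (
        "/infra/sites/default/enforcement-points/default"
        "/transport-zones/overlay-tz"          # your overlay TZ ID here
    ),
    "subnets": [{"gateway_address": "10.0.20.1/24"}],
}

resp = session.patch(f"{NSX}/policy/api/v1/infra/segments/{segment_id}", json=segment)
resp.raise_for_status()
print("Created segment", segment_id)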
Hey look, new routes!

As I previously mentioned, these segments, once provisioned, are just available as port-groups for consumption by other VMs on any NSX prepared host:
Next, we'll configure NSX-T to make waffles!

Sunday, September 29, 2019

NSX-T 2.5 Getting Started, Part 1

Since NSX-T 2.5 just came out, it's about time to do a full rebuild and getting started guide. NSX-T differs greatly from NSX-V in that the initial setup is quite a bit more complicated and doesn't have many guardrails or direct paths to initial set-up.

We'll be skipping the appliance deployment, because if you have trouble deploying an OVA, this will probably be too difficult.

First off, we'll be using our applied Clos fabric for this, and we won't be multihoming these devices as of yet, as this post will be pretty lengthy as it is. Diagram is here:

With that in mind, the first step to configuring virtualized routing & switching for NSX-T is in the vCenter GUI. In this lab, I have two hosts in two separate clusters -

  • Payload: Virtual Tunnel Endpoints (VTEPs) exist primarily on the host, and are leveraged as port-groups for guest network connectivity
  • Management/Edge: No host VTEPs currently exist, as they are not required for the management VMs, nor for the Edge Appliances (Primary difference coming from NSX-V!)
Coming from the vCenter UI, it looks like this:
The NSX-T Edge Appliances need to ingest underlay networks via 802.1Q tags, instead of as individual port groups. Fortunately, vSphere has been able to do this for quite some time, so we use the lesser-known "VLAN trunking" setting on the port group:

From here, it's time to outline our Edge Design - BEFORE anything is built.
We'll use this as a guide throughout the configuration process. First, we set transport zones and device profiles:
We create the underlay (VLAN) transport zone to ensure that virtualized traffic can exit to the "real network":
Next, we create the overlay transport zone where the GENEVE VN-Segments will live:
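For reference, both transport zones can also be created through the NSX-T Manager API (the imperative /api/v1 interface) rather than the UI. A rough sketch - the manager hostname and host switch name are placeholders, and the host switch name has to match what you reference later in the uplink and transport node profiles:

import requests

NSX = "https://nsx-mgr.lab.example"     # hypothetical manager
session = requests.Session()
session.auth = ("admin", "REPLACE_ME")
session.verify = False                  # lab only

for name, tz_type in (("tz-vlan-underlay", "VLAN"), ("tz-overlay", "OVERLAY")):
    body = {
        "display_name": name,
        "host_switch_name": "nvds-1",   # reused by the uplink/transport node profiles
        "transport_type": tz_type,
    }
    resp = session.post(f"{NSX}/api/v1/transport-zones", json=body)
    resp.raise_for_status()
    print("Created transport zone", resp.json()["id"])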
Then we configure the Layer 2 uplink profiles. Note: specifically configuring the Active uplink to FP-ETH0 is REQUIRED. The NSX Edges will not function without this, and NSX-T will never tell you why.
And the VTEP profiles. Note that this portion uses the name allocated in the transport node profile.
Finally, the host transport profiles. Here we set a profile that will use a single uplink for the N-VDS, add transport zones, etc. Note that the physical NIC name on the left needs to exactly match the physical NIC identifier in ESXi.
Now, we can finally start configuring transport nodes. Note that since we deployed profiles prior to this, there's not a whole lot to do as far as roll-out is concerned.

Ensure the edge appliance is ready:
Configure the edge cluster:
Now we're ready to configure routing and switching functionality. This can go several different ways, as VMWare has provided additional capabilities with regards to configuring NSX-T assets - declarative configuration methods. We'll cover that in detail, along with how to use it, in the next post!
