Sunday, January 16, 2022

Bogons, and how to leverage public IP feeds with NSX-T

Have you ever wondered what happens to all the privately-addressed traffic coming from any home network?

Well, if it isn't explicitly blocked by the business, it's routed, and that is not good. Imagine the data leakage that can occur when a user mistypes a destination IP: the traffic heads out to the Service Provider, who will probably drop it somewhere along the way, but in the meantime it's an open invitation for wiretapping or hijacking.

RFC 1918 traffic on the internet is part of a larger family of addresses called "bogons", an industry term for the short list of reserved prefixes that should never be publicly routed.

Many network attacks traversing the public internet are sourced from what the industry calls "fullbogons": address space that has been allocated to a registry but not assigned to any legitimate network. These addresses are safely block-able, with no legitimate uses.

As it turns out, the industry calls this type of network traffic Internet background noise, and recent IPv4 shortages have pushed some providers (Cloudflare in particular) into deploying services on formerly fullbogon space and shouldering the noise from an internet's worth of mis-configured network devices.

The solution for mitigating both problems is the same: filter that network traffic. Team Cymru provides public services that list all bogon types for public ingestion; all that remains is implementation and automation.

Bogon strategies

Given that the bogon list is extremely short, bogons SHOULD be implemented as null routes on perimeter routing. Due care may be required when filtering RFC 1918 in enterprise deployments with this method - Longest Prefix Match (LPM) will ensure that any specifically routed prefix will stay reachable, as long as dynamic routing is present and not automatically summarizing to the RFC 1918 parent. If this is a concern, implement what's possible today and build a plan for what isn't later.

Here's an example of how to implement with VyOS:


protocols {
    static {
        route 10.0.0.0/8 {
            blackhole {
            }
        }
        route 10.66.0.0/16 {
            blackhole {
            }
        }
        route 100.64.0.0/10 {
            blackhole {
            }
        }
        route 169.254.0.0/16 {
            blackhole {
            }
        }
        route 172.16.0.0/12 {
            blackhole {
            }
        }
        route 192.0.2.0/24 {
            blackhole {
            }
        }
        route 192.88.99.0/24 {
            blackhole {
            }
        }
        route 192.168.0.0/16 {
            blackhole {
            }
        }
        route 198.18.0.0/15 {
            blackhole {
            }
        }
        route 198.51.100.0/24 {
            blackhole {
            }
        }
        route 203.0.113.0/24 {
            blackhole {
            }
        }
    }
}

This approach is extremely straightforward and provides almost instant value.

Fullbogon strategies

For smaller enterprises and below (in this case, "smaller enterprise" means unable to support 250k+ prefixes via BGP, so nearly everybody), the most effective path to mitigating fullbogons isn't routing. Modern policy-based firewalls typically have features that can subscribe to a list and perform policy-level packet filtering, and several firewall platforms provide built-ins that let you simply subscribe to a service.

In all of these cases, policies must be configured to enforce on traffic in addition to ingesting the threat feeds.

We can build this on our own, though. Since NSX-T has a policy API, let's apply it to a manager:
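As a minimal sketch of the idea (not the exact script from this post), the following Python pulls Team Cymru's IPv4 fullbogon feed and publishes it as an NSX-T Policy Group. The feed URL, manager address, group ID, and credentials are placeholders - adjust them for your environment.

# Minimal sketch: pull the Team Cymru IPv4 fullbogon feed and publish it
# as an NSX-T Policy Group (placeholders throughout - not production code).
import requests

FEED_URL = "https://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt"
NSX_MANAGER = "https://nsx-manager.example.com"
GROUP_ID = "fullbogons-ipv4"
AUTH = ("admin", "changeme")  # use a proper secret store in practice

# Fetch and clean the feed (comment lines start with '#')
feed = requests.get(FEED_URL, timeout=30)
feed.raise_for_status()
prefixes = [
    line.strip()
    for line in feed.text.splitlines()
    if line.strip() and not line.startswith("#")
]

# NSX-T caps IP address expressions at roughly 4,000 entries, hence the slice
group = {
    "display_name": "Team Cymru IPv4 Fullbogons",
    "expression": [
        {
            "resource_type": "IPAddressExpression",
            "ip_addresses": prefixes[:4000],
        }
    ],
}

# Declarative PATCH against the Policy API - safe to re-run
response = requests.patch(
    NSX_MANAGER + "/policy/api/v1/infra/domains/default/groups/" + GROUP_ID,
    json=group,
    auth=AUTH,
    verify=False,  # lab only; use a trusted certificate in production
)
response.raise_for_status()
print("Published " + str(len(prefixes[:4000])) + " prefixes to group " + GROUP_ID)

Once the group exists, a distributed firewall drop rule referencing it completes the filter.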

The method I provided here can be applied to any IP list with some minimal customization. There is only really one key drawback to this population method - the 4,000 object limit.

GitOps with NSX Advanced Load Balancer and Jenkins

GitOps

GitOps, a term coined in 2017, describes the practice of driving infrastructure operations from a Git repository. With it, we easily gain the ability to re-deploy any broken infrastructure (like application managers), but that alone doesn't really help infrastructure engineers.

From the perspective of an Infrastructure Engineer, Git has a great deal to offer us:

  • Versioning: Particularly with the load balancer example, many NEPs (Network Equipment Providers) expose object-oriented profiles, allowing services consuming the network to leverage versioned profiles by simply applying them to the service.
  • Release Management: Most enterprises don't have non-production networks to test code, but having a release management strategy is a must for any infrastructure engineer. At a minimum, Git provides the following major tools for helping an infrastructure engineer ensure quality when releasing changes:
    • Collaboration/Testing: Git's Branch/Checkout features contribute a great deal to allowing teams to plan changes on their own infrastructure. If virtual (simulated production) instances of infrastructure are available, this becomes an incredibly powerful tool
    • Versioning: Git's Tags feature provides an engineer the capability of declaring "safe points" and clear roll-back sets in the case of disaster.
    • Peer Review: Git's Pull Request feature is about as good as it gets in terms of peer review tooling. When releasing from the "planning" branch to a "production" branch, just create a Pull Request notifying the team that you want them to take a look at what you intend to do. Bonus points for performing automated testing to help the team more effectively review the code.

Applying GitOps

On Tooling

Before diving into this particular implementation, note that none of these tools is unique or irreplaceable. The practice matters more than the tooling, and there are many equivalents here:

  • Jenkins: Harness, Travis CI, etc
  • GitHub: GitLab, Atlassian, Gitea, etc.
  • Python: Ansible, Terraform, Ruby, etc.

GitOps is pretty easy to implement (mechanically speaking). Any code designed to deploy infrastructure will execute smoothly from source control when the CI tooling is completely set up. All of the examples provided in this article are simple and portable to other platforms.

On Practice

This is just an example to show how the work can be executed. The majority of the work in implementing GitOps lies with developing release strategy, testing, and peer review processes. The objective is to improve reliability, not to recover an application if it's destroyed.

It does help deploy consistently to new facilities, though.

Let's go!

Since we've already developed the code in a previous post, most of the work is already completed - the remaining portion is simply configuring a CI tool to execute and report.

A brief review of the code (https://github.com/ngschmidt/python-restify/blob/main/nsx-alb/apply_idempotent_profiles.py) shows it was designed to run headless and create application profiles. Here are some key features for pipeline-executed code to keep in mind (a minimal sketch follows the list):

  • If there's a big enough problem, crash the application so there's an obvious failure. Choosing to crash may feel overly dramatic in other applications, but anything deeper than pass/fail takes more comprehensive reporting. Attempt to identify "minor" versus "major" failures when deciding to crash the build. It's OK to consider everything "major".
  • Plan the code to leverage environment variables where possible, as opposed to arguments
  • Generate some form of "what was performed" report in the code. CI tools can email or webhook notify, and it's good to get a notification of a change and what happened (as opposed to digging into the audit logs on many systems!)
  • Get a test environment. In terms of branching strategy, there will be a lot of build failures and you don't want that to affect production infrastructure.
  • Leverage publicly available code where possible! Ansible (when fully idempotent) fits right into this strategy, just drop the playbooks into your Git repository and pipeline.
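The following is an illustrative skeleton of the first few points above - not the actual script from the repository. The environment variable names and report filename are arbitrary examples:

#!/usr/bin/env python3
# Illustrative pipeline-friendly skeleton: secrets come from environment
# variables, any serious problem crashes the build, and a simple
# "what was performed" report is written for the CI tool to attach.
import json
import os
import sys


def main():
    # Secrets are injected by the CI tool's credential store, not CLI arguments
    username = os.environ.get("API_USERNAME")
    password = os.environ.get("API_PASSWORD")
    if not (username and password):
        # Crash loudly - a missing secret is always a "major" failure
        sys.exit("FATAL: API_USERNAME/API_PASSWORD not set")

    changes = []
    try:
        # ... converge infrastructure here, appending a summary of each change ...
        pass
    except Exception as err:
        # Anything unhandled fails the build so the pipeline goes red
        sys.exit("FATAL: " + str(err))

    # Emit a change report as a build artifact for post-build notifications
    with open("change-report.json", "w") as report:
        json.dump(changes, report, indent=2)
    print("Converged " + str(len(changes)) + " object(s)")


if __name__ == "__main__":
    main()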

Pipeline Execution

Here's the plan. It's an (oversimplified) example of a CI/CD pipeline - we don't really need many of the features a CI tool provides here:

  • Pull code from a Git Repository + branch
    • Jenkins can support a schedule as well, but with GitOps you typically just have the pipeline poll SCM and watch for changes.
  • Clear workspace of all existing data to ensure we don't end up with any unexpected artifacts
  • Load Secrets (username/password)
  • "Build". This stage, since we don't really have to compile, simply lets us execute shell commands.
  • "Post-build actions". Reporting on changed results is valuable and important, but the code will also have to be tuned to provide a coherent change report that turns to code. Numerous static analysis tools can also be run and reported on from here.

The Jenkins job configuration is not particularly complex because the code is designed for it: the pipeline performs unit testing first, then executes and provides a report on what changed.

Building from Code

The next phase of GitOps would be branch management. Since the production or main branch now represents production, it's not particularly wise to commit directly to it when we want to create a new feature or capability. We're going to prototype next:

  • Outline what change we want to make with a problem statement

  • Identify the changes desired, and build a prototype. Avi is particularly good at this, because each object created can be exported as JSON once we're happy with it.
    • This can be done either as-code, or by the GUI with an export. Whichever works best.
  • Determine any versioning desired. Since we're making a functional but non-breaking change, SemVer reserves the third number for patches, so we'll increment the minor version and target v0.1.0 for this release.
  • Create a new branch, and label it in a way that's useful, e.g. clienttls-v0.1.0-release
  • Generate the code required. Note: If you use the REST client, this is particularly easy to export:
    • python3 -m restify -f nsx-alb/settings.json get_tls_profile --vars '{"id": "clienttls-prototype-v0.1.0"}'
  • Place this as a JSON file in the desired profile folder. 
  • Add the new branch to whatever testing loop (preferably the non-prod instance!) is currently used to ensure that the build doesn't break anything.
  • After a clean result from the pipeline, create a pull request (Example: https://github.com/ngschmidt/python-restify/pull/17). Notice how easy it is to establish peer reviews with this method!

After the pipeline applies the change, we'll see the generated profiles on the controller.

What's the difference?

When discussing this approach with other infrastructure engineers, the honest answer is "not much" - GitOps is not useful without good practice. Put simply, GitOps makes disciplined process easier:

  • Peer Review: Instead of meetings, advance reading, and some kind of Microsoft Office document versioning and comments, a Git pull request is fundamentally better in every way, and easier too. GitHub even has a mobile app to make peer review as frictionless as possible.
  • Testing: Testing is normally a manual process in infrastructure, if performed at all. Git tools like GitHub and Bitbucket support in-line reporting, meaning that tests not only cost zero effort, but the results are automatically added to your pull requests!
  • Sleep all night: It's really easy to set up a 24-hour pipeline release schedule, so a roll to production can happen at 3 AM with no engineers awake unless there's a problem.

To summarize, I just provided a tool-oriented example, but the discipline and process are what matter. The same process would apply to:

  • Bamboo and Ansible
  • Harness and Nornir
The only thing missing is more systems with declarative APIs.

Sunday, January 2, 2022

Leverage Idempotent, Declarative Profiles with the NSX-ALB (Avi) REST API

 Idempotence and Declarative Methods - not just buzzwords

Idempotence

Coined by Benjamin Peirce, this term indicates that a mathematical operation produces the same result no matter how many times it is applied (formally, f(f(x)) = f(x)).

Idempotence is a much more complicated subject in mathematics and computer science. IT and DevOps use a simplified version of the concept, commonly leveraging flow logic instead of Masters-level algebra.

Typically, an idempotent function in DevOps-land adds a few other requirements to the mix:

  • If a change is introduced, convergence (the act of making an object match what the consumer asked for) should be non-invasive and safe
    • It's the responsibility of the consumer to adequately test this
  •  Provide a "What If?" function of some kind, indicating how far off from desired state a system is
    • It's the responsibility of the consumer to adequately test this. Idempotent systems should provide a method for indicating what will change, but won't always provide a statement of impact

Ansible's modules are a good example of idempotent functions, but Ansible doesn't require that everything be idempotent. Some methods simply cannot be idempotent, which is why DevOps re-defines the term to add the "do no harm" requirement. Good examples include:

  • Restarting a service
  • Deleting and re-adding a file

As a result, many contributed modules are not pressured to be idempotent when they should be. It's the responsibility of the consumer (probably you) to verify things don't cause harmful change.
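To make the "what if?" and "do no harm" requirements concrete, here is a small illustrative sketch (my own example, not code from any of the tools above) of an idempotent ensure-style function with a dry-run mode:

# Illustrative sketch: an idempotent "ensure" function with a dry-run
# ("what if?") mode. Re-running it is always safe and reports the delta.
def ensure_settings(current, desired, check_mode=False):
    # Work out only the attributes that actually differ
    delta = {key: value for key, value in desired.items()
             if current.get(key) != value}

    if not delta:
        # Already converged - repeating the call changes nothing
        return {"changed": False, "delta": {}}

    if check_mode:
        # "What if?" - report how far off desired state we are, touch nothing
        return {"changed": False, "delta": delta}

    # Converge: apply only the attributes that differ
    current.update(delta)
    return {"changed": True, "delta": delta}


# Running twice only produces a change on the first pass
state = {"mtu": 1500}
print(ensure_settings(state, {"mtu": 9000}, check_mode=True))  # dry run
print(ensure_settings(state, {"mtu": 9000}))                   # converges
print(ensure_settings(state, {"mtu": 9000}))                   # no-op repeat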

Declarative Methods

Lori MacVittie (F5 Networks) provides an excellent detailed explanation of Declarative Models here:

https://www.f5.com/company/blog/why-is-a-declarative-model-important-for-netops-automation

Declarative Methods provide a system interface that can be leveraged by a non-Expert, by allowing a consumer to specify what the consumer wants instead of how to build it (an Imperative method).

This is a huge issue in the IT industry in general, because we (incorrectly) conflate rote memorization of individual imperative methods with capability. In the future, the IT industry will be forced to transform away from this highly negative cultural pattern.

We as professionals need to solve two major problems to assist in this transition:

  • Find a way to somehow teach fundamental concepts without imperative methods
  • Teach others to value the ability to effectively define what they desire in a complete and comprehensive way

If you've ever been frustrated by an IT support ticket that has some specific steps and a completely vague definition of success, declarative methods are for you. The single most important aspect of declarative methods is that  the user/consumer's intent is captured in a complete and comprehensive way. If a user fails to define their intent in modern systems like Kubernetes, the service will fail to build. In my experience, problem #1 feeds into problem #2, and some people just think they're being helpful by requesting imperative things.

Obviously the IT industry won't accept that a computer system is allowed to deny them if they failed to define everything they need to. This is where expertise comes in.

How we can use it in DevOps

Here's the good news - designing and building systems that provide idempotent, declarative methods of cyclical convergence isn't really an enterprise engineer's responsibility. Network Equipment Providers (NEPs) and systems vendors like VMware are on the hook for that part. We can interact with the provided functions leveraging some relatively simple flow logic:
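In outline form (a rough stand-in for the flow, not an exact specification), the loop looks something like this:

# Rough outline of the convergence loop:
#   1. READ the object's current state from the API
#   2. Compare it against the desired (declared) state
#   3. If they differ, UPDATE the object, then re-check
#   4. Stop when converged, or after a bounded number of retries
def converge(desired, read, update, max_retries=3):
    for _ in range(max_retries):
        current = read()
        if current == desired:     # real code would ignore API-generated fields
            return current         # already converged - safe to re-run
        update(desired)            # apply the declared state
    return read()                  # give up after bounded retries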

Well-designed APIs (NSX ALB and NSX-T Data Center are good examples) provide a declarative method, ideally versioned (minor gripe with NSX ALB here, the message body contains the version and may be vestigial), and all we have to do is execute and test.

In a previous post, I covered how implementing reliability is the consumer's responsibility, transforming a systems engineer's role into one of testing, ensuring quality, and alignment of vision, as opposed to taking on all of the complex methods ourselves.

TL;DR Example Time, managing Application Profiles as Code (IaC)

Let's start by preparing NSX ALB (Avi) for API access. The REST client I'm using relies on HTTP Basic Authentication, so it must be enabled - the following setting is under System -> Settings -> Access Settings:

Note: In a production deployment other methods like JWT ought to be used.
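Once basic authentication is enabled, a quick smoke test confirms the controller answers API calls. This is a hedged sketch: the controller address, credentials, and pinned API version are placeholders.

# Quick smoke test of basic auth against the Avi controller.
# Controller address, credentials, and API version are placeholders.
import requests

CONTROLLER = "https://avi-controller.example.com"

response = requests.get(
    CONTROLLER + "/api/applicationprofile",
    auth=("admin", "changeme"),
    headers={"X-Avi-Version": "21.1.1"},  # pin the API version you target
    verify=False,  # lab only; use a trusted certificate in production
)
response.raise_for_status()
for profile in response.json().get("results", []):
    print(profile["name"])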

The best place to begin here with a given target is to consult the API documentation, provided here: https://avinetworks.com/docs/21.1/api-guide/ApplicationProfile/index.html

Reviewing the documentation VMware provides, declarative CRUD methods (GET, PUT, PATCH, DELETE) are all available for an individual application profile. Let's implement the workflow above as code (Python 3):


# Recursively converge application profiles
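# (Excerpt - `json`, `deepdiff`, and the `cogitation_interface` REST client
#  are imported and instantiated earlier in the full script, linked below.)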
def converge_app_profile(app_profile_dict):
    # First, grab a copy of the existing application profile
    before_app_profile = json.loads(
        cogitation_interface.namshub(
            "get_app_profile", namshub_variables={"id": app_profile_dict["uuid"]}
        )
    )

    # Fastest and cheapest compare operation first
    if not app_profile_dict["profile"] == before_app_profile:
        # Build a deep difference of the two dictionaries, removing attributes that are not part of the profile, but the API generates
        diff_app_profile = deepdiff.DeepDiff(
            before_app_profile,
            app_profile_dict["profile"],
            exclude_paths=[
                "root['uuid']",
                "root['url']",
                "root['uuid']",
                "root['_last_modified']",
                "root['tenant_ref']",
            ],
        )

        # If there are differences, try to fix them, up to 3 times
        if len(diff_app_profile) > 0 and app_profile_dict["retries"] < 3:
            print("Difference between dictionaries found: " + str(diff_app_profile))
            print(
                "Converging "
                + app_profile_dict["profile"]["name"]
                + " attempt # "
                + str(app_profile_dict["retries"] + 1)
            )
            # Increment retry counter
            app_profile_dict["retries"] += 1
            # Then perform Update verb on profile
            cogitation_interface.namshub(
                "update_app_profile",
                namshub_payload=app_profile_dict["profile"],
                namshub_variables={"id": app_profile_dict["uuid"]},
            )
            # Perform recursion
            converge_app_profile(app_profile_dict)
        else:
            return before_app_profile

Idempotency is easy to achieve: we leverage the deepdiff library to compare the data returned by a READ action against the desired state, and execute a re-apply action only if they don't match. This method allows me to just mash the execute key until I'm happy with the results. I've included a retry counter as well to prevent endless looping.
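For completeness, here's a hedged sketch of how a function like this might be driven from profile files kept in the repository; the folder name and dictionary keys are illustrative rather than the exact layout of the linked project:

# Illustrative driver: load profile JSON files from the repo and converge each.
import json
import pathlib

for profile_file in sorted(pathlib.Path("profiles").glob("*.json")):
    desired = json.loads(profile_file.read_text())
    converge_app_profile(
        {
            "uuid": desired["uuid"],  # object to converge
            "profile": desired,       # desired (declared) state
            "retries": 0,             # bounded retry counter
        }
    )
    print(profile_file.name + ": converged")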

That's actually all there is to it - this method can be combined with Semantically Versioned Profiles. I have provided public examples on how to execute that in the source code: https://github.com/ngschmidt/python-restify/tree/main/nsx-alb
