Keeping It Classless Perspectives On Networks, Automation, Systems, and Software Engineering https://keepingitclassless.net/ Up and Running with Kubernetes and Tungsten Fabric <p>I have a predominantly technical background. You can show me all the slide decks you want, but until I can get my hands on it, it’s not real to me. This has greatly influenced what I’m focusing on now that I’m doing more than just technical work - how to reduce the barrier to entry for people to become acquainted with a project or product.</p> <p>As a result, I’ve been getting more involved with <a href="https://tungsten.io/">Tungsten Fabric</a> (formerly OpenContrail). Tungsten is an open source Software-Defined Networking platform, and is a good candidate for building some tutorials. In addition, I’m new to the project in general - so, even if only for my own benefit, a blog post summarizing a quick and hopefully easy way to get up and running with it seems quite apropos.</p> <h1 id="introduction-to-the-lab-environment">Introduction to the Lab Environment</h1> <p>We’re going to spin up a 3-node cluster in AWS EC2 running Kubernetes, with Tungsten Fabric providing the networking. Why AWS instead of something like Vagrant? Simply put, a lot of advanced networking software requires a lot of system resources - more than most laptops are able to provide. In this case, a total of four virtual machines (the three-node cluster plus the Ansible provisioning machine) running Kubernetes and Tungsten isn’t exactly “lightweight”, and that’s without any applications on top. 
So AWS is a good option for spinning up or tearing down the whole lab quickly and programmatically.</p> <p>The lab consists of four instances (virtual machines):</p> <ul> <li><strong>Ansible Provisioning VM</strong> - started by CloudFormation, responsible for instantiating the other three instances, and installing Kubernetes and Tungsten on them.</li> <li><strong>Controller</strong> - runs Tungsten and Kubernetes controller software</li> <li><strong>Compute01</strong> - runs Kubernetes Kubelet and Tungsten vRouter, as well as any apps</li> <li><strong>Compute02</strong> - runs Kubernetes Kubelet and Tungsten vRouter, as well as any apps</li> </ul> <p>Recently, the Tungsten wiki was updated with instructions and <a href="https://github.com/tungstenfabric/website/wiki/Tungsten-Fabric:-10-minute-deployment-with-k8s-on-AWS">a CloudFormation template</a> for spinning up this environment. CloudFormation is a service offered by AWS to define a whole bunch of underlying infrastructure in text files ahead of time, so you can just run a single command rather than click through a bunch of GUIs, and, presto chango, you have a lab.</p> <p>I took this work and ran with it to provide more opinionated parameters. This makes things a little simpler for our uses, so you don’t need to bother with a bunch of inputs to get to a quick Kubernetes/Tungsten cluster.</p> <p>This lab also uses the relatively new <a href="https://github.com/Juniper/contrail-ansible-deployer">Ansible provisioning playbooks</a> for doing much of the legwork. 
Once CloudFormation spins up a single instance for running these playbooks, they’ll spin up additional AWS instances, and take care of installing Kubernetes and Tungsten components for us.</p> <h1 id="prerequisites">Prerequisites</h1> <p>One advantage of using tools like CloudFormation or Terraform, as well as simpler tools like Vagrant, is that the overwhelming majority of the infrastructure complexity is defined ahead of time in text files, so that you, the user, really only need to do a few things to get a lot of value from this lab. That said, there are a few things you need to do ahead of time:</p> <ul> <li><a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git">Install Git on your local machine</a>. This allows you to clone the repo that contains our lab files so you can run the lab.</li> <li><a href="https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/">Set up an AWS account</a>. You’ll need to provide a credit card to pay for the compute time. Don’t worry, I’ll make sure to include instructions for shutting everything down so you don’t get charged an arm and a leg.</li> <li><a href="https://docs.aws.amazon.com/cli/latest/userguide/installing.html">Install the AWS CLI</a> and configure it with the <a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-config-files.html">access key ID and secret access key</a> from the AWS account you set up in the previous step. <a href="https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/">See here for some insight on where to find these</a> - this is how the AWS CLI knows how to authenticate to AWS as you.</li> </ul> <h1 id="spin-up-the-stack">Spin up the “Stack”</h1> <p>CloudFormation defines infrastructure using template files. When we spin up infrastructure using CloudFormation, it refers to it all as a “Stack”. 
I have a GitHub repo where my modified CloudFormation template is located, so the first step is to clone this repo to your machine:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/mierdin/tftf &amp;&amp; cd tftf </code></pre></div></div> <p>Now that we’ve got the repo cloned, we can run this command to spin up our stack. Note that we’re referring to <code class="highlighter-rouge">cftemplate.yaml</code> in this command, which is the CloudFormation template that defines our stack, located within this repo:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws cloudformation create-stack --capabilities CAPABILITY_IAM --stack-name tf --template-body file://cftemplate.yaml </code></pre></div></div> <p>If that runs successfully, you should see it output a short JSON snippet containing the Stack ID. At this point, we can navigate to the <a href="https://console.aws.amazon.com/cloudformation/">CloudFormation console</a> to see how the setup activities are progressing:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/spin_up_stack.png"><img src="https://keepingitclassless.net//assets/2018/05/spin_up_stack.png" width="700" /></a></div> <p>You can navigate to the <a href="https://console.aws.amazon.com/ec2/">EC2 dashboard</a> and click on “Instances” to see the new instance being spun up by CloudFormation:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/ansible_started.png"><img src="https://keepingitclassless.net//assets/2018/05/ansible_started.png" width="700" /></a></div> <p>You might ask - why only one instance? This is simply how the Ansible playbooks work: CloudFormation only needs to spin up a single instance with Ansible to run these playbooks. 
Once that instance is up, its playbooks connect to the AWS API directly to spin up the remaining instances that actually run our cluster.</p> <blockquote> <p>This means <strong>you need to be patient</strong> - it may take a few minutes for all of this to happen. Read on for details on how to tell when the provisioning is “done”.</p> </blockquote> <p>After a few minutes, some additional instances will start to appear (use the refresh button to the right):</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/cluster_provisioning.png"><img src="https://keepingitclassless.net//assets/2018/05/cluster_provisioning.png" width="700" /></a></div> <p>Eventually, you’ll see a total of four instances in the dashboard - one for our initial Ansible machine spun up by CloudFormation, and the remaining three that will form our Kubernetes/Tungsten cluster:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/cluster_provisioned.png"><img src="https://keepingitclassless.net//assets/2018/05/cluster_provisioned.png" width="700" /></a></div> <h1 id="accessing-the-cluster">Accessing the Cluster</h1> <p>While it’s possible to SSH directly to any instance, as they all have public IPs provisioned, the Ansible machine already has SSH keys in place to easily authenticate with the cluster instances. 
So we can SSH to the Ansible machine once and reach everything else from there.</p> <p>First, grab the public IP address or FQDN of the Ansible instance:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/get_ip_ansible.png"><img src="https://keepingitclassless.net//assets/2018/05/get_ip_ansible.png" width="700" /></a></div> <p>Then, use that to connect via SSH with the user <code class="highlighter-rouge">root</code> and the password <code class="highlighter-rouge">tungsten123</code>:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh root@&lt;ansible instance public IP or FQDN&gt; </code></pre></div></div> <p>You should be presented with a bash prompt, <code class="highlighter-rouge">[root@tf-ansible ~]#</code>, on successful login.</p> <p>Now that we’re on the Ansible machine, we can take a look at the Ansible log located at <code class="highlighter-rouge">/root/ansible.log</code>. This is our only indication of the progress of the rest of the installation, so make sure you take a look at it before doing anything else:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tail -f /root/ansible.log </code></pre></div></div> <blockquote> <p>YMMV here. Sometimes this was super quick; other times it took quite a long time. Such is the way of cloud.</p> </blockquote> <p>You should see <code class="highlighter-rouge">PLAY RECAP</code> somewhere near the bottom of the output, which indicates Ansible has finished provisioning everything on the other instances. 
If you don’t see <code class="highlighter-rouge">PLAY RECAP</code> yet, let the execution continue until it finishes.</p> <p>Finally, we can navigate to the Tungsten Fabric console (still branded OpenContrail, don’t worry about it :) ) by grabbing the controller’s public IP address:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/get_ip_controller.png"><img src="https://keepingitclassless.net//assets/2018/05/get_ip_controller.png" width="700" /></a></div> <p>Use that IP or FQDN as shown below in your web browser, and log in with the user <code class="highlighter-rouge">admin</code> and the password <code class="highlighter-rouge">contrail123</code> (leave “domain” blank):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://&lt;controller public IP or FQDN&gt;:8143/ </code></pre></div></div> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/tungsten_screen.png"><img src="https://keepingitclassless.net//assets/2018/05/tungsten_screen.png" width="700" /></a></div> <p>We can use the same FQDN or IP to SSH from our Ansible instance to the controller instance. No password is needed, as the Ansible instance already has SSH keys installed on the cluster instances:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh centos@&lt;controller public IP or FQDN&gt; </code></pre></div></div> <h1 id="destroy-the-lab-when-finished">Destroy the Lab When Finished</h1> <p>If you want to tear everything down to save cost when you’re not using it, there’s a bit of a catch. 
We can delete our CloudFormation stack easily enough with the appropriate command:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws cloudformation delete-stack --stack-name tf </code></pre></div></div> <p>You should eventually see the stack status transition to <code class="highlighter-rouge">DELETE_COMPLETE</code> in the CloudFormation console.</p> <p>However, as mentioned previously, CloudFormation is only responsible for, and therefore only knows about, the one Ansible instance. It will not automatically delete the other three instances spun up by Ansible. So we’ll need to go back into the EC2 console, navigate to <code class="highlighter-rouge">instances</code>, manually check the boxes next to the controller and both compute instances, and select <code class="highlighter-rouge">Actions &gt; Instance State &gt; Terminate</code>.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/05/terminate.png"><img src="https://keepingitclassless.net//assets/2018/05/terminate.png" width="700" /></a></div> <blockquote> <p>You may also have to clean up unused EBS volumes. Make sure you delete any unused volumes from the “EBS” screen within the EC2 console. For some reason, CloudFormation isn’t cleaning these up from the Ansible instance, and I haven’t had a chance to run this issue down yet.</p> </blockquote> <h1 id="conclusion">Conclusion</h1> <p>That’s it for now! We’ll explore this lab in much greater detail in a future blog post, including interacting with Tungsten Fabric, running applications on Kubernetes, and more.</p> <p>I hope you were able to get a working Tungsten Fabric lab up and running with this guide. 
If you have any feedback on this guide, feel free to leave a comment, and I’m happy to improve it.</p> Tue, 08 May 2018 00:00:00 +0000 https://keepingitclassless.net/2018/05/up-running-kubernetes-tungsten-fabric/ https://keepingitclassless.net/2018/05/up-running-kubernetes-tungsten-fabric/ Get Started with Junos Quickly (and free!) <p>When I got started in networking, my education (like that of so many network engineers) was all about Cisco. All my networking courses in college, as well as my early networking jobs, used Cisco curricula and equipment, and valued Cisco certifications like the CCNA/CCNP/CCIE above all.</p> <p>It wasn’t until I had already been in the industry for about three years that I even got my hands on a Juniper device, and by that time, my IOS habits had taken root in my muscles, which made the new set/delete style of Junos configurations even more strange. While my Junos experience never came close to exceeding my IOS/NXOS experience, I grew to appreciate some of the subtle advantages that Juniper bakes into its software. However, getting this experience meant I had to work that much harder to get my hands on lab gear to make it more a part of my day-to-day experience.</p> <p>These days, it’s way easier to get started with Junos. You don’t have to wait for someone to get you some lab gear - you can set up a virtual lab right on your laptop. While there are a few ways to do this, one of the best and most up-to-date is the <a href="https://github.com/Juniper/vqfx10k-vagrant">vQFX Vagrant</a> repository. This repo contains multiple directories for running a virtualized version of Juniper’s QFX switch in topologies ranging from a simple single-node deployment to a full IP fabric. 
This means we can do a whole lot of Junos learning, right on our laptop, for free.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net//assets/2018/04/qfx10008-right-high.jpg"><img src="https://keepingitclassless.net//assets/2018/04/qfx10008-right-high.jpg" width="300" /></a></div> <h1 id="prerequisites">Prerequisites</h1> <p>To get started, you will need some software installed on your machine. To keep things simple, we’ll limit it to the bare essentials: the very shortest path to being able to play with Junos:</p> <ol> <li><a href="https://www.virtualbox.org/wiki/Downloads">VirtualBox</a> - this is a hypervisor that allows us to run vQFX in a virtual machine.</li> <li><a href="https://www.vagrantup.com/downloads.html">Vagrant</a> - this is a virtual machine orchestrator that configures our VMs for us using the configurations in the Git repo</li> <li><a href="https://git-scm.com/downloads">Git</a> - this is a version control tool we’ll use to download the vqfx10k-vagrant repository, which contains all needed configurations for running this image.</li> </ol> <p>Once you’ve installed these three pieces of software, you can take advantage of the myriad repositories on the web that contain Vagrant configurations for running virtual network devices - this isn’t limited just to the vQFX environment we’ll use today.</p> <h1 id="boot-up-a-vqfx-instance">Boot Up a vQFX Instance</h1> <p>As mentioned before, the <a href="https://github.com/Juniper/vqfx10k-vagrant">vQFX Vagrant</a> repository contains a number of directories with configurations for running various vQFX topologies.</p> <blockquote> <p>You may also notice there are Ansible playbooks in these directories. These are very useful for building complex configurations using virtual images for deeper learning. However, to keep this post as simple as possible, we’re skipping that part for now. 
This guide is intended to get you started with a single, vanilla Junos instance as quickly as possible.</p> </blockquote> <p>Now that Git is installed, we can use it to “clone” the GitHub repository (downloading it to our local machine) that contains the configurations for running our lab. Load up your favorite terminal application (I use iTerm2 for macOS, but you can use whatever works for you) and run the following commands to clone the repo and navigate to the directory that contains a Vagrant environment for a single vQFX instance:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/Juniper/vqfx10k-vagrant
cd vqfx10k-vagrant/light-1qfx
</code></pre></div></div> <p>There’s a file in this directory called <code class="highlighter-rouge">Vagrantfile</code>. This contains instructions to Vagrant for downloading and configuring a virtual machine. This means we don’t need to click through GUIs to make sure our VM is configured correctly. Just run <code class="highlighter-rouge">vagrant up --no-provision</code> and Vagrant will take care of everything for us.</p> <blockquote> <p>The <code class="highlighter-rouge">--no-provision</code> flag instructs Vagrant to skip the Ansible provisioning process so we can just get straight to playing with Junos. We’ll follow up this blog post with another one that focuses on the various configurations made possible via Ansible in this repo. You can safely ignore the “Machine not provisioned” message you’ll see after this command; this just means the Ansible process was skipped, and we have a vanilla Junos environment.</p> </blockquote> <p>No need to go to a website to download an OVA, or register for anything. One command, and Vagrant downloads the image for you, creates a virtual machine, configures it, and boots it up. 
You’ll end up seeing something like the output below:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ vagrant up --no-provision
Bringing machine 'vqfx' up with 'virtualbox' provider...
==&gt; vqfx: Box 'juniper/vqfx10k-re' could not be found. Attempting to find and install...
    vqfx: Box Provider: virtualbox
    vqfx: Box Version: &gt;= 0
==&gt; vqfx: Loading metadata for box 'juniper/vqfx10k-re'
    vqfx: URL: https://vagrantcloud.com/juniper/vqfx10k-re
==&gt; vqfx: Adding box 'juniper/vqfx10k-re' (v0.3.0) for provider: virtualbox
    vqfx: Downloading: https://vagrantcloud.com/juniper/boxes/vqfx10k-re/versions/0.3.0/providers/virtualbox.box
==&gt; vqfx: Successfully added box 'juniper/vqfx10k-re' (v0.3.0) for 'virtualbox'!
==&gt; vqfx: Importing base box 'juniper/vqfx10k-re'...
==&gt; vqfx: Matching MAC address for NAT networking...
==&gt; vqfx: Checking if box 'juniper/vqfx10k-re' is up to date...
==&gt; vqfx: Setting the name of the VM: light-1qfx_vqfx_1524617298243_85857
==&gt; vqfx: Clearing any previously set network interfaces...
==&gt; vqfx: Preparing network interfaces based on configuration...
    vqfx: Adapter 1: nat
    vqfx: Adapter 2: intnet
    vqfx: Adapter 3: intnet
    vqfx: Adapter 4: intnet
    vqfx: Adapter 5: intnet
==&gt; vqfx: Forwarding ports...
    vqfx: 22 (guest) =&gt; 2222 (host) (adapter 1)
==&gt; vqfx: Booting VM...
==&gt; vqfx: Waiting for machine to boot. This may take a few minutes...
    vqfx: SSH address: 127.0.0.1:2222
    vqfx: SSH username: vagrant
    vqfx: SSH auth method: private key
==&gt; vqfx: Machine booted and ready!
==&gt; vqfx: Checking for guest additions in VM...
    vqfx: No guest additions were detected on the base box for this VM! Guest
    vqfx: additions are required for forwarded ports, shared folders, host only
    vqfx: networking, and more. If SSH fails on this machine, please install
    vqfx: the guest additions and repackage the box to continue.
    vqfx:
    vqfx: This is not an error message; everything may continue to work properly,
    vqfx: in which case you may ignore this message.
==&gt; vqfx: Setting hostname...
==&gt; vqfx: Machine not provisioned because `--no-provision` is specified.
</code></pre></div></div> <p>Now that we have a vQFX instance running, we can run <code class="highlighter-rouge">vagrant ssh</code> to SSH to the virtual machine and get to the Junos CLI.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ vagrant ssh
--- JUNOS 17.4R1.16 built 2017-12-19 20:03:37 UTC
{master:0}
vagrant@vqfx-re&gt; show interfaces
Physical interface: gr-0/0/0, Enabled, Physical link is Up
  Interface index: 645, SNMP ifIndex: 504
  Type: GRE, Link-level type: GRE, MTU: Unlimited, Speed: 800mbps
  Device flags : Present Running
  Interface flags: Point-To-Point SNMP-Traps
  Input rate : 0 bps (0 pps)
  Output rate : 0 bps (0 pps)

Physical interface: bme0, Enabled, Physical link is Up
  Interface index: 64, SNMP ifIndex: 37
  Type: Ethernet, Link-level type: Ethernet, MTU: 2000
  Device flags : Present Running
  Link flags : None
  Current address: 02:00:00:00:00:0a, Hardware address: 02:00:00:00:00:0a
  Last flapped : Never
  Input packets : 0
  Output packets: 4
...
</code></pre></div></div> <h1 id="getting-started-guides">Getting Started Guides</h1> <p>Now that you have a working Junos environment to play with, you might want some additional resources to help you explore what’s possible, and to help translate your existing experience into Junos-land. Here are a few super helpful (and free!) mini-books you can use. 
Only a free J-Net login is required, and you can download the PDF:</p> <ul> <li><a href="https://www.juniper.net/us/en/training/jnbooks/day-one/fundamentals-series/cli/">Exploring the Junos CLI</a></li> <li><a href="https://www.juniper.net/us/en/training/jnbooks/day-one/fundamentals-series/migrate-cisco-asa-srx-series/">Migrating from Cisco to Juniper Networks</a></li> <li><a href="https://www.juniper.net/us/en/training/jnbooks/day-one/fundamentals-series/junos-for-ios-engineers/">Junos for IOS Engineers</a></li> </ul> <p>I hope this was a helpful and simple guide to getting a working Junos environment to play with within a few minutes. It doesn’t make sense in today’s world to wait for lab gear to even get a basic experience with new software, and I hope this helps you kick start your learning. Happy labbing!</p> Mon, 23 Apr 2018 00:00:00 +0000 https://keepingitclassless.net/2018/04/get-started-junos-quickly-free/ https://keepingitclassless.net/2018/04/get-started-junos-quickly-free/ Unit Testing Junos with JSNAPy <p><a href="https://keepingitclassless.net/2016/03/test-driven-network-automation/">I’ve been passionate</a> about the idea of proactively testing network infrastructure for some time. I revived and added to these ideas in my <a href="https://keepingitclassless.net/2018/02/intentional-infrastructure/">last post</a>. 
In that post’s video, I lay out three types of network testing:</p> <ol> <li><strong>Config-Centric</strong> - Verify my network is configured correctly</li> <li><strong>State-Centric</strong> - Verify the network has the operational state I expect</li> <li><strong>Application-Centric</strong> - Verify my applications can use the network in the way I expect</li> </ol> <p>In the same way a software developer might write tests in Python or Go that describe and effect desired behavior, the network engineer now has a growing set of tools they can use to make assertions about what “should be” and be made aware of deviations as they happen. One of those tools popped up on my radar this week - <a href="https://github.com/Juniper/jsnapy"><code class="highlighter-rouge">jsnapy</code></a>.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2018/02/JSNAPy.png"><img src="https://keepingitclassless.net/assets/2018/02/JSNAPy.png" width="150" /></a></div> <h1 id="jsnapy">JSNAPy</h1> <p>JSNAPy describes itself as the Python version of the Junos Snapshot Administrator. While this isn’t untrue, I think it’s a <strong>huge</strong> undersell. In my view, the assertions you can make on the data retrieved via these snapshots are where JSNAPy really shines. So in order to conceptually understand JSNAPy, I’d recommend you think of it as a generic assertion engine for Junos, with the snapshots as an implementation detail that makes this possible.</p> <p>JSNAPy offers a syntax with a set of primitives for making assertions on things like whether or not a certain configuration stanza is present, making sure a certain number of routing protocol adjacencies are actually being seen, and more. 
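</p> <p>To make the “certain number of adjacencies” idea concrete, here is a rough Python approximation of that kind of count-based assertion, run against a snapshot held in memory. This is not JSNAPy’s implementation or syntax. The sample XML is made up, and its element names (<code class="highlighter-rouge">bgp-peer</code>, <code class="highlighter-rouge">peer-state</code>) are modeled on Junos BGP neighbor output, so treat them as an assumption. <code class="highlighter-rouge">check_min_established</code> is a hypothetical helper.</p>

```python
import xml.etree.ElementTree as ET

# Made-up snapshot for illustration; element names are modeled on Junos BGP
# neighbor XML and should be treated as an assumption, not a reference.
SAMPLE_SNAPSHOT = """
<bgp-information>
  <bgp-peer>
    <peer-address>10.0.0.1</peer-address>
    <peer-state>Established</peer-state>
  </bgp-peer>
  <bgp-peer>
    <peer-address>10.0.0.2</peer-address>
    <peer-state>Active</peer-state>
  </bgp-peer>
</bgp-information>
"""


def count_in_state(snapshot_xml, xpath, state_tag, wanted):
    """Count nodes selected by xpath whose state_tag child text equals wanted."""
    root = ET.fromstring(snapshot_xml.strip())
    return sum(
        1
        for node in root.findall(xpath)
        if (node.findtext(state_tag) or "").strip() == wanted
    )


def check_min_established(snapshot_xml, minimum):
    """Hypothetical count assertion: at least `minimum` peers must be Established."""
    established = count_in_state(snapshot_xml, "bgp-peer", "peer-state", "Established")
    passed = established >= minimum
    verdict = "PASS" if passed else "FAIL"
    return passed, f"{verdict}: {established} established peer(s), expected at least {minimum}"


print(check_min_established(SAMPLE_SNAPSHOT, 1)[1])
```

<p>Keeping the counting separate from the threshold means the same snapshot can be reused for any number of assertions without fetching the data again.</p> <p>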
The language used to describe these things is low-level enough for you to get pretty granular with it.</p> <p>You can use JSNAPy in one of three ways:</p> <ul> <li>The actual <code class="highlighter-rouge">jsnapy</code> <a href="https://github.com/Juniper/jsnapy/wiki/3.-Command-Line-Tool">command-line tool</a></li> <li>JSNAPy’s underlying <a href="https://github.com/Juniper/jsnapy/wiki/4.-Module">Python API</a></li> <li>The <code class="highlighter-rouge">juniper_junos_jsnapy</code> Ansible module in <a href="https://github.com/Juniper/ansible-junos-stdlib">Juniper’s Ansible module collection</a></li> </ul> <p>For this post, we’ll stick with option #1 and run everything from the shell.</p> <p>JSNAPy implements its logic in two phases:</p> <ul> <li>Retrieve a snapshot of whatever data is required by each test. This could be configuration or operational state.</li> <li>Run a series of checks/tests that make assertions about what we expect the data in those snapshots to look like.</li> </ul> <h2 id="config-and-testing-files">Config and Testing Files</h2> <blockquote> <p>JSNAPy is under active development, and I’m not entirely sure whether the Python API or the format of the YAML files I’ll be discussing should be considered “stable”. Review the <a href="https://github.com/Juniper/jsnapy/wiki">JSNAPy docs</a> for the most up-to-date information.</p> </blockquote> <p>JSNAPy uses a basic config file to tie everything together. This is where you list the hostnames and credentials for the devices you wish to run <code class="highlighter-rouge">jsnapy</code> against, as well as references to separate files that contain your tests. All of these files are written in YAML.</p> <p>The following is a simple config file for connecting to a local vSRX instance. 
You can see the reference to our separate tests file under the <code class="highlighter-rouge">tests</code> key:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">hosts</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">device</span><span class="pi">:</span> <span class="s">127.0.0.1</span>
    <span class="na">username</span><span class="pi">:</span> <span class="s">root</span>
    <span class="na">passwd</span><span class="pi">:</span> <span class="s">Juniper</span>
    <span class="na">port</span><span class="pi">:</span> <span class="s">2202</span>
<span class="na">tests</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s">/Users/mierdin/Code/Juniper/nfd17-netverify-demo/jsnapytest.yaml</span>
</code></pre></div></div> <p>Our testing file <code class="highlighter-rouge">jsnapytest.yaml</code> contains our test(s). In this case, I’ve constructed a single test named <code class="highlighter-rouge">test_applications</code>. 
We’ll explore the details of this later on in this post.</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">test_applications</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">rpc</span><span class="pi">:</span> <span class="s">get-config</span>
  <span class="pi">-</span> <span class="na">item</span><span class="pi">:</span>
      <span class="na">id</span><span class="pi">:</span> <span class="s">./name</span>
      <span class="na">xpath</span><span class="pi">:</span> <span class="s1">'</span><span class="s">applications/application[name="k8sfrontend"]'</span>
      <span class="na">tests</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">is-equal</span><span class="pi">:</span> <span class="s">destination-port, 30589</span>
          <span class="na">info</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Test</span><span class="nv"> </span><span class="s">Succeeded!!,</span><span class="nv"> </span><span class="s">destination-port</span><span class="nv"> </span><span class="s">is</span><span class="nv"> </span><span class="s">&lt;{{post['destination-port']}}&gt;"</span>
          <span class="na">err</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Test</span><span class="nv"> </span><span class="s">Failed!!!,</span><span class="nv"> </span><span class="s">destination-port</span><span class="nv"> </span><span class="s">is</span><span class="nv"> </span><span class="s">&lt;{{post['destination-port']}}&gt;"</span>
</code></pre></div></div> <p>In short, our config file contains all of the information we need for connectivity to Junos devices, and references to tests to run on those devices. 
We’ll explore these in detail in the following sections.</p> <h2 id="retrieve-snapshot">Retrieve Snapshot</h2> <p>As I mentioned before, the real value of JSNAPy is in the testing functionality. Still, it’s important to understand how <code class="highlighter-rouge">jsnapy</code> retrieves the data to be tested within snapshots, because the assertions are performed on those snapshots.</p> <p>You may have noticed that our <code class="highlighter-rouge">test_applications</code> test has two main components. The first statement, <code class="highlighter-rouge">rpc: get-config</code>, is very important, as it specifies how to retrieve the data our test will be making assertions on. We can use the <code class="highlighter-rouge">--snap</code> argument at the shell to instruct <code class="highlighter-rouge">jsnapy</code> to retrieve a snapshot and store it on disk:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jsnapy <span class="nt">--snap</span> pre <span class="nt">-f</span> jsnapyconfig.yaml <span class="nt">-v</span>
Tests Included : test_applications
Taking snapshot of RPC: get-config
</code></pre></div></div> <p>I have <code class="highlighter-rouge">jsnapy</code> set up in a virtual environment at <code class="highlighter-rouge">venv/</code>, so I can easily find the resulting snapshot from the local directory:</p> <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ cat venv/etc/jsnapy/snapshots/127.0.0.1_pre_get_config.xml
<span class="nt">&lt;configuration</span> <span class="na">changed-seconds=</span><span class="s">"1519176621"</span> <span class="na">changed-localtime=</span><span class="s">"2018-02-21 01:30:21 UTC"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;version&gt;</span>12.1X47-D15.4<span class="nt">&lt;/version&gt;</span>
    <span class="nt">&lt;system&gt;</span>
        <span class="nt">&lt;host-name&gt;</span>vsrx01<span class="nt">&lt;/host-name&gt;</span>
..........
</code></pre></div></div> <p>As you can imagine, you can use any supported Junos XML RPC to retrieve data. Here’s the corresponding snapshot file from the RPC <code class="highlighter-rouge">get-interface-information</code>:</p> <div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ cat venv/etc/jsnapy/snapshots/127.0.0.1_pre_get_interface_information.xml
<span class="nt">&lt;interface-information</span> <span class="na">style=</span><span class="s">"normal"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;physical-interface&gt;</span>
        <span class="nt">&lt;name&gt;</span> ge-0/0/0 <span class="nt">&lt;/name&gt;</span>
        <span class="nt">&lt;admin-status</span> <span class="na">format=</span><span class="s">"Enabled"</span><span class="nt">&gt;</span> up <span class="nt">&lt;/admin-status&gt;</span>
        <span class="nt">&lt;oper-status&gt;</span> up <span class="nt">&lt;/oper-status&gt;</span>
    <span class="nt">&lt;/physical-interface&gt;</span>
    <span class="nt">&lt;physical-interface&gt;</span>
        <span class="nt">&lt;name&gt;</span> ge-0/0/0 <span class="nt">&lt;/name&gt;</span>
        <span class="nt">&lt;admin-status</span> <span class="na">format=</span><span class="s">"Enabled"</span><span class="nt">&gt;</span> up <span class="nt">&lt;/admin-status&gt;</span>
        <span class="nt">&lt;oper-status&gt;</span> up <span class="nt">&lt;/oper-status&gt;</span>
.......
</code></pre></div></div> <p>If the data you’re looking for isn’t available via an RPC, you can still execute <code class="highlighter-rouge">show</code> commands, and the snapshot will contain the resulting XML. 
For instance, instead of <code class="highlighter-rouge">rpc</code>, you’d specify <code class="highlighter-rouge">command</code>, followed by the command to issue:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># BGP neighbors actually are available via RPC, but this will suffice as an example</span>
<span class="na">test_bgp_neighbor</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">command</span><span class="pi">:</span> <span class="s">show bgp neighbor</span>
</code></pre></div></div> <p>You can also use the <code class="highlighter-rouge">--diff</code> flag to compare two snapshots. Let’s say we take another snapshot, call it <code class="highlighter-rouge">post</code>, and then run a diff against the two (click to zoom):</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2018/02/diffcheck.png"><img src="https://keepingitclassless.net/assets/2018/02/diffcheck.png" width="800" /></a></div> <p>The bottom line is that in order to test our network devices, we need a way to describe how to get the information those tests need. Snapshots provide exactly that.</p> <h2 id="checks">Checks</h2> <p>While the main messaging around JSNAPy tends to focus on snapshots, I feel that the ability to make assertions on the data in these snapshots is where the real value lies. I’ve been a strong advocate of “<a href="https://keepingitclassless.net/2016/03/test-driven-network-automation/">test-driven network automation</a>” for a while, and this concept can take many forms. One of these forms is the ability to run detailed and specific tests on your network devices.</p> <p>It’s also worth noting that while JSNAPy is a great tool to enable this, Junos definitely meets us halfway here, since everything in Junos can be represented in XML. 
As a result, anything retrieved in these snapshots can have assertions made on it, using the variety of generic primitives JSNAPy offers. These range from checking that a certain number of elements are present, to ensuring that a value within one of those elements equals an expected value.</p> <p>For instance, the previous example inspects my vSRX configuration for a custom application definition called <code class="highlighter-rouge">k8sfrontend</code>, and ensures that this application has the correct <code class="highlighter-rouge">destination-port</code> field value set:</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">test_applications</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">rpc</span><span class="pi">:</span> <span class="s">get-config</span>
  <span class="pi">-</span> <span class="na">item</span><span class="pi">:</span>
      <span class="na">id</span><span class="pi">:</span> <span class="s">./name</span>
      <span class="na">xpath</span><span class="pi">:</span> <span class="s1">'</span><span class="s">applications/application[name="k8sfrontend"]'</span>
      <span class="na">tests</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">is-equal</span><span class="pi">:</span> <span class="s">destination-port, 30589</span>
          <span class="na">info</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Test</span><span class="nv"> </span><span class="s">Succeeded!!,</span><span class="nv"> </span><span class="s">destination-port</span><span class="nv"> </span><span class="s">is</span><span class="nv"> </span><span class="s">&lt;{{post['destination-port']}}&gt;"</span>
          <span class="na">err</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Test</span><span class="nv"> </span><span class="s">Failed!!!,</span><span class="nv"> </span><span class="s">destination-port</span><span class="nv"> </span><span class="s">is</span><span class="nv"> </span><span class="s">&lt;{{post['destination-port']}}&gt;"</span>
</code></pre></div></div> <p>In my recent <a href="https://www.youtube.com/watch?v=pHwkwjd2WtQ">Network Field Day</a> demo, I wrote a lot of this check logic myself in Python. With this format, I can succinctly declare the checks that are important to me, and the YAML format is more accessible to non-programmers.</p> <blockquote> <p>It’s also worth noting that because this is all defined in YAML text files, we can iterate on these tests the same way software developers improve their own test suites, without needing a “full-blown language” like Python. So we get the (relative) ease of use of YAML with the same benefits as source code (plain text in git repos). This is a huge part of “infrastructure as code”.</p> </blockquote> <p>There are a variety of ways to run these tests - such as on existing snapshots - but I find the most useful way is the <code class="highlighter-rouge">--snapcheck</code> flag. This simply takes a quick snapshot of whatever the test definitions need, and immediately runs those tests against it. 
You don’t need to worry about the snapshot name in this case; just pass in the config file that references your tests, and they’ll be run on this “just in time” snapshot:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>jsnapy <span class="nt">--snapcheck</span> <span class="nt">-f</span> jsnapyconfig.yaml <span class="nt">-v</span>
Tests Included : test_applications
Taking snapshot of RPC: get-config
<span class="k">*****************************</span> Device: 127.0.0.1 <span class="k">*****************************</span>
Tests Included: test_applications
<span class="k">*************************</span>RPC is get-config<span class="k">*************************</span>
<span class="nt">----------------------Performing</span> is-equal Test Operation----------------------
Test Succeeded!!, destination-port is &lt;30589&gt;
PASS | All <span class="s2">"destination-port"</span> is equal to <span class="s2">"30589"</span> <span class="o">[</span> 1 matched <span class="o">]</span>
<span class="nt">-------------------------------</span> Final Result!! <span class="nt">-------------------------------</span>
test_applications : Passed
Total No of tests passed: 1
Total No of tests failed: 0
Overall Tests passed!!!
</code></pre></div></div> <p>If you look at the YAML testing file, you can infer what’s going on here. First, the <code class="highlighter-rouge">id</code> and <code class="highlighter-rouge">xpath</code> attributes let us specify a location of interest in the snapshot retrieved by the <code class="highlighter-rouge">get-config</code> RPC we specified. Then, under <code class="highlighter-rouge">tests</code>, we specify the assertions we wish to make on that particular portion of the data. 
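</p>
<p>To make the relationship between <code class="highlighter-rouge">id</code>, <code class="highlighter-rouge">xpath</code>, and <code class="highlighter-rouge">tests</code> a little more concrete, here’s a rough Python sketch of what an <code class="highlighter-rouge">is-equal</code> check boils down to. This is not JSNAPy’s implementation: the <code class="highlighter-rouge">is_equal</code> and <code class="highlighter-rouge">build_sample_snapshot</code> helpers are hypothetical, and the sample snapshot is built in code and assumes the standard Junos <code class="highlighter-rouge">applications</code> hierarchy (name, protocol, destination-port) rather than reproducing my full vSRX config. It relies on the limited XPath support in Python’s standard <code class="highlighter-rouge">xml.etree.ElementTree</code>, which happens to handle the <code class="highlighter-rouge">[name="k8sfrontend"]</code> predicate from the test file.</p>

```python
import xml.etree.ElementTree as ET


def build_sample_snapshot():
    """Build a tiny stand-in for the get-config snapshot.

    In real use you would load the snapshot file instead, e.g.
    ET.parse("venv/etc/jsnapy/snapshots/127.0.0.1_pre_get_config.xml").getroot()
    """
    config = ET.Element("configuration")
    apps = ET.SubElement(config, "applications")
    app = ET.SubElement(apps, "application")
    ET.SubElement(app, "name").text = "k8sfrontend"
    ET.SubElement(app, "protocol").text = "tcp"  # assumed field, for realism
    ET.SubElement(app, "destination-port").text = "30589"
    return config


def is_equal(root, xpath, id_path, field, expected):
    """Hypothetical re-creation of an is-equal check (not JSNAPy's code).

    Returns one (id, actual, passed) tuple per node matched by xpath.
    """
    results = []
    for node in root.findall(xpath):
        node_id = node.findtext(id_path)  # like the test's "id: ./name"
        actual = node.findtext(field)     # like "destination-port" in is-equal
        results.append((node_id, actual, actual == expected))
    return results


results = is_equal(
    build_sample_snapshot(),
    xpath='applications/application[name="k8sfrontend"]',
    id_path="./name",
    field="destination-port",
    expected="30589",
)
for node_id, actual, passed in results:
    status = "PASS" if passed else "FAIL"
    print(f"{status} | {node_id}: destination-port is {actual}")
# PASS | k8sfrontend: destination-port is 30589
```

<p>The idea mirrors the YAML: the xpath narrows the snapshot down to the nodes you care about, the id gives each match a readable label, and each test compares one field against an expected value, so a failure points at a specific node and field rather than just “something changed”.</p>
<p>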
In this case, I’m using <code class="highlighter-rouge">is-equal</code> as a way of saying “I expect the <code class="highlighter-rouge">destination-port</code> attribute to be equal to <code class="highlighter-rouge">30589</code>.” You may also notice that <code class="highlighter-rouge">tests</code> is a YAML list, meaning we can add as many tests as we want for this particular snapshot. We can also define a totally separate test with its own command or RPC if we want to make assertions on some other dataset.</p> <p>Just like there are multiple options for retrieving snapshots (i.e. <code class="highlighter-rouge">rpc</code> or <code class="highlighter-rouge">command</code>), there are multiple test operators you can use, not just <code class="highlighter-rouge">is-equal</code>. See the <a href="https://github.com/Juniper/jsnapy/wiki#supported-test-operators-and-their-description">JSNAPy</a> documentation for a complete list.</p> <blockquote> <p>The documentation does a pretty good job of listing the possible verbs you can use, but you may also consider looking at the <a href="https://github.com/Juniper/jsnapy/blob/master/samples/">samples directory</a> for examples of test cases and config files.</p> </blockquote> <p>Using these, you can write really granular tests that focus on specific portions of the configuration or operational state, and make assertions relevant to each portion. This is way more detailed than simple <a href="https://github.com/StackStorm-Exchange/stackstorm-napalm/blob/master/actions/check_consistency.meta.yaml">WISB vs WIRI comparisons</a> (though those are important too), where you might have a “golden config” that you “diff” against what’s actually on the device. This way, you know exactly which of your tests fail or succeed, and can take appropriate action accordingly. 
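</p>
<p>For contrast, here’s a minimal sketch of the “golden config” style of comparison, using Python’s standard <code class="highlighter-rouge">difflib</code>. Both configs are hypothetical Junos set-format snippets (the drifted port number is made up for illustration), and <code class="highlighter-rouge">config_drift</code> is just an illustrative helper, not part of JSNAPy or NAPALM.</p>

```python
import difflib

# "What it should be": a hypothetical golden config in Junos set format.
GOLDEN = [
    "set applications application k8sfrontend protocol tcp",
    "set applications application k8sfrontend destination-port 30589",
    "set system host-name vsrx01",
]

# "What it really is": a hypothetical config pulled from the device.
RUNNING = [
    "set applications application k8sfrontend protocol tcp",
    "set applications application k8sfrontend destination-port 30590",
    "set system host-name vsrx01",
]


def config_drift(should_be, really_is):
    """Return only the added/removed lines of a unified diff between configs."""
    diff = difflib.unified_diff(should_be, really_is, "wisb", "wiri", lineterm="")
    # Drop the ---/+++ file headers; @@ hunk markers and context lines
    # (which start with a space) are filtered out by the startswith check.
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))
    ]


for line in config_drift(GOLDEN, RUNNING):
    print(line)
```

<p>A diff like this tells you that something drifted, but it’s still up to you to decide which of your intentions that drift actually violates. A targeted test like the <code class="highlighter-rouge">is-equal</code> check in the YAML file encodes that intention up front, so the failure itself tells you what broke.</p>
<p>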
A deviation in your BGP state will probably result in a wildly different remediation action than a deviation in your security policies.</p> <h1 id="conclusion">Conclusion</h1> <p>Any mature network automation deployment is going to require dedicated focus on testing and validation at every layer of the stack. JSNAPy is proving to be a very useful tool for writing detailed test cases for Junos devices, whether you’re looking to validate configuration or operational state. It should be noted that - like everything else - JSNAPy is not a silver bullet. It is one piece of a much larger picture. There’s still room for multivendor validation with <a href="http://napalm.readthedocs.io/en/latest/validate/">tools like NAPALM</a>, or application-level testing with <a href="http://todd.readthedocs.io/en/latest/concepts.html">ToDD</a>. However, if you have Juniper gear in your infrastructure and you’re looking for a network validation tool that jibes well with infrastructure-as-code practices, JSNAPy is worth a look.</p> Tue, 27 Feb 2018 00:00:00 +0000 https://keepingitclassless.net/2018/02/unit-testing-junos-jsnapy/ https://keepingitclassless.net/2018/02/unit-testing-junos-jsnapy/ Intentional Infrastructure <p>I gave a presentation at the recent <a href="http://techfieldday.com/event/nfd17/">Network Field Day 17</a> (on my 3rd day working for Juniper). My main goal for this presentation was just to get people excited about building stuff.</p> <div style="text-align:center;"><iframe width="560" height="315" src="https://www.youtube.com/embed/pHwkwjd2WtQ" frameborder="0" allowfullscreen=""></iframe></div> <p>We tend to focus on vendor-provided solutions in this industry, and there are a lot of good reasons for that, but it’s also good to stay sharp and be able to build your own solutions to fill gaps where necessary. One reason I joined Juniper is that much of what we offer is built on a highly programmable foundation. 
So you get the best of both worlds - high-level products to solve the hard problems, but still the ability to insert your own custom tooling at various points in the stack.</p> <p>In the above video, I outlined a simple <a href="https://github.com/Mierdin/nfd17-netverify-demo">Github-available demo</a> for applying policies to a vSRX based on the existing services running in Kubernetes, and then verifying those policies are actually working by again using Kubernetes to determine what applications should be available.</p> <blockquote> <p><a href="https://github.com/Mierdin/nfd17-netverify-demo">My demo</a> is designed to be self-sufficient, meaning you should be able to follow the README and get a working demo. Feel free to watch the above video first for context, then follow along on that repo to get it working yourself.</p> </blockquote> <p>All of this was done in the context of discussing what “intent-driven” means to me, and I thought it important to summarize those thoughts here.</p> <h1 id="what-is-intent-all-about">What is Intent All About?</h1> <p>I think it’s safe to say we’re mostly past the Software-Defined Networking (SDN) hype cycle. Looking back after the SDNocalypse, I think SDN didn’t have quite the direct impact many expected based on the hype. Once the dust settled, the real impact was in the wildly different way we talked about networking, and that impact was very real.</p> <p>Similarly, “intent-driven” is in full hype cycle mode right now, and it’s easy (and prudent) to be skeptical about the whole thing. Again, however, once the dust settles, there will be very real lessons learned from all this, and I’d like to spend a little time talking about what I think they will (or at least should) be.</p> <p>Just like SDN before it, the whole “intent-driven networking” conversation has become all about network engineers, i.e. “What is the intent of the network engineer?” 
I’ve found most of these analogies to be fairly weak. Usually the examples provided are something along the lines of “I intend for the network to stay up and running”. Well…..duh? I think everyone wants that. Let’s not conflate “intent” with “competent operations”. There are a number of table stakes that must be met before we can move forward, and network reliability is one of them.</p> <p>To me, the interesting intent - the thing that goes above and beyond “competent operations” - is on the applications side. Automation isn’t just about configuring network boxes; it’s about providing services to what’s using the infrastructure - the applications. And no, this isn’t just about the datacenter. Applications use your branch office networking and backbone just as much as the datacenter network. So we have to start thinking about the useful intent in these terms: “What is it that the applications are expecting from my network?” - not just as a transport, but also in terms of network services.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2018/02/makeitso.jpg"><img src="https://keepingitclassless.net/assets/2018/02/makeitso.jpg" width="500" /></a></div> <p>In 2018, we actually have it pretty good. The cloud-native wave has made application intent <strong>way</strong> more accessible, as applications deployed to platforms like Kubernetes are much more self-describing. Kubernetes offers its users primitives for declaratively describing what their applications need, and the underlying infrastructure “makes it so”. Most Kubernetes users think of “infrastructure” as the compute node the kubelet is running on, or the virtual network between these nodes, etc. - but it doesn’t have to be limited to this. We can use this same source of truth to proactively enforce policies elsewhere. This original intent is an API call away in many cases. 
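</p>
<p>As a rough illustration of what “going and getting” that intent could look like, here’s a Python sketch that turns Kubernetes Service definitions into Junos firewall application definitions. It isn’t the code from my demo: the Service data is hard-coded in the shape of the <code class="highlighter-rouge">items</code> the Kubernetes API returns for <code class="highlighter-rouge">GET /api/v1/namespaces/default/services</code> (in practice you’d fetch it from the API server), the service names and port numbers are made up, the assumption that only <code class="highlighter-rouge">NodePort</code> services need a firewall application is mine, and <code class="highlighter-rouge">services_to_junos_apps</code> is a hypothetical helper.</p>

```python
# Hypothetical Service objects, shaped like a subset of the "items" the
# Kubernetes API returns for GET /api/v1/namespaces/default/services.
# In practice you would fetch these from the API server instead.
SERVICES = [
    {
        "metadata": {"name": "k8sfrontend"},
        "spec": {
            "type": "NodePort",
            "ports": [{"protocol": "TCP", "port": 80, "nodePort": 30589}],
        },
    },
    {
        "metadata": {"name": "internal-db"},
        "spec": {
            "type": "ClusterIP",
            "ports": [{"protocol": "TCP", "port": 5432}],
        },
    },
]


def services_to_junos_apps(services):
    """Translate externally reachable Services into Junos set commands.

    Assumptions (mine, not the demo's): only NodePort Services need a
    firewall application, and each Service exposes a single port; multiple
    ports would need separate application names.
    """
    commands = []
    for svc in services:
        if svc["spec"].get("type") != "NodePort":
            continue  # ClusterIP services are only reachable inside the cluster
        name = svc["metadata"]["name"]
        for port in svc["spec"]["ports"]:
            base = f"set applications application {name}"
            # Kubernetes uses "TCP"/"UDP"; Junos expects lowercase protocol names.
            commands.append(f"{base} protocol {port['protocol'].lower()}")
            commands.append(f"{base} destination-port {port['nodePort']}")
    return commands


for command in services_to_junos_apps(SERVICES):
    print(command)
```

<p>The point isn’t the specific translation, it’s the source of truth: the application owner already declared how their service is exposed, so the network side can derive its policy from that declaration rather than asking for it separately.</p>
<p>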
So we simply need to go get it.</p> <h1 id="proactively-seeking-out-intent">Proactively Seeking Out Intent</h1> <p>Let’s talk about what it might take to design and build a bridge. I’m no expert, but I think it’s fair to say that for most drivers, the experience is simple: you just drive across it. You don’t need to know the materials the bridge is made of, or call ahead to let the bridge people know you’re thinking about driving across it; you just use the bridge. The engineers and architects that built the bridge recognize that this is the desired experience, and they take on the responsibility of researching and understanding the expected traffic patterns and types of vehicles that will use the bridge. They build a bridge that meets those requirements. They maintain the bridge over the long term to ensure that it continues to operate as desired.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2018/02/bridge.jpg"><img src="https://keepingitclassless.net/assets/2018/02/bridge.jpg" width="500" /></a></div> <p>At <strong>no point</strong> do these engineers need to have an ongoing conversation with the average driver. They know that they’re relied on to provide infrastructure, so they proactively go out and get the information they need to provide this service.</p> <p>Similarly, AWS doesn’t make its users, whether developers or SREs, configure network switches. AWS offers primitives and APIs for describing what you want, and all of their underlying infrastructure comes up to meet that requirement. If you’re running your own infrastructure, this is now “the new normal”.</p> <p>It’s no longer sufficient to make the network the center of the universe; it has to be 100% about the applications, and we need to focus on making the network an accessible service just to stay on par with what cloud services provide. It doesn’t matter if devs don’t know how to subnet, or how OSPF works. They shouldn’t. 
The network can and should stop being such a black box, and the only way this will happen is if the intent of the application is proactively sought out.</p> <h1 id="conclusion">Conclusion</h1> <p>Infrastructure operators are going to need to step up and reach outside their domain for this intent, and change their processes to make it the new center of the universe. Network automation in any form cannot become all about making the life of the network engineer easier. Yes, this is a very attractive and real benefit of automation, but it cannot be the end goal. It <strong>must</strong> be all about making the network more responsive to the applications that rely on it, and in order to do this effectively, automation workflows must interact with technology outside of networking.</p> Sun, 04 Feb 2018 00:00:00 +0000 https://keepingitclassless.net/2018/02/intentional-infrastructure/ https://keepingitclassless.net/2018/02/intentional-infrastructure/ New Role, Same Goal <p>I recently gave a <a href="https://vimeo.com/252900298">presentation at Network Field Day 17</a> wherein I announced not only that I was about to give probably the most compressed talk of my life (time constraints are unforgiving), but also that I was now working for Juniper. Until today, this was pretty much the most explanation I had time to give:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2018/01/short_form.png"><img src="https://keepingitclassless.net/assets/2018/01/short_form.png" width="500" /></a></div> <p>I decided to accept a position with Juniper over the 2017 holiday, and I started last week. There were a few reasons for moving on from the StackStorm team, some of which are personal and have nothing to do with either day job. Despite the move, all of these things are still true:</p> <ul> <li>StackStorm is, and continues to be, an awesome project. 
Regular updates are happening <a href="https://stackstorm.com/2018/01/25/new-year-new-stackstorm-v2-6-released/">all the time</a>, each full of tons of new features and fixes.</li> <li>The StackStorm team and Extreme Networks as a whole are some of my favorite people ever. I will never forget everything I learned from them, and will try my best to stay in contact with all of them.</li> <li>The concepts behind StackStorm, such as infrastructure-as-code and autonomous response to events, are still top-of-mind for me. I still strongly believe that each of these concepts is very valuable to any IT professional.</li> </ul> <p>In short, I was presented with a great opportunity at Juniper, and unlike most job changes, that’s really all this is about. I look forward to staying involved with the StackStorm project in whatever capacity I can.</p> <h1 id="same-or-at-least-similar-goals">Same (or at least similar) Goals</h1> <p>I’ve been focused on software development for three years straight now. This was the right move for me when I started this journey. My technical focus at the time just wasn’t doing it for me - I was working only on technology that someone else built, and I felt like little more than a power user. Jumping full-time into software not only helped me move past this, but also gave me experience and insight into the life of a developer.</p> <p>While I’m certainly not reverting to my prior focus, an exclusive focus on writing features and fixing bugs just isn’t sufficient on its own. Ever since I started my career, I’ve enjoyed not only working on technical topics, but also talking and writing about them. I get a lot of enjoyment and energy from helping folks understand a new topic.</p> <p>Unfortunately, the more I focus exclusively on writing code, the less energy I have left for sharing what I learn. You have probably noticed that my blog output for the past three years has been way down. 
These two career shifts have taught me that not only do I need to work on modern technical topics, I also need to share them in order to really enjoy what I’m working on.</p> <p>To help enforce this dual focus, my new role is actually within the marketing organization at Juniper. This is a new challenge for me, and while I’m certainly looking forward to getting more involved on the business and marketing side of things, it’s not in anyone’s interest for my technical capabilities to wane. So this will not be “marketing as usual” from my perspective. I will continue to write technical blog posts, contribute to open source, and research new topics. The only difference is that I’ll be talking about it a lot more.</p> <p>I believe my time in engineering actually prepared me well for the new role. I’ll admit to falling prey to the assumption that marketing is all about going ‘rah rah’ about a product or company regardless of how it actually works. Unfortunately, I’m sure you’ll find pockets where this is true, but in general, it’s way more nuanced than this.</p> <p>My time in engineering has taught me a few things, one of which is that everything has tradeoffs. There is no “perfect solution”; there is only “cause and effect”. You want to optimize for a certain problem space - so you pick solutions that trade away things you don’t care about for things you do care about. One of several responsibilities I’ll have in my new role is to adequately describe the problem space in which a certain product excels - and, just as importantly in my opinion, where it doesn’t. Spending time in engineering, keeping the concept of tradeoffs at the top of my mind, has actually prepared me well for taking on this new challenge.</p> <p>So while the application of my goals is changing, my goal is still to learn about new technical topics and share them with the community. 
Even if they’re mixed with silly puns:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2018/01/nfd_talk.png"><img src="https://keepingitclassless.net/assets/2018/01/nfd_talk.png" width="500" /></a></div> <p>I’m excited for what’s in store in the next few years, and I hope this will help me get more involved again with the community that has been so helpful in my career journey. There will always be room for more content about tools and tech, but increasingly, the changing role of the network engineer is one of the most important topics to dive into. So while my personal career goals might be changing, my goal of helping engineers evolve their skillsets has not.</p> Mon, 29 Jan 2018 00:00:00 +0000 https://keepingitclassless.net/2018/01/new-role-same-goal/ https://keepingitclassless.net/2018/01/new-role-same-goal/ A Guide to Open Source for IT Practitioners <p>It’s easy to see that open source is changing the way people think about infrastructure. However, as the saying goes: “The future is here, it’s just not evenly distributed”. As always, there will be pockets of IT where active involvement in open source will just take some more time.</p> <p>I’ve worked on open source for a few years now, and I have always wanted to publish a post that focuses on a few key ideas that I wish I could tell every new entrant into the world of open source. I feel like going in with the right expectations can help these efforts go much more smoothly. So if you’re accustomed to getting most if not all of your technology stack from a vendor, and you’re wondering about the open source craze and trying to make sense of it all, this is for you. 
My goal with this post is to empower you to start getting out there and exploring the various communities behind the projects you may already have your eyes on.</p> <h1 id="open-source-is-free-as-in-puppy">Open Source is “Free as in Puppy”</h1> <p>Before some practical tips, I want to spend some time on expectations. This is crucially important when it comes to considering open source software for use in your own infrastructure. Obviously, one of the best-known benefits of open source is that you usually don’t need to buy anything to get it. It’s “free”, right?</p> <p>Open source isn’t just about getting free stuff; for enterprise IT, it’s an opportunity to shift from taking direction from a 3rd party to being able to set the direction. Everything in technology is based on tradeoffs: “I am willing to give up X to get Y”. While it’s true that you may not have to pay a license fee to use open source software, as you would with vendor-provided solutions, it’s almost certain that some assembly will be required, and likely long-term maintenance of the system as well. There is a financial cost to having your IT staff do this. Even if it’s just a small tool to address a niche use case in your environment, it’s still something you’re on the hook for owning.</p> <blockquote> <p>It is for this reason I always like to highlight the difference between “product” and “project”. There’s a lot of work that goes on behind the scenes of many vendor-provided products that most open source projects don’t worry about (and rightfully so).</p> </blockquote> <p>To help mitigate risks in this tradeoff, any major shift to open source will/should include additional headcount. This can include devs to help contribute needed features and bugfixes, but it could also include ops folks to learn it and keep it running, just like any other piece of infrastructure. 
I run into all kinds of folks who encounter the inevitable “wrinkles” present in any open source project (even well-funded, corporate-backed ones) and are frustrated that it’s not totally turnkey and bug-free. In my experience, most open source projects aren’t trying to be turnkey in the way legacy IT vendors have conditioned us to expect. They try to fill a part of the stack, and expect that their community will take the project and piece it together with other components to build a system. So don’t try to half-ass this - if you feel open source is right for a component of your infrastructure, invest in your people and do it right. This is why open source isn’t “free” in the financial sense - your people fill part of the role that was previously filled by your vendor support contracts.</p> <p>In my opinion, open source is all about control. You’re trading off a little bit of that vendor comfort in exchange for enhanced control over the part of your infrastructure where you’re leveraging open source. Open source is a tool to use where this additional control gives you a competitive edge, or in some cases, to replace a costly IT system that is <strong>not</strong> giving you that edge with a commodity alternative. In short, <strong>participating in open source isn’t an all-or-nothing proposition</strong> - identify areas where internalizing this control might help you gain an edge, and focus there.</p> <h1 id="if-you-want-something-say-something">If You Want Something, Say Something</h1> <p>Enterprise IT companies have conditioned us to get the vast majority of our technical solutions from behind closed doors. 
We’re usually forced to adjust to the lowest-common-denominator functionality that a particular product or solution provides for an entire set of verticals, and very rarely do we get to significantly influence the direction of a product.</p> <p>However, open source gives us a unique opportunity to take a truly <strong>active</strong> role in the direction of a project. Note that I emphasized the word “active” - this was intentional. Far too often, I’ve encountered technology professionals who, for whatever reason, choose to watch a project from afar rather than proactively engage with it. Don’t do this! Understand your use case, and communicate it proactively.</p> <p>If you “drive by” an open source project on Github - maybe dismissing it because it doesn’t have the nerd knob you think you need - you might be leaving a good solution on the table. Or maybe you don’t think you know enough to jump in - I talk to so many folks who are accustomed to using vendor-provided closed-source solutions exclusively, and who feel that they don’t have the “right” or “cred” to post an issue and explain their request or use case.</p> <p>This couldn’t be further from the truth! The vast majority of maintainers absolutely love helping new users and getting outside perspective on use cases. You have much more direct power to influence an open source project - especially smaller tools or libraries - but it does require active, not passive, participation. So if you want something, say something. Doing the “drive by” cheats you out of a potential solution, and the maintainers out of a new perspective they wouldn’t otherwise have.</p> <p>So, on to the practical side - how do we do this? Well, of course each open source project is different, but for this post we’re going to focus on Github. It’s generally become the “common ground” for the majority of open source projects today. 
So, while you will undoubtedly encounter projects that use other tools, even in addition to Github, focusing on this workflow will serve you well for starters. In Github, a “repo” is a place where a project’s code, docs, scripts, etc. are stored. This repo might be nested underneath a specific user, or under a separate organization.</p> <p>In Github, the best way to provide feedback is to create an Issue. Projects that allow this (most do) will have an “issues” tab right on the Github repo. For instance, <a href="https://github.com/toddproject/todd">the ToDD project</a>:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2017/12/new_issue.png"><img src="https://keepingitclassless.net/assets/2017/12/new_issue.png" width="500" /></a></div> <p>You can peruse the list of existing issues, or use the green “New issue” button to the right. Doing so will open a new form for filling out the title and body of the issue you want to raise with the maintainers:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2017/12/creating_issue.png"><img src="https://keepingitclassless.net/assets/2017/12/creating_issue.png" width="500" /></a></div> <blockquote> <p>Note that Markdown is supported in the text body. Use it extensively, especially single backticks (`) for inline snippets and triple backticks for multi-line log messages or code. Those reading your issue will thank you for it.</p> </blockquote> <p>There are a few things you should do before you open an issue on any project:</p> <ul> <li><strong>Go with the flow</strong> - Get a sense for how the project runs. Many projects will have a <a href="https://github.com/toddproject/todd/blob/master/CONTRIBUTING.md"><code class="highlighter-rouge">CONTRIBUTING.md</code></a> file in the root of their repository which should contain all kinds of useful information for newcomers, including how to contribute code and create issues. 
Consider this the README for contributing - go here first.</li> <li><strong>Do some research</strong> - Do some googling, read the project docs, and search the repo’s existing issues (both open and closed) to see if the issue you’re about to raise has already been addressed. If you’re encountering an issue, there’s a good chance that someone else has too, and the answer you need might be in a previous issue, or in the documentation. This saves you time, since you get the answer without waiting for someone to respond, and it saves a maintainer from burning cycles just sending you back to the docs.</li> <li><strong>Bring data</strong> - Do your due diligence around gathering logs and error context - everything the maintainers might need to track down the root cause of an issue. Note that the <code class="highlighter-rouge">CONTRIBUTING.md</code> file (as well as potentially an <a href="https://github.com/blog/2111-issue-and-pull-request-templates">issue or PR template</a>) will usually enumerate the details they’ll have to ask you for anyway, so it’s good to have this ready going in. That way you can jump right into fixing the problem, rather than going back and forth for a few days just on data gathering.</li> </ul> <p>Here’s what <strong>TO</strong> open an issue for:</p> <ul> <li><strong>Asking for help</strong> - You can use issues to ask for help with specific problems. The docs and previous issues exist for a reason, so don’t open an issue for help unless you have already followed my previous advice and exhausted existing resources. Assuming you’ve done this, your question is a great way for maintainers to identify blind spots in their docs, so be ready to elaborate on what you’re looking for so that they can add to their documentation.</li> <li><strong>Bug reports</strong> - If you suspect a certain behavior is a bug, make sure you capture relevant data, and present it openly. 
It may not be a bug, so be prepared for that.</li> <li><strong>Feature requests</strong> - Focus on adequately describing your use case, rather than jumping to suggesting a solution. Those more familiar with the project will give their perspective on the appropriate solution to match your use case.</li> </ul> <blockquote> <p>The Github UI has a few interesting tools to the right, such as labels and assignments for an issue. In general, stay away from using these - the maintainers will typically have their own triage process, and will assign resources and labels when appropriate.</p> </blockquote> <p>Here’s what <strong>NOT</strong> to open an issue for:</p> <ul> <li><strong>Opinions (negative or otherwise)</strong> - Issues should generally be actionable, and able to be closed via a PR. There are times when issues are an appropriate venue for long-form discussion, but be sure this applies to the project in question before using Issues in this way. Most projects have other communication channels for open-ended discussions, like IRC or Slack, and you should be prepared to participate there as well. Usually such resources are listed in the <code class="highlighter-rouge">CONTRIBUTING.md</code> file or sometimes the <code class="highlighter-rouge">README.md</code> file.</li> </ul> <p>Assuming you’ve followed the previous points, you may get the response you were hoping for. Or, you may get a response you didn’t expect, such as:</p> <ol> <li>“That doesn’t really fit with the project, so the answer is no.”</li> <li>“We like the idea but don’t have cycles to work on this ourselves, so feel free to open a PR.”</li> <li>“You may be going about this the wrong way; here’s another approach you may not have considered.”</li> </ol> <p>You should be ready for any of these. It’s all part of the flow. 
Open source tends to be very much about code and results, not about giving one particular user their way at the expense of the project’s direction - so be ready to have your perspective changed. Make your case based on the data you have, but be prepared to receive new information that might change your view.</p> <p>This is a blessing and a curse - it requires a bit more mental work, but it is all <strong>very</strong> different from traditional vendor-led technology discussions, which most customers aren’t able to participate in, certainly not to this level of depth.</p> <h1 id="contribute-back">Contribute Back</h1> <p>If you follow my advice and staff your team appropriately, this won’t be hard. Simply by operating the software, you’ll inevitably start finding bugs, and even fixing them. Or maybe you’re just trying to get your feet wet - most repos have a backlog of Issues, such as bug reports, which can serve as a great source of inspiration for your first contributions to the project.</p> <p>Easily one of the most valuable technical skills (if not the most valuable) you can have for contributing to open source is understanding <a href="https://git-scm.com/">Git</a>. Git is a distributed version control system used by the biggest open source projects in existence, including the Linux kernel itself. It has become the “lingua franca” of contributing to open source, and as a result there are <a href="https://try.github.io/levels/1/challenges/1">numerous tutorials</a> out there for it. For getting started with open source, you should know the basics: how to work with a repo using operations such as clone, push/pull, add/commit, etc. 
You should also understand what branching does.</p> <blockquote> <p>Shameless plug: we have a whole chapter dedicated to version control (almost totally focused on Git) in the hopefully-soon-to-be-released <a href="http://shop.oreilly.com/product/0636920042082.do">Network Programmability and Automation</a> book.</p> </blockquote> <p>As mentioned before, Github is a popular platform for collaborating on open source software. It’s one of the most popular SaaS platforms for publicly hosting source code, and as the name implies, it’s built around Git. So, in addition to knowing Git fundamentals, we should also understand how to apply these fundamentals within the Github workflow.</p> <p>The general workflow for contributing to a repo is via a “Pull Request” (PR). In effect, this is a way of saying “Hey maintainer - I’ve made this change in my own version of your repository; could you please <strong>pull</strong> it into the main one, so that it’s part of the software going forward?”</p> <p>Each Github repository has a “fork” button near the top. This is just a handy way of making a copy of a given repo that you can make changes to directly. Once you’ve made your changes there, you can open a PR to “sync” the two copies back up.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2017/12/fork.png"><img src="https://keepingitclassless.net/assets/2017/12/fork.png" width="500" /></a></div> <p>A <strong>highly</strong> abbreviated list of steps for this workflow is below:</p> <ol> <li>Fork the repo you intend to contribute to. Github may ask you where you want to make the copy - doing this under your username is fine.</li> <li>Use Git to interact with your fork/copy of the repo. Clone it to your local system, and make the changes. Use the <code class="highlighter-rouge">git add</code> and <code class="highlighter-rouge">git commit</code> commands for this. 
Then use <code class="highlighter-rouge">git push</code> to push those commits to your fork.</li> <li>Github has a really cool feature for detecting when you’ve recently pushed changes to your fork, so if you go to the main repo within a few minutes, it should prompt you to create a PR. If not, you can go to the “Pull Requests” tab to select the target repo/branch and create a PR from there.</li> </ol> <p>Once the PR is opened, you’ll probably get asked some additional questions, and maybe some additional commits will be required. This is normal - just part of the process. After this process, the maintainers may “approve” the PR, and/or merge it into the target branch (e.g. <code class="highlighter-rouge">master</code>).</p> <p>Try to focus on small, frequent PRs, rather than infrequent, huge ones - especially when getting started. You don’t want to spend 3 weeks on a massive change, only to get feedback that it’s not wanted after all that hard work. Seek feedback before doing a ton of work. You also don’t need to be “finished” with your change to open a PR. It’s not uncommon to make a small change to prototype something, and open a PR before you’re sure it’s a valid approach or before you’ve written tests for the change, all for the purpose of gathering feedback before spending more time on it. Usually projects will have a “WIP” label, or you can just say in the PR description that you’re not quite finished. This is usually not only acceptable, but expected and appreciated.</p> <p>Some tips for contributing to a project:</p> <ul> <li><strong>Work from the public project</strong> - Don’t fork off permanently and make all your changes privately, behind your firewall. Your bugfixes or enhancements to an open source project are almost certainly not core to your organization’s value proposition. Don’t hoard these and try to maintain your own fork. Just make everything public. 
There’s no reason to keep most things private, and it will only help to increase your personal value, as you’ll have public contributions to refer to.</li> <li><strong>Start small</strong> - Most project maintainers welcome PRs, but some relationship building will go a long way here. Frequent, small PRs, as opposed to huge, difficult-to-review ones, will help the maintainer learn your skills and style, and gain confidence that you know what you’re doing. They will also help get your contributions merged in a timely fashion.</li> <li><strong>Commit early, and often</strong> - Try to keep changes succinct, and don’t be afraid to push your changes early and seek feedback on them, even if you’re not finished. Most projects appreciate you marking PRs with “WIP” or something like that to indicate this.</li> </ul> <p>Finally, any of the responses that I mentioned in the previous section about Issues are also possible with PRs. Be prepared to defend the changes you’ve made, or change your mind about the approach. Most maintainers are just trying to keep the project moving forward; they have a lot of experience with the project, and will help guide you to a solution that works for everyone. Be flexible. Again, smaller PRs will help prevent ugly situations where you’ve silently worked on a PR for 3 weeks but get “shut down” because it wasn’t needed/wanted. Like most things, it’s all about proactive communication.</p> <h1 id="open-source-is-people">Open Source is People!</h1> <p>There are generally two types of open source projects:</p> <ul> <li>Small, individual-led projects that are created out of passion to solve a particular problem</li> <li>Medium-to-large projects that have corporate backing, usually as a strategic initiative</li> </ul> <p>In both cases, every open source project is powered by people like you and me. 
Even people who are paid to work on a project are usually doing so because they are passionate about the open source community and are driven by a desire to help other technology professionals. Working in open source carries its own set of challenges, so they’re usually not in it to be supreme overlords who cut you down, but are rather interested in fostering a community of diverse perspectives, including yours.</p> <p>Software development, including open source, tends to give some folks a culture shock at first, since it’s so much about code, and about working solutions. There’s no room for hyperbole; it either works or it doesn’t. So if you’re not accustomed to this culture, know that the person on the other side of a seemingly harsh PR review isn’t “out to get you”. Most of the time they’re just being factual. Try to learn what they’re trying to teach you, and remain open to new ways of doing things.</p> <p>So just remember, there are human beings on the other side of the screen, and while it’s sadly true that there are bad apples in all areas of technology, the vast majority just want to build something cool, and work with smart people who give a shit about what they’re doing. By going out of your way to contribute to open source, you’re proving you fit this description, so just focus on jiving with the project and you’ll do fine.</p> <h1 id="conclusion">Conclusion</h1> <p>If I could sum this post up with one bit of advice, it’s this: stop sitting on the sidelines, and jump in. Regardless of your background, and regardless of your type of contribution, adding open source to your resume is a huge deal these days. You don’t have to pay to participate; in most cases you don’t even have to know how to write code - many projects will gladly accept docs improvements and the like. There’s really no excuse for not getting started.</p> <p>I hope you all have a Merry Christmas, and a great holiday season overall. 
Spend time with your families, and when there’s a little downtime (maybe when your family is napping from all the delicious food), consider poking around Github and getting involved with a project.</p> Wed, 20 Dec 2017 00:00:00 +0000 https://keepingitclassless.net/2017/12/a-guide-open-source-it-practitioners/ https://keepingitclassless.net/2017/12/a-guide-open-source-it-practitioners/ StackStorm Architecture Part I - StackStorm Core Services <p>A while ago, I wrote about <a href="https://keepingitclassless.net/2016/12/introduction-to-stackstorm/">basic concepts in StackStorm</a>. Since then, I’ve been knee-deep in the code, fixing bugs and creating new features, and I’ve learned a lot about how StackStorm is put together.</p> <p>In this series, I’d like to spend some time exploring the StackStorm architecture. What subcomponents make up StackStorm? How do they interact? How can we scale StackStorm? These are all questions that come up from time to time in the StackStorm community, and there are a lot of little details that even I forget. I’ll be doing this in a series of posts, so we can explore a particular topic in detail without getting overwhelmed.</p> <p>Also, it’s worth noting that this isn’t intended to be an exhaustive reference for StackStorm’s architecture. The best place for that is still the <a href="https://docs.stackstorm.com/">StackStorm documentation</a>. My goal in this series is merely to give a little bit of additional insight into StackStorm’s inner workings, and hopefully get those curiosity juices flowing. Expect some code references, some systems-level insight, and probably both together.</p> <blockquote> <p>Also note that this is a <em>living document</em>. This is an open source project under active development, and while I will try to keep specific references to a minimum, it’s possible that some of the information below will become outdated. 
Feel free to comment and let me know, and I’ll update things as necessary.</p> </blockquote> <p>Here are some useful links to follow along with - this post mainly builds on their content and elaborates on it:</p> <ul> <li><a href="https://docs.stackstorm.com/install/overview.html">High-Level Overview</a></li> <li><a href="https://docs.stackstorm.com/reference/ha.html">StackStorm High-Availability Deployment Guide</a></li> <li><a href="https://docs.stackstorm.com/development/code_structure.html">Code Structure for Various Components in “st2” repo</a></li> </ul> <h2 id="stackstorm-high-level-architecture">StackStorm High-Level Architecture</h2> <p>Before diving into the individual StackStorm services, it’s important to start at the top: what does StackStorm look like when you first lift the hood?</p> <p>The best place to start is the <a href="https://docs.stackstorm.com/overview.html">StackStorm Overview</a>, which introduces StackStorm concepts and walks through, at a very high level, how the components interact. In addition, the <a href="https://docs.stackstorm.com/reference/ha.html">High-Availability Deployment Guide</a> (which you should absolutely read if you’re serious about deploying StackStorm) contains a much more detailed diagram, showing the actual, individual processes that make up a running StackStorm instance:</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2017/04/services.png"><img src="https://keepingitclassless.net/assets/2017/04/services.png" width="500" /></a></div> <blockquote> <p>It would be a good idea to keep this diagram open in another tab while you read on, to understand where each service fits in the cohesive whole that is StackStorm.</p> </blockquote> <p>As you can see, there’s not really a “StackStorm server”. StackStorm is actually composed of multiple microservices, each of which has a very specific job to do. 
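</p> <p>As a rough mental model only (this is not StackStorm’s actual code), the sketch below mimics two single-purpose services that never call each other directly and only exchange messages through a queue. Python’s in-process <code class="highlighter-rouge">queue.Queue</code> and a thread stand in for RabbitMQ and a separate service process, and the function names and action refs (<code class="highlighter-rouge">demo.say_hello</code>, <code class="highlighter-rouge">demo.say_bye</code>) are hypothetical:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import queue
import threading

# Hypothetical stand-ins: queue.Queue plays the role of a RabbitMQ queue,
# and a thread plays the role of a separate service process.
execution_requests = queue.Queue()
execution_results = queue.Queue()


def request_service(action_refs):
    """Single job: publish one execution request per action ref."""
    for ref in action_refs:
        execution_requests.put({"action": ref, "status": "requested"})
    execution_requests.put(None)  # sentinel telling the consumer to stop


def runner_service():
    """Single job: consume requests and 'run' them (here, just mark them done)."""
    while True:
        message = execution_requests.get()
        if message is None:
            break
        message["status"] = "succeeded"
        execution_results.put(message)


consumer = threading.Thread(target=runner_service)
consumer.start()
request_service(["demo.say_hello", "demo.say_bye"])
consumer.join()

completed = [execution_results.get() for _ in range(execution_results.qsize())]
print([m["action"] for m in completed])  # ['demo.say_hello', 'demo.say_bye']
</code></pre></div></div> <p>Because neither function knows the other exists, either side could be restarted or replaced without touching the other, which is roughly the kind of decoupling a message bus buys a set of services.</p> <p>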
Many of these services communicate with each other over RabbitMQ, for instance to let each other know when they need to perform some task. Some services also write to a database of some kind for persistence or auditing purposes. The specifics of how each service uses these will become clearer as we explore each service in detail.</p> <h2 id="stackstorm-services">StackStorm Services</h2> <p>Now, we’ll dive into each service individually. Note that each service runs as its own separate process, and nearly all of them can run as multiple copies, either on the same machine or across multiple machines. Refer to the <a href="https://docs.stackstorm.com/reference/ha.html">StackStorm High-Availability Deployment Guide</a> for more details on this.</p> <p>Again, the purpose of this post is to explore each service individually to better understand it, but remember that they must all work together to make StackStorm work. It may be useful to keep the diagram(s) above open in a separate tab, to keep the big picture in mind.</p> <p>We’ll be looking at things from a systems perspective, as well as at a bit of the code where it makes sense. My primary motivation for this post is to document the “gist” of how each service is implemented, to give you a head start on understanding them, whether you want to know how they work or to contribute to them. 
Selfishly, I’d love it if such a reference existed for my own benefit, so I’m writing it.</p> <h3 id="st2actionrunner">st2actionrunner</h3> <p>We start off by looking at <a href="https://docs.stackstorm.com/reference/ha.html#st2actionrunner"><code class="highlighter-rouge">st2actionrunner</code></a> because, like the Actions that run inside it, it’s probably the most relatable component for those who have automation experience but are new to StackStorm or event-driven automation in general.</p> <p><code class="highlighter-rouge">st2actionrunner</code> is responsible for receiving requests for executions (an execution is an instance of a running action), then scheduling and carrying out those executions. If you dig into the <code class="highlighter-rouge">st2actionrunner</code> code a bit, you can see that it’s powered by two subcomponents: a <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/scheduler.py">scheduler</a>, and a <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/worker.py">dispatcher</a>. The scheduler receives requests for new executions off the message queue, and works out the details of when and how each action should be run. For instance, there might be a policy in place that prevents the action from running until a few other executions finish up. Once an execution is scheduled, it is passed to the dispatcher, which actually runs the action with the provided parameters and retrieves the resulting output.</p> <blockquote> <p>You may have also heard the term “runners” in reference to StackStorm actions. In short, you can think of these as something like “base classes” for Actions. For instance, I might have an action that executes a Python script; this action will use the <code class="highlighter-rouge">run-python</code> runner, because that runner contains all of the repetitive infrastructure needed by all Python-based Actions. 
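</p> </blockquote> <p>To make the “base class” analogy concrete, here is a hypothetical sketch; the class names are invented for illustration and are not StackStorm’s runner API. The base class owns the repetitive plumbing (argument checks, error handling, wrapping the result), and each Action only supplies its own <code class="highlighter-rouge">run</code> method:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import abc


class ExampleRunner(abc.ABC):
    """Hypothetical runner: shared plumbing for every Action of one kind."""

    def execute(self, action_ref, parameters):
        # Repetitive infrastructure lives here, written once for all Actions.
        if not isinstance(parameters, dict):
            raise TypeError("parameters must be a dict")
        try:
            output, status = self.run(**parameters), "succeeded"
        except Exception as exc:  # report the failure instead of crashing
            output, status = str(exc), "failed"
        return {"action": action_ref, "status": status, "result": output}

    @abc.abstractmethod
    def run(self, **parameters):
        """Each concrete Action implements only its own logic."""


class ShoutAction(ExampleRunner):
    """Hypothetical Action built on top of the runner."""

    def run(self, message=""):
        return message.upper()


print(ShoutAction().execute("demo.shout", {"message": "hi"}))
# {'action': 'demo.shout', 'status': 'succeeded', 'result': 'HI'}
</code></pre></div></div> <blockquote> <p>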
Please do not confuse this term with the <code class="highlighter-rouge">st2actionrunner</code> service; <code class="highlighter-rouge">st2actionrunner</code> is the process that runs all Actions, while a “runner” is a Python base class that provides a common foundation for an Action to use. In fact, <code class="highlighter-rouge">st2actionrunner</code> is <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/container/base.py">responsible for handing off execution details to the runner</a>, whether it’s a Python runner, a shell script runner, etc.</p> </blockquote> <p>As shown in the component diagram, <code class="highlighter-rouge">st2actionrunner</code> communicates with both RabbitMQ and the database (which, at this time, is MongoDB). RabbitMQ is used to deliver incoming execution requests to the scheduler, and to let the scheduler forward scheduled executions to the dispatcher. Both of these subcomponents update the database with execution history and status.</p> <h3 id="st2sensorcontainer">st2sensorcontainer</h3> <p>The job of the <code class="highlighter-rouge">st2sensorcontainer</code> service is to execute and manage the Sensors that have been installed and enabled within StackStorm. The name of the game here is simply to provide the underlying infrastructure for running these Sensors, since much of the logic for how a Sensor works lives in the Sensor’s own code. That logic includes dispatching Trigger Instances when a meaningful event has occurred. <code class="highlighter-rouge">st2sensorcontainer</code> just maintains awareness of which Sensors are installed and enabled, and does its best to keep them running.</p> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/manager.py">sensor manager</a> is responsible for kicking off all the logic of managing various sensors within <code class="highlighter-rouge">st2sensorcontainer</code>. 
To do this, it leverages two subcomponents:</p> <ul> <li><a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/process_container.py">process container</a>: Manages the processes actually executing Sensor code</li> <li><a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/sensor_watcher.py">sensor watcher</a>: Watches for Sensor Create/Update/Delete events</li> </ul> <h4 id="sensors---process-container">Sensors - Process Container</h4> <p>The process container is responsible for running and managing the processes that execute Sensor code. If you look at the <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/process_container.py">process container</a> code, you’ll see that <code class="highlighter-rouge">_spawn_sensor_process</code> actually kicks off a <code class="highlighter-rouge">subprocess.Popen</code> call to execute a <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/sensor_wrapper.py">“wrapper” script</a>:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ st2 sensor list
+-----------------------+-------+-------------------------------------------+---------+
| ref                   | pack  | description                               | enabled |
+-----------------------+-------+-------------------------------------------+---------+
| linux.FileWatchSensor | linux | Sensor which monitors files for new lines | True    |
+-----------------------+-------+-------------------------------------------+---------+

~$ ps --sort -rss -eo command | grep sensor_wrapper
/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/local/lib/python2.7/site-packages/st2reactor/container/sensor_wrapper.py --pack=linux --file-path=/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py --class-name=FileWatchSensor --trigger-type-refs=linux.file_watch.line --parent-args=["--config-file", "/etc/st2/st2.conf"]
</code></pre></div></div> <p>This means that each individual 
sensor runs as its own separate process. The wrapper script makes this possible, and it also provides a lot of the “behind the scenes” work that Sensors rely on, such as dispatching trigger instances or retrieving pack configuration information. So, the process container’s job is to spawn instances of this wrapper script, with the arguments needed to run a specific Sensor’s code from its pack.</p> <h4 id="sensors---watcher">Sensors - Watcher</h4> <p>We also mentioned a second subcomponent of <code class="highlighter-rouge">st2sensorcontainer</code>: the “sensor watcher”. This subcomponent watches for Sensors being added to, changed in, or removed from StackStorm, and updates the process container accordingly. For instance, if we install the <a href="https://github.com/StackStorm-Exchange/stackstorm-slack"><code class="highlighter-rouge">slack</code></a> pack, the <a href="https://github.com/StackStorm-Exchange/stackstorm-slack/blob/master/sensors/slack_sensor.yaml"><code class="highlighter-rouge">SlackSensor</code></a> will need to start running automatically, since it’s enabled by default.</p> <p>The sensor watcher subscribes to the message queue and listens for incoming messages that indicate such a change has taken place. In the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/sensor_watcher.py">watcher code</a>, a handler function is referenced for each event (create/update/delete). So, the watcher listens for incoming messages, and calls the relevant function based on the message type. By the way, those functions are defined back in the <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/manager.py">sensor manager</a>, which has the access it needs to instruct the process container to make the relevant changes.</p> <p>That explains how create/update/delete (CUD) events are handled, but where do these events originate? 
When we install the <code class="highlighter-rouge">slack</code> pack, or run the <code class="highlighter-rouge">st2ctl reload</code> command, some <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/bootstrap/sensorsregistrar.py">bootstrapping code</a> is executed. This code is responsible for updating the database, as well as publishing messages to the message queue that the sensor watcher is subscribed to.</p> <h3 id="st2rulesengine">st2rulesengine</h3> <p>While <code class="highlighter-rouge">st2rulesengine</code> might be considered one of the simpler services in StackStorm, its job is the most crucial. It is here that the entire premise of event-driven automation is made manifest.</p> <p>For an engaging primer on rules engines in general, I’d advise listening to <a href="http://www.se-radio.net/2017/08/se-radio-episode-299-edson-tirelli-on-rules-engines/">Software Engineering Radio Episode 299</a>. I had already been working with StackStorm for a while when I first listened to it, so I was generally familiar with the concept, but it was nice to get a generic perspective that explored some of the theory behind rules engines.</p> <p>Remember my earlier post on <a href="https://keepingitclassless.net/2016/12/introduction-to-stackstorm/">StackStorm concepts</a>? In it, I briefly touched on Triggers - these are definitions of an “event” that may be actionable. For instance, when someone posts a tweet that matches a search we’ve configured, the Twitter sensor may use the <code class="highlighter-rouge">twitter.matched_tweet</code> trigger to notify us of that event. A specific instance of that trigger being raised is known, creatively, as a “trigger instance”.</p> <p>In short, StackStorm’s rules engine looks for incoming trigger instances and decides whether an Action needs to be executed. 
It makes this decision based on the rules that are currently installed and enabled, from the various packs present in the database.</p> <p>As is common with most other StackStorm services, the logic of this service is contained within a <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/worker.py">“worker”</a>, which uses a handy Python base class that centralizes the receipt of messages from the message queue, allowing the rules engine to focus on just dealing with incoming trigger instances.</p> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/engine.py">engine itself</a> is actually quite straightforward:</p> <ol> <li>Receive a trigger instance from the message queue</li> <li>Determine which rule(s) match the incoming trigger instance</li> <li>Enforce the consequences from the rule definition (usually, executing an Action)</li> </ol> <blockquote> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/matcher.py">rules matcher</a> and <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/enforcer.py">enforcer</a> are useful bits of code for understanding how these tasks are performed in StackStorm. 
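</p> </blockquote> <p>The receive, match, enforce loop described above can be sketched in a few lines. This is a simplified, hypothetical model rather than StackStorm’s matcher or enforcer: rule criteria here are plain substring checks (real StackStorm criteria support a range of operators), the rule names and action refs (<code class="highlighter-rouge">demo.notify</code>, <code class="highlighter-rouge">demo.archive_tweet</code>) are invented, and <code class="highlighter-rouge">queue.Queue</code> stands in for RabbitMQ:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import queue

# Hypothetical rules: the trigger each one watches, simple substring criteria
# on the trigger instance payload, and the Action to request on a match.
RULES = [
    {"name": "notify_on_error", "trigger": "linux.file_watch.line",
     "criteria": {"line": "ERROR"}, "action": "demo.notify"},
    {"name": "archive_tweets", "trigger": "twitter.matched_tweet",
     "criteria": {}, "action": "demo.archive_tweet"},
]


def matches(rule, instance):
    """Step 2: does this rule apply to this trigger instance?"""
    if rule["trigger"] != instance["trigger"]:
        return False
    payload = instance["payload"]
    return all(expected in str(payload.get(field, ""))
               for field, expected in rule["criteria"].items())


def enforce(rule, execution_requests):
    """Step 3: publish a request to execute the rule's Action."""
    execution_requests.put({"rule": rule["name"], "action": rule["action"]})


def process(trigger_instances, execution_requests):
    """Step 1: consume each trigger instance waiting on the queue."""
    while not trigger_instances.empty():
        instance = trigger_instances.get()
        for rule in RULES:
            if matches(rule, instance):
                enforce(rule, execution_requests)


triggers, requests = queue.Queue(), queue.Queue()
triggers.put({"trigger": "linux.file_watch.line", "payload": {"line": "ERROR: disk full"}})
triggers.put({"trigger": "linux.file_watch.line", "payload": {"line": "all good"}})
process(triggers, requests)
print(requests.get())    # {'rule': 'notify_on_error', 'action': 'demo.notify'}
print(requests.empty())  # True
</code></pre></div></div> <blockquote> <p>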
Again, while the work of the rules engine in StackStorm is crucial, the code involved is fairly easy to understand.</p> </blockquote> <p>Finally, StackStorm offers some built-in triggers that allow you to trigger an Action based on the passage of time:</p> <ul> <li><code class="highlighter-rouge">core.st2.IntervalTimer</code> - trigger repeatedly at a set interval of time</li> <li><code class="highlighter-rouge">core.st2.DateTimer</code> - trigger on a certain date/time</li> <li><code class="highlighter-rouge">core.st2.CronTimer</code> - trigger whenever the current time matches the specified time constraints</li> </ul> <p>On startup, <code class="highlighter-rouge">st2rulesengine</code> spins off a thread running <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/timer/base.py">a bit of code</a> dedicated to firing these triggers at the appropriate time.</p> <p><code class="highlighter-rouge">st2rulesengine</code> needs access to RabbitMQ to receive trigger instances and send requests to execute Actions. It also needs access to MongoDB to retrieve the rules that are currently installed.</p> <h3 id="st2api">st2api</h3> <p>If you’ve worked with StackStorm at all (and since you’re still reading, I’ll assume you have), you know that StackStorm has an API. External components, such as the CLI client, the Web UI, and third-party systems, all use this API to interact with StackStorm.</p> <p>An interesting and roughly accurate way of viewing <code class="highlighter-rouge">st2api</code> is that it “translates” incoming API calls into RabbitMQ messages and database interactions. That’s because incoming API requests are usually aimed at either retrieving data, pushing new data, or executing some kind of action with StackStorm. All of these things are done by other running processes; for instance, <code class="highlighter-rouge">st2actionrunner</code> is responsible for actually executing an action, and it receives those requests over RabbitMQ. 
So, <code class="highlighter-rouge">st2api</code> must first receive such instructions via its API, and then forward the request along via RabbitMQ. Let’s discuss how that actually works.</p> <blockquote> <p>The 2.3 release changed a lot of the underlying infrastructure for the StackStorm API. The API itself isn’t changing (still at v1) for this release, but the way that the API is described within <code class="highlighter-rouge">st2api</code>, and how incoming requests are routed to function calls, have changed a bit. Everything we’ll discuss in this section reflects these changes. Please review <a href="https://github.com/StackStorm/st2/issues/2686">this issue</a> and <a href="https://github.com/StackStorm/st2/pull/2727">this PR</a> for a bit of insight into the history of this change.</p> </blockquote> <p>How the API itself actually works deserves its own blog post for a proper exploration. For now, suffice it to say that StackStorm’s API is defined with the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">OpenAPI specification</a>. Using this definition, each endpoint is linked to an API controller function that actually provides the implementation for this endpoint. These functions may write to a database, send a message over the message queue, or do both. Whatever’s needed in order to implement the functionality offered by that API endpoint is performed within that function.</p> <p>For the purposes of this post, however, let’s talk briefly about how this API is actually served from a systems perspective. Regardless of how the API is implemented, it will have to be served by some kind of HTTP server.</p> <blockquote> <p>Note that in a production-quality deployment of StackStorm, the API is front-ended by nginx. We’ll be talking about the nginx configuration in another post, so we won’t discuss it here. 
But it’s important to keep this in mind.</p> </blockquote> <p>We can use this handy command, filtered through <code class="highlighter-rouge">grep</code>, to see exactly what command was used to instantiate the <code class="highlighter-rouge">st2api</code> process:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ ps --sort -rss -eo command | head | grep st2api
/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/bin/gunicorn st2api.wsgi:application -k eventlet -b 127.0.0.1:9101 --workers 1 --threads 1 --graceful-timeout 10 --timeout 30
</code></pre></div></div> <p>As you can see, it’s running on Python, like most StackStorm components. Note that this is the Python interpreter from the StackStorm virtualenv, so anything run with this Python binary will already have all of its PyPI dependencies satisfied - these are installed with the rest of StackStorm.</p> <p>The second argument - <code class="highlighter-rouge">/opt/stackstorm/st2/bin/gunicorn</code> - shows that <a href="http://gunicorn.org/">Gunicorn</a> is running the API application. Gunicorn is a WSGI HTTP server. It’s used to serve StackStorm’s API as well as a few other components we’ll explore later. You’ll notice that for <code class="highlighter-rouge">st2api</code>, the third positional argument is <a href="https://github.com/StackStorm/st2/blob/master/st2api/st2api/wsgi.py">actually a reference to a Python variable</a> (remember that this is running from StackStorm’s Python virtualenv, where the <code class="highlighter-rouge">st2api</code> package is importable, so this works). 
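</p> <p>For readers less familiar with WSGI, here is a minimal, hypothetical module (the file name <code class="highlighter-rouge">demo_wsgi.py</code>, the <code class="highlighter-rouge">/v1/ping</code> route, and <code class="highlighter-rouge">make_app</code> are all invented; this is not StackStorm’s code) showing the same general pattern: a setup function builds a WSGI callable, and a module-level variable holds it so that Gunicorn can find it through a <code class="highlighter-rouge">MODULE:VARIABLE</code> argument:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># demo_wsgi.py (hypothetical module name)
import json


def make_app():
    """Hypothetical setup function: build and return a WSGI callable."""
    routes = {"/v1/ping": {"ok": True}}

    def app(environ, start_response):
        body = routes.get(environ.get("PATH_INFO", ""))
        status = "200 OK" if body is not None else "404 Not Found"
        payload = json.dumps(body if body is not None else {"error": "not found"}).encode()
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    return app


# Gunicorn resolves "demo_wsgi:application" to this module-level variable.
application = make_app()

# Quick in-process check without a server: call the WSGI callable directly.
seen = []
body = b"".join(application({"PATH_INFO": "/v1/ping"}, lambda status, headers: seen.append(status)))
print(seen[0], body)  # 200 OK b'{"ok": true}'
</code></pre></div></div> <p>Saved as <code class="highlighter-rouge">demo_wsgi.py</code>, it could be served with something like <code class="highlighter-rouge">gunicorn demo_wsgi:application -b 127.0.0.1:8000</code>, mirroring the <code class="highlighter-rouge">st2api.wsgi:application</code> argument in the <code class="highlighter-rouge">ps</code> output above.</p> <p>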
Looking at <a href="https://github.com/StackStorm/st2/blob/master/st2api/st2api/wsgi.py">the code</a>, we can see that the <code class="highlighter-rouge">application</code> variable is the result of a call to the setup task for the <a href="https://github.com/StackStorm/st2/blob/master/st2api/st2api/app.py">primary API application</a>, which is where the aforementioned OpenAPI spec is loaded and rendered into actionable HTTP endpoints.</p> <p>You may also be wondering how <code class="highlighter-rouge">st2api</code> serves <a href="https://docs.stackstorm.com/webhooks.html">webhooks</a>. There’s an endpoint for webhooks at <code class="highlighter-rouge">/webhooks</code>, of course, but how does <code class="highlighter-rouge">st2api</code> know that a rule has registered a new webhook? This is actually not that different from what we saw earlier with Sensors, when the sensor container is made aware of a new sensor being registered. In this case, <code class="highlighter-rouge">st2api</code> leverages a <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/triggerwatcher.py">TriggerWatcher</a> class, which is made aware of new triggers being referenced from rules and calls the appropriate event handler functions in the <code class="highlighter-rouge">st2api</code> controller.
Those functions add or remove webhook entries from the <code class="highlighter-rouge">HooksHolder</code> instance, so whenever a new request comes in to the <code class="highlighter-rouge">/webhooks</code> endpoint, <code class="highlighter-rouge">st2api</code> knows to check this <code class="highlighter-rouge">HooksHolder</code> for the appropriate trigger to dispatch.</p> <h3 id="st2auth">st2auth</h3> <p>Take a look at StackStorm’s <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">API definition</a> and search for “st2auth”; you’ll see that the authentication endpoints are defined alongside the rest of the API.</p> <p><code class="highlighter-rouge">st2auth</code> is executed in almost exactly the same way as <code class="highlighter-rouge">st2api</code>. Gunicorn is the HTTP WSGI server, executed within the Python virtualenv in StackStorm:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ ps --sort -rss -eo command | head | grep st2auth
/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/bin/gunicorn st2auth.wsgi:application -k eventlet -b 127.0.0.1:9100 --workers 1 --threads 1 --graceful-timeout 10 --timeout 30
</code></pre></div></div> <p><code class="highlighter-rouge">st2auth</code> defines <a href="https://github.com/StackStorm/st2/blob/master/st2auth/st2auth/app.py">its own WSGI application</a> to run under Gunicorn.</p> <blockquote> <p>If you’re like me, you might have looked at the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">OpenAPI definition</a> and noticed that <code class="highlighter-rouge">st2auth</code>’s endpoints are mixed in with the regular API endpoints. At the time of this writing, the two are kept separate, when either component loads the spec, by none other than…regular expressions!
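</p>
<p>To make the idea concrete, here is a rough, hypothetical sketch of splitting one combined spec between two applications by matching endpoint paths against a regular expression. The paths, controller names, and the <code class="highlighter-rouge">select_paths</code> helper are invented for illustration; the real filtering happens inside StackStorm’s router and is driven by the transformations described next.</p>

```python
import re

# Hypothetical subset of a combined OpenAPI "paths" section
SPEC_PATHS = {
    "/api/v1/actions": {"get": "actions_controller.get_all"},
    "/api/v1/executions": {"post": "executions_controller.post"},
    "/auth/v1/tokens": {"post": "token_controller.post"},
    "/auth/v1/tokens/validate": {"post": "token_validation_controller.post"},
}


def select_paths(paths, pattern):
    """Return only the endpoints whose path matches the given regex."""
    regex = re.compile(pattern)
    return {path: ops for path, ops in paths.items() if regex.match(path)}


# Each application registers only the slice of the spec it is responsible for
api_paths = select_paths(SPEC_PATHS, r"^/api/")
auth_paths = select_paths(SPEC_PATHS, r"^/auth/")
```

<p>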
If you look at <a href="https://github.com/StackStorm/st2/blob/master/st2auth/st2auth/app.py"><code class="highlighter-rouge">st2auth</code>’s app definition</a>, you’ll notice a few transformations are passed to the <code class="highlighter-rouge">router.add_spec</code> function. Among other things, these are used within <code class="highlighter-rouge">add_spec</code> to determine which endpoints to associate with this application.</p> </blockquote> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2auth/st2auth/controllers/v1/auth.py">API controller</a> for <code class="highlighter-rouge">st2auth</code> is relatively simple, and provides implementations for two endpoints:</p> <ol> <li>Token Validation</li> <li>Authentication and Token Allocation</li> </ol> <p>As you can see, <code class="highlighter-rouge">st2auth</code> is fairly simple. We already learned the basics of how WSGI applications are run with Gunicorn in StackStorm when we explored <code class="highlighter-rouge">st2api</code>, and <code class="highlighter-rouge">st2auth</code> is quite similar: just with different endpoints and back-end implementations.</p> <h3 id="st2resultstracker">st2resultstracker</h3> <p>Due to the available options for running <a href="https://docs.stackstorm.com/workflows.html">Workflows</a> in StackStorm, sometimes workflow executions happen outside the scope of StackStorm’s domain. For instance, to run Mistral workflows, StackStorm must interact exclusively through Mistral’s API. As a result, after the workflow is executed, StackStorm needs to continue to poll this API for the results of that workflow, in order to update the local StackStorm copy of those executions in the database.
Interestingly, the <a href="https://docs.stackstorm.com/troubleshooting/mistral.html#troubleshooting-mistral-workflow-completion-latency">Mistral troubleshooting doc</a> contains some useful information about this process.</p> <blockquote> <p>A better architectural approach would be to implement callbacks in workflow engines like Mistral that push result updates to subscribers, rather than have StackStorm periodically poll the API. There are a number of <a href="https://review.openstack.org/#/c/455083/">existing proposals</a> for doing this, and hopefully in the next few release cycles, this will be implemented, making <code class="highlighter-rouge">st2resultstracker</code> unnecessary.</p> </blockquote> <p>The end-goal here is to provide the results of a Workflow execution in StackStorm, rather than forcing users to go somewhere else for that information.</p> <p><code class="highlighter-rouge">st2resultstracker</code> runs as its own standalone process. When a workflow is executed, it consumes a message from a special queue (note the <code class="highlighter-rouge">get_tracker</code> function in <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/resultstracker/resultstracker.py">resultstracker.py</a>). That message follows a <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/models/db/executionstate.py">database model</a> focused on tracking execution state, and contains the parameter <code class="highlighter-rouge">query_module</code>. If the execution is a Mistral workflow, this will be set to <code class="highlighter-rouge">mistral_v2</code>, which causes <code class="highlighter-rouge">st2resultstracker</code> to load the <a href="https://github.com/StackStorm/st2/blob/master/contrib/runners/mistral_v2/query/mistral_v2.py">mistral-specific querier</a>. That querier contains all of the code necessary for interacting with Mistral to receive results information. 
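</p>
<p>The polling side of this can be sketched roughly as follows. This is an illustration under stated assumptions, not StackStorm’s code: the real tracker loads a Python module named by <code class="highlighter-rouge">query_module</code>, while this standalone version looks the querier up in a plain dict keyed by the same name. <code class="highlighter-rouge">ExecutionState</code>, <code class="highlighter-rouge">FakeWorkflowQuerier</code>, and <code class="highlighter-rouge">track</code> are hypothetical names.</p>

```python
import time
from dataclasses import dataclass


@dataclass
class ExecutionState:
    """Hypothetical stand-in for the execution-state message read from the queue."""
    execution_id: str
    query_module: str


class FakeWorkflowQuerier:
    """Hypothetical querier; a real one would call the workflow engine's REST API."""

    def __init__(self, statuses):
        # Canned statuses returned by successive polls, for illustration only
        self._statuses = iter(statuses)

    def query(self, execution_id):
        return next(self._statuses)


TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def track(state, queriers, max_polls=10, interval=0.0):
    """Poll the querier selected by state.query_module until the run finishes."""
    querier = queriers[state.query_module]  # raises KeyError for an unknown module
    status = "running"
    for _ in range(max_polls):
        status = querier.query(state.execution_id)
        if status in TERMINAL_STATUSES:
            break  # a real tracker would now write the result to its database
        time.sleep(interval)
    return status
```

<p>Polling like this is exactly the overhead that a push-based callback from the workflow engine would remove.</p>
<p>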
<code class="highlighter-rouge">st2resultstracker</code> uses this module to query Mistral and place the results in the StackStorm database.</p> <h3 id="st2notifier">st2notifier</h3> <p>The primary role of <code class="highlighter-rouge">st2notifier</code> is to provide an integration point for <a href="https://docs.stackstorm.com/chatops/notifications.html">notifying</a> external systems that an action has completed. <a href="https://docs.stackstorm.com/chatops/chatops.html">Chatops</a> is a big use case for this, but there are others.</p> <p>At the time of this writing, <code class="highlighter-rouge">st2notifier</code> serves two main purposes:</p> <ul> <li>Generate <code class="highlighter-rouge">st2.core.actiontrigger</code> and <code class="highlighter-rouge">st2.core.notifytrigger</code> triggers based on the completion and runtime parameters of an Action execution.</li> <li>Act as a backup scheduler for actions that may not have been scheduled - i.e., delayed by policy.</li> </ul> <p><code class="highlighter-rouge">st2notifier</code> dispatches two types of triggers. The first, <code class="highlighter-rouge">st2.core.actiontrigger</code>, is fired for each completed execution. This is enabled by default, so you can hit the ground running by writing a rule to consume this trigger and notify external systems like Slack or JIRA when an action is completed. The second trigger, <code class="highlighter-rouge">st2.core.notifytrigger</code>, is more action-specific. As mentioned in the <a href="https://docs.stackstorm.com/chatops/notifications.html">Notification</a> documentation, you can add a <code class="highlighter-rouge">notify</code> section to your Action metadata. If this section is present, <code class="highlighter-rouge">st2notifier</code> will also dispatch a <code class="highlighter-rouge">notifytrigger</code> for each route specified in the <code class="highlighter-rouge">notify</code> section.
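</p>
<p>Here is a hypothetical sketch of that fan-out: one generic action trigger per completed execution, plus one notify trigger per route listed in the action’s <code class="highlighter-rouge">notify</code> section. The two trigger names come from the text above; the payload fields, the <code class="highlighter-rouge">build_triggers</code> helper, and the assumption that <code class="highlighter-rouge">on-complete</code> always applies alongside an outcome-specific section are simplifications, not StackStorm’s exact schema.</p>

```python
def build_triggers(execution, notify=None, actiontrigger_enabled=True):
    """Return (trigger_name, payload) pairs for one completed execution."""
    triggers = []
    if actiontrigger_enabled:
        # Generic trigger, fired for every completed execution
        triggers.append(("st2.core.actiontrigger", {
            "execution_id": execution["id"],
            "status": execution["status"],
        }))
    # Assumption: "on-complete" always applies, plus the outcome-specific section
    outcome = "on-success" if execution["status"] == "succeeded" else "on-failure"
    for key in ("on-complete", outcome):
        section = (notify or {}).get(key)
        if not section:
            continue
        # One notify trigger per route listed in the matching section
        for route in section.get("routes", []):
            triggers.append(("st2.core.notifytrigger", {
                "execution_id": execution["id"],
                "route": route,
                "message": section.get("message", ""),
            }))
    return triggers
```

<p>A rule can then match on the trigger type and, for notify triggers, on the <code class="highlighter-rouge">route</code> field to decide where to publish.</p>
<p>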
You can consume these triggers with rules and publish according to the routing information inside that section.</p> <p>If you look at the <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/notifier/notifier.py">notifier implementation</a>, you can see the familiar message queue subscription logic at the bottom (see <code class="highlighter-rouge">get_notifier</code> function). <code class="highlighter-rouge">st2notifier</code> receives messages from the queue so that the <code class="highlighter-rouge">process</code> function is kicked off when action executions complete. From there, the logic is straightforward; the <code class="highlighter-rouge">actiontrigger</code> fires for each action (provided the config option is still enabled), and <code class="highlighter-rouge">notifytrigger</code> is fired based on the <code class="highlighter-rouge">notify</code> field in the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/models/db/liveaction.py">LiveActionDB</a> sent over the message queue.</p> <p><code class="highlighter-rouge">st2notifier</code> also acts as a <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/notifier/scheduler.py">rescheduler</a> for Actions that have been delayed, for instance, because of a <a href="https://docs.stackstorm.com/reference/policies.html#concurrency">concurrency policy</a>. Based on the configuration, <code class="highlighter-rouge">st2notifier</code> can attempt to reschedule executions that have been delayed past a certain time threshold.</p> <h3 id="st2garbagecollector">st2garbagecollector</h3> <p><code class="highlighter-rouge">st2garbagecollector</code> is a relatively simple service aimed at providing garbage collection services for things like action executions and trigger-instances. 
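</p>
<p>Conceptually, TTL-based collection is simple: each pass deletes records that finished before a cutoff, and a loop repeats the pass on a configurable interval. The sketch below illustrates that idea against an in-memory list. The record shape, the <code class="highlighter-rouge">end_timestamp</code> field, and the function names are assumptions; StackStorm itself works against its database inside an eventlet green thread rather than a plain <code class="highlighter-rouge">time.sleep</code> loop.</p>

```python
import time
from datetime import datetime, timedelta, timezone


def collect_once(records, ttl, now=None):
    """Keep only records that finished within the last `ttl` (a timedelta)."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - ttl
    return [r for r in records if r["end_timestamp"] >= cutoff]


def run_collector(store, ttl, interval_seconds, passes):
    """Run collection repeatedly; a long-running service would loop forever."""
    for _ in range(passes):
        store[:] = collect_once(store, ttl)  # replace the list contents in place
        time.sleep(interval_seconds)
```

<p>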
For some high-activity deployments of StackStorm, it may be useful to delete executions after a certain amount of time, rather than keep them around forever, eating up system resources.</p> <blockquote> <p>Note that this is “garbage collection” in the StackStorm sense, not at the language level (Python).</p> </blockquote> <p>Garbage collection is optional, and not enabled by default. You can enable this in the <code class="highlighter-rouge">garbagecollector</code> section of the <a href="https://github.com/StackStorm/st2/blob/master/conf/st2.conf.sample">StackStorm config</a>.</p> <p>The design of <code class="highlighter-rouge">st2garbagecollector</code> is straightforward. It runs as its own process, and executes the garbage collection functionality within an <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/garbage_collector/base.py">eventlet</a> green thread that performs collection in a loop. The interval is configurable. Both <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/garbage_collection/executions.py">executions</a> and <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/garbage_collection/trigger_instances.py">trigger instances</a> have collection functionality at the time of this writing.</p> <h3 id="st2stream">st2stream</h3> <p>The goal of <code class="highlighter-rouge">st2stream</code> is to provide an event stream to external components like the WebUI and Chatops (as well as third party software).</p> <p><code class="highlighter-rouge">st2stream</code> is the third and final service constructed as a <a href="https://github.com/StackStorm/st2/blob/master/st2stream/st2stream/app.py">WSGI application</a>. If you’ve read the sections on <code class="highlighter-rouge">st2api</code> and <code class="highlighter-rouge">st2auth</code>, very little will be new to you here.
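</p>
<p>Because the stream is served as <code class="highlighter-rouge">text/event-stream</code> (the Server-Sent Events format), consuming it mostly means parsing <code class="highlighter-rouge">event:</code> and <code class="highlighter-rouge">data:</code> lines, with a blank line marking the end of each event. Here is a minimal, hypothetical parser; it handles only those two fields plus comment lines, and ignores fields such as <code class="highlighter-rouge">id:</code> and <code class="highlighter-rouge">retry:</code>.</p>

```python
def parse_sse(lines):
    """Yield (event_name, data) tuples from an iterable of text/event-stream lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event accumulated so far
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # other fields (id, retry) are ignored in this sketch
```

<p>Feeding it any iterable of lines, such as a line-by-line read of the HTTP response body, yields <code class="highlighter-rouge">(event_name, data)</code> tuples; since StackStorm’s <code class="highlighter-rouge">data</code> payloads are JSON, each one can be passed to <code class="highlighter-rouge">json.loads</code>.</p>
<p>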
Searching StackStorm’s <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">OpenAPI</a> spec for <code class="highlighter-rouge">/stream</code> will lead you to the one and only endpoint for this service.</p> <p>The documentation for this endpoint is <a href="https://github.com/StackStorm/st2docs/issues/550">a bit lacking at the moment</a>, but you can get a sense for how it works with a simple <code class="highlighter-rouge">curl</code> call:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ curl http://127.0.0.1:9102/v1/stream
event: st2.liveaction__create
data: {"status": "requested", "start_timestamp": "2017-08-28T21:01:10.414877Z", "parameters": {"cmd": "date"}, "action_is_workflow": false, "runner_info": {}, "callback": {}, "result": {}, "context": {"user": "stanley"}, "action": "core.local", "id": "59a4849602ebd558f14a66d8"}
...
</code></pre></div></div> <p>This will keep a connection open to <code class="highlighter-rouge">st2stream</code>, and events will stream into the console as they take place (to produce this output, I ran <code class="highlighter-rouge">st2 run core.local cmd=date</code> in a separate tab after subscribing to the stream).</p> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2stream/st2stream/controllers/v1/stream.py">controller</a> for this API endpoint is also fairly straightforward - it returns a response of type <code class="highlighter-rouge">text/event-stream</code>, which instructs the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/router.py">Router</a> to maintain this persistent connection so that events can be forwarded to the client.</p> <h2 id="conclusion">Conclusion</h2> <p>There are several external services like Mistral, RabbitMQ, NGINX, MongoDB, and Postgres that we explicitly didn’t cover in this post.
They’re crucial for the operation of StackStorm, but better suited for a separate post in the near future.</p> <p>We also skipped covering one “core” service, <code class="highlighter-rouge">st2chatops</code>. This is an optional service (disabled by default until configured) that allows chatops integration in StackStorm. There’s a lot to talk about with respect to chatops on its own, so that will also be done in a separate post.</p> <p>For now, I hope this was a useful exploration into the services that make StackStorm work. Stay tuned for follow-up posts on specific topics that we glossed over for now.</p> Mon, 28 Aug 2017 00:00:00 +0000 https://keepingitclassless.net/2017/08/stackstorm-architecture-core-services/ https://keepingitclassless.net/2017/08/stackstorm-architecture-core-services/ Your Cheese Moved a Long Time Ago <p>I was recently on a panel at the <a href="https://www.meetup.com/Auto-Remediation-and-Event-Driven-Automation/">Event-Driven Automation Meetup</a> at LinkedIn in Sunnyvale, CA, and we all had a really good hour-long conversation about automation. What really made me happy was that nearly the entire conversation focused on bringing the same principles that companies like LinkedIn and Facebook use on their network to smaller organizations, making them practical for more widespread use.</p> <blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Nina Mushiana of <a href="https://twitter.com/LinkedIn">@LinkedIn</a> says &quot;Anything that can be documented should be automated&quot;.<br />Great Auto-Remediation Meetup! 
<a href="https://t.co/l76U1IydjB">pic.twitter.com/l76U1IydjB</a></p>&mdash; StackStorm (@Stack_Storm) <a href="https://twitter.com/Stack_Storm/status/847664487620530177">March 31, 2017</a></blockquote> <script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script> <p>One particular topic that came up was one I’ve struggled with for the past few years: what about Day 2 of network automation? So we’ve managed to write some Ansible playbooks to push configuration files to switches - what’s next? Often this question isn’t asked. I think the network automation conversation has progressed to the point where we should all start asking this question more often.</p> <p>I believe that the network engineering discipline is at a crossroads, and the workforce as a whole needs to make some changes and decisions in order to stay relevant. Those changes are all based on the following premise:</p> <blockquote> <p>The value of the network does not come from discrete nodes (like routers and switches - physical or virtual), or their configuration, but from the services they provide.</p> </blockquote> <p>If you’re just getting started down the path of following basic configuration management or infrastructure-as-code principles, <strong>that’s fantastic</strong>. This post is not meant to discourage you from doing that. Those practices are great for years 1-2 of the journey.
This post focuses on year 3+ of the network automation journey.</p> <h1 id="your-cheese-has-moved">Your Cheese Has Moved</h1> <p>We’ve all heard the lamentations that come from server admins (<a href="https://keepingitclassless.net/2015/02/free-form-discussion-cleur/">throwback alert</a>) like “why does it take weeks to provision a new VLAN?” I worked as a network and data center consultant for a number of years and I can tell you that these stories are true, and it gets much worse than that.</p> <p>As I’ve said before, what the sysadmin usually doesn’t know is all the activity that goes on behind the scenes to deliver that VLAN. Usually what they’re asking for is a new logical network, which isn’t just a tag on a switchport - it’s also adding a layer 3 interface, and potentially routing changes, edits to the firewall, a new load balancing configuration, and on and on and on. The network has traditionally provided a lot of these services that the sysadmin took for granted.</p> <p>You might understand their frustration, but the reality is that the network engineer is trying hard just to provide these services and ensure they’re changing adequately for the applications that rely upon them. It also doesn’t help when processes like ITIL force such changes to take place on the first weekend of every month at 2AM. This is a far cry from what the application teams and developers have come to expect, like response times of seconds or minutes, not weeks or months. But hey, those silly developers don’t know networking, so they can just deal with it, right?</p> <p>Yes, it can be tempting to make fun of some developers that can’t tell a frame from a packet. However, it may be useful to remember that a developer wrote the software in your router. Someone had to write the algorithms that power your load balancer. It is indeed possible that some software developers know networking - even better than most network engineers out there.
Then, if you put them in the constantly innovating culture of Silicon Valley that is always looking for a problem to solve, it’s inevitable; the arduous processes and inflexible tooling that have dominated networking for so long provided those developers and sysadmins with a problem to solve on a silver platter.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2017/04/cheese.png"><img src="https://keepingitclassless.net/assets/2017/04/cheese.png" width="300" /></a></div> <p>And solve it they did. When x86 virtualization was really hitting the mainstream, network engineers didn’t really acknowledge the vSwitch. They wrote it off as the domain of “those server guys”. What about when we started routing in the host or hypervisor? I know a lot of people like to make fun of the whole <code class="highlighter-rouge">docker0</code> bridge/NAT thing. Those silly server people, right? Developers are spinning up haproxy instances for load balancing, and learning how to use iptables to secure their own infrastructure. On top of that, all of these network services are <strong>also being offered by AWS</strong>, all in one nice dashboard and totally programmable. Can you really blame the developer now? Put yourself in their shoes - if you were faced with an inflexible network infrastructure that your application depended on, and you had no control over it, how long would it take you to follow the shiny red ball over to Amazon where they make all those same network <em>services</em> totally abstract and API-controllable?</p> <p>So what’s happening here is that “those server guys” are basically running their own network at this point. We’ve clung to our black boxes and our configuration files at the cost of <strong>losing control over the actual network services</strong>. The truth is, we need to play a lot of catch-up.</p> <blockquote> <p>I know what you’re thinking - there’s more to the network than the data center.
But like it or not, the datacenter houses the applications, and the applications are where the business sees the value in IT. Applications and software development teams sit closer to the boss, and they’re learning how to manage network services pretty well on their own out of necessity.</p> </blockquote> <h1 id="getting-the-cheese-back">Getting the Cheese Back</h1> <p>Network automation is about so much more than merely solving a configuration management problem. If it were, this would all be a bit anticlimactic, wouldn’t it? Everyone would just learn Ansible/Salt/Puppet and be done with it.</p> <p>Network automation, just like all other forms of automation, is about <strong>services integration</strong>. There aren’t “existing tools” for your legacy, internal applications. At some point <a href="https://keepingitclassless.net/2017/03/learn-programming-or-perish/">you’re going to have to write some code</a>, even if it’s an extension to an existing tool. It’s time to get over this aversion to dealing with even basic scripting, and start filling in the 20% of our workflows that can’t be addressed by a turnkey tool or product. To me, this is the next step of network automation - being able to fill in the gaps between historically air-gapped services to create an automated broader IT system.</p> <p>For instance, Kubernetes is an increasingly popular choice for those looking to deploy distributed applications (don’t make me say “cloud native”). It’s great at managing the entities (like pods) under its control, but it’s not meant to run everything meaningful to your business. If you’re running Kubernetes in your organization, it will have to run alongside a bunch of other stuff like OpenStack, vSphere, even mainframes.
This is the reality of brownfield.</p> <p>As you might expect, all these systems need to work together, and we’ve historically “integrated” them by hand for a long time by looking at different areas of our technology stack, and “rendering” abstract concepts of desired state into implementation-specific commands and configurations. Just take networking as a specific example - a network engineer is the human manifestation of a cross platform orchestrator, seamlessly translating between Cisco and Juniper CLI syntaxes.</p> <div style="text-align:center;"><a href="https://keepingitclassless.net/assets/2017/04/dr_garencieres.jpg"><img src="https://keepingitclassless.net/assets/2017/04/dr_garencieres.jpg" width="500" /></a></div> <p>So, to return to the main point; the network is now no longer the sole proprietor of network services - those are slowly but surely migrating into the realm of the sysadmin and software developer. How can we adapt to this? One way is to acknowledge that the new “network edge” is very blurred. No longer is there a physical demarcation like a switchport; rather, these services are being provided either directly adjacent, or even co-resident with the application.</p> <p>It’s actually a bit encouraging that this has happened. This change represents a huge opportunity for network engineers to gain more control over the network than they’ve ever had. Historically, these network services were hidden behind “value-add, differentiating features” like CLI syntax (insert sarcasm undertone here). In the new world these services are either taking place in open-source software, or are at least driven by well-designed, well-documented APIs. So, this new model is out there ready for us. We can take it, or lose it.</p> <h1 id="conclusion">Conclusion</h1> <p>The migration of network services out of the network itself was inevitable, but it’s absolutely not a death blow to the network engineer - it’s a huge opportunity to move forward in a big way. 
There’s a lot of work to do, but as <a href="https://keepingitclassless.net/2017/03/learn-programming-or-perish/">I wrote about last week</a>, the networking skill set is still sought after, and still needed in this new world.</p> <p><a href="http://info.interop.com/itx/2017/scheduler/session/fundamental-principles-of-automation">I’ll be speaking at Interop ITX</a> in Vegas next month about this and related topics. If you want to talk about automation, or just geek out about beer or food, I’d love to chat with you.</p> Thu, 06 Apr 2017 00:00:00 +0000 https://keepingitclassless.net/2017/04/cheese-moved-long-time-ago/ https://keepingitclassless.net/2017/04/cheese-moved-long-time-ago/ Learn Programming or Perish(?) <p>I was honored to return to Packet Pushers for <a href="http://packetpushers.net/podcast/podcasts/show-332-dont-believe-programming-hype/">a discussion on programming skillsets in the networking industry</a>. I verbalized some thoughts there, but even 60 minutes isn’t enough for a conversation like this.</p> <p>To be clear, this post is written primarily for my followers in the networking industry, since that’s largely where this conversation is taking place.</p> <h1 id="scripting-is-not-programming">Scripting is NOT Programming</h1> <p>I want to put something to rest right now, and that is the conflation of scripting and software development. You may be hesitant to pick up any skills in this area because you feel like you have to boil the ocean in order to be effective, which is not true.</p> <p>As I briefly mention in the podcast, I spent the first 4 years or so of my career making networking my day job. Because of that, I picked up a lot of useful knowledge in this area.
However, as I started to explore software, I realized that networking wasn’t something I wanted to do as a day job anymore, but I still greatly value the networking skillset I retain from this experience.</p> <p>Making this leap over 2 years ago revealed a multitude of subskills, fundamental knowledge, and daily responsibilities I simply wasn’t exposed to when I wasn’t doing this full time. Things I even take for granted now - like code review, automated testing, and computer science basics like algorithms. While I wouldn’t ever discourage anyone from learning these kinds of things, it is very understandable that a network engineer doesn’t deal with these things, because they go way beyond simple scripting.</p> <blockquote> <p>That said, you may run into challenges as your scripts become more complex. It may be useful to pair with someone that writes code for a living, and learn how to make your scripts more modular, scalable, and reusable.</p> </blockquote> <p>In short, don’t conflate <strong>skillset</strong> with <strong>occupation</strong>. Don’t feel like you have to boil the ocean in order to get started. You don’t have to become a programmer, but you should be able to write and maintain scripts using a modern language.</p> <h1 id="stop-talking-start-building">Stop Talking, Start Building</h1> <p>Hopefully the previous section drew a clear line between the <strong>skill</strong> of scripting and the <strong>occupation</strong> of software development, and that as a network engineer, you no more “need” to become a software developer than a car mechanic “needs” to become a heart surgeon. Now that this is out of the way, it’s time to have some real talk about this whole debate.</p> <p>One thing I’ve noticed since joining a team that has ties to just about every area of IT, including networking, is that other disciplines realized long ago that these skills are necessary for reasonably modern operations. 
There are no “should sysadmins learn code” discussions going on right now - they’ve all picked up Python, bash, or similar. It’s not a discussion of whether or not being able to augment their workflows with code is useful; it is assumed. Yet in networking we’re still debating this for some reason. It pains me when I hear perspectives that paint basic scripting skills as something that only engineers at Facebook or Google need to worry about, when other disciplines, even at smaller scale, simply assume this skillset exists in their operational model.</p> <p>Frankly, I am a bit disturbed that this is still so much of a discussion in networking. I worry that the vast majority of the industry is primarily interested in having their problems solved for them. This is something I observed about 3 years ago, and is a big reason I wanted to make a change in my own career - I didn’t feel like I was building anything, just operating something that someone else built. We alluded to this in the podcast - the industry seems to be trending away from “engineering”, and towards “administration”. Of course, this is a generalization. The rather explosive growth of communities like <a href="http://networktocode.com/community/">“Network to Code”</a> indicates at least some interest, but I worry that it’s not enough.</p> <p>There are only two possible conclusions that I can draw from my observations:</p> <ul> <li>People assume that in order to be useful, they have to learn everything a software developer has learned.</li> <li>The difference between software development and scripting is understood, but even scripting is viewed as something “only for Facebook or Google”.</li> </ul> <p>Hopefully the previous section sufficiently refuted the first point. This just isn’t true.
Don’t conflate occupation with skillset.</p> <p>Regarding the second point, I am not sure how to solve this, to be honest, other than to advise that you look at how other disciplines have incorporated those skillsets. Attend conferences that don’t explicitly focus on networking. I attended <a href="https://stackstorm.com/2017/03/23/stackstorm-srecon-2017/">SREcon</a> recently and was blown away by the difference in mindset towards these skillsets, compared to my experience at networking conferences. I worry that we get into this networking echo chamber where we listen to each other reject these skillsets, and use that to justify not picking them up ourselves.</p> <h1 id="focusing-on-real-fundamentals">Focusing on REAL Fundamentals</h1> <p>All of that in mind, I want to wrap up with a brief discussion about the difference in types of skillsets, since this often comes up when bringing up software skills in networking. For instance, headlines like “Learn Programming, or get CCIE?” piss me off, frankly. They miss the point entirely, and gloss over the tremendous amount of nuance that needs to be explored in this discussion.</p> <p>I believe strongly that focusing on fundamentals, especially if you’re just starting in your career, <strong>and regardless of which discipline you fall under</strong>, will set you up best for success in the long run. It will allow you to make a lot more sense of specific implementations like CLI syntax. Don’t be afraid to lean on the user guide when you need to look up the syntax for a command. Commit the concepts that sit under that command to memory instead of the syntax itself.</p> <p>As an illustration, consider the artist/painter. If painters learned like the network industry wants us to learn, then art schools would only teach how to replicate the Mona Lisa. Instead, artists learn the fundamentals of brush technique. They learn what colors do when blended on the palette.
They use their own creativity and decision making to put these fundamentals into practice when it comes time to make something. Similarly, programmers learn fundamentals like sorting algorithms, Big-O notation, CPU architectures, etc., and rely on knowledge of these tools to solve a problem when it arises.</p> <p>It’s worth saying that, because of where this industry is right now, implementation knowledge is important too, especially since the networking industry is in love with certifications that demonstrate implementation knowledge. The networking industry clearly places a lot more value on specific implementations - just look at the salary estimates for a CompTIA Network+ vs just about any Cisco certification.</p> <p>However, vendor certs are basically a way of putting the vendor in control of your career. Fundamental knowledge, on the other hand, puts YOU in control. It lets YOU dominate interviews, instead of the vendor you’ve tied yourself to. Always emphasize learning the fundamentals, and consider that the “real” networking fundamentals may not be on any popular curriculum.</p> <p>To build your career, you will likely have to balance implementation-level knowledge, like certs, with fundamental knowledge. Certs get you in the door - that’s just the reality of interviewing today. But don’t let this keep you from going way deeper - it will do wonders for your career long-term.</p> <h1 id="conclusion">Conclusion</h1> <p>To wrap up, if you only take two things away from this post, they are:</p> <ul> <li>Scripting is for everyone. Yes, that includes you. It’s something you can start with today, because it’s not magical. We’re just talking about describing, as source code, the logic you already use in your day-to-day operations. That’s it.</li> <li>Emphasize fundamental knowledge. 
Learn enough about implementations to get in the door, but make sure you know how TCP and ARP work (as an example) regardless of platform.</li> </ul> Mon, 27 Mar 2017 00:00:00 +0000 https://keepingitclassless.net/2017/03/learn-programming-or-perish/ https://keepingitclassless.net/2017/03/learn-programming-or-perish/ 2016 Recap and 2017 Goals <p>Yet another recap post to follow up on <a href="https://keepingitclassless.net/2015/12/2015-recap-2016-goals/">last year’s</a>. 2015 was a big transition year for me, and in 2016 I wanted to make sure I kept the momentum going.</p> <blockquote> <p>I make this post yearly to publicly track my own professional development goals. I find this helps me stay accountable to these goals, and it also allows others to give me a kick in the butt if I’m falling behind.</p> </blockquote> <h1 id="2015-goal-recap">2016 Goal Recap</h1> <p>First, let me recap some of the goals <a href="https://keepingitclassless.net/2014/12/2014-recap-2015-goals/">I set for myself at the beginning of the year</a>, and see how well I did.</p> <p><strong>Network Automation Book</strong> - <a href="https://keepingitclassless.net/2015/12/training-next-generation-network-engineer/">At this time last year</a>, I announced that I was working on a network automation book with Scott Lowe and Jason Edelman. This has certainly taken a bit more time than any of us would have liked, but we’re very near the end. The three of us have had a very busy year, and there are very few things left to do before release. In the meantime, we have pushed several additional chapters to O’Reilly, so you can still read these via Safari.</p> <p><strong>Open Source</strong> - Given that <a href="https://keepingitclassless.net/2016/10/new-automation-chapter-begins/">I now work for a company centered around an open source project</a>, I’d say I definitely made a good move towards this goal. 
I also open sourced <a href="https://keepingitclassless.net/2016/03/test-driven-network-automation/">ToDD</a> earlier this year, and it has been steadily growing and becoming more stable over the last few months.</p> <p><strong>Deeper into Go and Python</strong> - I did well on this goal too, for some of the same reasons as the open source goal - namely, that I work for a company centered around a Python-based open source project, and that I maintain ToDD, which is written in Go. I decided early this year that, in order to continue the momentum from my transition to full-time developer in 2015, I wanted to focus on Go and Python, so that I could be more flexible than if I knew a single language, but also focused enough to get depth. is a new topic to me. This is a big reason I am getting more involved with Go.</p> <p><strong>More Community Output</strong> - It’s no secret that my blogging output has slowed lately. My motivations for blogging and for being involved with the community in general are just very different from what they used to be. My early career was defined by trying to become as broad as possible - working with all different kinds of technologies. Now, I tend to spend more time focusing on one thing at a time, getting to a very deep level of understanding. Though I wish this weren’t the case, this tends to exhaust the energy I’d normally use to write about what I learned. However, while this part has slowed down, I am still fairly pleased with the other things I’ve done. I do feel like my involvement with open source (which has become quite substantial) is filling this gap quite a bit. I’ve also spoken at conferences and am already continuing this in 2017. 
So to recap, I feel like this goal was accomplished, but perhaps in a different way than in years past.</p> <h1 id="goals-for-2016">Goals for 2017</h1> <p>While my focus since joining StackStorm has certainly included network automation use cases, it’s also exposed me to other industries and customer use cases. In many ways, these scenarios are much more interesting to me personally than what I’ve been working on in networking for the past few years. So I am hoping to branch into other technical areas beyond networking in 2017.</p> <p>I am leaving this intentionally vague because, obviously, I don’t know the future, but I feel like the time is right for a change. I’ll always have ties to networking, of course, and I intend to continue advocating for network automation, but I want to do more. Lately I’ve been getting more interested in the security industry, and I feel like there might be a gap for me to fill with my networking and software skillset. I’ll be exploring this in greater detail in 2017.</p> <blockquote> <p>I don’t usually talk about personal goals, but for 2017 I’d also like to pick up a piano and get back into playing jazz (hoping to find a group in Portland once I brush the rust off).</p> </blockquote> <h1 id="conclusion">Conclusion</h1> <p>I think the most memorable change for me in 2016 was the affirmation that software development was an area where I wanted to work. I’ll always have close ties to the networking industry, but I’ve realized that there’s a lot about the current state of the industry that just doesn’t satisfy my current career objectives in the same way that software and automation have (and hopefully will). 2016 saw a big direction change towards open source, and I have really enjoyed it.</p> <p>Have a great New Year’s celebration, stay safe, and see you in 2017!</p> Sat, 31 Dec 2016 00:00:00 +0000 https://keepingitclassless.net/2016/12/2016-recap-2017-goals/ https://keepingitclassless.net/2016/12/2016-recap-2017-goals/