Keeping It Classless: Perspectives On Networks, Automation, Systems, and Software Engineering
http://keepingitclassless.net/

A Guide to Open Source for IT Practitioners

<p>It’s easy to see that open source is changing the way people think about infrastructure. However, as the saying goes: “The future is here, it’s just not evenly distributed”. As is normal, there will always be pockets of IT where active involvement in open source will just take some more time.</p> <p>I’ve worked on open source for a few years now, and I have always wanted to publish a post that focuses on a few key ideas that I wish I could tell every new entrant into the world of open source. I feel like going in with the right expectations can really help any efforts here go much more smoothly. So if you’re accustomed to getting most if not all of your technology stack from a vendor, and you’re wondering about the open source craze, and trying to make sense of it all, this is for you. My goal with this post is to empower you to start getting out there and exploring the various communities behind the projects you may already have your eyes on.</p> <h1 id="open-source-is-free-as-in-puppy">Open Source is “Free as in Puppy”</h1> <p>Before some practical tips, I want to spend some time on expectations. This is crucially important when it comes to considering open source software for use in your own infrastructure. Obviously, one of the famous benefits of open source is that you usually don’t need to buy anything to get it. It’s “free”, right?</p> <p>Open source isn’t just about getting free stuff; for enterprise IT, it’s an opportunity to change the paradigm from getting direction from a 3rd party, to being able to set the direction. Everything in technology is based on tradeoffs: “I am willing to give up X to get Y”. While it’s true that you may not have to pay a license fee to use open source software, like you did with vendor-provided solutions, it’s almost certain that some assembly will be required, if not long-term maintenance of the system. There is a financial cost to having your IT staff do this. Even if it’s just a small tool to address a niche use case in your environment, it’s something you’re still on the hook for owning.</p> <blockquote> <p>It is for this reason I always like to highlight the difference between “product” and “project”. There’s a lot of work that goes on behind the scenes of many vendor-provided products that most open source projects don’t worry about (and rightfully so).</p> </blockquote> <p>To help mitigate risks in this tradeoff, any major shift to open source will/should include additional headcount. This can include devs to help contribute needed features and bugfixes, but it could also include ops folks to learn it, and keep it running, just like any other piece of infrastructure. I run into all kinds of folks that encounter the inevitable “wrinkles” present in any open source project (even well-funded, corporate-backed ones) and are frustrated it’s not totally turnkey, and bug-free. Most open source projects, in my experience, aren’t trying to be turnkey in the same way we’ve been conditioned with legacy IT vendors. They try to fill a part of the stack, and expect that their community will take the project and piece it together with other components to make a system. So don’t try to half-ass this - if you feel open source is right for a component of your infrastructure, invest in your people and do it right. 
This is why open source isn’t “free” in the financial sense - your people fulfill some of the role that was previously fulfilled by your vendor support contracts.</p> <p>In my opinion, open source is all about control. You’re trading off a little bit of that vendor comfort in exchange for enhanced control over the part of your infrastructure where you’re leveraging open source. Open source is a tool to leverage where this additional control gives you a competitive edge, or in some cases, to replace a costly IT system that is <strong>not</strong> giving you that edge, so you wish to move to commodity. In short, <strong>participating in open source isn’t an all-or-nothing proposition</strong> - identify areas where internalizing this control might help you gain an edge, and focus there.</p> <h1 id="if-you-want-something-say-something">If You Want Something, Say Something</h1> <p>Enterprise IT companies have conditioned us to get the vast majority of our technical solutions from behind closed doors. We’re usually forced to adjust to the common-denominator functionality that a particular product or solution provides for an entire set of verticals, and very rarely do we get to significantly influence the direction of a product.</p> <p>However, open source gives us a unique opportunity to really take an <strong>active</strong> role in the direction of a project. Note that I emphasized the word “active” - this was intentional. An unfortunately large number of times, I’ve encountered technology professionals who, for whatever reason, choose to watch a project from afar, and not proactively engage with it. Don’t do this! Understand your use case, and communicate it proactively.</p> <p>If you “drive by” an open source project on Github - maybe dismissing it because it doesn’t have the nerd knob you think you need - you might be leaving a good solution on the table. Or maybe you don’t think you know enough to jump in - I talk to so many folks that are accustomed to using vendor-provided closed-source solutions exclusively, who feel that they don’t have the “right” or “cred” to post an issue and explain their request or use case.</p> <p>This couldn’t be further from the truth! The vast majority of maintainers absolutely love helping new users and getting outside perspective on use cases. You have much more direct power to influence an open source project - especially smaller tools or libraries - but it does require active, not passive participation. So if you want something, say something. Doing the “drive by” cheats you out of a potential solution, and the maintainers out of a new perspective they wouldn’t otherwise have.</p> <p>So to the more practical - how do we do this? Well of course, each open source project is different, but for this post we’re going to focus on Github. It’s generally become the “common ground” for the majority of open source projects today. So, while you will undoubtedly encounter projects that use other tools, even in addition to Github, focusing on this workflow will serve you well for starters. In Github, a “repo” is a place where a project’s code, docs, scripts, etc are stored. This repo might be nested underneath a specific user, or under a separate organization.</p> <p>In Github, the best place to go to provide feedback is to create an Issue. Projects that allow this (most do) will have an “issues” tab right on the Github repo. 
For instance, <a href="https://github.com/toddproject/todd">the ToDD project</a>:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2017/12/new_issue.png"><img src="http://keepingitclassless.net/assets/2017/12/new_issue.png" width="500" /></a></div> <p>You can peruse the list of existing issues, or use the green “New issue” button to the right. Doing so will open a new form for filling out the title and body of the issue you want to raise with the maintainers:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2017/12/creating_issue.png"><img src="http://keepingitclassless.net/assets/2017/12/creating_issue.png" width="500" /></a></div> <blockquote> <p>Note that markdown is supported in the text body. Use this extensively, especially using backticks and triple backticks (`) for readable log messages or code snippets and the like. Those reading your issue will thank you for it.</p> </blockquote> <p>There are a few things you should do before you open an issue on any project:</p> <ul> <li><strong>Go with the flow</strong> - Get a sense for how the project runs. Many projects will have a <a href="https://github.com/toddproject/todd/blob/master/CONTRIBUTING.md"><code class="highlighter-rouge">CONTRIBUTING.md</code></a> file in the root of their repository which should contain all kinds of useful information for newcomers, including how to contribute code and create issues. Consider this the README for contributing - go here first.</li> <li><strong>Do some research</strong> - Do some googling, read the project docs, and do a search on the repo for existing issues (both open and closed) to see if the issue you’re about to raise has already been addressed. If you’re encountering an issue, there’s a good chance that someone else did too, and the answer you need might be in a previous issue, or in the documentation. It saves you time by getting the answer without waiting for someone to respond, and it doesn’t require a maintainer to burn cycles sending you back to the docs anyways.</li> <li><strong>Bring data</strong> - Do your due diligence around gathering logs and error context - everything the maintainers might need to track down the root cause of an issue. Note that the <code class="highlighter-rouge">CONTRIBUTING.md</code> file (as well as potentially an <a href="https://github.com/blog/2111-issue-and-pull-request-templates">issue or PR template</a>) will usually enumerate the details they’ll have to ask you for anyways, so it’s good to have this going in, so you can jump right into fixing the problem, rather than going back and forth for a few days just on data gathering.</li> </ul> <p>Here’s what <strong>TO</strong> open an issue for:</p> <ul> <li><strong>Asking for help</strong> - You can use issues to ask for help with certain conditions. The docs and previous issues exist for a reason, so don’t open an issue for help unless you have already followed my previous advice and have already exhausted existing resources. Assuming you’ve done this, this is a great way for maintainers to identify blind spots in their docs, so be ready to elaborate on what you’re looking for so that they can add to their documentation.</li> <li><strong>Bug reports</strong> - If you suspect a certain behavior is a bug, make sure you capture relevant data, and present it openly. It may not be a bug, so be prepared for that.</li> <li><strong>Feature requests</strong> - Focus on adequately describing your use case, rather than jump to suggesting a solution. 
Those more familiar with the project will give their perspective on the appropriate solution to match your use case.</li> </ul> <blockquote> <p>The Github UI has a few interesting tools to the right, such as labels and assignments for an issue. In general, stay away from using these - the maintainers will typically have their own triage process, and will assign resources and labels when appropriate.</p> </blockquote> <p>Here’s what <strong>NOT</strong> to open an issue for:</p> <ul> <li><strong>Opinions (negative or otherwise)</strong> - Issues should generally be actionable, and able to be closed via a PR. There are times when issues are an appropriate venue for long-form discussion, but be sure this applies to the project in question before using Issues in this way. Most projects have other communication methods for open-ended discussions, like IRC or Slack, and you should be prepared to participate there as well. Usually such resources can be found in the <code class="highlighter-rouge">CONTRIBUTING.md</code> file or sometimes the <code class="highlighter-rouge">README.md</code> file.</li> </ul> <p>Assuming you’ve followed the previous points, you may get the response you were hoping for. Or, you may get a response you didn’t expect, such as:</p> <ol> <li>“That doesn’t really fit with the project, so the answer is no”</li> <li>“We like the idea but don’t have cycles to work on this ourselves, so feel free to open a PR”</li> <li>“You may be going about this the wrong way, here’s another approach you may not have considered.”</li> </ol> <p>You should be ready for any of these. It’s all part of the flow. Open source tends to be very much about code, about results, not about giving one particular user their way at the expense of the direction of the project - so be ready to have your perspective changed. Make your case based on the data you have, but be prepared to receive new information that might change your approach.</p> <p>This is a blessing and a curse - it requires a bit more mental work, but this is all <strong>very</strong> different from the traditional vendor-led technology discussions, which most customers aren’t able to participate in, certainly not to this level of depth.</p> <h1 id="contribute-back">Contribute Back</h1> <p>If you follow my advice, and staff your team appropriately, this won’t be hard. Simply by operating the software, you’ll inevitably start finding your own bugs, or even fixing them. Or maybe you’re just trying to get your feet wet - most repos have a backlog of Issues like bug reports and the like, which can serve as a great source of inspiration for making some of your first contributions to the project.</p> <p>Easily one of the most valuable technical skills you can have for contributing to open source - if not the most valuable - is understanding <a href="https://git-scm.com/">Git</a>. Git is a distributed version control system in use by the biggest open source projects in existence, including the Linux kernel itself. It has become the “lingua franca” of contributing to open source. There are <a href="https://try.github.io/levels/1/challenges/1">numerous tutorials</a> out there for this as a result. For getting started with open source, you should know the basics. Know how to work with a repo, such as clone, push/pull, add/commit, etc. You should understand what branching does.</p>
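<p>If those terms are still fuzzy, here’s a minimal sketch of the kind of day-to-day Git usage I’m talking about (the repository URL and branch name are just placeholders):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ git clone https://github.com/someuser/somerepo.git
~$ cd somerepo
~$ git checkout -b fix-docs-typo       # create a new branch and switch to it
~$ git add docs/install.md             # stage your edits
~$ git commit -m "Fix typo in install docs"
~$ git push origin fix-docs-typo       # publish the branch to your remote
</code></pre></div></div> <p>If you can explain what each of those commands is doing, you know enough Git to start contributing.</p>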
<blockquote> <p>Shameless plug: we have a whole chapter dedicated to version control - almost totally focused on Git in the hopefully-soon-to-be-released <a href="http://shop.oreilly.com/product/0636920042082.do">Network Programmability and Automation</a> book.</p> </blockquote> <p>As mentioned before, Github is a popular platform for collaborating over open source software. It’s one of the most popular SaaS platforms for publicly hosting source code, and as the name implies, it’s built around Git. So, in addition to knowing Git fundamentals, we should also understand how to build on these fundamentals to interact with the Github workflow.</p> <p>The general workflow for contributing to a repo is via a “Pull Request”. In effect, this is a way of saying “Hey maintainer - I’ve made this change in my own version of your repository, could you please <strong>pull</strong> it into the main one, so that it’s part of the software going forward?”</p> <p>Each Github repository has a “fork” button near the top. This is just a handy way of making a copy of a given repo that you can make changes to directly. Once you’ve done this, you can then open a PR to “sync” the two copies back up.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2017/12/fork.png"><img src="http://keepingitclassless.net/assets/2017/12/fork.png" width="500" /></a></div> <p>A <strong>highly</strong> abbreviated list of steps for this workflow is below:</p> <ol> <li>Fork the repo you intend to contribute to. It will ask you where you want to make the copy - doing this under your username is fine.</li> <li>Use Git to interact with your fork/copy of the repo. Clone it to your local system, and make the changes. Use <code class="highlighter-rouge">git add</code> and <code class="highlighter-rouge">git commit</code> commands for this. Then use <code class="highlighter-rouge">git push</code> to push those commits to your fork.</li> <li>Github has a really cool feature for detecting when you’ve recently pushed changes to your fork, so if you go to the main repo within a few minutes, it should prompt you to create a PR. If not, you can go to the “Pull Requests” tab to select the target repo/branch and create a PR from there.</li> </ol> <p>Once this is done and the PR is opened, you’ll probably get asked some additional questions, and maybe some additional commits will be required. This is normal - just part of the process. After this process, the maintainers may “approve” the PR, and/or merge it into the target branch (i.e. <code class="highlighter-rouge">master</code>).</p> <p>Try to focus on small, frequent PRs, rather than infrequent, huge ones - especially when getting started. You don’t want to spend 3 weeks on a massive change, only to get feedback that it’s not desired or wanted after all that hard work. Seek feedback before doing a ton of work. You also don’t need to be “finished” with your change to open a PR. It’s not uncommon to make a small change to prototype something, and open a PR before you’re sure it’s a valid approach or before you’ve written tests for the change, all for the purpose of gathering feedback before spending more time on it. Usually projects will have a “WIP” label, or you can just say that you’re not quite finished in the PR description. This is not only acceptable, but expected and appreciated.</p>
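<p>To make the fork-and-PR loop a bit more concrete, here’s a rough sketch of what it looks like from the command line. I’m using the ToDD repo from earlier as the example upstream project, and <code class="highlighter-rouge">yourusername</code> is obviously a placeholder for your own fork:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># after clicking "Fork" in the Github UI...
~$ git clone https://github.com/yourusername/todd.git
~$ cd todd
~$ git remote add upstream https://github.com/toddproject/todd.git
~$ git checkout -b my-bugfix
# ...make and commit your changes...
~$ git fetch upstream                  # pull in anything merged since you forked
~$ git rebase upstream/master
~$ git push origin my-bugfix           # then open the PR from the Github UI
</code></pre></div></div>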
<p>Some tips for contributing to a project:</p> <ul> <li><strong>Work from the public project</strong> - Don’t fork off permanently and make all your changes privately, behind your firewall. Your bugfixes or enhancements to an open source project are almost certainly not core to your organization’s value proposition. Don’t hoard these and try to maintain your own fork. Just make everything public. There’s no reason to keep most things private, and it will only help to increase your personal value, as you’ll have public contributions to refer to.</li> <li><strong>Start small</strong> - Most project maintainers welcome PRs, but there’s some relationship building that will go a long way here. Frequent, small PRs as opposed to huge, difficult-to-review PRs will help the maintainer learn your skills and style, and gain confidence you know what you’re doing. It will also help get your contributions merged in a timely fashion.</li> <li><strong>Commit early, and often</strong> - Try to keep changes succinct, and don’t be afraid to push your changes early and seek feedback on them, even if you’re not finished. Most projects appreciate you marking PRs with “WIP” or something like that to indicate this.</li> </ul> <p>Finally, any of the responses that I mentioned in the previous section about Issues are also possible with PRs. Be prepared to defend the changes you’ve made, or change your mind about the approach. Again, most maintainers are just trying to keep the project moving forward, and they have a lot of experience with the project, and will help guide you to a solution that works for everyone. Be flexible. Again, smaller PRs will help prevent ugly situations where you’ve silently worked on a PR for 3 weeks but get “shut down” because it wasn’t needed/wanted. Like most things, it’s all about proactive communication.</p> <h1 id="open-source-is-people">Open Source is People!</h1> <p>There are generally two types of open source projects:</p> <ul> <li>Small, individual-led projects that are created out of passion to solve a particular problem</li> <li>Medium-to-large projects that have corporate backing, usually as a strategic initiative.</li> </ul> <p>In both cases, every open source project is powered by people like you and me. Even people that are paid to work on a project are usually doing so because they are passionate about the open source community and are driven by a desire to help other technology professionals. Working in open source carries its own set of challenges, so usually they’re not in it to be supreme overlords to cut you down, but rather are interested in fostering a community of diverse perspectives, including yours.</p> <p>Software development, including open source, tends to give some folks a culture shock at first, since it’s so much about code, and about working solutions. There’s no room for hyperbole; it either works or it doesn’t. So if you’re not accustomed to this culture, know that the person on the other side of a seemingly harsh PR review isn’t “out to get you”. Most of the time they’re just being factual. Try to learn what they’re trying to teach you, and remain open to new ways of doing things.</p> <p>So just remember, there are human beings on the other side of the screen, and while it’s sadly true that there are always bad apples present in all areas of technology, the vast majority just want to build something cool, and work with smart people that give a shit about what they’re doing. 
By going out of your way to contribute to open source, you’re proving you fit this description, so just focus on jiving with the project and you’ll do fine.</p> <h1 id="conclusion">Conclusion</h1> <p>If I could sum this post up with one bit of advice, it’s this: stop sitting on the sidelines, and jump in. Regardless of your background, and regardless of your type of contribution, adding open source to your resume is a huge deal these days. You don’t have to pay to participate; you don’t even have to know how to write code in most cases - many projects will gladly accept docs improvements and the like. There’s really no excuse for not getting started.</p> <p>I hope you all have a Merry Christmas, and a great holiday season overall. Spend time with your families, and when there’s a little downtime (maybe when your family is napping from all the delicious food), consider poking around Github and getting involved with a project.</p> Wed, 20 Dec 2017 00:00:00 +0000 http://keepingitclassless.net/2017/12/a-guide-open-source-it-practitioners/

StackStorm Architecture Part I - StackStorm Core Services

<p>A while ago, I wrote about <a href="https://keepingitclassless.net/2016/12/introduction-to-stackstorm/">basic concepts in StackStorm</a>. Since then I’ve been knee-deep in the code, fixing bugs and creating new features, and I’ve learned a lot about how StackStorm is put together.</p> <p>In this series, I’d like to spend some time exploring the StackStorm architecture. What subcomponents make up StackStorm? How do they interact? How can we scale StackStorm? These are all questions that come up from time to time in the StackStorm community, and there are a lot of little details that even I forget now and then. I’ll be doing this in a series of posts, so we can explore a particular topic in detail without getting overwhelmed.</p> <p>Also, it’s worth noting that this isn’t intended to be an exhaustive reference for StackStorm’s architecture. The best place for that is still the <a href="https://docs.stackstorm.com/">StackStorm documentation</a>. My goal in this series is merely to give a little bit of additional insight into StackStorm’s inner workings, and hopefully get those curiosity juices flowing. There will be some code references, some systems-level insight, probably both.</p> <blockquote> <p>Also note that this is a <em>living document</em>. This is an open source project under active development, and while I will try to keep specific references to a minimum, it’s possible that some of the information below will become outdated. 
Feel free to comment and let me know, and I’ll update things as necessary.</p> </blockquote> <p>Here are some useful links to follow along - this post mainly focuses on the content there, and elaborates:</p> <ul> <li><a href="https://docs.stackstorm.com/install/overview.html">High-Level Overview</a></li> <li><a href="https://docs.stackstorm.com/reference/ha.html">StackStorm High-Availability Deployment Guide</a></li> <li><a href="https://docs.stackstorm.com/development/code_structure.html">Code Structure for Various Components in “st2” repo</a></li> </ul> <h2 id="stackstorm-high-level-architecture">StackStorm High-Level Architecture</h2> <p>Before diving into the individual StackStorm services, it’s important to start at the top: what does StackStorm look like when you first lift the hood?</p> <p>The best place to start for this is the <a href="https://docs.stackstorm.com/overview.html">StackStorm Overview</a>, where StackStorm concepts and a very high-level walkthrough of how the components interact are shown. In addition, the <a href="https://docs.stackstorm.com/reference/ha.html">High-Availability Deployment Guide</a> (which you should absolutely read if you’re serious about deploying StackStorm) contains a much more detailed diagram, showing the actual, individual processes that make up a running StackStorm instance:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2017/04/services.png"><img src="http://keepingitclassless.net/assets/2017/04/services.png" width="500" /></a></div> <blockquote> <p>It would be a good idea to keep this diagram open in another tab while you read on, to understand where each service fits in the cohesive whole that is StackStorm.</p> </blockquote> <p>As you can see, there’s not really a “StackStorm server”. StackStorm is actually composed of multiple microservices, each of which has a very specific job to do. Many of these services communicate with each other over RabbitMQ, for instance, to let each other know when they need to perform some task. Some services also write to a database of some kind for persistence or auditing purposes. The specifics involved with these usages will become more obvious as we explore each service in detail.</p> <h2 id="stackstorm-services">StackStorm Services</h2> <p>Now, we’ll dive into each service individually. Note that each service runs as its own separate process, and nearly all of them can have multiple running copies of themselves on the same machine, or even multiple machines. Refer to the <a href="https://docs.stackstorm.com/reference/ha.html">StackStorm High-Availability Deployment Guide</a> for more details on this.</p> <p>Again, the purpose of this post is to explore each service individually to better understand them, but remember that they must all work together to make StackStorm work. It may be useful to keep the diagram(s) above open in a separate tab, to keep the big picture in mind.</p> <p>We’ll be looking at things from a systems perspective as well as a bit of the code, where it makes sense. My primary motivation for this post is to document the “gist” of how each service is implemented, to give you a head start on understanding them if you wish to either know how they work, or contribute to them. Selfishly, I’d love it if such a reference existed for my own benefit, so I’m writing it.</p>
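<p>If you have a StackStorm box handy, it’s easy to see this for yourself before reading further. On a standard single-node install, a couple of quick commands will show the individual services described below (output omitted here, since it varies by version and deployment method):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ sudo st2ctl status            # summarizes each st2 service and whether it's running
~$ ps -eo command | grep st2     # or just look at the raw processes
</code></pre></div></div>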
<h3 id="st2actionrunner">st2actionrunner</h3> <p>We start off by looking at <a href="https://docs.stackstorm.com/reference/ha.html#st2actionrunner"><code class="highlighter-rouge">st2actionrunner</code></a> because, like the Actions that run inside it, it’s probably the most relatable component for those that have automation experience, but are new to StackStorm or event-driven automation in general.</p> <p><code class="highlighter-rouge">st2actionrunner</code> is responsible for receiving requests for executions (an “execution” is an instance of a running action), scheduling those executions, and running them. If you dig into the <code class="highlighter-rouge">st2actionrunner</code> code a bit, you can see that it’s powered by two subcomponents: a <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/scheduler.py">scheduler</a>, and a <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/worker.py">dispatcher</a>. The scheduler receives requests for new executions off of the message queue, and works out the details of when and how this action should be run. For instance, there might be a policy in place that is preventing the action from running until a few other executions finish up. Once an execution is scheduled, it is passed to the dispatcher, which actually runs the action with the provided parameters, and retrieves the resulting output.</p> <blockquote> <p>You may have also heard the term “runners” in reference to StackStorm actions. In short, you can think of these kind of like “base classes” for Actions. For instance, I might have an action that executes a Python script; this action will use the <code class="highlighter-rouge">run-python</code> runner, because that runner contains all of the repetitive infrastructure needed by all Python-based Actions. Please do not confuse this term with the <code class="highlighter-rouge">st2actionrunner</code> service; <code class="highlighter-rouge">st2actionrunner</code> is the process that runs all Actions, and a “runner” is a Python base class that declares some common foundation for an Action to use. In fact, <code class="highlighter-rouge">st2actionrunner</code> is indeed <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/container/base.py">responsible for handing off execution details to the runner</a>, whether it’s a Python runner, a shell script runner, etc.</p> </blockquote> <p>As shown in the component diagram, <code class="highlighter-rouge">st2actionrunner</code> communicates with both RabbitMQ and the database (which, at this time, is MongoDB). RabbitMQ is used to deliver incoming execution requests to the scheduler, and also so the scheduler can forward scheduled executions to the dispatcher. Both of these subcomponents update the database with execution history and status.</p> <h3 id="st2sensorcontainer">st2sensorcontainer</h3> <p>The job of the <code class="highlighter-rouge">st2sensorcontainer</code> service is to execute and manage the Sensors that have been installed and enabled within StackStorm. The name of the game here is to simply provide underlying infrastructure for running these Sensors, as much of the logic for how the Sensor itself works lives within the Sensor’s own code. This includes dispatching Trigger Instances when a meaningful event has occurred. 
<code class="highlighter-rouge">st2sensorcontainer</code> just maintains awareness of what Sensors are installed and enabled, and does its best to keep them running.</p> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/manager.py">sensor manager</a> is responsible for kicking off all the logic of managing various sensors within <code class="highlighter-rouge">st2sensorcontainer</code>. To do this, it leverages two subcomponents:</p> <ul> <li><a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/process_container.py">process container</a>: Manages the processes actually executing Sensor code</li> <li><a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/sensor_watcher.py">sensor watcher</a>: Watches for Sensor Create/Update/Delete events</li> </ul> <h4 id="sensors---process-container">Sensors - Process Container</h4> <p>The process container is responsible for running and managing the processes that execute Sensor code. If you look at the <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/process_container.py">process container</a> code, you’ll see that <code class="highlighter-rouge">_spawn_sensor_process</code> actually kicks off a <code class="highlighter-rouge">subprocess.Popen</code> call to execute a <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/sensor_wrapper.py">“wrapper” script</a>:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ st2 sensor list
+-----------------------+-------+--------------------------------------------+---------+
| ref                   | pack  | description                                | enabled |
+-----------------------+-------+--------------------------------------------+---------+
| linux.FileWatchSensor | linux | Sensor which monitors files for new lines  | True    |
+-----------------------+-------+--------------------------------------------+---------+

~$ ps --sort -rss -eo command | grep sensor_wrapper
/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/local/lib/python2.7/site-packages/st2reactor/container/sensor_wrapper.py --pack=linux --file-path=/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py --class-name=FileWatchSensor --trigger-type-refs=linux.file_watch.line --parent-args=["--config-file", "/etc/st2/st2.conf"]
</code></pre></div></div> <p>This means that each individual sensor runs as its own separate process. The usage of the wrapper script enables this, and it also provides a lot of the “behind the scenes” work that Sensors rely on, such as dispatching trigger instances, or retrieving pack configuration information. So, the process container’s job is to spawn instances of this wrapper script, with arguments set to the values needed to run specific Sensor code in packs.</p> <h4 id="sensors---watcher">Sensors - Watcher</h4> <p>We also mentioned another subcomponent of <code class="highlighter-rouge">st2sensorcontainer</code>: the “sensor watcher”. This subcomponent watches for Sensors to be installed, changed, or removed from StackStorm, and updates the process container accordingly. For instance, if we install the <a href="https://github.com/StackStorm-Exchange/stackstorm-slack"><code class="highlighter-rouge">slack</code></a> pack, the <a href="https://github.com/StackStorm-Exchange/stackstorm-slack/blob/master/sensors/slack_sensor.yaml"><code class="highlighter-rouge">SlackSensor</code></a> will need to be run automatically, since it’s enabled by default.</p>
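<p>For reference, installing a pack and confirming that its sensors were registered looks roughly like this (the <code class="highlighter-rouge">slack</code> pack is just the example here, and I’ve omitted the output since it varies by version):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ st2 pack install slack        # registers the pack's actions, rules, and sensors
~$ st2 sensor list --pack=slack  # the SlackSensor should show up here as enabled
</code></pre></div></div>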
<p>The sensor watcher subscribes to the message queue and listens for incoming messages that indicate such a change has taken place. In the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/sensor_watcher.py">watcher code</a>, a handler function is referenced for each event (create/update/delete). So, the watcher listens for incoming messages, and calls the relevant function based on the message type. By the way, those functions are defined back in the <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/container/manager.py">sensor manager</a>, where it has access to instruct the process container to make the relevant changes.</p> <p>That explains how CUD events are handled, but where do these events originate? When we install the <code class="highlighter-rouge">slack</code> pack, or run the <code class="highlighter-rouge">st2ctl reload</code> command, some <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/bootstrap/sensorsregistrar.py">bootstrapping code</a> is executed, which is responsible for updating the database, as well as publishing messages to the message queue, to which the sensor watcher is subscribed.</p> <h3 id="st2rulesengine">st2rulesengine</h3> <p>While <code class="highlighter-rouge">st2rulesengine</code> might be considered one of the simpler services in StackStorm, its job is the most crucial. It is here that the entire premise of event-driven automation is made manifest.</p> <p>For an engaging primer on rules engines in general, I’d advise listening to <a href="http://www.se-radio.net/2017/08/se-radio-episode-299-edson-tirelli-on-rules-engines/">Software Engineering Radio Episode 299</a>. I had already been working with StackStorm for a while when I first listened to that, so I was generally familiar with the concept, but it was nice to get a generic perspective that explored some of the theory behind rules engines.</p> <p>Remember my earlier post on <a href="https://keepingitclassless.net/2016/12/introduction-to-stackstorm/">StackStorm concepts</a>? In it, I briefly touched on Triggers - these are definitions of an “event” that may be actionable. For instance, when someone posts a tweet that matches a search we’ve configured, the Twitter sensor may use the <code class="highlighter-rouge">twitter.matched_tweet</code> trigger to notify us of that event. A specific instance of that trigger being raised is known creatively as a “trigger instance”.</p> <p>In short, StackStorm’s rules engine looks for incoming trigger instances, and decides if an Action needs to be executed. It makes this decision based on the rules that are currently installed and enabled from the various packs present in the database.</p>
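<p>You can peek at both sides of that equation from the CLI - the rules that are currently loaded, and the trigger instances that have come in for the engine to match against (output omitted, since it’s entirely environment-specific):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ st2 rule list                 # every rule currently registered, and whether it's enabled
~$ st2 trigger-instance list     # recent trigger instances the rules engine has processed
</code></pre></div></div>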
<p>As is common with most other StackStorm services, the logic of this service is contained within a <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/worker.py">“worker”</a>, using a handy Python base class which centralizes the receipt of messages from the message queue, and allows the rules engine to focus on just dealing with incoming trigger instances.</p> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/engine.py">engine itself</a> is actually quite straightforward:</p> <ol> <li>Receive trigger instance from message queue</li> <li>Determine which rule(s) match the incoming trigger instance</li> <li>Enforce the consequences from the rule definition (usually, executing an Action)</li> </ol> <blockquote> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/matcher.py">rules matcher</a> and <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/rules/enforcer.py">enforcer</a> are useful bits of code for understanding how these tasks are performed in StackStorm. Again, while the work of the rules engine in StackStorm is crucial, the code involved is fairly easy to understand.</p> </blockquote> <p>Finally, StackStorm offers some built-in triggers that allow you to trigger an Action based on the passage of time:</p> <ul> <li><code class="highlighter-rouge">core.st2.IntervalTimer</code> - trigger after a set interval of time</li> <li><code class="highlighter-rouge">core.st2.DateTimer</code> - trigger on a certain date/time</li> <li><code class="highlighter-rouge">core.st2.CronTimer</code> - trigger whenever current time matches the specified time constraints</li> </ul> <p>Upon start, <code class="highlighter-rouge">st2rulesengine</code> threads off <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/timer/base.py">a bit of code</a> dedicated to firing these triggers at the appropriate time.</p> <p><code class="highlighter-rouge">st2rulesengine</code> needs access to RabbitMQ to receive trigger instances and send a request to execute an Action. It also needs access to MongoDB to retrieve the rules that are currently installed.</p> <h3 id="st2api">st2api</h3> <p>If you’ve worked with StackStorm at all (and since you’re still reading, I’ll assume you have), you know that StackStorm has an API. External components, such as the CLI client, the Web UI, and third-party systems, all use this API to interact with StackStorm.</p> <p>An interesting and roughly accurate way of viewing <code class="highlighter-rouge">st2api</code> is that it “translates” incoming API calls into RabbitMQ messages and database interactions. What’s meant by this is that incoming API requests are usually aimed at either retrieving data, pushing new data, or executing some kind of action with StackStorm. All of these things are done by other running processes; for instance, <code class="highlighter-rouge">st2actionrunner</code> is responsible for actually executing a running action, and it receives those requests over RabbitMQ. So, <code class="highlighter-rouge">st2api</code> must initially receive such instructions via its API, and forward that request along via RabbitMQ.</p>
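<p>As a quick illustration of that “front door” role, here’s roughly what talking to <code class="highlighter-rouge">st2api</code> directly looks like. Treat this as a sketch: it assumes a default install where <code class="highlighter-rouge">st2api</code> is bound to 127.0.0.1:9101 (as the <code class="highlighter-rouge">ps</code> output later in this section shows), and that you already have a valid token in <code class="highlighter-rouge">ST2_AUTH_TOKEN</code>:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># a simple read - served straight from the database
~$ curl -s -H "X-Auth-Token: $ST2_AUTH_TOKEN" http://127.0.0.1:9101/v1/actions

# requesting an execution - this is the kind of call that gets translated
# into a RabbitMQ message for st2actionrunner to pick up
~$ curl -s -X POST -H "X-Auth-Token: $ST2_AUTH_TOKEN" -H "Content-Type: application/json" \
     -d '{"action": "core.local", "parameters": {"cmd": "date"}}' \
     http://127.0.0.1:9101/v1/executions
</code></pre></div></div>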
<p>Let’s discuss how that actually works.</p> <blockquote> <p>The 2.3 release changed a lot of the underlying infrastructure for the StackStorm API. The API itself isn’t changing (still at v1) for this release, but the way that the API is described within <code class="highlighter-rouge">st2api</code>, and how incoming requests are routed to function calls has changed a bit. Everything we’ll discuss in this section will reflect these changes. Please review <a href="https://github.com/StackStorm/st2/issues/2686">this issue</a> and <a href="https://github.com/StackStorm/st2/pull/2727">this PR</a> for a bit of insight into the history of this change.</p> </blockquote> <p>The way the API itself actually works requires its own blog post for a proper exploration. For now, suffice it to say that StackStorm’s API is defined with the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">OpenAPI specification</a>. Using this definition, each endpoint is linked to an API controller function that actually provides the implementation for this endpoint. These functions may write to a database, they may send a message over the message queue, or they may do both. Whatever’s needed in order to implement the functionality offered by that API endpoint is performed within that function.</p> <p>For the purposes of this post, however, let’s talk briefly about how this API is actually served from a systems perspective. Obviously, regardless of how the API is implemented, it will have to be served by some kind of HTTP server.</p> <blockquote> <p>Note that in a production-quality deployment of StackStorm, the API is front-ended by nginx. We’ll be talking about the nginx configuration in another post, so we’ll not be discussing it here. But it’s important to keep this in mind.</p> </blockquote> <p>We can use this handy command, filtered through <code class="highlighter-rouge">grep</code>, to see exactly what command was used to instantiate the <code class="highlighter-rouge">st2api</code> process.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ ps --sort -rss -eo command | head | grep st2api
/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/bin/gunicorn st2api.wsgi:application -k eventlet -b 127.0.0.1:9101 --workers 1 --threads 1 --graceful-timeout 10 --timeout 30
</code></pre></div></div> <p>As you can see, it’s running on Python, like most StackStorm components. Note that this is the distribution of Python in the StackStorm virtualenv, so anything run with this Python binary will already have all of its pypi dependencies satisfied - these are installed with the rest of StackStorm.</p> <p>The second argument - <code class="highlighter-rouge">/opt/stackstorm/st2/bin/gunicorn</code> - shows that <a href="http://gunicorn.org/">Gunicorn</a> is running the API application. Gunicorn is a WSGI HTTP server. It’s used to serve StackStorm’s API as well as a few other components we’ll explore later. You’ll notice that for <code class="highlighter-rouge">st2api</code>, the third positional argument is <a href="https://github.com/StackStorm/st2/blob/master/st2api/st2api/wsgi.py">actually a reference to a Python variable</a> (remember that this is running from StackStorm’s Python virtualenv, so this works). 
Looking at <a href="https://github.com/StackStorm/st2/blob/master/st2api/st2api/wsgi.py">the code</a>, we can see that this variable is the result of a call out to the setup task for the <a href="https://github.com/StackStorm/st2/blob/master/st2api/st2api/app.py">primary API application</a>, which is where the aforementioned OpenAPI spec is loaded and rendered into actionable HTTP endpoints.</p> <p>You may also be wondering how <code class="highlighter-rouge">st2api</code> serves <a href="https://docs.stackstorm.com/webhooks.html">webhooks</a>. There’s an endpoint for webhooks at <code class="highlighter-rouge">/webhooks</code> of course, but how does <code class="highlighter-rouge">st2api</code> know that a rule has registered a new webhook? This is actually not that different from what we saw earlier with Sensors, when the sensor container is made aware of a new sensor being registered. In this case, <code class="highlighter-rouge">st2api</code> leverages a <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/triggerwatcher.py">TriggerWatcher</a> class which is made aware of new triggers being referenced from rules, and calls the appropriate event handler functions in the <code class="highlighter-rouge">st2api</code> controller. Those functions add or remove webhook entries from the <code class="highlighter-rouge">HooksHolder</code> instance, so whenever a new request comes in to the <code class="highlighter-rouge">/webhooks</code> endpoint, <code class="highlighter-rouge">st2api</code> knows to check this <code class="highlighter-rouge">HooksHolder</code> for the appropriate trigger to dispatch.</p> <h3 id="st2auth">st2auth</h3> <p>Take a look at StackStorm’s <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">API definition</a> and search for “st2auth”, and you can see that the authentication endpoints are defined alongside the rest of the API.</p> <p><code class="highlighter-rouge">st2auth</code> is executed in almost exactly the same way as <code class="highlighter-rouge">st2api</code>. Gunicorn is the HTTP WSGI server, executed within the Python virtualenv in StackStorm:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ ps --sort -rss -eo command | head | grep st2auth
/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/bin/gunicorn st2auth.wsgi:application -k eventlet -b 127.0.0.1:9100 --workers 1 --threads 1 --graceful-timeout 10 --timeout 30
</code></pre></div></div> <p><code class="highlighter-rouge">st2auth</code> defines <a href="https://github.com/StackStorm/st2/blob/master/st2auth/st2auth/app.py">its own WSGI application</a> to run under Gunicorn.</p> <blockquote> <p>If you’re like me, you might have looked at the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">OpenAPI definition</a> and noticed that <code class="highlighter-rouge">st2auth</code>’s endpoints are mixed in with the regular API endpoints. At the time of this writing, the two are kept separate when the spec is loaded by either component by none other than…regular expressions! If you look at <a href="https://github.com/StackStorm/st2/blob/master/st2auth/st2auth/app.py"><code class="highlighter-rouge">st2auth</code>’s app definition</a>, you’ll notice a few transformations are passed to the <code class="highlighter-rouge">router.add_spec</code> function. 
Among other things, these are used within the <code class="highlighter-rouge">add_spec</code> to determine which endpoints to associate with this application.</p> </blockquote> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2auth/st2auth/controllers/v1/auth.py">API controller</a> for <code class="highlighter-rouge">st2auth</code> is relatively simple, and provides implementations for the two endpoints:</p> <ol> <li>Token Validation</li> <li>Authentication and Token Allocation</li> </ol> <p>As you can see, <code class="highlighter-rouge">st2auth</code> is fairly simple. We already learned the basics of how WSGI applications are run with Gunicorn in StackStorm when we explored <code class="highlighter-rouge">st2api</code>, and <code class="highlighter-rouge">st2auth</code> is quite similar: just with different endpoints and back-end implementations.</p> <h3 id="st2resultstracker">st2resultstracker</h3> <p>Due to the available options for running <a href="https://docs.stackstorm.com/workflows.html">Workflows</a> in StackStorm, sometimes workflow executions happen outside the scope of StackStorm’s domain. For instance, to run Mistral workflows, StackStorm must interact with Mistral exclusively through its API. As a result, after the workflow is executed, StackStorm needs to continue to poll this API for the results of that workflow, in order to update the local StackStorm copy of those executions in the database. Interestingly, the <a href="https://docs.stackstorm.com/troubleshooting/mistral.html#troubleshooting-mistral-workflow-completion-latency">Mistral troubleshooting doc</a> contains some useful information about this process.</p> <blockquote> <p>A better architectural approach would be to implement callbacks in workflow engines like Mistral that push result updates to subscribers, rather than have StackStorm periodically poll the API. There are a number of <a href="https://review.openstack.org/#/c/455083/">existing proposals</a> for doing this, and hopefully in the next few release cycles, this will be implemented, making <code class="highlighter-rouge">st2resultstracker</code> unnecessary.</p> </blockquote> <p>The end-goal here is to provide the results of a Workflow execution in StackStorm, rather than forcing users to go somewhere else for that information.</p> <p><code class="highlighter-rouge">st2resultstracker</code> runs as its own standalone process. When a workflow is executed, it consumes a message from a special queue (note the <code class="highlighter-rouge">get_tracker</code> function in <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/resultstracker/resultstracker.py">resultstracker.py</a>). That message follows a <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/models/db/executionstate.py">database model</a> focused on tracking execution state, and contains the parameter <code class="highlighter-rouge">query_module</code>. If the execution is a Mistral workflow, this will be set to <code class="highlighter-rouge">mistral_v2</code>, which causes <code class="highlighter-rouge">st2resultstracker</code> to load the <a href="https://github.com/StackStorm/st2/blob/master/contrib/runners/mistral_v2/query/mistral_v2.py">mistral-specific querier</a>. That querier contains all of the code necessary for interacting with Mistral to receive results information. 
<code class="highlighter-rouge">st2resultstracker</code> uses this module to query Mistral and place the results in the StackStorm database.</p> <h3 id="st2notifier">st2notifier</h3> <p>The primary role of <code class="highlighter-rouge">st2notifier</code> is to provide an integration point for <a href="https://docs.stackstorm.com/chatops/notifications.html">notifying</a> external systems that an action has completed. <a href="https://docs.stackstorm.com/chatops/chatops.html">Chatops</a> is a big use case for this, but there are others.</p> <p>At the time of this writing, <code class="highlighter-rouge">st2notifier</code> serves two main purposes:</p> <ul> <li>Generate <code class="highlighter-rouge">st2.core.actiontrigger</code> and <code class="highlighter-rouge">st2.core.notifytrigger</code> triggers based on the completion and runtime parameters of an Action execution.</li> <li>Act as a backup scheduler for actions that may not have been scheduled - i.e., delayed by policy.</li> </ul> <p><code class="highlighter-rouge">st2notifier</code> dispatches two types of triggers. The first, <code class="highlighter-rouge">st2.core.actiontrigger</code> is fired for each completed execution. This is enabled by default, so you can hit the ground running by writing a rule to consume this trigger and notify external systems like Slack or JIRA when an action is completed. The second trigger, <code class="highlighter-rouge">st2.core.notifytrigger</code> is more action-specific. As mentioned in the <a href="https://docs.stackstorm.com/chatops/notifications.html">Notification</a> documentation, you can add a <code class="highlighter-rouge">notify</code> section to your Action metadata. If this section is present, <code class="highlighter-rouge">st2notifier</code> will also dispatch a <code class="highlighter-rouge">notifytrigger</code> for each route specified in the <code class="highlighter-rouge">notify</code> section. You can consume these triggers with rules and publish according to the routing information inside that section.</p> <p>If you look at the <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/notifier/notifier.py">notifier implementation</a>, you can see the familiar message queue subscription logic at the bottom (see <code class="highlighter-rouge">get_notifier</code> function). <code class="highlighter-rouge">st2notifier</code> receives messages from the queue so that the <code class="highlighter-rouge">process</code> function is kicked off when action executions complete. From there, the logic is straightforward; the <code class="highlighter-rouge">actiontrigger</code> fires for each action (provided the config option is still enabled), and <code class="highlighter-rouge">notifytrigger</code> is fired based on the <code class="highlighter-rouge">notify</code> field in the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/models/db/liveaction.py">LiveActionDB</a> sent over the message queue.</p> <p><code class="highlighter-rouge">st2notifier</code> also acts as a <a href="https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/notifier/scheduler.py">rescheduler</a> for Actions that have been delayed, for instance, because of a <a href="https://docs.stackstorm.com/reference/policies.html#concurrency">concurrency policy</a>. 
Based on the configuration, <code class="highlighter-rouge">st2notifier</code> can attempt to reschedule executions that have been delayed past a certain time threshold.</p> <h3 id="st2garbagecollector">st2garbagecollector</h3> <p><code class="highlighter-rouge">st2garbagecollector</code> is a relatively simple service aimed at providing garbage collection services for things like action executions and trigger-instances. For some high-activity deployments of StackStorm, it may be useful to delete executions after a certain amount of time, rather than continue to keep them around forever, eating up system resources.</p> <blockquote> <p>NOTE that this is “garbage collection” in the StackStorm sense, not at the language level (Python).</p> </blockquote> <p>Garbage collection is optional, and not enabled by default. You can enable this in the <code class="highlighter-rouge">garbagecollector</code> section of the <a href="https://github.com/StackStorm/st2/blob/master/conf/st2.conf.sample">StackStorm config</a>.</p> <p>The design of <code class="highlighter-rouge">st2garbagecollector</code> is straightforward. It runs as its own process, and executes the garbage collection functionality within an <a href="https://github.com/StackStorm/st2/blob/master/st2reactor/st2reactor/garbage_collector/base.py">eventlet</a> which performs collection in a loop. The interval is configurable. Both <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/garbage_collection/executions.py">executions</a> and <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/garbage_collection/trigger_instances.py">trigger instances</a> have collection functionality at the time of this writing.</p> <h3 id="st2stream">st2stream</h3> <p>The goal of <code class="highlighter-rouge">st2stream</code> is to provide an event stream to external components like the WebUI and Chatops (as well as third party software).</p> <p><code class="highlighter-rouge">st2stream</code> is the third and final service constructed as a <a href="https://github.com/StackStorm/st2/blob/master/st2stream/st2stream/app.py">WSGI application</a>. If you’ve read the sections on <code class="highlighter-rouge">st2api</code> and <code class="highlighter-rouge">st2auth</code>, very little will be new to you here. Searching the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/openapi.yaml">OpenAPI</a> spec for StackStorm’s API for <code class="highlighter-rouge">/stream</code> will lead to the one and only endpoint for this service.</p> <p>The documentation for this endpoint is <a href="https://github.com/StackStorm/st2docs/issues/550">a bit lacking at the moment</a>, but you can get a sense for how it works with a simple <code class="highlighter-rouge">curl</code> call:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~$ curl http://127.0.0.1:9102/v1/stream

event: st2.liveaction__create
data: {"status": "requested", "start_timestamp": "2017-08-28T21:01:10.414877Z", "parameters": {"cmd": "date"}, "action_is_workflow": false, "runner_info": {}, "callback": {}, "result": {}, "context": {"user": "stanley"}, "action": "core.local", "id": "59a4849602ebd558f14a66d8"}
...
</code></pre></div></div> <p>This will keep a connection open to <code class="highlighter-rouge">st2stream</code> and events will stream into the console as they take place (I ran the <code class="highlighter-rouge">st2 core.local date</code> command in a separate tab to produce this once I had subscribed to the stream).</p> <p>The <a href="https://github.com/StackStorm/st2/blob/master/st2stream/st2stream/controllers/v1/stream.py">controller</a> for this API endpoint is also fairly straightforward - it returns a response of type <code class="highlighter-rouge">text/event-stream</code>, which instructs the <a href="https://github.com/StackStorm/st2/blob/master/st2common/st2common/router.py">Router</a> to maintain this persistent connection so that events can be forwarded to the client.</p> <h2 id="conclusion">Conclusion</h2> <p>There are several external services like Mistral, RabbitMQ, NGINX, MongoDB, and Postgres that we explicitly didn’t cover in this post. They’re crucial for the operation of StackStorm, but better suited for a separate post in the near future.</p> <p>We also skipped covering one “core” service, <code class="highlighter-rouge">st2chatops</code>. This is an optional service (disabled by default until configured) that allows chatops integration in StackStorm. There’s a lot to talk about with respect to chatops on its own, so that will also be done in a separate post.</p> <p>For now, I hope this was a useful exploration into the services that make StackStorm work. Stay tuned for follow-up posts on specific topics that we glossed over for now.</p> Mon, 28 Aug 2017 00:00:00 +0000 http://keepingitclassless.net/2017/08/stackstorm-architecture-core-services/

Your Cheese Moved a Long Time Ago

<p>I was recently on a panel at the <a href="https://www.meetup.com/Auto-Remediation-and-Event-Driven-Automation/">Event-Driven Automation Meetup</a> at LinkedIn in Sunnyvale, CA, and we all had a really good hour-long conversation about automation. What really made me happy was that nearly the entire conversation focused on bringing the same principles that companies like LinkedIn and Facebook use on their network to smaller organizations, making them practical for more widespread use.</p> <blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Nina Mushiana of <a href="https://twitter.com/LinkedIn">@LinkedIn</a> says &quot;Anything that can be documented should be automated&quot;.<br />Great Auto-Remediation Meetup! <a href="https://t.co/l76U1IydjB">pic.twitter.com/l76U1IydjB</a></p>&mdash; StackStorm (@Stack_Storm) <a href="https://twitter.com/Stack_Storm/status/847664487620530177">March 31, 2017</a></blockquote> <p>One particular topic that came up was one I’ve struggled with for the past few years: what about Day 2 of network automation? So, we manage to write some Ansible playbooks to push configuration files to switches - what’s next? Often this question isn’t asked. I think the network automation conversation has progressed to the point where we should all start asking this question more often.</p> <p>I believe that the network engineering discipline is at a crossroads, and the workforce as a whole needs to make some changes and decisions in order to stay relevant. 
Those changes are all based on the following premise:</p> <blockquote> <p>The value of the network does not come from discrete nodes (like routers and switches - physical or virtual), or their configuration, but from the services they provide.</p> </blockquote> <p>If you’re just getting started down the path of following basic configuration management or infrastructure-as-code principles, <strong>that’s fantastic</strong>. This post is not meant to discourage you from doing that. Those things will serve you well for the next 1-2 years. This post focuses on year 3+ of the network automation journey.</p> <h1 id="your-cheese-has-moved">Your Cheese Has Moved</h1> <p>We’ve all heard the lamentations that come from server admins (<a href="https://keepingitclassless.net/2015/02/free-form-discussion-cleur/">throwback alert</a>) like “why does it take weeks to provision a new VLAN?”. I worked as a network and data center consultant for a number of years, and I can tell you that these stories are true - and it gets much worse than that.</p> <p>As I’ve said before, what the sysadmin usually doesn’t know is all the activity that goes on behind the scenes to deliver that VLAN. Usually what they’re asking for is a new logical network, which isn’t just a tag on a switchport - it’s also adding a layer 3 interface, and potentially routing changes, edits to the firewall, a new load balancing configuration, and on and on and on. The network has traditionally provided a lot of these services that the sysadmin took for granted.</p> <p>You might understand their frustration, but the reality is that the network engineer is trying hard just to provide these services and ensure they’re changing adequately for the applications that rely upon them. It also doesn’t help when processes like ITIL force such changes to take place every first weekend of the month at 2AM. This is a far cry from what the application teams and developers have come to expect, like turnaround times of seconds or minutes, not weeks or months. But hey, those silly developers don’t know networking, so they can just deal with it, right?</p> <p>Yes, it can be tempting to make fun of some developers that can’t tell a frame from a packet. However, it may be useful to remember that a developer wrote the software in your router. Someone had to write the algorithms that power your load balancer. It is indeed possible that some software developers know networking - even better than most network engineers out there. Then, if you put them in the constantly-innovating culture of Silicon Valley that is always looking for a problem to solve, it’s inevitable: the arduous processes and inflexible tooling that have dominated networking for so long provided those developers and sysadmins with a problem to solve on a silver platter.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2017/04/cheese.png"><img src="http://keepingitclassless.net/assets/2017/04/cheese.png" width="300" /></a></div> <p>And solve it they did. When x86 virtualization was really hitting the mainstream, network engineers didn’t really acknowledge the vSwitch. They wrote it off as “those server guys”. What about when we started routing in the host or hypervisor? I know a lot of people like to make fun of the whole <code class="highlighter-rouge">docker0</code> bridge/NAT thing. Those silly server people, right? Developers are spinning up haproxy instances for load balancing, and learning how to use iptables to secure their own infrastructure.
On top of that, all of these network services are <strong>also being offered by AWS</strong>, all in one nice dashboard, and totally programmable. Can you really blame the developer now? Put yourself in their shoes - if you were faced with an inflexible network infrastructure that your application depended on, and you had no control over it, how long would it take you to follow the shiny red ball over to Amazon where they make all those same network <em>services</em> totally abstract and API-controllable?</p> <p>So what’s happening here is that “those server guys” are basically running their own network at this point. We’ve clung to our black boxes and our configuration files at the cost of <strong>losing control over the actual network services</strong>. The truth is, we need to play a lot of catch-up.</p> <blockquote> <p>I know what you’re thinking - there’s more to the network than the data center. But like it or not, the data center houses the applications, and the applications are where the business sees the value in IT. Applications and software development teams sit closer to the boss, and they’re learning how to manage network services pretty well on their own out of necessity.</p> </blockquote> <h1 id="getting-the-cheese-back">Getting the Cheese Back</h1> <p>Network automation is about so much more than merely solving a configuration management problem. If it were, this would all be a bit anticlimactic, wouldn’t it? Everyone would just learn Ansible/Salt/Puppet and be done with it.</p> <p>Network automation, just like all other forms, is about <strong>services integration</strong>. There aren’t “existing tools” for your legacy, internal applications. At some point <a href="https://keepingitclassless.net/2017/03/learn-programming-or-perish/">you’re going to have to write some code</a>, even if it’s an extension to an existing tool. It’s time to get over this aversion to dealing with even basic scripting, and start filling in the 20% of our workflows that can’t be addressed by a turnkey tool or product. To me, this is the next step of network automation - being able to fill in the gaps between historically air-gapped services to create a broader, automated IT system.</p> <p>For instance - Kubernetes is an increasingly popular choice for those looking to deploy distributed applications (don’t make me say “cloud native”). It’s great at managing the entities (like pods) under its control, but it’s not meant to run everything meaningful to your business. If you’re running Kubernetes in your organization, it will have to run alongside a bunch of other stuff like OpenStack, vSphere, even mainframes. This is the reality of brownfield.</p> <p>As you might expect, all these systems need to work together, and we’ve historically “integrated” them by hand by looking at different areas of our technology stack, and “rendering” abstract concepts of desired state into implementation-specific commands and configurations.
Just take networking as a specific example - a network engineer is the human manifestation of a cross platform orchestrator, seamlessly translating between Cisco and Juniper CLI syntaxes.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2017/04/dr_garencieres.jpg"><img src="http://keepingitclassless.net/assets/2017/04/dr_garencieres.jpg" width="500" /></a></div> <p>So, to return to the main point; the network is now no longer the sole proprietor of network services - those are slowly but surely migrating into the realm of the sysadmin and software developer. How can we adapt to this? One way is to acknowledge that the new “network edge” is very blurred. No longer is there a physical demarcation like a switchport; rather, these services are being provided either directly adjacent, or even co-resident with the application.</p> <p>It’s actually a bit encouraging that this has happened. This change represents a huge opportunity for network engineers to gain more control over the network than they’ve ever had. Historically, these network services were hidden behind “value-add, differentiating features” like CLI syntax (insert sarcasm undertone here). In the new world these services are either taking place in open-source software, or are at least driven by well-designed, well-documented APIs. So, this new model is out there ready for us. We can take it, or lose it.</p> <h1 id="conclusion">Conclusion</h1> <p>The migration of network services out of the network itself was inevitable, but it’s absolutely not a death blow to the network engineer - it’s a huge opportunity to move forward in a big way. There’s a lot of work to do, but as <a href="https://keepingitclassless.net/2017/03/learn-programming-or-perish/">I wrote about last week</a>, the networking skill set is still sought after, and still needed in this new world.</p> <p><a href="http://info.interop.com/itx/2017/scheduler/session/fundamental-principles-of-automation">I’ll be speaking at Interop ITX</a> in Vegas next month, about this, and more related topics. If you want to talk about automation, or just geek out about beer or food, I’d love to chat with you.</p> Thu, 06 Apr 2017 00:00:00 +0000 http://keepingitclassless.net/2017/04/cheese-moved-long-time-ago/ http://keepingitclassless.net/2017/04/cheese-moved-long-time-ago/ Learn Programming or Perish(?) <p>I was honored to return to Packet Pushers for <a href="http://packetpushers.net/podcast/podcasts/show-332-dont-believe-programming-hype/">a discussion on programming skillsets in the networking industry</a>. I verbalized some thoughts there, but even 60 minutes isn’t enough for a conversation like this.</p> <p>To be clear, this post is written primarily to my followers in the networking industry, since that’s largely where this conversation is taking place.</p> <h1 id="scripting-is-not-programming">Scripting is NOT Programming</h1> <p>I want to put something to rest right now, and that is the conflation of scripting and software development. You may be hesitant to pick up any skills in this area because you feel like you have to boil the ocean in order to be effective, which is not true.</p> <p>As I briefly mention in the podcast, I spent the first 4 years or so of my career making networking my day job. Because of that, I picked up a lot of useful knowledge in this area. 
However, as I started to explore software, I realized that networking wasn’t something I wanted to do as a day job anymore, but I still greatly value the networking skillset I retain from this experience.</p> <p>Making this leap over 2 years ago revealed a multitude of subskills, fundamental knowledge, and daily responsibilities I simply wasn’t exposed to when I wasn’t doing this full time. Things I even take for granted now - like code review, automated testing, and computer science basics like algorithms. While I wouldn’t ever discourage anyone from learning these kinds of things, it is very understandable that a network engineer doesn’t deal with these things, because they go way beyond simple scripting.</p> <blockquote> <p>That said, you may run into challenges as your scripts become more complex. It may be useful to pair with someone that writes code for a living, and learn how to make your scripts more modular, scalable, and reusable.</p> </blockquote> <p>In short, don’t conflate <strong>skillset</strong> with <strong>occupation</strong>. Don’t feel like you have to boil the ocean in order to get started. You don’t have to become a programmer, but you should be able to write and maintain scripts using a modern language.</p> <h1 id="stop-talking-start-building">Stop Talking, Start Building</h1> <p>Hopefully the previous section drew a clear line between the <strong>skill</strong> of scripting and the <strong>occupation</strong> of software development, and that as a network engineer, you no more “need” to become a software developer than a car mechanic “needs” to become a heart surgeon. Now that this is out of the way, it’s time to have some real talk about this whole debate.</p> <p>One thing I’ve noticed since joining a team that has ties to just about every area of IT, including networking, is that other disciplines realized long ago that these skills are necessary for reasonably modern operations. There is no “should sysadmins learn code” discussions going on right now - they’ve all picked up Python, bash, or similar. It’s not a discussion of whether or not being able to augment their workflows with code is useful; it is assumed. Yet in networking we’re still debating this for some reason. It pains me when I hear perspectives that paint basic scripting skills as something that only engineers at Facebook or Google need to worry about, when other disciplines, even at smaller scale, simply assume this skillset exists in their operational model.</p> <p>Frankly, I am a bit disturbed that this is still so much of a discussion in networking. I worry that the vast majority of the industry is primarily interested in having their problems solved for them. This is something I observed about 3 years ago, and is a big reason I wanted to make a change in my own career - I didn’t feel like I was building anything, just operating something that someone else built. We alluded to this in the podcast - the industry seems to be trending away from “engineering”, and towards “administration”. Of course, this is a generalization. 
It’s obvious that the rather explosive growth of communities like <a href="http://networktocode.com/community/">“Network to Code”</a> are indicating at least some interest, but I worry that it’s not enough.</p> <p>There are only two possible conclusions that I can draw from my observations:</p> <ul> <li>People assume that in order to be useful, they have to learn everything a software developer has learned.</li> <li>The difference between software development and scripting is understood, but even scripting is viewed as something “only for Facebook or Google”.</li> </ul> <p>Hopefully the previous section sufficiently refuted the first point. This just isn’t true. Don’t conflate occupation with skillset.</p> <p>Regarding the second point, I am not sure how to solve this, to be honest, other than to advise that you look at how other disciplines have incorporated those skillsets. Attend conferences that don’t explicitly focus on networking. I attended <a href="https://stackstorm.com/2017/03/23/stackstorm-srecon-2017/">SREcon</a> recently and was blown away by the difference in mindset towards these skillsets, compared to my experience at networking conferences. I worry that we get into this networking echo chamber where we listen to each other reject these skillsets, and use that to justify not picking them up ourselves.</p> <h1 id="focusing-on-real-fundamentals">Focusing on REAL Fundamentals</h1> <p>All of that in mind, I want to wrap up with a brief discussion about the difference in types of skillsets, since this often comes up when bringing up software skills in networking. For instance, headlines like “Learn Programming, or get CCIE?” piss me off, frankly. It just misses the point entirely, and subverts the tremendous amount of nuance that needs to be explored in this discussion.</p> <p>I believe strongly that focusing on fundamentals, especially if you’re just starting in your career, <strong>and regardless of which discipline you fall under</strong>, will set you up best for success in the long run. It will allow you to make a lot more sense of specific implementations like CLI syntax. Don’t be afraid to lean on the user guide when you need to look up the syntax for a command. Commit the concepts that sit under that command to memory instead of the syntax itself.</p> <p>As an illustration, consider the artist/painter. If painters learned like the network industry wants us to learn, then art schools would only teach how to replicate the Mona Lisa. Instead, artists learn the fundamentals of brush technique. They learn what colors do when blended on the palette. They use their own creativity and decision making to put these fundamentals into practice when it comes time to make something. Similarly, programmers learn fundamentals like sorting algorithms, Big-O notation, CPU architectures, etc, and rely on knowledge of these tools to solve a problem when it arises.</p> <p>It’s worth saying, that because of where this industry is right now, implementation knowledge is important too, especially since the networking industry is in love with certifications that demonstrate implementation knowledge. It’s obvious that the networking industry places a lot more value on specific implementations - just look at the salary estimates for a CompTIA Network+ vs just about any Cisco certification.</p> <p>However, vendor certs are basically a way of putting the vendor in control of your career. On the other hand, fundamental knowledge puts YOU in control. 
It lets YOU dominate interviews, instead of the vendor you’ve tied yourself to. Always emphasize learning the fundamentals, and consider that the “real” networking fundamentals may not be on any popular curriculum.</p> <p>To build your career, you will likely have to balance implementation-level knowledge (like certs) with fundamental knowledge. Certs let you get in the door - that’s just a reality of the current state of the interview process. But don’t let this keep you from going way deeper - it will do wonders for your career long-term.</p> <h1 id="conclusion">Conclusion</h1> <p>To wrap up: if you only take two things away from this post, they are:</p> <ul> <li>Scripting is for everyone. Yes, that includes you. It’s something you can start with today, because it’s not magical. We’re just talking about the description of the logic you already use in your day-to-day operations as source code. That’s it.</li> <li>Emphasize fundamental knowledge. Learn enough about implementations to get in the door, but make sure you know how TCP and ARP work (as an example) regardless of platform.</li> </ul> Mon, 27 Mar 2017 00:00:00 +0000 http://keepingitclassless.net/2017/03/learn-programming-or-perish/ http://keepingitclassless.net/2017/03/learn-programming-or-perish/ 2016 Recap and 2017 Goals <p>Yet another recap post to follow up on <a href="https://keepingitclassless.net/2015/12/2015-recap-2016-goals/">last year’s</a>. 2015 was a big transition year for me, and last year I wanted to make sure I kept the momentum going.</p> <blockquote> <p>I make this post yearly to publicly track my own professional development goals. I find this helps me stay accountable to these goals, and it also allows others to give me a kick in the butt if I’m falling behind.</p> </blockquote> <h1 id="2015-goal-recap">2016 Goal Recap</h1> <p>First, let me recap some of the goals <a href="http://keepingitclassless.net/2014/12/2014-recap-2015-goals/">I set for myself at the beginning of the year</a>, and see how well I did.</p> <p><strong>Network Automation Book</strong> - <a href="http://keepingitclassless.net/2015/12/training-next-generation-network-engineer/">At this time last year</a>, I announced that I was working on a network automation book with Scott Lowe and Jason Edelman. This has certainly taken a bit more time than any of us would have liked, but we’re very near the end. The three of us have had a very busy year, and there are very few things left to do for this release. However, we have pushed several additional chapters to O’Reilly, so you can still read these via Safari.</p> <p><strong>Open Source</strong> - Given that <a href="https://keepingitclassless.net/2016/10/new-automation-chapter-begins/">I now work for a company centered around an open source project</a>, I’d say I definitely made a good move towards this goal. I also open sourced <a href="https://keepingitclassless.net/2016/03/test-driven-network-automation/">ToDD</a> earlier this year, which has been steadily growing and becoming more stable over the last few months.</p> <p><strong>Deeper into Go and Python</strong> - I did well in this goal as well, for some of the same reasons as the open source goal - namely, that I work for a company centered around a Python-based open source project, and that I maintain ToDD, which is written in Go.
I decided early this year that, in order to continue the momentum from my transition to full-time developer in 2015, I want to focus on Go and Python, so that I can be more flexible than if I knew only a single language, but still focused enough to get real depth. This is a big reason I am getting more involved with Go.</p> <p><strong>More Community Output</strong> - It’s no secret that blogging output has slowed for me lately. My motivations for blogging and for being involved with the community in general are just very different from what they used to be. My early career was defined by trying to become as broad as possible - working with all different kinds of technologies. Now, I tend to spend more time focusing on one thing at a time, getting to a very deep level of understanding. Though I wish this weren’t the case, this tends to exhaust the energy I’d normally use to write about what I learned. However, while this part has slowed down, I am still fairly pleased with the other things I’ve done. I do feel like my involvement with open source (which has become quite substantial) is filling this gap quite a bit. I’ve also spoken at conferences and am already continuing this in 2017. So to recap, I feel like this goal was accomplished, but perhaps in a different way than it has been in years past.</p> <h1 id="goals-for-2016">Goals for 2017</h1> <p>While my focus since joining StackStorm has certainly included network automation use cases, it’s also exposed me to other industries and customer use cases. In many ways, these scenarios are much more interesting to me personally than what I’ve been working on in networking for the past few years. So I am hoping to branch into other technical areas beyond networking in 2017.</p> <p>I am leaving this intentionally vague, because I obviously don’t know the future, but I feel like the time is right for a change. I’ll always have ties to networking, of course, and I intend to continue advocating for network automation, but I want to do more. Lately I’ve been getting more interested in the security industry - and I feel like there might be a gap for me to fill with my networking and software skillset. I’ll be exploring this in greater detail in 2017.</p> <blockquote> <p>I don’t usually talk about personal goals in these posts, but for 2017 I’d also like to pick up a piano and get back into playing jazz (hoping to find a group in Portland once I brush the rust off).</p> </blockquote> <h1 id="conclusion">Conclusion</h1> <p>I think the most memorable change for me in 2016 was the affirmation that software development was an area where I wanted to work. I’ll always have close ties to the networking industry, but I’ve realized that there’s a lot about the current state of the industry that just doesn’t satisfy my current career objectives in the same way that software and automation have (and hopefully will).
2016 saw a big direction change towards open source, and I have really enjoyed it.</p> <p>Have a great New Year’s celebration, stay safe, and see you in 2017!</p> Sat, 31 Dec 2016 00:00:00 +0000 http://keepingitclassless.net/2016/12/2016-recap-2017-goals/ http://keepingitclassless.net/2016/12/2016-recap-2017-goals/ Introduction to StackStorm <p><a href="https://keepingitclassless.net/2016/10/principles-of-automation/">Earlier</a> I wrote about some fundamental principles that I believe apply to any form of automation, whether it’s network automation, or even building a virtual factory.</p> <p>One of the most important concepts in mature automation is <strong>autonomy</strong>; that is, a system that is more or less self-sufficient. Instead of relying on human beings for input, always try to provide that input with yet another automated piece of the system. There are several benefits to this approach:</p> <ul> <li><strong>Humans Make Mistakes</strong> - This is also a benefit of automation in general, but autonomy also means mistakes are lessened on the input as well as the output of an automation component.</li> <li><strong>Humans Are Slow</strong> - We have lives outside of work, and it’s important to be able to have a system that reacts quickly, instead of waiting for us to get to work. We need a system that is “programmed” by us, and is able to do work on our behalf.</li> <li><strong>Signal To Noise</strong> - Sometimes humans just don’t need to be involved. We’ve all been there - an inbox full of noisy alerts that don’t really mean much. Instead, configure specific triggers that act on your behalf when certain conditions are met.</li> </ul> <p>The reality is that we as operations teams are already event-driven by nature; we’re just doing it in our brains. Every operations shop works this way; there is a monitoring tool in place, and the ops folks watch for alerts and respond in some sort of planned way. This sort of event-driven activity is happening all the time without us thinking about it. As you explore the concepts below, note that the main focus here is to simply reproduce those reactions in an automated way with StackStorm.</p> <p>These are all concepts I’ve been seriously pondering for the past 2 years, and have spoken about at several conferences like <a href="https://keepingitclassless.net/2016/04/interop-vegas-2016/">Interop</a>. Recently, when <a href="https://www.youtube.com/watch?v=M_hacp2qd70">I saw what the team at StackStorm was building</a>, and how well it aligned with my beliefs about mature automation practices, <a href="https://keepingitclassless.net/2016/10/new-automation-chapter-begins/">I had to get involved</a>.</p> <p>StackStorm is event-driven automation.
As opposed to alternative approaches (which have their own unique benefits) that rely on human input, StackStorm works on the premise that a human being will instead configure the system to watch for certain events and react autonomously on their behalf.</p> <p>I recently attended <a href="http://techfieldday.com/event/nfd12">NFD12</a> as a delegate, and witnessed a presentation by the excellent and articulate Dmitri Zimine (shameless brown nosing, he’s my boss now):</p> <div style="text-align:center;"><iframe width="560" height="315" src="https://www.youtube.com/embed/M_hacp2qd70" frameborder="0" allowfullscreen=""></iframe></div> <h1 id="infrastructure-as-code">Infrastructure as Code</h1> <p>Before I get into the details of StackStorm concepts, it’s also important to remember one of the key fundamentals of next-generation operations, which is the fantastic buzzword “Infrastructure as Code”. Yes, it’s a buzzword, but there’s some good stuff there. There is real value in being able to describe your infrastructure using straightforward, version-controlled text files, and being able to use these files to provision new infrastructure with ease.</p> <p>Every concept in StackStorm can be described using simple YAML, or languages like Python. This is done for a reason: to enable infrastructure-as-code and event-driven automation to work in harmony. Just like any programming language or automation tool, this domain-specific language (DSL) that StackStorm uses will take some time to learn, but it’s all aimed at promoting infrastructure-as-code concepts. The DSL is the single source of truth - treat it as such. For instance, use mature Continuous Integration practices (including automated testing and code peer review) when making changes to it. Perform automated tests and checks when changes are made. This will make your operations much more stable.</p> <blockquote> <p>Note that while you should always treat these YAML files as the single source of truth, there are also some tools in StackStorm that allow you to generate this syntax using a friendly GUI.</p> </blockquote> <h1 id="stackstorm-concepts">StackStorm Concepts</h1> <p>Now, let’s explore some StackStorm concepts.</p> <h2 id="packs">Packs</h2> <p>One of the biggest strengths of StackStorm is its ecosystem. StackStorm’s recent 2.1 release included a new <a href="https://exchange.stackstorm.org/">Exchange</a>, which provides a new home for the <strong>over 450 integrations</strong> that already exist as part of the StackStorm ecosystem. These integrations allow StackStorm to interact with 3rd party systems.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/exchange.png"><img src="http://keepingitclassless.net/assets/2016/10/exchange.png" width="900" /></a></div> <p>In StackStorm, we call these integrations <a href="https://docs.stackstorm.com/packs.html">“Packs”</a>. Packs are the atomic unit of deployment for integrations and extensions to StackStorm. This means that regardless of what you’re trying to implement, whether it’s a new Action, Sensor, or Rule, it’s done with Packs.</p> <p>As of StackStorm 2.1, pack management has also been revamped and improved (we’ll explore packs and pack management in detail in a future post). Installing a new integration is a one-line command. Want to allow StackStorm to run Ansible playbooks?
Just run:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>st2 pack install ansible </code></pre></div></div> <p>Now that we’ve covered packs, let’s talk about some of the components you will likely find in a pack.</p> <h2 id="actions">Actions</h2> <p>Though it’s important to understand that StackStorm is all about event-driven automation, it’s also useful to spend some time talking about what StackStorm can <strong>do</strong>. Being able to watch for all the events in the world isn’t very useful if you can’t do anything about what you see. In StackStorm, we can accomplish such things through “<a href="https://docs.stackstorm.com/actions.html">Actions</a>”. Some examples include:</p> <ul> <li>Push a new router configuration</li> <li>Restart a service on a server</li> <li>Create a virtual machine</li> <li>Acknowledge a Nagios / PagerDuty alert</li> <li>Bounce a switchport</li> <li>Send a message to Slack</li> <li>Start a Docker container</li> </ul> <p>There are many others - and the list is growing all the time in the StackStorm <a href="https://exchange.stackstorm.org/">Exchange</a>.</p> <p>One of the things that attracted me to the StackStorm project is the fact that Actions are designed very generically, meaning they can be written in any language. This is similar to what I’ve done with testlets in <a href="https://github.com/toddproject">ToDD</a>, and what Ansible has done with their modules. This generic interface allows you to take scripts you already have and are using in your environment, and begin using them as event-driven actions, <a href="https://docs.stackstorm.com/actions.html#converting-existing-scripts-into-actions">with only a bit of additional logic</a>. As long as a script conforms to this standard, it can be used as an Action.</p> <p>There are several actions bundled with StackStorm (truncated for easy display):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vagrant@st2learn:~$ st2 action list +---------------------------------+---------+------------------------------------------------- | ref | pack | description +---------------------------------+---------+------------------------------------------------- | chatops.format_execution_result | chatops | Format an execution result for chatops | chatops.post_message | chatops | Post a message to stream for chatops | chatops.post_result | chatops | Post an execution result to stream for chatops | core.announcement | core | Action that broadcasts the announcement to all s | | | consumers. | core.http | core | Action that performs an http request. | core.local | core | Action that executes an arbitrary Linux command | | | localhost. </code></pre></div></div> <p>It’s important to consider these since they may provide you with the functionality you need out of the gate. For instance, lots of systems these days come with REST APIs, and “core.http”, which allows you to send an HTTP request, may be all the Action functionality you need. Even if the predefined Actions don’t suit you, check the <a href="https://exchange.stackstorm.org/">Exchange</a> for a pack that may include an Action that gives you the functionality you’re looking for.</p> <p>Nevertheless, it may sometimes be necessary to create your own Actions. We’ll go through this in a future blog post, but for now, understand that actions are defined by two files:</p> <ul> <li>A metadata file, usually in YAML, that describes the action to StackStorm</li> <li>A script file (e.g.
Python) that implements the Action logic</li> </ul> <p>Actions may depend on certain environmental factors to run. StackStorm makes this possible through “Action Runners”. For instance, you may have a Python script you wish to use as an Action; in this case, you’d leverage the “python-script” runner. Alternatively, you may just want to run an existing Linux command as your Action. In this case you would want to use the “local-shell-cmd” runner. There are <a href="local-shell-cmd">many other published runners</a>, with more on the way.</p> <h2 id="sensors-and-triggers">Sensors and Triggers</h2> <p>For event-driven automation to work, information about the world needs to be brought in to the system so that we can act upon it. In StackStorm, this is done through <a href="https://docs.stackstorm.com/sensors.html">Sensors</a>. Sensors, like your own sense of sight or smell, allow StackStorm to observe the world around it, so that actions can eventually be taken on that information.</p> <blockquote> <p>StackStorm was not designed to be a monitoring tool, so you’ll still want to use whatever monitoring you already have in place. Sensors can/should be used to get data out of a monitoring system and take action accordingly.</p> </blockquote> <p>Sensors can be active or passive. An example of an “active” sensor would be something that actively polls an external entity, like Twitter’s API, for instance. Alternatively, sensors can also be passive; an example of this would be a sensor that subscribes to a message queue, or a streaming API, and simply sits quietly until a message is received.</p> <p>Both sensor types bring data into StackStorm, but the data is somewhat raw. In order to make sense of the data brought in by sensors, and to allow StackStorm to take action on that data, Sensors can also define “Triggers”. These help StackStorm identify incoming “events” from the raw telemetry brought in by Sensors. Triggers are useful primarily when creating a Rule, which is explained in the next section.</p> <p>Similarly to Actions, Sensors are defined using two files:</p> <ul> <li>A YAML metadata file describing the sensor to StackStorm</li> <li>A Python script that implements the sensor logic</li> </ul> <p>An example YAML metadata file might look like this:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--- class_name: "SampleSensor" entry_point: "sample_sensor.py" description: "Sample sensor that emits triggers." trigger_types: - name: "event" description: "An example trigger." payload_schema: type: "object" properties: executed_at: type: "string" format: "date-time" default: "2014-07-30 05:04:24.578325" </code></pre></div></div> <blockquote> <p>The particular implementation of the Sensor will determine if it is a “passive” or “active sensor”; there are two Python classes that you can inherit from to determine which Sensor type you’re creating.</p> </blockquote> <h2 id="rules">Rules</h2> <p>“<a href="https://docs.stackstorm.com/rules.html">Rules</a>” bring the two concepts of Sensors and Actions together. A Rule is a definition that, in English, says “when this happens, do this other thing”. You may remember that Sensors bring data into StackStorm, and Triggers allow StackStorm to get a handle on when certain things happen with that data. 
Rules make event-driven automation possible by watching these Triggers, and kicking off an Action (or a Workflow, as we’ll see in the next section).</p> <p>Rules are primarily composed of three components:</p> <ul> <li><strong>Trigger</strong>: “What trigger should I watch?””</li> <li><strong>Criteria</strong>: “How do I know when that trigger indicates I should do something?””</li> <li><strong>Action</strong>: “What should I do?””</li> </ul> <p>This is a straightforward concept if you look at a sample YAML definition for a Rule:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--- name: "rule_name" # required pack: "examples" # optional description: "Rule description." # optional enabled: true # required trigger: # required type: "trigger_type_ref" criteria: # optional trigger.payload_parameter_name1: type: "regex" pattern : "^value$" trigger.payload_parameter_name2: type: "iequals" pattern : "watchevent" action: # required ref: "action_ref" parameters: # optional foo: "bar" baz: "" </code></pre></div></div> <p>Think of “Rules” as the foundation of event-driven automation. They really are the core of what makes “If <em>__ then __</em>” possible.</p> <p>Stackstorm’s architecture keeps everything very logically separate. Sensors sense. Actions act. Then, rules tie them together and allow you to have a truly autonomous system as a result.</p> <h2 id="workflows">Workflows</h2> <p>Even simple actions rarely take place in isolation. For instance, when you detect that an application node has shut down, there could be ten or more discrete things you need to do in order to properly decommission that node in related systems. So, event-driven automation isn’t always just about kicking off a single action, but rather a “<a href="https://docs.stackstorm.com/workflows.html">Workflow</a>” of actions.</p> <p>In StackStorm, we use <a href="https://wiki.openstack.org/wiki/Mistral">OpenStack Mistral</a> to define workflows. Mistral is a service that’s part of the OpenStack project, and we <a href="https://docs.stackstorm.com/mistral.html">bundle it with StackStorm</a>. Mistral also <a href="http://docs.openstack.org/developer/mistral/dsl/dsl_v2.html">defines a YAML-based Domain-Specific Language (DSL)</a> that’s used to define the logic and flow of the workflow.</p> <p>In the following simple example, we define a Mistral workflow that accepts an arbitrary linux command as input, runs it, and prints the result to stdout:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--- version: '2.0' examples.mistral-basic: description: A basic workflow that runs an arbitrary linux command. type: direct input: - cmd output: stdout: &lt;% $.stdout %&gt; tasks: task1: action: core.local cmd=&lt;% $.cmd %&gt; publish: stdout: &lt;% task(task1).result.stdout %&gt; stderr: &lt;% task(task1).result.stderr %&gt; </code></pre></div></div> <p>Workflows are also powerful in that you can make decisions within them and take different actions depending on the output of previous tasks. This is done by inserting little “<a href="https://docs.stackstorm.com/mistral_yaql.html">YAQL</a>” statements in the workflow (note the statements underneath “on-success” below):</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>--- version: '2.0' examples.mistral-branching: description: &gt; A sample workflow that demonstrates how to use conditions to determine which path in the workflow to take. 
type: direct input: - which tasks: t1: action: core.local input: cmd: "printf &lt;% $.which %&gt;" publish: path: &lt;% task(t1).result.stdout %&gt; on-success: - a: &lt;% $.path = 'a' %&gt; - b: &lt;% $.path = 'b' %&gt; - c: &lt;% not $.path in list(a, b) %&gt; a: action: core.local input: cmd: "echo 'Took path A.'" b: action: core.local input: cmd: "echo 'Took path B.'" c: action: core.local input: cmd: "echo 'Took path C.'" </code></pre></div></div> <p>Based on the output from task “t1”, we can choose which of the next tasks will take place.</p> <p>As you can see, Mistral workflows can be simple when you want it to be, but can also scale up to really powerful complex workflows as well. See the <a href="https://docs.stackstorm.com/mistral.html">StackStorm/Mistral</a> documentation for more examples.</p> <h1 id="conclusion">Conclusion</h1> <p>StackStorm has a huge community and it’s growing. Check out our <a href="https://stackstorm.com/#community">Community</a> page, where you’ll find information about how to contact us. Also make sure you follow the links there to join the Slack community (free and open), we’d love to have you even if you just want to ask some questions.</p> <p>Our <a href="https://stackstorm.com/2016/12/06/2-1-new-pack-management/">2.1 release also happened recently</a>, and it introduces a lot of new features. We’re working hard to keep putting more awesome into StackStorm, and actively want your feedback on it. There’s a lot of opportunity for the network industry in particular to take advantage of event-driven automation, and I personally will be working very hard to bridge the gap between the two.</p> <p>Thanks for reading, and stay tuned for the next post, covering the internal architecture of StackStorm.</p> Fri, 16 Dec 2016 00:00:00 +0000 http://keepingitclassless.net/2016/12/introduction-to-stackstorm/ http://keepingitclassless.net/2016/12/introduction-to-stackstorm/ A New Automation Chapter Begins <p>Two years ago, while I worked as a network engineer/consultant, I felt strongly that the industry was ripe for change. In February 2015 I jumped feet-first into the world of network automation by going back to my roots in software development, combining those skills with the lessons I learned from 3 years of network engineering.</p> <p>I’ve learned a ton in the last 2 years - not just at the day job but by actively participating in the automation and open source communities. I’ve co-authored a <a href="https://keepingitclassless.net/2015/12/training-next-generation-network-engineer/">network automation book</a>. I’ve released an open source project to facilitate <a href="https://keepingitclassless.net/2016/03/test-driven-network-automation/">automated and distributed testing</a> of network infrastructure. I’ve <a href="https://keepingitclassless.net/2016/04/interop-vegas-2016/">spoken publicly</a> about many of these concepts and more.</p> <p>Despite all this, there’s a lot left to do, and I want to make sure I’m in the best place to help move the industry forward. My goal is and has always been to help the industry at large realize the benefits of automation, and break the preconception that automation is only useful for big web properties like Google and Facebook. Bringing these concepts down to Earth and providing very practical steps to achieve this goal is a huge passion of mine.</p> <p>Automation isn’t just about running some scripts - it’s about autonomous software. It’s about creating a pipeline of actions that take place with minimal human input. 
It’s about maintaining high quality software. I wrote about this and more yesterday in my post on the “<a href="https://keepingitclassless.net/2016/10/principles-of-automation/">Principles of Automation</a>”.</p> <h1 id="stackstorm">StackStorm</h1> <p>Later this month, I’m starting a new chapter in my career and joining the team at <a href="https://stackstorm.com/">StackStorm</a>.</p> <p>In short, StackStorm (<a href="https://github.com/StackStorm/st2">the project</a>) is an event-driven automation platform. Use cases include auto-remediation, security responses, facilitated troubleshooting, and complex deployments.</p> <p>StackStorm presented at the recent <a href="http://techfieldday.com/event/nfd12/">Network Field Day 12</a> and discussed not only the core platform, but some of the use cases that, while not specifically network-centric, are important to consider:</p> <div style="text-align:center;"><iframe width="560" height="315" src="https://www.youtube.com/embed/M_hacp2qd70" frameborder="0" allowfullscreen=""></iframe></div> <p>When I first saw StackStorm, I realized quickly that the project aligned well with the <a href="https://keepingitclassless.net/2016/10/principles-of-automation/">Principles of Automation</a> I was rattling around in my head, especially the Rule of Autonomy, which dictates that automation should be driven by input from other software systems. StackStorm makes it easy to move beyond simple “scripts” and truly drive decisions based on events that take place elsewhere.</p> <p>So, how does this change things in terms of my community involvement? Actually I expect this to improve. Naturally, you’ll likely see me writing and talking about StackStorm and related technologies - not just because they’re my employer but because the project matches well with my automation ideals and principles. This does NOT mean that I will stop talking about other concepts and projects. One unique thing about automation is that it’s never a one-size-fits-all….you’re always going to deal with multiple tools in a pipeline to get the job done. I am still very passionate about the people and process problems that aren’t tackled directly by technology solutions, and I plan to continue to grow my own experience in these areas and share them with you all.</p> <p>I still very strongly believe that the first problems we should be solving in the networking industry, and in IT as a whole, are problems of culture and process. So, from that perspective, nothing has changed - but from this new team I feel like I’ll have the support and platform I need to really get these ideas out there.</p> <p>Lastly, there are still <a href="https://stackstorm.com/careers/">openings on the team</a> so if you’re passionate about automation, please consider applying.</p> <p>By no means am I done yet - but I do want to take the opportunity to say <strong>Thank You</strong> to all who have been a part of my public journey for the past 5+ years. I couldn’t have had the learning experiences I’ve had without readers who were just as passionate about technology. My goal is only to increase my involvement in the community in the years to come, and I hope that what I contribute is helpful.</p> <blockquote> <p>I attended NFD12 as a delegate as part of <a href="http://techfieldday.com/about/">Tech Field Day</a>, well before I started talking with StackStorm team about employment opportunities. Events like these are sponsored by networking vendors who may cover a portion of our travel costs. 
In addition to a presentation (or more), vendors may give us a tasty unicorn burger, <a href="http://www.youtube.com/watch?v=oQrJk9JzW8o">warm sweater made from presenter’s beard</a> or a similar tchotchke. The vendors sponsoring Tech Field Day events don’t ask for, nor are they promised any kind of consideration in the writing of my blog posts … and as always, all opinions expressed here are entirely my own. (<a href="http://keepingitclassless.net/disclaimers/">Full disclaimer here</a>)</p> </blockquote> Wed, 19 Oct 2016 00:00:00 +0000 http://keepingitclassless.net/2016/10/new-automation-chapter-begins/ http://keepingitclassless.net/2016/10/new-automation-chapter-begins/ Principles of Automation <p>Automation is an increasingly interesting topic in pretty much every technology discipline these days. There’s lots of talk about tooling, practices, skill set evolution, and more - but little conversation about fundamentals. What little <strong>is</strong> published by those actually practicing automation, usually takes the form of source code or technical whitepapers. While these are obviously valuable, they don’t usually cover some of the fundamental basics that could prove useful to the reader who wishes to perform similar things in their own organization, but may have different technical requirements.</p> <p>I write this post to cover what I’m calling the “Principles of Automation”. I have pondered this topic for a while and I believe I have three principles that cover just about any form of automation you may consider. These principles have nothing to do with technology disciplines, tools, or programming languages - they are fundamental principles that you can adopt regardless of the implementation.</p> <p>I hope you enjoy.</p> <blockquote> <p>It’s a bit of a long post, so TL;DR - automation isn’t magic. It isn’t only for the “elite”. Follow these guidelines and you can realize the same value regardless of your scale.</p> </blockquote> <h1 id="factorio">Factorio</h1> <p>Lately I’ve been obsessed with a game called <a href="https://www.factorio.com/">“Factorio”</a>. In it, you play an engineer that’s crash-landed on a planet with little more than the clothes on your back, and some tools for gathering raw materials like iron or copper ore, coal, wood, etc. Your objective is to use these materials, and your systems know-how to construct more and more complicated systems that eventually construct a rocket ship to blast off from the planet.</p> <p>Even the very first stages of this game end up being more complicated than they initially appear. Among your initial inventory is a drill that you can use to mine coal, a useful ingredient for anything that needs to burn fuel - but the drill itself actually requires that same fuel. So, the first thing you need to do is mine some coal by hand, to get the drill started.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/manual_mining.jpg"><img src="http://keepingitclassless.net/assets/2016/10/manual_mining.jpg" width="600" /></a></div> <p>We can also use some of the raw materials to manually kick-start some automation. With a second drill, we can start mining for raw iron ore. 
In order to do that we need to build a “burner inserter”, which moves the coal that the first drill gathered into the second drill:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/manual_refuel.gif"><img src="http://keepingitclassless.net/assets/2016/10/manual_refuel.gif" width="600" /></a></div> <p>Even this very early automation requires manual intervention, as it all requires coal to burn, and not everything has coal automatically delivered to it (yet).</p> <p>Now, there are things you can do to improve <strong>your own</strong> efficiency, such as building/using better tools:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/faster_manual.jpg"><img src="http://keepingitclassless.net/assets/2016/10/faster_manual.jpg" width="600" /></a></div> <p>However, this is just one optimization out of a multitude. Our objectives will never be met if we only think about optimizing the manual process; we need to adopt a “big picture” systems mindset.</p> <p>Eventually we have a reasonably good system in place for mining raw materials; we now need to move to the next level in the technology tree, and start smelting our raw iron ore into iron plates. As with other parts of our system, at first we start by manually placing raw iron ore and coal into a furnace. However, we soon realize that we can be much more efficient if we allow some burner inserters to take care of this for us:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/midway_to_automation.gif"><img src="http://keepingitclassless.net/assets/2016/10/midway_to_automation.gif" width="600" /></a></div> <p>With a little extra work we can automate coal delivery to this furnace as well:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/full_auto.gif"><img src="http://keepingitclassless.net/assets/2016/10/full_auto.gif" width="600" /></a></div> <p>There’s too much to Factorio to provide screenshots of every step - the number of technology layers you must go through in order to unlock fairly basic technology like solar power is astounding; not to mention being able to launch a fully functional rocket.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/full_scale.gif"><img src="http://keepingitclassless.net/assets/2016/10/full_scale.gif" width="600" /></a></div> <p>As you continue to automate processes, you continue to unlock higher and higher capabilities and technology; they all build on each other. Along the way you run into all kinds of issues. These issues could arise in trying to create new technology, or you could uncover a bottleneck that didn’t reveal itself until the system scaled to a certain point.</p> <p>For instance, in the last few screenshots we started smelting some iron plates to use for things like pipes or circuit boards. Eventually, the demand for this very basic resource will outgrow the supply - so as you build production facilities, you have to consider how well they’ll scale as the demand increases. Here’s an example of an iron smelting “facility” that’s built to scale horizontally:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/big_auto_2.jpg"><img src="http://keepingitclassless.net/assets/2016/10/big_auto_2.jpg" width="600" /></a></div> <p>Scaling out one part of this system isn’t all you need to be aware of, however. 
The full end-to-end supply chain matters too.</p> <p>As an example, a “green” science pack is one resource that’s used to perform research that unlocks technologies in Factorio. If you are running short on these, you may immediately think “Well, hey, I need to add more factories that produce green science packs!”. However, the bottleneck might not be the number of factories producing green science, but further back in the system.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/bottleneck.png"><img src="http://keepingitclassless.net/assets/2016/10/bottleneck.png" width="250" /></a></div> <p>Green science packs are made by combining a single inserter with a single transport belt panel - and in the screenshot above, while we have plenty of transport belt panels, we aren’t getting any inserters! This means we now have to analyze the part of our system that produces that part - which also might be suffering a shortage in <strong>it’s</strong> supply chain. Sometimes such shortages can be traced all the way down to the lowest level - running out of raw ore.</p> <p>In summary, Factorio is a really cool game that you should definitely check out - but if you work around systems as part of your day job, I encourage you to pay close attention to the following sections, as I’d like to recap some of the systems design principles that I’ve illustrated above. I really do believe there are some valuable lessons to be learned here.</p> <p>I refer to these as the Principles of Automation, and they are:</p> <ul> <li>The Rule of Algorithmic Thinking</li> <li>The Rule of Bottlenecks</li> <li>The Rule of Autonomy</li> </ul> <h1 id="the-rule-of-algorithmic-thinking">The Rule of Algorithmic Thinking</h1> <p>Repeat after me: “Everything is a system”.</p> <p>Come to grips with this, because this is where automation ceases to be some magical concept only for the huge hyperscale companies like Facebook and Google. Everything you do, say, or smell is part of a system, whether you think it is or not; from the complicated systems that power your favorite social media site, all the way down to the water cycle:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/Diagram_of_the_Water_Cycle.jpg"><img src="http://keepingitclassless.net/assets/2016/10/Diagram_of_the_Water_Cycle.jpg" width="500" /></a></div> <blockquote> <p>By the way, just as humans are a part of the water cycle, humans are and always will be part of an automated system you construct.</p> </blockquote> <p>In all areas of IT there is a lot of hand-waving; engineers claim to know a technology, but when things go wrong, and it’s necessary to go deeper, they don’t really know it that well. Another name for this could be “user manual” engineering - they know how it should work when things go well, but don’t actually know what makes it tick, which is useful when things start to break.</p> <p>There are many tangible skills that you can acquire that an automation or software team will find attractive, such as language experience, and automated testing. It’s important to know how to write idiomatic code. It’s important to understand what real quality looks like in software systems. However, these things are fairly easy to learn with a little bit of experience. What’s more difficult is understanding what it means to write a <em>meaningful</em> test, and not just check the box when a line of code is “covered”. 
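As a toy illustration (the function and both tests below are hypothetical, not drawn from any real project), the two tests that follow each “cover” every line of the function, but only the second one would actually catch a regression:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def normalize_vlan(vlan_id):
    """Clamp a VLAN ID into the valid 1-4094 range."""
    return min(max(int(vlan_id), 1), 4094)

def test_normalize_vlan_runs():
    # "Checkbox" test: executes the code, but asserts nothing about its behavior.
    normalize_vlan(100)

def test_normalize_vlan_clamps_out_of_range_values():
    # Meaningful test: pins down the behavior we actually care about.
    assert normalize_vlan(0) == 1
    assert normalize_vlan(5000) == 4094
    assert normalize_vlan("200") == 200
</code></pre></div></div> <p>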
That kind of skill set requires more experience, and a lot of passion (you have to <strong>want</strong> to write good tests).</p> <p>Harder still is the ability to look at a system with a “big picture” perspective, while also being able to drill in to a specific part and optimize it…and most importantly, the wisdom to know when to do the latter. I like to refer to this skill as “Algorithmic Thinking”. Engineers with this skill are able to mentally deconstruct a system into it’s component parts without getting tunnel vision on any one of them - maintaining that systems perspective.</p> <blockquote> <p>If you think Algorithms are some super-advanced topic that’s way over your head, they’re not. See one of my <a href="https://keepingitclassless.net/2016/08/cs101-algorithms/">earlier posts</a> for a demystification of this subject.</p> </blockquote> <p>A great way to understand this skill is to imagine you’re in an interview, and the interviewer asks you to enumerate all of the steps needed to load a web page. Simple, right? It sure seems like it at first, but what’s really happening is that the interviewer is trying to understand how well you know (or want to know) all of the complex activities that take place in order to load a web page. Sure, the user types a URL into the address bar and hits enter - then the HTTP request magically takes place. Right? Well, how did the machine know what IP address was being represented by that domain name? That leads you to the DNS configuration. How did the machine know how to reach the DNS server address? That leads you to the routing table, which likely indicates the default gateway is used to reach the DNS server. How does the machine get the DNS traffic to the default gateway? In that case, ARP is used to identify the right MAC address to use as the destination for that first hop.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/http.png"><img src="http://keepingitclassless.net/assets/2016/10/http.png" width="500" /></a></div> <p>Those are just some of the high-level steps that take place <em>before the request can even be sent</em>. Algorithmic thinking recognizes that each part of a system, no matter how simple, has numerous subsystems that all perform their own tasks. It is the ability to understand that nothing is magic - only layers of abstraction. These days, this is understandably a tall order. As technology gets more and more advanced, so do the abstractions. It may seem impossible to be able to operate at both sides of the spectrum.</p> <blockquote> <p>It’s true, no one can know everything. However, a skilled engineer will have the wisdom to dive behind the abstraction when appropriate. After all, the aforementioned “problem” seemed simple, but there are a multitude of things going on behind the scenes - any one of which could have prevented that page from loading. Being able to think algorithmically doesn’t mean you know everything, but it does mean that when a problem arises, it might be time to jump a little further down the rabbit hole.</p> </blockquote> <p>Gaining experience with automation is all about demystification. Automation is not magic, and it’s not reserved only for Facebook and Google. It is the recognition that we are all part of a system, and if we don’t want to get paged at 3AM anymore, we may as well put software in place that allows us to remove ourselves from that part of the system. 
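Even a trivial, purely illustrative sketch like the one below captures the idea (the service name is hypothetical, and a naive polling loop is something a real event-driven automation platform would handle far more robustly): encode the response you would otherwise perform at 3AM once, and let software carry it out.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Purely illustrative sketch - "example-app" is a hypothetical systemd unit.
import subprocess
import time

def service_is_healthy(name):
    # 'systemctl is-active --quiet' exits non-zero when the unit isn't running.
    return subprocess.run(["systemctl", "is-active", "--quiet", name]).returncode == 0

while True:
    if not service_is_healthy("example-app"):
        subprocess.run(["systemctl", "restart", "example-app"])
        print("example-app was down; restarted it automatically")
    time.sleep(30)
</code></pre></div></div> <p>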
<p>Most of us have close friends or family members that are completely non-technical. You know, the type that breaks computers just by looking at them. My suggestion to you is this: if you really want to learn a technology, figure out how to explain it to them. Until you can do that, you don’t really know it that well.</p> <h1 id="the-rule-of-bottlenecks">The Rule of Bottlenecks</h1> <p>Recently I was having a conversation with a web developer about automated testing. They made the argument that they wanted to use automated testing, but couldn’t, because each web application they deployed for customers was a snowflake custom build, and it was not feasible to do anything but manual testing (click this, type this). Upon further inspection, I discovered that the majority of their customer requirements were nearly identical. In this case, the real bottleneck wasn’t just that they weren’t doing automated testing; they weren’t even setting themselves up to be able to do it in the first place. In terms of systems design, the problem is much closer to the source - and I don’t mean “source code”; the problem lies further up the chain of events that would make automated testing possible.</p> <p>I hear the same old story in networking. “Our network can’t be automated or tested, we’re too unique. We have a special snowflake network”. This highlights an often-overlooked part of network automation: the network design itself has to be solid. Network automation isn’t just about code - it’s about simple design too; the network has to be designed with automation in mind.</p> <blockquote> <p>This is what DevOps is <strong>really</strong> about. Not automation or tooling, but communication. The ability to share feedback about design-related issues with the other technology disciplines. Yes, this means you need to seek out and proactively talk to your developers. Developers, this means sitting down with your peers on the infrastructure side. Get over it and learn from each other.</p> </blockquote> <p>Once you’ve learned to think Algorithmically, you start to look at your infrastructure like a graph - a series of nodes and edges. The nodes would be your servers, your switches, your access points, your operating systems. These nodes communicate with each other on a huge mesh of edges. When failures happen, they often cause a cascading effect, not unlike the cascading shortages I illustrated in Factorio, where a shortage of green science packs doesn’t <em>necessarily</em> mean I need to spin up more green science machines. The bottleneck is not always where you think it is; knowing how to locate the <em>real</em> bottleneck is what allows you to fix the real problem.</p> <p>The cause of a bottleneck could be bad design:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/baddesign.png"><img src="http://keepingitclassless.net/assets/2016/10/baddesign.png" width="600" /></a></div> <p>Or it could be improper/insufficient input (which could in turn be caused by a bad design elsewhere):</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/bottleneck.png"><img src="http://keepingitclassless.net/assets/2016/10/bottleneck.png" width="250" /></a></div> <p>One part of good design is understanding the kind of scale you might have to deal with and reflecting it in your design. This doesn’t mean you have to build something that scales to trillions of nodes today, only that the system you put in place doesn’t prevent you from scaling organically in the near future.</p>
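<p>Returning to the graph view of infrastructure for a moment, here is a minimal sketch of “walk upstream until you find the starved node”. The dependency map and the set of starved components are entirely made up for illustration - in practice they would come from your monitoring - but the exercise is the same one as chasing the green science shortage back past the inserters.</p> <pre><code># Hypothetical dependency map: what each component needs as input.
dependencies = {
    "green-science": ["inserters", "transport-belts"],
    "inserters": ["iron-plates", "green-circuits"],
    "transport-belts": ["iron-plates"],
    "green-circuits": ["copper-wire", "iron-plates"],
    "copper-wire": ["copper-plates"],
    "iron-plates": ["iron-ore"],
    "copper-plates": ["copper-ore"],
    "iron-ore": [],
    "copper-ore": [],
}

# Pretend telemetry: components currently starved of output.
starved = {"green-science", "inserters", "green-circuits",
           "copper-wire", "copper-plates", "copper-ore"}

def find_root_bottlenecks(component, deps, starved, seen=None):
    """Walk upstream from a starved component and return the deepest
    starved nodes - the ones whose own inputs are all healthy."""
    if seen is None:
        seen = set()
    if component in seen:
        return set()
    seen.add(component)
    starved_inputs = [d for d in deps[component] if d in starved]
    if not starved_inputs:
        return {component}  # nothing upstream is starved: a root cause
    roots = set()
    for dep in starved_inputs:
        roots |= find_root_bottlenecks(dep, deps, starved, seen)
    return roots

print(find_root_bottlenecks("green-science", dependencies, starved))
# {'copper-ore'}
</code></pre> <p>The answer comes back as “copper-ore” - building more green science factories would not have helped at all.</p>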
<p>As an example, when I built a new plant in Factorio to produce copper wiring, I didn’t build 20 factories; I started with 2 - but I allowed myself room for 20, in case I needed it in the future. In the same way, you can design with scale in mind without having to boil the ocean and <strong>actually</strong> build a solution that meets some crazy unrealistic demand on day one.</p> <p>This post is already too long to dig into proper design in any depth, especially since it’s intentionally technology-agnostic. For now, suffice it to say that having a proper design is important, especially if you’re going into a new automation project. It’s okay to write some quick prototypes to figure some stuff out, but before you commit yourself to a design, do it on paper (or a whiteboard) first. Understanding the steps up front will save you a lot of headaches in the long run. Think about the system-to-be using an Algorithmic mindset, and walk through each of the steps in the system to ensure you understand each level.</p> <div style="text-align:center;"><a href="http://imgs.xkcd.com/comics/fixing_problems.png"><img src="http://keepingitclassless.net/assets/2016/10/fixing_problems.png" width="300" /></a></div> <p>As the system matures, it’s going to have bottlenecks. That bottleneck might be a human being that still holds power over a manual process you didn’t know existed. It might be an aging service that was written in the 80s. Just like in Factorio, something somewhere will be a bottleneck - the question is, do you know where it is, and is it worth addressing? It may not be. Everything is a tradeoff, and some bottlenecks are tolerable at certain points in the maturity of the system.</p> <h1 id="the-rule-of-autonomy">The Rule of Autonomy</h1> <p>I am <strong>very</strong> passionate about this section; here, we’re going to talk about the impact of automation on human beings.</p> <p>Factorio is a game where you ascend the tech tree towards the ultimate goal of launching a rocket. As the game progresses and you automate more and more of the system (which you have to do in order to complete the game in any reasonable time), you unlock more and more elaborate and complicated technologies, which then enable you to climb even higher. Building a solid foundation means you spend less time fussing with gears and armatures, and more time unlocking capabilities you simply didn’t have before.</p> <p>In the “real” world, the idea that automation means human beings are removed from a system is patently false. If anything, automation actually creates more opportunities for human beings, because it enables new capabilities that weren’t possible before it existed. Anyone who tells you otherwise doesn’t have a ton of experience in automation. Automation is not a night/day difference - it is an iterative process. We didn’t start Factorio with a working factory - we started it with the clothes on our backs.</p> <blockquote> <p>This idea is well described by <a href="https://en.wikipedia.org/wiki/Jevons_paradox">Jevons’ Paradox</a>, which basically states that the more efficiently you produce a resource, the greater the demand for that resource grows.</p> </blockquote> <p>Not only is automation highly incremental, it’s also imperfect at every layer. Everything in systems design is about tradeoffs.
At the beginning of Factorio, we had to manually insert coal into many of the components; this was a worthy tradeoff due to the simple nature of the system. It wasn’t <strong>that</strong> big of a deal to do this part manually at that stage, because the system was in its infancy.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/manual_refuel.gif"><img src="http://keepingitclassless.net/assets/2016/10/manual_refuel.gif" width="600" /></a></div> <p>However, at some point, our factory needed to grow. We needed to allow the two parts to exchange resources directly instead of manually ferrying them between components.</p> <p>The Rule of Autonomy is this: machines can communicate with other machines really well. Let them. Of course, automation is an iterative process, so you’ll undoubtedly start out by writing a few scripts and leveraging some APIs to do some task you previously had to do yourself, but don’t stop there. Always be asking yourself if you need to be in the direct path at all. Maybe you don’t <strong>really</strong> need to provide input to the script in order for it to do its work; maybe you can change that script to operate autonomously by getting that input from some other system in your infrastructure.</p> <p>As an example, I once had a script that would automatically put together a Cisco MDS configuration based on some WWPNs I put into a spreadsheet. This script wasn’t useless - it saved me a lot of time and helped ensure a consistent configuration between deployments. However, it still required my input, specifically for the WWPNs. I quickly decided it wouldn’t be that hard to extend this script to make API calls to Cisco UCS to get those WWPNs and automatically place them into the switch configuration. I was no longer required for that part of the system; it operated autonomously. Of course, I’d return to this software periodically to make improvements, but largely it was off my plate. I was able to focus on other things that I wanted to explore in greater depth.</p> <p>The goal is to remove humans as functional components of a subsystem so they can make improvements to the system as a whole. Writing code is not magic - it is the machine manifestation of human logic. For many tasks, there is no need to have a human manually enumerate the steps required to perform a task; that human logic can be described in code and used to work on the human’s behalf. So when we talk about replacing humans in a particular part of a system, what we’re really talking about is reproducing the logic they’d employ to perform a task as code - code that doesn’t get tired, burnt out, or narrowly focused. That code works asynchronously from the human, freeing the human to repeat the exercise elsewhere, or to make other improvements to the system as a whole. If you insist on staying “the cog” in a machine, you’ll quickly lose sight of the big picture.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/10/full_scale.gif"><img src="http://keepingitclassless.net/assets/2016/10/full_scale.gif" width="600" /></a></div> <p>This idea that “automation will take my job” is based on the incorrect assumption that once automation is in place, the work is over. Automation is not a monolithic “automate everything” movement. Like our efforts in Factorio, automation is designed to take a particular workflow in one very small part of the overall system and take it off our plates, once we understand it well enough. Once that’s done, our attention is freed up to explore new capabilities we were literally unable to address while we were mired in the lower details of the system. We constantly remove ourselves as humans from higher and higher parts of the system.</p>
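<p>Here is a minimal sketch of the pattern from the MDS/UCS story above. None of this is the original script - the function names, data formats, and WWPNs are hypothetical - but it shows the shape of the change: the logic that renders the configuration stays the same, and only the source of its input moves from a human-maintained spreadsheet to the system that already knows the answer.</p> <pre><code>import csv

def render_mds_config(wwpns):
    """Render a (simplified) MDS device-alias snippet from (name, wwpn) pairs."""
    lines = ["device-alias database"]
    for name, wwpn in wwpns:
        lines.append("  device-alias name %s pwwn %s" % (name, wwpn))
    lines.append("device-alias commit")
    return "\n".join(lines)

# Iteration 1: a human maintains a spreadsheet and feeds it to the script.
def wwpns_from_spreadsheet(path):
    with open(path) as f:
        return [(row["name"], row["wwpn"]) for row in csv.DictReader(f)]

# Iteration 2: the same data is pulled from the system that already knows it
# (Cisco UCS, in the story above). This function is a placeholder - in real
# life it would call that system's API instead of returning canned values.
def wwpns_from_source_of_truth():
    return [("esx01-hba0", "20:00:00:25:b5:00:00:0a"),
            ("esx01-hba1", "20:00:00:25:b5:00:00:0b")]

if __name__ == "__main__":
    # The human is no longer the source of the input - only of the logic.
    print(render_mds_config(wwpns_from_source_of_truth()))
</code></pre>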
<p>Note that I said “parts” of the system. Remember: everything is a system, so it’s foolish to think that human beings can (or should) be entirely removed - you’re always going to need human input to the system as a whole. In technology there are just some things that require human input - like new policies or processes. Keeping that in mind, always be asking yourself “Do I really need <strong>human</strong> input at <strong>this</strong> specific part of the system?” Constantly challenge this idea.</p> <p>Automation is <strong>so</strong> not about removing human beings from a system. It’s about moving humans to a new part of the system, and about allowing automation to be driven by events that take place elsewhere in the system.</p> <h1 id="conclusion">Conclusion</h1> <p>Note that I haven’t really talked about specific tools or languages in this post. It may seem strange - often when other automation junkies talk about how to get involved, they talk about learning to code, or learning Ansible or Puppet, etc. As I’ve mentioned earlier in this post (and as I’ve presented at conferences), this is all very meaningful - at some point the rubber needs to meet the road. However, when doing this yourself, hearing about someone else’s implementation details is not enough - you need some core fundamentals to aim for.</p> <p>The best way to get involved with automation is to want it. I can’t make you want to invest in automation as a skill set, nor can your manager; only you can do that. I believe that if the motivation is there, you’ll figure out the right languages and tools for yourself. Instead, I like to focus on the fundamentals listed above - which are language- and tool-agnostic. These are core principles that I wish I had known about when I started on this journey - principles that don’t readily reveal themselves in a quick Stack Overflow search.</p> <p>That said, my parting advice is:</p> <ol> <li><strong>Get Motivated</strong> - think of a problem you actually care about. “Hello World” examples get old pretty fast. It’s really hard to build quality systems if you don’t care. Get some passion, or hire folks that have it. Take ownership of your system. Make the move to automation with strategic vision, not a half-cocked effort.</li> <li><strong>Experiment</strong> - learn the tools and languages that are most powerful for you. Automation is like cooking - you can’t just tie yourself to the recipe book. You have to learn the fundamentals and screw up a few times to really learn. Make mistakes, and devise automated tests that ensure you don’t make the same mistake twice.</li> <li><strong>Collaborate</strong> - there are others out there that are going through this journey with you. Sign up for the <a href="http://slack.networktocode.com/">networktocode slack channel (free)</a> and participate in the community.</li> </ol> Tue, 18 Oct 2016 00:00:00 +0000 http://keepingitclassless.net/2016/10/principles-of-automation/ http://keepingitclassless.net/2016/10/principles-of-automation/ ToDD Has Moved!
<p>ToDD has been out in the wild for 6 months, and in that time I’ve been really pleased with its growth and adoption. Considering this was just a personal side project, I’ve been blown away by what it’s doing for my own learning experiences as well as for the network automation pipelines of the various folks that pop onto the slack channel asking questions.</p> <p>For the last 6 months I’ve hosted ToDD on <a href="https://github.com/Mierdin">my personal Github profile</a>. It was a good initial location, because there really was no need at the time to do anything further.</p> <p>However, as of tonight, ToDD’s new permanent location is <a href="https://github.com/toddproject/todd">https://github.com/toddproject/todd</a>. Read on for some reasons for this.</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/09/github.png"><img src="http://keepingitclassless.net/assets/2016/09/github.png" width="400" /></a></div> <h1 id="native-testlets">Native Testlets</h1> <p>One of the biggest reasons for creating the <a href="https://github.com/toddproject">“toddproject” organization</a> was that I started rewriting some of the testlets in Go. These are called <a href="https://todd.readthedocs.io/en/latest/testlets/nativetestlets/nativetestlets.html">native testlets</a>, and the intention is that they are packaged alongside ToDD because they’re useful to a large portion of ToDD’s userbase (in the same way the legacy bash testlets were).</p> <p>For this reason, I created the “toddproject” organization, and once that was done, it made a lot of sense to move ToDD there as well.</p> <p>Rewriting the legacy bash testlets in Go offers several advantages, but the top two are:</p> <ul> <li>Ability to take advantage of some common code in ToDD so that the testlets aren’t reinventing the wheel</li> <li>Better cross-platform testing (the existing testlets pretty much required Linux)</li> </ul> <p>Currently only the “ping” testlet has been implemented in Go - but I hope to replace “http” and “iperf” soon with Go alternatives.</p> <h1 id="updated-docs">Updated Docs</h1> <p>In addition to moving to a new location, the documentation for ToDD has been massively improved and simplified:</p> <div style="text-align:center;"><a href="http://keepingitclassless.net/assets/2016/09/newandold.png"><img src="http://keepingitclassless.net/assets/2016/09/newandold.png" width="900" /></a></div> <p>As you can see, the order now actually makes sense. Please check out <a href="https://todd.readthedocs.io/en/latest/">todd.readthedocs.io</a> and let me know what you think!</p> Fri, 30 Sep 2016 00:00:00 +0000 http://keepingitclassless.net/2016/09/todd-has-moved/ http://keepingitclassless.net/2016/09/todd-has-moved/ The Importance of the Network Software Supply Chain <p>At <a href="http://techfieldday.com/event/nfd12/">Networking Field Day 12</a>, we heard from a number of vendors that offered solutions to some common enterprise network problems, from management to security and more.</p> <p>However, there were a few presentations that didn’t seem directly applicable to the canonical network admin’s day-to-day.
This was made clear by comments from delegates in the room, as well as from others tweeting about the presentation.</p> <h1 id="accelerating-the-x86-data-plane">Accelerating the x86 Data Plane</h1> <p>Intel, for instance, <a href="http://techfieldday.com/appearance/intel-presents-at-networking-field-day-12/">spent a significant amount of time</a> discussing the <a href="http://dpdk.org/">Data Plane Development Kit (DPDK)</a>, which provides a different way of leveraging CPU resources for fast packet processing.</p> <div style="text-align:center;"><iframe width="560" height="315" src="https://www.youtube.com/embed/t9AERPGqEvQ" frameborder="0" allowfullscreen=""></iframe></div> <p>In their presentation, Intel explained the various ways that they’ve circumvented some of the existing bottlenecks in the Linux kernel, resulting in a big performance increase for applications sending and receiving data on the network. DPDK operates in user space, meaning the traditional overhead associated with copying memory resources between user and kernel space is avoided. In addition, techniques like parallel processing and poll mode drivers (as opposed to the traditional interrupt processing model) mean packet processing can be done much more efficiently.</p> <p>This is all great (and, as a software nerd, very interesting to me personally), but what does this have to do with the average IT network administrator?</p> <h1 id="pay-no-attention-to-the-overlay-behind-the-curtain">Pay No Attention to the Overlay Behind the Curtain</h1> <p>In addition, Teridion spent some time discussing their approach to improving performance for content providers by actively monitoring performance on the internet through cloud-deployed agents and routers, and deploying overlays as necessary to ensure that the content uses the best-performing path at all times.</p> <div style="text-align:center;"><iframe width="560" height="315" src="https://www.youtube.com/embed/gkKrfT99ctI" frameborder="0" allowfullscreen=""></iframe></div> <p>In contrast to the aforementioned presentation from Intel, who were very open about the deepest technical details of their solutions, Teridion was very guarded about most of the interesting technical detail of their solution, claiming it was part of their “special sauce”. While in some ways this is understandable (they are not the size of Intel, and might want to be more careful about giving away their IP), they were in front of the Tech Field Day audience, and using terms like “pixie dust” in lieu of technical detail is ineffective at best.</p> <p>Despite this, and after some questioning by the delegates in the room, it became clear that their solution was also not targeted towards enterprise IT, but rather at the content providers themselves.</p> <p>Like the technologies discussed by Intel, the Teridion solution has become one of the “behind the scenes” technologies that we might want to consider when evaluating content providers. As an enterprise network architect, I may not directly interface with Teridion, but knowing more about them will tell me a great deal about how well a relationship with someone who <strong>is</strong> using them might go.
When someone isn’t willing to share those details, I ask myself, “Why am I here?”</p> <h1 id="caring-about-the-supply-chain">Caring about the Supply Chain</h1> <p>When I walk into the supermarket looking for some chicken to grill, my thoughts are not limited to what goes on in that particular store; I also think about that store’s supply chain. I care about how those chickens were raised. Perhaps I do not agree with the supermarket chain’s choice of supplier; that will drive my decision to stay in that store or go down the street to the butcher.</p> <p><strong>In the same way</strong>, we should care about the supply chain behind the solutions we use in our network infrastructure. It’s useful to know if a vendor chose to build their router on DPDK or the like, because it means they recognized the futility of trying to reinvent the wheel and decided to use a common, optimized base. They provide value on top of that. Knowing the details of DPDK means I already understand a large part of any vendor solution that builds on that common base.</p> <blockquote> <p>It’s clear that solutions like those presented by these two vendors are targeted not at the hundreds or thousands of enterprise IT customers, but rather at a handful of network vendors (in the case of Intel) or big content providers (in the case of Teridion). This obviously makes sense from a technical perspective, but also from a business perspective, since acquiring those customers means Intel and Teridion get all <strong>their</strong> customers as well.</p> </blockquote> <p>Another good example is a <a href="https://www.youtube.com/watch?v=ufGolasNmak">Packet Pushers podcast we recorded at Network Field Day 11</a>, where we discussed the growing willingness of network vendors to use an open source base for their operating systems. This is a <strong>good thing</strong>; not only does it help us as customers immediately understand a large part of the technical solution, it also means the vendor isn’t wasting cycles reinventing the wheel and charging me for the privilege.</p> <p>When companies are unwilling to go deeper than describing their technology as “special sauce”, it hurts my ability to conceptualize this supply chain. It’s as if a poultry farmer just waved their hands and said, “don’t worry, our chickens are happy”. Can you not <em>at least</em> show me a picture of where you raise the chickens? It’s not like that picture is going to let me immediately start a competing chicken farm.</p> <p>When the world around networking is embracing open source to the point where we’re actually building entire business models around it, the usage of terms like “pixie dust” in lieu of technical detail just smells of old-world thinking. I’m not saying to give everything away for free, but meet me halfway - enable me to conceptualize and make a reasonable decision regarding my software supply chain.</p> <blockquote> <p>I attended NFD12 as a delegate as part of <a href="http://techfieldday.com/about/">Tech Field Day</a>. Events like these are sponsored by networking vendors who may cover a portion of our travel costs. In addition to a presentation (or more), vendors may give us a tasty unicorn burger, a <a href="http://www.youtube.com/watch?v=oQrJk9JzW8o">warm sweater made from the presenter’s beard</a>, or a similar tchotchke. The vendors sponsoring Tech Field Day events don’t ask for, nor are they promised, any kind of consideration in the writing of my blog posts … and as always, all opinions expressed here are entirely my own.
(<a href="http://keepingitclassless.net/disclaimers/">Full disclaimer here</a>)</p> </blockquote> Tue, 16 Aug 2016 00:00:00 +0000 http://keepingitclassless.net/2016/08/importance-network-supply-chain/ http://keepingitclassless.net/2016/08/importance-network-supply-chain/