Keeping It Classless Perspectives On The Intersection of Networking and Software Development Your Cheese Moved a Long Time Ago <p>I was recently on a panel at the <a href="">Event-Driven Automation Meetup</a> at LinkedIn in Sunnyvale, CA, and we all had a really good hour-long conversation about automation. What really made me happy was that nearly the entire conversation focused on bringing the same principles that companies like LinkedIn and Facebook use on their network to smaller organizations, making them practical for more widespread use.</p> <blockquote class="twitter-tweet tw-align-center" data-lang="en"><p lang="en" dir="ltr">Nina Mushiana of <a href="">@LinkedIn</a> says &quot;Anything that can be documented should be automated&quot;.<br />Great Auto-Remediation Meetup! <a href=""></a></p>&mdash; StackStorm (@Stack_Storm) <a href="">March 31, 2017</a></blockquote> <p>One particular topic that came up was one I’ve struggled with for the past few years: what about Day 2 of network automation? So, we manage to write some Ansible playbooks to push configuration files to switches - what’s next? This question often isn’t asked. I think the network automation conversation has progressed to the point where we should all start asking it more often.</p> <p>I believe that the network engineering discipline is at a crossroads, and that the workforce as a whole needs to make some changes and decisions in order to stay relevant. Those changes are all based on the following premise:</p> <blockquote> <p>The value of the network does not come from discrete nodes (like routers and switches - physical or virtual), or their configuration, but from the services they provide.</p> </blockquote> <p>If you’re just getting started down the path of following basic configuration management or infrastructure-as-code principles, <strong>that’s fantastic</strong>. This post is not meant to discourage you from doing that. 
Those things will serve you well for the next 1-2 years. This post focuses on year 3+ of the network automation journey.</p> <h1 id="your-cheese-has-moved">Your Cheese Has Moved</h1> <p>We’ve all heard the lamentations that come from server admins (<a href="">throwback alert</a>) like “why does it take weeks to provision a new VLAN?” I worked as a network and data center consultant for a number of years, and I can tell you that these stories are true - and it gets much worse than that.</p> <p>As I’ve said before, what the sysadmin usually doesn’t know is all the activity that goes on behind the scenes to deliver that VLAN. Usually what they’re asking for is a new logical network, which isn’t just a tag on a switchport - it’s also adding a layer 3 interface, and potentially routing changes, edits to the firewall, a new load balancing configuration, and on and on and on. The network has traditionally provided a lot of these services that the sysadmin has taken for granted.</p> <p>You might understand their frustration, but the reality is that the network engineer is trying hard just to provide these services and ensure they keep pace with the applications that rely upon them. It also doesn’t help when processes like ITIL force such changes to take place every first weekend of the month at 2AM. This is a far cry from what the application teams and developers have come to expect: response times of seconds or minutes, not weeks or months. But hey, those silly developers don’t know networking, so they can just deal with it, right?</p> <p>Yes, it can be tempting to make fun of some developers that can’t tell a frame from a packet. However, it may be useful to remember that a developer wrote the software in your router. Someone had to write the algorithms that power your load balancer. It is indeed possible that some software developers know networking - even better than most network engineers out there. 
Then, if you put them in the constantly-innovating culture of Silicon Valley that is always looking for a problem to solve, it’s inevitable: the arduous processes and inflexible tooling that have dominated networking for so long handed those developers and sysadmins a problem to solve on a silver platter.</p> <div style="text-align:center;"><a href=""><img src="" width="300" /></a></div> <p>And solve it they did. When x86 virtualization was really hitting the mainstream, network engineers didn’t really acknowledge the vSwitch. They wrote it off as something for “those server guys”. What about when we started routing in the host or hypervisor? I know a lot of people like to make fun of the whole <code class="highlighter-rouge">docker0</code> bridge/NAT thing. Those silly server people, right? Meanwhile, developers are spinning up haproxy instances for load balancing, and learning how to use iptables to secure their own infrastructure. On top of that, all of these network services are <strong>also offered by AWS</strong>, all in one nice dashboard, and all totally programmable. Can you really blame the developer now? Put yourself in their shoes - if you were faced with an inflexible network infrastructure that your application depended on, and you had no control over it, how long would it take you to follow the shiny red ball over to Amazon, where they make all those same network <em>services</em> totally abstract and API-controllable?</p> <p>So what’s happening here is that “those server guys” are basically running their own network at this point. We’ve clung to our black boxes and our configuration files at the cost of <strong>losing control over the actual network services</strong>. The truth is, we need to play a lot of catch-up.</p> <blockquote> <p>I know what you’re thinking - there’s more to the network than the data center. But like it or not, the data center houses the applications, and the applications are where the business sees the value in IT. 
Applications and software development teams sit closer to the boss, and they’re learning how to manage network services pretty well on their own out of necessity.</p> </blockquote> <h1 id="getting-the-cheese-back">Getting the Cheese Back</h1> <p>Network automation is about so much more than merely solving a configuration management problem. If it were, this would all be a bit anticlimactic, wouldn’t it? Everyone would just learn Ansible/Salt/Puppet and be done with it.</p> <p>Network automation, just like all other forms, is about <strong>services integration</strong>. There aren’t “existing tools” for your legacy, internal applications. At some point <a href="">you’re going to have to write some code</a>, even if it’s an extension to an existing tool. It’s time to get over our aversion to even basic scripting, and start filling in the 20% of our workflows that can’t be addressed by a turnkey tool or product. To me, this is the next step of network automation - being able to fill in the gaps between historically air-gapped services to create a broader, automated IT system.</p> <p>For instance, Kubernetes is an increasingly popular choice for those looking to deploy distributed applications (don’t make me say “cloud native”). It’s great at managing the entities (like pods) under its control, but it’s not meant to run everything meaningful to your business. If you’re running Kubernetes in your organization, it will have to run alongside a bunch of other stuff like OpenStack, vSphere, even mainframes. This is the reality of brownfield.</p> <p>As you might expect, all these systems need to work together, and we’ve historically “integrated” them by hand, looking at different areas of our technology stack and “rendering” abstract concepts of desired state into implementation-specific commands and configurations. 
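</p>

<p>That “rendering” step is exactly the kind of work a short script can take over. As a minimal sketch - the two CLI dialects below are deliberately simplified stand-ins, not authoritative vendor syntax:</p>

```python
# Illustrative only: render one abstract "desired state" definition into
# two (simplified) vendor dialects. These are sketches of CLI syntax,
# not authoritative vendor configuration.

def render_vlan(vlan, dialect):
    """Translate an abstract VLAN definition into one dialect's config."""
    if dialect == "ios-like":
        return f"vlan {vlan['id']}\n name {vlan['name']}"
    elif dialect == "junos-like":
        return f"set vlans {vlan['name']} vlan-id {vlan['id']}"
    raise ValueError(f"unknown dialect: {dialect}")

desired = {"id": 100, "name": "app-tier"}

print(render_vlan(desired, "ios-like"))
print(render_vlan(desired, "junos-like"))
```

<p>The same pattern extends to interfaces, ACLs, and routing policy. 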
Just take networking as a specific example - a network engineer is the human manifestation of a cross-platform orchestrator, seamlessly translating between Cisco and Juniper CLI syntaxes.</p> <div style="text-align:center;"><a href=""><img src="" width="500" /></a></div> <p>So, to return to the main point: the network is no longer the sole proprietor of network services - those are slowly but surely migrating into the realm of the sysadmin and software developer. How can we adapt to this? One way is to acknowledge that the new “network edge” is very blurred. No longer is there a physical demarcation like a switchport; rather, these services are being provided either directly adjacent to, or even co-resident with, the application.</p> <p>It’s actually a bit encouraging that this has happened. This change represents a huge opportunity for network engineers to gain more control over the network than they’ve ever had. Historically, these network services were hidden behind “value-add, differentiating features” like CLI syntax (insert sarcasm undertone here). In the new world, these services either take place in open-source software, or are at least driven by well-designed, well-documented APIs. So, this new model is out there ready for us. We can take it, or lose it.</p> <h1 id="conclusion">Conclusion</h1> <p>The migration of network services out of the network itself was inevitable, but it’s absolutely not a death blow to the network engineer - it’s a huge opportunity to move forward in a big way. There’s a lot of work to do, but as <a href="">I wrote about last week</a>, the networking skill set is still sought after, and still needed in this new world.</p> <p><a href="">I’ll be speaking at Interop ITX</a> in Vegas next month about this and related topics. If you want to talk about automation, or just geek out about beer or food, I’d love to chat with you.</p> Thu, 06 Apr 2017 00:00:00 +0000 Learn Programming or Perish(?) 
<p>I was honored to return to Packet Pushers for <a href="">a discussion on programming skillsets in the networking industry</a>. I verbalized some thoughts there, but even 60 minutes isn’t enough for a conversation like this.</p> <p>To be clear, this post is written primarily to my followers in the networking industry, since that’s largely where this conversation is taking place.</p> <h1 id="scripting-is-not-programming">Scripting is NOT Programming</h1> <p>I want to put something to rest right now, and that is the conflation of scripting and software development. You may be hesitant to pick up any skills in this area because you feel like you have to boil the ocean in order to be effective, which is not true.</p> <p>As I briefly mention in the podcast, I spent the first 4 years or so of my career making networking my day job. Because of that, I picked up a lot of useful knowledge in this area. However, as I started to explore software, I realized that networking wasn’t something I wanted to do as a day job anymore, but I still greatly value the networking skillset I retain from this experience.</p> <p>Making this leap over 2 years ago revealed a multitude of subskills, fundamental knowledge, and daily responsibilities I simply wasn’t exposed to when I wasn’t doing this full time. Things I even take for granted now - like code review, automated testing, and computer science basics like algorithms. While I wouldn’t ever discourage anyone from learning these kinds of things, it is very understandable that a network engineer doesn’t deal with these things, because they go way beyond simple scripting.</p> <blockquote> <p>That said, you may run into challenges as your scripts become more complex. It may be useful to pair with someone that writes code for a living, and learn how to make your scripts more modular, scalable, and reusable.</p> </blockquote> <p>In short, don’t conflate <strong>skillset</strong> with <strong>occupation</strong>. 
Don’t feel like you have to boil the ocean in order to get started. You don’t have to become a programmer, but you should be able to write and maintain scripts using a modern language.</p> <h1 id="stop-talking-start-building">Stop Talking, Start Building</h1> <p>Hopefully the previous section drew a clear line between the <strong>skill</strong> of scripting and the <strong>occupation</strong> of software development, and showed that as a network engineer, you no more “need” to become a software developer than a car mechanic “needs” to become a heart surgeon. Now that this is out of the way, it’s time to have some real talk about this whole debate.</p> <p>One thing I’ve noticed since joining a team that has ties to just about every area of IT, including networking, is that other disciplines realized long ago that these skills are necessary for reasonably modern operations. There is no “should sysadmins learn code” discussion going on right now - they’ve all picked up Python, bash, or similar. It’s not a discussion of whether or not being able to augment their workflows with code is useful; it is assumed. Yet in networking we’re still debating this for some reason. It pains me when I hear perspectives that paint basic scripting skills as something that only engineers at Facebook or Google need to worry about, when other disciplines, even at smaller scale, simply assume this skillset exists in their operational model.</p> <p>Frankly, I am a bit disturbed that this is still so much of a discussion in networking. I worry that the vast majority of the industry is primarily interested in having their problems solved for them. This is something I observed about 3 years ago, and it is a big reason I wanted to make a change in my own career - I didn’t feel like I was building anything, just operating something that someone else built. We alluded to this in the podcast - the industry seems to be trending away from “engineering”, and towards “administration”. 
Of course, this is a generalization. It’s obvious that the rather explosive growth of communities like <a href="">“Network to Code”</a> indicates at least some interest, but I worry that it’s not enough.</p> <p>There are only two possible conclusions that I can draw from my observations:</p> <ul> <li>People assume that in order to be useful, they have to learn everything a software developer has learned.</li> <li>The difference between software development and scripting is understood, but even scripting is viewed as something “only for Facebook or Google”.</li> </ul> <p>Hopefully the previous section sufficiently refuted the first point. It just isn’t true. Don’t conflate occupation with skillset.</p> <p>Regarding the second point, I am not sure how to solve this, to be honest, other than to advise that you look at how other disciplines have incorporated these skillsets. Attend conferences that don’t explicitly focus on networking. I attended <a href="">SREcon</a> recently and was blown away by the difference in mindset towards these skillsets, compared to my experience at networking conferences. I worry that we get into this networking echo chamber where we listen to each other reject these skillsets, and use that to justify not picking them up ourselves.</p> <h1 id="focusing-on-real-fundamentals">Focusing on REAL Fundamentals</h1> <p>With all of that in mind, I want to wrap up with a brief discussion about the difference in types of skillsets, since this often comes up when bringing up software skills in networking. For instance, headlines like “Learn Programming, or get CCIE?” piss me off, frankly. They miss the point entirely, and subvert the tremendous amount of nuance that needs to be explored in this discussion.</p> <p>I believe strongly that focusing on fundamentals, especially if you’re just starting in your career, <strong>and regardless of which discipline you fall under</strong>, will set you up best for success in the long run. 
It will allow you to make a lot more sense of specific implementations like CLI syntax. Don’t be afraid to lean on the user guide when you need to look up the syntax for a command. Commit the concepts that sit under that command to memory instead of the syntax itself.</p> <p>As an illustration, consider the artist/painter. If painters learned the way the network industry wants us to learn, then art schools would only teach how to replicate the Mona Lisa. Instead, artists learn the fundamentals of brush technique. They learn what colors do when blended on the palette. They use their own creativity and decision making to put these fundamentals into practice when it comes time to make something. Similarly, programmers learn fundamentals like sorting algorithms, Big-O notation, CPU architectures, etc., and rely on knowledge of these tools to solve a problem when it arises.</p> <p>It’s worth saying that, because of where this industry is right now, implementation knowledge is important too - especially since the networking industry is in love with certifications that demonstrate it. It’s obvious that the networking industry places a lot more value on specific implementations - just look at the salary estimates for a CompTIA Network+ vs. just about any Cisco certification.</p> <p>However, vendor certs are basically a way of putting the vendor in control of your career. Fundamental knowledge, on the other hand, puts YOU in control. It lets YOU dominate interviews, instead of the vendor you’ve tied yourself to. Always emphasize learning the fundamentals, and consider that the “real” networking fundamentals may not be on any popular curriculum.</p> <p>To build your career, you will likely have to balance implementation-level knowledge like certs with fundamental knowledge. Certs let you get in the door - that’s just a reality for the current state of the interview. 
But don’t let this keep you from going way deeper - it will do wonders for your career long-term.</p> <h1 id="conclusion">Conclusion</h1> <p>To wrap up: if you only take two things away from this post, they are:</p> <ul> <li>Scripting is for everyone. Yes, that includes you. It’s something you can start with today, because it’s not magical. We’re just talking about describing, as source code, the logic you already use in your day-to-day operations. That’s it.</li> <li>Emphasize fundamental knowledge. Learn enough about implementations to get in the door, but make sure you know how TCP and ARP work (as an example) regardless of platform.</li> </ul> Mon, 27 Mar 2017 00:00:00 +0000 2016 Recap and 2017 Goals <p>Yet another recap post to follow up on <a href="">last year’s</a>. 2015 was a big transition year for me, and last year I wanted to make sure I kept the momentum going.</p> <blockquote> <p>I make this post yearly to publicly track my own professional development goals. I find this helps me stay accountable to these goals, and it also allows others to give me a kick in the butt if I’m falling behind.</p> </blockquote> <h1 id="2015-goal-recap">2016 Goal Recap</h1> <p>First, let me recap some of the goals <a href="">I set for myself at the beginning of the year</a>, and see how well I did.</p> <p><strong>Network Automation Book</strong> - <a href="">At this time last year</a>, I announced that I was working on a network automation book with Scott Lowe and Jason Edelman. This has certainly taken a bit more time than any of us would have liked, but we’re very near the end. The three of us have had a very busy year, and there are very few things left to do for this release. In the meantime, we have pushed several additional chapters to O’Reilly, so you can still read these via Safari.</p> <p><strong>Open Source</strong> - Given that <a href="">I now work for a company centered around an open source project</a>, I’d say I definitely made a good move towards this goal. 
I also open sourced <a href="">ToDD</a> earlier this year, which has been steadily growing and becoming more stable over the last few months.</p> <p><strong>Deeper into Go and Python</strong> - I did well in this goal as well, for some of the same reasons as the open source goal - namely, that I work for a company centered around a Python-based open source project, and that I maintain ToDD, which is written in Go. I decided early this year that, in order to continue the momentum from my transition to full-time developer in 2015, I want to focus on Go and Python, so that I can be more flexible than knowing a single language, but also focused enough that I can get depth. is a new topic to me. This is a big reason I am getting more involved with Go.</p> <p><strong>More Community Output</strong> - It’s no secret that blogging output has slowed for me lately. My motivations for blogging and for being involved with the community in general are just very different from what they used to be. My early career was defined by trying to become as broad as possible - working with all different kinds of technologies. Now, I tend to spend more time focusing on one thing at a time, getting to a very deep level of understanding. Though I wish this weren’t the case, this tends to exhaust the energy I’d normally use to write about what I learned. However, while this part has slowed down, I am still fairly pleased with the other things I’ve done. I do feel like my involvement with open source (which has become quite substantial) is filling this gap quite a bit. I’ve also spoken at conferences and am already continuing this in 2017. So to recap, I feel like this goal was accomplished, but perhaps in a different way than it has been in years past.</p> <h1 id="goals-for-2016">Goals for 2017</h1> <p>While my focus since joining StackStorm has certainly included network automation use cases, it’s also exposed me to other industries and customer use cases. 
In many ways, these scenarios are much more interesting to me personally than what I’ve been working on in networking for the past few years. So I am hoping to branch into other technical areas beyond networking in 2017.</p> <p>I am leaving this intentionally vague, because I obviously don’t know the future, but I feel like the time is right for a change. I’ll always have ties to networking, of course, and I intend on continuing to advocate for network automation, but I want to do more. Lately I’ve been getting more interested in the security industry - and I feel like there might be a gap for me to fill with my networking and software skillset. I’ll be exploring this in greater detail in 2017.</p> <blockquote> <p>I don’t usually talk about personal goals, but in 2017 I’d also like to pick up a piano and get back into playing jazz (hoping to find a group in Portland once I brush the rust off).</p> </blockquote> <h1 id="conclusion">Conclusion</h1> <p>I think the most memorable change for me in 2016 was the affirmation that software development was an area where I wanted to work. I’ll always have close ties to the networking industry, but I’ve realized that there’s a lot about the current state of the industry that just doesn’t satisfy my current career objectives in the same way that software and automation have (and hopefully will). 2016 saw a big direction change towards open source, and I have really enjoyed it.</p> <p>Have a great New Year’s celebration, stay safe, and see you in 2017!</p> Sat, 31 Dec 2016 00:00:00 +0000 Introduction to StackStorm <p><a href="">Earlier</a> I wrote about some fundamental principles that I believe apply to any form of automation, whether it’s network automation, or even building a virtual factory.</p> <p>One of the most important concepts in mature automation is <strong>autonomy</strong>; that is, a system that is more or less self-sufficient. 
Instead of relying on human beings for input, always try to provide that input with yet another automated piece of the system. There are several benefits to this approach:</p> <ul> <li><strong>Humans Make Mistakes</strong> - This is also a benefit of automation in general, but autonomy also means mistakes are lessened on the input as well as the output of an automation component.</li> <li><strong>Humans Are Slow</strong> - We have lives outside of work, and it’s important to have a system that reacts quickly, instead of waiting for us to get to work. We need a system that is “programmed” by us, and is able to do work on our behalf.</li> <li><strong>Signal To Noise</strong> - Sometimes humans just don’t need to be involved. We’ve all been there - an inbox full of noisy alerts that don’t really mean much. Instead, configure specific triggers that act on your behalf when certain conditions are met.</li> </ul> <p>The reality is that we as operations teams are already event-driven by nature; we’re just doing it in our brains. Every operations shop works this way: there is a monitoring tool in place, and the ops folks watch for alerts and respond in some sort of planned way. This sort of event-driven activity is happening all the time without us thinking about it. As you explore the concepts below, note that the main focus here is simply to reproduce those reactions in an automated way with StackStorm.</p> <p>These are all concepts I’ve been seriously pondering for the past 2 years and speaking about at several conferences like <a href="">Interop</a>. Recently, when <a href="">I saw what the team at StackStorm was building</a>, and how well it aligned with my beliefs about mature automation practices, <a href="">I had to get involved</a>.</p> <p>StackStorm is event-driven automation. 
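</p>

<p>To make that “reactions as code” idea concrete, here is a deliberately tiny sketch of the event-criteria-action loop - all names are hypothetical, and this is not StackStorm code:</p>

```python
# Hypothetical sketch of an event-driven loop - NOT StackStorm code.
# An "event" arrives, criteria decide whether it matters, and a
# pre-programmed action runs with no human in the loop.

def restart_service(event):
    return f"restarting {event['service']} on {event['host']}"

# A "rule" pairs trigger criteria with an action.
RULES = [
    {"criteria": lambda e: e["type"] == "service_down",
     "action": restart_service},
]

def handle(event):
    """Run every matching rule's action for one incoming event."""
    return [rule["action"](event)
            for rule in RULES if rule["criteria"](event)]

print(handle({"type": "service_down", "service": "nginx", "host": "web01"}))
```

<p>StackStorm’s Sensors, Rules, and Actions (covered below) are a robust, production-grade version of this same loop. 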
As opposed to alternative approaches (which have their own unique benefits) that rely on human input, StackStorm works on the premise that a human being will instead configure the system to watch for certain events and react autonomously on their behalf.</p> <p>I recently attended <a href="">NFD12</a> as a delegate, and witnessed a presentation by the excellent and articulate Dmitri Zimine (shameless brown nosing, he’s my boss now):</p> <div style="text-align:center;"><iframe width="560" height="315" src="" frameborder="0" allowfullscreen=""></iframe></div> <h1 id="infrastructure-as-code">Infrastructure as Code</h1> <p>Before I get into the details of StackStorm concepts, it’s also important to remember one of the key fundamentals of next-generation operations, the fantastic buzzword “Infrastructure as Code”. Yes, it’s a buzzword, but there’s some good stuff there. There is real value in being able to describe your infrastructure using straightforward, version-controlled text files, and being able to use these files to provision new infrastructure with ease.</p> <p>Every concept in StackStorm can be described using simple YAML, or languages like Python. This is done for a reason: to enable infrastructure-as-code and event-driven automation to work in harmony. Just like any programming language or automation tool, the domain-specific language (DSL) that StackStorm uses will take some time to learn, but it’s all aimed at promoting infrastructure-as-code concepts. The DSL is the single source of truth; treat it as such. For instance, use mature Continuous Integration practices - including automated testing and peer code review - when making changes to it. 
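</p>

<p>As a tiny illustration, a pre-merge CI check might simply assert that every rule definition carries its required fields. The sketch below operates on an already-parsed definition (in a real pipeline you would first load each YAML file, e.g. with a library like PyYAML); the field names mirror the rule example later in this post:</p>

```python
# Sketch of a pre-merge CI check for rule definitions. In a real
# pipeline you'd load each YAML file first (e.g. with PyYAML); here
# the parsed result is shown inline as a plain dict.

REQUIRED = ("name", "enabled", "trigger", "action")

def validate_rule(rule):
    """Return a list of problems; an empty list means the rule passes."""
    problems = [f"missing required field: {f}" for f in REQUIRED if f not in rule]
    if "action" in rule and "ref" not in rule["action"]:
        problems.append("action is missing its 'ref'")
    return problems

good = {"name": "restart_on_failure", "enabled": True,
        "trigger": {"type": "some.trigger"}, "action": {"ref": "core.local"}}
bad = {"name": "incomplete_rule", "action": {}}

print(validate_rule(good))  # []
print(validate_rule(bad))
```

<p>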
This will make your operations much more stable.</p> <blockquote> <p>Note that while you should always treat these YAML files as the single source of truth, there are also some tools in StackStorm that allow you to generate this syntax using a friendly GUI.</p> </blockquote> <h1 id="stackstorm-concepts">StackStorm Concepts</h1> <p>Now, let’s explore some StackStorm concepts.</p> <h2 id="packs">Packs</h2> <p>One of the biggest strengths of StackStorm is its ecosystem. StackStorm’s recent 2.1 release included a new <a href="">Exchange</a>, which provides a home for the <strong>over 450 integrations</strong> that already exist as part of the StackStorm ecosystem. These integrations allow StackStorm to interact with 3rd-party systems.</p> <div style="text-align:center;"><a href=""><img src="" width="900" /></a></div> <p>In StackStorm, we call these integrations <a href="">“Packs”</a>. Packs are the atomic unit of deployment for integrations and extensions to StackStorm. This means that regardless of what you’re trying to implement, whether it’s a new Action, Sensor, or Rule, it’s done with Packs.</p> <p>As of StackStorm 2.1, pack management has also been revamped and improved (we’ll explore packs and pack management in detail in a future post). Installing a new integration is a one-line command. Want to allow StackStorm to run Ansible playbooks? Just run:</p> <div class="highlighter-rouge"><pre class="highlight"><code>st2 pack install ansible </code></pre> </div> <p>Now that we’ve covered packs, let’s talk about some of the components you will likely find in a pack.</p> <h2 id="actions">Actions</h2> <p>Though it’s important to understand that StackStorm is all about event-driven automation, it’s also useful to spend some time talking about what StackStorm can <strong>do</strong>. Being able to watch for all the events in the world isn’t very useful if you can’t do anything about what you see. 
In StackStorm, we can accomplish such things through “<a href="">Actions</a>”. Some examples include:</p> <ul> <li>Push a new router configuration</li> <li>Restart a service on a server</li> <li>Create a virtual machine</li> <li>Acknowledge a Nagios / PagerDuty alert</li> <li>Bounce a switchport</li> <li>Send a message to Slack</li> <li>Start a Docker container</li> </ul> <p>There are many others - and the list is growing all the time in the StackStorm <a href="">Exchange</a>.</p> <p>One of the things that attracted me to the StackStorm project is the fact that Actions are designed very generically, meaning they can be written in any language. This is similar to what I’ve done with testlets in <a href="">ToDD</a>, and what Ansible has done with their modules. This generic interface allows you to take scripts you already have and are using in your environment, and begin using them as event-driven actions, <a href="">with only a bit of additional logic</a>. As long as a script conforms to this standard, it can be used as an Action.</p> <p>There are several actions bundled with StackStorm (truncated for easy display):</p> <div class="highlighter-rouge"><pre class="highlight"><code>vagrant@st2learn:~$ st2 action list
+---------------------------------+---------+-------------------------------------------------
| ref                             | pack    | description
+---------------------------------+---------+-------------------------------------------------
| chatops.format_execution_result | chatops | Format an execution result for chatops
| chatops.post_message            | chatops | Post a message to stream for chatops
| chatops.post_result             | chatops | Post an execution result to stream for chatops
| core.announcement               | core    | Action that broadcasts the announcement to all s
|                                 |         | consumers.
| core.http                       | core    | Action that performs an http request.
| core.local                      | core    | Action that executes an arbitrary Linux command
|                                 |         | localhost. 
</code></pre> </div> <p>It’s important to consider these, since they may provide you with the functionality you need out of the gate. For instance, lots of systems these days come with REST APIs, and “core.http”, which allows you to send an HTTP request, may be all the Action functionality you need. Even if the predefined Actions don’t suit you, check the <a href="">Exchange</a> for a pack that may include an Action that gives you the functionality you’re looking for.</p> <p>Nevertheless, it may sometimes be necessary to create your own Actions. We’ll go through this in a future blog post, but for now, understand that actions are defined by two files:</p> <ul> <li>A metadata file, usually in YAML, that describes the action to StackStorm</li> <li>A script file (e.g. Python) that implements the Action logic</li> </ul> <p>Actions may depend on certain environmental factors to run. StackStorm makes this possible through “Action Runners”. For instance, you may have a Python script you wish to use as an Action; in this case, you’d leverage the “python-script” runner. Alternatively, you may just want to run an existing Linux command as your Action. In this case you would want to use the “local-shell-cmd” runner. There are <a href="local-shell-cmd">many other published runners</a>, with more on the way.</p> <h2 id="sensors-and-triggers">Sensors and Triggers</h2> <p>For event-driven automation to work, information about the world needs to be brought into the system so that we can act upon it. In StackStorm, this is done through <a href="">Sensors</a>. Sensors, like your own sense of sight or smell, allow StackStorm to observe the world around it, so that actions can eventually be taken on that information.</p> <blockquote> <p>StackStorm was not designed to be a monitoring tool, so you’ll still want to use whatever monitoring you already have in place. 
Sensors can/should be used to get data out of a monitoring system and take action accordingly.</p> </blockquote> <p>Sensors can be active or passive. An example of an “active” sensor would be something that actively polls an external entity, like Twitter’s API, for instance. Alternatively, sensors can also be passive; an example of this would be a sensor that subscribes to a message queue, or a streaming API, and simply sits quietly until a message is received.</p> <p>Both sensor types bring data into StackStorm, but the data is somewhat raw. In order to make sense of the data brought in by sensors, and to allow StackStorm to take action on that data, Sensors can also define “Triggers”. These help StackStorm identify incoming “events” from the raw telemetry brought in by Sensors. Triggers are useful primarily when creating a Rule, which is explained in the next section.</p> <p>Similarly to Actions, Sensors are defined using two files:</p> <ul> <li>A YAML metadata file describing the sensor to StackStorm</li> <li>A Python script that implements the sensor logic</li> </ul> <p>An example YAML metadata file might look like this:</p> <div class="highlighter-rouge"><pre class="highlight"><code>--- class_name: "SampleSensor" entry_point: "" description: "Sample sensor that emits triggers." trigger_types: - name: "event" description: "An example trigger." payload_schema: type: "object" properties: executed_at: type: "string" format: "date-time" default: "2014-07-30 05:04:24.578325" </code></pre> </div> <blockquote> <p>The particular implementation of the Sensor will determine if it is a “passive” or “active sensor”; there are two Python classes that you can inherit from to determine which Sensor type you’re creating.</p> </blockquote> <h2 id="rules">Rules</h2> <p>“<a href="">Rules</a>” bring the two concepts of Sensors and Actions together. A Rule is a definition that, in English, says “when this happens, do this other thing”. 
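In rough Python terms, that definition reduces to something like the following sketch. This is purely illustrative - it is not StackStorm's implementation, and the hostname criterion and action name are invented - but "regex" and "iequals" do mirror two of StackStorm's real criteria types:

```python
import re

# Illustrative sketch of rule evaluation - NOT StackStorm internals.
def matches(criteria, payload):
    """Return True when every criterion matches the trigger payload."""
    for field, spec in criteria.items():
        value = str(payload.get(field, ""))
        if spec["type"] == "regex":
            if re.search(spec["pattern"], value) is None:
                return False
        elif spec["type"] == "iequals":
            if value.lower() != spec["pattern"].lower():
                return False
    return True

# Hypothetical rule: when a core switch reports an event, run an action.
criteria = {"hostname": {"type": "regex", "pattern": "^core-"}}
payload = {"hostname": "core-sw01", "interface": "Ethernet1/3"}

if matches(criteria, payload):
    print("run action: network.bounce_switchport")  # action name invented
```

The real rules engine does this matching for you, against every rule you've registered, every time a trigger fires.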
You may remember that Sensors bring data into StackStorm, and Triggers allow StackStorm to get a handle on when certain things happen with that data. Rules make event-driven automation possible by watching these Triggers, and kicking off an Action (or a Workflow, as we’ll see in the next section).</p> <p>Rules are primarily composed of three components:</p> <ul> <li><strong>Trigger</strong>: “What trigger should I watch?”</li> <li><strong>Criteria</strong>: “How do I know when that trigger indicates I should do something?”</li> <li><strong>Action</strong>: “What should I do?”</li> </ul> <p>This is a straightforward concept if you look at a sample YAML definition for a Rule:</p> <div class="highlighter-rouge"><pre class="highlight"><code>--- name: "rule_name" # required pack: "examples" # optional description: "Rule description." # optional enabled: true # required trigger: # required type: "trigger_type_ref" criteria: # optional trigger.payload_parameter_name1: type: "regex" pattern : "^value$" trigger.payload_parameter_name2: type: "iequals" pattern : "watchevent" action: # required ref: "action_ref" parameters: # optional foo: "bar" baz: "" </code></pre> </div> <p>Think of “Rules” as the foundation of event-driven automation. They really are the core of what makes “If <em>__ then __</em>” possible.</p> <p>StackStorm’s architecture keeps everything very logically separate. Sensors sense. Actions act. Then, Rules tie them together and allow you to have a truly autonomous system as a result.</p> <h2 id="workflows">Workflows</h2> <p>Even simple actions rarely take place in isolation. For instance, when you detect that an application node has shut down, there could be ten or more discrete things you need to do in order to properly decommission that node in related systems. 
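At its core, that is just an ordered list of actions with some simple control flow. A toy sketch (the action names are invented for illustration, and this is not how a real workflow engine is implemented):

```python
# Toy sketch of a workflow: an ordered list of actions, halting on failure.
def run_workflow(steps, execute):
    results = {}
    for step in steps:
        ok, output = execute(step)
        results[step] = output
        if not ok:
            break  # a real engine could branch to a cleanup task instead
    return results

# Hypothetical decommission steps for a dead application node.
decommission = [
    "lb.remove_member",
    "dns.delete_record",
    "monitoring.remove_host",
    "chatops.post_message",
]

results = run_workflow(decommission, lambda step: (True, "ok"))
print(len(results))  # 4 - every step ran
```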
So, event-driven automation isn’t always just about kicking off a single action, but rather a “<a href="">Workflow</a>” of actions.</p> <p>In StackStorm, we use <a href="">OpenStack Mistral</a> to define workflows. Mistral is a service that’s part of the OpenStack project, and we <a href="">bundle it with StackStorm</a>. Mistral also <a href="">defines a YAML-based Domain-Specific Language (DSL)</a> that’s used to define the logic and flow of the workflow.</p> <p>In the following simple example, we define a Mistral workflow that accepts an arbitrary linux command as input, runs it, and prints the result to stdout:</p> <div class="highlighter-rouge"><pre class="highlight"><code>--- version: '2.0' examples.mistral-basic: description: A basic workflow that runs an arbitrary linux command. type: direct input: - cmd output: stdout: &lt;% $.stdout %&gt; tasks: task1: action: core.local cmd=&lt;% $.cmd %&gt; publish: stdout: &lt;% task(task1).result.stdout %&gt; stderr: &lt;% task(task1).result.stderr %&gt; </code></pre> </div> <p>Workflows are also powerful in that you can make decisions within them and take different actions depending on the output of previous tasks. This is done by inserting little “<a href="">YAQL</a>” statements in the workflow (note the statements underneath “on-success” below):</p> <div class="highlighter-rouge"><pre class="highlight"><code>--- version: '2.0' examples.mistral-branching: description: &gt; A sample workflow that demonstrates how to use conditions to determine which path in the workflow to take. 
type: direct input: - which tasks: t1: action: core.local input: cmd: "printf &lt;% $.which %&gt;" publish: path: &lt;% task(t1).result.stdout %&gt; on-success: - a: &lt;% $.path = 'a' %&gt; - b: &lt;% $.path = 'b' %&gt; - c: &lt;% not $.path in list(a, b) %&gt; a: action: core.local input: cmd: "echo 'Took path A.'" b: action: core.local input: cmd: "echo 'Took path B.'" c: action: core.local input: cmd: "echo 'Took path C.'" </code></pre> </div> <p>Based on the output from task “t1”, we can choose which of the next tasks will take place.</p> <p>As you can see, Mistral workflows can be simple when you want them to be, but can also scale up to really powerful, complex workflows as well. See the <a href="">StackStorm/Mistral</a> documentation for more examples.</p> <h1 id="conclusion">Conclusion</h1> <p>StackStorm has a huge community and it’s growing. Check out our <a href="">Community</a> page, where you’ll find information about how to contact us. Also make sure you follow the links there to join the Slack community (free and open); we’d love to have you even if you just want to ask some questions.</p> <p>Our <a href="">2.1 release also happened recently</a>, and it introduces a lot of new features. We’re working hard to keep putting more awesome into StackStorm, and actively want your feedback on it. There’s a lot of opportunity for the network industry in particular to take advantage of event-driven automation, and I personally will be working very hard to bridge the gap between the two.</p> <p>Thanks for reading, and stay tuned for the next post, covering the internal architecture of StackStorm.</p> Fri, 16 Dec 2016 00:00:00 +0000 A New Automation Chapter Begins <p>Two years ago, while I worked as a network engineer/consultant, I felt strongly that the industry was ripe for change. 
In February 2015 I jumped feet-first into the world of network automation by going back to my roots in software development, combining those skills with the lessons I learned from 3 years of network engineering.</p> <p>I’ve learned a ton in the last 2 years - not just at the day job but by actively participating in the automation and open source communities. I’ve co-authored a <a href="">network automation book</a>. I’ve released an open source project to facilitate <a href="">automated and distributed testing</a> of network infrastructure. I’ve <a href="">spoken publicly</a> about many of these concepts and more.</p> <p>Despite all this, there’s a lot left to do, and I want to make sure I’m in the best place to help move the industry forward. My goal is and has always been to help the industry at large realize the benefits of automation, and break the preconception that automation is only useful for big web properties like Google and Facebook. Bringing these concepts down to Earth and providing very practical steps to achieve this goal is a huge passion of mine.</p> <p>Automation isn’t just about running some scripts - it’s about autonomous software. It’s about creating a pipeline of actions that take place with minimal human input. It’s about maintaining high quality software. I wrote about this and more yesterday in my post on the “<a href="">Principles of Automation</a>”.</p> <h1 id="stackstorm">StackStorm</h1> <p>Later this month, I’m starting a new chapter in my career and joining the team at <a href="">StackStorm</a>.</p> <p>In short, StackStorm (<a href="">the project</a>) is an event-driven automation platform. 
Use cases include auto-remediation, security responses, facilitated troubleshooting, and complex deployments.</p> <p>StackStorm presented at the recent <a href="">Network Field Day 12</a> and discussed not only the core platform, but some of the use cases that, while not specifically network-centric, are important to consider:</p> <div style="text-align:center;"><iframe width="560" height="315" src="" frameborder="0" allowfullscreen=""></iframe></div> <p>When I first saw StackStorm, I realized quickly that the project aligned well with the <a href="">Principles of Automation</a> I was rattling around in my head, especially the Rule of Autonomy, which dictates that automation should be driven by input from other software systems. StackStorm makes it easy to move beyond simple “scripts” and truly drive decisions based on events that take place elsewhere.</p> <p>So, how does this change things in terms of my community involvement? Actually I expect this to improve. Naturally, you’ll likely see me writing and talking about StackStorm and related technologies - not just because they’re my employer but because the project matches well with my automation ideals and principles. This does NOT mean that I will stop talking about other concepts and projects. One unique thing about automation is that it’s never a one-size-fits-all….you’re always going to deal with multiple tools in a pipeline to get the job done. I am still very passionate about the people and process problems that aren’t tackled directly by technology solutions, and I plan to continue to grow my own experience in these areas and share them with you all.</p> <p>I still very strongly believe that the first problems we should be solving in the networking industry, and in IT as a whole, are problems of culture and process. 
So, from that perspective, nothing has changed - but from this new team I feel like I’ll have the support and platform I need to really get these ideas out there.</p> <p>Lastly, there are still <a href="">openings on the team</a> so if you’re passionate about automation, please consider applying.</p> <p>By no means am I done yet - but I do want to take the opportunity to say <strong>Thank You</strong> to all who have been a part of my public journey for the past 5+ years. I couldn’t have had the learning experiences I’ve had without readers who were just as passionate about technology. My goal is only to increase my involvement in the community in the years to come, and I hope that what I contribute is helpful.</p> <blockquote> <p>I attended NFD12 as a delegate as part of <a href="">Tech Field Day</a>, well before I started talking with the StackStorm team about employment opportunities. Events like these are sponsored by networking vendors who may cover a portion of our travel costs. In addition to a presentation (or more), vendors may give us a tasty unicorn burger, <a href="">warm sweater made from presenter’s beard</a> or a similar tchotchke. The vendors sponsoring Tech Field Day events don’t ask for, nor are they promised any kind of consideration in the writing of my blog posts … and as always, all opinions expressed here are entirely my own. (<a href="">Full disclaimer here</a>)</p> </blockquote> Wed, 19 Oct 2016 00:00:00 +0000 Principles of Automation <p>Automation is an increasingly interesting topic in pretty much every technology discipline these days. There’s lots of talk about tooling, practices, skill set evolution, and more - but little conversation about fundamentals. What little <strong>is</strong> published by those actually practicing automation usually takes the form of source code or technical whitepapers. 
While these are obviously valuable, they don’t usually cover some of the fundamental basics that could prove useful to the reader who wishes to perform similar things in their own organization, but may have different technical requirements.</p> <p>I write this post to cover what I’m calling the “Principles of Automation”. I have pondered this topic for a while and I believe I have three principles that cover just about any form of automation you may consider. These principles have nothing to do with technology disciplines, tools, or programming languages - they are fundamental principles that you can adopt regardless of the implementation.</p> <p>I hope you enjoy.</p> <blockquote> <p>It’s a bit of a long post, so TL;DR - automation isn’t magic. It isn’t only for the “elite”. Follow these guidelines and you can realize the same value regardless of your scale.</p> </blockquote> <h1 id="factorio">Factorio</h1> <p>Lately I’ve been obsessed with a game called <a href="">“Factorio”</a>. In it, you play an engineer that’s crash-landed on a planet with little more than the clothes on your back, and some tools for gathering raw materials like iron or copper ore, coal, wood, etc. Your objective is to use these materials, and your systems know-how to construct more and more complicated systems that eventually construct a rocket ship to blast off from the planet.</p> <p>Even the very first stages of this game end up being more complicated than they initially appear. Among your initial inventory is a drill that you can use to mine coal, a useful ingredient for anything that needs to burn fuel - but the drill itself actually requires that same fuel. So, the first thing you need to do is mine some coal by hand, to get the drill started.</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>We can also use some of the raw materials to manually kick-start some automation. With a second drill, we can start mining for raw iron ore. 
In order to do that we need to build a “burner inserter”, which moves the coal that the first drill gathered into the second drill:</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>Even this very early automation requires manual intervention, as it all requires coal to burn, and not everything has coal automatically delivered to it (yet).</p> <p>Now, there are things you can do to improve <strong>your own</strong> efficiency, such as building/using better tools:</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>However, this is just one optimization out of a multitude. Our objectives will never be met if we only think about optimizing the manual process; we need to adopt a “big picture” systems mindset.</p> <p>Eventually we have a reasonably good system in place for mining raw materials; we now need to move to the next level in the technology tree, and start smelting our raw iron ore into iron plates. As with other parts of our system, at first we start by manually placing raw iron ore and coal into a furnace. However, we soon realize that we can be much more efficient if we allow some burner inserters to take care of this for us:</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>With a little extra work we can automate coal delivery to this furnace as well:</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>There’s too much to Factorio to provide screenshots of every step - the number of technology layers you must go through in order to unlock fairly basic technology like solar power is astounding; not to mention being able to launch a fully functional rocket.</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>As you continue to automate processes, you continue to unlock higher and higher capabilities and technology; they all build on each other. 
Along the way you run into all kinds of issues. These issues could arise in trying to create new technology, or you could uncover a bottleneck that didn’t reveal itself until the system scaled to a certain point.</p> <p>For instance, in the last few screenshots we started smelting some iron plates to use for things like pipes or circuit boards. Eventually, the demand for this very basic resource will outgrow the supply - so as you build production facilities, you have to consider how well they’ll scale as the demand increases. Here’s an example of an iron smelting “facility” that’s built to scale horizontally:</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>Scaling out one part of this system isn’t all you need to be aware of, however. The full end-to-end supply chain matters too.</p> <p>As an example, a “green” science pack is one resource that’s used to perform research that unlocks technologies in Factorio. If you are running short on these, you may immediately think “Well, hey, I need to add more factories that produce green science packs!”. However, the bottleneck might not be the number of factories producing green science, but further back in the system.</p> <div style="text-align:center;"><a href=""><img src="" width="250" /></a></div> <p>Green science packs are made by combining a single inserter with a single transport belt panel - and in the screenshot above, while we have plenty of transport belt panels, we aren’t getting any inserters! This means we now have to analyze the part of our system that produces that part - which also might be suffering a shortage in <strong>its</strong> supply chain. 
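Chasing that kind of shortage is really just walking a dependency graph upstream. Here's a contrived sketch of that hunt (the graph and supply state are made up to echo the green science example):

```python
# Contrived supply graph: each product lists the inputs it consumes.
DEPENDENCIES = {
    "green_science": ["inserter", "transport_belt"],
    "inserter": ["iron_plate"],
    "transport_belt": ["iron_plate"],
    "iron_plate": ["iron_ore"],
    "iron_ore": [],
}

# Which nodes currently have healthy supply (fabricated state).
HEALTHY = {
    "green_science": False,
    "inserter": False,
    "transport_belt": True,
    "iron_plate": False,
    "iron_ore": True,
}

def find_bottleneck(node):
    """Walk upstream: the real bottleneck is the deepest starved node."""
    for dep in DEPENDENCIES[node]:
        if not HEALTHY[dep]:
            return find_bottleneck(dep)
    return node

print(find_bottleneck("green_science"))  # iron_plate - not green_science
```

The symptom shows up at the top of the graph, but the fix belongs several layers down.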
Sometimes such shortages can be traced all the way down to the lowest level - running out of raw ore.</p> <p>In summary, Factorio is a really cool game that you should definitely check out - but if you work around systems as part of your day job, I encourage you to pay close attention to the following sections, as I’d like to recap some of the systems design principles that I’ve illustrated above. I really do believe there are some valuable lessons to be learned here.</p> <p>I refer to these as the Principles of Automation, and they are:</p> <ul> <li>The Rule of Algorithmic Thinking</li> <li>The Rule of Bottlenecks</li> <li>The Rule of Autonomy</li> </ul> <h1 id="the-rule-of-algorithmic-thinking">The Rule of Algorithmic Thinking</h1> <p>Repeat after me: “Everything is a system”.</p> <p>Come to grips with this, because this is where automation ceases to be some magical concept only for the huge hyperscale companies like Facebook and Google. Everything you do, say, or smell is part of a system, whether you think it is or not; from the complicated systems that power your favorite social media site, all the way down to the water cycle:</p> <div style="text-align:center;"><a href=""><img src="" width="500" /></a></div> <blockquote> <p>By the way, just as humans are a part of the water cycle, humans are and always will be part of an automated system you construct.</p> </blockquote> <p>In all areas of IT there is a lot of hand-waving; engineers claim to know a technology, but when things go wrong, and it’s necessary to go deeper, they don’t really know it that well. Another name for this could be “user manual” engineering - they know how it should work when things go well, but don’t actually know what makes it tick, which is useful when things start to break.</p> <p>There are many tangible skills that you can acquire that an automation or software team will find attractive, such as language experience, and automated testing. 
It’s important to know how to write idiomatic code. It’s important to understand what real quality looks like in software systems. However, these things are fairly easy to learn with a little bit of experience. What’s more difficult is understanding what it means to write a <em>meaningful</em> test, and not just check the box when a line of code is “covered”. That kind of skill set requires more experience, and a lot of passion (you have to <strong>want</strong> to write good tests).</p> <p>Harder still is the ability to look at a system with a “big picture” perspective, while also being able to drill into a specific part and optimize it…and most importantly, the wisdom to know when to do the latter. I like to refer to this skill as “Algorithmic Thinking”. Engineers with this skill are able to mentally deconstruct a system into its component parts without getting tunnel vision on any one of them - maintaining that systems perspective.</p> <blockquote> <p>If you think Algorithms are some super-advanced topic that’s way over your head, they’re not. See one of my <a href="">earlier posts</a> for a demystification of this subject.</p> </blockquote> <p>A great way to understand this skill is to imagine you’re in an interview, and the interviewer asks you to enumerate all of the steps needed to load a web page. Simple, right? It sure seems like it at first, but what’s really happening is that the interviewer is trying to understand how well you know (or want to know) all of the complex activities that take place in order to load a web page. Sure, the user types a URL into the address bar and hits enter - then the HTTP request magically takes place. Right? Well, how did the machine know what IP address was being represented by that domain name? That leads you to the DNS configuration. How did the machine know how to reach the DNS server address? That leads you to the routing table, which likely indicates the default gateway is used to reach the DNS server. 
How does the machine get the DNS traffic to the default gateway? In that case, ARP is used to identify the right MAC address to use as the destination for that first hop.</p> <div style="text-align:center;"><a href=""><img src="" width="500" /></a></div> <p>Those are just some of the high-level steps that take place <em>before the request can even be sent</em>. Algorithmic thinking recognizes that each part of a system, no matter how simple, has numerous subsystems that all perform their own tasks. It is the ability to understand that nothing is magic - only layers of abstraction. These days, this is understandably a tall order. As technology gets more and more advanced, so do the abstractions. It may seem impossible to be able to operate at both sides of the spectrum.</p> <blockquote> <p>It’s true, no one can know everything. However, a skilled engineer will have the wisdom to dive behind the abstraction when appropriate. After all, the aforementioned “problem” seemed simple, but there are a multitude of things going on behind the scenes - any one of which could have prevented that page from loading. Being able to think algorithmically doesn’t mean you know everything, but it does mean that when a problem arises, it might be time to jump a little further down the rabbit hole.</p> </blockquote> <p>Gaining experience with automation is all about demystification. Automation is not magic, and it’s not reserved only for Facebook and Google. It is the recognition that we are all part of a system, and if we don’t want to get paged at 3AM anymore, we may as well put software in place that allows us to remove ourselves from that part of the system. If we have the right mindset, we’ll know where to apply those kinds of solutions.</p> <p>Most of us have close friends or family members that are completely non-technical. You know, the type that breaks computers just by looking at them. 
My suggestion to you is this: if you really want to learn a technology, figure out how to explain it to them. Until you can do that, you don’t really know it that well.</p> <h1 id="the-rule-of-bottlenecks">The Rule of Bottlenecks</h1> <p>Recently I was having a conversation with a web developer about automated testing. They made the argument that they wanted to use automated testing, but couldn’t because each web application they deployed for customers was a snowflake custom build, and it was not feasible to do anything but manual testing (click this, type this). Upon further inspection, I discovered that the majority of their customer requirements were nearly identical. In this case, the real bottleneck wasn’t just that they weren’t doing automated testing; they weren’t even setting themselves up to be able to do it in the first place. In terms of systems design, the problem is much closer to the source - I don’t mean “source code”, but that the problem lies further up the chain of events that could lead to being able to do automated testing.</p> <p>I hear the same old story in networking. “Our network can’t be automated or tested, we’re too unique. We have a special snowflake network”. This highlights an often overlooked part of network automation, and that is that the network design has to be solid. Network automation isn’t just about code - it’s about simple design too; the network has to be designed with automation in mind.</p> <blockquote> <p>This is what DevOps is <strong>really</strong> about. Not automation or tooling, but communication. The ability to share feedback about design-related issues with the other parts of the technology discipline. Yes, this means you need to seek out and proactively talk to your developers. Developers, this means sitting down with your peers on the infrastructure side. 
Get over it and learn from each other.</p> </blockquote> <p>Once you’ve learned to think Algorithmically, you start to look at your infrastructure like a graph - a series of nodes and edges. The nodes would be your servers, your switches, your access points, your operating systems. These nodes communicate with each other on a huge mesh of edges. When failures happen, they often cause a cascading effect, not unlike the cascading shortages I illustrated in Factorio where a shortage of green science packs doesn’t <em>necessarily</em> mean I need to spin up more green science machines. The bottleneck might not always be where you think it is; in order to fix the real problem, understanding how to locate the <em>real</em> bottleneck is a good skill to have.</p> <p>The cause of a bottleneck could be bad design:</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>Or it could be improper/insufficient input (which could in turn be caused by a bad design elsewhere):</p> <div style="text-align:center;"><a href=""><img src="" width="250" /></a></div> <p>One part of good design is understanding the kind of scale you might have to deal with and reflecting it in your design. This doesn’t mean you have to build something that scales to trillions of nodes today, only that the system you put in place doesn’t prevent you from scaling organically in the near future.</p> <p>As an example, when I built a new plant in Factorio to produce copper wiring, I didn’t build 20 factories, I started with 2 - but I allowed myself room for 20, in case I needed it in the future. In the same way, you can design with scale in mind without having to boil the ocean and <strong>actually</strong> build a solution that meets some crazy unrealistic demand on day one.</p> <p>This blog post is already way too long to talk about proper design, especially considering that this post is fairly technology-agnostic. 
For now, suffice it to say that having a proper design is important, especially if you’re going into a new automation project. It’s okay to write some quick prototypes to figure some stuff out, but before you commit yourself to a design, do it on paper (or whiteboard) first. Understanding the steps there will save you a lot of headaches in the long run. Think about the system-to-be using an Algorithmic mindset, and walk through each of the steps in the system to ensure you understand each level.</p> <div style="text-align:center;"><a href=""><img src="" width="300" /></a></div> <p>As the system matures, it’s going to have bottlenecks. That bottleneck might be a human being that still holds power over a manual process you didn’t know existed. It might be an aging service that was written in the 80s. Just like in Factorio, something somewhere will be a bottleneck - the question is, do you know where it is, and is it worth addressing? It may not be. Everything is a tradeoff, and some bottlenecks are tolerable at certain points in the maturity of the system.</p> <h1 id="the-rule-of-autonomy">The Rule of Autonomy</h1> <p>I am <strong>very</strong> passionate about this section; here, we’re going to talk about the impact of automation on human beings.</p> <p>Factorio is a game where you ascend the tech tree towards the ultimate goal of launching a rocket. As the game progresses, and you automate more and more of the system (which you have to do in order to complete the game in any reasonable time), you unlock more and more elaborate and complicated technologies, which then enable you to climb even higher. Building a solid foundation means you spend less time fussing with gears and armatures, and more time unlocking capabilities you simply didn’t have before.</p> <p>In the “real” world, the idea that automation means human beings are removed from a system is patently false. 
If anything, automation actually creates more opportunities for human beings because it enables new capabilities that weren’t possible before it existed. Anyone who tells you otherwise doesn’t have a ton of experience in automation. Automation is not a night/day difference - it is an iterative process. We didn’t start Factorio with a working factory - we started it with the clothes on our back.</p> <blockquote> <p>This idea is well described by <a href="">Jevons Paradox</a>, which basically states that the more efficiently you produce a resource, the greater the demand for that resource grows.</p> </blockquote> <p>Not only is automation highly incremental, it’s also imperfect at every layer. Everything in systems design is about tradeoffs. At the beginning of Factorio, we had to manually insert coal into many of the components; this was a worthy tradeoff due to the simple nature of the system. It wasn’t <strong>that</strong> big of a deal to do this part manually at that stage, because the system was an infant.</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>However, at some point, our factory needed to grow. We needed to allow the two parts to exchange resources directly instead of manually ferrying them between components.</p> <p>The Rule of Autonomy is this: machines can communicate with other machines really well. Let them. Of course, automation is an iterative system, so you’ll undoubtedly start out by writing a few scripts and leveraging some APIs to do some task you previously had to do yourself, but don’t stop there. Always be asking yourself if you need to be in the direct path at all. 
Maybe you don’t <strong>really</strong> need to provide input to the script in order for it to do its work; maybe you can change that script to operate autonomously by getting that input from some other system in your infrastructure.</p> <p>As an example, I once had a script that would automatically put together a Cisco MDS configuration based on some WWPNs I put into a spreadsheet. This script wasn’t useless; it saved me a lot of time, and helped ensure a consistent configuration between deployments. However, it still required my input, specifically for the WWPNs. I quickly decided it wouldn’t be that hard to extend this script to make API calls to Cisco UCS to get those WWPNs and automatically place them into the switch configuration. I was no longer required for that part of the system; it operated autonomously. Of course, I’d return to this software periodically to make improvements, but largely it was off my plate. I was able to focus on other things that I wanted to explore in greater depth.</p> <p>The goal is to remove humans as functional components of a subsystem so they can make improvements to the system as a whole. Writing code is not magic - it is the machine manifestation of human logic. For many tasks, there is no need to have a human manually enumerate the steps required to perform a task; that human logic can be described in code and used to work on the human’s behalf. So when we talk about replacing humans in a particular part of a system, what we’re really talking about is reproducing the logic that they’d employ in order to perform a task as code that doesn’t get tired, burnt out, or narrowly focused. It works asynchronously to the human, and therefore will allow the human to then go make the same reproduction elsewhere, or make other improvements to the system as a whole. 
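The WWPN script above shows this shape, and it can be sketched in miniature. The config syntax, WWPNs, and "API" below are all invented for illustration (this is not the actual MDS/UCS code), but the evolution - same logic, input moved from a human to another system - is the point:

```python
# Miniature, invented version of the WWPN script's evolution.
def build_config(wwpns):
    """The human logic, captured as code: WWPNs in, config lines out."""
    return ["member pwwn {}".format(w) for w in wwpns]

# Version 1: a human pastes WWPNs in from a spreadsheet.
manual_input = ["20:00:00:25:b5:00:00:0a"]
assert build_config(manual_input)[0] == "member pwwn 20:00:00:25:b5:00:00:0a"

# Version 2: the same logic, fed by another system - no human in the path.
def fetch_wwpns():
    # Stand-in for an API call to the compute platform.
    return ["20:00:00:25:b5:00:00:0a", "20:00:00:25:b5:00:00:0b"]

config = build_config(fetch_wwpns())
print(len(config))  # 2 lines, generated with no human input
```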
If you insist on staying “the cog” in a machine, you’ll quickly lose sight of the big picture.</p> <div style="text-align:center;"><a href=""><img src="" width="600" /></a></div> <p>This idea that “automation will take my job” is based on the incorrect assumption that once automation is in place, the work is over. Automation is not a monolithic “automate everything” movement. Like our efforts in Factorio, automation is designed to take a particular workflow in one very small part of the overall system and take it off of our plates, once we understand it well enough. Once that’s done, our attention is freed up to explore new capabilities we were literally unable to address while we were mired in the lower details of the system. We constantly remove ourselves as humans from higher and higher parts of the system.</p> <p>Note that I said “parts” of the system. Remember: everything is a system, so it’s foolish to think that human beings can (or should) be entirely removed - you’re always going to need human input to the system as a whole. In technology there are just some things that require human input - like new policies or processes. Keeping that in mind, always be asking yourself “Do I really need <strong>human</strong> input at <strong>this</strong> specific part of the system?” Constantly challenge this idea.</p> <p>Automation is <strong>so</strong> not about removing human beings from a system. It’s about moving humans to a new part of the system, and about allowing automation to be driven by events that take place elsewhere in the system.</p> <h1 id="conclusion">Conclusion</h1> <p>Note that I haven’t really talked about specific tools or languages in this post. It may seem strange - often when other automation junkies talk about how to get involved, they talk about learning to code, or learning Ansible or Puppet, etc.
As I’ve mentioned earlier in this post (and as I’ve presented at conferences), this is all very meaningful - at some point the rubber needs to meet the road. However, when doing this yourself, hearing about someone else’s implementation details is not enough - you need some core fundamentals to aim for.</p> <p>The best way to get involved with automation is to want it. I can’t make you want to invest in automation as a skill set, nor can your manager; only you can do that. I believe that if the motivation is there, you’ll figure out the right languages and tools for yourself. Instead, I like to focus on the fundamentals listed above - which are language and tool agnostic. These are core principles that I wish I had known about when I started on this journey - principles that don’t readily reveal themselves in a quick Stack Overflow search.</p> <p>That said, my parting advice is:</p> <ol> <li><strong>Get Motivated</strong> - think of a problem you actually care about. “Hello World” examples get old pretty fast. It’s really hard to build quality systems if you don’t care. Get some passion, or hire folks that have it. Take ownership of your system. Make the move to automation with strategic vision, and not a half-cocked effort.</li> <li><strong>Experiment</strong> - learn the tools and languages that are most powerful for you. Automation is like cooking - you can’t just tie yourself to the recipe book. You have to learn the fundamentals and screw up a few times to really learn. Make mistakes, and devise automated tests that ensure you don’t make the same mistake twice.</li> <li><strong>Collaborate</strong> - there are others out there that are going through this journey with you. Sign up for the <a href="">networktocode slack channel (free)</a> and participate in the community.</li> </ol> Tue, 18 Oct 2016 00:00:00 +0000 ToDD Has Moved! <p>ToDD has been out in the wild for 6 months, and in that time I’ve been really pleased with its growth and adoption.
Considering this was just a personal side-project, I’ve been blown away by what it’s doing for my own learning experiences as well as for the network automation pipelines of the various folks that pop onto the slack channel asking questions.</p> <p>For the last 6 months I’ve hosted ToDD on <a href="">my personal Github profile</a>. It was a good initial location, because there really was no need at the time to do anything further.</p> <p>However, as of tonight, ToDD’s new permanent location is <a href=""></a>. Read on for some reasons for this.</p> <div style="text-align:center;"><a href=""><img src="" width="400" /></a></div> <h1 id="native-testlets">Native Testlets</h1> <p>One of the biggest reasons for creating the <a href="">“toddproject” organization</a> was that I started rewriting some of the testlets in Go. These are called <a href="">native testlets</a> and the intention is that they are packaged alongside ToDD because they’re useful to a very wide percentage of ToDD’s userbase (in the same way the legacy bash testlets were).</p> <p>For this reason, I created the “toddproject” organization, and once that was done, it made a lot of sense to move ToDD there as well.</p> <p>Rewriting the legacy bash testlets in Go offers several advantages, but the top two are:</p> <ul> <li>Ability to take advantage of some common code in ToDD so that the testlets aren’t reinventing the wheel</li> <li>Better cross-platform testing (existing testlets pretty much required Linux)</li> </ul> <p>Currently only the “ping” testlet has been implemented in Go - but I hope to replace “http” and “iperf” soon with Go alternatives.</p> <h1 id="updated-docs">Updated Docs</h1> <p>In addition to moving to a new location, the documentation for ToDD has been massively improved and simplified:</p> <div style="text-align:center;"><a href=""><img src="" width="900" /></a></div> <p>As you can see, the order now actually makes sense.
Please check out <a href=""></a> and let me know what you think!</p> Fri, 30 Sep 2016 00:00:00 +0000 The Importance of the Network Software Supply Chain <p>At <a href="">Networking Field Day 12</a>, we heard from a number of vendors that offered solutions to some common enterprise network problems, from management, to security, and more.</p> <p>However, there were a few presentations that didn’t seem directly applicable to the canonical network admin’s day-to-day. This was made clear by some comments by delegates in the room, as well as others tweeting about the presentation.</p> <h1 id="accelerating-the-x86-data-plane">Accelerating the x86 Data Plane</h1> <p>Intel, for instance, <a href="">spent a significant amount of time</a> discussing the <a href="">Data Plane Development Kit (DPDK)</a>, which provides a different way of leveraging CPU resources for fast packet processing.</p> <div style="text-align:center;"><iframe width="560" height="315" src="" frameborder="0" allowfullscreen=""></iframe></div> <p>In their presentation, Intel explained the various ways that they’ve circumvented some of the existing bottlenecks in the Linux kernel, resulting in a big performance increase for applications sending and receiving data on the network. DPDK operates in user space, meaning the traditional overhead associated with copying memory resources between user and kernel space is avoided. 
In addition, techniques like parallel processing and poll mode drivers (as opposed to the traditional interrupt processing model) mean that packet processing can be done much more efficiently, resulting in better performance.</p> <p>This is all great (and as a software nerd, very interesting to me personally) but what does this have to do with the average IT network administrator?</p> <h1 id="pay-no-attention-to-the-overlay-behind-the-curtain">Pay No Attention to the Overlay Behind the Curtain</h1> <p>Teridion, meanwhile, spent some time discussing their solution for increasing performance between content providers by actively monitoring performance on the internet through cloud-deployed agents and routers, and deploying overlays as necessary to ensure that the content uses the best-performing path at all times.</p> <div style="text-align:center;"><iframe width="560" height="315" src="" frameborder="0" allowfullscreen=""></iframe></div> <p>In contrast to the aforementioned presentation from Intel, which was very clear about the deepest technical details of its solutions, Teridion was very guarded about most of the interesting technical detail of their solution, claiming it was part of their “special sauce”. While in some ways this is understandable (they are not the size of Intel, and might want to be more careful about giving away their IP), they were in front of the Tech Field Day audience, and using terms like “pixie dust” in lieu of technical detail is ineffective at best.</p> <p>Despite this, and after some questioning by the delegates in the room, it became clear that their solution was also not targeted towards enterprise IT, but rather at the content providers themselves.</p> <p>Like the technologies discussed by Intel, the Teridion solution has become one of the “behind the scenes” technologies that we might want to consider when evaluating content providers.
As an enterprise network architect, I may not directly interface with Teridion, but knowing more about them will tell me a great deal about how well a relationship with someone who <strong>is</strong> using them might go. When someone isn’t willing to share those details, I ask myself “Why am I here?”.</p> <h1 id="caring-about-the-supply-chain">Caring about the Supply Chain</h1> <p>When I walk into the supermarket looking for some chicken to grill, my thoughts are not limited to what’s gone on in that particular store, but extend to that store’s supply chain. I care about how those chickens were raised. Perhaps I do not agree with the supermarket chain’s choice of supplier; that will drive my decision to stay in that store, or go down the street to the butcher.</p> <p><strong>In the same way</strong>, we should care about the supply chain behind the solutions we use in our network infrastructure. It’s useful to know if a vendor chose to build their router on DPDK or the like, because it means they recognized the futility of trying to reinvent the wheel and decided to use a common, optimized base. They provide value on top of that. Knowing the details of DPDK means I can know the details of all vendors that choose to use that common base.</p> <blockquote> <p>It’s clear that solutions like those presented by these two vendors are targeted not at the hundreds or thousands of enterprise IT customers, but rather at a handful of network vendors (in the case of Intel) or big content providers (in the case of Teridion). It obviously makes sense from a technical perspective, but also from a business perspective, since acquiring those customers means Intel and Teridion get all <strong>their</strong> customers as well.</p> </blockquote> <p>Another good example is a <a href="">Packet Pushers podcast we recorded at Network Field Day 11</a>, where we discussed the growing trend of network vendors willing to use an open source base for their operating systems.
This is a <strong>good thing</strong>; not only does it help us as customers immediately understand a large part of the technical solution, it also means the vendor isn’t wasting cycles reinventing the wheel and charging me for the privilege.</p> <p>When companies are unwilling to go deeper than describing their technology as “special sauce”, it hurts my ability to conceptualize this supply chain. It’s like if a poultry farmer just waved their hands and said “don’t worry, our chickens are happy”. Can you not <em>at least</em> show me a picture of where you raise the chickens? It’s not like that picture is going to let me immediately start a competing chicken farm.</p> <p>When the world around networking is embracing open source to the point where we’re actually building entire business models around it, the usage of terms like “pixie dust” in lieu of technical detail just smells of old-world thinking. I’m not saying to give everything away for free, but meet me halfway - enable me to conceptualize and make a reasonable decision regarding my software supply chain.</p> <blockquote> <p>I attended NFD12 as a delegate as part of <a href="">Tech Field Day</a>. Events like these are sponsored by networking vendors who may cover a portion of our travel costs. In addition to a presentation (or more), vendors may give us a tasty unicorn burger, <a href="">warm sweater made from presenter’s beard</a> or a similar tchotchke. The vendors sponsoring Tech Field Day events don’t ask for, nor are they promised any kind of consideration in the writing of my blog posts … and as always, all opinions expressed here are entirely my own. (<a href="">Full disclaimer here</a>)</p> </blockquote> Tue, 16 Aug 2016 00:00:00 +0000 CS101: Algorithms <p>First in <a href="">this series</a> is the subject of Algorithms. 
This topic is very interesting to me because when I first strove to understand what exactly they were, I was expecting something a lot more complicated than what they turned out to be. I think, shamefully, that Hollywood may have had an influence on this, as the term “algorithm” is one of many terms abused by “cyber” movies and the like, portrayed to be some sort of ultimate cyber weapon in the war against Ellingson Mineral Company.</p> <p>The reality is much simpler. “Algorithm” is defined as “a set of steps that are followed in order to solve a mathematical problem or to complete a computer process”. It really is that simple. Think of a mathematical problem that you’d need to solve yourself (ignoring for the moment that there’s likely a 3rd party library that has already done this).</p> <p>A common example is the calculation of the Fibonacci sequence. Forget about writing code for a minute, and think about the problem in plain English. Given a starting sequence (1, 1), how do you continue calculating and adding numbers to this sequence, to produce N Fibonacci numbers?</p> <ul> <li>Get the last two numbers in the sequence</li> <li>Add the two numbers together</li> <li>Append the result to the end of the sequence</li> </ul> <p>In plain English, what we’ve described is an algorithm! When considering a problem like this - especially if you’re new to algorithms - it’s useful to describe the solution in this way.</p> <p>Algorithms are all around us. Now that you know this, and how simple the concept is, it’s time to bring the concept to reality with a few examples.</p> <h1 id="code-example">Code Example</h1> <p>Practically, algorithms tend to take the form of a “function” or “method” in a computer program.
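In fact, the three plain-English steps above translate almost directly into such a function; a minimal sketch (the function name is mine, not part of any library):

```python
def extend_fibonacci(sequence, count):
    """Append `count` new Fibonacci numbers to an existing sequence."""
    for _ in range(count):
        # Get the last two numbers in the sequence
        second_to_last, last = sequence[-2], sequence[-1]
        # Add the two numbers together, and append the result
        sequence.append(second_to_last + last)
    return sequence

print(extend_fibonacci([1, 1], 5))  # [1, 1, 2, 3, 5, 8, 13]
```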
Algorithms generally have some kind of standardized input and output, so they are commonly placed within the context of a function so their internal logic can be contained, and we simply call them.</p> <p>Most if not all algorithms can be described mathematically. However, I prefer more concrete examples using real-world languages, so we’ll use Python. We’ll use this to implement an example algorithm to calculate the Fibonacci sequence.</p> <p>We don’t want to waste memory resources, so we write some clever code that recursively calls itself to calculate Fibonacci values on the fly, instead of storing them.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">sys</span> <span class="k">def</span> <span class="nf">get_fib_at_n</span><span class="p">(</span><span class="n">N</span><span class="p">):</span> <span class="c"># If N is less than or equal to 1, we know the answer</span> <span class="c"># is equal to N, so let's just return that</span> <span class="k">if</span> <span class="p">(</span><span class="n">N</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">):</span> <span class="k">return</span> <span class="n">N</span> <span class="c"># One-liner that calculates the sum of N-1 and N-2</span> <span class="c"># (This will result in a recursive calculation until the first</span> <span class="c"># two values have been reached)</span> <span class="k">return</span> <span class="n">get_fib_at_n</span><span class="p">(</span><span class="n">N</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">get_fib_at_n</span><span class="p">(</span><span class="n">N</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="k">try</span><span class="p">:</span> <span class="c"># N is the
position of the fibonacci sequence that we wish to retrieve</span> <span class="n">N</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">print</span><span class="p">(</span><span class="n">get_fib_at_n</span><span class="p">(</span><span class="n">N</span><span class="p">))</span> <span class="k">except</span> <span class="nb">IndexError</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="s">"Please provide N"</span><span class="p">)</span> <span class="n">sys</span><span class="o">.</span><span class="nb">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">except</span> <span class="nb">ValueError</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="s">"Please provide an integer for N"</span><span class="p">)</span> <span class="n">sys</span><span class="o">.</span><span class="nb">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span></code></pre></figure> <p>All we have to do is pass in a single argument - N - which indicates the position of the Fibonacci number we wish to retrieve. For instance, if we pass in “3”, we get “2” as output; if we pass in “6”, we get “8”, etc.</p> <div class="highlighter-rouge"><pre class="highlight"><code>~$ python3 3 2 ~$ python3 6 8 </code></pre> </div> <p>What if we pass in a larger N value? Something like 35? This works, but it’s around this time that we start to discover a problem. Running this function with an N value of 35 takes several seconds to compute.</p> <p>What gives? 
Lower N values seemed to take no time at all, so why is this taking longer just by increasing the value of N?</p> <h2 id="big-o-notation--how-fast-does-this-run">“Big O” Notation / “How Fast Does This Run?”</h2> <p>When discussing algorithms - either one you’ve written yourself or one that you’ve “inherited” - it’s useful to understand how “fast” a given algorithm will run.</p> <p>However, when we talk about algorithmic speed, we’re not usually talking about actual calculation time; most of the time we’re trying to answer questions like “How is the performance of this algorithm influenced by the input?”.</p> <p>We can use this bash one-liner to run our program with incrementally increasing N values, and time the runtime of each call, so we can see how this time increases with N:</p> <div class="highlighter-rouge"><pre class="highlight"><code>for i in `seq 1 35`; do time python3 $i; done </code></pre> </div> <p>I’ll save you the trouble of running this yourself; here’s a graph showing calculation times:</p> <p><a href=""><img src="" alt="" /></a></p> <p>As you can see, the required time to calculate the Nth Fibonacci value increases exponentially as N increases. If we are interested in calculating any large Fibonacci numbers, we’re going to be waiting a very long time, which is impractical.</p> <p>It’s fairly easy to calculate the first few numbers - but what if we need to calculate the first ten thousand? Certainly a big concern is the size of the numbers themselves, but as we can see, we have an even more immediate problem. Our calculation time seems to be increasing exponentially even with N values as low as 35.</p> <p>Computer Scientists use several notations (called “Asymptotic Notations”) to describe algorithms, in order to reason about how the computation time can change depending on the input to the algorithm.
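The same measurement can also be made from within Python itself; a minimal sketch using the recursive function from earlier and the standard library's time.perf_counter (absolute timings will vary by machine, but the growth pattern will not):

```python
import time

def get_fib_at_n(N):
    # The naive recursive implementation from earlier in the post
    if N <= 1:
        return N
    return get_fib_at_n(N - 1) + get_fib_at_n(N - 2)

for n in (10, 20, 30):
    start = time.perf_counter()
    result = get_fib_at_n(n)
    elapsed = time.perf_counter() - start
    # Each step of 10 in N multiplies the runtime dramatically
    print("N=%d fib=%d time=%.4fs" % (n, result, elapsed))
```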
One such notation - a very popular example - is “Big O”.</p> <p>There are certainly several factors that can affect the performance of an algorithm, such as the type of hardware being used, the efficiency of the compiler used to compile the source code, etc. However, asymptotic notations like Big O are predicated on the idea that we can ignore some of these messy details in order to describe algorithmic performance at a high level and solve some of the larger problems.</p> <p>Big O notation intentionally throws away inconsequential constants, providing a cleaner notation. For instance, if your algorithm makes three passes over its input of size N, each pass takes time proportional to N, but the “3” is a constant multiplier - it doesn’t change how the runtime grows as N grows, so Big O discards it. Therefore:</p> <div class="highlighter-rouge"><pre class="highlight"><code>O(3n) = O(n) </code></pre> </div> <blockquote> <p>O(n) sounds good, but is it? What if N is 10 trillion? Don’t fall into the trap of believing that a runtime that was good for one problem is good for another. A big-picture understanding of the problem space is still highly relevant.</p> </blockquote> <p>It’s not exact, and it’s not meant to be. Big O notation is meant to give us a “big picture” overview of the performance of a certain algorithm.</p> <blockquote> <p>Note that you might also hear “worst case” vs “best case” within the context of Big O. This refers to the fact that an algorithm can receive a wide variety of input - from very small (which tends not to illuminate performance problems) to very large. In my experience, if this is not explicitly mentioned, the Big O notation refers to the “worst case” scenario, such as the largest possible input to an algorithm.
This is because “worst case” tends to really show any performance problems in an algorithm.</p> </blockquote> <p>Big O is a “first line of defense” in our optimization journey. If we’re at the point where we need to worry about making “constant” level improvements (things that shave nanoseconds off a computation) then we’re already beyond Big O notation. Think “Big O helps us solve big problems”.</p> <h1 id="big-o-analysis">Big O Analysis</h1> <p>Let’s take a look at some examples and determine our computational complexity in terms of Big O.</p> <p>If an operation within an algorithm simply does not change its behavior based on input, we can notate that easily as well. For instance, many functions return a value. This is a one-time operation, since it serves as an exit point for a function. This means that the value of our input N does not have an impact on how many times this runs. We can say that a “return” statement runs at:</p> <div class="highlighter-rouge"><pre class="highlight"><code>O(1) </code></pre> </div> <p>A dead giveaway for a basic performance issue is when you see nested code that is dependent on input.
For instance, take the following Python example that calculates the maximum difference between two splits of an input list.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">maxdiff</span><span class="p">(</span><span class="n">input_list</span><span class="p">):</span> <span class="s">"""Calculates max difference between two splits of an input list (input_list) """</span> <span class="n">biggest_diff</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">k</span> <span class="o">=</span> <span class="mi">1</span> <span class="c"># This loop runs at O(input_list) because in the worst-case scenario,</span> <span class="c"># it must iterate over the length of the entire list</span> <span class="k">while</span> <span class="n">k</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">input_list</span><span class="p">):</span> <span class="c"># Python's "max" function also iterates over</span> <span class="c"># the input list; it runs at O(input_list)</span> <span class="c"># in this case</span> <span class="n">left_max</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">input_list</span><span class="p">[:</span><span class="n">k</span><span class="p">])</span> <span class="n">right_max</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">input_list</span><span class="p">[</span><span class="n">k</span><span class="p">:])</span> <span class="n">this_diff</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">left_max</span> <span class="o">-</span> <span class="n">right_max</span><span class="p">)</span> <span class="k">if</span> <span class="n">this_diff</span> <span class="o">&gt;</span> <span class="n">biggest_diff</span><span class="p">:</span> <span class="n">biggest_diff</span> <span class="o">=</span>
<span class="n">this_diff</span> <span class="n">k</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">return</span> <span class="n">biggest_diff</span></code></pre></figure> <p>As noted by comments in the code above, the outer loop runs over the length of the input list “input_list”. This means, the outer loop runs in O(N) time, where N is the “input_list” parameter.</p> <p>In addition, within this loop, the built-in Python function max() is called twice, once on each part of the list that’s been split up. This ends up roughly running in O(N) time as well. Since this is nested within the loop that’s already running at O(N) time, we say that the maxdiff function runs at O(n^2) time, which is not great. This means that the time it takes to calculate the result will grow exponentially as the length of the input list increases. Eventually, for longer and longer inputs, this time will become infeasible.</p> <p>We can now turn our attention back to the fibonacci example from earlier in this post, which is clearly a very poorly optimized algorithm. We can be fairly confident that recursively calling a function to calculate each number in the sequence can get way out of control, fast. But how fast?</p> <p>If you try to visualize recursion, most often what you come up with is some kind of tree structure. A canonical “interview question” example of recursion is to write a program that recursively looks through a tree to do some work.</p> <p>What we’re doing here is not far off. Every time we run the “get_fib_at_n” function, it kicks off two more instances of itself. It does this until n == 1, in which case the recursion stops, and each instance returns its calculated value.</p> <p>This means that if we were to visualize this process in a tree, each node, or decision point, where n is not yet 1, has two nodes branching off of it. 
Also, the leaf nodes at the bottom always equal 1, since that’s the “exit condition” from this recursive process.</p> <div class="highlighter-rouge"><pre class="highlight"><code> n / \ n-1 n-2 -------- maximum 2^1 additions / \ / \ n-2 n-3 n-3 n-4 -------- maximum 2^2 additions / \ n-3 n-4 -------- maximum 2^3 additions Credit StackOverflow( </code></pre> </div> <p>Our constant “2” is there to illustrate the fact that we are performing two operations in order to get the sum - but the number of times that operation is performed grows exponentially based on N. As you can see from the tree, even by looking at the first 3 layers, the complexity grows at a rate of 2^(n-1). In Big O notation, the exponent can be simplified, and we say this algorithm runs at:</p> <div class="highlighter-rouge"><pre class="highlight"><code>O(2^n) </code></pre> </div> <p>This runtime is atrocious - and it means we can’t practically use this algorithm as implemented.</p> <h1 id="fibonacci-algorithm---optimized">Fibonacci Algorithm - Optimized</h1> <p>There are many ways to fix problems like this, which depend greatly on the problem being solved.
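One low-effort option worth knowing about first: Python's standard library ships functools.lru_cache, which memoizes the recursive version from earlier without restructuring it at all. Each value of N is computed only once, bringing the runtime down to roughly O(n). A sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_fib_at_n(N):
    # Same logic as the naive version; the cache ensures each N is
    # computed only once, so repeated subproblems cost nothing
    if N <= 1:
        return N
    return get_fib_at_n(N - 1) + get_fib_at_n(N - 2)

print(get_fib_at_n(35))  # returns instantly: 9227465
```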
For our Fibonacci example, we can simply store the calculated numbers in a Python list, and within our loop, simply refer to the last two items in the list.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">sys</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="n">fibSequence</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">try</span><span class="p">:</span> <span class="c"># "n" is the position within the fibonacci</span> <span class="c"># sequence that we wish to retrieve</span> <span class="n">N</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="k">except</span> <span class="nb">IndexError</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="s">"Please provide N"</span><span class="p">)</span> <span class="n">sys</span><span class="o">.</span><span class="nb">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">except</span> <span class="nb">ValueError</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="s">"Please provide an integer for N"</span><span class="p">)</span> <span class="n">sys</span><span class="o">.</span><span class="nb">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span> <span class="k">if</span> <span class="n">i</span> <span class="o">&lt;=</span> <span class="mi">1</span><span class="p">:</span> <span class="n">this_number</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">else</span><span class="p">:</span> <span class="n">this_number</span> <span class="o">=</span> <span class="n">fibSequence</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="n">fibSequence</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span> <span class="c"># This is an example of memoization - storing the result of</span> <span class="c"># a calculation so that in the future, the calculation doesn't</span> <span class="c"># need to be repeated</span> <span class="n">fibSequence</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">this_number</span><span class="p">)</span> <span class="k">print</span><span class="p">(</span><span class="n">fibSequence</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span></code></pre></figure> <p>Python performs such list lookups in constant time, which means that this algorithm runs in O(N), due to the loop. This is a much more feasible runtime.</p> <p>We solved this particular problem using a technique called “memoization”. This is a common technique to consider when optimizing algorithms. In short, with memoization we store calculated values and refer to them later, instead of recalculating them repeatedly and unnecessarily.</p> <p>If you’re recalculating the same value, or if parts of your logic are directly influenced by the size of N, it’s worth re-evaluating your algorithm to see if memoization can help keep things efficient.</p> <h1 id="striking-a-balance">Striking a Balance</h1> <p>You might also hear about two types of complexity that are described by Big O.
Time complexity is one that we’ve already discussed in detail. Our first Fibonacci algorithm ran in O(2^n) time, for instance.</p> <p>However, Big O can also be used to describe storage complexity. If your program doesn’t clean up memory resources, or isn’t careful about what it stores, the same problems can be realized from a storage perspective (both with respect to capacity and speed of access).</p> <p>We wrote that first Fibonacci example the way we did because we feared that storing these values might present a storage problem - but this fear was not driven by data. It turns out that it’s far less costly to store the Fibonacci values than to recursively calculate them on the fly.</p> <p>To be fair, eventually we would run out of space, so if you wanted to take this example even further, you don’t have to store all of the Fibonacci values in memory - you really only need a pair of values in order to calculate the third. Once the third has been calculated, you can discard the first value, and you have a new pair.</p> <p>Algorithm design is all about being aware of the tradeoffs you’re making, and striking the right balance. Be aware of this for both computational and storage complexity.</p> <h1 id="algorithmic-approaches">Algorithmic Approaches</h1> <p><a href="">There are a number of approaches</a> that you can take when trying to solve a problem with an algorithm. I won’t explore all of them, but will enumerate a few here.</p> <p>One popular choice is “divide and conquer”, which divides a problem into a subset of smaller problems that are easier to deal with. This approach often results in some kind of recursive solution, since you likely want to perform the same logic on the sub-problem as you did on the larger problem.</p> <p>A good example of this is the “<a href="">merge sort</a>”. This is a sorting technique which cuts the input set in half, and then merges the two halves together in the right order. 
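</p> <p>As a rough sketch of how that might look in Python (this snippet is my own illustration of the divide-and-conquer idea, not a canonical implementation):</p>

```python
def merge_sort(items):
    """Sort a list using the divide-and-conquer merge sort approach."""
    # Base case: a list of zero or one items is already sorted
    if len(items) <= 1:
        return items

    # Divide: recursively sort each half of the input
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Conquer: merge the two sorted halves back together in order
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # One of the halves may still have leftover (already sorted) items
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

<p>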
Despite the fact that this solution usually leverages recursion, each level reduces the input by cutting it in half. As a result, a merge sort generally runs in O(n log n) time, which is a lot better than O(2^n).</p> <p>There are many other approaches, each of which is not necessarily “better” than the others - rather, each is more suitable for a certain type of problem. It’s important to be aware of all of these, and to consider them when designing your own algorithms.</p> <h1 id="testing-techniques">Testing Techniques</h1> <p>When writing an algorithm, you should definitely test it. Take a sort of “hacker” approach to the problem - really put your algorithm to the test, and try to see where it breaks.</p> <p>I tend to write a function that randomly generates values of a size that I determine using arguments. For instance, you may have an idea of a reasonable input that your algorithm may be subjected to. I wrote a “gen_testdata” function that takes such parameters, and generates a random list of integers that I can then run an algorithm on - perhaps a sorting algorithm:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">random</span> <span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span> <span class="k">def</span> <span class="nf">gen_testdata</span><span class="p">(</span><span class="n">lower</span><span class="p">,</span> <span class="n">upper</span><span class="p">,</span> <span class="n">upper_len</span><span class="p">,</span> <span class="n">lower_len</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span> <span class="s">"""Produces a randomized list of integers This is useful for testing algorithms - feed it some of this data and watch it melt """</span> <span class="n">list_len</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="n">lower_len</span><span class="p">,</span> <span 
class="n">upper_len</span><span class="p">)</span> <span class="n">ret_list</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">list_len</span><span class="p">):</span> <span class="n">ret_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="n">lower</span><span class="p">,</span> <span class="n">upper</span><span class="p">))</span> <span class="k">return</span> <span class="n">ret_list</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="c"># We can feed in the boundaries for our test data as parameters</span> <span class="c"># to the function that generates test data, so we know</span> <span class="c"># within what limits our algorithm performs</span> <span class="n">sample_data</span> <span class="o">=</span> <span class="n">gen_testdata</span><span class="p">(</span><span class="o">-</span><span class="mi">39487</span><span class="p">,</span> <span class="mi">45984</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span> <span class="n">start</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="n">thissolution</span> <span class="o">=</span> <span class="n">awesome_algorithm</span><span class="p">(</span><span class="n">sample_data</span><span class="p">)</span> <span class="n">done</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="k">print</span><span class="p">(</span><span class="s">"Solution is: "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">thissolution</span><span class="p">))</span> <span class="n">elapsed</span> <span class="o">=</span> <span class="n">done</span> <span class="o">-</span> <span class="n">start</span> <span class="k">print</span><span class="p">(</span><span class="s">"Computation time: </span><span class="si">%</span><span class="s">s seconds"</span> <span class="o">%</span> <span class="p">(</span> <span class="n">elapsed</span><span class="o">.</span><span class="n">total_seconds</span><span class="p">()</span> <span class="p">))</span> <span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">"__main__"</span><span class="p">:</span> <span class="n">main</span><span class="p">()</span></code></pre></figure> <p>You’ll also notice that I am taking a timestamp immediately before and after I run the “awesome_algorithm” function, so I know how long it took to run.</p> <p>In addition, if you’re concerned about memory footprint, the third-party Python package “memory_profiler” is quite popular. 
The usage is quite easy - once we import the package, we can add a simple decorator to the top of our algorithm:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">memory_profiler</span> <span class="kn">import</span> <span class="n">profile</span> <span class="nd">@profile</span> <span class="k">def</span> <span class="nf">awesome_algorithm</span><span class="p">():</span> <span class="o">...</span></code></pre></figure> <p>When we run this code, we get a nice report printed to the shell:</p> <div class="highlighter-rouge"><pre class="highlight"><code>~$ python
Filename:

Line #    Mem usage    Increment   Line Contents
================================================
    66      8.9 MiB      0.0 MiB   @profile
    67                             def awesome_algorithm():
    68      8.9 MiB      0.0 MiB       test_list = []
    69
    70      8.9 MiB      0.0 MiB       for i in range(100):
    71      8.9 MiB      0.0 MiB           test_list.append(i * 2)
    72
    73      8.9 MiB      0.0 MiB       return test_list
</code></pre> </div> <p>Once you have a handle on how exactly you want to test your particular algorithm, it’s super common to run those tests within some kind of unit test, which can be run automatically by your Continuous Integration service (which you’re totally using, right?). This way you know that your algorithm is holding up to your tests when you change it.</p> <h1 id="conclusion">Conclusion</h1> <p>Here are some great resources that you should definitely check out to take your understanding of algorithms further:</p> <ul> <li><a href="">Algorithmic Toolbox on Coursera</a> - I took this, and it is a great introduction to algorithms.</li> <li><a href="">Big O Cheat Sheet</a> - amazing collection of algorithms and their runtimes for quick reference</li> <li><a href="">Algorithm Visualizer</a> - There is also a <a href="">hosted version</a> of this, but it’s not always online. Hopefully it is for you. 
VERY handy and FUN tool!</li> </ul> <p>There are many more details involved with the study of algorithms, including additional asymptotic notations beyond “Big O”, but this is a good starting point.</p> <p>You may be asking yourself - why does any of this matter? You may not be interested in software development, or perhaps you’re just getting started, and with today’s open source ecosystem, it’s likely that these kinds of low-level concepts are already implemented in some kind of library that you can simply re-use, right?</p> <p>The study of algorithms is still useful. It matters because it’s helpful to think about things in terms of algorithms, and to ensure that you are making the right tradeoffs. Sometimes, optimizing a piece of code is worth it - sometimes it is not. Educating yourself on the tradeoffs (and communicating them well in documentation) is a worthwhile exercise.</p> <p>Even if you’re not a software developer - algorithms are all around you. They are at the core of every piece of technology you interact with daily.</p> Tue, 09 Aug 2016 00:05:00 +0000 New Series: CS 101 <p>Historically, my background is far closer to the systems side of things, but as I’ve picked up software development experience over the past few years, I’ve come to appreciate the fundamentals of computer science that others in my shoes may not have been exposed to. To that end, I have been working on a pseudo-formal blog series on computer science fundamentals.</p> <p>These fundamentals have a wide variety of applications. Those with more of an IT-focused background will learn that even if you don’t use graph theory, or optimize algorithms in your day job, many of these concepts are at the crux of the technologies that we use every day. 
If, like me, you’ve become bored with the endless cycle of IT certifications, learning these concepts could be a great addition to your skill set, as you can leverage them to extrapolate details from some of the “closed” products we use from IT vendors.</p> <p>Finally, it’s worth remembering that the most important part of any of this is how the knowledge is applied. As you read the posts that I’ll release in the next few weeks, remember that understanding how to optimize a piece of code is useful, but even more useful is the wisdom to know when to apply that knowledge. Have the wisdom to know when it’s okay to write slightly less-performant code to improve readability. Sometimes such tradeoffs are worth it - but that analysis and decision is on you. Don’t lose sight of the big picture.</p> <p>This series will not be exhaustive, or as deep as a real CS course or degree program. My aim is to present some of the most fundamental, useful topics (in my opinion) to the kind of audience that reads my blog posts. They’ve helped me tremendously, and my goal is to share some of that positive influence they’ve had on my career path.</p> <p>I’m starting this series because I’ve been doing this software development thing long enough to realize the benefits of returning to these fundamentals, and I’d like to share some of the most important concepts with you in the hope that the additional perspective is useful. Take what I share and run with it on your own. The intention is to whet your appetite so that you go hunting for more.</p> <p>Enjoy, and check out the <a href="">first post in our series, which focuses on Algorithms</a>.</p> Tue, 09 Aug 2016 00:00:00 +0000