
Once you've used a tool like the Netomata Config Generator (NCG) to generate configs for a bunch of devices on your network, how do you convince yourself that those new configs are complete and correct and ready to deploy? How do you determine that the newly-generated configs differ from the old configs in only the ways that you want, and that you haven't inadvertently introduced unintended changes?
Wouldn't it be great if you could, say, compare the newly-generated configs to the original (hand-created) configs for those devices, or to the previous generated configs? And how cool would it be if there were some sort of "approval" mechanism wrapped around this, so that you could easily identify the files that had been reviewed and approved as good-to-go for installation?
We've got a tool for you!
We've just released the Netomata Config Review Tool, which addresses these issues. It is a simple web-based tool for reviewing NCG-generated config files and approving them for installation on devices. It is written in Ruby as a web CGI program; it should work fine on any web server that supports CGI programs, such as Apache. We're releasing it as open source under a GPLv3 license (the same as NCG).
This tool is an outgrowth of a recent consulting project that we did for Netflix, helping them install NCG and set it up to generate configs for the routers at their dozens of shipping hubs throughout the USA. We'd love to do a project like this for your organization, too!
How it works
For each device, the tool keeps track of 3 config files (if they exist):
For each device, this tool lets you:
The tool does not (yet) install approved configs on devices; the assumption is that you will use a tool such as RANCID to do that, from the files in the "approved" directory.
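To make the review idea concrete, here is a minimal sketch of comparing a previously approved config against a newly generated one. It is not the Config Review Tool's own code (which is a Ruby CGI program); it only illustrates the kind of comparison involved, using Python's standard difflib module. The function name, the "approved" and "generated" labels, and the sample config lines are all made up for illustration.

```python
import difflib


def config_diff(old_text, new_text, old_label="approved", new_label="generated"):
    """Return a unified diff between two config texts ('' if they are identical)."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_label, tofile=new_label)
    )


# Hypothetical config fragments, invented for this example.
old_config = "hostname sw1\ninterface Gi0/1\n description uplink\n"
new_config = "hostname sw1\ninterface Gi0/1\n description uplink to core\n"

diff = config_diff(old_config, new_config)
print(diff if diff else "no changes")
```

An empty diff means the newly generated config matches the one already reviewed; anything else is exactly the set of lines a reviewer needs to look at before approving.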
How to get it
You can read all about it, see screen shots, and download the code at http://www.netomata.com/wiki/config_review_tool
On Wed 23 June 2010, I'll be presenting a 30-minute overview of network automation benefits and tools as part of a free online webinar produced by SearchNetworking, entitled Optimizing and Managing the Dynamic Enterprise Network:
Today, as more applications and IT functions converge on the network infrastructure, user expectations are higher than ever. The advent of cloud computing and virtualization demands a solid yet flexible network that can instantly adjust to changing conditions. Unfortunately, many IT departments today find themselves facing this technology challenge with lean networking teams and low budgets. That makes choosing the right network management and optimization tools critical.
In this free, one-day virtual seminar our experts will cover how to rethink your management strategy and implement techniques that allow networking teams to understand performance, make the most of the infrastructure, and offload low-level tasks so that they can focus on improving performance and making progress.
Attend and gain insight on how to:
- Manage your network in the age of the dynamic network
- Ensure application performance on the WAN
- Use network automation to make your network more cost-effective, reliable, and flexible
And much more!
My part of the webinar is scheduled to start at 10:30am PDT (1:30pm EDT). After the webinar, I'll be online to answer questions from the audience.
A group of folks who are active in the emerging "devops" field are putting together DevOps Day, a free one-day conference on Friday 25 June 2010, in Mountain View, CA, hosted by LinkedIn:
DevOps Day is an open event for discussing all topics around improving the interaction between what is traditionally considered development activity and that which is traditionally considered operations activity.
...
DevOps Day US is a single-track conference organized around a series of panels where open discussion amongst all conference participants is encouraged.
This is a one-day "hmm, we're all facing similar issues; let's get together and talk about this" event being put together by practitioners, not a "conference" being sponsored by folks who are trying to sell you something. I expect it to be more like an extended user group meeting than anything else, and I'm looking forward to some very interesting discussions.
Planned discussions include:
Expected participants include Luke Kanies (creator of Puppet) and Adam Jacob (creator of Chef), as well as practitioners from organizations such as LinkedIn, Shopzilla, Etsy, Cisco, ITA Software, and Tripwire.
All in all, it's a very interesting topic, and this looks like it will be a fascinating event. I'll be there, and I hope to see you there too!
This quarter's NANOG meeting is in San Francisco, and I'll be presenting a 90-minute tutorial on Automating Network Configuration:
You've been using tools like Puppet and cfengine to corral the complexity on your servers. You revel in the scalability, reliability, and ease of maintenance of doing it The Right Way. You don't fear the next change because you know the tools will just get it Right. But you still tremble at an 'enable' prompt, hoping you remembered all the bits that need to be twiddled, on all the networking devices everywhere. Is your DNS tied on straight - both ways? Is it all *really* being monitored by Nagios? As your network's complexity increases, so do the errors, inconsistencies, and omissions caused by manual configuration, and brokenness abounds. But wait - there's a way out of the swamp! Come hear Brent Chapman as he reveals methods and tools for automating the mind-numbing task of configuring network devices and services. Among other things, he'll talk about his cool new open source Netomata Config Generator, which addresses some of these problems.
Brent Chapman is the founder, CEO, and technical lead of Netomata, Inc. He is the coauthor of the highly regarded O'Reilly & Associates book Building Internet Firewalls. He is also the founder of the Firewalls, List-Managers, and Network-Automation Internet mailing lists, and the creator of the Majordomo mailing list management package. In 2004, Brent was honored with the annual SAGE Outstanding Achievement Award 'for outstanding sustained contributions to the community of system administrators'. He has been a frequent and popular speaker at USENIX, LISA, BayLISA, and many other events over the past 15 years.
I expect to be there for the full NANOG meeting, from Sun 13 Jun 2010 through Wed 16 Jun 2010; if you're there, too, I hope you'll come to my talk, or at least catch me and say hello.
And if you haven't registered for NANOG yet, it's not too late... As the NANOG web site says:
NANOG49 will feature presentations on networking advancements and techniques, educational tutorials, interesting tracks, and more. Whether you are new to the networking profession or a seasoned veteran, NANOG49 will educate and inform with a full agenda of interesting topics.
I highly recommend it, and I hope to see you there!
The O'Reilly Velocity conference is only in its third year, but it has rapidly become one of my favorite events. If you do web operations or architecture, I'd say it's a "must do" conference; the amount of info you'll pick up in 2 short days (3 if you attend the workshops) is amazing.
Even better, O'Reilly has just announced a special 25% discount on registration, good from now through Memorial Day weekend (until Tue 1 Jun 2010); just use the discount code "MEMORIALDAY" when you register.
I hope to see you there!
Someone recently asked me to share my thoughts on ZipTie (now officially known as "AlterPoint NetworkAuthority Inventory" or "AlterPoint NAI") versus RANCID as network configuration management tools.
To begin with, what are these tools?
RANCID is a command line tool that handles configuration communications with various types of networking devices (most major brands of routers, switches, load balancers, firewalls, etc.). You can use it to copy config files to and from devices, or to execute a series of commands on the device. Essentially, RANCID pretends to be a human user of the device's command line interface, and you give RANCID a simple "script" to follow in dealing with the device (e.g., "when you see the 'login:' prompt, send 'admin'; then, when you see the 'password:' prompt, send 'opensesame'; then, when you see the 'alibabascave>' prompt, send 'enable'; then ..."). RANCID is sometimes used by itself, but more often it serves as a building block in larger, custom-built automated network management systems; people use it in conjunction with tools to manage an archive of config files (such as CVSweb), with tools to programmatically generate config files (such as our own Netomata Config Generator (NCG) tool), or in a wide variety of other ways.
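The prompt-and-response "script" idea can be sketched in a few lines. This is not RANCID's code or its script format; it is a toy illustration, with a fake device standing in for a real login session, and every name in it (FakeDevice, run_script, LOGIN_SCRIPT) is hypothetical.

```python
class FakeDevice:
    """Stand-in for a real device session; it just replays canned prompts."""

    def __init__(self, prompts):
        self._prompts = list(prompts)
        self.received = []  # everything the "user" typed, in order

    def read_prompt(self):
        return self._prompts.pop(0)

    def send(self, text):
        self.received.append(text)


def run_script(device, script):
    """Walk through (expected_prompt, reply) pairs, sending each reply
    only after the expected prompt has been seen."""
    for expected, reply in script:
        prompt = device.read_prompt()
        if expected not in prompt:
            raise RuntimeError(f"expected {expected!r}, got {prompt!r}")
        device.send(reply)


# The dialogue described above, written as data.
LOGIN_SCRIPT = [
    ("login:", "admin"),
    ("password:", "opensesame"),
    ("alibabascave>", "enable"),
]

device = FakeDevice(["login: ", "password: ", "alibabascave> "])
run_script(device, LOGIN_SCRIPT)
print(device.received)  # ['admin', 'opensesame', 'enable']
```

Keeping the dialogue as data rather than code is what makes this style of tool easy to reuse as a building block: the same loop can drive a different device type simply by swapping in a different list of prompts and replies.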
ZipTie, on the other hand, has a slick web-based user interface, and is designed to be a complete "environment" for managing the devices on your network. According to its web page:
NetworkAuthority Inventory provides continuous discovery and tracking of your network devices. Using a simple, web-based interface you can backup and restore device configurations, detect configuration changes and compare configurations between devices. NetworkAuthority Inventory generates an accurate, real-time, detailed view of every device in your network and keeps it up to date.
So, what are the key differences between RANCID and ZipTie?
So, essentially, I suggest the following approach to comparing these two tools for your situation:
Most network engineers and sysadmins would probably say that they're intimately familiar with 'traceroute', and consider it one of their fundamental network troubleshooting tools... I certainly do. But you might be amazed to learn, as I did, how much you don't know about traceroute.
Richard Steenbergen of nLayer Communications, Inc., did an excellent presentation on traceroute at this month's NANOG (North American Network Operators Group) meeting:
Among other things, this presentation shows you:
One of the coolest tricks I learned from this presentation is this: to find out more about what's at the other end of a hop that appears to be a point-to-point link, assume that the IP address you see is one of the two usable addresses in a /30 subnet (as is commonly assigned to point-to-point links), and do a DNS reverse lookup of the other address in the /30.
This is useful, for example, in figuring out which egress port a packet went out on, since traceroute normally only shows you the ingress ports for each device along the way. For example, let's say I was looking at the following traceroute output, and wanted to know the egress port on router #3, as the packet moved to router #4:
    brent% traceroute www.google.com
    traceroute: Warning: www.google.com has multiple addresses; using 208.67.219.230
    traceroute to google.navigation.opendns.com (208.67.219.230), 64 hops max, 40 byte packets
     1  192.168.0.1 (192.168.0.1)  3.145 ms  2.573 ms  2.382 ms
     2  75-101-29-1.dsl.static.sonic.net (75.101.29.1)  9.555 ms  9.054 ms  9.089 ms
     3  127.at-X-X-X.gw3.200p-sf.sonic.net (208.106.96.193)  9.510 ms  9.871 ms  9.194 ms
     4  200.ge-0-1-0.gw.equinix-sj.sonic.net (64.142.0.210)  11.965 ms  11.870 ms  11.839 ms
     5  0.as0.gw2.equinix-sj.sonic.net (64.142.0.150)  11.928 ms  12.519 ms  12.394 ms
     6  GigabitEthernet3-1.GW2.SJC7.ALTER.NET (157.130.194.17)  11.360 ms  16.257 ms  11.268 ms
     7  0.so-0-0-1.XL4.SJC7.ALTER.NET (152.63.51.50)  11.729 ms  11.679 ms  11.403 ms
     8  0.so-7-0-0.XL2.PAO1.ALTER.NET (152.63.113.21)  14.775 ms  17.455 ms
        0.so-5-0-0.XL2.PAO1.ALTER.NET (152.63.48.9)  15.548 ms
     9  POS7-0.GW6.PAO1.ALTER.NET (152.63.55.14)  12.886 ms  13.143 ms  13.029 ms
    10  65.203.37.46 (65.203.37.46)  13.517 ms  14.708 ms  16.566 ms
    11  * * *
    12  * * *
    ^C
To find out more about router #3's egress port, I take the IP address for router #4 (64.142.0.210) and figure out the other usable address in the same /30 (64.142.0.209). Hint: the lower usable address in a /30 always ends in an odd number and the higher one in an even number, so if the address you know is odd, the other is the next higher number, and if it's even, the other is the next lower number. Then I do a DNS reverse lookup of that address:
    brent% dig -x 64.142.0.209
    ; <<>> DiG 9.4.3-P3 <<>> -x 64.142.0.209
    ;; global options: printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49382
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
    ;; QUESTION SECTION:
    ;209.0.142.64.in-addr.arpa. IN PTR
    ;; ANSWER SECTION:
    209.0.142.64.in-addr.arpa. 259200 IN PTR 200.ge-6-3-0.gw3.200p-sf.sonic.net.
    ;; Query time: 31 msec
    ;; SERVER: 208.67.222.222#53(208.67.222.222)
    ;; WHEN: Fri Nov 13 09:42:05 2009
    ;; MSG SIZE rcvd: 91
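If you'd rather not do the /30 arithmetic in your head, the same trick can be sketched with Python's standard ipaddress module. This is only an illustration: the function names are my own, and it assumes (as above) that the link really is numbered out of an IPv4 /30, which isn't guaranteed for any given hop.

```python
import ipaddress
import socket


def p2p_peer(address: str) -> ipaddress.IPv4Address:
    """Return the other usable address in the /30 that contains `address`."""
    iface = ipaddress.ip_interface(f"{address}/30")
    low, high = iface.network.hosts()  # a /30 has exactly two usable hosts
    if iface.ip == low:
        return high
    if iface.ip == high:
        return low
    raise ValueError(f"{address} is the network or broadcast address of its /30")


def reverse_name(ip):
    """Reverse-resolve `ip`; returns None if the lookup fails.
    Needs working DNS, so it is not called in the demo below."""
    try:
        return socket.gethostbyaddr(str(ip))[0]
    except OSError:
        return None


peer = p2p_peer("64.142.0.210")
print(peer)                  # 64.142.0.209
print(peer.reverse_pointer)  # 209.0.142.64.in-addr.arpa
```

The reverse_pointer value is the same name that appears in the QUESTION SECTION of the dig output; calling reverse_name(peer) performs the actual lookup.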
Another handy tip from the presentation is that, since light travels through fiber optic cable at about 200 km (or 125 miles, if you prefer) per millisecond, each 1 ms of delay shown by traceroute (which, remember, is round trip delay) should represent about 100 km (62.5 mi) of distance if the delay were due entirely to the distance travelled (i.e., no queuing or processing delays). Using that fact, you can see that 40ms for a packet to go from San Francisco to New York (about 2500 miles, or 4000km) would be "normal", but 40ms for a packet to go from San Francisco to San Jose (about 50 miles, or 80km) would indicate a problem; it should take the packet less than 1ms to cover that distance and back, so something else (congestion or processing delays, for example) must account for the other 39ms.
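That rule of thumb is easy to turn into a quick sanity check. The sketch below uses the 200 km per millisecond figure from above; the function names are my own, and the distances are the same rough figures used in the text.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber, per the rule of thumb


def min_rtt_ms(distance_km):
    """Best-case round-trip time if distance were the only source of delay."""
    return 2 * distance_km / FIBER_KM_PER_MS


def unexplained_ms(observed_rtt_ms, distance_km):
    """Delay that distance alone cannot account for (queuing, processing, detours)."""
    return observed_rtt_ms - min_rtt_ms(distance_km)


print(min_rtt_ms(4000))  # San Francisco to New York: 40.0
print(min_rtt_ms(80))    # San Francisco to San Jose: 0.8
print(round(unexplained_ms(40, 80), 1))  # 39.2 ms left to explain
```

Keep in mind that fiber rarely follows a straight line between two cities, so the real best case is somewhat higher than these numbers; the check is for spotting delays that are wildly out of proportion to the distance.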
There's a lot more in this presentation, about more complex issues such as
Anyway, if you ever use traceroute, I highly recommend that you review this excellent presentation. I think you'll be pleasantly surprised at how much you learn.
Thanks to Strata Chalup of Virtual.net for bringing this very informative presentation to my attention.
At some point during their growth, usually around the 50-100 employee stage, most startups face a "quadruple whammy" of IT infrastructure challenges. If the startup doesn't recognize that this is happening (or, better yet, anticipate and prevent it from happening), IT can quickly become a major drag on the startup's continued growth.
Early on, a startup's IT needs are generally handled internally on an ad hoc basis by a de facto IT team of various personnel, acting in addition to their primary responsibilities as engineers, managers, and so forth. This works fine for a while, often for several years. At some point, though, as the startup continues to grow, several factors all come together:
As a result of these factors, bad things start happening:
Essentially, at this point, the startup needs to put in place the framework of IT architectures, systems, processes, and people that will enable its IT infrastructure to facilitate the company's growth, rather than impede that growth.
Netomata's staff have helped many startups through this transition; if this situation sounds all too familiar to you, contact us, and we can help you too!
Too many organizations treat network management as a "nice to have" part of their operational toolkit, rather than a "must-have" capability. You can usually get away with this for a while, but eventually your luck runs out...
Last week, I related an all-too-typical tale of woe about how a startup suffered an all-day customer-visible outage because of a network problem, explaining how network automation could have shortened the outage from hours to minutes. Well, it turns out that lack of network automation wasn't their only problem...
As it happened, at the time of the outage, they didn't have any network management capability, because their sole network management host had suffered a disk failure several days before and they hadn't gotten around to restoring the host yet because it was "just the network management system".
Unfortunately for them and their customers, the failed system that was "just the network management system" would have:
In retrospect, I'm sure they wish that they had engineered "just the network management system" with the same level of service reliability as their customer-visible "production" systems. I'm sure they wish that they had treated the failure of "just the network management system" with the same sort of urgency as they would a failure of one of their customer-visible "production" systems.
Once the network management system failed, they were living on borrowed time. When something else failed (i.e., the Ethernet switch), they were severely hampered in their ability to detect and deal with that failure, which resulted in an extended customer-visible outage. Even though the network management system isn't itself customer-visible, it is an essential part of providing a reliable service, and needs to be treated as such.
Netomata can help you avoid problems like this with your network, while making your network more cost-effective, reliable, and flexible; please contact us to discuss how.
A friend of mine recently related a tale of woe about network problems at his startup, a cloud service provider. Unfortunately, because they lacked a network automation system, they suffered a day-long customer-visible service outage; if they'd had an appropriate network automation system, they could have dealt with the problem in less than an hour.
It all started with a failing Ethernet switch, one of the pair of core switches in their data center installation. The failing switch would simply drop its 10Gb Ethernet connection to the other core switch, with no warning and no explanation. They tried the obvious quick fixes (try a different port on the failing switch, try a different cable between the switches, etc.), with no success; no matter what they tried, they couldn't resurrect the connection to the other core switch.
For various reasons, a drop-in replacement switch wasn't immediately available. After a physical inspection, counting open and used ports on both switches, they determined that they had just enough open ports on the working switch to allow them to re-home all the connections from the failing switch. "All" they needed to do was configure those ports on the working switch, along with associated VLAN definitions, access control lists, and so forth. Essentially, they needed to merge the functionality from the two switch configs (failing and working) into a single switch config.
Unfortunately, they had to do this configuration work by hand, because they don't use an automated configuration management tool such as NCG. Moving two dozen port configurations (plus associated VLAN definitions, access control lists, and so forth) from one switch to another by hand poses a number of problems:
If they had been using an automated configuration management tool such as NCG, they could have been back in service much sooner (probably in less than an hour), with a much higher degree of confidence in the new config for the remaining switch.
A hypothetical automated configuration management system for their network would probably have the following characteristics:
Here are the steps they could have followed instead of doing everything by hand, had they been using such an automated system:
Using network automation tools such as NCG, RANCID, and ZipTie:
In my experience, it only takes a week or two of work to use open source tools to assemble a network automation system for an existing network such as this (i.e., a handful of related switches and associated monitoring systems, all of which you already have working manually-created configs for).
Hopefully, my friend's company will see the light, and automate their network management so that they're better prepared for next time; maybe they'll even offer me a consulting contract to help them get there... ;-)
Please contact us to discuss how Netomata can help you avoid problems like this with your network, while making your network more cost-effective, reliable, and flexible.