ChatOps – Contender or Pretender?

TL;DR: FogOps is built around ChatOps, and I thought I’d share why we chose it.

Every time a new word is born, I can’t wait to explain why we didn’t need that word. Especially words that aren’t really words; they’re just two words, or parts of words, glommed together and deemed the ‘next big thing’. My first reaction to the term “ChatOps” was “You’ve got to be kidding me”. Chat has been around since the IRC days, and now someone puts a pretty face on it, and suddenly it’s going to change the way we do things? I’m not buying it. But as a leader in our organization, I obviously had to do my due diligence. After seeing how ChatOps was implemented at GitHub, it started to sink in.

We all need some place to live. For years we used email as our digital home. Newsgroups were available, but it was so much easier to just subscribe to an email list. These were our ‘social networks’ before Facebook. All of our forums were email enabled, to make sure that we were notified when something cool happened. In business, Microsoft made Outlook our home. We ‘live’ in Outlook so much that some people send urgent, time-sensitive requests via email and assume they will be read in a timely manner. We integrated our ticketing systems with email to ensure everyone was notified of the important items.

The problems with email as a place to live are many, the largest being organization and control. Many tools exist to try to control spam and clutter, but we still cannot control which initiatives and subjects grab our attention at any given time. The publish/subscribe model, made popular by the next generation of social networks, made a ton of sense in business as a fix for the problems of using email as our main communication and collaboration tool at the office.

Enter Chat

Operations teams need somewhere to live as well. They are extremely reliant on collaboration, communication, and documentation. They also have a penchant for doing things, not writing about them. Who moved my cheese? It’s vital in an operations team to ensure that everyone can see change history. Chat rooms are a great way to solve this. Got a crazy ticket assigned to you when you get in to work? Just check the ops channel, and I bet reading the history of the last shift will give you context.

But we need to do more than just chat. We need to do things. And if we do things, and we don’t record them by telling people we did them in the chat channel, we are back to where we started. The only way to ensure that people record these events is to have them do the events in the channel itself!

ChatOps is Born

If I truly want to live in chat, and want to record and collaborate in one place, then I need to use the chat tool both to talk about things and as my operations interface. Since our chat tools are made to write text, they can be used pretty easily as a command line interface. If we turn all of the basic operations tasks into scripts that can be run from within the chat window, we can all see things happening in real time.
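To make the "chat as a command line" idea concrete, here is a minimal sketch in Python. It is not how any particular chat tool or bot works: the `ChatChannel` and `CommandBot` classes, the `!` prefix, and the `uptime` command are all assumptions made up for illustration. The point is only that every command and every reply lands in the same shared history.

```python
import shlex
from datetime import datetime, timezone


class ChatChannel:
    """In-memory stand-in for a chat channel; every message is kept as history."""

    def __init__(self, name):
        self.name = name
        self.history = []  # list of (timestamp, author, text)

    def post(self, author, text):
        self.history.append((datetime.now(timezone.utc), author, text))


class CommandBot:
    """Hypothetical bot that treats messages starting with a prefix as commands."""

    def __init__(self, name="bot", prefix="!"):
        self.name = name
        self.prefix = prefix
        self.commands = {}  # verb -> handler function

    def command(self, verb):
        """Decorator that registers a handler under a command verb."""
        def register(func):
            self.commands[verb] = func
            return func
        return register

    def handle(self, channel, author, text):
        # Record the message first, so plain talk and commands share one history.
        channel.post(author, text)
        if not text.startswith(self.prefix):
            return None  # ordinary conversation, nothing to run
        parts = shlex.split(text[len(self.prefix):])
        if not parts:
            return None
        verb, args = parts[0], parts[1:]
        handler = self.commands.get(verb)
        reply = handler(*args) if handler else f"unknown command: {verb}"
        # The result is posted back, so everyone sees what happened.
        channel.post(self.name, reply)
        return reply


bot = CommandBot()


@bot.command("uptime")
def uptime(host):
    # Placeholder: a real script would query the host here.
    return f"{host}: up (placeholder result)"
```

Typing `!uptime web01` in the channel would run the registered script and post its reply next to the conversation, which is exactly the shared record the team is after.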

Meet the Chat Bots

ChatOps is not ChatOps without bots. Bots enable us to receive information, like alerts, as push notifications. They let us query for current status. And they let us run common tasks without becoming experts in every tool. This is what enables us to live in our chat. And since chat can be organized by function or initiative, and membership can be controlled, it allows us to focus on the task at hand.
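As a rough illustration of the push and query roles described above, plus the point about channels and membership, here is a small Python sketch. The `Bot` class, its method names, and the routing table are hypothetical; a real bot would sit on top of your chat tool's own API rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    members: set = field(default_factory=set)
    messages: list = field(default_factory=list)


class Bot:
    """Hypothetical bot: receives pushed alerts and answers status queries."""

    def __init__(self):
        self.channels = {}  # channel name -> Channel
        self.routes = {}    # alert source -> channel name
        self.status = {}    # service -> last reported state

    def add_channel(self, name, members):
        self.channels[name] = Channel(name, set(members))

    def route(self, source, channel_name):
        """Send every alert from this source to one channel, keeping it in context."""
        self.routes[source] = channel_name

    def push_alert(self, source, service, state):
        """Push: remember the state and post the alert in the routed channel."""
        self.status[service] = state
        channel = self.channels[self.routes[source]]
        channel.messages.append(f"[ALERT] {service} is {state}")
        return channel.name

    def query(self, user, channel_name, service):
        """Query: only channel members get an answer."""
        channel = self.channels[channel_name]
        if user not in channel.members:
            return "not a member of this channel"
        answer = f"{service}: {self.status.get(service, 'unknown')}"
        channel.messages.append(answer)
        return answer
```

Because each alert source is routed to exactly one channel, the people in that channel only see what is relevant to their initiative.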

DevOps without the DevOps

Everyone’s got their opinion of DevOps, but I’ll share mine. DevOps started as a rudimentary organizational solution to the divide between development and operations: nix operations and make the developers do it. It was simple, transformative, and effective. But it forces great developers to learn the full stack, which is less than optimal and can lead to churn among the most talented developers. With ChatOps, we can enable developers to see status, push code, and do other “ops-y” tasks, without forcing them to learn all the tools and engineer all the systems.

Long Live SRE

By creating a simplified toolset of commands specifically for development teams, our SRE teams can collaborate with developers and enable them, without forcing them to take pager duty and without slowing them down. This delivers the benefits of DevOps while still allowing team members to contribute where they are strongest. It also lets an SRE organization exist without recreating the issues that occur when there is a divide between Development and Operations.
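One way to picture a "simplified toolset" is as a per-role allow-list of chat commands. The sketch below is only that, a picture: the role names, command names, and wording of the replies are invented for illustration, and nothing here performs a real action.

```python
# Hypothetical roles and commands, chosen only for illustration.
ALLOWED = {
    "developer": {"status", "deploy"},
    "sre": {"status", "deploy", "restart", "failover"},
}


def authorize(role, command):
    """Return True if this role may run this command."""
    return command in ALLOWED.get(role, set())


def run(role, command, target):
    """Run a command on behalf of a role, or explain why it was refused."""
    if not authorize(role, command):
        return f"denied: {role} may not run '{command}'; ask the SRE team"
    return f"ok: {command} {target} (placeholder, no real action taken)"
```

Developers get the handful of commands they need every day, and anything riskier stays with the SRE team without anyone having to learn the underlying tools.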

ChatOps Example – Hello World

This is easier to understand by example, so I’ll mock a real world situation with the world famous Hello World example. So, imagine you are running an SRE team responsible for keeping your company’s Hello World web site up and running, and working with the developers who need to push changes to font type, size, and color all day long. How could ChatOps help you? For this example I’ll use Foggy, our chat bot.

Well, we first want to make sure we know that our site is available to the world. So we set up an external DNS-based health check against helloworld.fogops.io. If the check fails, we are in urgent crisis, so we make sure to alert on it. We page the on-call SRE, but we also notify our chat bot, and our chat bot posts the alert in our #hello-world-ops Slack channel. The event is effectively logged with a time stamp for the team to use later, without having to dig through log files or log in to the monitoring system. Only #hello-world-ops alerts are posted here, so everything we see is in context to the initiative, which is to maintain the best Hello World site known to man.

The team begins troubleshooting the problem. They check the status of the web server with a quick Slack command and learn that the web server is up and running, but that it can’t establish a connection to the database. The team checks the status of the database server and finds the problem. Looks like our SQL service died. Foggy can handle this one for us…

The team completes the fix, the site comes back up, and the monitoring system posts the green light to confirm. The dev team is happy their site is back up, and they immediately push a new font.

This is a pretty simple example, meant to get your creative juices flowing.
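For readers who prefer code to prose, the incident above can be replayed as a toy simulation in Python. To be clear, this is not Foggy's real implementation or command set: the `status web`, `status db`, and `restart sql` commands, the class names, and the message formats are all assumptions for illustration, and the "stack" is just a boolean. Time stamps are left out to keep the output reproducible; a real chat tool adds them for you.

```python
class HelloWorldStack:
    """Toy model of the site: a web server that depends on a SQL service."""

    def __init__(self):
        self.sql_running = True

    def site_healthy(self):
        return self.sql_running

    def web_status(self):
        if self.sql_running:
            return "web: running"
        return "web: running, but cannot connect to the database"

    def db_status(self):
        if self.sql_running:
            return "db: SQL service running"
        return "db: SQL service stopped"

    def restart_sql(self):
        self.sql_running = True
        return "db: SQL service restarted"


class Foggy:
    """Hypothetical stand-in for the chat bot; it logs everything it sees."""

    def __init__(self, stack, channel="#hello-world-ops"):
        self.stack = stack
        self.channel = channel
        self.log = []  # the channel history

    def post(self, author, text):
        self.log.append(f"{self.channel} <{author}> {text}")

    def health_check(self):
        """Monitoring result pushed into the channel."""
        ok = self.stack.site_healthy()
        if ok:
            self.post("monitor", "helloworld.fogops.io is UP")
        else:
            self.post("monitor", "ALERT: helloworld.fogops.io is DOWN")
        return ok

    def ask(self, user, text):
        """A team member types a command; the bot runs it and replies in-channel."""
        self.post(user, text)
        actions = {
            "status web": self.stack.web_status,
            "status db": self.stack.db_status,
            "restart sql": self.stack.restart_sql,
        }
        action = actions.get(text)
        reply = action() if action else f"unknown command: {text}"
        self.post("foggy", reply)
        return reply


def replay_incident():
    """Walk through the outage in order and return the channel history."""
    stack = HelloWorldStack()
    foggy = Foggy(stack)
    stack.sql_running = False       # the SQL service dies
    foggy.health_check()            # alert lands in the channel
    foggy.ask("sre", "status web")
    foggy.ask("sre", "status db")
    foggy.ask("sre", "restart sql")
    foggy.health_check()            # green light
    return foggy.log


if __name__ == "__main__":
    print("\n".join(replay_incident()))
```

Running it prints the whole incident as one channel transcript: the alert, each question, each answer, the fix, and the all-clear, which is the record the next shift would read.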

FogOps – Powered by ChatOps

Our FogOps operating model is built around the close collaboration between our SRE team and your Development team. I could not find a better engagement methodology than ChatOps. Foghorn has embraced ChatOps. We enable our customers’ development teams to push code, see alerts, and communicate with the SRE team, all while being able to focus on development and get out of the rut of trying to stay current on every cloud, integration, deployment, monitoring, and configuration management technology.

We bought into ChatOps. Will you?