Announcing the Remote execution engine in Amon 3.7

By Martin Rusev

Welcome to Amon 3.7. This is a very important release for me personally, because it finally evolves Amon in the direction I originally intended back in 2011-2012. In 2013-2014 I got scared and decided to "follow the market" instead, which, as you might have guessed, ended badly.

This is quite a long, personal story about how I got to this release. You can skip all of that and go straight to the announcement at the end.

Automation and Devops

One of the main reasons I started working on Amon and got so deeply involved in DevOps is that I am obsessed with infrastructure automation.

It all started back in 2008, when I had to deploy a Django application I had been working on in the "cloud". This was my first project outside of the boring PHP world, where FTPs and cPanels reign supreme. At the time Django was leaps and bounds ahead of anything I had seen, and I expected that the same would apply to deployment. After all, "The Cloud" sounds way cooler than FTPs.

It took me 2 days of immense frustration, and I promised myself I would never touch a server again. (Boy, was I wrong.) I invested a lot of time in automation tools back then, from Capistrano, Chef and Puppet to Fabric, and all was well until late 2010.

In 2010 I was working on a cool project called Creativespace, which was version control for web designers. The project was very exciting technically: it converted PSD/AI/EPS files to PNGs in real time, something like Box for web designers. On the back-end I had a huge mix of technologies: Python, C++, databases, cache layers, ActiveMQ, etc. With all the devops knowledge I had, deployment was not hard this time. The hard part came afterwards, when Creativespace started crumbling under load.

Nagios, Cacti, Munin and all the others

This marks the second chapter of my journey in sysadmin land, which, as a developer, I thought I would never get to: the land of open source/self-hosted monitoring tools in 2010. As a developer, all I wanted was to quickly install a tool and see what was going on with my servers, so I could go back to the real work - developing apps.

What really happened is that I spent a week installing and testing every tool I could get my hands on, and at the end I could not believe that actual people had been using these for close to 10 years and that there was no innovation whatsoever, especially when it comes to simplicity and design.

This led me to the idea of creating a small, minimalistic tool I could build in a weekend and install with 1 line of code. It took me 4 years and it is not even close to being done. The one line install rule still stands though.

Monitoring and auto-healing

Coming back to my obsession with automation, I believe a monitoring system should not be just a passive observer. After all, it already has all the data, and it should be able to do something about it, not just alert a human and waste their time.

Some issues go through multiple layers and have to be analyzed by a human being, but if you take the time to write down the reasons you end up doing "ssh ...." in the middle of the night, you will see that in most cases it is something trivial like "restart a server and check the error.log file". Something that could easily be automated.
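To make the idea concrete, here is a minimal sketch of what automating those trivial fixes could look like. This is not how Amon does it; the RUNBOOK table, the alert names and the commands in it are all hypothetical, picked only for illustration.

```python
import shlex
import subprocess

# Hypothetical runbook: alert name -> the command a human would
# otherwise type over SSH in the middle of the night.
RUNBOOK = {
    "nginx_down": "systemctl restart nginx",
    "disk_full": "journalctl --vacuum-size=200M",
}


def remediate(alert_name, runbook=RUNBOOK, runner=subprocess.run):
    """Run the command mapped to an alert; return None when a human is needed."""
    command = runbook.get(alert_name)
    if command is None:
        return None  # unknown problem: escalate instead of guessing
    result = runner(
        shlex.split(command), capture_output=True, text=True, timeout=60
    )
    return {
        "command": command,
        "exit_code": result.returncode,
        "output": result.stdout + result.stderr,
    }
```

Anything that is not in the runbook returns None, so the multi-layer problems still end up in front of a human.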

Devops and Real-time

The devops movement has exploded in recent years, and we now have a plethora of really well made, modern tools. For the most part, touching a server by hand is almost a thing of the past.

One issue I personally have is the static nature of most of the tools. You write a recipe, a playbook, a Dockerfile, a manifest in a text editor and then execute. They still communicate with all your machines, but if you want to debug a machine in real time, it is the old and annoying "ssh ....". SSH is really good at what it does - connecting securely to 1 machine - but it gets infuriatingly complex when you have to move files between servers, establish inter-server communications, do things on multiple machines at the same time, etc. After so many years of using SSH, I know that any interaction with ~/.ssh, authorized_keys and chmod 400 of files will end up with me feeling helpless and angry.

Some time ago I discovered Salt, a tool that is really good at communicating with your servers in real time. Behind the scenes it uses ZeroMQ, which is insanely fast for network operations.

In the last month or so I've been using Salt as an SSH replacement, and so far it has been great. I had to fall back to SSH only once or twice in that period. Salt is really good for real-time debugging, but this information is hidden in the docs and it takes a while to get to. On the surface, just like everyone else, Salt promotes writing static states in a text editor and then applying them.

One problem with Salt is that it still works from the terminal. Which brings us to Amon 3.7.

Amon 3.7

The remote execution module is available only in Amon Company Edition.

Amon 3.7 comes with two major additions:

The Real-time remote execution app

The first one is the remote execution app, which gives you the option to see a problem in Amon and debug/fix it right there from the browser. It is really fast and secure, and there is no need to open a terminal and an SSH session anymore. You can install apps, check log files, restart services or do just about anything you could do in an SSH session.

This is something I personally wanted, because SSH is centralized by nature. We put our SSH keys on a designated work machine. Connecting to a server from any other place is cumbersome and, as the world works, most of the problems happen when we are not at work.

Executing commands on alert

The second addition is the option to automatically execute commands on alerts. When an alert is triggered, Amon will send the command to your server, execute it and collect the results, which you can see later in the alert history screen. You can do all kinds of cool things with this feature, like keeping an eye on the size of a specific directory, restarting services, executing scripts, etc.
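The directory size example can be sketched in plain Python to show the flow of check, execute and collect. This is only an illustration under my own assumptions, not Amon's actual code: the function names, the history list and its fields are hypothetical stand-ins for the alert rule and the alert history screen.

```python
import os
import subprocess
import time


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def check_and_execute(path, limit_bytes, command, history):
    """If path is larger than limit_bytes, run command and record the result.

    command is an argv list; history is a plain list standing in for
    the alert history. Returns True when the alert fired.
    """
    size = directory_size(path)
    if size <= limit_bytes:
        return False  # below the threshold: nothing to do
    result = subprocess.run(command, capture_output=True, text=True, timeout=60)
    history.append({
        "time": time.time(),
        "size": size,
        "command": command,
        "exit_code": result.returncode,
        "output": result.stdout.strip(),
    })
    return True
```

A call could look like check_and_execute("/var/log/myapp", 500 * 1024 * 1024, ["logrotate", "--force", "/etc/logrotate.d/myapp"], history), where the path and the logrotate config are made-up examples.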