Automating Nagios service checks via SSH
Hello! Here’s a nifty article in the Server management series. Today, we’ll learn how to automate Nagios service checks via SSH, even when the remote server does not support password-less (key-based) SSH logins.
We’ll cobble a lot of technologies together in order to achieve this amazing feat. While no experience is required with most of the technologies we’ll be using, you’ll need to be proficient in configuring Nagios. I’m also assuming you already have a Nagios instance up, running and operational.
Beware that this guide will require you to save your SSH password for the remote server in a file — not awfully secure. This file will be unreadable to anyone except the Nagios service checker — not even the Web interface has access to it. I consider this an acceptable compromise under the circumstances, but you may not.
Without further ado, here’s the guide.
Why this guide?
Now, what prompts me to write this guide? Of course, it’s an itch to be scratched. My home computer runs Nagios, and I’ve set it up to periodically check the health of the Apache service that runs this site, because it tends to go haywire and become very slow during traffic storms.
The check_http Nagios plugin does work. So why don’t I use it? Because it doesn’t work as I intended. This is better explained by live example.
Most Nagios plugins are simple command-line programs which accept two threshold arguments: the warning threshold and the critical threshold. They report an OK, WARNING, CRITICAL or UNKNOWN status back to the Nagios server in a line of text, backed by the plugin’s exit code.
This makes for powerful, modular server supervision and testing. Remote testing of service health is a piece of cake: once you’ve set it up, it’s fire and forget (until a problem with the server arises). In this particular case, Nagios periodically runs the check_http plugin, which measures the response time of the Apache server; if the service takes too long to respond, Nagios sends me a WARNING or a CRITICAL message.
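To make the threshold idea concrete, here is a minimal sketch of how a plugin might turn a measured response time into a status line and an exit code. The exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) are the usual Nagios plugin convention; the function names, the wording of the status line and the use of >= for the comparison are my own assumptions, not what check_http does internally.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(elapsed, warn, crit):
    """Map a response time in seconds to a Nagios status code.

    Assumption: a time equal to a threshold already trips it.
    """
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def report(elapsed, warn, crit):
    """Return (exit_code, status_line) for a measured response time.

    The status line format is hypothetical; real plugins print their own.
    """
    code = classify(elapsed, warn, crit)
    line = f"HTTP {LABELS[code]}: response time {elapsed:.3f}s"
    return code, line
```

With a warning threshold of 5 seconds and a critical threshold of 10, report(0.5, 5.0, 10.0) gives (0, "HTTP OK: response time 0.500s"), while report(12.3, 5.0, 10.0) gives (2, "HTTP CRITICAL: response time 12.300s"). A real plugin would print the line and exit with the code.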
Of course, in a perfect world, air poses no resistance, cows are perfectly round, and network congestion is a non-issue. But my computer is 18 high-latency hops away from the server I’m checking. To boot, I usually have a BitTorrent client open and, no matter how much tuning you do, that kind of traffic breeds congestion. While the service itself may be serving a page in 0.5 seconds, Nagios usually reports times above 10 seconds (in my book, that’s CRITICAL).
So check_http, as shipped, is useless to me — I cannot reliably determine if my remote host is truly slow, or if it’s just me.
The solution: time server response times directly at the source
With that in mind, I set out to find a solution, and the first one that sprang to mind is the one I implemented. Basically, running check_http directly on the remote server would yield accurate timing information, free of the latency on my end, and drastically cut back on false warnings. And you can run check_http via SSH, which is an even better proposition.
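As a rough illustration of running check_http at the source, the sketch below only builds the SSH command line; it does not run it. The -H, -w and -c options are check_http’s host, warning and critical arguments. The plugin path, the function name and the example host names are assumptions of mine, and the path in particular varies between distributions.

```python
import shlex

# Assumption: location of check_http on the remote host (varies by distribution).
REMOTE_PLUGIN = "/usr/lib/nagios/plugins/check_http"


def build_remote_check(user, host, site, warn, crit):
    """Build an argv list that runs check_http on `host` over SSH.

    `site` is the host name check_http should query from the remote machine;
    `warn` and `crit` are the response-time thresholds in seconds.
    """
    # Quote the remote command so it survives the remote shell intact.
    remote_cmd = shlex.join(
        [REMOTE_PLUGIN, "-H", site, "-w", str(warn), "-c", str(crit)]
    )
    return ["ssh", f"{user}@{host}", remote_cmd]
```

For example, build_remote_check("nagios", "server.example.com", "www.example.com", 5, 10) returns ["ssh", "nagios@server.example.com", "/usr/lib/nagios/plugins/check_http -H www.example.com -w 5 -c 10"], which could be handed to subprocess.run. Note that this does nothing about the SSH password prompt, which is the harder part of the problem.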
Keep reading to find out how I did it.





February 7th, 2008 at 19:42
Hi,
Why not generate a public/private key pair, then append the public key to /root/.ssh/authorized_keys on the target server? Then ssh will authenticate you on the basis of this key. You will need to establish the connection once manually, so that the target machine is added to the list of known hosts on the Nagios server; from then on, logins will be silently granted without a password.