Monitoring & alerting
Nagiosgraph, adding trending to your Nagios monitoring
by daven on Aug.22, 2007, under General System Administration, Monitoring & alerting
I recently completed a project to upgrade our Nagios installation from 2.5 to 3.0b1 (yes, I know that’s a beta). As part of this upgrade I decided to include nagiosgraph to provide trending information via rrdtool.
Here is an example of an hourly CPU load graph from a test system:
![]()
A brief overview of /proc/$PID
by daven on Jul.25, 2007, under Automation, General System Administration, Monitoring & alerting
In Linux you can look in /proc/$PID, where $PID is your process ID, to find out a great deal about currently running processes. It is laid out in a hierarchical format with file names that are fairly intuitive. Read on for an example from a running Nagios instance.
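As a rough illustration of how readable those files are, here is a minimal Python sketch that pulls a few facts out of /proc/$PID. It assumes a Linux system and reads only the status, cmdline and cwd entries; the function names parse_status and describe_process are made up for this example.

```python
import os
from pathlib import Path


def parse_status(text):
    """Turn the 'Key: value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line that has no colon
            fields[key.strip()] = value.strip()
    return fields


def describe_process(pid):
    """Collect a few basics about a process from /proc (Linux only)."""
    proc_dir = Path("/proc") / str(pid)
    status = parse_status((proc_dir / "status").read_text())
    # cmdline holds the argument list separated by NUL bytes
    raw_args = (proc_dir / "cmdline").read_bytes().split(b"\0")
    return {
        "name": status.get("Name"),
        "state": status.get("State"),
        "cmdline": [arg.decode(errors="replace") for arg in raw_args if arg],
        # cwd is a symlink to the process's working directory
        "cwd": os.readlink(proc_dir / "cwd"),
    }
```

Calling describe_process(os.getpid()) describes the Python process itself. Looking at another user's process, such as the Nagios daemon, may need extra privileges for entries like cwd.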
Nagios: Checking Dynamic content with the check_http plugin
by daven on Apr.10, 2007, under Monitoring & alerting
Most of the tutorials and how-tos on Nagios that show the check_http plugin focus on determining whether a page returns a 200 OK code. That tells you your web server is up and returning pages, but for dynamic content it does not mean the correct content is being displayed. Add the “-R” flag to check_http, however, and you are also ensuring that specific content is present on the page.
define command{
    command_name    check_http
    command_line    $USER1$/check_http -H www.cordump.com
}
Example: check_http simply checking for a 200 OK
define command{
    command_name    check_http_regex
    command_line    $USER1$/check_http -H www.cordump.com -R "Dave Nash"
}
Example: adding -R to check for “Dave Nash” in the page’s contents
So now I can check www.cordump.com to make sure it always has posts by me visible on the front page. Since I am the only poster, a match means the front page isn’t blank because I accidentally removed all my database entries.
The other nice thing about the -R switch is that it searches the HTML of the page, so it can also find data that appears only in the HTML source.
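To make the difference between the two checks concrete, here is a small Python sketch of the logic: a page passes only if it returns 200 and the pattern is found somewhere in the raw HTML. This is not how check_http is implemented. The function name check_content and the message strings are assumptions for illustration, and treating a missing pattern as CRITICAL is a choice made for this sketch. The exit codes 0 and 2 are the usual Nagios plugin codes for OK and CRITICAL.

```python
import re

# Conventional Nagios plugin exit codes
OK = 0
CRITICAL = 2


def check_content(status_code, body, pattern):
    """Return (exit_code, message) for a fetched page.

    status_code and body are whatever the HTTP client returned;
    pattern is a regular expression that must appear in the raw HTML.
    """
    if status_code != 200:
        return CRITICAL, f"HTTP CRITICAL - status {status_code}"
    if re.search(pattern, body) is None:
        return CRITICAL, "HTTP CRITICAL - pattern not found"
    return OK, "HTTP OK - pattern found"
```

Because the match runs against the source rather than the rendered page, a pattern can target something a visitor never sees, for example check_content(200, '<meta name="author" content="Dave Nash">', "Dave Nash").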
Awstats for graphical mail log statistics
by daven on Apr.02, 2007, under Monitoring & alerting
Awstats is an oldie that I set up a while ago, but it came to prominence again recently at work, where we wanted to see whether some new features had caused a drastic change in mail volume.
Provided you have a web server installed on your mail host, setting up Awstats is trivial if you follow along with the instructions. However, I had a cluster of 2 load-balanced mail hosts to check, which complicated the setup a little. So here is my setup.
Click on the link below to read the complete description including complete configuration files.
Nagios Event Handlers
by daven on Mar.26, 2007, under Monitoring & alerting
I have been meaning to try out the Nagios event handler feature for a while, but I got a little sidetracked by my stalled quest to find a Nagios replacement. Today, however, I needed this functionality to support an immature application without requiring regular system administrator intervention.
In Nagios an event handler is simply a script or command that gets called whenever a service changes state to any of OK, UNKNOWN, WARNING or CRITICAL. Your script is responsible for deciding whether it is appropriate to act, since you would not want to run a restart script when a service changes to the OK state.
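Here is a minimal Python sketch of that decision, assuming the handler receives $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as its three arguments. The restart command, the function names, and the policy of acting only on a HARD WARNING are all assumptions for illustration. The example in the Nagios documentation is a shell script with its own rules.

```python
import subprocess

# Hypothetical restart command; substitute whatever restarts your service.
RESTART_CMD = ["sudo", "/etc/init.d/httpd", "restart"]


def should_restart(state, state_type, attempt):
    """Decide whether this state change warrants a restart.

    Assumed policy: act only once WARNING has been confirmed (HARD),
    and never on OK, UNKNOWN or CRITICAL. attempt is accepted so the
    policy can be tightened later, but is unused here.
    """
    return state == "WARNING" and state_type == "HARD"


def handle(args, runner=subprocess.run):
    """args holds the three macro values in the order listed above."""
    state, state_type, attempt = args[0], args[1], int(args[2])
    if should_restart(state, state_type, attempt):
        runner(RESTART_CMD, check=False)
        return True
    return False
```

A real handler script would finish by calling handle(sys.argv[1:]). Keeping the decision in its own function makes it easy to try different policies without restarting anything.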
I needed a script to restart the application when the service went into a WARNING state, resolving the issue before it went CRITICAL and generated a pager alert. Starting with the Nagios documentation, I created a new event handler:
- Ensure that enable_event_handlers=1 is set in nagios.cfg.
- Create a script to run when the event occurs; see the Nagios documentation for an example. One item of note is that $STATETYPE$ is $SERVICESTATETYPE$ in Nagios 2.0 and above.
- Define a command for the event handler in checkcommand.cfg:
define command{
    command_name    restart-httpd
    # wrap it in check_by_ssh to restart the service on a remote server
    command_line    $USER1$/libexec/check_by_ssh $HOSTNAME$ -t 10 -C "$USER1$/libexec/eventhandlers/restart-httpd $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$"
}
- Add an event handler line to the service check:
define service{
    use                     generic-service    ; Name of service template to use
    hostgroup_name          myWebServers
    service_description     HTTP Check
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    event_handler           restart-httpd
    contact_groups          Notify
    notification_interval   24x7
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check_http
}
- Make sure your Nagios user has the correct permissions to perform the action in the event script. In my case I had to give the Nagios user sudo access to the init script in order to restart the service.
With these simple steps I was able to automate the restarts of an immature service that was failing an average of 5 times a week and needed a system administrator to step in each time. All in all, a great investment of time!
Outgrowing Nagios: GroundWork
by daven on Mar.02, 2007, under Monitoring & alerting
Contender #4 GroundWork Opensource
I learned about the existence of GroundWork at LISA ’06. It seemed interesting, but Zenoss seemed more interesting, so it took me a bit longer to take a look at it.
Pros
- Imports an existing Nagios configuration; it took me only a few minutes to convert my existing Nagios config
- Strong community and commercial support
- Stores the data in a MySQL database, so it’s easy to get to
- Adds reporting and an admin GUI to Nagios
- Uses Nagios as its basic monitoring engine
Cons
- The really cool stuff, the performance graphs and the dashboards, is behind the pay wall
- The GUI, while a lot more functional than most, is not what anyone would call pretty
- Dealing with the sales team while investigating the commercial version involved them dropping the ball quite often. Not a great way to begin a relationship.
Conclusion: I am working on getting a 30-day trial of the commercial version to see if it’s worth paying for, but as noted in the cons above, it is slow going. I will post updates as I move forward.
Outgrowing Nagios: Zabbix
by daven on Feb.18, 2007, under Monitoring & alerting
Contender #3 Zabbix
I don’t remember exactly where I heard about Zabbix, but it seems to get mentioned a lot in the same context as Cacti. So I did a demo install several months ago to try it out and see if it was the answer to replacing Nagios.
Pros
- Definitely a nicer interface
- Strong community and commercial support
- Rapidly expanding, with a product roadmap to show you where it’s going
Cons
- As with Cacti: if you are going to build a GUI, make it simple. Having to visit multiple locations within the GUI in the right order to set anything up is an automatic deal killer for me.
- Documentation was rather confusing
Conclusion: Passing for now.
Outgrowing Nagios: Zenoss
by daven on Feb.11, 2007, under Monitoring & alerting
Contender #2: Zenoss
I learned about the existence of Zenoss at LISA ’06. It looked really sweet, so upon my return from LISA I quickly installed it in a VMware session on my laptop at work and set about monitoring.
Pros
- Auto discovery worked quite well
- Built-in graphing and trending
- Seemed to have a strong community behind it
Cons
- Configuration is all GUI based (as far as I could tell)
- Seemed to require multiple non-linked steps to set anything up
- Stores data in a Zope structure, which was a bit too Python-specific for my taste
- I just didn’t get the feeling it was developed for a command line jockey like me
There was a lot to recommend Zenoss, but I just never felt comfortable using it. So I decided to keep looking for other alternatives, keeping Zenoss as a backup in case I was not satisfied with any of the other options.
Outgrowing Nagios: openNMS
by daven on Feb.01, 2007, under Monitoring & alerting
Contender #1 openNMS
I was first made aware of openNMS while listening to the FLOSS Weekly podcast. It really sounded like all that and a box of chocolates, so I figured I would give it a try on my home network.
Pros
- It’s purdy
- Seems to be on the level of Tivoli or OpenView
- GPL license, so it’s free as in beer and speech
- Commercial support is available
Cons
- It’s written in Java
- Never got it to install properly on Ubuntu server*, and since this was at home, minimal effort was all it was getting.
- Seems to be on the level of Tivoli or OpenView, i.e. if you are not willing to pay someone full time to maintain it, don’t bother.
- No packages for popular OSes
So in the final estimate I decided to pass on openNMS. Installation and configuration were not as quick and intuitive as I would have liked, which strongly suggested that day-to-day use would not be enjoyable.
* So why exactly can’t Java have the equivalent of Perl’s CPAN or Ruby’s Gems? I mean, it can’t be that difficult. CPAN has been around for 10 years, so it’s not like this is a new idea or something.
Outgrowing Nagios
by daven on Jan.31, 2007, under Monitoring & alerting
I have been using Nagios for a few years now, and while I love it for how simple it is to set up and create custom monitors, let’s face it: the UI is butt ugly and minimalistic. On top of that, it does not store historical information in an easy-to-retrieve format and provides no real reporting to show the rest of the company how your systems are doing.
So now that I have decided it’s time to move on from Nagios, here is what I am looking for in my perfect monitoring package.
- Stores history and performance data in a way that I can easily extract it, including by writing my own scriptlets
- Provides reports showing the history and states of all monitored systems and services
- Looks “purdy”; compared to Nagios I think this one should be easy
- Should be able to do graphs and trending so I can throw that pain in the ass Cacti out the window
- As close to free as possible; if there isn’t at least a free demo then it’s out
- Can easily import or convert from Nagios; I have a lot invested in my Nagios config and I am not willing to rebuild it from scratch
OK, so those are my basic requirements (let’s see if I go back and add some later). Here is my list of contenders from reading trade mags and attending LISA ’06.
Over the next few days I will post my impressions of each of the contenders.