Need help with a new plugin

Posted: Tue Jun 30, 2015 6:47 am
by MaxHeadroom
I have a system that implements a messaging middleware. I want to build a plugin for Nagios to capture status messages being sent across this middleware. The status messages can be very detailed and include many parameters (e.g., state, mode, status, uptime, software revisions, hardware info, installation details, temperature, etc.).

I was thinking the plugin would parse the status messages and pass the detailed information to Nagios. From there I could use the facilities within Nagios to determine thresholds, create alerts, and perform trend analysis and reporting. But it doesn't seem to work that way. It appears I only have OK, Warning, or Critical for passing to Nagios.

Am I reading this wrong? Is there a way to pass more information to Nagios?

Thanks
Randy

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 9:17 am
by tmcdonald
The exit code of 0 through 3 determines the OK, WARNING, CRITICAL, or UNKNOWN status. You can also pass back textual information, and information that will be used to graph (performance data):

https://nagios-plugins.org/doc/guidelines.html

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 9:24 am
by ssax
In addition to what tmcdonald posted:

From http://nagios.sourceforge.net/docs/3_0/pluginapi.html:
Plugin Output Spec

At a minimum, plugins should return at least one line of text output. Beginning with Nagios 3, plugins can optionally return multiple lines of output. Plugins may also return optional performance data that can be processed by external applications. The basic format for plugin output is shown below:

TEXT OUTPUT | OPTIONAL PERFDATA
LONG TEXT LINE 1
LONG TEXT LINE 2
...
LONG TEXT LINE N | PERFDATA LINE 2
PERFDATA LINE 3
...
PERFDATA LINE N

The performance data (shown in orange) is optional. If a plugin returns performance data in its output, it must separate the performance data from the other text output using a pipe (|) symbol. Additional lines of long text output (shown in blue) are also optional.
You can read more here as well:

https://nagios-plugins.org/doc/guidelines.html
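
To make the output format above concrete, here is a minimal Python sketch that assembles a first line with performance data plus one line of long text. It assumes the perfdata item layout from the plugin guidelines ('label'=value[UOM];[warn];[crit];[min];[max]). The helper names (format_perfdata, format_output) and the threshold values 85 and 95 are made up for illustration and are not part of Nagios.

```python
def format_perfdata(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Build one perfdata item: 'label'=value[UOM];[warn];[crit];[min];[max]."""
    # Labels containing spaces are wrapped in single quotes.
    if " " in label:
        label = "'" + label + "'"
    return f"{label}={value}{uom};{warn};{crit};{minimum};{maximum}"


def format_output(status_text, perfdata_items, long_text=()):
    """Join status text, perfdata, and optional long text lines."""
    first = status_text
    if perfdata_items:
        # The pipe separates human-readable text from performance data.
        first = f"{status_text} | {' '.join(perfdata_items)}"
    return "\n".join([first, *long_text])


# Hypothetical values: 78 degrees measured, warn at 85, critical at 95.
line = format_output(
    "TEMPERATURE OK - 78 degrees",
    [format_perfdata("temperature", 78, warn=85, crit=95)],
    long_text=["SendingSystemId=123456789"],
)
print(line)
```

Note that the numeric value goes to the right of the pipe as label=value, while the readable summary stays on the left.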

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 9:32 am
by MaxHeadroom
tmcdonald wrote:The exit code of 0 through 3 determines the OK, WARNING, CRITICAL, or UNKNOWN status. You can also pass back textual information, and information that will be used to graph (performance data):

https://nagios-plugins.org/doc/guidelines.html
Thanks. Graphing is something that I am interested in. Can I have Nagios compare textual information (e.g., temperature) against a threshold and use the alert facilities of Nagios if I go above some value?

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 9:46 am
by MaxHeadroom
ssax wrote:In addition to what tmcdonald posted:

From http://nagios.sourceforge.net/docs/3_0/pluginapi.html:
Plugin Output Spec

At a minimum, plugins should return at least one line of text output. Beginning with Nagios 3, plugins can optionally return multiple lines of output. Plugins may also return optional performance data that can be processed by external applications. The basic format for plugin output is shown below:

TEXT OUTPUT | OPTIONAL PERFDATA
LONG TEXT LINE 1
LONG TEXT LINE 2
...
LONG TEXT LINE N | PERFDATA LINE 2
PERFDATA LINE 3
...
PERFDATA LINE N

The performance data (shown in orange) is optional. If a plugin returns performance data in its output, it must separate the performance data from the other text output using a pipe (|) symbol. Additional lines of long text output (shown in blue) are also optional.

Thanks. Could an example temperature performance data look like this:

Temperature | 78
SendingSystemId=123456789
Units=degrees


Then, again, I would want Nagios to compare the temperature (i.e., 78) to a threshold value. If the threshold is breached, then I want Nagios to send out the alerts. Coding the threshold inside the plugin doesn't seem like a good architecture.

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 9:48 am
by jdalrymple
MaxHeadroom wrote:and use the alert facilities of Nagios should I go above some value
This is what Nagios does with the return values 0-3 as indicated by the plugin guidelines already posted.
MaxHeadroom wrote:Graphing is something that I am interested in
This is what perfdata is for
MaxHeadroom wrote:Can I have Nagios compare textual information (E.g., temperature) against a threshold
You can write plugins to compare anything you want. Generally I wouldn't consider temperature to be textual, but as an alternative example, there are plenty of different logfile readers that can monitor for the existence of a text string and alert on it.

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 9:51 am
by jdalrymple
MaxHeadroom wrote:coding the threshold inside the plugin doesn't seem like a good architecture
We agree:
Nagios Plugin Guidelines wrote:There are a few reserved options that should not be used for other purposes:

-V version (--version)
-h help (--help)
-t timeout (--timeout)
-w warning threshold (--warning)
-c critical threshold (--critical)

-H hostname (--hostname)
-v verbose (--verbose)
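
As a rough illustration of how the reserved -w and -c options keep thresholds out of the plugin code, here is a minimal Python sketch. It assumes simple "alert when the value is at or above the threshold" semantics; the guidelines also describe a richer range syntax that this sketch does not implement. read_temperature() is a hypothetical stand-in for parsing a status message from the middleware.

```python
import argparse

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios status code (simplified upper bounds)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def read_temperature():
    """Hypothetical placeholder: a real plugin would parse a status message here."""
    return 78.0


def main(argv):
    parser = argparse.ArgumentParser(description="Temperature check (sketch)")
    parser.add_argument("-w", "--warning", type=float, required=True)
    parser.add_argument("-c", "--critical", type=float, required=True)
    args = parser.parse_args(argv)

    try:
        value = read_temperature()
    except Exception as exc:
        # Anything that prevents a measurement is reported as UNKNOWN.
        print(f"TEMPERATURE UNKNOWN - {exc}")
        return UNKNOWN

    code = evaluate(value, args.warning, args.critical)
    perfdata = f"temperature={value:g};{args.warning:g};{args.critical:g};;"
    print(f"TEMPERATURE {STATUS_NAMES[code]} - {value:g} degrees | {perfdata}")
    return code

# In a real plugin file the last line would be: sys.exit(main(sys.argv[1:]))
```

The thresholds arrive on the command line, so the same plugin can serve every service definition, each passing its own -w and -c values.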

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 10:10 am
by MaxHeadroom
jdalrymple wrote:
MaxHeadroom wrote:coding the threshold inside the plugin doesn't seem like a good architecture
We agree:
Nagios Plugin Guidelines wrote:There are a few reserved options that should not be used for other purposes:

-V version (--version)
-h help (--help)
-t timeout (--timeout)
-w warning threshold (--warning)
-c critical threshold (--critical)

-H hostname (--hostname)
-v verbose (--verbose)

These look like command-line options. What if I have dozens of thresholds? Thanks for the great dialog.
Randy

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 10:14 am
by MaxHeadroom
tmcdonald wrote:The exit code of 0 through 3 determines the OK, WARNING, CRITICAL, or UNKNOWN status
When you say "Exit code", you don't actually mean the plugin exits, do you?

Re: Need help with a new plugin

Posted: Tue Jun 30, 2015 10:27 am
by jdalrymple
MaxHeadroom wrote:What if I have dozens of thresholds?
If you have 1000 refrigerators that use the same temperature thresholds, they can be defined as one service that looks like this:

define service {
	service_description			1000 refrigerators
	check_command			   	check_temp!-w 30 -c 40 ; warning at 30 degrees, critical at 40 degrees
	hostgroup_name		  		all_my_fridges
	...
	}
If you have 2 sets of 500 refrigerators, each set with its own thresholds, you'd define them like this:

define service {
	service_description			500 refrigerators
	check_command			   	check_temp!-w 30 -c 40
	hostgroup_name			  	half_my_fridges
	...
	}
	
define service {
	service_description			500 different refrigerators
	check_command				   check_temp!-w 35 -c 45
	hostgroup_name				  the_other_half
	...
	}
If every fridge is different - no monitoring software I know of can work without having some thresholds defined somewhere, and the mind-link function isn't online yet. If you have to define thousands of thresholds and can't aggregate anything, you have my sympathy:

define service {
	service_description			a refrigerator
	check_command				   check_temp!-w 30 -c 40
	host_name					    1_fridge
	...
	}
	
define service {
	service_description			a different refrigerator
	check_command				   check_temp!-w 31 -c 41
	host_name					    2_fridge
	...
	}

define service {
	service_description			yet another
	check_command				   check_temp!-w 32 -c 42
	host_name					    red_fridge
	...
	}
	
define service {
	service_description			and another
	check_command				   check_temp!-w 35 -c 36
	host_name					    blue_fridge
	...
	}
MaxHeadroom wrote:When you say "Exit code", you don't actually mean the plugin exits, do you?
That's how it works:

[jdalrymple@localhost libexec]$ ./check_uptime -w 10 -u days
Uptime OK: 0 day(s) 19 hour(s) 15 minute(s) | uptime=0.000000;10.000000;;
[jdalrymple@localhost libexec]$ echo $?
0
[jdalrymple@localhost libexec]$ ./check_uptime -w 10 -u minutes
Uptime WARNING: 0 day(s) 19 hour(s) 15 minute(s) | uptime=1155.000000;10.000000;;
[jdalrymple@localhost libexec]$ echo $?
1
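
The same idea can be sketched in Python: each check is a short-lived process, and the caller reads its exit code once it finishes. This is only an illustration of the mechanism, not how Nagios itself is implemented. run_check is a made-up helper, the "plugin" is a one-line stand-in script, and mapping unexpected codes to UNKNOWN is an assumption of this sketch.

```python
import subprocess
import sys

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command):
    """Run a plugin command; return (status name, captured plugin output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    # Assumption for this sketch: codes outside 0-3 are treated as UNKNOWN.
    status = STATUS_NAMES.get(result.returncode, "UNKNOWN")
    return status, result.stdout.strip()


# Stand-in for a real plugin: prints one line of output and exits with code 1.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('Uptime WARNING: demo | uptime=1155;10;;'); sys.exit(1)",
]

status, output = run_check(fake_plugin)
print(status, "->", output)
```

So yes, the plugin really does exit after every check; the scheduler simply runs it again at the next check interval.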