Need help with a new plugin
Posted: Tue Jun 30, 2015 6:47 am
by MaxHeadroom
I have a system that implements a messaging middleware. I want to build a plugin for Nagios to capture status messages being sent across this middleware. The status messages can be very rich and include many parameters (e.g., state, mode, status, uptime, software revs, hardware info, installation details, temperature, etc.).
I was thinking the plugin would parse the status messages and pass the detailed information to Nagios. From there I could use the facilities within Nagios to determine thresholds, create alerts, and perform trend analysis and reporting. But it doesn't seem to work that way. It appears I only have OK, Warning, or Critical for passing to Nagios.
Am I reading this wrong? Is there a way to pass more information to Nagios?
Thanks
Randy
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 9:17 am
by tmcdonald
The exit code of 0 through 3 determines the OK, WARNING, CRITICAL, or UNKNOWN status. You can also pass back textual information, and information that will be used to graph (performance data):
https://nagios-plugins.org/doc/guidelines.html
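To make that concrete, here is a minimal sketch in Python of how a plugin pairs its printed line with an exit code. It is only an illustration: the function name plugin_result and the temperature reading are made up for this example, not taken from the guidelines.

```python
# Nagios plugin states and the exit code that signals each one.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def plugin_result(state, text, perfdata=""):
    """Return (output_line, exit_code) for one check result.

    `text` is the human-readable part; `perfdata` (optional) goes after the pipe.
    plugin_result is a hypothetical helper, not part of any Nagios library.
    """
    line = f"{STATE_NAMES[state]}: {text}"
    if perfdata:
        line += f" | {perfdata}"
    return line, state


# Hypothetical reading taken from a middleware status message.
line, code = plugin_result(WARNING, "temperature is 78", "temperature=78")
print(line)
# A real plugin would now end with: sys.exit(code)
```

The text before the pipe is what shows up in the Nagios interface, the part after it is the performance data, and the exit code is what sets the state.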
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 9:24 am
by ssax
In addition to what tmcdonald posted:
From http://nagios.sourceforge.net/docs/3_0/pluginapi.html:
Plugin Output Spec
At a minimum, plugins should return at least one line of text output. Beginning with Nagios 3, plugins can optionally return multiple lines of output. Plugins may also return optional performance data that can be processed by external applications. The basic format for plugin output is shown below:
TEXT OUTPUT | OPTIONAL PERFDATA
LONG TEXT LINE 1
LONG TEXT LINE 2
...
LONG TEXT LINE N | PERFDATA LINE 2
PERFDATA LINE 3
...
PERFDATA LINE N
The performance data (everything after a pipe) is optional. If a plugin returns performance data in its output, it must separate the performance data from the other text output using a pipe (|) symbol. Additional lines of long text output are also optional.
You can read more here as well:
https://nagios-plugins.org/doc/guidelines.html
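As a rough sketch of that layout, the Python below assembles multi-line output and formats each metric in the label=value[UOM];warn;crit;min;max form described in the plugin guidelines. The helper names (perfdata_item, build_output), the metric names, and the 80/90 thresholds are assumptions made up for this example.

```python
def perfdata_item(label, value, uom="", warn="", crit="", minimum="", maximum=""):
    """Format one metric as label=value[UOM];warn;crit;min;max."""
    if " " in label:
        label = f"'{label}'"  # labels containing spaces are single-quoted
    fields = [f"{label}={value}{uom}", str(warn), str(crit), str(minimum), str(maximum)]
    # Trailing empty fields (and their semicolons) can be dropped.
    while len(fields) > 1 and fields[-1] == "":
        fields.pop()
    return ";".join(fields)


def build_output(summary, long_text=(), perfdata=()):
    """Lay out summary, long text lines and perfdata in the format quoted above."""
    long_text, perfdata = list(long_text), list(perfdata)
    if not long_text:
        # Single-line output: all metrics go after the pipe, space separated.
        return summary + (" | " + " ".join(perfdata) if perfdata else "")
    lines = [summary + (" | " + perfdata[0] if perfdata else "")] + long_text
    if len(perfdata) > 1:
        # Remaining perfdata starts after a pipe on the last long text line.
        lines[-1] += " | " + perfdata[1]
        lines += perfdata[2:]
    return "\n".join(lines)


# Hypothetical values, as a plugin might extract them from a status message.
output = build_output(
    "TEMPERATURE OK - 78 degrees",
    long_text=["SendingSystemId=123456789", "Mode=normal"],
    perfdata=[
        perfdata_item("temperature", 78, warn=80, crit=90),
        perfdata_item("uptime", 1155, uom="s"),
    ],
)
print(output)
```

Details that are not numeric (IDs, modes, software revs) fit in the long text lines, while anything you want graphed goes after the pipe.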
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 9:32 am
by MaxHeadroom
tmcdonald wrote:The exit code of 0 through 3 determines the OK, WARNING, CRITICAL, or UNKNOWN status. You can also pass back textual information, and information that will be used to graph (performance data):
https://nagios-plugins.org/doc/guidelines.html
Thanks. Graphing is something that I am interested in. Can I have Nagios compare textual information (e.g., temperature) against a threshold and use the alert facilities of Nagios should I go above some value?
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 9:46 am
by MaxHeadroom
ssax wrote:In addition to what tmcdonald posted:
From http://nagios.sourceforge.net/docs/3_0/pluginapi.html:
Plugin Output Spec
At a minimum, plugins should return at least one line of text output. Beginning with Nagios 3, plugins can optionally return multiple lines of output. Plugins may also return optional performance data that can be processed by external applications. The basic format for plugin output is shown below:
TEXT OUTPUT | OPTIONAL PERFDATA
LONG TEXT LINE 1
LONG TEXT LINE 2
...
LONG TEXT LINE N | PERFDATA LINE 2
PERFDATA LINE 3
...
PERFDATA LINE N
The performance data (everything after a pipe) is optional. If a plugin returns performance data in its output, it must separate the performance data from the other text output using a pipe (|) symbol. Additional lines of long text output are also optional.
Thanks. Could example temperature performance data look like this?
Temperature | 78
SendingSystemId=123456789
Units=degrees
Then, again, I would want Nagios to compare the temperature (i.e., 78) to a threshold value. If the threshold is breached, then I want Nagios to send out the alerts. Coding the threshold inside the plugin doesn't seem like a good architecture.
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 9:48 am
by jdalrymple
MaxHeadroom wrote:and use the alert facilities of Nagios should I go above some value
This is what Nagios does with the return values 0-3 as indicated by the plugin guidelines already posted.
MaxHeadroom wrote:Graphing is something that I am interested in
This is what perfdata is for.
MaxHeadroom wrote:Can I have Nagios compare textual information (e.g., temperature) against a threshold
You can write plugins to compare anything you want. Generally I wouldn't consider temperature to be textual, but as an alternative example, there are plenty of different logfile readers that can monitor for textual (string) existence and alert on it.
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 9:51 am
by jdalrymple
MaxHeadroom wrote:coding the threshold inside the plugin doesn't seem like a good architecture
We agree:
Nagios Plugin Guidelines wrote:There are a few reserved options that should not be used for other purposes:
-V version (--version)
-h help (--help)
-t timeout (--timeout)
-w warning threshold (--warning)
-c critical threshold (--critical)
-H hostname (--hostname)
-v verbose (--verbose)
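In other words, the thresholds live in the Nagios configuration and reach the plugin as -w and -c arguments. A minimal sketch of the plugin side in Python follows. It assumes a simple "at or above the threshold" rule (the guidelines also define a more general range syntax that is not handled here), and the --value option is a made-up stand-in for whatever the plugin would really read from the middleware.

```python
import argparse

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_args(argv):
    # -h/--help is provided by argparse itself.
    parser = argparse.ArgumentParser(description="Hypothetical temperature check")
    parser.add_argument("-w", "--warning", type=float, required=True)
    parser.add_argument("-c", "--critical", type=float, required=True)
    # Assumption for this sketch: the reading is passed in instead of fetched.
    parser.add_argument("--value", type=float, required=True)
    return parser.parse_args(argv)


def evaluate(value, warning, critical):
    """Map a reading to a state: at or above a threshold trips it."""
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def main(argv):
    args = parse_args(argv)
    state = evaluate(args.value, args.warning, args.critical)
    name = ["OK", "WARNING", "CRITICAL", "UNKNOWN"][state]
    print(f"TEMPERATURE {name} - {args.value:g} degrees | "
          f"temperature={args.value:g};{args.warning:g};{args.critical:g}")
    return state


# Simulates: check_temp -w 30 -c 40 --value 35
exit_code = main(["-w", "30", "-c", "40", "--value", "35"])
# A real plugin would end with: sys.exit(main(sys.argv[1:]))
```

The plugin still does the comparison, but it holds no threshold values of its own, so changing a threshold only means editing the service definition.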
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 10:10 am
by MaxHeadroom
jdalrymple wrote:MaxHeadroom wrote:coding the threshold inside the plugin doesn't seem like a good architecture
We agree:
Nagios Plugin Guidelines wrote:There are a few reserved options that should not be used for other purposes:
-V version (--version)
-h help (--help)
-t timeout (--timeout)
-w warning threshold (--warning)
-c critical threshold (--critical)
-H hostname (--hostname)
-v verbose (--verbose)
These look like command-line options. What if I have dozens of thresholds? Thanks for the great dialog.
Randy
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 10:14 am
by MaxHeadroom
tmcdonald wrote:The exit code of 0 through 3 determines the OK, WARNING, CRITICAL, or UNKNOWN status
When you say "Exit code", you don't actually mean the plugin exits, do you?
Re: Need help with a new plugin
Posted: Tue Jun 30, 2015 10:27 am
by jdalrymple
MaxHeadroom wrote:What if I have dozens of thresholds?
If you have 1000 refrigerators that use the same temperature thresholds they can be defined as one service that looks like this:
define service {
service_description 1000 refrigerators
check_command check_temp!-w 30 -c 40 ; implies warning at 30 degrees, critical at 40 degrees
hostgroup_name all_my_fridges
...
}
If you have 2 sets of 500 refrigerators, each set with its own thresholds, you'd define them like this:
define service {
service_description 500 refrigerators
check_command check_temp!-w 30 -c 40
hostgroup_name half_my_fridges
...
}
define service {
service_description 500 different refrigerators
check_command check_temp!-w 35 -c 45
hostgroup_name the_other_half
...
}
If every fridge is different - no monitoring software I know of can work without having some thresholds defined somewhere, and the mind-link function isn't online yet. If you have to define thousands of thresholds and can't aggregate anything, you have my sympathy:
define service {
service_description a refrigerator
check_command check_temp!-w 30 -c 40
host_name 1_fridge
...
}
define service {
service_description a different refrigerator
check_command check_temp!-w 31 -c 41
host_name 2_fridge
...
}
define service {
service_description yet another
check_command check_temp!-w 32 -c 42
host_name red_fridge
...
}
define service {
service_description and another
check_command check_temp!-w 35 -c 36
host_name blue_fridge
...
}
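If it really does come to one definition per host, a small script can at least generate them from a table. This is only a sketch: the THRESHOLDS table, the render_services helper, and the service description are made up for illustration, and the other directives a service needs are left out, just as in the definitions above.

```python
# Hypothetical per-host thresholds: host_name -> (warning, critical).
THRESHOLDS = {
    "1_fridge": (30, 40),
    "2_fridge": (31, 41),
    "red_fridge": (32, 42),
}

# Doubled braces are literal braces in str.format templates.
TEMPLATE = """define service {{
    service_description     temperature
    check_command           check_temp!-w {warn} -c {crit}
    host_name               {host}
}}
"""


def render_services(thresholds):
    """Render one service definition per host from the threshold table."""
    return "\n".join(
        TEMPLATE.format(host=host, warn=warn, crit=crit)
        for host, (warn, crit) in thresholds.items()
    )


config = render_services(THRESHOLDS)
print(config)
```

The table could just as well come from a CSV file or an inventory database; the point is that the thresholds stay in one place outside the plugin.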
MaxHeadroom wrote:When you say "Exit code", you don't actually mean the plugin exits, do you?
That's how it works:
[jdalrymple@localhost libexec]$ ./check_uptime -w 10 -u days
Uptime OK: 0 day(s) 19 hour(s) 15 minute(s) | uptime=0.000000;10.000000;;
[jdalrymple@localhost libexec]$ echo $?
0
[jdalrymple@localhost libexec]$ ./check_uptime -w 10 -u minutes
Uptime WARNING: 0 day(s) 19 hour(s) 15 minute(s) | uptime=1155.000000;10.000000;;
[jdalrymple@localhost libexec]$ echo $?
1
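So yes, the plugin is a short-lived process: it gets started for each check, prints its output, and exits, and the exit status is read as the state. The Python sketch below imitates that from the caller's side. The "plugin" here is a made-up one-line stand-in, not a real check.

```python
import subprocess
import sys

# Stand-in for a plugin: a one-line Python program that prints a status line
# and exits with 1 (WARNING). A real check would run the plugin's own path.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('Uptime WARNING: demo | uptime=1155'); sys.exit(1)",
]

# Run the plugin as a child process and wait for it to exit.
result = subprocess.run(fake_plugin, capture_output=True, text=True)
status_names = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

print("output:", result.stdout.strip())
print("exit code:", result.returncode, "->",
      status_names.get(result.returncode, "UNKNOWN"))
```

This is the same thing the echo $? lines above show by hand.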