
LONGSERVICEOUTPUT or Multiline check output on nagios log

Posted: Thu Jun 15, 2017 2:57 pm
by khmon
Hello

We are currently developing some checks that output information in multiline format. For example:

OK/WARN/CRIT Myservice | Perfdata
Line1
Line2
Line3
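
For reference, here is a minimal sketch in C of how a plugin could assemble output in this shape. The helper name format_plugin_output and its parameters are made up for illustration; only the layout (a status line with perfdata after the "|", followed by the extra lines) comes from the example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the text a plugin would print to stdout.
 * First line: "STATUS summary | perfdata". Every following line is what
 * ends up as the long (multiline) output.
 * Returns the number of bytes written, or -1 if buf is too small. */
int format_plugin_output(char *buf, size_t size,
                         const char *status, const char *summary,
                         const char *perfdata,
                         const char *const *lines, size_t nlines)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, "%s %s | %s\n", status, summary, perfdata);
    if (n < 0 || (size_t)n >= size)
        return -1;

    size_t used = (size_t)n;
    for (size_t i = 0; i < nlines; i++) {
        n = snprintf(buf + used, size - used, "%s\n", lines[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

A real plugin would write the buffer to stdout and exit with 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN.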


Now, the issue (Nagios 4.3.1, Linux CentOS 7) is that I see the events in the main log file and the current status in status.dat, but for historical purposes I can't find the multi-line output of a check in nagios.log. That is, the log contains

[timestamp] .... OK/WARN/CRIT Myservice

The thing is that I don't see any "Line1 ... Line3" in there.

Are we doing something wrong, or is status.dat's long_plugin_output simply not logged to nagios.log? Should we enable some setting?

Re: LONGSERVICEOUTPUT or Multiline check output on nagios lo

Posted: Fri Jun 16, 2017 10:46 am
by lmiltchev
There is a feature request for adding this functionality to Nagios Core - see here: https://github.com/NagiosEnterprises/na ... issues/262

Re: LONGSERVICEOUTPUT or Multiline check output on nagios lo

Posted: Tue Jun 20, 2017 8:49 am
by khmon
Thanks for your feedback!

Is there an ETA for when that request may be completed?

I was able to get it to work for the time being by editing base/logging.c and updating log_service_event() to do:

asprintf(&temp_buffer, "SERVICE ALERT: %s;%s;%s;%s;%d;%s;%s\n",
         svc->host_name, svc->description,
         service_state_name(svc->current_state),
         state_type_name(svc->state_type),
         svc->current_attempt,
         (svc->plugin_output == NULL) ? "" : svc->plugin_output,
         (svc->long_plugin_output == NULL) ? "" : svc->long_plugin_output);

The last argument, "(svc->long_plugin_output == NULL) ? "" : svc->long_plugin_output", and the additional ";%s" in the format string are the minimum required to get the Nagios long (multiline) plugin output saved to the nagios.log file.
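
Since nagios.log is read one entry per line, a raw newline inside the long output would split an entry. Below is a standalone sketch of the same format string with the long output escaped first. Both escape_for_log and format_service_alert are hypothetical names, not Nagios functions, and whether Core has already escaped the newlines in long_plugin_output by the time log_service_event() runs is an assumption that should be checked against the source. If it has, the escaping step here would be redundant.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Nagios function): returns a malloc'd copy of
 * `in` where each newline becomes the two characters '\' 'n' and each
 * backslash is doubled, so the result fits on one log line. */
char *escape_for_log(const char *in)
{
    if (in == NULL)
        in = "";
    /* Worst case: every byte expands to two bytes, plus the terminator. */
    char *out = malloc(strlen(in) * 2 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            *p++ = '\\';
            *p++ = 'n';
        } else if (*in == '\\') {
            *p++ = '\\';
            *p++ = '\\';
        } else {
            *p++ = *in;
        }
    }
    *p = '\0';
    return out;
}

/* Formats a SERVICE ALERT line in the same field order as the patch above,
 * with the escaped long output as the final ';'-separated field.
 * Returns the line length, or -1 on allocation failure or truncation. */
int format_service_alert(char *buf, size_t size,
                         const char *host, const char *desc,
                         const char *state, const char *state_type,
                         int attempt,
                         const char *output, const char *long_output)
{
    assert(buf != NULL && size > 0);

    char *escaped = escape_for_log(long_output);
    if (escaped == NULL)
        return -1;

    int n = snprintf(buf, size, "SERVICE ALERT: %s;%s;%s;%s;%d;%s;%s\n",
                     host, desc, state, state_type, attempt,
                     (output == NULL) ? "" : output, escaped);
    free(escaped);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Anything that parses nagios.log by splitting on ";" would also need to know about the extra trailing field.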

I haven't dug into the code enough to know whether MAX_PLUGIN_OUTPUT_LENGTH in nagios.h also controls the amount of data a plugin can send back to Nagios. I will try to look into that.

I have looked at how feasible it would be to submit this code, but I'd need to dig into how options are defined and handled in the codebase, in case the only way to get this accepted is to put it behind some sort of "log_long_plugin_output 1" option in nagios.cfg.
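
As a rough idea of what such a switch might look like, here is a standalone sketch that reads one name=value line, which is the form nagios.cfg directives take. The option name is modeled on the one floated above and does not exist in Nagios Core; parse_long_output_option is likewise a made-up function and is not how Core's own config parser is structured.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag; it does not exist in Nagios Core. */
static int log_long_plugin_output = 0;

/* Reads one "name=value" config line.
 * Returns 1 if the line set the flag, 0 if it is some other directive,
 * and -1 if the directive matches but the value is not 0 or 1. */
int parse_long_output_option(const char *line)
{
    static const char name[] = "log_long_plugin_output=";
    const size_t len = sizeof name - 1;

    assert(line != NULL);
    if (strncmp(line, name, len) != 0)
        return 0;

    char *end = NULL;
    long v = strtol(line + len, &end, 10);
    /* Reject empty values, trailing junk, and anything other than 0 or 1. */
    if (end == line + len || (*end != '\0' && *end != '\n') || (v != 0 && v != 1))
        return -1;

    log_long_plugin_output = (int)v;
    return 1;
}
```

log_service_event() could then append the long output only when the flag is set, which would leave the default log format unchanged.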

There might be other places beyond SERVICE ALERT where something like this might be required, though.

Re: LONGSERVICEOUTPUT or Multiline check output on nagios lo

Posted: Tue Jun 20, 2017 3:35 pm
by tgriep
It looks like the enhancement is in review, so there is no time frame for when, or whether, it will be released.

Re: LONGSERVICEOUTPUT or Multiline check output on nagios lo

Posted: Sun Jul 16, 2017 7:57 pm
by khmon
Thanks for your reply.


Here's why we need this feature working and stable as soon as realistically possible.

We are using livestatus and check_mk to gather data from a fair number of Nagios servers, each monitoring a number of servers and services in one of our customers' infrastructures. Many of our plugins (database checks, for example) return a LONGSERVICEOUTPUT, mostly with details about problematic queries. We now run our own branch of Nagios so that it at least logs that LONGSERVICEOUTPUT to a file.

However, since this is not a Nagios feature today, mklivestatus and check_mk produce "funny" (and sadly, misleading) data on the "live" console and in reports.

Say, for example, that a plugin on a DB server detects a slow query and shows it in its LONGSERVICEOUTPUT, such as "SELECT custid from customers where (something == 3)". Imagine that the plugin returns CRITICAL with that query. You see the alert in Nagios, you DON'T see the LONGSERVICEOUTPUT in the log file on a regular Nagios, and you do see it in check_mk if you add the LONGSERVICEOUTPUT column. So far so good.

Now say the plugin goes back to OK, no LONGSERVICEOUTPUT, so there's no output to display, or log. As soon as the service goes to OK, the "LONGSERVICEOUTPUT" column goes back to blank.

However, in the event history there is no information about what the slow query was because, again, current Nagios does not log it.

Say the DBA is looking at the console and the plugin goes CRITICAL again, with the query now being "select custid from customers where (something == 4)". You see that in the same places as before. ONLY NOW, check_mk shows "select custid from customers where (something == 4)" in the LONGSERVICEOUTPUT for the current critical issue, but also for the previous one (and all the earlier ones!!!). Every occurrence of that alarm on that server in check_mk now shows "select custid from customers where (something == 4)", even if something else caused it in the past. This defeats the purpose of displaying LONGSERVICEOUTPUT, as it shows wrong data for all the past events.
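
One way to picture the behavior described above, as a sketch only: if a history entry reads the long output from the service's current state when it is displayed, every past event shows the latest text, whereas a copy taken when the event fired would stay correct. The types and the record_event function below are hypothetical and are not taken from Nagios, livestatus or check_mk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LONG_OUTPUT_MAX 256

/* Hypothetical types, for illustration only. */
struct service_state {
    char long_output[LONG_OUTPUT_MAX];   /* current long output only */
};

struct alert_event {
    const struct service_state *live;    /* read at display time */
    char snapshot[LONG_OUTPUT_MAX];      /* copied when the event fired */
};

/* Records an event, keeping both a reference to the live state and a
 * copy of the long output as it was at that moment. */
void record_event(struct alert_event *ev, const struct service_state *svc)
{
    assert(ev != NULL && svc != NULL);
    ev->live = svc;
    snprintf(ev->snapshot, sizeof ev->snapshot, "%s", svc->long_output);
}
```

After two CRITICAL events with different queries, the live field of the first event yields the second query, while its snapshot still holds the first. Logging the long output with each event is what makes the snapshot possible.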

Now, I'd also like to have that fixed in check_mk, but I can't try to get check_mk/livestatus fixed while Nagios itself has no stable way of going back over the LONGSERVICEOUTPUT of previous events: it does not even store that information anywhere, let alone retrieve it. Once this is stable in Nagios, I can go about getting it fixed elsewhere. In the meantime, we are dealing with this by having different teams go to each Nagios individually, each running a local branch of Nagios that logs the LONGSERVICEOUTPUT, and by having them totally disregard the LONGSERVICEOUTPUT on the check_mk console, which is where they actually look for the first sign of issues coming from the different customers.


Let me know if this makes sense given the current state of Nagios, or whether you think this might be caused by something else.

Re: LONGSERVICEOUTPUT or Multiline check output on nagios lo

Posted: Mon Jul 17, 2017 4:40 pm
by tmcdonald
While I can see the use case, it is ultimately up to the developers whether this is added, and if so, when.