Annoying Unknown Service after Updating it

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
roland.konrad
Posts: 7
Joined: Wed Sep 02, 2015 3:21 am


Post by roland.konrad »

So Nagios 5.5 doesn't like NSClient 0.3.9 anymore. Fine.
So we moved on to 0.5.2. Now it says: Invalid command line: unrecognised option ...

OK, so I had to rebuild the command (it reads Windows Backup entries from the event log).
A few hours later it was working.

But, and here is my grievance: every now and then the service switches to status Unknown and comes back with the 5-day-old error.
If I force an immediate check it comes back ok.

This is really annoying.

Hope you have an idea what to do.

Best, Roland
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Annoying Unknown Service after Updating it

Post by cdienger »

Does the nsclient.log show anything that lines up with the nagios.log when the unknown status comes back? What exactly is the 5 day old error?
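To line the two logs up, the UNKNOWN entries can be pulled out of nagios.log with readable timestamps. This is a minimal bash sketch, assuming the default XI log location `/usr/local/nagios/var/nagios.log`, the standard Core line format `[epoch] SERVICE ALERT: host;service;state;type;attempt;output`, and GNU `date`; the function name `list_unknown_alerts` is hypothetical.

```shell
# list_unknown_alerts SERVICE_NAME [LOGFILE]  (hypothetical helper)
# Prints "local-time  host  plugin-output" for every UNKNOWN alert of one service.
# Assumes the Core log format "[epoch] SERVICE ALERT: host;service;state;type;attempt;output".
list_unknown_alerts() {
    local service="$1"
    local logfile="${2:-/usr/local/nagios/var/nagios.log}"
    awk -v svc="$service" '
        index($0, "SERVICE ALERT: ") {
            ts = substr($1, 2, length($1) - 2)            # strip the [ ] around the epoch
            rest = substr($0, index($0, "SERVICE ALERT: ") + 15)
            n = split(rest, f, ";")
            if (n >= 6 && f[2] == svc && f[3] == "UNKNOWN") {
                out = f[6]
                for (i = 7; i <= n; i++) out = out ";" f[i] # output itself may contain ";"
                print ts "\t" f[1] "\t" out
            }
        }' "$logfile" |
    while IFS=$'\t' read -r ts host output; do
        # GNU date converts the epoch seconds to local time for comparison with nsclient.log.
        printf '%s  %s  %s\n' "$(date -d "@$ts" '+%F %T')" "$host" "$output"
    done
}
```

Example (service name is illustrative): `list_unknown_alerts 'Windows Backup'`, then compare the printed times with the same minutes in nsclient.log on the Windows host.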
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
roland.konrad
Posts: 7
Joined: Wed Sep 02, 2015 3:21 am

Re: Annoying Unknown Service after Updating it

Post by roland.konrad »

There is no error in the nsclient.log.
The error is the same error it showed before I changed the command syntax.

Invalid command line: unrecognised option ...

As I said, once I force a check the error is gone and the status switches to OK.
If I don't force a check, the Unknown status goes away after some time and comes back later.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Annoying Unknown Service after Updating it

Post by cdienger »

5.5.1 was just released with a bug fix dealing with checks. Update to 5.5.1 and let us know if the behavior continues.
roland.konrad
Posts: 7
Joined: Wed Sep 02, 2015 3:21 am

Re: Annoying Unknown Service after Updating it

Post by roland.konrad »

I upgraded to 5.5.1 - the problem persists.

In the meantime I removed every trace of the old command from the database.
I just don't understand how Nagios 'remembers' a command I used 7 days ago.
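One way to check whether the old syntax survives anywhere outside the database is to search the config files Core actually reads. A minimal sketch, assuming those files live under `/usr/local/nagios/etc`; `find_stale_option` is a hypothetical helper, and `OLD_OPTION` in the usage line is a placeholder for the option from the old command (it is elided in the error message above).

```shell
# find_stale_option PATTERN [CFG_DIR]  (hypothetical helper)
# Lists every line in the *.cfg files under CFG_DIR that still contains PATTERN.
# Exit status follows grep: 0 = found, 1 = not found.
find_stale_option() {
    local pattern="$1"
    local cfg_dir="${2:-/usr/local/nagios/etc}"
    # -r recurse, -n show line numbers, -F match PATTERN literally, not as a regex
    grep -rnF --include='*.cfg' -- "$pattern" "$cfg_dir"
}
```

Example: `find_stale_option 'OLD_OPTION' || echo "not found in any .cfg file"`.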

To me it looks like Nagios queries NSClient with an 'old' command it 'remembers' instead of looking in its database.
The client replies by saying it doesn't recognise the option and gives a list of options it knows.
EDIT: I keep playing with it. Another explanation would be that Nagios 'remembers' the client's reply. It still makes no sense to me.
Sometimes I force a check and it goes to ok - just to switch back to unknown a few seconds later.
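The second explanation can be tested directly: when state retention is enabled, Core saves each service's last plugin output in its retention file and reloads it on restart. The sketch below searches that file for the old error text. It assumes the default path `/usr/local/nagios/var/retention.dat` and that file's `service { ... key=value ... }` block layout; `find_retained_output` is a hypothetical name.

```shell
# find_retained_output PATTERN [RETENTION_FILE]  (hypothetical helper)
# Prints "host;service" for every service block whose saved plugin_output contains PATTERN.
find_retained_output() {
    local pattern="$1"
    local file="${2:-/usr/local/nagios/var/retention.dat}"
    awk -v pat="$pattern" '
        # Start of a service block: reset the fields collected for it.
        /^[ \t]*service[ \t]*[{]/ { in_svc = 1; host = ""; svc = ""; out = ""; next }
        !in_svc { next }
        { line = $0; sub(/^[ \t]+/, "", line) }
        # End of the block: report it if the saved output contains the pattern.
        line == "}" {
            if (index(out, pat) > 0) print host ";" svc
            in_svc = 0
            next
        }
        line ~ /^host_name=/           { host = substr(line, 11) }
        line ~ /^service_description=/ { svc = substr(line, 21) }
        line ~ /^plugin_output=/       { out = substr(line, 15) }
    ' "$file"
}
```

Example: `find_retained_output 'unrecognised option'`. Any hit means the old reply is still stored as that service's retained state.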

If I force a check, it always uses the latest version of the command.

I am sorry to keep bothering you, but with services going on and off it feels like Las Vegas...
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: Annoying Unknown Service after Updating it

Post by jomann »

Have you re-written your entire configuration from the CCM database by going into the CCM and selecting "Config File Management" and clicking on Delete Files, Write Config, and then Verify/Apply? This would ensure that all your files for Core are written out according to what is in the CCM. If you are nervous I would recommend making a backup of /usr/local/nagios/etc just in case.
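If you want that backup in one step, a timestamped tarball is enough. A minimal sketch; the default destination `/root` and the helper name `backup_nagios_etc` are assumptions, not part of XI.

```shell
# backup_nagios_etc [SRC_DIR] [DEST_DIR]  (hypothetical helper)
# Writes a timestamped .tar.gz of SRC_DIR into DEST_DIR and prints the archive path.
backup_nagios_etc() {
    local src="${1:-/usr/local/nagios/etc}"
    local dest="${2:-/root}"
    local archive
    archive="$dest/nagios-etc-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C switches to the parent directory so the archive holds a relative "etc/" tree.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    printf '%s\n' "$archive"
}
```

To restore, extract the archive back into the parent directory, e.g. `tar -xzf ARCHIVE -C /usr/local/nagios`.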

Also, can you run and copy the output of this to verify that Nagios Core only has one instance running?

ps aux | grep nagios
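`ps aux` does not show parent PIDs, so two `nagios -d` lines can be hard to interpret. A variant that lists PID and PPID makes it possible to tell a separate top-level daemon from a child of the main one. The sketch below (hypothetical function name, bash plus awk) counts `nagios -d` processes whose parent is not itself a `nagios -d` process; it assumes the binary path ends in `/bin/nagios` as in the default install.

```shell
# count_nagios_instances  (hypothetical helper)
# Reads "PID PPID ARGS" lines on stdin and prints the number of top-level "nagios -d" daemons.
count_nagios_instances() {
    awk '
        # Remember every ".../bin/nagios -d ..." process and its parent PID.
        $3 ~ /\/bin\/nagios$/ && $4 == "-d" { daemon[$1] = $2 }
        END {
            n = 0
            for (p in daemon) {
                pp = daemon[p]
                # A daemon whose parent is another daemon is a child, not a separate instance.
                if (!(pp in daemon)) n++
            }
            print n
        }'
}
```

Example: `ps -eo pid,ppid,args | count_nagios_instances`. Worker processes (`--worker`) are ignored; a result above 1 would point to more than one independent Core instance.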
roland.konrad
Posts: 7
Joined: Wed Sep 02, 2015 3:21 am

Re: Annoying Unknown Service after Updating it

Post by roland.konrad »

I deleted files, wrote config and verified/applied.
There were some minor issues which I corrected (check interval > notification interval).
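The same kind of pre-flight check can be repeated from a shell with Core's `-v` option (`/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`). The parser below is a sketch that assumes the `Total Warnings:` and `Total Errors:` summary lines Core prints at the end of that check; `verify_summary` is a hypothetical name.

```shell
# verify_summary  (hypothetical helper)
# Reads "nagios -v" output on stdin, prints "WARNINGS ERRORS",
# and returns 1 if there are errors, 2 if no summary was found.
verify_summary() {
    awk '
        /^Total Warnings:/ { w = $3 }
        /^Total Errors:/   { e = $3 }
        END {
            if (e == "") { print "no summary found" > "/dev/stderr"; exit 2 }
            print w + 0, e + 0
            exit (e + 0 > 0)
        }'
}
```

Example: `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | verify_summary`.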

Then I rebooted the server to make sure it had a clean start.

Then I ran ps aux | grep nagios with the following output:
    [root@nagios ~]# ps aux | grep nagios
    nagios 1455 0.0 0.0 41360 1436 ? Ss 08:19 0:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    nagios 1585 0.0 0.0 303352 956 ? S 08:19 0:00 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
    nagios 1597 0.0 0.0 49828 840 ? Ss 08:19 0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
    nagios 1650 0.3 0.2 22532 3864 ? Ss 08:19 0:21 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
    nagios 1655 0.0 0.0 10036 1068 ? S 08:19 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
    nagios 1656 0.0 0.0 10036 1044 ? S 08:19 0:03 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
    nagios 1657 0.0 0.0 10036 1064 ? S 08:19 0:02 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
    nagios 1658 0.0 0.0 10036 1068 ? S 08:19 0:03 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
    nagios 1662 0.0 0.0 49828 1620 ? S 08:19 0:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
    nagios 1663 1.2 0.1 50372 2276 ? S 08:19 1:12 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
    nagios 1682 0.0 0.1 22016 2736 ? S 08:19 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
    postgres 2001 0.0 0.3 217856 7152 ? Ss 08:19 0:00 postgres: nagiosxi nagiosxi ::1(47648) idle
    postgres 2009 0.0 0.3 217856 7036 ? Ss 08:19 0:00 postgres: nagiosxi nagiosxi ::1(47655) idle
    postgres 2012 0.0 0.3 217240 6252 ? Ss 08:19 0:00 postgres: nagiosxi nagiosxi ::1(47656) idle
    postgres 2018 0.0 0.3 217240 6252 ? Ss 08:19 0:00 postgres: nagiosxi nagiosxi ::1(47658) idle
    postgres 2021 0.0 0.3 217240 6276 ? Ss 08:19 0:00 postgres: nagiosxi nagiosxi ::1(47659) idle
    postgres 2781 0.0 0.3 217240 6256 ? Ss 08:20 0:00 postgres: nagiosxi nagiosxi ::1(47792) idle
    postgres 2784 0.0 0.3 217828 6840 ? Ss 08:20 0:00 postgres: nagiosxi nagiosxi ::1(47793) idle
    postgres 2861 0.0 0.3 217240 6020 ? Ss 08:20 0:00 postgres: nagiosxi nagiosxi ::1(47805) idle
    postgres 3016 0.0 0.3 217240 6252 ? Ss 08:20 0:00 postgres: nagiosxi nagiosxi ::1(47841) idle
    postgres 4375 0.0 0.3 217312 6256 ? Ss 08:21 0:00 postgres: nagiosxi nagiosxi ::1(48104) idle
    postgres 7904 0.0 0.3 217244 6228 ? Ss 08:23 0:00 postgres: nagiosxi nagiosxi ::1(48610) idle
    nagios 12450 0.0 0.0 106060 1268 ? Ss 09:55 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php >> /usr/local/nagiosxi/var/sysstat.log 2>&1
    nagios 12452 0.0 0.0 106060 1264 ? Ss 09:55 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php >> /usr/local/nagiosxi/var/feedproc.log 2>&1
    nagios 12453 0.0 0.0 106060 1264 ? Ss 09:55 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php >> /usr/local/nagiosxi/var/eventman.log 2>&1
    nagios 12457 0.0 0.0 106060 1260 ? Ss 09:55 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php >> /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
    nagios 12458 0.0 0.0 106060 1264 ? Ss 09:55 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php >> /usr/local/nagiosxi/var/event_handler.log 2>&1
    nagios 12459 0.0 0.0 106060 1264 ? Ss 09:55 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php >> /usr/local/nagiosxi/var/perfdataproc.log 2>&1
    nagios 12461 0.4 1.3 321904 25172 ? S 09:55 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
    nagios 12463 0.3 1.2 321612 24924 ? S 09:55 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
    nagios 12464 0.5 1.8 332436 36024 ? S 09:55 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
    nagios 12468 0.6 1.8 332716 36172 ? S 09:55 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
    nagios 12469 0.4 1.3 322168 25296 ? S 09:55 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
    nagios 12470 0.4 1.2 321676 24924 ? S 09:55 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/event_handler.php
    postgres 12483 0.0 0.2 217244 5764 ? Ss 09:55 0:00 postgres: nagiosxi nagiosxi ::1(39534) idle
    postgres 12485 0.0 0.2 217104 4956 ? Ss 09:55 0:00 postgres: nagiosxi nagiosxi ::1(39536) idle
    postgres 12487 0.0 0.3 217244 5776 ? Ss 09:55 0:00 postgres: nagiosxi nagiosxi ::1(39538) idle
    postgres 12488 0.0 0.3 217244 5824 ? Ss 09:55 0:00 postgres: nagiosxi nagiosxi ::1(39539) idle
    postgres 12533 0.0 0.3 217264 6256 ? Ss 09:55 0:00 postgres: nagiosxi nagiosxi ::1(39542) idle
    postgres 12537 0.0 0.3 217304 6120 ? Ss 09:55 0:00 postgres: nagiosxi nagiosxi ::1(39544) idle
    nagios 12928 4.8 1.9 206852 36936 ? S 09:55 0:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_esx3.pl -H 188.20.66.67:11444 -f /usr/local/nagiosxi/etc/components/vmware/esxi_rrp_at_auth.txt -l CPU
    nagios 13015 0.0 0.0 106060 1268 ? S 09:55 0:00 sh -c /usr/bin/iostat -c 5 2 | tail --lines=2 | head --lines=1 | awk '{ print $1,$2,$3,$4,$5,$6 }'
    nagios 13016 0.0 0.0 100960 844 ? S 09:55 0:00 /usr/bin/iostat -c 5 2
    nagios 13017 0.0 0.0 100944 696 ? S 09:55 0:00 tail --lines=2
    nagios 13018 0.0 0.0 100920 676 ? S 09:55 0:00 head --lines=1
    nagios 13019 0.0 0.0 105944 968 ? S 09:55 0:00 awk { print $1,$2,$3,$4,$5,$6 }
    root 13074 0.0 0.0 103248 884 pts/0 S+ 09:55 0:00 grep nagios
    [root@nagios ~]#
roland.konrad
Posts: 7
Joined: Wed Sep 02, 2015 3:21 am

Re: Annoying Unknown Service after Updating it

Post by roland.konrad »

Oh, I forgot to say that the problem still persists ...
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Annoying Unknown Service after Updating it

Post by npolovenko »

@roland.konrad, are you using Mod-Gearman workers on the XI server? If so, you may want to upgrade them using the instructions here:
https://assets.nagios.com/downloads/nag ... ios_XI.pdf

Also, you may try removing the following files:

/usr/local/nagios/var/status.dat
and

/usr/local/nagios/var/retention.dat
So run these commands in order:

service nagios stop
killall -9 nagios
rm /usr/local/nagios/var/status.dat
rm /usr/local/nagios/var/retention.dat
service nagios start
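A slightly more cautious variant of these steps renames the state files with a timestamp instead of deleting them, so they can be inspected or restored later; that matters because the retention file also holds retained state such as comments and scheduled downtime. This is a sketch assuming the default XI paths and the same `service` and `killall` commands used above; both function names are hypothetical.

```shell
# move_state_files [VAR_DIR]  (hypothetical helper)
# Renames status.dat and retention.dat (if present) to <name>.<timestamp>.
move_state_files() {
    local var_dir="${1:-/usr/local/nagios/var}"
    local stamp f
    stamp=$(date +%Y%m%d-%H%M%S)
    for f in status.dat retention.dat; do
        # Either file may be missing, so skip it quietly instead of failing.
        if [ -e "$var_dir/$f" ]; then
            mv -- "$var_dir/$f" "$var_dir/$f.$stamp" || return 1
            printf 'moved %s -> %s\n' "$var_dir/$f" "$var_dir/$f.$stamp"
        fi
    done
}

# reset_nagios_state  (hypothetical helper): stop Core, set the state files aside, start Core.
reset_nagios_state() {
    service nagios stop
    killall -9 nagios          # make sure no stray nagios process survives the stop
    move_state_files /usr/local/nagios/var || return 1
    service nagios start
}
```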
roland.konrad
Posts: 7
Joined: Wed Sep 02, 2015 3:21 am

Re: Annoying Unknown Service after Updating it

Post by roland.konrad »

Thank you, that seems to have done the trick.

I stopped the service. There was no status.dat, but there was a retention.dat.

Looks good!
Thanks a lot!