no checks
Posted: Wed Oct 30, 2019 3:59 am
by scharft
hi,
Today Nagios didn't run any checks (host or service) from about 6:30 pm to 9:30 pm.
Every Nagios service was running, and the System Component Status and Monitoring Engine Status were good.
Maybe the high times are an indicator?
Service Check Latency
Min 0.00 sec
Max 108.37 sec
Avg 0.19 sec
Service Check Execution Time
Min 0.00 sec
Max 96.16 sec
Avg 1.67 sec
How can I find the issue?
A host reboot fixed the issue for the moment, but we should find a solution to prevent this from happening again.
Best Regards
Thomas
Re: no checks
Posted: Wed Oct 30, 2019 9:55 am
by benjaminsmith
Hi Thomas,
The check latency suggests that one or more service checks may not have run as scheduled. We'd like to review the logs on the server. Can you send us the current nagios.log? It's in the /usr/local/nagios/var directory.
Please also send a fresh system profile and the /var/log/messages file. Thanks.
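If it helps to narrow down when checks stopped before sending the logs, here is a minimal sketch that scans nagios.log for quiet periods. It assumes each entry starts with a Unix timestamp in square brackets, as in `[1572433299] ...`; the function name `find_gaps` and the 10-minute threshold are only illustrative, not part of Nagios.

```python
import re
from typing import Iterable, List, Tuple

# nagios.log entries start with a Unix timestamp in square brackets.
_TS = re.compile(r"^\[(\d+)\]")


def find_gaps(lines: Iterable[str], min_gap_seconds: int = 600) -> List[Tuple[int, int]]:
    """Return (last_seen, next_seen) timestamp pairs where the log
    was silent for at least min_gap_seconds."""
    gaps = []
    previous = None
    for line in lines:
        match = _TS.match(line)
        if not match:
            continue  # skip lines that carry no timestamp
        current = int(match.group(1))
        if previous is not None and current - previous >= min_gap_seconds:
            gaps.append((previous, current))
        previous = current
    return gaps
```

To use it, open /usr/local/nagios/var/nagios.log and pass the file object to `find_gaps`. A gap only shows that nothing was logged in that period, so it is a hint for where to look rather than proof that checks stopped.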
Re: no checks
Posted: Wed Oct 30, 2019 4:18 pm
by tgriep
Thanks for the profile, but we need the full /usr/local/nagios/var/nagios.log file and the /var/log/messages file covering the time when the issue happened.
I would also like to look at this file, because I saw some Postgres database issues:
/var/lib/pgsql/data/pg_log/postgresql-Wed.log
I took a look at the profile and I see what looks like a passive check sending in a lot of bad data. You should fix that. Here is an excerpt:
At line:1 char:1
+ C:\\MONITORING\\check_passive.ps1 ; exit($lastexitcode)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (C:\\MONITORING\\check_passive.p
s1:String) [], CommandNotFoundEx
n returned error Unknown or unsupported command
[1572433299] Warning: Unrecognized external command -> CommandNotFoundEx
n;
[1572433299] External command function, script file, or operable program. Check the spelling of the name, or
if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ C:\\MONITORING\\check_passive.ps1 ; exit($lastexitcode)
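The excerpt shows a multi-line PowerShell error being read as a series of external commands. As a rough sketch of one way to avoid that, the sender can collapse the script output to a single line and force an unexpected exit code to UNKNOWN before building the result. The line layout below is the standard PROCESS_SERVICE_CHECK_RESULT external command format. How the results actually reach this server (NSCA, NRDP, or something else) is not known from the profile, and `format_passive_result` is a hypothetical helper name.

```python
import time
from typing import Optional

VALID_STATES = {0, 1, 2, 3}  # OK, WARNING, CRITICAL, UNKNOWN


def format_passive_result(host: str, service: str, state: int,
                          output: str, now: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the external command file."""
    for field in (host, service):
        # A semicolon or newline here would shift or split the fields.
        if ";" in field or "\n" in field or "\r" in field:
            raise ValueError("host and service must not contain ';' or newlines")
    if state not in VALID_STATES:
        state = 3  # report anything unexpected as UNKNOWN
    # Collapse newlines and repeated whitespace so the result stays on one line.
    one_line = " ".join(output.split())
    timestamp = int(time.time()) if now is None else now
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{one_line}"
```

With this, a failed script still produces a single well-formed UNKNOWN result instead of several unrecognized command lines. The underlying problem, the missing check_passive.ps1 on the Windows side, still has to be fixed.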
According to the profile, the nagios process is not running. Please start it.
The kernel message queue needs to be increased. Follow this article to do that.
https://support.nagios.com/kb/article/n ... d-139.html
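To see the current settings before and after changing them, a small sketch like this reads the System V message queue limits that Linux exposes under /proc/sys/kernel. It only reports values. Which values to set is covered by the article, and `read_msg_limits` is just an illustrative name.

```python
from pathlib import Path
from typing import Dict, Optional

# Linux exposes the System V message queue limits as files with these names.
LIMIT_NAMES = ("msgmax", "msgmnb", "msgmni")


def read_msg_limits(base: str = "/proc/sys/kernel") -> Dict[str, Optional[int]]:
    """Return the current message queue limits, or None where a value cannot be read."""
    limits: Dict[str, Optional[int]] = {}
    for name in LIMIT_NAMES:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None  # file missing, unreadable, or not a number
    return limits
```

Calling `read_msg_limits()` on the Nagios server returns a dictionary keyed by `msgmax`, `msgmnb`, and `msgmni`.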
Re: no checks
Posted: Mon Nov 04, 2019 1:30 am
by scharft
It happened again. Setting a downtime in Nagios was not possible. Production phoned me while I was asleep, and many alarms were raised because they were not able to set the downtime.
So it was not possible to download the system profile before I rebooted the server.
The message queue has been increased.
The log files from Wednesday and today are attached. The nagios.log file is 96 MB, which is too big for this upload.
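One possible way around the size limit is to cut out only the hours around the outage. This is a minimal sketch, assuming nagios.log entries begin with a Unix timestamp in square brackets. The name `lines_in_window` is hypothetical, and the start and end values must be supplied as epoch seconds.

```python
import re
from typing import Iterable, Iterator

# nagios.log entries start with a Unix timestamp in square brackets.
_TS = re.compile(r"^\[(\d+)\]")


def lines_in_window(lines: Iterable[str], start: int, end: int) -> Iterator[str]:
    """Yield entries whose timestamp lies in [start, end], together with
    any untimestamped lines that directly follow such an entry."""
    keep = False
    for line in lines:
        match = _TS.match(line)
        if match:
            keep = start <= int(match.group(1)) <= end
        if keep:
            yield line
```

The epoch values can be derived with something like `int(datetime(2019, 10, 30, 18, 30).timestamp())`, which interprets the time in the server's local time zone. Writing the yielded lines to a new file gives a much smaller extract to attach.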
Re: no checks
Posted: Mon Nov 04, 2019 2:20 am
by scharft
The passive checks go into a stale state every second day.
Enough of sending log files to you.
I want a technician to work on our Nagios server remotely and fix this issue THIS WEEK.
Re: no checks
Posted: Mon Nov 04, 2019 3:26 pm
by benjaminsmith
Hello Thomas,
I fully appreciate the inconvenience this has caused you, and I would like to keep working with you on this. We believe you have issues with your passive check configurations, so let's get a ticket opened and set up a remote session.
The log files are necessary for troubleshooting, and they also help the technicians prepare for a successful remote session. In short, the logs shorten the time to resolution.
To open a support ticket, please visit:
https://support.nagios.com/tickets/