All checks have stopped. Not check since yesterday

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

All checks have stopped. Not check since yesterday

Post by krobertson71 »

All checks are showing that they have not run since 2/22 around 4:00 pm EST.

Nothing in nagios.log other than a lack of activity. We didn't notice until about an hour ago, when one of our apps went down and no alert went out.
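
For anyone hitting the same symptom, one quick way to confirm when the engine last executed anything is to pull the newest `last_check` timestamp out of the status file. This is only a rough sketch: the helper name `latest_check` is made up for illustration, and the status file path in the usage note is the usual default, which may differ on your install.

```shell
# latest_check FILE
# Print the newest last_check epoch timestamp found in a Nagios status file.
# Hypothetical helper for illustration; not part of Nagios XI.
latest_check() {
    awk -F= '
        # status.dat stores one key=value pair per (indented) line
        /^[[:space:]]*last_check=/ { if ($2 + 0 > max) max = $2 + 0 }
        END { printf "%d\n", max }
    ' "$1"
}
```

With GNU date, something like `date -d "@$(latest_check /usr/local/nagios/var/status.dat)"` turns the result into a readable time. If that time is hours old, checks are not being executed even though the service reports as running.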

I need help fast on this.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: All checks have stopped. Not check since yesterday

Post by krobertson71 »

Tried restarting Nagios, rebooting, restarting the Monitoring engine.
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: All checks have stopped. Not check since yesterday

Post by hsmith »

A few things:

Code: Select all

chage -l nagios
service crond status
service nagios status
service mysqld status
tail -n25 /var/log/cron
tail -n25 /var/log/messages
top -b -n1 | head -n5
Former Nagios Employee.
me.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: All checks have stopped. Not check since yesterday

Post by rkennedy »

Can you send over a profile as well?
Former Nagios Employee
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: All checks have stopped. Not check since yesterday

Post by krobertson71 »

Note that we attempted to configure SSL on Nagios XI last Friday. This was done incorrectly and was reset. The only thing we did not do is remove our root CA from the CA store. The server was rebooted today, but the monitoring process still shows as running since yesterday.

Code: Select all

[nagios@nagiasp01 var]$ chage -l nagios
Last password change					: Dec 19, 2014
Password expires					: never
Password inactive					: never
Account expires						: never
Minimum number of days between password change		: 0
Maximum number of days between password change		: 99999
Number of days of warning before password expires	: 7
All services are running.

CRON log

Code: Select all

Feb 23 14:23:01 nagiasp01 CROND[16936]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)

Feb 23 14:23:01 nagiasp01 CROND[16940]: (root) CMD (/opt/numara-software/footprints-asset-core/client/bin/agent_check.sh)

Feb 23 14:23:01 nagiasp01 CROND[16937]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)

Feb 23 14:23:01 nagiasp01 CROND[16938]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18050]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18051]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18052]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18053]: (root) CMD (/opt/numara-software/footprints-asset-core/client/bin/agent_check.sh)

Feb 23 14:24:01 nagiasp01 CROND[18057]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18058]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18059]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18060]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)

Feb 23 14:24:01 nagiasp01 CROND[18068]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19185]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19187]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php > /usr/local/nagiosxi/var/deadpool.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19188]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19186]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19190]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19189]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19191]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19198]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19197]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19202]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)

Feb 23 14:25:01 nagiasp01 CROND[19201]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)

Feb 23 14:25:01 nagiasp01 CROND[19204]: (root) CMD (/opt/numara-software/footprints-asset-core/client/bin/agent_check.sh)
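
A cron excerpt like this is easier to scan as a per-script count. The following is a minimal sketch, assuming the log lines look like the ones above; the function name `summarize_cron` is hypothetical.

```shell
# summarize_cron FILE...
# Count how many times each Nagios XI cron script appears in a cron log.
# Hypothetical helper; assumes entries contain ".../nagiosxi/cron/<name>.php".
summarize_cron() {
    awk '
        match($0, "nagiosxi/cron/[A-Za-z_]+[.]php") {
            # skip the 14-character "nagiosxi/cron/" prefix, keep the script name
            name = substr($0, RSTART + 14, RLENGTH - 14)
            count[name]++
        }
        END { for (n in count) print count[n], n }
    ' "$@" | sort -k1,1nr -k2,2
}
```

Running `summarize_cron /var/log/cron` lists the busiest scripts first. Lines from other jobs, such as the agent_check.sh entries, are ignored.
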
MESSAGES log

Code: Select all

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'carrzsp01.dcri.duke.edu' (config file '/usr/local/nagios/etc/services/carrzsp01.dcri.duke.edu.cfg', starting on line 33)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'im46rdbp06.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/im46rdbp06.dhe.duke.edu.cfg', starting on line 85)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'im46rdbp06.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/im46rdbp06.dhe.duke.edu.cfg', starting on line 33)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 120)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 103)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 50)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 33)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'ldapasp02.dcri.duke.net' (config file '/usr/local/nagios/etc/services/ldapasp02.dcri.duke.net.cfg', starting on line 33)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'bsigasp01.dcri.duke.net' (config file '/usr/local/nagios/etc/services/bsigasp01.dcri.duke.net.cfg', starting on line 33)

Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'is46tdbp08.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/is46tdbp08.dhe.duke.edu.cfg', starting on line 85)

Feb 23 14:02:31 nagiasp01 rsyslogd-2177: imuxsock begins to drop messages from pid 27219 due to rate-limiting

Feb 23 14:02:39 nagiasp01 ndo2db: Warning: Could not set effective GID=10003

Feb 23 14:02:39 nagiasp01 rsyslogd-2177: imuxsock lost 316 messages from pid 27219 due to rate-limiting

Feb 23 14:02:39 nagiasp01 nagios: ndomod: Error writing to data sink!  Some output may get lost...

Feb 23 14:02:39 nagiasp01 nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters

Feb 23 14:02:55 nagiasp01 nagios: ndomod: Successfully reconnected to data sink!  0 items lost, 361 queued items to flush.

Feb 23 14:02:55 nagiasp01 nagios: ndomod: Successfully flushed 361 queued items to data sink.

Feb 23 14:11:27 nagiasp01 nagios: HOST ALERT: pediatrictrials.org;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds

Feb 23 14:11:29 nagiasp01 nagios: SERVICE ALERT: in46addbp1.dhe.duke.edu;CPU Usage;WARNING;SOFT;1;WARNING: percent was 92%

Feb 23 14:12:23 nagiasp01 nagios: HOST ALERT: pediatrictrials.org;UP;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 37862 bytes in 0.249 second response time

Feb 23 14:12:24 nagiasp01 nagios: SERVICE ALERT: in46addbp1.dhe.duke.edu;CPU Usage;OK;SOFT;2;OK: percent was 79%

Feb 23 14:18:22 nagiasp01 nagios: HOST ALERT: iloe7vi3s08.dcri.duke.net;DOWN;SOFT;1;CRITICAL - 10.0.105.158: rta nan, lost 100%

Feb 23 14:19:20 nagiasp01 nagios: HOST ALERT: iloe7vi3s08.dcri.duke.net;UP;SOFT;2;OK - 10.0.105.158: rta 0.723ms, lost 0%

Feb 23 14:23:22 nagiasp01 nagios: HOST ALERT: iloe7vi3s12.dcri.duke.net;DOWN;SOFT;1;CRITICAL - 10.0.105.162: rta nan, lost 100%

Feb 23 14:24:20 nagiasp01 nagios: HOST ALERT: iloe7vi3s12.dcri.duke.net;UP;SOFT;2;OK - 10.0.105.162: rta 0.752ms, lost 0%
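
The duplicate-definition warnings above can be grouped by host and service to see which config files need cleaning up. This is a rough sketch that assumes the warning text matches the lines shown here; `duplicate_services` is a made-up name. Since rsyslog reports dropping messages due to rate-limiting, the counts should be read as a lower bound.

```shell
# duplicate_services FILE...
# Print "<count> <host>: <service>" for each duplicate-definition warning.
# Hypothetical helper; relies on the exact warning wording in the log above.
duplicate_services() {
    sed -n "s/.*Duplicate definition found for service '\([^']*\)' on host '\([^']*\)'.*/\2: \1/p" "$@" |
        sort | uniq -c | sed 's/^ *//'   # strip the padding uniq -c adds
}
```

For example, `duplicate_services /var/log/messages` prints one line per host and service pair, prefixed with how many warnings were logged for it.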

TOP (Also included free -m)

Code: Select all

top - 14:11:50 up  3:00,  2 users,  load average: 0.63, 0.83, 0.90
Tasks: 330 total,   1 running, 329 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.8%us,  2.8%sy,  0.0%ni, 84.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16333568k total,  2742316k used, 13591252k free,    96512k buffers
Swap:  2097148k total,        0k used,  2097148k free,  1232008k cached
[nagios@nagiasp01 var]$ free -m
             total       used       free     shared    buffers     cached
Mem:         15950       2706      13244         13         94       1203
-/+ buffers/cache:       1408      14542 
Swap:         2047          0       2047 
Last edited by hsmith on Tue Feb 23, 2016 2:43 pm, edited 1 time in total.
Reason: We have downloaded the copy of your profile, and removed it from this post. We don't like to leave profiles sitting on the forum due to sensitive information possibly being inside of them.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: All checks have stopped. Not check since yesterday

Post by rkennedy »

Are you using LDAP to authenticate? I noticed this in your Apache error log:

Code: Select all

[Tue Feb 23 13:11:56 2016] [error] [client 10.14.18.114] PHP Warning:  ldap_bind(): Unable to bind to server: Invalid credentials in /usr/local/nagiosxi/html/includes/components/ldap_ad_integration/adLDAP/src/adLDAP.php on line 714, referer: http://nagiasp01.dcri.duke.net/nagiosxi/login.php?logout
Former Nagios Employee
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: All checks have stopped. Not check since yesterday

Post by krobertson71 »

LDAP integration is fine. That is LastPass getting in the way when I navigate to the login page from time to time; it tries to fill in the default nagiosadmin credentials.

I just tested and everything is fine with LDAP and AD integration.
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: All checks have stopped. Not check since yesterday

Post by krobertson71 »

We can turn this into a remote session if it will help. Just let me know soon.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: All checks have stopped. Not check since yesterday

Post by tmcdonald »

If this is time-sensitive, I would strongly urge you to open an email ticket. Email [email protected] with a link to this thread and a short description of the issue, and we'll see about getting a remote session set up.
Former Nagios employee
krobertson71
Posts: 444
Joined: Tue Feb 11, 2014 10:16 pm

Re: All checks have stopped. Not check since yesterday

Post by krobertson71 »

Found this as well.

Code: Select all

ndomod: Error writing to data sink!  Some output may get lost...
Feb 23 13:57:41 nagiasp01 nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters
The keys we generated are still there, but we disabled SSL. I'm curious whether this could still be an issue.
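
One thing that might be worth ruling out is a leftover SSL setting on only one side of the ndomod/ndo2db connection. The sketch below just reads a single key from a key=value config file. The helper `cfg_value` is hypothetical, and the option name `use_ssl` and the file locations in the usage note are assumptions that should be checked against the actual install.

```shell
# cfg_value KEY FILE
# Print the last uncommented value assigned to KEY in a key=value config file.
# Hypothetical helper for illustration only.
cfg_value() {
    awk -F= -v key="$1" '
        /^[[:space:]]*#/      { next }   # skip comment lines
        index($0, "=") == 0   { next }   # skip lines without an assignment
        {
            k = $1
            gsub(/[[:space:]]/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
                val = v
                found = 1
            }
        }
        END { if (found) print val }
    ' "$2"
}
```

Comparing `cfg_value use_ssl /usr/local/nagios/etc/ndomod.cfg` with `cfg_value use_ssl /usr/local/nagios/etc/ndo2db.cfg` would show whether the two sides agree. A mismatch would at least be consistent with the "SSL Parameters" hint in the error, though the database connection is just as likely a cause.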