All checks have stopped. Not check since yesterday
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
All checks have stopped. Not check since yesterday
All checks are showing they have not run since 2/22 around 4:00 pm EST.
Nothing in Nagios.log other than lack of activity. We didn't notice until about an hour ago when one of our apps went down and no alert went out.
I need help fast on this.
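For anyone who hits the same symptom: one rough way to tell whether the monitoring core itself has stalled is to check whether its status file is still being rewritten. The sketch below is not from this thread. It assumes a source-style install where the status file is /usr/local/nagios/var/status.dat (check `status_file` in nagios.cfg), and `is_fresh` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether a file was modified within the last N minutes.
# is_fresh is a hypothetical helper, not part of Nagios.
is_fresh() {
    file=$1
    max_age_min=$2
    # find prints the path only if its mtime is newer than max_age_min minutes;
    # a missing file produces no output, so it counts as stale.
    [ -n "$(find "$file" -mmin "-$max_age_min" 2>/dev/null)" ]
}

# Assumed default location; override with STATUS_FILE=/path/to/status.dat
STATUS_FILE=${STATUS_FILE:-/usr/local/nagios/var/status.dat}

if is_fresh "$STATUS_FILE" 5; then
    echo "OK: $STATUS_FILE updated within the last 5 minutes"
else
    echo "STALE: $STATUS_FILE missing or not updated in the last 5 minutes"
fi
```

If the file is fresh but the web interface still shows old check times, the core is probably running and the problem is more likely between the core and the database than in the scheduler itself.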
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: All checks have stopped. Not check since yesterday
Tried restarting Nagios, rebooting, restarting the Monitoring engine.
Re: All checks have stopped. Not check since yesterday
A few things:
Code: Select all
chage -l nagios
service crond status
service nagios status
service mysqld status
tail -n25 /var/log/cron
tail -n25 /var/log/messages
top -b -n1 | head -n5
Former Nagios Employee
Re: All checks have stopped. Not check since yesterday
Can you send over a profile as well?
Former Nagios Employee
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: All checks have stopped. Not check since yesterday
Note that we did attempt to configure SSL on Nagios XI last Friday. This was done incorrectly and was reset. The only thing we did not do is remove our root CA from the CA store. The server was rebooted today, but the monitoring process still shows as running since yesterday.
All services running
CRON log
MESSAGES log
TOP (Also included free -m)
Code: Select all
[nagios@nagiasp01 var]$ chage -l nagios
Last password change : Dec 19, 2014
Password expires : never
Password inactive : never
Account expires : never
Minimum number of days between password change : 0
Maximum number of days between password change : 99999
Number of days of warning before password expires : 7
CRON log
Code: Select all
Feb 23 14:23:01 nagiasp01 CROND[16936]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Feb 23 14:23:01 nagiasp01 CROND[16940]: (root) CMD (/opt/numara-software/footprints-asset-core/client/bin/agent_check.sh)
Feb 23 14:23:01 nagiasp01 CROND[16937]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Feb 23 14:23:01 nagiasp01 CROND[16938]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18050]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18051]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18052]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18053]: (root) CMD (/opt/numara-software/footprints-asset-core/client/bin/agent_check.sh)
Feb 23 14:24:01 nagiasp01 CROND[18057]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18058]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18059]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18060]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Feb 23 14:24:01 nagiasp01 CROND[18068]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19185]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19187]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php > /usr/local/nagiosxi/var/deadpool.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19188]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19186]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19190]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19189]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19191]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19198]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19197]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19202]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Feb 23 14:25:01 nagiasp01 CROND[19201]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Feb 23 14:25:01 nagiasp01 CROND[19204]: (root) CMD (/opt/numara-software/footprints-asset-core/client/bin/agent_check.sh)
MESSAGES log
Code: Select all
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'carrzsp01.dcri.duke.edu' (config file '/usr/local/nagios/etc/services/carrzsp01.dcri.duke.edu.cfg', starting on line 33)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'im46rdbp06.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/im46rdbp06.dhe.duke.edu.cfg', starting on line 85)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'im46rdbp06.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/im46rdbp06.dhe.duke.edu.cfg', starting on line 33)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 120)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 103)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 50)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'oamsrvrp01.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/oamsrvrp01.dhe.duke.edu.cfg', starting on line 33)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'ldapasp02.dcri.duke.net' (config file '/usr/local/nagios/etc/services/ldapasp02.dcri.duke.net.cfg', starting on line 33)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _' on host 'bsigasp01.dcri.duke.net' (config file '/usr/local/nagios/etc/services/bsigasp01.dcri.duke.net.cfg', starting on line 33)
Feb 23 14:02:31 nagiasp01 nagios: Warning: Duplicate definition found for service 'Disk Usage on _opt' on host 'is46tdbp08.dhe.duke.edu' (config file '/usr/local/nagios/etc/services/is46tdbp08.dhe.duke.edu.cfg', starting on line 85)
Feb 23 14:02:31 nagiasp01 rsyslogd-2177: imuxsock begins to drop messages from pid 27219 due to rate-limiting
Feb 23 14:02:39 nagiasp01 ndo2db: Warning: Could not set effective GID=10003
Feb 23 14:02:39 nagiasp01 rsyslogd-2177: imuxsock lost 316 messages from pid 27219 due to rate-limiting
Feb 23 14:02:39 nagiasp01 nagios: ndomod: Error writing to data sink! Some output may get lost...
Feb 23 14:02:39 nagiasp01 nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters
Feb 23 14:02:55 nagiasp01 nagios: ndomod: Successfully reconnected to data sink! 0 items lost, 361 queued items to flush.
Feb 23 14:02:55 nagiasp01 nagios: ndomod: Successfully flushed 361 queued items to data sink.
Feb 23 14:11:27 nagiasp01 nagios: HOST ALERT: pediatrictrials.org;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Feb 23 14:11:29 nagiasp01 nagios: SERVICE ALERT: in46addbp1.dhe.duke.edu;CPU Usage;WARNING;SOFT;1;WARNING: percent was 92%
Feb 23 14:12:23 nagiasp01 nagios: HOST ALERT: pediatrictrials.org;UP;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 37862 bytes in 0.249 second response time
Feb 23 14:12:24 nagiasp01 nagios: SERVICE ALERT: in46addbp1.dhe.duke.edu;CPU Usage;OK;SOFT;2;OK: percent was 79%
Feb 23 14:18:22 nagiasp01 nagios: HOST ALERT: iloe7vi3s08.dcri.duke.net;DOWN;SOFT;1;CRITICAL - 10.0.105.158: rta nan, lost 100%
Feb 23 14:19:20 nagiasp01 nagios: HOST ALERT: iloe7vi3s08.dcri.duke.net;UP;SOFT;2;OK - 10.0.105.158: rta 0.723ms, lost 0%
Feb 23 14:23:22 nagiasp01 nagios: HOST ALERT: iloe7vi3s12.dcri.duke.net;DOWN;SOFT;1;CRITICAL - 10.0.105.162: rta nan, lost 100%
Feb 23 14:24:20 nagiasp01 nagios: HOST ALERT: iloe7vi3s12.dcri.duke.net;UP;SOFT;2;OK - 10.0.105.162: rta 0.752ms, lost 0%
TOP (Also included free -m)
Code: Select all
top - 14:11:50 up 3:00, 2 users, load average: 0.63, 0.83, 0.90
Tasks: 330 total, 1 running, 329 sleeping, 0 stopped, 0 zombie
Cpu(s): 12.8%us, 2.8%sy, 0.0%ni, 84.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 16333568k total, 2742316k used, 13591252k free, 96512k buffers
Swap: 2097148k total, 0k used, 2097148k free, 1232008k cached
[nagios@nagiasp01 var]$ free -m
             total       used       free     shared    buffers     cached
Mem:         15950       2706      13244         13         94       1203
-/+ buffers/cache:       1408      14542
Swap:         2047          0       2047
Last edited by hsmith on Tue Feb 23, 2016 2:43 pm, edited 1 time in total.
Reason: We have downloaded the copy of your profile, and removed it from this post. We don't like to leave profiles sitting on the forum due to sensitive information possibly being inside of them.
Re: All checks have stopped. Not check since yesterday
Are you using LDAP to authenticate? I noticed this in your Apache error log:
Code: Select all
[Tue Feb 23 13:11:56 2016] [error] [client 10.14.18.114] PHP Warning: ldap_bind(): Unable to bind to server: Invalid credentials in /usr/local/nagiosxi/html/includes/components/ldap_ad_integration/adLDAP/src/adLDAP.php on line 714, referer: http://nagiasp01.dcri.duke.net/nagiosxi/login.php?logout
Former Nagios Employee
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: All checks have stopped. Not check since yesterday
LDAP integration is fine. That is LastPass getting in the way when I navigate to the login page from time to time. It tries to fill in the default nagiosadmin credentials.
I just tested and everything is fine with LDAP and AD integration.
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: All checks have stopped. Not check since yesterday
We can turn this into a remote session if it will help. Just let me know soon.
Re: All checks have stopped. Not check since yesterday
If this is time-sensitive, I would strongly urge you to open an email ticket. Email [email protected] with a link to this thread and a short description of the issue, and we'll see about getting a remote session set up.
Former Nagios employee
-
krobertson71
- Posts: 444
- Joined: Tue Feb 11, 2014 10:16 pm
Re: All checks have stopped. Not check since yesterday
Found this as well.
The keys we generated are still there, but we disabled SSL. Curious whether this could still be an issue.
Code: Select all
ndomod: Error writing to data sink! Some output may get lost...
Feb 23 13:57:41 nagiasp01 nagios: ndomod: Please check remote ndo2db log, database connection or SSL Parameters
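A rough way to see how often this error occurs and whether ndomod reconnected afterwards is to search syslog for the two messages shown in this thread. This is only a sketch: it assumes syslog is written to /var/log/messages as in the excerpts above, and `count_sink_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count ndomod data sink write errors in a syslog file.
# count_sink_errors is a hypothetical helper, not a Nagios tool.
count_sink_errors() {
    # grep -c prints the number of matching lines (0 when there are none)
    grep -c 'ndomod: Error writing to data sink' "$1"
}

# Assumed log location; override with LOG=/path/to/file
LOG=${LOG:-/var/log/messages}

if [ -r "$LOG" ]; then
    echo "data sink write errors: $(count_sink_errors "$LOG")"
    # Show the most recent reconnects, if any, for comparison
    grep 'ndomod: Successfully reconnected to data sink' "$LOG" | tail -n 3
fi
```

If every error is followed shortly by a "Successfully reconnected" line, as in the 14:02 entries posted earlier, the errors may just reflect restarts rather than an ongoing failure.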