Monitoring engine won't start after adding hosts to hostgrou
I added approximately 260 hosts to a host group. Now the Monitoring Engine won't start. I verified the config files through the CCM and all I get are warnings, no errors. I know this has previously been caused by a bad configuration when I bulk-added hosts, but this time everything was added through the GUI. Where can I find what's preventing it from starting?
Re: Monitoring engine won't start after adding hosts to host
I deactivated the Host Group, applied configuration changes, activated it, applied configuration changes, and now the engine has started.
Re: Monitoring engine won't start after adding hosts to host
If the issue is resolved, can we close this post?
Re: Monitoring engine won't start after adding hosts to host
The engine actually stopped again without making any changes.
Is there anywhere I can find out what is causing it to stop?
Re: Monitoring engine won't start after adding hosts to host
I would start by looking at the nagios.log and system messages:
Code: Select all
tail -50 /var/log/messages
tail -50 /usr/local/nagios/var/nagios.log
Also, do you use mod_gearman or mk_livestatus?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Monitoring engine won't start after adding hosts to host
I do not use those. I am unfamiliar with them; is either something you'd suggest? Currently I just use the System Status icons in the top right corner of XI.
Re: Monitoring engine won't start after adding hosts to host
Could you post the tails (in code wraps) requested in my previous post?
Re: Monitoring engine won't start after adding hosts to host
It is currently running; however, here are the logs.
Code: Select all
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 nagios: SERVICE NOTIFICATION: serverteam;FFIPWPA1;Memory Usage;CRITICAL;xi_service_notification_handler;connect to address 10.40.203.22 and port 12489: Connection refused
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 nagios: HOST ALERT: 172.16.20.20;DOWN;SOFT;1;CRITICAL - 172.16.20.20: Time to live exceeded in transit @ 172.16.19.157. rta nan, lost 100%
Jun 24 14:17:00 ip-10-222-2-32 nagios: SERVICE ALERT: FILESFENG01;Uptime;WARNING;SOFT;2;could not fetch information from server
Jun 24 14:17:00 ip-10-222-2-32 nagios: SERVICE ALERT: PTX-SFIDC02;Memory Usage;UNKNOWN;HARD;5;could not fetch information from server
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:02 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Code: Select all
[1435169758] SERVICE NOTIFICATION: serverteam;FFIACMILPA1;CPU Usage;CRITICAL;xi_service_notification_handler;connect to address 10.30.10.22 and port 12489: Connection refused
[1435169758] SERVICE ALERT: SFPACCLISQ;Memory Usage;WARNING;HARD;5;could not fetch information from server
[1435169758] SERVICE NOTIFICATION: serverteam;SFPACCLISQ;Memory Usage;WARNING;xi_service_notification_handler;could not fetch information from server
[1435169758] SERVICE ALERT: PTX-CLINTON02;Uptime;WARNING;SOFT;2;could not fetch information from server
[1435169758] SERVICE NOTIFICATION: serverteam;FFIDCMARPA2;Drive C: Disk Usage;CRITICAL;xi_service_notification_handler;connect to address 10.51.253.50 and port 12489: Connection refused
[1435169761] HOST ALERT: 172.16.20.38;DOWN;SOFT;1;CRITICAL - 172.16.20.38: Time to live exceeded in transit @ 172.16.19.157. rta nan, lost 100%
[1435169771] SERVICE ALERT: DFSCLINTON01;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435169771] SERVICE ALERT: PTX-SFIDC02;Uptime;WARNING;SOFT;2;could not fetch information from server
[1435169775] SERVICE ALERT: USSNFSPACPW01;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435169775] SERVICE ALERT: SFPACSNFSQ;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435169777] HOST ALERT: 172.16.127.2;DOWN;SOFT;1;CRITICAL - 172.16.127.2: rta nan, lost 100%
[1435169777] HOST ALERT: 172.16.13.215;DOWN;SOFT;1;CRITICAL - 172.16.13.215: rta nan, lost 100%
[1435169777] SERVICE ALERT: SFPACSNFSQ;Uptime;WARNING;SOFT;2;could not fetch information from server
[1435169778] SERVICE NOTIFICATION: serverteam;FFIDCDENPA1;Memory Usage;CRITICAL;xi_service_notification_handler;connect to address 10.10.31.231 and port 12489: Connection refused
[1435169782] SERVICE ALERT: DCSNF01;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435169787] SERVICE ALERT: SFPACCLIPA1;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435169792] HOST ALERT: 172.16.20.38;UP;SOFT;2;OK - 172.16.20.38: rta 31.003ms, lost 0%
[1435169792] HOST FLAPPING ALERT: 172.16.20.38;STARTED; Host appears to have started flapping (22.5% change > 20.0% threshold)
[1435169795] HOST ALERT: 10.35.48.11;DOWN;SOFT;1;CRITICAL - 10.35.48.11: rta nan, lost 100%
[1435169796] HOST ALERT: 10.35.48.11;DOWN;SOFT;1;CRITICAL - 10.35.48.11: rta nan, lost 100%
[1435169797] SERVICE NOTIFICATION: serverteam;FFIACCARPA3;Drive C: Disk Usage;CRITICAL;xi_service_notification_handler;connect to address 10.10.35.118 and port 12489: Connection refused
[1435169797] SERVICE NOTIFICATION: serverteam;FNAAFFITCM11;CPU Usage;CRITICAL;xi_service_notification_handler;connect to address 10.10.17.223 and port 12489: Connection refused
[1435169797] SERVICE NOTIFICATION: serverteam;FFIACLINPA1;Drive C: Disk Usage;CRITICAL;xi_service_notification_handler;connect to address 10.50.253.22 and port 12489: Connection refused
[1435169797] SERVICE NOTIFICATION: serverteam;FFIDCMARPA2;Memory Usage;CRITICAL;xi_service_notification_handler;connect to address 10.51.253.50 and port 12489: Connection refused
[1435169797] SERVICE ALERT: PTX-SFIDC02;Uptime;WARNING;SOFT;2;could not fetch information from server
[1435169799] SERVICE ALERT: NORSPCSS05;Memory Usage;WARNING;HARD;5;could not fetch information from server
[1435169800] SERVICE ALERT: FILESFENG01;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435169805] SERVICE ALERT: SFPACSNFPA2;CPU Usage;OK;SOFT;2;CPU Load 0% (5 min average)
[1435169806] SERVICE ALERT: NORSPCSS05;Uptime;WARNING;SOFT;4;could not fetch information from server
[1435169807] SERVICE ALERT: USSNFSPACPW02;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435169807] SERVICE ALERT: SFPCECLIPA2;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435169817] SERVICE ALERT: NORSPCSS05;Uptime;OK;SOFT;2;System Uptime - 0 day(s) 0 hour(s) 0 minute(s)
[1435169820] SERVICE NOTIFICATION: serverteam;FFIPWPA1;Memory Usage;CRITICAL;xi_service_notification_handler;connect to address 10.40.203.22 and port 12489: Connection refused
[1435169820] HOST ALERT: 172.16.20.20;DOWN;SOFT;1;CRITICAL - 172.16.20.20: Time to live exceeded in transit @ 172.16.19.157. rta nan, lost 100%
[1435169820] SERVICE ALERT: FILESFENG01;Uptime;WARNING;SOFT;2;could not fetch information from server
[1435169820] SERVICE ALERT: PTX-SFIDC02;Memory Usage;UNKNOWN;HARD;5;could not fetch information from server
[1435169828] Auto-save of retention data completed successfully.
[1435169830] HOST ALERT: 10.35.48.11;UP;HARD;1;OK - 10.35.48.11: rta 126.023ms, lost 0%
[1435169830] HOST NOTIFICATION: Matt Douglas;10.35.48.11;UP;xi_host_notification_handler;OK - 10.35.48.11: rta 126.023ms, lost 0%
[1435169830] HOST NOTIFICATION: Network Team;10.35.48.11;UP;xi_host_notification_handler;OK - 10.35.48.11: rta 126.023ms, lost 0%
[1435169835] SERVICE ALERT: SFPACSNFSQ;CPU Usage;WARNING;SOFT;2;could not fetch information from server
[1435169836] SERVICE ALERT: SFPACSNFSQ;Uptime;OK;SOFT;3;System Uptime - 0 day(s) 0 hour(s) 0 minute(s)
[1435169837] SERVICE NOTIFICATION: serverteam;FFIILPA1;CPU Usage;CRITICAL;xi_service_notification_handler;connect to address 10.10.34.226 and port 12489: Connection refused
[1435169837] SERVICE NOTIFICATION: serverteam;Kansas City Domain Controller;Drive C: Disk Usage;CRITICAL;xi_service_notification_handler;connect to address 10.8.16.42 and port 12489: Connection refused
[1435169837] SERVICE ALERT: SFPCECLIPA1;CPU Usage;WARNING;SOFT;2;could not fetch information from server
[1435169837] SERVICE NOTIFICATION: serverteam;FFIACCDCPA1;Memory Usage;CRITICAL;xi_service_notification_handler;connect to address 10.40.253.32 and port 12489: Connection refused
[1435169837] SERVICE NOTIFICATION: serverteam;FNAAFFITCM06;Memory Usage;CRITICAL;xi_service_notification_handler;connect to address 10.30.253.223 and port 12489: Connection refused
[1435169840] SERVICE ALERT: DCSNF01;CPU Usage;WARNING;SOFT;2;could not fetch information from server
[1435169850] SERVICE ALERT: USSNFSPACPW02;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435169850] SERVICE ALERT: DFSCLINTON01;Uptime;OK;SOFT;2;System Uptime - 0 day(s) 0 hour(s) 0 minute(s)
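The /var/log/messages excerpt in this post is dominated by ndo2db "queue send error, retrying" warnings. One way to get a rough sense of how often they occur might be to count them per second. This is only a sketch and not something posted in the thread: the function name count_queue_errors is hypothetical, and it assumes the syslog line format shown in the excerpt.

```shell
#!/bin/sh
# Hypothetical helper: count ndo2db "queue send error" warnings per timestamp
# in a syslog-style file (line format assumed to match the excerpt above).
count_queue_errors() {
    # $1: path to the log file, e.g. /var/log/messages
    grep 'ndo2db: Warning: queue send error' "$1" \
        | awk '{ print $1, $2, $3 }' \
        | sort | uniq -c
}

# A few lines copied from the excerpt, so the sketch runs without a real log.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:00 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Message sent to queue.
Jun 24 14:17:01 ip-10-222-2-32 ndo2db: Warning: queue send error, retrying...
EOF

# Prints one line per second: the warning count followed by the timestamp.
count_queue_errors "$sample"
```

On a live system the same function could be pointed at /var/log/messages directly. The counts only show how frequent the warnings are; they do not by themselves explain why the engine stops.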
Re: Monitoring engine won't start after adding hosts to host
It has now stopped.
Code: Select all
tail -50 /usr/local/nagios/var/nagios.log
[1435172822] HOST NOTIFICATION: serverteam;FNAFFITCM02;UP;xi_host_notification_handler;OK - 10.43.253.223: rta 149.202ms, lost 0%
[1435172823] HOST ALERT: 172.16.100.247;DOWN;SOFT;1;CRITICAL - 172.16.100.247: rta nan, lost 100%
[1435172823] SERVICE ALERT: RSVIEWK201;Uptime;OK;SOFT;2;System Uptime - 0 day(s) 0 hour(s) 0 minute(s)
[1435172824] HOST ALERT: FNAFFITCM02;UP;HARD;1;OK - 10.43.253.223: rta 165.258ms, lost 0%
[1435172824] HOST NOTIFICATION: Matt Douglas;FNAFFITCM02;UP;xi_host_notification_handler;OK - 10.43.253.223: rta 165.258ms, lost 0%
[1435172824] HOST NOTIFICATION: serverteam;FNAFFITCM02;UP;xi_host_notification_handler;OK - 10.43.253.223: rta 165.258ms, lost 0%
[1435172825] SERVICE ALERT: SFPACSNFSQ;Uptime;WARNING;SOFT;1;could not fetch information from server
[1435172827] SERVICE ALERT: DCSNF01;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435172831] SERVICE FLAPPING ALERT: SFPACCLIPA2;Drive C: Disk Usage;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)
[1435172831] SERVICE ALERT: SFPACSNFSQ;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435172832] HOST ALERT: 10.43.254.16;UP;HARD;1;OK - 10.43.254.16: rta 163.490ms, lost 0%
[1435172832] HOST NOTIFICATION: Matt Douglas;10.43.254.16;UP;xi_host_notification_handler;OK - 10.43.254.16: rta 163.490ms, lost 0%
[1435172832] HOST NOTIFICATION: Network Team;10.43.254.16;UP;xi_host_notification_handler;OK - 10.43.254.16: rta 163.490ms, lost 0%
[1435172836] SERVICE ALERT: USSNFSPACPW01;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435172839] HOST ALERT: 10.35.48.11;DOWN;SOFT;2;CRITICAL - 10.35.48.11: rta nan, lost 100%
[1435172841] HOST ALERT: 172.16.100.247;DOWN;SOFT;1;CRITICAL - 172.16.100.247: rta nan, lost 100%
[1435172843] HOST ALERT: 10.35.32.10;UP;SOFT;2;OK - 10.35.32.10: rta 34.372ms, lost 20%
[1435172844] SERVICE ALERT: PTX-SFIDC02;Uptime;WARNING;SOFT;1;could not fetch information from server
[1435172846] SERVICE ALERT: PTX-SFIDC02;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435172847] SERVICE ALERT: USSNFSPACPW01;Memory Usage;WARNING;HARD;5;could not fetch information from server
[1435172848] SERVICE ALERT: USSNFSPACPW01;Uptime;WARNING;SOFT;1;could not fetch information from server
[1435172848] SERVICE ALERT: FILESFENG01;CPU Usage;WARNING;SOFT;1;could not fetch information from server
[1435172848] SERVICE ALERT: DCSNF01;Memory Usage;WARNING;HARD;5;could not fetch information from server
[1435172848] SERVICE ALERT: USWILSPACPW01;Proc: Sql Server Buff Hit;CRITICAL;SOFT;1;1
[1435172849] HOST ALERT: 10.35.48.11;DOWN;SOFT;2;CRITICAL - 10.35.48.11: rta nan, lost 100%
[1435172851] HOST ALERT: ARNPMSFDC01;UP;SOFT;2;PING OK - Packet loss = 16%, RTA = 182.69 ms
[1435172860] SERVICE ALERT: PTX-SFIDC02;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435172860] HOST ALERT: 10.35.48.11;DOWN;SOFT;2;CRITICAL - 10.35.48.11: rta nan, lost 100%
[1435172861] SERVICE ALERT: FILEK201;Uptime;OK;SOFT;2;System Uptime - 0 day(s) 0 hour(s) 0 minute(s)
[1435172861] HOST ALERT: 172.16.100.247;DOWN;SOFT;2;CRITICAL - 172.16.100.247: rta nan, lost 100%
[1435172865] SERVICE ALERT: PTX-SFIDC02;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435172866] SERVICE ALERT: SFPACSNFSQ;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435172866] SERVICE ALERT: USSNFSPACPW02;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435172869] SERVICE ALERT: NORSPCSS05;CPU Usage;WARNING;SOFT;2;could not fetch information from server
[1435172877] HOST ALERT: 10.35.47.65;DOWN;SOFT;1;CRITICAL - Network Unreachable (10.35.47.65)
[1435172877] SERVICE ALERT: USSNFSPACPW02;Uptime;WARNING;SOFT;1;could not fetch information from server
[1435172880] HOST ALERT: 10.35.32.10;UP;SOFT;2;OK - 10.35.32.10: rta 43.767ms, lost 0%
[1435172881] SERVICE ALERT: SFPACSNFSQ;Memory Usage;WARNING;HARD;5;could not fetch information from server
[1435172888] HOST ALERT: 10.35.32.10;UP;SOFT;2;OK - 10.35.32.10: rta 29.753ms, lost 0%
[1435172890] SERVICE ALERT: USSNFSPACPW01;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435172890] SERVICE ALERT: NORSPCSS05;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435172894] SERVICE ALERT: PTX-SFIDC02;Drive C: Disk Usage;UNKNOWN;HARD;5;Free disk space : Invalid drive
[1435172894] SERVICE NOTIFICATION: serverteam;FFIACCDCPA2;Drive C: Disk Usage;CRITICAL;xi_service_notification_handler;connect to address 10.40.253.33 and port 12489: Connection refused
[1435172896] SERVICE ALERT: DCSNF01;Uptime;WARNING;SOFT;1;could not fetch information from server
[1435172899] SERVICE ALERT: DCSNF01;Drive C: Disk Usage;WARNING;HARD;5;could not fetch information from server
[1435172905] SERVICE ALERT: DCSNF01;Memory Usage;WARNING;HARD;5;could not fetch information from server
[1435172909] SERVICE ALERT: FILESFENG01;Uptime;WARNING;SOFT;1;could not fetch information from server
[1435172911] HOST ALERT: 172.16.20.72;DOWN;SOFT;1;CRITICAL - 172.16.20.72: rta 37.824ms, lost 50%
[1435172912] HOST ALERT: 172.16.20.42;DOWN;SOFT;1;CRITICAL - 172.16.20.42: Time to live exceeded in transit @ 172.16.19.157. rta nan, lost 100%
[1435172914] HOST ALERT: ARNPMSFDC01;UP;SOFT;2;PING WARNING - System call sent warnings to stderr Packet loss = 16%, RTA = 265.97 ms
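The nagios.log entries above are stamped with Unix epoch seconds, which makes it hard to line them up with the syslog timestamps in /var/log/messages. A minimal sketch for rewriting the leading timestamp follows. The function name humanize_nagios_log is hypothetical, and the sketch assumes GNU date (for the -d @SECONDS form) and prints UTC rather than local time.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the leading [epoch] of each nagios.log line
# as a UTC timestamp. Assumes GNU date, which accepts -d @SECONDS.
humanize_nagios_log() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]*)
                ts=${line#\[}       # drop the leading "["
                ts=${ts%%\]*}       # keep only the digits before the first "]"
                rest=${line#*\] }   # message text after "] "
                printf '[%s] %s\n' \
                    "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                # Lines without a leading [epoch] are passed through unchanged.
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# One line taken from the log earlier in the thread.
printf '%s\n' '[1435169828] Auto-save of retention data completed successfully.' \
    | humanize_nagios_log
```

For the sample line this should print a timestamp of 2015-06-24 18:17:08 UTC. A real log could be piped through the function in the same way, for example from tail -50 /usr/local/nagios/var/nagios.log.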
Re: Monitoring engine won't start after adding hosts to host
This issue could be related to ulimits. Please run the following on your CLI:
Code: Select all
ulimit -a
And check the following post:
http://support.nagios.com/wiki/index.ph ... 3.x_Issues
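When reading the ulimit output, it may help to compare individual limits against a minimum you consider reasonable. The following is a sketch only: the helper name check_limit is hypothetical, and the 1024 threshold is an illustrative assumption, not a value recommended in this thread or by Nagios.

```shell
#!/bin/sh
# Hypothetical helper: flag a resource limit that is below a chosen minimum.
# The threshold passed in below is an illustrative assumption.
check_limit() {
    # $1: label, $2: current value (a number or "unlimited"), $3: minimum wanted
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2)"
    else
        echo "$1: low ($2 < $3)"
    fi
}

# Check the open-files limit of the current shell.
check_limit "open files" "$(ulimit -n)" 1024
```

Limits are per user and per session, so the check is most meaningful when run as the account that starts the monitoring engine. Other values from ulimit -a could be passed to the same helper.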