Nagios XI host check orphaned and duplicate nagios process

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios XI host check orphaned and duplicate nagios process

Post by tgriep »

The next time you make any changes on the server, run the tail command beforehand, apply the configuration, and send in the output of the tail. Hopefully it will help in debugging this.

Code:

tail -f /usr/local/nagiosxi/var/cmdsubsys.log
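A minimal sketch of the same idea that writes only the new log lines to a file instead of the terminal, so the output can be attached to a reply. It assumes a POSIX shell and GNU or BSD tail; capture_during and /tmp/cmdsubsys-capture.txt are hypothetical names, not part of Nagios XI.

```shell
# Hypothetical helper, not shipped with Nagios XI: record only the lines
# appended to a log file while a command runs.
capture_during() {
    log=$1
    out=$2
    shift 2
    # -n 0 skips the existing contents; -f keeps following new lines
    tail -n 0 -f "$log" > "$out" &
    tail_pid=$!
    sleep 1                 # give tail time to open the file
    "$@"                    # the command during which the log is watched
    sleep 2                 # let tail pick up the last lines written
    kill "$tail_pid"
    wait "$tail_pid" 2>/dev/null || true
}

# Example: start the capture, apply configuration in the web UI, then press Enter.
# capture_during /usr/local/nagiosxi/var/cmdsubsys.log /tmp/cmdsubsys-capture.txt \
#     sh -c 'echo "Apply configuration now, then press Enter"; read -r _'
```

The read in the example only keeps the capture running until you press Enter after the apply finishes.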
Be sure to check out our Knowledgebase for helpful articles and solutions!
emartine
Posts: 660
Joined: Thu Dec 29, 2011 10:47 am

Post by emartine »

This is almost impossible to catch, but it happens once in a while. It happened again this morning, but I was not in the office when it occurred.

Post by tgriep »

Can you check the Nagios log file for any clues about this issue?

Code:

/usr/local/nagios/var/nagios.log
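Each nagios.log line starts with a Unix epoch timestamp in brackets. A sketch for converting those to readable times when going through an excerpt, assuming GNU date (BSD date would need -r instead of -d @); nagios_log_time is a hypothetical name. It runs date once per line, so it suits filtered excerpts better than a whole log.

```shell
# Hypothetical helper: replace the leading [epoch] stamp of each nagios.log
# line with a local date and time. Requires GNU date (-d @SECONDS).
nagios_log_time() {
    while IFS= read -r line; do
        case $line in
            \[[0-9]*\]*)
                ts=${line#\[}           # drop the leading "["
                ts=${ts%%\]*}           # keep everything before the first "]"
                printf '[%s]%s\n' "$(date -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "${line#*\]}"
                ;;
            *)
                printf '%s\n' "$line"   # lines without a timestamp pass through
                ;;
        esac
    done
}

# Example:
# grep -E 'starting|SIGTERM|shutdown|orphaned' /usr/local/nagios/var/nagios.log | nagios_log_time
```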
Could you upload this file so we can review it?

Code:

/etc/init.d/nagios

Post by emartine »

In the nagios.log file I see the following:

Code:

[1439298219] Caught SIGTERM, shutting down...
[1439298220] Successfully shutdown... (PID=54454)
[1439298220] Event broker module 'NERD' deinitialized successfully.
[1439298220] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' deinitialized successfully.
[1439298220] ndomod: Shutdown complete.
[1439298220] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1439298221] Nagios 4.0.8 starting... (PID=56317)
[1439298221] Local time is Tue Aug 11 08:03:41 CDT 2015
[1439298221] LOG VERSION: 2.0
[1439298221] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1439298221] qh: core query handler registered
[1439298221] nerd: Channel hostchecks registered successfully
[1439298221] nerd: Channel servicechecks registered successfully
[1439298221] nerd: Channel opathchecks registered successfully
[1439298221] nerd: Fully initialized and ready to rock!
[1439298221] wproc: Successfully registered manager as @wproc with query handler
[1439298221] wproc: Registry request: name=Core Worker 56319;pid=56319
[1439298221] wproc: Registry request: name=Core Worker 56320;pid=56320
[1439298221] wproc: Registry request: name=Core Worker 56321;pid=56321
[1439298221] wproc: Registry request: name=Core Worker 56322;pid=56322
[1439298221] wproc: Registry request: name=Core Worker 56324;pid=56324
[1439298221] wproc: Registry request: name=Core Worker 56323;pid=56323
[1439298221] wproc: Registry request: name=Core Worker 56327;pid=56327
[1439298221] wproc: Registry request: name=Core Worker 56326;pid=56326
[1439298221] wproc: Registry request: name=Core Worker 56325;pid=56325
[1439298221] wproc: Registry request: name=Core Worker 56367;pid=56367
[1439298221] wproc: Registry request: name=Core Worker 56332;pid=56332
[1439298221] wproc: Registry request: name=Core Worker 56331;pid=56331
[1439298221] wproc: Registry request: name=Core Worker 56334;pid=56334
[1439298221] wproc: Registry request: name=Core Worker 56333;pid=56333
[1439298221] wproc: Registry request: name=Core Worker 56336;pid=56336
[1439298221] wproc: Registry request: name=Core Worker 56337;pid=56337
[1439298221] wproc: Registry request: name=Core Worker 56338;pid=56338
[1439298221] wproc: Registry request: name=Core Worker 56335;pid=56335
[1439298221] wproc: Registry request: name=Core Worker 56340;pid=56340
[1439298221] wproc: Registry request: name=Core Worker 56328;pid=56328
[1439298221] wproc: Registry request: name=Core Worker 56339;pid=56339
[1439298221] wproc: Registry request: name=Core Worker 56342;pid=56342
[1439298221] wproc: Registry request: name=Core Worker 56343;pid=56343
[1439298221] wproc: Registry request: name=Core Worker 56341;pid=56341
[1439298221] wproc: Registry request: name=Core Worker 56345;pid=56345
[1439298221] wproc: Registry request: name=Core Worker 56344;pid=56344
[1439298221] wproc: Registry request: name=Core Worker 56348;pid=56348
[1439298221] wproc: Registry request: name=Core Worker 56346;pid=56346
[1439298221] wproc: Registry request: name=Core Worker 56347;pid=56347
[1439298221] wproc: Registry request: name=Core Worker 56349;pid=56349
[1439298221] wproc: Registry request: name=Core Worker 56350;pid=56350
[1439298221] wproc: Registry request: name=Core Worker 56351;pid=56351
[1439298221] wproc: Registry request: name=Core Worker 56352;pid=56352
[1439298221] wproc: Registry request: name=Core Worker 56353;pid=56353
[1439298221] wproc: Registry request: name=Core Worker 56354;pid=56354
[1439298221] wproc: Registry request: name=Core Worker 56355;pid=56355
[1439298221] wproc: Registry request: name=Core Worker 56356;pid=56356
[1439298221] wproc: Registry request: name=Core Worker 56357;pid=56357
[1439298221] wproc: Registry request: name=Core Worker 56358;pid=56358
[1439298221] wproc: Registry request: name=Core Worker 56359;pid=56359
[1439298221] wproc: Registry request: name=Core Worker 56360;pid=56360
[1439298221] wproc: Registry request: name=Core Worker 56361;pid=56361
[1439298221] wproc: Registry request: name=Core Worker 56362;pid=56362
[1439298221] wproc: Registry request: name=Core Worker 56363;pid=56363
[1439298221] wproc: Registry request: name=Core Worker 56364;pid=56364
[1439298221] wproc: Registry request: name=Core Worker 56365;pid=56365
[1439298221] wproc: Registry request: name=Core Worker 56330;pid=56330
[1439298221] wproc: Registry request: name=Core Worker 56368;pid=56368
[1439298221] mod_gearman: initialized version 1.5.0b1 (libgearman 1.1.8)
[1439298221] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' initialized successfully.
[1439298221] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1439298221] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1439298221] ndomod registered for process data
[1439298221] ndomod registered for log data'
[1439298221] ndomod registered for system command data'
[1439298221] ndomod registered for event handler data'
[1439298221] ndomod registered for notification data'
[1439298221] ndomod registered for comment data'
[1439298221] ndomod registered for downtime data'
[1439298221] ndomod registered for flapping data'
[1439298221] ndomod registered for program status data'
[1439298221] ndomod registered for host status data'
[1439298221] ndomod registered for service status data'
[1439298221] ndomod registered for adaptive program data'
[1439298221] ndomod registered for adaptive host data'
[1439298221] ndomod registered for adaptive service data'
[1439298221] ndomod registered for external command data'
[1439298221] ndomod registered for aggregated status data'
[1439298221] ndomod registered for retention data'
[1439298221] ndomod registered for contact data'
[1439298221] ndomod registered for contact notification data'
[1439298221] ndomod registered for acknowledgement data'
[1439298221] ndomod registered for state change data'
[1439298221] ndomod registered for contact status data'
[1439298221] ndomod registered for adaptive contact data'
[1439298221] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1439298221] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1439298221] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1439298221] ndomod registered for process data
[1439298221] ndomod registered for log data'
[1439298221] ndomod registered for system command data'
[1439298221] ndomod registered for event handler data'
[1439298221] ndomod registered for notification data'
[1439298221] ndomod registered for comment data'
[1439298221] ndomod registered for downtime data'
[1439298221] ndomod registered for flapping data'
[1439298221] ndomod registered for program status data'
[1439298221] ndomod registered for host status data'
[1439298221] ndomod registered for service status data'
[1439298221] ndomod registered for adaptive program data'
[1439298221] ndomod registered for adaptive host data'
[1439298221] ndomod registered for adaptive service data'
[1439298221] ndomod registered for external command data'
[1439298221] ndomod registered for aggregated status data'
[1439298221] ndomod registered for retention data'
[1439298221] ndomod registered for contact data'
[1439298221] ndomod registered for contact notification data'
[1439298221] ndomod registered for acknowledgement data'
[1439298221] ndomod registered for state change data'
[1439298221] ndomod registered for contact status data'
[1439298221] ndomod registered for adaptive contact data'
[1439298221] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1439298224] Successfully launched command file worker with pid 56459
[1439298224] HOST DOWNTIME ALERT: console1;STARTED; Host has entered a period of scheduled downtime
[1439298263] SERVICE ALERT: widowsserver;CPU Load;OK;SOFT;2;CPU Load 1% (80 min average) 1% (180 min average) 19% (1440 min average)
[1439298263] GLOBAL SERVICE EVENT HANDLER: windowsserver;CPU Load;OK;SOFT;2;xi_service_event_handler
Then I see a lot of these for different services:

Code:

[1439298283] Warning: The results of service 'CPU Load' on host 'windowshost' are stale by 0d 0h 0m 43s (threshold=0d 0h 3m 20s).  I'm forcing an immediate check of the service.
After all of the warnings about stale service results:

Code:

[1439298362] SERVICE ALERT: printer;Ping;CRITICAL;HARD;3;CRITICAL - 10.68.34.46: rta nan, lost 100%
[1439298362] GLOBAL SERVICE EVENT HANDLER: printer;Ping;CRITICAL;HARD;3;xi_service_event_handler
[1439298383] SERVICE ALERT: anotherprinter;Ping;CRITICAL;HARD;3;CRITICAL - 10.165.66.228: rta nan, lost 100%
[1439298383] GLOBAL SERVICE EVENT HANDLER: anotherprinter;Ping;CRITICAL;HARD;3;xi_service_event_handler
[1439298443] SERVICE ALERT: windowsserver;CPU Load;CRITICAL;SOFT;1;CPU Load 1% (80 min average) 99% (180 min average) 31% (1440 min average)
[1439298443] GLOBAL SERVICE EVENT HANDLER: windowsserver;CPU Load;CRITICAL;SOFT;1;xi_service_event_handler
[1439298493] SERVICE ALERT: wuindowsserver;CPU Load;CRITICAL;SOFT;2;CPU Load 98% (80 min average) 99% (180 min average) 35% (1440 min average)
[1439298493] GLOBAL SERVICE EVENT HANDLER: windowsserver;CPU Load;CRITICAL;SOFT;2;xi_service_event_handler
[1439298523] Nagios 4.0.8 starting... (PID=58051)
[1439298523] Local time is Tue Aug 11 08:08:43 CDT 2015
[1439298523] LOG VERSION: 2.0
[1439298523] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1439298523] qh: core query handler registered
[1439298523] nerd: Channel hostchecks registered successfully
[1439298523] nerd: Channel servicechecks registered successfully
[1439298523] nerd: Channel opathchecks registered successfully
[1439298523] nerd: Fully initialized and ready to rock!
[1439298523] wproc: Successfully registered manager as @wproc with query handler
[1439298523] wproc: Registry request: name=Core Worker 58053;pid=58053
[1439298523] wproc: Registry request: name=Core Worker 58055;pid=58055
[1439298523] wproc: Registry request: name=Core Worker 58054;pid=58054
[1439298523] wproc: Registry request: name=Core Worker 58056;pid=58056
[1439298523] wproc: Registry request: name=Core Worker 58057;pid=58057
[1439298523] wproc: Registry request: name=Core Worker 58091;pid=58091
[1439298523] wproc: Registry request: name=Core Worker 58099;pid=58099
[1439298523] wproc: Registry request: name=Core Worker 58062;pid=58062
[1439298523] wproc: Registry request: name=Core Worker 58063;pid=58063
[1439298523] wproc: Registry request: name=Core Worker 58064;pid=58064
[1439298523] wproc: Registry request: name=Core Worker 58096;pid=58096
[1439298523] wproc: Registry request: name=Core Worker 58066;pid=58066
[1439298523] wproc: Registry request: name=Core Worker 58067;pid=58067
[1439298523] wproc: Registry request: name=Core Worker 58060;pid=58060
[1439298523] wproc: Registry request: name=Core Worker 58069;pid=58069
[1439298523] wproc: Registry request: name=Core Worker 58068;pid=58068
[1439298523] wproc: Registry request: name=Core Worker 58070;pid=58070
[1439298523] wproc: Registry request: name=Core Worker 58071;pid=58071
[1439298523] wproc: Registry request: name=Core Worker 58072;pid=58072
[1439298523] wproc: Registry request: name=Core Worker 58073;pid=58073
[1439298523] wproc: Registry request: name=Core Worker 58074;pid=58074
[1439298523] wproc: Registry request: name=Core Worker 58075;pid=58075
[1439298523] wproc: Registry request: name=Core Worker 58076;pid=58076
[1439298523] wproc: Registry request: name=Core Worker 58077;pid=58077
[1439298523] wproc: Registry request: name=Core Worker 58078;pid=58078
[1439298523] wproc: Registry request: name=Core Worker 58079;pid=58079
[1439298523] wproc: Registry request: name=Core Worker 58081;pid=58081
[1439298523] wproc: Registry request: name=Core Worker 58082;pid=58082
[1439298523] wproc: Registry request: name=Core Worker 58083;pid=58083
[1439298523] wproc: Registry request: name=Core Worker 58080;pid=58080
[1439298523] wproc: Registry request: name=Core Worker 58087;pid=58087
[1439298523] wproc: Registry request: name=Core Worker 58088;pid=58088
[1439298523] wproc: Registry request: name=Core Worker 58084;pid=58084
[1439298523] wproc: Registry request: name=Core Worker 58089;pid=58089
[1439298523] wproc: Registry request: name=Core Worker 58086;pid=58086
[1439298523] wproc: Registry request: name=Core Worker 58092;pid=58092
[1439298523] wproc: Registry request: name=Core Worker 58085;pid=58085
[1439298523] wproc: Registry request: name=Core Worker 58093;pid=58093
[1439298523] wproc: Registry request: name=Core Worker 58094;pid=58094
[1439298523] wproc: Registry request: name=Core Worker 58095;pid=58095
[1439298523] wproc: Registry request: name=Core Worker 58090;pid=58090
[1439298523] wproc: Registry request: name=Core Worker 58058;pid=58058
[1439298523] wproc: Registry request: name=Core Worker 58097;pid=58097
[1439298523] wproc: Registry request: name=Core Worker 58061;pid=58061
[1439298523] wproc: Registry request: name=Core Worker 58100;pid=58100
[1439298523] wproc: Registry request: name=Core Worker 58101;pid=58101
[1439298523] wproc: Registry request: name=Core Worker 58065;pid=58065
[1439298523] wproc: Registry request: name=Core Worker 58103;pid=58103
[1439298523] mod_gearman: initialized version 1.5.0b1 (libgearman 1.1.8)
[1439298523] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' initialized successfully.
[1439298523] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1439298523] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1439298523] ndomod registered for process data
[1439298523] ndomod registered for log data'
[1439298523] ndomod registered for system command data'
[1439298523] ndomod registered for event handler data'
[1439298523] ndomod registered for notification data'
[1439298523] ndomod registered for comment data'
[1439298523] ndomod registered for downtime data'
[1439298523] ndomod registered for flapping data'
[1439298523] ndomod registered for program status data'
[1439298523] ndomod registered for host status data'
[1439298523] ndomod registered for service status data'
[1439298523] ndomod registered for adaptive program data'
[1439298523] ndomod registered for adaptive host data'
[1439298523] ndomod registered for adaptive service data'
[1439298523] ndomod registered for external command data'
[1439298523] ndomod registered for aggregated status data'
[1439298523] ndomod registered for retention data'
[1439298523] ndomod registered for contact data'
[1439298523] ndomod registered for contact notification data'
[1439298523] ndomod registered for acknowledgement data'
[1439298523] ndomod registered for state change data'
[1439298523] ndomod registered for contact status data'
[1439298523] ndomod registered for adaptive contact data'
[1439298523] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1439298525] Successfully launched command file worker with pid 58152
Then I see:

Code:

[1439298585] Warning: The results of service 'CPU Load' on host 'windows host' are stale by 0d 0h 1m 51s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1439299065] Warning: The results of service 'Dig linux to dns server' on host 'linuxserver' are stale by 0d 0h 0m 40s (threshold=0d 0h 3m 20s).  I'm forcing an immediate check of the service.
[1439299185] Warning: The check of host 'windowsshost' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1439299185] Warning: The check of host 'anotherwindowshost' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
Then I begin seeing a lot of these for different servers:

Code:

[1439299195] HOST ALERT: yetanotherwindowshost;DOWN;SOFT;1;(host check orphaned, is the mod-gearman worker on queue 'host' running?)
[1439299195] GLOBAL HOST EVENT HANDLER: yet anotherwindowshost;DOWN;SOFT;1;xi_host_event_handler
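The orphaned-check message above asks whether a mod-gearman worker is consuming the 'host' queue. The gearman admin status lists, per queue, the total jobs, the running jobs, and the available workers, followed by a line containing a single dot. A minimal sketch, assuming the gearadmin tool from gearmand is installed on the Nagios server and gearmand is on its default local port; check_gearman_queues is a hypothetical name.

```shell
# Hypothetical check: read "gearadmin --status" output (tab-separated:
# queue, total jobs, running jobs, available workers) and report queues
# that have jobs but no worker to run them. Exits 1 if any are found.
check_gearman_queues() {
    awk -F '\t' '
        $0 == "." { next }                      # end-of-list marker
        NF >= 4 && $2 > 0 && $4 == 0 {
            printf "queue %s: %d jobs queued, no available workers\n", $1, $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example:
# gearadmin --status | check_gearman_queues
```

A queue with jobs but zero workers is the situation in which Nagios would eventually report the checks as orphaned; a queue that has workers but whose job count keeps growing would need a closer look at the workers themselves.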
Then I start seeing a lot of these for hosts and services:

Code:

[1439299245] Warning: The check of host 'different windowshost' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1439299245] Warning: The check of host 'different windowshostname' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...


[1439299245] Warning: The check of service 'Memory Usage' on host 'different windowshost' looks like it was orphaned (results never came back; last_check=1439298055; next_check=1439298545).  I'm scheduling an immediate check of the service...
[1439299245] Warning: The check of service 'Crystal Page Server' on host 'different windowshost'' looks like it was orphaned (results never came back; last_check=1439298201; next_check=1439298565).  I'm scheduling an immediate check of the service...

Then I start seeing a lot of these:

Code:

[1439299255] GLOBAL HOST EVENT HANDLER: yetanotherhost;DOWN;SOFT;1;xi_host_event_handler
[1439299255] HOST ALERT: differenthost;DOWN;SOFT;1;(host check orphaned, is the mod-gearman worker on queue 'host' running?)

[1439299304] Warning: The results of service 'Current Load' on host 'anotherhost' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1439299304] Warning: The results of service 'Swap' on host 'anotherhost' are stale by 0d 0h 0m 59s (threshold=0d 0h 5m 0s).  I'm forcing an immediate check of the service.
[1439299855] GLOBAL HOST EVENT HANDLER: anotherhost;UP;SOFT;2;xi_host_event_handler
[1439299855] HOST ALERT: anotherhost;DOWN;SOFT;1;(host check orphaned, is the mod-gearman worker on queue 'host' running?)
[1439299855] GLOBAL HOST EVENT HANDLER: anotherhost;DOWN;SOFT;1;xi_host_event_handler
Then a lot of these:

Code:

[1439299965] Warning: The check of host 'somehost' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1439299965] Warning: The check of host 'somehost' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
Then a lot of alerts like these:

Code:

[1439299975] HOST ALERT: anotherhost;DOWN;HARD;3;(host check orphaned, is the mod-gearman worker on queue 'host' running?)
[1439299975] HOST NOTIFICATION: someusers;somehost;DOWN;xi_host_notification_handler;(host check orphaned, is the mod-gearman worker on queue 'host' running?)
It continues with a lot of these:

Code:

[1439299305] Warning: The check of host 'somehost' looks like it was orphaned (results never came back).  I'm scheduling an immediate check of the host...
[1439299305] Warning: The check of service 'Memory Usage' on host 'somehost' looks like it was orphaned (results never came back; last_check=1439298197; next_check=1439298585).  I'm scheduling an immediate check of the service...
Then I start getting flapping alerts:

Code:

[1439299995] HOST FLAPPING ALERT: somehost;STARTED; Host appears to have started flapping (23.8% change > 20.0% threshold)
[1439299995] HOST NOTIFICATION: someuser;somehost;FLAPPINGSTART (UP);xi_host_notification_handler;OK - 165.68.39.164: rta 0.221ms, lost 0%
.....


Host and service check results still never come back...

A coworker then killed Nagios and started it again:

Code:

[1439300841] Caught SIGTERM, shutting down...
[1439300842] Successfully shutdown... (PID=58051)
[1439300842] Event broker module 'NERD' deinitialized successfully.
[1439300842] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' deinitialized successfully.
[1439300842] ndomod: Shutdown complete.
[1439300842] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1439300955] Nagios 4.0.8 starting... (PID=38881)
[1439300955] Local time is Tue Aug 11 08:49:15 CDT 2015
[1439300955] LOG VERSION: 2.0
[1439300955] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1439300955] qh: core query handler registered
[1439300955] nerd: Channel hostchecks registered successfully
[1439300955] nerd: Channel servicechecks registered successfully
[1439300955] nerd: Channel opathchecks registered successfully
[1439300955] nerd: Fully initialized and ready to rock!
[1439300955] wproc: Successfully registered manager as @wproc with query handler
[1439300955] wproc: Registry request: name=Core Worker 38883;pid=38883
[1439300955] wproc: Registry request: name=Core Worker 38885;pid=38885
[1439300955] wproc: Registry request: name=Core Worker 38886;pid=38886
[1439300955] wproc: Registry request: name=Core Worker 38884;pid=38884
[1439300955] wproc: Registry request: name=Core Worker 38890;pid=38890
[1439300955] wproc: Registry request: name=Core Worker 38926;pid=38926
[1439300955] wproc: Registry request: name=Core Worker 38887;pid=38887
[1439300955] wproc: Registry request: name=Core Worker 38892;pid=38892
[1439300955] wproc: Registry request: name=Core Worker 38888;pid=38888
[1439300955] wproc: Registry request: name=Core Worker 38894;pid=38894
[1439300955] wproc: Registry request: name=Core Worker 38895;pid=38895
[1439300955] wproc: Registry request: name=Core Worker 38893;pid=38893
[1439300955] wproc: Registry request: name=Core Worker 38896;pid=38896
[1439300955] wproc: Registry request: name=Core Worker 38897;pid=38897
[1439300955] wproc: Registry request: name=Core Worker 38898;pid=38898
[1439300955] wproc: Registry request: name=Core Worker 38899;pid=38899
[1439300955] wproc: Registry request: name=Core Worker 38900;pid=38900
[1439300955] wproc: Registry request: name=Core Worker 38902;pid=38902
[1439300955] wproc: Registry request: name=Core Worker 38901;pid=38901
[1439300955] wproc: Registry request: name=Core Worker 38903;pid=38903
[1439300955] wproc: Registry request: name=Core Worker 38904;pid=38904
[1439300955] wproc: Registry request: name=Core Worker 38905;pid=38905
[1439300955] wproc: Registry request: name=Core Worker 38906;pid=38906
[1439300955] wproc: Registry request: name=Core Worker 38907;pid=38907
[1439300955] wproc: Registry request: name=Core Worker 38909;pid=38909
[1439300955] wproc: Registry request: name=Core Worker 38908;pid=38908
[1439300955] wproc: Registry request: name=Core Worker 38910;pid=38910
[1439300955] wproc: Registry request: name=Core Worker 38911;pid=38911
[1439300955] wproc: Registry request: name=Core Worker 38912;pid=38912
[1439300955] wproc: Registry request: name=Core Worker 38913;pid=38913
[1439300955] wproc: Registry request: name=Core Worker 38915;pid=38915
[1439300955] wproc: Registry request: name=Core Worker 38914;pid=38914
[1439300955] wproc: Registry request: name=Core Worker 38916;pid=38916
[1439300955] wproc: Registry request: name=Core Worker 38917;pid=38917
[1439300955] wproc: Registry request: name=Core Worker 38918;pid=38918
[1439300955] wproc: Registry request: name=Core Worker 38919;pid=38919
[1439300955] wproc: Registry request: name=Core Worker 38920;pid=38920
[1439300955] wproc: Registry request: name=Core Worker 38921;pid=38921
[1439300955] wproc: Registry request: name=Core Worker 38922;pid=38922
[1439300955] wproc: Registry request: name=Core Worker 38923;pid=38923
[1439300955] wproc: Registry request: name=Core Worker 38924;pid=38924
[1439300955] wproc: Registry request: name=Core Worker 38925;pid=38925
[1439300955] wproc: Registry request: name=Core Worker 38891;pid=38891
[1439300955] wproc: Registry request: name=Core Worker 38927;pid=38927
[1439300955] wproc: Registry request: name=Core Worker 38928;pid=38928
[1439300955] wproc: Registry request: name=Core Worker 38929;pid=38929
[1439300955] wproc: Registry request: name=Core Worker 38930;pid=38930
[1439300955] wproc: Registry request: name=Core Worker 38931;pid=38931
[1439300955] mod_gearman: initialized version 1.5.0b1 (libgearman 1.1.8)
[1439300955] Event broker module '/usr/lib64/mod_gearman/mod_gearman.o' initialized successfully.
[1439300955] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1439300955] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1439300955] ndomod registered for process data
[1439300955] ndomod registered for log data'
[1439300955] ndomod registered for system command data'
[1439300955] ndomod registered for event handler data'
[1439300955] ndomod registered for notification data'
[1439300955] ndomod registered for comment data'
[1439300955] ndomod registered for downtime data'
[1439300955] ndomod registered for flapping data'
[1439300955] ndomod registered for program status data'
[1439300955] ndomod registered for host status data'
[1439300955] ndomod registered for service status data'
[1439300955] ndomod registered for adaptive program data'
[1439300955] ndomod registered for adaptive host data'
[1439300955] ndomod registered for adaptive service data'
[1439300955] ndomod registered for external command data'
[1439300955] ndomod registered for aggregated status data'
[1439300955] ndomod registered for retention data'
[1439300955] ndomod registered for contact data'
[1439300955] ndomod registered for contact notification data'
[1439300955] ndomod registered for acknowledgement data'
[1439300955] ndomod registered for state change data'
[1439300955] ndomod registered for contact status data'
[1439300955] ndomod registered for adaptive contact data'
[1439300955] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1439300957] Successfully launched command file worker with pid 38938
Everything went back to normal.
Last edited by tmcdonald on Tue Aug 11, 2015 2:24 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output - https://support.nagios.com/forum/viewtopic.php?f=39&t=34138
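In the log above, the start at [1439298523] (PID=58051) is not preceded by "Caught SIGTERM" or "Successfully shutdown" lines, unlike the starts at [1439298221] and [1439300955]. That is consistent with a second daemon being launched while PID 56317 was still running, although a crash that logged nothing would look the same. A small awk heuristic for finding such starts in nagios.log; find_unclean_starts is a hypothetical name, and a start at the very top of a rotated log is not flagged because its predecessor is not in the file.

```shell
# Hypothetical heuristic: print each "Nagios x.y.z starting..." line that was
# not preceded by a "Successfully shutdown" line since the previous start.
find_unclean_starts() {
    awk '
        /Successfully shutdown/ { down = 1 }
        /Nagios [0-9.]+ starting\.\.\./ {
            if (started && !down) print "start without prior shutdown: " $0
            started = 1
            down = 0
        }
    ' "$@"
}

# Example:
# find_unclean_starts /usr/local/nagios/var/nagios.log
```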

Post by emartine »

/etc/init.d/nagios is attached. I had to give it a .txt extension, since the forum wouldn't let me upload it otherwise.

Post by tgriep »

On line 160 of the /etc/init.d/nagios file, change this from

Code:

for i in 1 2 3 4 5 6 7 8 9 10 ; do
to

Code:

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30; do
See if that helps out on the duplicate nagios process issue.
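The thread does not show the body of that loop. Assuming it is the common init-script pattern of checking once per second whether the old Nagios process has exited, raising the count from 10 to 30 gives a clean shutdown up to about 30 seconds instead of 10 before the script moves on. This sketch shows that pattern, not the actual contents of /etc/init.d/nagios; stop_and_wait is a hypothetical name.

```shell
# Hypothetical stop routine illustrating the pattern, not the real
# /etc/init.d/nagios code: send SIGTERM, poll once per second for up to
# $2 seconds, and escalate to SIGKILL only if the process is still alive.
stop_and_wait() {
    pid=$1
    tries=${2:-30}
    kill -0 "$pid" 2>/dev/null || return 0      # not running (or not ours)
    kill -s TERM "$pid" 2>/dev/null
    i=0
    while [ "$i" -lt "$tries" ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after SIGTERM
        sleep 1
        i=$((i + 1))
    done
    echo "PID $pid still running after ${tries}s, sending SIGKILL" >&2
    kill -s KILL "$pid" 2>/dev/null
    return 1
}

# Example, using a daemon PID taken from nagios.log:
# stop_and_wait 56317 30
```

Under that assumption, a loop that gives up too early could let a new daemon start while the old one is still shutting down, which would fit the duplicate-process symptom.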

Post by emartine »

tgriep wrote: On line 160 of the /etc/init.d/nagios file, change this from

Code:

for i in 1 2 3 4 5 6 7 8 9 10 ; do
to

Code:

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30; do
See if that helps out on the duplicate nagios process issue.


What is this supposed to do?

Post by emartine »

After reading this init script, wouldn't it be better to kill Nagios by name instead of by PID, i.e. killall -9 nagios instead of kill -9 <pid>?
It seems that not all nagios processes get killed when the PID is used.
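Before changing how the script kills Nagios, it can help to confirm whether two daemons are really running. Nagios 4 runs several child processes under the same binary (the Core Worker processes registered in the log above, plus the command file worker), so simply counting "nagios" processes is misleading. A sketch, assuming the daemon and its children run a binary whose path ends in /nagios: keep only nagios processes whose parent is not itself a nagios process. On a healthy server this should print one PID. Both function names are hypothetical.

```shell
# Hypothetical check: from "pid ppid args" lines, print the PIDs of nagios
# processes whose parent is not itself a nagios process. Core workers and
# other children of the daemon are therefore left out.
filter_nagios_daemons() {
    awk '
        $3 ~ /(^|\/)nagios$/ && $0 !~ /--worker/ { parent[$1] = $2 }
        END { for (p in parent) if (!(parent[p] in parent)) print p }
    '
}

list_nagios_daemons() {
    ps -eo pid=,ppid=,args= | filter_nagios_daemons
}

# Example:
# [ "$(list_nagios_daemons | wc -l)" -gt 1 ] && echo "more than one nagios daemon is running"
```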
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

"killall -9 nagios" is not used because Nagios would not be stopped cleanly:
SIGKILL
The SIGKILL signal is sent to a process to cause it to terminate immediately (kill). In contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving this signal.
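The difference can be seen directly: a process that traps SIGTERM gets to run its clean-up handler, while SIGKILL ends it before any handler can run. Nagios's own SIGTERM handling is what produces the "Caught SIGTERM, shutting down..." and module "deinitialized" lines in the log above. A self-contained demo; demo_signal and the marker file are illustrative only.

```shell
# Demonstration only: a child shell traps SIGTERM and writes a marker file
# in its handler. SIGTERM lets the handler run; SIGKILL cannot be caught,
# so the handler never runs.
demo_signal() {
    sig=$1
    dir=$(mktemp -d)
    sh -c 'trap "echo cleaned-up > \"$1\"; exit 0" TERM
           while :; do sleep 1; done' sh "$dir/marker" &
    pid=$!
    sleep 1                          # give the child time to install its trap
    kill -s "$sig" "$pid"
    wait "$pid" 2>/dev/null || true
    if [ -f "$dir/marker" ]; then
        echo "$sig: clean-up ran"
    else
        echo "$sig: no clean-up"
    fi
    rm -rf "$dir"
}

# Example:
# demo_signal TERM    # prints "TERM: clean-up ran"
# demo_signal KILL    # prints "KILL: no clean-up"
```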
Did the modification suggested by tgriep help?

Post by emartine »

I'm not sure if it helped. I won't know until it happens again.