Nagios XI 5.5.3 and Mod_Gearman compatibility

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by salami »

Hi

I have a Nagios XI 5.5.3 installation with more than 12K hosts, so I need to distribute my environment. I chose Mod_Gearman for this purpose, but because of a compatibility issue between Nagios Core 4.4.2 and Mod_Gearman, I ran into an error during the Mod_Gearman installation.
How can I downgrade Nagios Core from 4.4.2 to 4.2.4 in Nagios XI 5.5.3?
I still need the new features in Nagios XI 5.5.3, such as the new user management ACL features.
Would you please help me with this?

Thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by scottwilkerson »

The following will downgrade Core to 4.2.4:

Code: Select all

cd /tmp
rm -rf nagiosxi xi*
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.4.13.tar.gz
tar xzf xi*.tar.gz
cd /tmp/nagiosxi/subcomponents/nagioscore
./upgrade

Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by salami »

Thanks for your reply.
I found this procedure in the Nagios Support knowledgebase at the link below and tested it, but I got an error when starting the nagios service. It seems Nagios is looking for Nagios Core 4.4.2 and cannot find it, so it cannot start.

https://support.nagios.com/kb/article/n ... e-823.html

You can see the status details from starting the nagios service below:

Code: Select all

systemctl status nagios.service

● nagios.service - Nagios Core 4.4.2
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-09-05 09:01:31 +0430; 56s ago
     Docs: https://www.nagios.org/documentation
  Process: 15923 ExecStopPost=/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 15920 ExecStop=/bin/kill -s TERM ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 16581 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=254)
  Process: 16579 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 8297 (code=exited, status=0/SUCCESS)

Sep 05 09:01:31 localhost.localdomain nagios[16579]: Checking global event handlers...
Sep 05 09:01:31 localhost.localdomain nagios[16579]: Checking obsessive compulsive processor commands...
Sep 05 09:01:31 localhost.localdomain nagios[16579]: Checking misc settings...
Sep 05 09:01:31 localhost.localdomain nagios[16579]: Total Warnings: 0
Sep 05 09:01:31 localhost.localdomain nagios[16579]: Total Errors:   0
Sep 05 09:01:31 localhost.localdomain nagios[16579]: Things look okay - No serious problems were detected during the pre-flight check
Sep 05 09:01:31 localhost.localdomain systemd[1]: nagios.service: control process exited, code=exited status=254
Sep 05 09:01:31 localhost.localdomain systemd[1]: Failed to start Nagios Core 4.4.2.
Sep 05 09:01:31 localhost.localdomain systemd[1]: Unit nagios.service entered failed state.
Sep 05 09:01:31 localhost.localdomain systemd[1]: nagios.service failed.

Would you please let me know how I can recover from this issue?

Thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by scottwilkerson »

Can you run the following

Code: Select all

grep lock_file /usr/local/nagios/etc/nagios.cfg
ps -ef|grep nagios.cfg
ll /etc/init.d/nagios
tail -20 /usr/local/nagios/var/nagios.log
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by salami »

The output of "grep lock_file /usr/local/nagios/etc/nagios.cfg" is as follows:

Code: Select all

lock_file=/var/run/nagios.lock

"ps -ef|grep nagios.cfg | grep -v grep" returns no results.

The output of "ll /etc/init.d/nagios" is as follows:

Code: Select all

-rw-r--r-- 1 root root 7534 Sep  5 08:59 /etc/init.d/nagios

The output of "tail -20 /usr/local/nagios/var/nagios.log" is as follows:

Code: Select all

[1536120982] ndomod registered for contact data'
[1536120982] ndomod registered for contact notification data'
[1536120982] ndomod registered for acknowledgement data'
[1536120982] ndomod registered for state change data'
[1536120982] ndomod registered for contact status data'
[1536120982] ndomod registered for adaptive contact data'
[1536120982] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1536120982] Successfully launched command file worker with pid 8322
[1536121752] Caught SIGTERM, shutting down...
[1536121752] Caught SIGTERM, shutting down...
[1536121752] Caught SIGTERM, shutting down...
[1536121752] Successfully shutdown... (PID=8297)
[1536121752] ndomod: Shutdown complete.
[1536121752] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1536121752] Failed to obtain lock on file /var/run/nagios.lock: Permission denied
[1536121752] Bailing out due to errors encountered while attempting to daemonize... (PID=15927)
[1536121891] Failed to obtain lock on file /var/run/nagios.lock: Permission denied
[1536121891] Bailing out due to errors encountered while attempting to daemonize... (PID=16581)
[1536123550] Failed to obtain lock on file /var/run/nagios.lock: Permission denied
[1536123550] Bailing out due to errors encountered while attempting to daemonize... (PID=24313)

thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by scottwilkerson »

Edit /usr/local/nagios/etc/nagios.cfg and change

Code: Select all

lock_file=/var/run/nagios.lock

to

Code: Select all

lock_file=/usr/local/nagios/var/nagios.lock

then

Code: Select all

rm -f /usr/local/nagios/var/nagios.lock /var/run/nagios.lock
service nagios start
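
The nagios.cfg edit above can also be scripted. This is only a minimal sketch, assuming a sed that accepts -i with a backup suffix (GNU and BSD sed both do) and a lock path that contains no "|" or "&" characters; the function name set_lock_file is hypothetical and not part of Nagios:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios): point lock_file at a new path.
# Usage: set_lock_file <path-to-nagios.cfg> <new-lock-path>
set_lock_file() {
    cfg=$1
    new_lock=$2
    # Stop if the directive is missing, so nothing is edited blindly.
    grep -q '^lock_file=' "$cfg" || { echo "no lock_file line in $cfg" >&2; return 1; }
    # Rewrite the directive in place; the previous file is kept as <cfg>.bak.
    sed -i.bak "s|^lock_file=.*|lock_file=${new_lock}|" "$cfg"
}

# Example (as root), followed by the rm and service commands shown above:
# set_lock_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/var/nagios.lock
```

Keeping the .bak copy makes it easy to revert if the daemon still refuses to start.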
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by salami »

I followed the procedure, but when I started the nagios daemon I got a permission denied error, so I ran the following command, and after that the nagios daemon started successfully.

Code: Select all

chmod +x /etc/init.d/nagios

Also, I ran the mod-gearman script and no errors were detected during installation after downgrading Nagios Core to 4.2.4, but after installing mod-gearman the nagios daemon shuts down automatically. When I restart it, it runs for a short time (1 or 2 seconds) and then stops again.

This is the content of nagios.log while starting the nagios daemon:

Code: Select all

[1536387066] Nagios 4.2.4 starting... (PID=27262)
[1536387066] Local time is Sat Sep 08 10:41:06 +0430 2018
[1536387066] LOG VERSION: 2.0
[1536387066] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1536387066] qh: core query handler registered
[1536387066] nerd: Channel hostchecks registered successfully
[1536387066] nerd: Channel servicechecks registered successfully
[1536387066] nerd: Channel opathchecks registered successfully
[1536387066] nerd: Fully initialized and ready to rock!
[1536387066] wproc: Successfully registered manager as @wproc with query handler
[1536387066] wproc: Registry request: name=Core Worker 27263;pid=27263
[1536387066] wproc: Registry request: name=Core Worker 27264;pid=27264
[1536387066] wproc: Registry request: name=Core Worker 27265;pid=27265
[1536387066] wproc: Registry request: name=Core Worker 27266;pid=27266
[1536387066] wproc: Registry request: name=Core Worker 27267;pid=27267
[1536387066] wproc: Registry request: name=Core Worker 27269;pid=27269
[1536387066] wproc: Registry request: name=Core Worker 27270;pid=27270
[1536387066] wproc: Registry request: name=Core Worker 27272;pid=27272
[1536387066] wproc: Registry request: name=Core Worker 27268;pid=27268
[1536387066] wproc: Registry request: name=Core Worker 27273;pid=27273
[1536387066] wproc: Registry request: name=Core Worker 27274;pid=27274
[1536387066] wproc: Registry request: name=Core Worker 27271;pid=27271
[1536387066] wproc: Registry request: name=Core Worker 27275;pid=27275
[1536387066] wproc: Registry request: name=Core Worker 27276;pid=27276
[1536387066] wproc: Registry request: name=Core Worker 27277;pid=27277
[1536387066] mod_gearman: initialized version 2.1.1 (libgearman 0.33)
[1536387066] Event broker module '/usr/lib64/mod_gearman2/mod_gearman2.o' initialized successfully.
[1536387066] ndomod: NDOMOD 2.1.2 (11-14-2016) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1536387066] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1536387066] ndomod registered for process data
[1536387066] ndomod registered for log data'
[1536387066] ndomod registered for system command data'
[1536387066] ndomod registered for event handler data'
[1536387066] ndomod registered for notification data'
[1536387066] ndomod registered for comment data'
[1536387066] ndomod registered for downtime data'
[1536387066] ndomod registered for flapping data'
[1536387066] ndomod registered for program status data'
[1536387066] ndomod registered for host status data'
[1536387066] ndomod registered for service status data'
[1536387066] ndomod registered for adaptive program data'
[1536387066] ndomod registered for adaptive host data'
[1536387066] ndomod registered for adaptive service data'
[1536387066] ndomod registered for external command data'
[1536387066] ndomod registered for aggregated status data'
[1536387066] ndomod registered for retention data'
[1536387066] ndomod registered for contact data'
[1536387066] ndomod registered for contact notification data'
[1536387066] ndomod registered for acknowledgement data'
[1536387066] ndomod registered for state change data'
[1536387066] ndomod registered for contact status data'
[1536387066] ndomod registered for adaptive contact data'
[1536387066] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1536387070] Successfully launched command file worker with pid 27285
[1536387070] Caught SIGSEGV, shutting down...
[1536387070] Caught SIGTERM, shutting down...

Also, you can see the status output after trying to start the nagios daemon:

Code: Select all

● nagios.service - Nagios Core 4.4.2
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2018-09-08 11:08:27 +0430; 724ms ago
     Docs: https://www.nagios.org/documentation
  Process: 8119 ExecStopPost=/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 8116 ExecStop=/bin/kill -s TERM ${MAINPID} (code=exited, status=1/FAILURE)
  Process: 8081 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 8035 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 8083 (code=exited, status=254)

Sep 08 11:08:23 localhost.localdomain nagios[8083]: ndomod registered for adaptive contact data'
Sep 08 11:08:23 localhost.localdomain nagios[8083]: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Sep 08 11:08:27 localhost.localdomain nagios[8083]: Successfully launched command file worker with pid 8106
Sep 08 11:08:27 localhost.localdomain nagios[8083]: Caught SIGSEGV, shutting down...
Sep 08 11:08:27 localhost.localdomain systemd[1]: nagios.service: main process exited, code=exited, status=254/n/a
Sep 08 11:08:27 localhost.localdomain kill[8116]: kill: cannot find process ""
Sep 08 11:08:27 localhost.localdomain systemd[1]: nagios.service: control process exited, code=exited status=1
Sep 08 11:08:27 localhost.localdomain nagios[8106]: Caught SIGTERM, shutting down...
Sep 08 11:08:27 localhost.localdomain systemd[1]: Unit nagios.service entered failed state.
Sep 08 11:08:27 localhost.localdomain systemd[1]: nagios.service failed.

Would you please let me know why the daemon is terminated by a signal during startup?
Also, nagios.log shows Nagios Core 4.2.4 starting, but the nagios service status shows it trying to start Core 4.4.2.
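
On the version mismatch: the text after the unit name in systemctl status output is the Description= string from the unit file, not something read from the running binary. A unit file left over from the 4.4.2 install would therefore presumably still say "Nagios Core 4.4.2" even when the binary is 4.2.4. A minimal sketch for checking both; the function names are hypothetical, and core_version assumes the version banner contains "Nagios Core <version>":

```shell
#!/bin/sh
# Hypothetical helper: print the Description= value from a systemd unit file.
unit_description() {
    sed -n 's/^Description=//p' "$1"
}

# Hypothetical helper: pull the version number out of a banner read from stdin.
# Assumes the banner contains "Nagios Core <version>".
core_version() {
    sed -n 's/.*Nagios Core \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example:
# unit_description /usr/lib/systemd/system/nagios.service
# /usr/local/nagios/bin/nagios --version | core_version
```

If the two disagree, the label in the status output is likely cosmetic and the version in nagios.log is the one actually running.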

thanks
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by scottwilkerson »

Can you show the output of the following:

Code: Select all

grep nagios.lock /etc/init.d/nagios
grep nagios.lock /usr/local/nagios/etc/nagios.cfg
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by salami »

The output of "grep nagios.lock /etc/init.d/nagios" is as follows:

Code: Select all

# pidfile: /usr/local/nagios/var/nagios.lock
NagiosRunFile=${prefix}/var/nagios.lock

The output of "grep nagios.lock /usr/local/nagios/etc/nagios.cfg" is as follows:

Code: Select all

lock_file=/usr/local/nagios/var/nagios.lock
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Nagios XI 5.5.3 and Mod_Gearman compatibility

Post by ssax »

Are you seeing any errors in your /var/log/gearmand/gearmand.log?