Nagios Task Scheduling

Posted: Mon Jun 30, 2014 12:14 am
by rajasegar
Nagios XI 2014R1.2
Offloaded DB
RHEL 6.5 x64
Manual Install
Firefox 30
30-06-2014 01-09-33 PM.png
Can you please advise on the graph pattern above?
It used to look like a wave with everything below 700; now the leftmost column is always very high.

We are not using Mod-Gearman.

Thanks

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 2:29 am
by WillemDH
Hello,

We had a similar problem some time ago after a vmotion of the Nagios XI server to another datacenter.

See this thread for the solution proposed by Nagios support team: http://support.nagios.com/forum/viewtop ... 16&t=27351

Personally, I did not have to remove the auto-retention file; the problem slowly resolved itself after a few days.

Grtz

Willem

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 2:59 am
by rajasegar
Thanks. I removed retention.dat and restarted.
It was OK for a few seconds, then went back to the same problem.
Load is OK, CPU is OK, memory is OK. What else could be the problem?

I also noticed that the 1-minute active check count is now 0 for hosts and services; it used to be around 500 - 600.

30-06-2014 03-53-10 PM.png

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 5:30 am
by rajasegar
Installed Mod-Gearman and got the same problem.

Increased the workers from 50 to 100. The problem went away.
30-06-2014 06-17-21 PM.png
2014-06-30 18:22:02 - localhost:4730 - v0.25

Code:

 Queue Name           | Worker Available | Jobs Waiting | Jobs Running
-----------------------------------------------------------------------
 check_results        |               1  |           0  |           0
 eventhandler         |             100  |           0  |           0
 host                 |             100  |           0  |           4
 service              |             100  |           0  |          63
 worker_nagiosprodxi1 |               1  |           0  |           0
-----------------------------------------------------------------------
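
For anyone wanting to confirm the worker limit that was raised from 50 to 100, here is a minimal sketch. It assumes the limit is set through `min-worker` / `max-worker` options in the Mod-Gearman worker config file; neither the option names nor the file path are stated in this thread, so check them against your own installation. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the worker-count settings from a Mod-Gearman worker config file.
# ASSUMPTION: the options are called min-worker / max-worker
# (not confirmed anywhere in this thread).
show_worker_limits() {
    # $1: path to the worker config file; commented-out lines are skipped
    grep -E '^[[:space:]]*(min|max)-worker[[:space:]]*=' "$1"
}

# Hypothetical path; adjust to your installation.
# show_worker_limits /etc/mod_gearman/mod_gearman_worker.conf
```

If the printed `max-worker` value is lower than the number of checks that normally run in parallel, that would be consistent with the queue backing up as described above.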

However, my performance data files are not updating.
I have already made the changes below, changing the process-host-perfdata-file-bulk and process-service-perfdata-file-bulk commands to:

Code:

 define command {
       command_name                             process-host-perfdata-file-bulk
       command_line                             sed -i 's/\\n//g' /usr/local/nagios/var/host-perfdata && /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.host
}

define command {
       command_name                             process-service-perfdata-file-bulk
       command_line                             sed -i 's/\\n//g' /usr/local/nagios/var/service-perfdata && /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/$TIMET$.perfdata.service
}

Code:

[nagios@nagiosprodxi1 mod_gearman]$ tail mod_gearman_worker.log
[2014-06-30 18:02:56][20693][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][18900][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][19127][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][21006][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][19889][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][20467][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][19670][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:57][18456][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:02:58][17857][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498
[2014-06-30 18:03:04][2359][INFO ] mod_gearman worker daemon started with pid 2359
[nagios@nagiosprodxi1 mod_gearman]$

Code:

[nagios@nagiosprodxi1 mod_gearman]$ tail mod_gearman_neb.log
[2014-06-30 18:02:53][32692][ERROR] sending job to gearmand failed: flush(GEARMAN_COULD_NOT_CONNECT) localhost:4730 -> libgearman/connection.cc:498

Code:

[nagios@nagiosprodxi1 mod_gearman]$ nc -z localhost 4730
Connection to localhost 4730 port [tcp/gearman] succeeded!

[nagios@nagiosprodxi1 mod_gearman]$ sudo service gearmand status
gearmand (pid  2315) is running...
[nagios@nagiosprodxi1 mod_gearman]$ sudo service mod_gearman_worker status
mod_gearman_worker is running with pid 2359
[nagios@nagiosprodxi1 mod_gearman]$
Can anybody help out? I am still troubleshooting.
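
A note on the checks above: `nc -z` only shows that the port accepts TCP connections. One further check, sketched here under assumptions, is to ask gearmand for its queue status over its text admin protocol. The `status` command returns one tab-separated line per queue (name, total jobs, running jobs, available workers), ending with a line containing a single `.`. The helper name `flag_starved_queues` is made up for illustration.

```shell
#!/bin/sh
# Read gearmand "status" output on stdin and print every queue that has
# jobs queued but no available workers.
# Input line format: name<TAB>total<TAB>running<TAB>available_workers
flag_starved_queues() {
    awk -F '\t' '$1 != "." && $2 > 0 && $4 == 0 {
        print $1 ": " $2 " jobs, no workers"
    }'
}

# Live usage (assumes gearmand on localhost:4730 as in the output above;
# the -w timeout flag may differ between nc variants):
# printf 'status\n' | nc -w 2 localhost 4730 | flag_starved_queues
```

No output means every queue with pending jobs has at least one worker registered.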

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 6:23 am
by rajasegar
This is a strange day. I am posting the solution to my own post.

Code:

[nagios@nagiosprodxi1 raja]$ sudo tail -f /var/log/messages
Jun 30 19:05:23 nagiosprodxi1 nagios: Warning: Attempting to execute the command "sed -i 's/\\n//g' /usr/local/nagios/var/host-perfdata &amp" resulted in a return code of 127.  Make sure the script or binary you are trying to execute actually exists...
Jun 30 19:05:24 nagiosprodxi1 nagios: Warning: Attempting to execute the command "sed -i 's/\\n//g' /usr/local/nagios/var/service-perfdata &amp" resulted in a return code of 127.  Make sure the script or binary you are trying to execute actually exists...
It appears that when I pasted the command from Excel into CCM, the && got changed to &amp;&amp;, and this broke the performance data update. When I changed it back to && from the command line and restarted, it was fine again.
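
A likely reading of the return code 127 in the log, offered as a sketch rather than a confirmed diagnosis: the logged command ends in `&amp`, so the shell treats `&` as "run in the background" and then tries to run a program called `amp`, which does not exist. The cut-off right after `&amp` is presumably because `;` starts a comment in Nagios object config files (an assumption; the thread does not say so). The same exit status can be reproduced with a harmless stand-in command:

```shell
#!/bin/sh
# Stand-in for the broken command line: `true` replaces the sed call.
# The shell parses "true &amp" as "true &" followed by the command "amp".
run_broken() {
    sh -c 'true &amp' 2>/dev/null
}

run_broken
# 127 is the shell's "command not found" status
# (assuming no program named "amp" is on PATH).
echo "exit status: $?"
```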

Is this a bug?
I just tested with a new test command, entering only && as the command line, and this is what ended up in the config file:

Code:

define command {
       command_name                             test
       command_line                             &amp;&amp;
}
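
To check whether a written-out config file contains the escaped form, a simple search is enough. This is a minimal sketch: the function name is made up, and the commands.cfg path in the usage comment is an assumption, since the thread names the file but not its location.

```shell
#!/bin/sh
# List lines in a Nagios config file that contain an HTML-escaped ampersand.
# Searching for "&amp" also catches entries cut off before the semicolon.
find_escaped_ampersands() {
    # $1: path to a config file; prints "line-number:line" for each hit
    grep -n -F '&amp' "$1"
}

# Hypothetical path; adjust to your installation.
# find_escaped_ampersands /usr/local/nagios/etc/commands.cfg
```

An empty result (grep exit status 1) means no escaped ampersands were found in that file.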

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 10:50 am
by slansing
Why were you trying to copy it from Excel? Did you try typing "&&" directly from your keyboard into the field in the CCM? When you save it, does it still show &amp&amp? I'm not sure how you changed commands.cfg manually from the command line, as Nagios XI should not keep any manual config changes: when the files get written out during the apply config process, the values are taken from the MySQL DB and would overwrite your manual changes. Hence the warning message at the top of all configurations created via the CCM.

Edit: just confirmed this as a bug in 2014 r1.2, making a report right now.

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 11:22 am
by rajasegar
Well, I kept my notes in Excel. Anyway, it was not Excel's fault, as it still gets converted when I type it in manually.
The funny thing is that in dev it was fine.

Please note that the warning message was from around 19:00 and I solved the problem around 19:20, so it was an old log message.

Anyway, after I changed it via the command line and applied the changes, it worked; don't ask me how.
I will double-check whether the change is still there when I get to work. If it is not, I will try the " ".

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 11:35 am
by slansing
Cool, yeah, we are looking into it right now. Just an FYI: as soon as you apply config, that manual change you made will be reverted.

Re: Nagios Task Scheduling

Posted: Mon Jun 30, 2014 6:16 pm
by rajasegar
Applied the ccm.zip patch and there is no more &amp;&amp;.
Hopefully this is the last of the CCM issues. Thanks.

edit: The /n and \n are still there after applying the patch

Re: Nagios Task Scheduling

Posted: Tue Jul 01, 2014 9:15 am
by slansing
Okay, that sed is only there to remove newlines in your perfdata, which it should take care of. Are you running 32-bit or 64-bit? I don't recall.
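
As a sketch of what the sed expression from the command definitions does when a shell runs it: inside single quotes, `s/\\n//g` matches a literal backslash followed by `n` (the two-character text `\n`), not an actual line break. Whether Nagios changes the backslashes before the shell sees them is not covered in this thread, although the logged command earlier shows the same `\\n` form. The sample input is made up.

```shell
#!/bin/sh
# Apply the same sed expression as in the perfdata commands to stdin.
strip_literal_backslash_n() {
    sed 's/\\n//g'
}

# Made-up sample: the first line contains the two characters "\" and "n".
# The real line break between the two input lines is left in place.
printf '%s\n' 'rta=0.5ms\n;pl=0%' 'next line' | strip_literal_backslash_n
```

Running it prints `rta=0.5ms;pl=0%` followed by `next line` on its own line.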