Weird check timing

BanditBBS
Posts: 2474
Joined: Tue May 31, 2011 12:57 pm
Location: Scio, OH

Re: Weird check timing

Post by BanditBBS »

Thanks Trevor. Just to keep anyone else in the loop, here is a better explanation of my issues:
NCPA is not in use anywhere (I am testing it, though). I have had two issues with weird timing.

Issue #1
NagiosXI server svwdcnagios02 has a host "EMAIL CHECKS" with address 127.0.0.1. There is a service on that host labeled "SMTP Check". In the "Weird check timing" thread I posted a screen grab of it going critical and then, 10 seconds later, checking again and going OK. The check is just using check_smtp against smtp.ae.com. Gearman is in use on this server and the checks are being sent to one of two workers. However, it is still Nagios scheduling the checks, no?

Issue #2
NagiosXI server svwdcnagios02 (10.96.123.150) has a host "ae.com" with an IP of 10.1.1.121. There is a service on that host labeled "check_mobilesite" that runs a shell script to verify something on a URL. This server is set to forward this result to svwddnagios01 (10.200.48.252). This is referenced in the "Alerted when I shouldn't have been" forum thread. If you look at the post there with 2 images, the source is the server actually doing the checks (svwdcnagios02) and it only shows one critical result. The destination server is svwddnagios01, and it shows multiple criticals even though it isn't doing any active checks on that service.

God I hope I explained that all well enough :)
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Post by slansing »

Back to one of your questions regarding multiple core processes, you can verify this on both systems via:

Code:

ps aux | grep nagios
What we are looking for is multiple nagios processes with separate PIDs, not one process with multiple child PIDs, since that is part of the normal check-forking behavior that Core 3 uses. We're considering setting up a remote session with you if that is possible; sometimes it's easier just to get in and see the whole spread.
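
Editor's sketch of the distinction described above: one way to separate independently started daemons from check forks is to look at the parent PID. This assumes the Core binary lives at /usr/local/nagios/bin/nagios (as in this thread) and that a daemonized master is reparented to PID 1; the function name find_nagios_masters is hypothetical, not part of Nagios.

```shell
# Print the PID of every Nagios Core process whose parent is PID 1.
# Assumptions (not from Nagios docs): binary path as used in this thread,
# and a daemonized master being reparented to init (PPID 1).
find_nagios_masters() {
    # Expects "PID PPID ARGS..." lines on stdin.
    # $1 = PID, $2 = PPID, $3 = first word of the command line.
    awk '$2 == 1 && $3 == "/usr/local/nagios/bin/nagios" { print $1 }'
}
```

On a live system this could be fed with `ps -eo pid=,ppid=,args= | find_nagios_masters`. More than one PID in the output would suggest more than one master; forked children show the master's PID as their parent and are skipped.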

Post by BanditBBS »

slansing wrote: Back to one of your questions regarding multiple core processes, you can verify this on both systems via:

Code:

ps aux | grep nagios
What we are looking for is multiple nagios processes with separate PIDs, not one process with multiple child PIDs, since that is part of the normal check-forking behavior that Core 3 uses. We're considering setting up a remote session with you if that is possible; sometimes it's easier just to get in and see the whole spread.
That's what I was looking at already, but there's so much stuff that I have no clue where the parent process is listed in all of this, LOL. (Actually, I think it is 9266.)

Code:

postgres   925  0.0  0.0 217820  6740 ?        Ss   11:22   0:05 postgres: nagiosxi nagiosxi ::1(53481) idle
nagios    1209  0.0  0.0   8496   480 ?        Ss    2013   0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios    1210  0.0  0.0   8496   480 ?        Ss    2013   0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios    1364  0.0  0.0   9228  1064 ?        Ss   16:37   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios    1367  0.1  0.2 222060 21716 ?        S    16:37   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios    1368  0.0  0.0   9228  1060 ?        Ss   16:37   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios    1370  0.0  0.0   9228  1060 ?        Ss   16:37   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios    1371  0.0  0.0   9228  1060 ?        Ss   16:37   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios    1372  0.2  0.3 228812 28048 ?        S    16:37   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios    1375  0.0  0.0   9228  1060 ?        Ss   16:37   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios    1378  0.2  0.2 222556 21412 ?        S    16:37   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios    1380  0.1  0.2 222168 21232 ?        S    16:37   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios    1381  0.3  0.2 223776 23024 ?        S    16:37   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
postgres  1382  0.0  0.0 217208  5012 ?        Ss   16:37   0:00 postgres: nagiosxi nagiosxi ::1(57759) idle
postgres  1393  0.1  0.0 217272  5372 ?        Ss   16:37   0:00 postgres: nagiosxi nagiosxi ::1(57764) idle
postgres  1394  0.0  0.0 217208  4992 ?        Ss   16:37   0:00 postgres: nagiosxi nagiosxi ::1(57765) idle
postgres  1401  0.0  0.0 217256  6200 ?        Ss   16:37   0:00 postgres: nagiosxi nagiosxi ::1(57766) idle
postgres  1412  0.0  0.0 217304  6464 ?        Ss   16:05   0:00 postgres: nagiosxi nagiosxi ::1(48440) idle
postgres  1616  0.0  0.0 217208  4980 ?        Ss   16:37   0:00 postgres: nagiosxi nagiosxi ::1(57776) idle
nagios    3710  0.0  0.0 110240  1120 pts/1    R+   16:37   0:00 ps aux
nagios    3711  0.0  0.0 103248   824 pts/1    S+   16:37   0:00 grep nagios
nagios    4623  0.0  0.0 368884   884 ?        S     2013  16:58 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios    4706  0.0  0.0  49888   252 ?        Ss    2013   0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
postgres  4919  0.0  0.0 217820  7236 ?        Ss   12:32   0:04 postgres: nagiosxi nagiosxi ::1(44405) idle
postgres  5731  0.0  0.0 217820  7176 ?        Ss   12:32   0:04 postgres: nagiosxi nagiosxi ::1(44565) idle
postgres  7588  0.0  0.0 217304  6332 ?        Ss   16:07   0:00 postgres: nagiosxi nagiosxi ::1(49211) idle
postgres  8282  0.0  0.0 217276  5704 ?        Ss   16:25   0:00 postgres: nagiosxi nagiosxi ::1(54192) idle
nagios    9262  0.0  0.0  49888  1200 ?        S    12:35   0:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    9263  0.0  0.0  50432  1912 ?        S    12:35   0:09 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    9266  0.4  0.0 101316  5080 ?        Ssl  12:35   1:08 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
postgres 12539  0.0  0.0 217304  6384 ?        Ss   15:20   0:01 postgres: nagiosxi nagiosxi ::1(35972) idle
postgres 17657  0.0  0.0 217304  6080 ?        Ss   15:23   0:01 postgres: nagiosxi nagiosxi ::1(36776) idle
postgres 18945  0.0  0.0 217304  6612 ?        Ss   14:35   0:02 postgres: nagiosxi nagiosxi ::1(50978) idle
postgres 19170  0.0  0.0 217820  7232 ?        Ss   12:56   0:03 postgres: nagiosxi nagiosxi ::1(51048) idle
postgres 22765  0.0  0.0 217820  6828 ?        Ss   11:34   0:05 postgres: nagiosxi nagiosxi ::1(56597) idle
root     28049  0.0  0.0 189108  3168 pts/1    S    15:28   0:00 sudo su nagios
root     28151  0.0  0.0 163260  2196 pts/1    S    15:28   0:00 su nagios
nagios   28152  0.0  0.0 110072  3728 pts/1    S    15:28   0:00 bash
postgres 28192  0.0  0.0 217304  6580 ?        Ss   12:45   0:04 postgres: nagiosxi nagiosxi ::1(47768) idle
postgres 30514  0.0  0.0 217792  6840 ?        Ss   12:28   0:04 postgres: nagiosxi nagiosxi ::1(43429) idle
postgres 32364  0.0  0.0 217276  6364 ?        Ss   16:04   0:00 postgres: nagiosxi nagiosxi ::1(48264) idle
Like I'd ever stop you guys from getting your hands in my systems :)

Post by slansing »

Whoops, I should have given you a cleaner ps. It made sreinhardt freak out, so it was worth it, I guess. In any case, it looks like there is only one process, which is good. Was the output you posted above from the main XI system? Can you run the following on the other server?

Code:

ps -ef | grep nagios.bin
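
Editor's note on the pattern above: the unescaped `.` in `nagios.bin` matches any character, so the filter keeps every command line containing `nagios/bin`, not just the Core daemon. A narrower filter might look like the sketch below. It assumes the daemon's command line contains /usr/local/nagios/bin/nagios followed by a space; the function name only_core_daemon is hypothetical.

```shell
# Keep only process lines for the Nagios Core daemon itself.
# Assumption: the daemon's command line contains the binary path used in
# this thread, followed by a space (e.g. ".../bin/nagios -d ...").
only_core_daemon() {
    # The [/] bracket matches a literal "/", but keeps the grep process's
    # own command line from matching the pattern when fed from ps.
    grep '[/]usr/local/nagios/bin/nagios '
}
```

Used as `ps -ef | only_core_daemon`, this should leave only the Core daemon lines, with helper daemons under the same bin directory and the grep process itself filtered out.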

Post by BanditBBS »

From source server (gearman):

Code:

[clarkj@svwdcnagios02 ~]$ ps -ef | grep nagios.bin
nagios    2738     1  0 Feb09 ?        00:04:23 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios    3008     1  0 Feb09 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    3506  3008  0 13:41 ?        00:00:11 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    3507  3506  1 13:41 ?        00:02:19 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    3508     1  3 13:41 ?        00:05:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
clarkj   18917 18681  0 16:52 pts/1    00:00:00 grep nagios.bin
From destination server:

Code:

[clarkj@svwddnagios01 ~]$ ps -ef | grep nagios.bin
clarkj    2085  1792  0 16:54 pts/2    00:00:00 grep nagios.bin
nagios    4623     1  0  2013 ?        00:16:58 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios    4706     1  0  2013 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    9262  4706  0 12:35 ?        00:00:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    9263  9262  0 12:35 ?        00:00:10 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    9266     1  0 12:35 ?        00:01:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

You know what, let's do a remote. We're pretty free the rest of today til 5PM Central, then tomorrow until about 2PM. Gonna close this up, so email us and we'll get this sorted out.

EDIT: Re-opening thread.
Former Nagios employee