Re: Weird check timing
Posted: Wed Mar 12, 2014 12:09 pm
by BanditBBS
Thanks Trevor...just to keep anyone else in the loop, here is a better explanation I tried to provide of my issues:
NCPA is not in use anywhere (I am testing it though). I have had two issues with weird timing.
Issue #1
NagiosXI server svwdcnagios02 has a host "EMAIL CHECKS" with 127.0.0.1. There is a service on that host labeled "SMTP Check". In the "Weird check timing" thread I posted a screen grab of it going critical and then, 10 seconds later, checking again and going OK. The check is just using check_smtp against smtp.ae.com. Gearman is in use on this server and the checks are being sent to one of two workers. However, it is still Nagios scheduling the checks, no?
Issue #2
NagiosXI server svwdcnagios02 (10.96.123.150) has a host "ae.com" with an IP of 10.1.1.121. There is a service on that host labeled "check_mobilesite", and it runs a shell script that verifies something on a URL. This server is set to forward this result to svwddnagios01 (10.200.48.252). This is referenced in the "Alerted when I shouldn't have been" forum thread. If you look at the post there with 2 images, the source is the server actually doing the checks (svwdcnagios02) and only shows one critical result. The destination server is svwddnagios01 and shows multiple criticals even though it isn't doing any active checks on that service.
God I hope I explained that all well enough
Re: Weird check timing
Posted: Wed Mar 12, 2014 3:34 pm
by slansing
Back to one of your questions regarding multiple core processes, you can verify this on both systems via:
What we are looking for is multiple nagios processes with separate PIDs, not one process with multiple child PIDs, since that is part of the normal check-forking process that Core 3 uses. We're considering trying to get a remote session going with you if this is possible; sometimes it's easier just to get in and see the whole spread.
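To make the distinction concrete, here is a rough Python sketch that reads `ps -ef` output and lists only the Nagios core processes whose parent is not itself a core process. The binary path is taken from the listings in this thread; the function name `find_core_parents` and the `SAMPLE` text are made up for illustration, and the column layout is assumed to be the usual `UID PID PPID C STIME TTY TIME CMD`.

```python
# Assumption: the core daemon binary lives at this path, as in this thread's listings.
CORE_BINARY = "/usr/local/nagios/bin/nagios"

# Hypothetical sample, shaped like `ps -ef` output (UID PID PPID C STIME TTY TIME CMD).
SAMPLE = """\
nagios 2738 1 0 Feb09 ? 00:04:23 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 3008 1 0 Feb09 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3508 1 3 13:41 ? 00:05:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 3600 3508 0 13:42 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
clarkj 18917 18681 0 16:52 pts/1 00:00:00 grep nagios.bin
"""


def find_core_parents(ps_ef_output):
    """Return PIDs of core processes whose parent is not another core process."""
    core = {}  # pid -> ppid for every process running the core binary
    for line in ps_ef_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header row and anything that does not look like a process line.
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue
        command = fields[7].split()
        if command and command[0] == CORE_BINARY:
            core[int(fields[1])] = int(fields[2])
    # A forked check child has a core process as its parent, so it is filtered out.
    return sorted(pid for pid, ppid in core.items() if ppid not in core)


if __name__ == "__main__":
    print(find_core_parents(SAMPLE))
```

If the returned list has more than one PID, that would suggest more than one independent core daemon is running. A single PID, with any extra entries filtered out as its children, is the normal case.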
Re: Weird check timing
Posted: Wed Mar 12, 2014 3:40 pm
by BanditBBS
slansing wrote:Back to one of your questions regarding multiple core processes, you can verify this on both systems via:
What we are looking for is multiple nagios processes with separate PIDs, not one process with multiple child PIDs, since that is part of the normal check-forking process that Core 3 uses. We're considering trying to get a remote session going with you if this is possible; sometimes it's easier just to get in and see the whole spread.
That's what I was looking at already, but so much stuff, no clue where the parent process is listed in all of this, LOL. (actually, I think it is 9266)
Code:
postgres 925 0.0 0.0 217820 6740 ? Ss 11:22 0:05 postgres: nagiosxi nagiosxi ::1(53481) idle
nagios 1209 0.0 0.0 8496 480 ? Ss 2013 0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios 1210 0.0 0.0 8496 480 ? Ss 2013 0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios 1364 0.0 0.0 9228 1064 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios 1367 0.1 0.2 222060 21716 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios 1368 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 1370 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios 1371 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios 1372 0.2 0.3 228812 28048 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios 1375 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios 1378 0.2 0.2 222556 21412 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios 1380 0.1 0.2 222168 21232 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios 1381 0.3 0.2 223776 23024 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
postgres 1382 0.0 0.0 217208 5012 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57759) idle
postgres 1393 0.1 0.0 217272 5372 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57764) idle
postgres 1394 0.0 0.0 217208 4992 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57765) idle
postgres 1401 0.0 0.0 217256 6200 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57766) idle
postgres 1412 0.0 0.0 217304 6464 ? Ss 16:05 0:00 postgres: nagiosxi nagiosxi ::1(48440) idle
postgres 1616 0.0 0.0 217208 4980 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57776) idle
nagios 3710 0.0 0.0 110240 1120 pts/1 R+ 16:37 0:00 ps aux
nagios 3711 0.0 0.0 103248 824 pts/1 S+ 16:37 0:00 grep nagios
nagios 4623 0.0 0.0 368884 884 ? S 2013 16:58 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 4706 0.0 0.0 49888 252 ? Ss 2013 0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
postgres 4919 0.0 0.0 217820 7236 ? Ss 12:32 0:04 postgres: nagiosxi nagiosxi ::1(44405) idle
postgres 5731 0.0 0.0 217820 7176 ? Ss 12:32 0:04 postgres: nagiosxi nagiosxi ::1(44565) idle
postgres 7588 0.0 0.0 217304 6332 ? Ss 16:07 0:00 postgres: nagiosxi nagiosxi ::1(49211) idle
postgres 8282 0.0 0.0 217276 5704 ? Ss 16:25 0:00 postgres: nagiosxi nagiosxi ::1(54192) idle
nagios 9262 0.0 0.0 49888 1200 ? S 12:35 0:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9263 0.0 0.0 50432 1912 ? S 12:35 0:09 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9266 0.4 0.0 101316 5080 ? Ssl 12:35 1:08 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
postgres 12539 0.0 0.0 217304 6384 ? Ss 15:20 0:01 postgres: nagiosxi nagiosxi ::1(35972) idle
postgres 17657 0.0 0.0 217304 6080 ? Ss 15:23 0:01 postgres: nagiosxi nagiosxi ::1(36776) idle
postgres 18945 0.0 0.0 217304 6612 ? Ss 14:35 0:02 postgres: nagiosxi nagiosxi ::1(50978) idle
postgres 19170 0.0 0.0 217820 7232 ? Ss 12:56 0:03 postgres: nagiosxi nagiosxi ::1(51048) idle
postgres 22765 0.0 0.0 217820 6828 ? Ss 11:34 0:05 postgres: nagiosxi nagiosxi ::1(56597) idle
root 28049 0.0 0.0 189108 3168 pts/1 S 15:28 0:00 sudo su nagios
root 28151 0.0 0.0 163260 2196 pts/1 S 15:28 0:00 su nagios
nagios 28152 0.0 0.0 110072 3728 pts/1 S 15:28 0:00 bash
postgres 28192 0.0 0.0 217304 6580 ? Ss 12:45 0:04 postgres: nagiosxi nagiosxi ::1(47768) idle
postgres 30514 0.0 0.0 217792 6840 ? Ss 12:28 0:04 postgres: nagiosxi nagiosxi ::1(43429) idle
postgres 32364 0.0 0.0 217276 6364 ? Ss 16:04 0:00 postgres: nagiosxi nagiosxi ::1(48264) idle
Like I'd ever stop you guys from getting your hands in my systems

Re: Weird check timing
Posted: Wed Mar 12, 2014 3:48 pm
by slansing
Whoops, I should have given you a cleaner ps. It made sreinhardt freak out, so it was worth it I guess. In any case, it looks like only one process, which is good. Was the one you posted above from the main XI system? Can you run the following on the other server:
Re: Weird check timing
Posted: Wed Mar 12, 2014 3:54 pm
by BanditBBS
From source server (gearman):
Code:
[clarkj@svwdcnagios02 ~]$ ps -ef | grep nagios.bin
nagios 2738 1 0 Feb09 ? 00:04:23 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 3008 1 0 Feb09 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3506 3008 0 13:41 ? 00:00:11 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3507 3506 1 13:41 ? 00:02:19 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3508 1 3 13:41 ? 00:05:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
clarkj 18917 18681 0 16:52 pts/1 00:00:00 grep nagios.bin
From destination server:
Code:
[clarkj@svwddnagios01 ~]$ ps -ef | grep nagios.bin
clarkj 2085 1792 0 16:54 pts/2 00:00:00 grep nagios.bin
nagios 4623 1 0 2013 ? 00:16:58 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 4706 1 0 2013 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9262 4706 0 12:35 ? 00:00:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9263 9262 0 12:35 ? 00:00:10 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9266 1 0 12:35 ? 00:01:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Re: Weird check timing
Posted: Thu Mar 13, 2014 11:12 am
by tmcdonald
You know what, let's do a remote. We're pretty free the rest of today til 5PM Central, then tomorrow until about 2PM. Gonna close this up, so email us and we'll get this sorted out.
EDIT: Re-opening thread.