NCPA is not in use anywhere (I am testing it, though). I have had two issues with weird timing.
Issue #1
NagiosXI server svwdcnagios02 has a host "EMAIL CHECKS" with 127.0.0.1. There is a service on that host labeled "SMTP Check". In the "Weird check timing" thread I posted a screen grab of it going critical and then, 10 seconds later, checking again and going OK. The check is just using check_smtp against smtp.ae.com. Gearman is in use on this server and the checks are being sent to one of two workers. However, it is still Nagios scheduling the checks, no?
Issue #2
NagiosXI server svwdcnagios02 (10.96.123.150) has a host "ae.com" with an IP of 10.1.1.121. There is a service on that host labeled "check_mobilesite", and it runs a shell script that verifies something on a URL. This server is set to forward this result to svwddnagios01 (10.200.48.252). This is referenced in the "Alerted when I shouldn't have been" forum thread. If you look at the post there with 2 images, the source is the server actually doing the checks (svwdcnagios02) and it only shows one critical result. The destination server is svwddnagios01, and it shows multiple criticals even though it isn't doing any active checks on that service.
God, I hope I explained all of that well enough!
Weird check timing
Re: Weird check timing
Thanks, Trevor. Just to keep everyone else in the loop, here is the better explanation of my issues that I tried to provide:
2 of XI5.6.14 Prod/DR/DEV - Nagios LogServer 2 Nodes
See my projects on the Exchange at BanditBBS - Also check out my Nagios stuff on my personal page at Bandit's Home and at github
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Weird check timing
Back to one of your questions regarding multiple core processes, you can verify this on both systems via:
Code: Select all
ps aux | grep nagios
What we are looking for is multiple nagios processes with separate PIDs, not one with multiple child PIDs, as that is part of the normal check-forking process that Core 3 uses. We're considering trying to get a remote session going with you if this is possible; sometimes it's easier just to get in and see the whole spread.
Re: Weird check timing
slansing wrote:Back to one of your questions regarding multiple core processes, you can verify this on both systems via:
Code: Select all
ps aux | grep nagios
What we are looking for is multiple nagios processes with separate PIDs, not one with multiple child PIDs, as that is part of the normal check-forking process that Core 3 uses. We're considering trying to get a remote session going with you if this is possible; sometimes it's easier just to get in and see the whole spread.
That's what I was looking at already, but there is so much stuff that I have no clue where the parent process is listed in all of this, LOL. (Actually, I think it is 9266.)
Code: Select all
postgres 925 0.0 0.0 217820 6740 ? Ss 11:22 0:05 postgres: nagiosxi nagiosxi ::1(53481) idle
nagios 1209 0.0 0.0 8496 480 ? Ss 2013 0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios 1210 0.0 0.0 8496 480 ? Ss 2013 0:00 nsca -c /usr/local/nagios/etc/nsca.cfg --inetd
nagios 1364 0.0 0.0 9228 1064 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios 1367 0.1 0.2 222060 21716 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
nagios 1368 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 1370 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios 1371 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios 1372 0.2 0.3 228812 28048 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios 1375 0.0 0.0 9228 1060 ? Ss 16:37 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios 1378 0.2 0.2 222556 21412 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios 1380 0.1 0.2 222168 21232 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios 1381 0.3 0.2 223776 23024 ? S 16:37 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
postgres 1382 0.0 0.0 217208 5012 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57759) idle
postgres 1393 0.1 0.0 217272 5372 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57764) idle
postgres 1394 0.0 0.0 217208 4992 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57765) idle
postgres 1401 0.0 0.0 217256 6200 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57766) idle
postgres 1412 0.0 0.0 217304 6464 ? Ss 16:05 0:00 postgres: nagiosxi nagiosxi ::1(48440) idle
postgres 1616 0.0 0.0 217208 4980 ? Ss 16:37 0:00 postgres: nagiosxi nagiosxi ::1(57776) idle
nagios 3710 0.0 0.0 110240 1120 pts/1 R+ 16:37 0:00 ps aux
nagios 3711 0.0 0.0 103248 824 pts/1 S+ 16:37 0:00 grep nagios
nagios 4623 0.0 0.0 368884 884 ? S 2013 16:58 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 4706 0.0 0.0 49888 252 ? Ss 2013 0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
postgres 4919 0.0 0.0 217820 7236 ? Ss 12:32 0:04 postgres: nagiosxi nagiosxi ::1(44405) idle
postgres 5731 0.0 0.0 217820 7176 ? Ss 12:32 0:04 postgres: nagiosxi nagiosxi ::1(44565) idle
postgres 7588 0.0 0.0 217304 6332 ? Ss 16:07 0:00 postgres: nagiosxi nagiosxi ::1(49211) idle
postgres 8282 0.0 0.0 217276 5704 ? Ss 16:25 0:00 postgres: nagiosxi nagiosxi ::1(54192) idle
nagios 9262 0.0 0.0 49888 1200 ? S 12:35 0:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9263 0.0 0.0 50432 1912 ? S 12:35 0:09 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9266 0.4 0.0 101316 5080 ? Ssl 12:35 1:08 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
postgres 12539 0.0 0.0 217304 6384 ? Ss 15:20 0:01 postgres: nagiosxi nagiosxi ::1(35972) idle
postgres 17657 0.0 0.0 217304 6080 ? Ss 15:23 0:01 postgres: nagiosxi nagiosxi ::1(36776) idle
postgres 18945 0.0 0.0 217304 6612 ? Ss 14:35 0:02 postgres: nagiosxi nagiosxi ::1(50978) idle
postgres 19170 0.0 0.0 217820 7232 ? Ss 12:56 0:03 postgres: nagiosxi nagiosxi ::1(51048) idle
postgres 22765 0.0 0.0 217820 6828 ? Ss 11:34 0:05 postgres: nagiosxi nagiosxi ::1(56597) idle
root 28049 0.0 0.0 189108 3168 pts/1 S 15:28 0:00 sudo su nagios
root 28151 0.0 0.0 163260 2196 pts/1 S 15:28 0:00 su nagios
nagios 28152 0.0 0.0 110072 3728 pts/1 S 15:28 0:00 bash
postgres 28192 0.0 0.0 217304 6580 ? Ss 12:45 0:04 postgres: nagiosxi nagiosxi ::1(47768) idle
postgres 30514 0.0 0.0 217792 6840 ? Ss 12:28 0:04 postgres: nagiosxi nagiosxi ::1(43429) idle
postgres 32364 0.0 0.0 217276 6364 ? Ss 16:04 0:00 postgres: nagiosxi nagiosxi ::1(48264) idle
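Since `ps aux` does not print parent PIDs, picking the parent out of a listing like the one above is partly guesswork. A minimal sketch of one way to label the processes, assuming a procps-style `ps`, the core binary path shown in the listing, and a daemonized core whose parent is PID 1; the function name `label_nagios_procs` is made up for illustration:

```shell
#!/bin/sh
# Label each Nagios core process as the daemon or a forked child.
# Input: lines of "PID PPID COMMAND [ARGS...]" on stdin.
# Assumption: the daemon has been reparented to PID 1.
label_nagios_procs() {
    awk '$3 == "/usr/local/nagios/bin/nagios" {
        if ($2 == 1) print $1, "daemon (parent is init)"
        else         print $1, "child of " $2
    }'
}

# On a live system (separate -o options keep the header line empty):
#   ps -e -o pid= -o ppid= -o args= | label_nagios_procs
```

Matching on the exact binary path keeps npcd, ndo2db, nsca, the cron PHP jobs, and the postgres connections out of the result.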
slansing
Re: Weird check timing
Whoops, I should have given you a cleaner ps. It made sreinhardt freak out, so it was worth it, I guess. In any case, it looks like there is only one process, which is good. Was the one you posted above from the main XI system? Can you run the following on the other server:
Code: Select all
ps -ef | grep nagios.bin
Re: Weird check timing
From source server (gearman):
Code: Select all
[clarkj@svwdcnagios02 ~]$ ps -ef | grep nagios.bin
nagios 2738 1 0 Feb09 ? 00:04:23 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 3008 1 0 Feb09 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3506 3008 0 13:41 ? 00:00:11 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3507 3506 1 13:41 ? 00:02:19 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3508 1 3 13:41 ? 00:05:59 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
clarkj 18917 18681 0 16:52 pts/1 00:00:00 grep nagios.bin
From destination server:
Code: Select all
[clarkj@svwddnagios01 ~]$ ps -ef | grep nagios.bin
clarkj 2085 1792 0 16:54 pts/2 00:00:00 grep nagios.bin
nagios 4623 1 0 2013 ? 00:16:58 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
nagios 4706 1 0 2013 ? 00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9262 4706 0 12:35 ? 00:00:01 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9263 9262 0 12:35 ? 00:00:10 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 9266 1 0 12:35 ? 00:01:13 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
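The `grep nagios.bin` pattern also picks up npcd, ndo2db, and the grep itself, as both listings show, because the dot matches any character. A hedged sketch of a tighter filter over `ps -ef` output, assuming the eight-column layout seen above (PPID in column 3, command in column 8, start time printed as a single field); `count_core_daemons` is a hypothetical helper name:

```shell
#!/bin/sh
# Count independent Nagios core daemons in `ps -ef` output read from stdin.
# Assumption: columns are UID PID PPID C STIME TTY TIME CMD, and an
# independent daemon has PPID 1.
count_core_daemons() {
    awk '$8 == "/usr/local/nagios/bin/nagios" && $3 == 1 { n++ }
         END { print n + 0 }'
}

# Live use on each server:
#   ps -ef | count_core_daemons
```

A count above 1 would point to two separate core daemons scheduling checks; forked children are skipped because their parent is the daemon rather than PID 1.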
Re: Weird check timing
You know what, let's do a remote session. We're pretty free the rest of today until 5 PM Central, then tomorrow until about 2 PM. I'm going to close this thread, so email us and we'll get this sorted out.
EDIT: Re-opening thread.
Former Nagios employee