Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 10:13 am
by bosecorp
Any update?

Could this be an issue with the nagiosxi database?

Are the cron jobs supposed to remain running all the time, or does each script run and then get restarted every time cron runs? Maybe some of the scripts run forever and never stop.

Also, another thing I noticed: my server keeps crashing because it is running out of resources. At some point there seem to be hundreds of the same processes. As a workaround so that my server doesn't crash, I am killing cron jobs that are older than a few minutes.
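
To put a number on "hundreds of the same processes", one option is to group the running processes by command line and count the copies. This is only a minimal sketch, assuming a procps-style `ps` as found on most Linux systems; `count_duplicates` is a hypothetical helper name, not part of Nagios XI.

```shell
#!/bin/sh
# Count identical lines on stdin and print them most frequent first.
# Intended input: one command line per row, e.g. from "ps -eo args=".
count_duplicates() {
    sort | uniq -c | sort -rn
}

# Live usage (assumption: procps ps; "args=" prints the command line with no header):
#   ps -eo args= | count_duplicates | head
```

The first column of the output is the number of running copies, so a cron job that is piling up should appear at the top of the list.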

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 1:11 pm
by cdienger
You should always see them running, but each minute they're killed and a new one started.

Increase the /etc/php.ini values as described in https://support.nagios.com/kb/article/n ... e-611.html .

I don't suspect a db issue at this time but it wouldn't hurt to run through https://assets.nagios.com/downloads/nag ... tabase.pdf to be cautious.

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 1:51 pm
by bosecorp
I think you are right, I don't think it is a DB issue.

It seems the old processes are not getting killed before the new ones start.

Do you think adjusting php will help?

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 2:09 pm
by bosecorp
Hi, I made the change.

Same issue: old cron jobs aren't getting killed before the new ones start.

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 2:29 pm
by scottwilkerson
bosecorp wrote: Hi, I made the change.

Same issue: old cron jobs aren't getting killed before the new ones start.

They are not supposed to get killed just because another starts. However, if you have had a machine run out of memory, I have seen this exact scenario, where new processes spawn but cannot run.

Do you see data streaming by if you tail the log file of a running task, for example:

Code: Select all

tail -f /usr/local/nagiosxi/var/eventman.log

Can you also post the output of:

Code: Select all

ls -al /usr/local/nagiosxi/var/
df -h

If you see nothing being added when tailing the log above, you will likely need to reboot the server.
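
If watching `tail -f` by eye is inconvenient, the same "is anything being added" check can be approximated by comparing the file size before and after a short wait. This is a rough sketch only; `log_is_growing` is a hypothetical helper name, and a log rotation during the wait would make it report "stalled" incorrectly.

```shell
#!/bin/sh
# Print "growing" if the file gets larger during the wait, otherwise "stalled".
# $1 = path to the log file, $2 = seconds to wait (default 60).
log_is_growing() {
    file=$1
    wait_secs=${2:-60}
    before=$(($(wc -c < "$file")))   # size in bytes before the wait
    sleep "$wait_secs"
    after=$(($(wc -c < "$file")))    # size in bytes after the wait
    if [ "$after" -gt "$before" ]; then
        echo growing
    else
        echo stalled
    fi
}

# Example:
#   log_is_growing /usr/local/nagiosxi/var/eventman.log 90
```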

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 2:34 pm
by bosecorp
But then I see hundreds of duplicate cron jobs, and then the server runs out of resources.

# tail -f /usr/local/nagiosxi/var/eventman.log
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:21:02
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:23:02
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:24:02
PROCESS EVENT: ID=136995565, SOURCE=2, TYPE=2, TIME=2018-03-05 19:22:01
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:25:02
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:27:01
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:28:01
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:29:02
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:30:02
PROCESS EVENT: ID=136995564, SOURCE=2, TYPE=2, TIME=2018-03-05 19:31:03


root@usvanagiospxi3:(03-05 08:39): /root
# ls -al /usr/local/nagiosxi/var/
total 2273684
drwxr-xr-x 6 nagios nagios 4096 Mar 5 14:32 .
drwxr-xr-x 10 nagios nagios 94 Mar 12 2014 ..
-rw-r--r-- 1 nagios nagios 1465864 Mar 5 14:33 cleaner.log
-rw-r--r-- 1 nagios nagios 6834 Feb 28 12:53 cleaner.log.1.gz
-rw-r--r-- 1 nagios nagios 41422 Feb 14 03:47 cleaner.log-20180214.gz
-rw-r--r-- 1 nagios nagios 1010622 Mar 5 14:33 cmdsubsys.log
-rw-r--r-- 1 nagios nagios 55823 Feb 28 12:53 cmdsubsys.log.1.gz
-rw-r--r-- 1 nagios nagios 174075 Feb 4 03:49 cmdsubsys.log-20180204.gz
drwsrwsr-x 4 apache nagios 4096 Mar 2 15:44 components
-rw-r--r-- 1 nagios nagios 8 Mar 5 14:33 corelog.data
-rw-r--r-- 1 nagios nagios 0 Mar 5 14:33 corelog.diff
-rwxrwxr-x 1 nagios nagios 6 Feb 6 14:46 corelog.newobjects
-rw-r--r-- 1 nagios nagios 0 Mar 5 14:20 dbmaint.lock
-rw-r--r-- 1 nagios nagios 437590 Mar 5 14:30 dbmaint.log
-rw-r--r-- 1 nagios nagios 240278 Feb 28 12:52 dbmaint.log.1.gz
-rw-r--r-- 1 nagios nagios 246949 Feb 9 21:10 dbmaint.log-20180209.gz
-rw-r--r-- 1 nagios nagios 404010 Mar 5 14:33 deadpool.log
-rw-r--r-- 1 nagios nagios 18499 Nov 5 22:38 deadpool.log.1.gz
-rw-r--r-- 1 nagios nagios 0 Mar 5 14:32 event_handler.lock
-rw-r--r-- 1 nagios nagios 1472682127 Mar 5 14:33 event_handler.log
-rw-r--r-- 1 nagios nagios 298037611 Feb 28 12:53 event_handler.log.1.gz
-rw-r--r-- 1 nagios nagios 35474529 Mar 5 09:09 event_handler.log-20180305.gz
-rw-r--r-- 1 nagios nagios 2980681 Mar 5 14:33 eventman.log
-rw-r--r-- 1 nagios nagios 355706 Feb 28 12:52 eventman.log.1.gz
-rw-r--r-- 1 nagios nagios 948207 Feb 23 19:06 eventman.log-20180223.gz
-rw-r--r-- 1 nagios nagios 351047 Mar 5 14:33 feedproc.log
-rw-r--r-- 1 nagios nagios 48118 Feb 28 12:53 feedproc.log.1.gz
-rw-r--r-- 1 nagios nagios 1030 Mar 4 20:05 load_url.log
-rw-r--r-- 1 nagios nagios 501 Feb 28 09:56 load_url.log.1.gz
drwxrwxr-x 2 nagios nagios 6 Jul 5 2017 mkdir
-rw-r--r-- 1 nagios nagios 209850 Mar 5 14:03 nom.log
-rw-r--r-- 1 nagios nagios 66952 Feb 28 09:58 nom.log.1.gz
-rw-r--r-- 1 nagios nagios 3720673 Mar 5 14:33 perfdataproc.log
-rw-r--r-- 1 nagios nagios 2461 Feb 28 12:53 perfdataproc.log.1.gz
-rw-r--r-- 1 nagios nagios 193856 Feb 23 19:06 perfdataproc.log-20180223.gz
-rw-r--r-- 1 nagios nagios 5394099 Mar 5 13:01 recurringdowntime.log
-rw-r--r-- 1 nagios nagios 55070 Feb 28 12:01 recurringdowntime.log.1.gz
-rw-r--r-- 1 nagios nagios 5615616 Feb 24 03:01 recurringdowntime.log-20180224
-rw-r--r-- 1 nagios nagios 870482 Mar 2 15:01 recurringdowntime.log-20180302.gz
-rw-r--r-- 1 nagios nagios 178645 Mar 5 14:03 reportengine.log
-rw-r--r-- 1 nagios nagios 567 Feb 28 09:42 reportengine.log.1.gz
drwxr-xr-x 2 nagios nagios 47 Mar 5 14:29 subsys
-rw-r--r-- 1 nagios nagios 6678265 Mar 5 14:33 sysstat.log
-rw-r--r-- 1 nagios nagios 7233 Feb 28 12:53 sysstat.log.1.gz
-rw-r--r-- 1 nagios nagios 393919 Mar 4 10:31 sysstat.log-20180304.gz
-rw-r--r-- 1 root root 5830 Mar 2 22:55 tmp_xi_vars.cfg
drwxr-xr-x 2 nagios nagios 6 Apr 3 2016 upgrades
-rw-r--r-- 1 nagios nagios 12614 Feb 9 18:23 xi-sys.cfg
-rw-r--r-- 1 nagios nagios 37 Jun 7 2017 xi-uuid
-rw-r--r-- 1 nagios nagios 198 Feb 9 18:23 xiversion
root@usvanagiospxi3:(03-05 08:39): /root
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda2 100G 3.6G 97G 4% /
devtmpfs 60G 0 60G 0% /dev
tmpfs 60G 0 60G 0% /dev/shm
tmpfs 60G 154M 60G 1% /run
tmpfs 60G 0 60G 0% /sys/fs/cgroup
/dev/mapper/vg_sys-lv_var 10G 2.3G 7.8G 23% /var
/dev/mapper/vg_sys-lv_tmp 10G 33M 10G 1% /mnt/tmp
/dev/mapper/vg_sys-lv_home 10G 370M 9.7G 4% /home
tmpfs 500M 13M 488M 3% /var/nagiosramdisk
/dev/mapper/vg_sys-lv_var_log 7.0G 1.6G 5.5G 22% /var/log
/dev/mapper/vg_sys-lv_vlaudit 7.0G 66M 7.0G 1% /var/log/audit
/dev/md0 90G 53G 38G 58% /old_store
/dev/md10 120G 62G 59G 52% /usr/local/nagios
/dev/md1 5.0G 2.4G 2.7G 48% /usr/local/nagiosxi
/dev/md4 5.0G 45M 5.0G 1% /usr/local/nagvis
/dev/md5 2.0G 85M 2.0G 5% /usr/local/nrdp
/dev/md2 20G 2.0G 19G 10% /var/lib/mrtg
/dev/md6 20G 3.8G 17G 19% /var/lib/mysql
/dev/md7 30G 306M 30G 1% /var/lib/pgsql
/dev/md8 5.0G 1.5G 3.6G 30% /var/log/mod_gearman2
/dev/md9 5.0G 35M 5.0G 1% /var/log/gearmand
/dev/md3 2.0G 39M 2.0G 2% /etc/mrtg
/dev/md11 2.0G 33M 2.0G 2% /etc/cron.d
tmpfs 12G 0 12G 0% /run/user/872784
tmpfs 12G 0 12G 0% /run/user/2606
root@usvanagiospxi3:(03-05 08:39): /root
#


After the server runs out of resources, I reboot it. So technically that is what I have been doing, but it doesn't seem to help.

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 2:58 pm
by scottwilkerson
I noticed the dates in your logs are different from the timestamps on the files. Can you run the following and post the output?

Code: Select all

php -r "echo date('r'). \"\n\";" && date -R

Also, the same event ID is showing over and over.

Let's also run the following to clear old events:

Code: Select all

mysql -u ndoutils -pn@gweb nagiosxi -e 'TRUNCATE TABLE xi_meta'
mysql -u ndoutils -pn@gweb nagiosxi -e 'TRUNCATE TABLE xi_events'
mysql -u ndoutils -pn@gweb nagiosxi -e 'TRUNCATE TABLE xi_eventqueue'
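
The repetition of a single event ID noted above can also be counted from the log itself. This is a rough sketch, assuming the `PROCESS EVENT: ID=...` line format shown in the eventman.log output earlier in the thread; `count_event_ids` is a hypothetical helper name.

```shell
#!/bin/sh
# Read eventman.log-style lines on stdin and print "count id" pairs,
# most frequent ID first. Lines that do not match the format are ignored.
count_event_ids() {
    sed -n 's/^PROCESS EVENT: ID=\([0-9][0-9]*\),.*/\1/p' | sort | uniq -c | sort -rn
}

# Example:
#   tail -n 1000 /usr/local/nagiosxi/var/eventman.log | count_event_ids | head
```

An ID whose count keeps climbing between runs would suggest the same event is being reprocessed rather than cleared.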

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 3:30 pm
by bosecorp
Here you go:

root@usvanagiospxi3:(03-05 08:39): /root
# php -r "echo date('r'). \"\n\";" && date -R
Mon, 05 Mar 2018 15:28:33 -0500
Mon, 05 Mar 2018 15:28:33 -0500
You have mail in /var/spool/mail/root
root@usvanagiospxi3:(03-05 08:39): /root
#

I am running the other commands right now. It's taking a while.

Re: some cron jobs dont seem to be running

Posted: Mon Mar 05, 2018 4:26 pm
by bosecorp
This one is taking an extremely long time:

mysql -u ndoutils -pn@gweb nagiosxi -e 'TRUNCATE TABLE xi_meta'

Do you know why?

The other two are done. Also, when I ran the other two, the Event Manager in the XI System Component Status went green for a few minutes, but then went red again. If I truncate them again, it turns green, but goes red again shortly after.

Re: some cron jobs dont seem to be running

Posted: Tue Mar 06, 2018 10:09 am
by scottwilkerson
Can you verify there are no errors in:

Code: Select all

tail -n 50 /var/log/mysqld.log
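
To narrow the last lines of the log down to likely problems, the output can be filtered. This is only a sketch; the patterns (`[ERROR]`, "crashed", "corrupt") are an assumption about what typically matters in a MySQL error log, not a complete list, and `recent_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Print lines from the end of a log file that look like errors.
# $1 = log file, $2 = number of trailing lines to inspect (default 50).
# Returns non-zero when nothing matches (grep's exit status).
recent_errors() {
    tail -n "${2:-50}" "$1" | grep -i -E '\[error\]|crashed|corrupt'
}

# Example:
#   recent_errors /var/log/mysqld.log 200
```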