Re: Nagios showing error 500 sometimes
Posted: Tue Jan 19, 2016 10:06 am
by zulu42
Short update:
The error still occurs even with 8 CPU cores and 16 GB of RAM, so I rolled back to 4 CPU cores and 12 GB of RAM.
Does anyone have an idea?
Re: Nagios showing error 500 sometimes
Posted: Tue Jan 19, 2016 10:38 am
by rkennedy
zulu42 wrote:
Short update:
top|head -5
top - 07:21:43 up 35 days, 21:02, 4 users, load average: 8.12, 5.53, 4.29
Tasks: 173 total, 2 running, 168 sleeping, 3 stopped, 0 zombie
%Cpu(s): 5.7 us, 1.5 sy, 0.0 ni, 92.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8011552 total, 1974816 free, 775024 used, 5261712 buff/cache
KiB Swap: 1257468 total, 1256288 free, 1180 used. 6589452 avail Mem
I've upgraded the RAM to 12 GB. I still have four CPU cores:
top - 13:50:47 up 1:11, 2 users, load average: 0.02, 0.06, 0.11
Tasks: 184 total, 3 running, 181 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.1 us, 1.3 sy, 0.0 ni, 92.5 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 12140304 total, 11242764 free, 434120 used, 463420 buff/cache
KiB Swap: 1257468 total, 1257468 free, 0 used. 11446752 avail Mem
After you rebooted and the load went down, did this issue occur again right away? Can you run the command ps -ef and post the result?
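To put the load figures in the quoted output into perspective, here is a minimal Python sketch that divides a load average by the core count. It assumes both snapshots were taken on four cores (the first one does not say so explicitly); the function names are made up for illustration. A value well above 1.0 per core usually means processes are queuing for CPU or waiting on I/O.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def current_load_per_core():
    """Return the 1-minute load per core for the machine this runs on."""
    one_minute, _five, _fifteen = os.getloadavg()  # Unix only
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return load_per_core(one_minute, cores)


# Figures from the two snapshots quoted above (four cores assumed):
before_reboot = load_per_core(8.12, 4)  # roughly 2 runnable tasks per core
after_reboot = load_per_core(0.02, 4)   # effectively idle
```

The reboot alone explains the second figure, so it says little about whether the underlying cause is gone.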
Re: Nagios showing error 500 sometimes
Posted: Fri Jan 22, 2016 6:08 am
by zulu42
Yes, it occurred again right away.
Here is the output of ps -ef:
Code:
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Jan19 ? 00:00:05 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
root 2 0 0 Jan19 ? 00:00:00 [kthreadd]
root 3 2 0 Jan19 ? 00:00:03 [ksoftirqd/0]
root 5 2 0 Jan19 ? 00:00:00 [kworker/0:0H]
root 7 2 0 Jan19 ? 00:00:08 [migration/0]
root 8 2 0 Jan19 ? 00:00:00 [rcu_bh]
root 9 2 0 Jan19 ? 00:00:00 [rcuob/0]
root 10 2 0 Jan19 ? 00:00:00 [rcuob/1]
root 11 2 0 Jan19 ? 00:00:00 [rcuob/2]
root 12 2 0 Jan19 ? 00:00:00 [rcuob/3]
root 13 2 0 Jan19 ? 00:02:47 [rcu_sched]
root 14 2 0 Jan19 ? 00:01:41 [rcuos/0]
root 15 2 0 Jan19 ? 00:00:51 [rcuos/1]
root 16 2 0 Jan19 ? 00:01:02 [rcuos/2]
root 17 2 0 Jan19 ? 00:00:43 [rcuos/3]
root 18 2 0 Jan19 ? 00:00:00 [watchdog/0]
root 19 2 0 Jan19 ? 00:00:00 [watchdog/1]
root 20 2 0 Jan19 ? 00:00:07 [migration/1]
root 21 2 0 Jan19 ? 00:00:01 [ksoftirqd/1]
root 23 2 0 Jan19 ? 00:00:00 [kworker/1:0H]
root 24 2 0 Jan19 ? 00:00:01 [watchdog/2]
root 25 2 0 Jan19 ? 00:00:07 [migration/2]
root 26 2 0 Jan19 ? 00:00:01 [ksoftirqd/2]
root 28 2 0 Jan19 ? 00:00:00 [kworker/2:0H]
root 29 2 0 Jan19 ? 00:00:00 [watchdog/3]
root 30 2 0 Jan19 ? 00:00:07 [migration/3]
root 31 2 0 Jan19 ? 00:00:02 [ksoftirqd/3]
root 33 2 0 Jan19 ? 00:00:00 [kworker/3:0H]
root 34 2 0 Jan19 ? 00:00:00 [khelper]
root 35 2 0 Jan19 ? 00:00:00 [kdevtmpfs]
root 36 2 0 Jan19 ? 00:00:00 [netns]
root 37 2 0 Jan19 ? 00:00:00 [writeback]
root 38 2 0 Jan19 ? 00:00:00 [kintegrityd]
root 39 2 0 Jan19 ? 00:00:00 [bioset]
root 40 2 0 Jan19 ? 00:00:00 [kblockd]
root 41 2 0 Jan19 ? 00:00:00 [khubd]
root 42 2 0 Jan19 ? 00:00:00 [md]
root 45 2 0 Jan19 ? 00:00:00 [khungtaskd]
root 46 2 0 Jan19 ? 00:00:00 [kswapd0]
root 47 2 0 Jan19 ? 00:00:00 [ksmd]
root 48 2 0 Jan19 ? 00:00:01 [khugepaged]
root 49 2 0 Jan19 ? 00:00:00 [fsnotify_mark]
root 50 2 0 Jan19 ? 00:00:00 [crypto]
root 58 2 0 Jan19 ? 00:00:00 [kthrotld]
root 60 2 0 Jan19 ? 00:00:00 [kmpath_rdacd]
root 62 2 0 Jan19 ? 00:00:00 [kpsmoused]
root 82 2 0 Jan19 ? 00:00:00 [deferwq]
root 105 2 0 Jan19 ? 00:00:00 [kauditd]
root 287 2 0 Jan19 ? 00:00:00 [mpt_poll_0]
root 288 2 0 Jan19 ? 00:00:00 [mpt/0]
root 289 2 0 Jan19 ? 00:00:00 [ata_sff]
root 295 2 0 Jan19 ? 00:00:00 [scsi_eh_0]
root 296 2 0 Jan19 ? 00:00:00 [scsi_tmf_0]
root 297 2 0 Jan19 ? 00:00:00 [scsi_eh_1]
root 298 2 0 Jan19 ? 00:00:00 [scsi_tmf_1]
root 299 2 0 Jan19 ? 00:00:00 [scsi_eh_2]
root 304 2 0 Jan19 ? 00:00:00 [scsi_tmf_2]
root 306 2 0 Jan19 ? 00:00:00 [ttm_swap]
root 375 2 0 Jan19 ? 00:00:00 [kdmflush]
root 376 2 0 Jan19 ? 00:00:00 [bioset]
root 383 2 0 Jan19 ? 00:00:00 [kdmflush]
root 384 2 0 Jan19 ? 00:00:00 [bioset]
root 403 2 0 Jan19 ? 00:00:00 [xfsalloc]
root 404 2 0 Jan19 ? 00:00:00 [xfs_mru_cache]
root 405 2 0 Jan19 ? 00:00:00 [xfs-buf/dm-1]
root 406 2 0 Jan19 ? 00:00:00 [xfs-data/dm-1]
root 407 2 0 Jan19 ? 00:00:00 [xfs-conv/dm-1]
root 408 2 0 Jan19 ? 00:00:00 [xfs-cil/dm-1]
root 409 2 0 Jan19 ? 00:00:24 [kworker/0:1H]
root 410 2 0 Jan19 ? 00:01:31 [xfsaild/dm-1]
root 419 2 0 Jan19 ? 00:00:05 [kworker/3:1H]
root 457 2 0 Jan19 ? 00:00:08 [kworker/1:1H]
root 466 2 0 Jan19 ? 00:00:06 [kworker/2:1H]
root 480 1 0 Jan19 ? 00:00:10 /usr/lib/systemd/systemd-journald
root 498 1 0 Jan19 ? 00:00:00 /usr/sbin/lvmetad -f
root 506 1 0 Jan19 ? 00:00:00 /usr/lib/systemd/systemd-udevd
root 559 2 0 Jan19 ? 00:00:00 [xfs-buf/sda1]
root 561 2 0 Jan19 ? 00:00:00 [xfs-data/sda1]
root 562 2 0 Jan19 ? 00:00:00 [xfs-conv/sda1]
root 563 2 0 Jan19 ? 00:00:00 [xfs-cil/sda1]
root 564 2 0 Jan19 ? 00:00:00 [xfsaild/sda1]
root 565 2 0 Jan19 ? 00:00:00 [xfs-buf/sdb1]
root 566 2 0 Jan19 ? 00:00:00 [xfs-data/sdb1]
root 567 2 0 Jan19 ? 00:00:00 [xfs-conv/sdb1]
root 568 2 0 Jan19 ? 00:00:00 [xfs-cil/sdb1]
root 569 2 0 Jan19 ? 00:00:54 [xfsaild/sdb1]
root 626 1 0 Jan19 ? 00:00:00 /sbin/auditd -n
root 648 1 0 Jan19 ? 00:00:03 /usr/sbin/rsyslogd -n
dbus 649 1 0 Jan19 ? 00:00:00 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
root 656 1 0 Jan19 ? 00:00:00 /bin/sh /usr/lib/pcsd/pcsd start
root 659 1 0 Jan19 ? 00:00:22 /usr/bin/python -Es /usr/sbin/tuned -l -P
root 660 1 0 Jan19 ? 00:00:08 /usr/sbin/irqbalance --foreground
root 663 1 0 Jan19 ? 00:00:00 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
root 673 656 0 Jan19 ? 00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root 674 673 0 Jan19 ? 00:00:32 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root 678 1 0 Jan19 ? 00:00:07 /usr/sbin/sssd -D -f
root 707 678 0 Jan19 ? 00:02:26 /usr/libexec/sssd/sssd_be --domain LDAP --uid 0 --gid 0 --debug-to-files
root 749 678 0 Jan19 ? 00:00:06 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root 750 678 0 Jan19 ? 00:00:01 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root 756 1 0 Jan19 ? 00:00:00 /usr/lib/systemd/systemd-logind
root 765 1 0 Jan19 ? 00:00:00 /usr/sbin/NetworkManager --no-daemon
polkitd 900 1 0 Jan19 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
root 1204 1 0 Jan19 ? 00:00:00 /usr/sbin/sshd -D
root 1212 1 0 Jan19 ? 00:00:00 /usr/bin/rhsmcertd
root 1217 1 0 Jan19 ? 00:00:21 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
root 1222 1 0 Jan19 ? 00:00:03 /usr/sbin/automount --pid-file /run/autofs.pid
root 1228 1 0 Jan19 ? 00:00:08 sendmail: accepting connections
smmsp 1242 1 0 Jan19 ? 00:00:00 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
root 1282 1 0 Jan19 ? 00:00:06 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/fe496e05cfe3194a8d5107f8953f3f81.socket --xlator-option *replicate*.node-uuid=5e05ab22-84b1-479f-b9f1-91cc9b679f84
root 1293 1 0 Jan19 ? 00:17:19 /usr/sbin/glusterfsd -s nagios001 --volfile-id gv0.nagios001.mnt-data-brick -p /var/lib/glusterd/vols/gv0/run/nagios001-mnt-data-brick.pid -S /var/run/gluster/9786b85d10790bebf4efafab1dd6647e.socket --brick-name /mnt/data/brick -l /var/log/glusterfs/bricks/mnt-data-brick.log --xlator-option *-posix.glusterd-uuid=5e05ab22-84b1-479f-b9f1-91cc9b679f84 --brick-port 49153 --xlator-option gv0-server.listen-port=49153
root 1327 1 0 Jan19 ? 00:27:55 /usr/sbin/glusterfs --volfile-server=nagios001 --volfile-id=/gv0 /mnt/nagios_data
root 1404 1 0 Jan19 ? 00:00:00 /usr/sbin/crond -n
root 1433 1 0 Jan19 tty1 00:00:00 /sbin/agetty --noclear tty1 linux
root 2415 1204 0 Jan19 ? 00:00:00 sshd: XXX [priv]
g706314 2420 2415 0 Jan19 ? 00:00:01 sshd: XXX@pts/0
g706314 2421 2420 0 Jan19 pts/0 00:00:00 -bash
root 2447 2421 0 Jan19 pts/0 00:00:00 sudo -i
root 2448 2447 0 Jan19 pts/0 00:00:00 -bash
root 2505 1 0 Jan19 ? 00:40:54 corosync
root 2524 1 0 Jan19 ? 00:00:16 /usr/sbin/pacemakerd -f
haclust+ 2525 2524 0 Jan19 ? 00:00:18 /usr/libexec/pacemaker/cib
root 2526 2524 0 Jan19 ? 00:00:15 /usr/libexec/pacemaker/stonithd
root 2527 2524 0 Jan19 ? 00:00:41 /usr/libexec/pacemaker/lrmd
haclust+ 2528 2524 0 Jan19 ? 00:00:15 /usr/libexec/pacemaker/attrd
haclust+ 2529 2524 0 Jan19 ? 00:00:13 /usr/libexec/pacemaker/pengine
haclust+ 2530 2524 0 Jan19 ? 00:00:22 /usr/libexec/pacemaker/crmd
root 2997 1 0 Jan19 ? 00:00:16 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
nagios 3028 1 0 Jan19 ? 00:19:30 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
root 4517 2 0 10:49 ? 00:00:00 [kworker/2:1]
root 5128 2 0 10:50 ? 00:00:00 [kworker/3:1]
root 5594 2 0 10:51 ? 00:00:00 [kworker/0:2]
root 5767 1204 0 10:52 ? 00:00:00 sshd: XXX [priv]
g706315 6412 5767 0 10:52 ? 00:00:00 sshd: XXX@pts/1
g706315 6413 6412 0 10:52 pts/1 00:00:00 -bash
root 6475 6413 0 10:52 pts/1 00:00:00 sudo -i
root 6476 6475 0 10:52 pts/1 00:00:00 -bash
root 6849 2448 0 10:52 pts/0 00:00:00 ps -ef
apache 7080 2997 0 09:35 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache 7081 2997 0 09:35 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache 7082 2997 0 09:35 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache 7951 2997 0 09:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache 7994 2997 0 09:36 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache 9282 2997 0 09:38 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
root 10040 2 0 10:34 ? 00:00:00 [kworker/2:0]
root 11225 2 0 10:34 ? 00:00:00 [kworker/3:0]
root 11671 2 0 10:35 ? 00:00:00 [kworker/0:1]
apache 15003 2997 0 09:39 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
root 18697 2 0 Jan21 ? 00:00:35 [kworker/1:0]
root 20331 2 0 10:40 ? 00:00:00 [kworker/3:2]
root 21018 2 0 10:41 ? 00:00:00 [kworker/0:0]
apache 21719 2997 0 09:44 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
apache 23923 2997 0 09:27 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
root 24540 2 0 10:44 ? 00:00:00 [kworker/u8:0]
root 27779 2 0 09:49 ? 00:00:00 [kworker/u8:2]
root 28143 2 0 10:44 ? 00:00:00 [kworker/2:2]
root 29450 2 0 10:46 ? 00:00:00 [kworker/1:1]
root 30456 1204 0 09:11 ? 00:00:00 sshd: XXX [priv]
g705992 30499 30456 0 09:11 ? 00:00:00 sshd: XXX@pts/2
g705992 30500 30499 0 09:11 pts/2 00:00:00 -bash
root 30933 30500 0 09:12 pts/2 00:00:00 sudo -i
root 30934 30933 0 09:12 pts/2 00:00:00 -bash
apache 31474 2997 0 09:30 ? 00:00:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run//httpd.pid
Re: Nagios showing error 500 sometimes
Posted: Fri Jan 22, 2016 3:21 pm
by tmcdonald
Are your configuration/log files stored on a network drive? It's possible there is some disconnect periodically that makes the CGI unable to correctly determine permissions.
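One way to check this is to look up which mount each Nagios file lives on. The following is a rough Python sketch, not an official Nagios tool: it parses /proc/mounts-style text and reports the filesystem type behind a path. The list of network filesystem types and the function names are assumptions, and it does not handle escaped spaces in mount points.

```python
import posixpath

# Assumed, non-exhaustive list of filesystem types that live on the network.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "fuse.glusterfs", "fuse.sshfs", "ceph"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win.
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fs_type
    return best_type


def is_on_network_fs(path, mounts):
    """True if path sits on one of the assumed network filesystem types."""
    return fs_type_for(path, mounts) in NETWORK_FS_TYPES


# Usage on a real host (file paths are examples, adjust to your nagios.cfg):
#   with open("/proc/mounts") as handle:
#       mounts = parse_mounts(handle.read())
#   print(is_on_network_fs("/mnt/nagios_data/status.dat", mounts))
```

If the status file or log turns out to be on such a mount, moving it to local disk is a cheap experiment.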
Re: Nagios showing error 500 sometimes
Posted: Mon Jan 25, 2016 2:06 am
by zulu42
Yes, nagios.log and status.dat are currently written to GlusterFS storage in a master/master (active/active) configuration.
The check configuration and host configuration are also stored on this GlusterFS volume.
I have attached a second disk to the server, and that is where GlusterFS is running.
Thanks for the hint. I'll move the status.dat to its original folder and see what happens. If this doesn't solve the problem, I'll also move the nagios.log to its original folder and I'll keep you updated.
Re: Nagios showing error 500 sometimes
Posted: Mon Jan 25, 2016 10:40 am
by rkennedy
Sounds good, let us know the result.
Re: Nagios showing error 500 sometimes
Posted: Thu Jan 28, 2016 2:22 am
by zulu42
Thanks again for the hint.
GlusterFS really was the cause of the problem.
I moved the status.dat file back to its original folder on the root disk, and that alone solved the problem.
I also read a post saying that you really should not include a GlusterFS mount in updatedb. Maybe that is related as well.
Anyway, I ran a short test out of curiosity:
I put the status.dat file on an NFS share, which works perfectly fine (in case someone plans to do this).
Thanks again to everyone who read this topic, commented on it, and helped me.
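For anyone who wants to compare storage locations before moving files around, here is a rough Python sketch that times repeated reads of a file. It is only an illustration and not the test described above; the function names and example paths are assumptions. Repeated reads may be served from the page cache, so treat the numbers as a hint rather than a benchmark.

```python
import time


def time_reads(path, repeats=5):
    """Read the whole file `repeats` times; return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, "rb") as handle:
            handle.read()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (fastest, slowest) so that occasional stalls stand out."""
    return min(durations), max(durations)


# Usage (paths are examples only):
#   for candidate in ("/var/log/nagios/status.dat", "/mnt/nagios_data/status.dat"):
#       fastest, slowest = summarize(time_reads(candidate, repeats=20))
#       print(candidate, fastest, slowest)
```

A large gap between the fastest and slowest read on one location would fit the kind of intermittent failure seen in this thread.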
This topic can be closed now.
Re: Nagios showing error 500 sometimes
Posted: Thu Jan 28, 2016 10:54 am
by hsmith
The load looks a lot healthier. Are you still running into issues?
Re: Nagios showing error 500 sometimes
Posted: Fri Jan 29, 2016 2:01 am
by zulu42
I haven't had any issues since I moved the status.dat file back to the root disk.
Re: Nagios showing error 500 sometimes
Posted: Fri Jan 29, 2016 11:20 am
by hsmith
Good to hear. Thanks for sharing what the problem was. I'll go ahead and close this thread and mark it 'resolved'.