Status critical

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Rovendra
Posts: 9
Joined: Thu Aug 17, 2017 1:18 pm

Status critical

Post by Rovendra »

Hi all,

I'm new to Nagios, so I'm sorry if this is a newbie question :D :D . I've got Nagios running to monitor some services, and one of them is a haproxy service. I had internet issues yesterday and my servers were down for a couple of hours. The problem is that after everything came back online and is working, the monitor for the service won't recover from the critical state (haproxy itself is up and running).
Any suggestions on what might be causing this issue?

Thanks!
Attachments
haproxy.PNG
bolson

Re: Status critical

Post by bolson »

Hello,

Can you run the haproxy check command from the command line on the server and post the result?

Thank you!
Rovendra
Posts: 9
Joined: Thu Aug 17, 2017 1:18 pm

Re: Status critical

Post by Rovendra »

This is the check command run directly on the Nagios server:
Attachments
check.PNG
bolson

Re: Status critical

Post by bolson »

When you force a check from the web GUI, what do you get?
Rovendra
Posts: 9
Joined: Thu Aug 17, 2017 1:18 pm

Re: Status critical

Post by Rovendra »

OK guys,

Just to give an update on what has happened since I ran into this problem. I finally figured out that the check is critical because I have more than one haproxy process running on the server. That's why the answer is 2 and not 1, but that should still mean it's OK. Now I have two more questions ... the first is why the hell haproxy is spawning more than one process, since it didn't do that before (the same is happening with my Apache ... it's spawning 8 processes and it didn't do that before). The second question is why the service shows as critical in the interface, since the SNMP check I use is:

check_snmp -H <server> -o 1.3.6.1.4.1.2021.2.1.5.6 -C STGen2016 -r '[1-9][0-9]*'

and the regex part is supposed to tell Nagios that any non-zero number of processes is normal.
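
As a sanity check, the raw value behind that OID (UCD-SNMP prCount, row 6 of the prTable) can be queried directly; this is only a sketch, assuming the net-snmp command-line tools are installed and reusing the <server> placeholder and community string from the command above:

Code:

# Query the prCount value for the sixth monitored process directly; the reply
# should be the current number of matching processes, e.g. "INTEGER: 2"
snmpget -v2c -c STGen2016 <server> 1.3.6.1.4.1.2021.2.1.5.6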

Any suggestions? And thanks in advance.
Rovendra
Posts: 9
Joined: Thu Aug 17, 2017 1:18 pm

Re: Status critical

Post by Rovendra »

OK guys, I've found out more information. Looking at old screenshots of Nagios, I found that two haproxy processes and seven Apache processes used to be reported as normal. So I'm guessing something changed in Nagios (I don't know how that is possible, since nobody touched this server) and it is now treating multiple processes as a critical status. The strange fact remains that a status check on the command line returns OK while the interface lists the status as critical.
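
One way to rule out a mismatch between the command typed by hand and the one the daemon actually executes is to look at the parsed objects in objects.cache; a sketch only, assuming a default source install under /usr/local/nagios (the same prefix that shows up in the ps output later in this thread):

Code:

# Show the service block for haproxy as Nagios parsed it; the check_command line
# tells you exactly which command definition and arguments are in use
grep -i -B2 -A20 haproxy /usr/local/nagios/var/objects.cache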

Note: we use NagiosQL.

I appreciate any help.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Status critical

Post by tgriep »

Try running these commands to stop and start the Nagios Daemon.

Code:

service nagios stop
killall -9 nagios
service nagios start
Log out of the GUI, log back in, and see if the status is updated at the next check.

If not, run the following command and post the output

Code:

ps -ef --cols=300
Also, open the status.dat file, find the entry for that service, and post it here as well.
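
If it helps, the relevant block can be pulled out with grep; a sketch only, assuming the default status_file location for a source install and that the service description contains "haproxy":

Code:

# Show the servicestatus block for the haproxy service; the current_state,
# plugin_output and last_check lines are the interesting ones
grep -i -A40 haproxy /usr/local/nagios/var/status.dat
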
Be sure to check out our Knowledgebase for helpful articles and solutions!
Rovendra
Posts: 9
Joined: Thu Aug 17, 2017 1:18 pm

Re: Status critical

Post by Rovendra »

This is the output of the 'ps -ef --cols=300' command:

Code:

[root@help /]# ps -ef --cols=300
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Apr05 ?        00:00:52 /sbin/init
root         2     0  0 Apr05 ?        00:00:00 [kthreadd]
root         3     2  0 Apr05 ?        00:01:31 [migration/0]
root         4     2  0 Apr05 ?        00:01:17 [ksoftirqd/0]
root         5     2  0 Apr05 ?        00:00:00 [stopper/0]
root         6     2  0 Apr05 ?        00:00:11 [watchdog/0]
root         7     2  0 Apr05 ?        00:00:49 [migration/1]
root         8     2  0 Apr05 ?        00:00:00 [stopper/1]
root         9     2  0 Apr05 ?        00:00:25 [ksoftirqd/1]
root        10     2  0 Apr05 ?        00:00:09 [watchdog/1]
root        11     2  0 Apr05 ?        00:07:01 [events/0]
root        12     2  0 Apr05 ?        01:03:40 [events/1]
root        13     2  0 Apr05 ?        00:00:00 [cgroup]
root        14     2  0 Apr05 ?        00:00:00 [khelper]
root        15     2  0 Apr05 ?        00:00:00 [netns]
root        16     2  0 Apr05 ?        00:00:00 [async/mgr]
root        17     2  0 Apr05 ?        00:00:00 [pm]
root        18     2  0 Apr05 ?        00:00:27 [sync_supers]
root        19     2  0 Apr05 ?        00:00:41 [bdi-default]
root        20     2  0 Apr05 ?        00:00:00 [kintegrityd/0]
root        21     2  0 Apr05 ?        00:00:00 [kintegrityd/1]
root        22     2  0 Apr05 ?        00:09:28 [kblockd/0]
root        23     2  0 Apr05 ?        00:09:50 [kblockd/1]
root        24     2  0 Apr05 ?        00:00:00 [kacpid]
root        25     2  0 Apr05 ?        00:00:00 [kacpi_notify]
root        26     2  0 Apr05 ?        00:00:00 [kacpi_hotplug]
root        27     2  0 Apr05 ?        00:00:00 [ata_aux]
root        28     2  0 Apr05 ?        00:00:00 [ata_sff/0]
root        29     2  0 Apr05 ?        00:00:00 [ata_sff/1]
root        30     2  0 Apr05 ?        00:00:00 [ksuspend_usbd]
root        31     2  0 Apr05 ?        00:00:00 [khubd]
root        32     2  0 Apr05 ?        00:00:00 [kseriod]
root        33     2  0 Apr05 ?        00:00:00 [md/0]
root        34     2  0 Apr05 ?        00:00:00 [md/1]
root        35     2  0 Apr05 ?        00:00:00 [md_misc/0]
root        36     2  0 Apr05 ?        00:00:00 [md_misc/1]
root        37     2  0 Apr05 ?        00:00:00 [linkwatch]
root        39     2  0 Apr05 ?        00:00:03 [khungtaskd]
root        40     2  0 Apr05 ?        00:03:00 [kswapd0]
root        41     2  0 Apr05 ?        00:00:00 [ksmd]
root        42     2  0 Apr05 ?        00:02:23 [khugepaged]
root        43     2  0 Apr05 ?        00:00:00 [aio/0]
root        44     2  0 Apr05 ?        00:00:00 [aio/1]
root        45     2  0 Apr05 ?        00:00:00 [crypto/0]
root        46     2  0 Apr05 ?        00:00:00 [crypto/1]
root        54     2  0 Apr05 ?        00:00:00 [kthrotld/0]
root        55     2  0 Apr05 ?        00:00:00 [kthrotld/1]
root        56     2  0 Apr05 ?        00:00:00 [pciehpd]
root        58     2  0 Apr05 ?        00:00:00 [kpsmoused]
root        59     2  0 Apr05 ?        00:00:00 [usbhid_resumer]
root        60     2  0 Apr05 ?        00:00:00 [deferwq]
root        92     2  0 Apr05 ?        00:00:00 [kdmremove]
root        93     2  0 Apr05 ?        00:00:00 [kstriped]
root       170     2  0 Apr05 ?        00:00:00 [scsi_eh_0]
root       171     2  0 Apr05 ?        00:00:00 [scsi_eh_1]
root       177     2  0 Apr05 ?        00:03:38 [mpt_poll_0]
root       178     2  0 Apr05 ?        00:00:00 [mpt/0]
root       179     2  0 Apr05 ?        00:00:00 [scsi_eh_2]
root       319     2  0 Apr05 ?        00:00:00 [kdmflush]
root       321     2  0 Apr05 ?        00:00:00 [kdmflush]
root       338     2  0 Apr05 ?        00:58:12 [jbd2/dm-0-8]
root       339     2  0 Apr05 ?        00:00:00 [ext4-dio-unwrit]
root       427     1  0 Apr05 ?        00:00:00 /sbin/udevd -d
ntp        630     1  0 Apr14 ?        00:00:27 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
root       640     2  0 Apr05 ?        00:02:55 [vmmemctl]
root       769     2  0 Apr05 ?        00:00:00 [jbd2/sda1-8]
root       770     2  0 Apr05 ?        00:00:00 [ext4-dio-unwrit]
root       807     2  0 Apr05 ?        00:00:22 [kauditd]
root       900     2  0 Apr05 ?        00:15:12 [flush-253:0]
root      1186     1  0 Apr05 ?        01:39:41 /usr/sbin/vmtoolsd
root      1213     1  0 Apr05 ?        00:00:00 /usr/lib/vmware-vgauth/VGAuthService -s
root      1302     1  0 Apr05 ?        00:00:28 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid eth0
root      1362     1  0 Apr05 ?        00:01:08 auditd
root      1392     1  0 Apr05 ?        00:00:35 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
named     1417     1  0 Apr05 ?        00:00:48 /usr/sbin/named -u named
root      1469     1  0 Apr05 ?        00:00:44 /usr/sbin/sshd
root      1513     1  0 Apr05 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     1618  1513  0 Apr05 ?        19:01:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
root      1711     1  0 Apr05 ?        00:00:50 /usr/libexec/postfix/master
postfix   1721  1711  0 Apr05 ?        00:00:22 qmgr -l -t fifo -u
root      1722     1  0 Apr05 ?        00:06:30 /usr/sbin/httpd
root      1732     1  0 Apr05 ?        00:01:34 crond
nagios    1788     1  0 Apr05 ?        00:47:41 /usr/local/pnp4nagios/bin/npcd -d -f /usr/local/pnp4nagios/etc/npcd.cfg
root      1798     1  0 Apr05 tty2     00:00:00 /sbin/mingetty /dev/tty2
root      1800     1  0 Apr05 tty3     00:00:00 /sbin/mingetty /dev/tty3
root      1802     1  0 Apr05 tty4     00:00:00 /sbin/mingetty /dev/tty4
root      1804     1  0 Apr05 tty5     00:00:00 /sbin/mingetty /dev/tty5
root      1806     1  0 Apr05 tty6     00:00:00 /sbin/mingetty /dev/tty6
root      1813   427  0 Apr05 ?        00:00:00 /sbin/udevd -d
root      1814   427  0 Apr05 ?        00:00:00 /sbin/udevd -d
root      2835     1  0 Apr06 tty1     00:00:00 /sbin/mingetty /dev/tty1
postfix   4826  1711  0 14:45 ?        00:00:00 pickup -l -t fifo -u
apache    4983  1722  0 05:35 ?        00:00:02 /usr/sbin/httpd
apache    5092  1722  0 05:36 ?        00:00:02 /usr/sbin/httpd
apache    5095  1722  0 05:36 ?        00:00:03 /usr/sbin/httpd
apache    5096  1722  0 05:36 ?        00:00:02 /usr/sbin/httpd
apache    5785  1722  0 05:42 ?        00:00:02 /usr/sbin/httpd
nagios    6081     1  0 14:56 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    6083  6081  0 14:56 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6084  6081  0 14:56 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6085  6081  0 14:56 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6086  6081  0 14:56 ?        00:00:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6087  6081  0 14:56 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root      6348 19053  0 14:58 pts/1    00:00:00 vi status.dat
nagios    7518  6084  0 15:07 ?        00:00:00 /usr/local/nagios/libexec/check_ping -H 177.71.17.71 -w 1000.0,80% -c 2000.0,100% -p 5 -4
nagios    7519  7518  0 15:07 ?        00:00:00 /bin/ping -n -U -w 15 -c 5 177.71.17.71
nagios    7534  6086  0 15:07 ?        00:00:00 /usr/bin/perl -w /usr/local/nagios/libexec/check_snmp_oid -H <server> -p 1161 -o 1.3.6.1.4.1.42.2.145.3.163.1.1.2.11.0 -C STGen2016
root      7535 29515  0 15:07 pts/0    00:00:00 ps -ef --cols=300
apache    8631  1722  0 06:07 ?        00:00:03 /usr/sbin/httpd
apache    9425  1722  0 10:52 ?        00:00:01 /usr/sbin/httpd
apache   10037  1722  0 06:19 ?        00:00:02 /usr/sbin/httpd
apache   10575  1722  0 06:23 ?        00:00:03 /usr/sbin/httpd
apache   11829  1722  0 11:11 ?        00:00:01 /usr/sbin/httpd
root     19036  1469  0 12:09 ?        00:00:00 sshd: root@pts/1
root     19053 19036  0 12:09 pts/1    00:00:00 -bash
root     23837  1469  0 08:17 ?        00:00:01 sshd: root@pts/0
root     23855 23837  0 08:17 pts/0    00:00:00 -bash
root     29463 23855  0 13:39 pts/0    00:00:00 su nagios
nagios   29464 29463  0 13:39 pts/0    00:00:00 bash
root     29507 29464  0 13:39 pts/0    00:00:00 su root
root     29515 29507  0 13:39 pts/0    00:00:00 bash
Here is the entry from the status.dat file for that specific service:
nagios-status-dat2.png
Here is another screenshot of the command executed on the command line:
nagios-command-line.png
I'm still puzzled as to why the results differ between the command line and the interface. Thanks in advance for the help.
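
One detail that often explains a difference like this: Nagios decides the state from the plugin's exit code, not from the text it prints, so a check that looks fine on the command line can still be CRITICAL if it exits with code 2. A quick way to compare is to run the check as the nagios user (so the environment matches the daemon's) and print the exit code afterwards; a sketch only, reusing the check_snmp invocation from earlier in the thread as a stand-in for whatever command_line objects.cache shows for this service:

Code:

# Run the check as the nagios user and print the exit code it returned
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); su passes the command's
# exit status through
su - nagios -c "/usr/local/nagios/libexec/check_snmp -H <server> -o 1.3.6.1.4.1.2021.2.1.5.6 -C STGen2016 -r '[1-9][0-9]*'"
echo $?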
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Status critical

Post by lmiltchev »

Have you tried forcing a check from the web GUI by clicking the "Re-schedule the next check of this service" link under the "Service Commands" section? Did the status change?

Can you post the config of the "haproxy process" service?
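
For reference, the definition we're after usually looks something like the following; this is a generic sketch only, and the actual template, host name, service description, and command arguments will differ in your NagiosQL-generated config:

Code:

define service {
    use                   generic-service                              ; site-specific template
    host_name             <server>
    service_description   haproxy process
    check_command         check_snmp!1.3.6.1.4.1.2021.2.1.5.6!STGen2016 ; example arguments only
}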
Be sure to check out our Knowledgebase for helpful articles and solutions!
Rovendra
Posts: 9
Joined: Thu Aug 17, 2017 1:18 pm

Re: Status critical

Post by Rovendra »

Hi lmiltchev,

I tried forcing a check and nothing changed. I also tried debugging, and the logs show the critical status as well.
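
For reference, the state changes Nagios has recorded for the service can be pulled straight from the main log; a sketch only, assuming the default log location for a source install:

Code:

# Show the most recent state transitions logged for the haproxy service
grep "SERVICE ALERT" /usr/local/nagios/var/nagios.log | grep -i haproxy | tail -n 20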

Here is the haproxy service configuration file:
haproxy_service_config.PNG
Thanks in advance.