
Nagios core CLI is working but Nagios web console is not.

Posted: Wed Dec 19, 2018 2:32 am
by satya
Hi Team,

I hope everyone is warm and energetic in this winter :-)

It's a weird issue that I am facing.

Our Nagios core CLI is working fine, but our Nagios web console is not working properly (sometimes it works and sometimes it doesn't, without anyone changing anything :roll: )

I tried a few steps to rectify the issue:

1) Checking logs of Nagios and apache2

Nagios logs

tail -f nagios.log
[1545198312] wproc: stdout line 79: 250 2.1.5 <nagios@ip-172-31-32-238.ap-south-1.compute.internal>... Recipient ok
[1545198312] wproc: stdout line 80: 354 Enter mail, end with "." on a line by itself
[1545198312] wproc: stdout line 81: >>> .
[1545198312] wproc: stdout line 82: 050 <nagios@ip-172-31-32-238.ap-south-1.compute.internal>... Connecting to local...
[1545198312] wproc: stdout line 83: 050 <nagios@ip-172-31-32-238.ap-south-1.compute.internal>... Sent
[1545198312] wproc: stdout line 84: 250 2.0.0 wBJ5j6iv006498 Message accepted for delivery
[1545198312] wproc: stdout line 85: nagios... Sent (wBJ5j6iv006498 Message accepted for delivery)
[1545198312] wproc: stdout line 86: Closing connection to [127.0.0.1]
[1545198312] wproc: stdout line 87: >>> QUIT
[1545198312] wproc: stdout line 88: 221 2.0.0 ip-172-31-32-238.ap-south-1.compute.internal closing connection

Apache2 error logs

tail -f error.log
[Wed Dec 19 06:25:01.490007 2018] [mpm_prefork:notice] [pid 3872] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Wed Dec 19 06:25:01.490031 2018] [core:notice] [pid 3872] AH00094: Command line: '/usr/sbin/apache2'

Apache2 access logs

tail -f access.log
127.0.0.1 - - [19/Dec/2018:06:32:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"
127.0.0.1 - - [19/Dec/2018:06:37:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"
127.0.0.1 - - [19/Dec/2018:06:42:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"
127.0.0.1 - - [19/Dec/2018:06:47:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"
192.168.5.224 - - [19/Dec/2018:06:50:25 +0000] "GET /nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28 HTTP/1.1" 401 748 "http://nagios.lendingkart.com/nagios/cgi-bin/status.cgi?host=all&servicestatustypes=28" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
192.168.5.224 - - [19/Dec/2018:06:50:35 +0000] "GET /nagios/cgi-bin/status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=12 HTTP/1.1" 401 748 "http://nagios.lendingkart.com/nagios/cgi-bin/status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=12" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
192.168.5.224 - - [19/Dec/2018:06:50:45 +0000] "GET /nagios/cgi-bin/outages.cgi HTTP/1.1" 401 748 "http://nagios.lendingkart.com/nagios/cgi-bin/outages.cgi" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
127.0.0.1 - - [19/Dec/2018:06:52:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"
127.0.0.1 - - [19/Dec/2018:06:57:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"
127.0.0.1 - - [19/Dec/2018:07:02:06 +0000] "GET / HTTP/1.0" 200 11595 "-" "check_http/v2.2.1 (nagios-plugins 2.2.1)"

2) I checked whether the apache2 and Nagios services were running while the Nagios UI was not working

Yes, both were running fine

$sudo service apache2 status
● apache2.service - LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Tue 2018-12-18 12:40:11 UTC; 17h ago
Docs: man:systemd-sysv-generator(8)
Process: 3754 ExecStop=/etc/init.d/apache2 stop (code=exited, status=0/SUCCESS)
Process: 29913 ExecReload=/etc/init.d/apache2 reload (code=exited, status=0/SUCCESS)
Process: 3855 ExecStart=/etc/init.d/apache2 start (code=exited, status=0/SUCCESS)
Tasks: 11
Memory: 41.5M
CPU: 2.989s
CGroup: /system.slice/apache2.service
├─ 3872 /usr/sbin/apache2 -k start
├─ 3875 /usr/sbin/apache2 -k start
├─ 3876 /usr/sbin/apache2 -k start
├─ 3877 /usr/sbin/apache2 -k start
├─ 3879 /usr/sbin/apache2 -k start
├─ 8393 /usr/sbin/apache2 -k start
├─ 8449 /usr/sbin/apache2 -k start
├─ 8462 /usr/sbin/apache2 -k start
├─ 8463 /usr/sbin/apache2 -k start
├─ 8464 /usr/sbin/apache2 -k start
└─13687 /usr/sbin/apache2 -k start


$sudo service nagios status
● nagios.service - Nagios
Loaded: loaded (/etc/systemd/system/nagios.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-12-18 12:40:47 UTC; 18h ago
Main PID: 4017 (nagios)
Tasks: 8
Memory: 41.4M
CPU: 4min 10.724s
CGroup: /system.slice/nagios.service
├─ 4017 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
├─ 4018 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 4019 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 4020 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 4021 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
├─ 4024 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
├─15379 /usr/local/nagios/libexec/check_ping -H 172.31.32.115 -w 3000.0,80% -c 5000.0,100% -p 5
└─15380 /bin/ping -n -U -W 30 -c 5 172.31.32.115

Dec 19 07:04:17 ip-172-31-32-238 check_nrpe[14527]: Remote 172.31.33.151 does not support Version 3 Packets
Dec 19 07:04:17 ip-172-31-32-238 check_nrpe[14527]: Remote 172.31.33.151 accepted a Version 2 Packet
Dec 19 07:05:07 ip-172-31-32-238 check_nrpe[14603]: Remote 172.31.27.83 does not support Version 3 Packets
Dec 19 07:05:07 ip-172-31-32-238 check_nrpe[14603]: Remote 172.31.27.83 accepted a Version 2 Packet
Dec 19 07:05:10 ip-172-31-32-238 check_nrpe[14604]: Remote 172.31.33.204 does not support Version 3 Packets
Dec 19 07:05:10 ip-172-31-32-238 check_nrpe[14604]: Remote 172.31.33.204 accepted a Version 2 Packet
Dec 19 07:05:14 ip-172-31-32-238 check_nrpe[14613]: Remote 172.31.33.151 does not support Version 3 Packets
Dec 19 07:05:14 ip-172-31-32-238 check_nrpe[14613]: Remote 172.31.33.151 accepted a Version 2 Packet
Dec 19 07:08:02 ip-172-31-32-238 check_nrpe[15272]: Remote 172.31.33.151 does not support Version 3 Packets
Dec 19 07:08:02 ip-172-31-32-238 check_nrpe[15272]: Remote 172.31.33.151 accepted a Version 2 Packet

$sudo netstat -nlp |grep 80
tcp6 0 0 :::80 :::* LISTEN 3872/apache2


As far as I can tell, everything else in Nagios is working fine and we are receiving emails properly. The only problem is that the Nagios core UI (web interface) sometimes stops responding (and at other times it works fine).

Note: Here is one more piece of information that may help you look deeper into the issue:

I ran this curl command from my local machine (our Nagios is in a private subnet and we use a VPN in our organization)


## This is while everything was running except Nagios core Web UI ##
lendingkart@LK-LP-565:~$ curl -I nagios.lendingkart.com

HTTP/1.1 502 Connection refused
Date: Tue, 18 Dec 2018 14:08:37 GMT
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset="UTF-8"
Content-Length: 71993
Via: HTTP/1.1 forward.http.proxy:3128
Connection: close


## This is while everything was running including Nagios core Web UI ##
lendingkart@LK-LP-565:~$ curl -I nagios.lendingkart.com

HTTP/1.1 200 OK
Date: Tue, 18 Dec 2018 14:20:00 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Mon, 25 Dec 2017 18:09:35 GMT
ETag: "2c39-5612e1196dfcf"
Accept-Ranges: bytes
Content-Length: 11321
Vary: Accept-Encoding
Keep-Alive: timeout=5, max=100
Content-Type: text/html
Via: HTTP/1.1 forward.http.proxy:3128
Connection: keep-alive

Can someone please help us solve this?
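
One way to read the two `curl -I` results above: the 200 response carries a `Server: Apache/2.4.18 (Ubuntu)` header, while the 502 response has no `Server:` header and only the `Via: HTTP/1.1 forward.http.proxy:3128` line, which may mean the 502 page was generated by the forward proxy rather than by Apache. The sketch below is a rough heuristic for that check, not a definitive test (a proxy can also add its own `Server:` header). The function name `classify_response` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and report the
# status code plus a guess at who produced the response.
# Heuristic only: a "Server:" header is taken as a sign of the origin server,
# a "Via:" header without "Server:" as a sign of a proxy-generated page.
classify_response() {
    # HTTP header lines end in CRLF, so strip carriage returns first.
    headers=$(tr -d '\r')
    # Status code is the second field of the first "HTTP/..." line.
    status=$(printf '%s\n' "$headers" | awk '/^HTTP\// {print $2; exit}')
    if printf '%s\n' "$headers" | grep -qi '^Server:'; then
        echo "status=$status answered-by=origin"
    elif printf '%s\n' "$headers" | grep -qi '^Via:'; then
        echo "status=$status answered-by=proxy"
    else
        echo "status=$status answered-by=unknown"
    fi
}
```

Usage, with the hostname from the post: `curl -sI nagios.lendingkart.com | classify_response`.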

Re: Nagios core CLI is working but Nagios web console is not

Posted: Wed Dec 19, 2018 5:53 pm
by npolovenko
@satya, by non-working web console do you mean it's freezing, or is it a blank white page with nothing on it?

There should be more errors in the apache error log. Next time when Nagios freezes please send in the whole
/var/log/httpd/error.log

Also, try increasing the memory_limit in /etc/php.ini and then restart apache with service httpd restart.

How much memory does this core server have? How many host and service checks? Are there any third party applications on this server, such as an antivirus scanner?

Re: Nagios core CLI is working but Nagios web console is not

Posted: Fri Dec 21, 2018 1:17 am
by satya
Hi Npolovenko,

Thanks for the quick response.

There is no such location as "/var/log/httpd/error.log"

Please find the error logs from /var/log/apache2/error.log below

ubuntu@ip-172-31-32-238:/var/log/apache2$ tail -f error.log
[Thu Dec 20 06:25:01.868953 2018] [mpm_prefork:notice] [pid 15461] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Thu Dec 20 06:25:01.868976 2018] [core:notice] [pid 15461] AH00094: Command line: '/usr/sbin/apache2'
[Thu Dec 20 06:30:50.092645 2018] [auth_basic:error] [pid 22241] [client 192.168.5.224:49924] AH01617: user nagiosadmin: authentication failure for "/nagios/cgi-bin/outages.cgi": Password Mismatch, referer: http://nagios.lendingkart.com/nagios/cg ... utages.cgi
[Thu Dec 20 06:30:50.093094 2018] [auth_basic:error] [pid 22402] [client 192.168.5.224:49925] AH01617: user nagiosadmin: authentication failure for "/nagios/cgi-bin/status.cgi": Password Mismatch, referer: http://nagios.lendingkart.com/nagios/cg ... ustypes=28
[Thu Dec 20 06:30:50.093666 2018] [auth_basic:error] [pid 22238] [client 192.168.5.224:49926] AH01617: user nagiosadmin: authentication failure for "/nagios/cgi-bin/status.cgi": Password Mismatch, referer: http://nagios.lendingkart.com/nagios/cg ... ustypes=12
^C
ubuntu@ip-172-31-32-238:/var/log/apache2$ tail -f error.log.1
[Wed Dec 19 06:25:01.490007 2018] [mpm_prefork:notice] [pid 3872] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Wed Dec 19 06:25:01.490031 2018] [core:notice] [pid 3872] AH00094: Command line: '/usr/sbin/apache2'
[Thu Dec 20 05:16:45.836222 2018] [mpm_prefork:notice] [pid 3872] AH00169: caught SIGTERM, shutting down
[Thu Dec 20 05:16:46.884379 2018] [mpm_prefork:notice] [pid 15461] AH00163: Apache/2.4.18 (Ubuntu) configured -- resuming normal operations
[Thu Dec 20 05:16:46.884423 2018] [core:notice] [pid 15461] AH00094: Command line: '/usr/sbin/apache2'
[Thu Dec 20 06:25:01.758714 2018] [mpm_prefork:notice] [pid 15461] AH00171: Graceful restart requested, doing restart
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1. Set the 'ServerName' directive globally to suppress this message


In which php.ini do I need to increase the memory_limit?

ubuntu@ip-172-31-32-238:/$ sudo find -name php.ini
./etc/php/7.0/apache2/php.ini
./etc/php/7.0/cli/php.ini

Number of processors in the Machine

$ nproc
1

Memory in the Machine

$ free -m
total used free shared buff/cache available
Mem: 1998 351 249 29 1396 1406
Swap: 0 0 0


No firewall enabled

Whenever the Nagios web console is not working, it's totally blank.
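
When reading the `free -m` output above, the `available` column (1406 MB here) is the kernel's estimate of memory that new processes could use without swapping, so it is usually more telling than `free` (249 MB). A minimal sketch for pulling that figure out, assuming the seven-column `Mem:` row shown in the post; the function name `mem_available_mb` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "available" figure from `free -m` output
# supplied on stdin. Assumes the layout shown in the post, where the Mem: row
# is: total used free shared buff/cache available (7th field = available).
mem_available_mb() {
    awk '/^Mem:/ {print $7}'
}
```

Usage: `free -m | mem_available_mb`. Logging this value at intervals could show whether memory really runs short around the times the web console goes blank.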

Re: Nagios core CLI is working but Nagios web console is not

Posted: Wed Dec 26, 2018 6:21 am
by satya
Hi Npolovenko,

Any update on this?

Regards,
Satya

Re: Nagios core CLI is working but Nagios web console is not

Posted: Wed Dec 26, 2018 1:52 pm
by npolovenko
@satya, Sorry, we were closed during holidays. This line is interesting:
user nagiosadmin: authentication failure for "/nagios/cgi-bin/outages.cgi": Password Mismatch

Are you using LDAP for user authentication? Does the web interface stop working after a certain amount of time, or does it work on one attempt and fail on the next?
Can you upload the nagios.conf (the apache configuration file)? On my server it's located in:
/etc/httpd/conf.d/nagios.conf

On ubuntu it may be in a slightly different directory.

How long ago did you upgrade to PHP 7? Did this problem start after you upgraded PHP?

Re: Nagios core CLI is working but Nagios web console is not

Posted: Wed Jan 02, 2019 1:26 am
by satya
Hi Npolovenko,

Sorry, I was on vacation.

* There was a typo while entering the password.
* We don't use any LDAP for user authentication.
* The Nagios web interface works for around 6-7 hours, then stops on its own for a few hours, and then starts working again.

Kindly find the attached Nagios configuration file.

Regards,
Satya

Re: Nagios core CLI is working but Nagios web console is not

Posted: Wed Jan 02, 2019 1:55 pm
by npolovenko
Hi, @satya. I highly recommend increasing the memory (RAM) on this Core server from 2 GB to at least 4 GB.
Next, please increase the memory_limit in the /etc/php/7.0/apache2/php.ini file and restart Apache with:
service apache2 restart

When/if the GUI goes blank again, please send me a fresh copy of the error log as well as the output of
ps -ef
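
Since the web console only fails now and then, it may help to have the capture ready in advance. Below is a minimal sketch that saves the tail of the Apache error log and the `ps -ef` output into a timestamped directory. The function name `snapshot`, the 200-line tail, and the output layout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: snapshot <base_dir> <error_log>
# Creates <base_dir>/webui-<UTC timestamp>/ containing the last 200 lines of
# the given error log and the current process list, then prints the directory.
snapshot() {
    out_dir="$1/webui-$(date -u '+%Y%m%dT%H%M%SZ')"
    mkdir -p "$out_dir" || return 1
    # Tail length of 200 lines is an arbitrary choice for this sketch.
    tail -n 200 "$2" > "$out_dir/error.log.tail" 2>&1
    ps -ef > "$out_dir/ps-ef.txt" 2>&1
    echo "$out_dir"
}
```

Usage on the Ubuntu host from this thread, run while the GUI is blank: `sudo sh -c '. ./snapshot.sh; snapshot /tmp /var/log/apache2/error.log'` (the file name `snapshot.sh` is also an assumption).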

Re: Nagios core CLI is working but Nagios web console is not

Posted: Thu Jan 03, 2019 6:50 am
by satya
Hi Npolovenko,

There you go with the requested file.

I changed memory_limit from 128M to 500M in /etc/php/7.0/apache2/php.ini and restarted apache2.

Regards,
Satya
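
To double-check a change like the one described above, the effective `memory_limit` line can be read back from the file. This is a small sketch, assuming the usual `memory_limit = <value>` syntax of php.ini with `;` starting a comment; the function name `php_memory_limit` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: php_memory_limit <php.ini path>
# Prints the value of the last uncommented "memory_limit = ..." line.
# Lines starting with ";" are comments in php.ini and are skipped because the
# pattern only matches lines that begin with optional whitespace + the key.
php_memory_limit() {
    sed -n 's/^[[:space:]]*memory_limit[[:space:]]*=[[:space:]]*\([^;[:space:]]*\).*/\1/p' "$1" | tail -n 1
}
```

Usage: `php_memory_limit /etc/php/7.0/apache2/php.ini`. The earlier `find` output also lists `/etc/php/7.0/cli/php.ini`; that file applies to command-line PHP, so only the `apache2` one matters for the web interface.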

Re: Nagios core CLI is working but Nagios web console is not

Posted: Thu Jan 03, 2019 5:39 pm
by ssax
Please send me the entire output of this command when it is working AND when it is not working:

curl -L nagios.lendingkart.com -vvv


Is the system that you're running that curl command from going through a proxy?

What happens if you run this command from the Nagios server's command line while it's not working?

curl -L 127.0.0.1 -vvv
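
Because the outage comes and goes, a timestamped record of the local check may be easier to work with than a single run. The sketch below logs the HTTP status code returned by Apache on 127.0.0.1 once a minute, bypassing any proxy with curl's `--noproxy '*'`. The log path, the 60-second interval, the `RUN_PROBE` switch, and the function names are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical logging sketch, meant to run on the Nagios server itself.
LOG="${LOG:-/tmp/nagios-webui-probe.log}"   # assumed log location

# Print the HTTP status code for a URL, bypassing any configured proxy.
probe() {
    curl -s -o /dev/null --noproxy '*' --max-time 10 -w '%{http_code}' "$1"
}

# Format one log line: UTC timestamp, label, status code.
# An empty code (no output from curl) is recorded as 000.
log_line() {
    printf '%s %s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1" "${2:-000}"
}

# The loop only starts when RUN_PROBE=1 is set, so the file can be sourced
# safely to reuse the functions.
if [ "${RUN_PROBE:-0}" = 1 ]; then
    while true; do
        log_line local "$(probe http://127.0.0.1/)" >> "$LOG"
        sleep 60
    done
fi
```

Usage: `RUN_PROBE=1 sh probe.sh &` (the file name is an assumption). If the local check keeps returning 200 while the browser shows a blank page or a 502, that would point away from Apache and toward something between the client and the server, such as the forward proxy seen in the `Via:` header.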

Re: Nagios core CLI is working but Nagios web console is not

Posted: Sun Jan 06, 2019 11:44 pm
by satya
Hi ssax,

Please find the attached files as you requested.

I ran this command from my local system
Attachment: nagios-when-it's-working.txt (866 Bytes)


I ran this command from my local system
Attachment: ngios-when-it's-not-working.txt (70.85 KiB)


I ran this command from within the nagios system


Please let me know if you need any clarification.

Regards,
Satya