Need to monitor all the linux kernel variables as possible

Locked
patricio.dorantes
Posts: 3
Joined: Tue Aug 13, 2013 10:54 am

Need to monitor all the linux kernel variables as possible

Post by patricio.dorantes »

Hi,

I don't know if this is the right place to ask, but the Nagios community manager on Facebook sent me here, so here I go:

I have a complex web environment with reverse proxies, web servers, application servers, and many other components handling heavy traffic. I use Nagios to monitor all of it: databases, web servers, proxy servers, operating systems, etc. Recently one node started receiving a lot of traffic (300 GiB/day). Everything runs smoothly, except that from time to time the reverse proxy reports the following error:

Service Warning[08-12-2013 16:05:30] SERVICE ALERT: QROPC2FEDGE06;Web Container;WARNING;SOFT;1;HTTP WARNING: HTTP/1.1 400 Proxy Error: Unable to connect to remote host "2WEB06" or host not responding - URL "http://WEB06/wps/portal/!ut/p/a1/04_Sj9 ... Kd3R09", errno: 111 - 3012 bytes in 3.002 second response time

This error comes directly from iptables. I know this because I replaced reject-with icmp-host-prohibited with reject-with icmp-port-unreachable in the rules. It only happens from time to time, so it is not a misplaced rule. I also compared net.netfilter.nf_conntrack_count against net.netfilter.nf_conntrack_max, and the limit is not being exceeded.
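
For reference, the count-versus-max comparison could be turned into a Nagios-style check along these lines. This is only a sketch: the 80%/90% thresholds, the function names, and the output wording are assumptions chosen for illustration, not something taken from an existing plugin. The two files under /proc/sys/net/netfilter only exist when the conntrack module is loaded.

```python
"""Sketch of a Nagios-style check for conntrack table usage."""
from pathlib import Path

# Location of the conntrack sysctls when the nf_conntrack module is loaded.
PROC = Path("/proc/sys/net/netfilter")

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(count, maximum, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, message) for the given table usage.

    The default thresholds are illustrative assumptions.
    """
    if maximum <= 0:
        return UNKNOWN, "CONNTRACK UNKNOWN - nf_conntrack_max is not positive"
    pct = 100.0 * count / maximum
    if pct >= crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif pct >= warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    message = (
        f"CONNTRACK {label} - {count}/{maximum} entries ({pct:.1f}%)"
        f" | usage={pct:.1f}%;{warn_pct};{crit_pct}"
    )
    return code, message


def main():
    """Read both values from /proc, print the status line, return the exit code."""
    try:
        count = int((PROC / "nf_conntrack_count").read_text())
        maximum = int((PROC / "nf_conntrack_max").read_text())
    except (OSError, ValueError) as exc:
        print(f"CONNTRACK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(count, maximum)
    print(message)
    return code
```

A real plugin would finish with `sys.exit(main())` so that Nagios or NRPE sees the exit code. A check like this only samples the table at check time, so short spikes between checks would still go unnoticed.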

So... Could anyone here could tell me some directions on what to monitor?

Thanks in advance!
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Need to monitor all the linux kernel variables as possib

Post by abrist »

Error 400 is usually a bad or malformed request. Do you actually want to monitor kernel variables, HTTP requests and errors, or something else?
Kernel variables would be monitored by checking against /proc and /sys, while HTTP requests would require using tcpflow or tcpdump and/or additional GETs.
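
As a rough illustration of checking against /proc, a sysctl name can usually be mapped to a file under /proc/sys by replacing the dots with slashes. The helper names below are hypothetical, and the simple dot-splitting is an assumption that breaks for names whose components themselves contain dots (for example, some VLAN interface names).

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys.

    Assumes no path component contains a dot of its own.
    """
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a kernel variable as a stripped string."""
    return sysctl_path(name, root).read_text().strip()
```

For example, `read_sysctl("net.netfilter.nf_conntrack_max")` would read /proc/sys/net/netfilter/nf_conntrack_max. The `root` parameter is there only to make the helper easy to test against a temporary directory.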
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
patricio.dorantes
Posts: 3
Joined: Tue Aug 13, 2013 10:54 am

Re: Need to monitor all the linux kernel variables as possib

Post by patricio.dorantes »

abrist wrote:Error 400 is usually a bad or malformed request. Do you actually want to monitor kernel variables, HTTP requests and errors, or something else?
Kernel variables would be monitored by checking against /proc and /sys, while HTTP requests would require using tcpflow or tcpdump and/or additional GETs.
Thank you for your reply. The 400 is what the proxy returns when the backend fails. I would like to monitor kernel variables. I found out that /proc/net/sockstat gives some information about the sockets in use and their memory usage. I got something like: inuse 25 orphan 8 tw 548 alloc 99 mem 31, so it seems that socket memory usage is not the problem.
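
To feed these counters into a check, the file could be parsed roughly as follows. This is a minimal sketch that assumes the usual layout of /proc/net/sockstat, one `Protocol: key value key value ...` line per protocol, and that the values quoted above come from the TCP line. The function name is hypothetical.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat content into {protocol: {counter: value}}.

    Assumes each line looks like 'TCP: inuse 25 orphan 8 tw 548 ...'.
    Lines without a colon are skipped.
    """
    stats = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Fields alternate between counter name and integer value.
        stats[proto.strip()] = {
            key: int(value) for key, value in zip(fields[0::2], fields[1::2])
        }
    return stats


def read_sockstat(path="/proc/net/sockstat"):
    """Read and parse the live file on a Linux host."""
    with open(path) as handle:
        return parse_sockstat(handle.read())
```

With the values quoted above, `parse_sockstat("TCP: inuse 25 orphan 8 tw 548 alloc 99 mem 31")["TCP"]["tw"]` gives 548, which a check could then compare against a threshold of your choosing.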

=) Thank you
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Need to monitor all the linux kernel variables as possib

Post by abrist »

No problem. What type of usage does the server you are checking see on average? Error 400s can be hard to hunt down. Do any other checks against this server, or any of your other checks in general, fail with a 400 as well? Do the Nagios server and the checked host reside on the same network?
patricio.dorantes
Posts: 3
Joined: Tue Aug 13, 2013 10:54 am

Re: Need to monitor all the linux kernel variables as possib

Post by patricio.dorantes »

abrist wrote:No problem. What type of usage does the server you are checking see on average? Error 400s can be hard to hunt down. Do any other checks against this server, or any of your other checks in general, fail with a 400 as well? Do the Nagios server and the checked host reside on the same network?
Hi!
Yes, both servers reside on the same network. As a matter of fact, I have two different networks: one for monitoring (192.168.0.x) and one for data exchange between the proxy server and the HTTP server. Sorry if I didn't fully understand your first question. If I understood correctly: the server that reports the 400 has about 2.5k concurrent users, and the HTTP server behind it has about 150 worker threads busy and 2350 idle, so it is not a matter of available worker threads. The other NRPE checks don't fail. I measure I/O < 1%, CPU < 39%, memory < 60%, HDD < 65%, bandwidth between 1 MiB/s and 4.5 MiB/s, and netstat established connections between 200 and 1500.

It seems like everything is OK :S
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Need to monitor all the linux kernel variables as possib

Post by abrist »

Fair enough. You may want to increase the timeout on the check if it is relatively short. It sounds like things are behaving normally for now. We are here to help if you have problems in the future. Have a good week.