Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 12:48 pm
by vAJ
Since I upgraded my dev XI instance last week to 2014R2.0, my Linux hosts have been flipped out on memory usage checks.

I'm running NRPE and check_mem. Did something change in the plugins package update that would require me to update something in the NRPE package?

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 1:59 pm
by lmiltchev
Can you show us the actual command run from the command line, along with the output of it?

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 3:07 pm
by vAJ
I knew you needed that...

COMMAND: /usr/local/nagios/libexec/check_nrpe -H aus02nqrmq008.dev.EZCORP.com -t 30 -c check_mem -a '-w 20 -c 10'
OUTPUT: WARNING - 588 / 3829 MB (15%) Free Memory, Used: 3663 MB, Shared: 0 MB, Buffers: 218 MB, Cached: 422 MB | total=3829MB free=588MB used=3663MB shared=0 buffers=218MB cached=422MB
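
For reference, the WARNING state is consistent with the thresholds passed in the command. The sketch below parses the performance data after the `|` and applies `-w 20 -c 10` as free-percentage thresholds. The helper names `parse_perfdata` and `classify` are made up for illustration, and the comparison rule (alert when the free percentage is at or below the threshold) is an assumption, not taken from the check_mem source.

```python
import re


def parse_perfdata(output: str) -> dict:
    """Return {label: integer value} from the text after the '|' separator."""
    _, _, perf = output.partition("|")
    return {label: int(number) for label, number in re.findall(r"(\w+)=(\d+)", perf)}


def classify(free_pct: float, warn: float, crit: float) -> str:
    """Assumed rule: alert when the free percentage is at or below a threshold."""
    if free_pct <= crit:
        return "CRITICAL"
    if free_pct <= warn:
        return "WARNING"
    return "OK"


# Output line copied from the post above.
OUTPUT = (
    "WARNING - 588 / 3829 MB (15%) Free Memory, Used: 3663 MB, Shared: 0 MB, "
    "Buffers: 218 MB, Cached: 422 MB | total=3829MB free=588MB used=3663MB "
    "shared=0 buffers=218MB cached=422MB"
)

perf = parse_perfdata(OUTPUT)
free_pct = 100 * perf["free"] / perf["total"]
state = classify(free_pct, warn=20, crit=10)
print(f"{free_pct:.1f}% free -> {state}")
```

With 588 MB free out of 3829 MB, the free percentage falls between the critical (10) and warning (20) thresholds, so a WARNING is what those arguments should produce.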

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 3:39 pm
by lmiltchev
Do you get a different output if you run the check with the "-n" flag?

/usr/local/nagios/libexec/check_nrpe -H aus02nqrmq008.dev.EZCORP.com -t 30 -c check_mem -a '-w 20 -c 10 -n'

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 3:52 pm
by sreinhardt
Well, according to the check output, used memory (assuming that's red/brown) is at 3663 MB, with 588 MB free and 3829 MB total. While that is a harsh, large jump in used memory, the graph does match the plugin. It also looks very level, which is strange for memory usage, but maybe less so when usage is at such a high percentage. Does top or another statistics tool show different usage from what you see here? If so, let's backtrace the issue: try running the check locally on the remote machine and see what response you get. I'm certainly not saying the check is returning correctly, but the graph does not vary from what the check is currently returning. Did you update the plugins on your remote systems, or just do the normal XI upgrade?
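
One way to do the local comparison suggested above is to read /proc/meminfo on the monitored host and convert the fields to MB, so they can be set next to the plugin's numbers. This is a minimal sketch: `parse_meminfo` and `summarize_mb` are hypothetical helpers, and the sample text is invented to mirror the check output rather than captured from the real host.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo field names to their integer values (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        fields[name.strip()] = int(parts[0])
    return fields


def summarize_mb(fields: dict) -> dict:
    """Pick the fields that the check output also reports, converted to whole MB."""
    wanted = (("total", "MemTotal"), ("free", "MemFree"),
              ("buffers", "Buffers"), ("cached", "Cached"))
    return {key: fields.get(source, 0) // 1024 for key, source in wanted}


# Invented sample, NOT real data from the host in this thread.
SAMPLE = """\
MemTotal:        3920896 kB
MemFree:          602112 kB
Buffers:          223232 kB
Cached:           432128 kB
"""

summary = summarize_mb(parse_meminfo(SAMPLE))
print(summary)
```

On the monitored host itself, `SAMPLE` would be replaced with the contents of `/proc/meminfo`, for example `open("/proc/meminfo").read()`.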

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 4:06 pm
by vAJ
Yeah. It's not that flat in reality on the box.

Since check_mem on the monitored host has not been updated, I'm assuming check_nrpe was modified as part of this changelog entry:
- Updated Nagios Plugins to 2.0.3 -SW
But I don't see that in the Plugins release notes: http://nagios-plugins.org/nagios-plugin ... -released/

So I'm wondering if there is a dependency that was updated?

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 4:50 pm
by vAJ
I'm trying to find the thread from way back when we had problems with our NRPE package that was checking memory usage on the Nagios host itself. Something about it not taking the buffers/cache into account.

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 5:31 pm
by sreinhardt
vAJ wrote: "Something about it not taking the buffers/cache into account."
That's what lmiltchev was referencing with the -n flag. It should cause the check to ignore cache/swap space. But I would guess, based on the numbers, that you're likely ignoring that already, unless this is a pretty small system. As for the update, that would not be due to the plugins; it would have to be an NRPE change specifically, which might have happened.

EDIT: Definitely no nrpe or linux agent changes. So let's try it with lmiltchev's -n and also run it locally on that remote system once to see output.

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 5:38 pm
by vAJ
Sorry, I only read your reply when I checked in on the thread, missed LM's post.

Updating check results in the same:

COMMAND: /usr/local/nagios/libexec/check_nrpe -H aus02nqrmq008.dev.EZCORP.com -t 30 -c check_mem -a '-w 20 -c 10 -n'
OUTPUT: WARNING - 583 / 3829 MB (15%) Free Memory, Used: 3670 MB, Shared: 0 MB, Buffers: 218 MB, Cached: 424 MB | total=3829MB free=583MB used=3670MB shared=0 buffers=218MB cached=424MB
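
A quick arithmetic check on the two outputs posted so far may help with the buffers/cache question. In both, total minus used is far smaller than the reported free figure, and adding cached back gives exactly the reported free value. That is only an observation from these numbers; how check_mem derives them, and what `-n` changes, is not confirmed in this thread.

```python
# Values in MB, copied from the two check outputs in this thread.
outputs = {
    "without -n": {"total": 3829, "free": 588, "used": 3663, "buffers": 218, "cached": 422},
    "with -n": {"total": 3829, "free": 583, "used": 3670, "buffers": 218, "cached": 424},
}

rows = []
for label, s in outputs.items():
    raw_free = s["total"] - s["used"]        # memory not counted as used
    plus_cached = raw_free + s["cached"]     # treat page cache as reclaimable
    plus_both = plus_cached + s["buffers"]   # treat buffers as reclaimable too
    rows.append((label, raw_free, plus_cached, plus_both, s["free"]))
    print(f"{label}: total-used={raw_free} +cached={plus_cached} "
          f"+buffers={plus_both} reported free={s['free']}")
```

If the reported free figure already includes cached memory, as these numbers suggest, then the 15% is not simply page cache inflating the used value.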

Re: Linux mem check wonky since 2014r2.0

Posted: Mon Nov 17, 2014 5:59 pm
by vAJ
You guys are going to LOVE this.

So, I double-checked the exact time I upgraded XI against when we see this memory graph change... There was a few hours' difference...

Turns out that our SysEngineering team decided to turn on ESX resource pooling for our non-prod environments. There's a whole political mess behind this that I'm not allowed to explain (nor would you want me to) in a public forum.

Windows and Linux handle this pooling differently in how memory is presented to the OS.

Suffice it to say, it looks like I'm removing thresholds on memory for non-prod... Feel free to close the thread.