BUG: NCPA or NRPE Agent not returning consistent data

eloyd
Cool Title Here
Posts: 2129
Joined: Thu Sep 27, 2012 9:14 am
Location: Rochester, NY

BUG: NCPA or NRPE Agent not returning consistent data

Post by eloyd »

Using the memory check as an example, forcing an NCPA check and an NRPE check gives the following:

NCPA agent returns:

Code:

WARNING: percent was 77%
NRPE agent returns:

Code:

WARNING - 122 / 1006 MB (12%) Free Memory, Used: 989 MB, Shared: 0 MB, Buffers: 101 MB, Cached: 105 MB
These results are not consistent with each other. In addition, if I log into the box and run "free -m", I get this:

Code:

# free -m
             total       used       free     shared    buffers     cached
Mem:          1006        988         18          0        101        105
-/+ buffers/cache:        781        225
Swap:            0          0          0
According to this, free memory is less than 2%.
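For reference, both readings of "free" can be reproduced from the free -m output above. This is a minimal sketch that assumes the older procps layout shown in the post (with the "-/+ buffers/cache" line); newer versions of free print an "available" column instead and would need different parsing.

```python
# Sketch: parse the older `free -m` layout quoted above and compute two
# different "percent free" figures from the same snapshot.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:          1006        988         18          0        101        105
-/+ buffers/cache:        781        225
Swap:            0          0          0
"""


def parse_free(text: str) -> dict:
    """Return the Mem: row as a dict keyed by the header columns (values in MB)."""
    lines = text.splitlines()
    headers = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(headers, values))
    raise ValueError("no Mem: row found")


mem = parse_free(FREE_OUTPUT)

# Memory not allocated to anything at all (buffers and cache count as used).
raw_free_pct = mem["free"] / mem["total"] * 100

# Free memory plus buffers and cache, i.e. memory the kernel could reclaim.
reclaimable_pct = (mem["free"] + mem["buffers"] + mem["cached"]) / mem["total"] * 100

print(f"free only:            {raw_free_pct:.1f}%")
print(f"free + buffers/cache: {reclaimable_pct:.1f}%")
```

With these numbers it prints 1.8% and 22.3%, so the same box can look nearly out of memory or comfortably provisioned depending on whether buffers and cache are counted as free.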

So now I'm scared that our memory usage isn't being calculated properly, and the same goes for whatever other checks use these technologies.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic!
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: BUG: NCPA or NRPE Agent not returning consistent data

Post by hsmith »

Cached is counted as part of the free memory, AFAIK. The NCPA one is a bit concerning, though.
Former Nagios Employee.

Re: BUG: NCPA or NRPE Agent not returning consistent data

Post by eloyd »

Working through the new dev version of NCPA and not seeing anything better, to be honest.
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: BUG: NCPA or NRPE Agent not returning consistent data

Post by jomann »

Hey Eric,

If you think there is a bug in NCPA, please post it on GitHub so that I can take a look at it. However, I do believe both of these values are correct. Just remember that NCPA gives the total used percent (via the psutil module in Python), which is most likely calculated as used - cached - buffers, or something along those lines. Calculating the percent used (or free) on Linux is really quite frustrating (see this website), since different places use different calculations that either include or exclude buffered and cached memory. Since NRPE is using plugins, which plugin is it using to give you the free amount?
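As a rough illustration of the calculation described above: psutil documents virtual_memory().percent as (total - available) / total * 100. The sketch below assumes "available" can be approximated as free + buffers + cached (psutil itself reads MemAvailable from /proc/meminfo on kernels that provide it, so real values will differ somewhat) and plugs in the free -m numbers from the first post.

```python
# Sketch of a psutil-style "percent used" calculation.
# Assumption: available is approximated as free + buffers + cached;
# psutil's real value comes from the kernel and may differ.


def percent_used(total: float, available: float) -> float:
    """Percent of memory in use, computed as (total - available) / total * 100."""
    return (total - available) / total * 100


# Values (MB) from the `free -m` output in the first post.
total, free, buffers, cached = 1006, 18, 101, 105
available = free + buffers + cached

print(f"percent used (psutil-style): {percent_used(total, available):.1f}%")
```

This prints 77.7%, close to the 77% NCPA reported in the first post, although the two readings were not necessarily taken at the same moment.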
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: BUG: NCPA or NRPE Agent not returning consistent data

Post by eloyd »

Straight out of the NRPE base install:

Code:

command[check_mem]=/usr/local/nagios/libexec/custom_check_mem -n $ARG1$
I guess we need to audit all the plugins, plus NRPE, to make sure they're using the same methodologies. If NRPE's generic install says X% but NCPA's generic install says Y% on the same system, then the two are not interchangeable, and that will affect the thresholds and templates we deploy for customers. This is all based on our testing of converting NRPE-based checks to NCPA-based checks ahead of doing this for a few tens of thousands of customer service checks.

Re: BUG: NCPA or NRPE Agent not returning consistent data

Post by eloyd »

Here are my updates. Yes, I know about the "what is free memory" problem in Linux, so I calculated my own. Using three commands run in quick succession on localhost, I checked memory via NCPA (curl), NRPE (running the custom_check_mem plugin) and "free -m" (which is actually what custom_check_mem is doing). Here are the results:

First, "free -m":
I can calculate that the total memory is 1006MB, there is 512MB available (but partially allocated to cache and buffers) and that there is 961MB total in use, of which 494MB is in use by buffers and cache. This gives me 4.5% completely free memory that is doing nothing, but 50.9% available that can be put to use by draining cache and/or buffers. Only 46.4% of memory is allocated to OS and applications.
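The percentages above follow directly from the four figures quoted in the post (total, available, in use, and the buffers/cache share of in use). This sketch simply re-derives them; the free -m output they came from is not included in the thread.

```python
# Re-derive the percentages quoted above from the figures in the post (MB).
total = 1006
available = 512       # "available" as reported in the post
in_use = 961          # includes buffers and cache
buffers_cache = 494   # the buffers/cache share of in_use


def pct(mb: float) -> float:
    """Express a size in MB as a percentage of total memory."""
    return mb / total * 100


completely_free = total - in_use   # not allocated to anything
apps = in_use - buffers_cache      # OS and applications only

print(f"completely free: {completely_free} MB ({pct(completely_free):.1f}%)")
print(f"available:       {available} MB ({pct(available):.1f}%)")
print(f"OS/applications: {apps} MB ({pct(apps):.1f}%)")
```

It prints 45 MB (4.5%), 512 MB (50.9%) and 467 MB (46.4%), matching the figures in the post.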

NCPA:
It comes back with /api/memory/virtual/percent as 49.2%. This is correct if one measures memory in use while treating caches and buffers as available. Generally speaking, available memory is what people think of as "memory that can be used for something", and it includes caches and buffers. While /api/memory/virtual/free and /api/memory/virtual/total can be used to calculate a percentage (in this case, 4.4%) of absolutely free memory that is not in use by anything, that is not what "percent" returns. Instead, "percent" is 100% minus .../available divided by .../total. This makes intuitive sense: available divided by total calculates out as 50.8% in this case, exactly what it should be if /api/memory/virtual/percent is 49.2%, and in line with my "free -m" sample from above.

NRPE (aka "custom_check_mem"):
This comes back with 4% free memory. This is correct only if you count memory that is not allocated for anything at all. Meaning, it considers buffers and caches to be in-use memory. This is the same as NCPA's .../free divided by .../total.

Conclusion:
The problem is that both custom_check_mem and NCPA are returning correct values for what they're measuring, but they're measuring different things. NCPA is checking available memory, while custom_check_mem is checking free memory. They are not the same thing, and they differ by roughly an order of magnitude. I like having options, but if I change from one standard memory check (say, NRPE's custom_check_mem) to another (like NCPA's check), I would expect both checks to measure the same thing and report the same numbers. Otherwise, my thresholds, capacity planning, and perf data are going to show weird results.
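To make the threshold concern concrete, here is a minimal sketch that applies one pair of "percent free" thresholds to both definitions of free memory. The 10%/5% warning/critical thresholds are hypothetical, chosen only for illustration; the inputs are the figures from the second measurement above.

```python
# Hypothetical thresholds on "percent free"; lower values are worse.
WARN_PCT = 10.0
CRIT_PCT = 5.0


def state(percent_free: float) -> str:
    """Map a percent-free value to a Nagios-style state name."""
    if percent_free < CRIT_PCT:
        return "CRITICAL"
    if percent_free < WARN_PCT:
        return "WARNING"
    return "OK"


total_mb = 1006
free_mb = 45         # not allocated to anything (custom_check_mem's view, per the post)
available_mb = 512   # usable, including reclaimable cache (NCPA's view, per the post)

for label, mb in (("free", free_mb), ("available", available_mb)):
    pct = mb / total_mb * 100
    print(f"{label:>9}: {pct:5.1f}% -> {state(pct)}")
```

The same thresholds put one view at CRITICAL (4.5%) and the other at OK (50.9%), which is why thresholds and templates cannot be carried over from one check to the other unchanged.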

Is it possible to get percent_free added to NCPA to be able to get the same results as the NRPE check? I'll reference this forum post in my github ticket.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: BUG: NCPA or NRPE Agent not returning consistent data

Post by lmiltchev »

eloyd wrote:
Is it possible to get percent_free added to NCPA to be able to get the same results as the NRPE check? I'll reference this forum post in my github ticket.
I just wanted to provide a link to the issue you posted on GitHub, so that other users can read the discussion.
https://github.com/NagiosEnterprises/ncpa/issues/234
Be sure to check out our Knowledgebase for helpful articles and solutions!