Nagios Support Forum

Posted: **Mon Jan 06, 2020 3:28 pm**

Ah, I forgot that we updated t1 to NCPA 2.2.0. That sum matches up with my Cent6 VM running NCPA 2.2.0, so we should be good there.

According to the devs, NCPA just reads the numbers returned by psutil, and translates them from bytes to kilobytes/megabytes/gigabytes/etc. And psutil gets its information from /proc/meminfo. Can you give me the output of cat /proc/meminfo and vmstat -s from t1?
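As a rough illustration of the pipeline described above (read the kB figures from /proc/meminfo, then scale them into larger units), here is a minimal Python sketch. It is not NCPA's or psutil's actual code. The helper names `parse_meminfo` and `humanize` are made up for this example, and the sample text is a two-line excerpt in the standard /proc/meminfo layout.

```python
# Hypothetical helpers for illustration only; not taken from NCPA or psutil.

SAMPLE_MEMINFO = """\
MemTotal: 6353629608 kB
MemFree: 15518496 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Values keep the unit used in the file (kB for most fields).
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def humanize(num_bytes):
    """Scale a byte count into the largest binary unit that keeps it below 1024."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


info = parse_meminfo(SAMPLE_MEMINFO)
total_bytes = info["MemTotal"] * 1024  # /proc/meminfo reports kB
print(humanize(total_bytes))
```

On a real Linux host the same function could be fed `open("/proc/meminfo").read()` instead of the sample string.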

Also, you mentioned that this was working until you added more memory. Was memory added to all of the servers, or just this one? Do you know if the check was returning properly prior to adding memory, or was it just close enough that no one noticed?

Edit:
One more piece of information. Can you get uname -a from both t1 and r4?

Posted: **Mon Jan 06, 2020 5:16 pm**

We noticed the wrong readings when we added memory: we had been getting memory-full alerts, and after adding the memory we still got the same alerts. We have deleted and re-added the monitoring, with the same results. See the requested info below.

t1 ~]# cat /proc/meminfo
MemTotal: 6353629608 kB
MemFree: 15518496 kB
Buffers: 958328 kB
Cached: 1307809972 kB
SwapCached: 0 kB
Active: 723788824 kB
Inactive: 602394044 kB
Active(anon): 656998028 kB
Inactive(anon): 558292844 kB
Active(file): 66790796 kB
Inactive(file): 44101200 kB
Unevictable: 44 kB
Mlocked: 44 kB
SwapTotal: 16777212 kB
SwapFree: 16777212 kB
Dirty: 4100 kB
Writeback: 0 kB
AnonPages: 27396864 kB
Mapped: 241085764 kB
Shmem: 1188439264 kB
Slab: 8392836 kB
SReclaimable: 7796916 kB
SUnreclaim: 595920 kB
KernelStack: 81520 kB
PageTables: 132319108 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 777672912 kB
Committed_AS: 1236852840 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 13108092 kB
VmallocChunk: 29515609200 kB
HardwareCorrupted: 0 kB
AnonHugePages: 6215680 kB
HugePages_Total: 2359296
HugePages_Free: 247
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 5120 kB
DirectMap2M: 2017280 kB
DirectMap1G: 6440353792 kB

t1 ~]# vmstat -s
6353629696 total memory
6339171840 used memory
722839040 active memory
602395264 inactive memory
14457532 free memory
958332 buffer memory
1307819392 swap cache
16777212 total swap
0 used swap
16777212 free swap
2896995903 non-nice user cpu ticks
35894 nice user cpu ticks
268831966 system cpu ticks
44076884912 idle cpu ticks
447219093 IO-wait cpu ticks
39864 IRQ cpu ticks
19323367 softirq cpu ticks
0 stolen cpu ticks
618628756279 pages paged in
101513455190 pages paged out
0 pages swapped in
0 pages swapped out
3133129834 interrupts
158006976 CPU context switches
1563405840 boot time
78251142 forks

t1 ~]# uname -a
Linux t1.transplace.com 2.6.32-696.13.2.el6.x86_64 #1 SMP Fri Sep 22 12:32:14 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

r4 ~]# uname -a
Linux r4.transplace.com 2.6.32-696.13.2.el6.x86_64 #1 SMP Fri Sep 22 12:32:14 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

Posted: **Tue Jan 07, 2020 1:14 pm**

Alright, we've been digging through the psutil code quite a bit, and it looks like there are multiple paths psutil could take that would give different results for used percent. I will explain more about what we found down below, but for right now I think the best option to move forward is to find a memory plugin that is suitable to your needs, and use that for this server. This one seems like a good candidate.
https://exchange.nagios.org/directory/A ... ck/details

What we've found in psutil is that there are multiple ways in which it can gather information on memory usage, and subsequently, there are a few ways in which it calculates memory usage. The path it chooses for its calculation can be based on what fields are available in /proc/meminfo, which is expected, but also can be based on things like the CPU architecture of the system, and whether the system is an LXC container. Here is a link to the psutil function in question if you're interested in following the code.
https://github.com/giampaolo/psutil/blo ... ux.py#L373
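To make the point about multiple calculation paths concrete, here is a small Python sketch that applies different definitions of "available" memory to the t1 figures posted earlier in the thread. The formulas are illustrative assumptions, not a copy of psutil's logic, and the function names are invented for this example. The meminfo output posted for t1 has no MemAvailable line, so that branch is skipped for this data.

```python
# Illustrative only: the formulas below are assumptions for comparison,
# not psutil's actual implementation.

# Values in kB, copied from the t1 /proc/meminfo output earlier in the thread.
T1_MEMINFO = {
    "MemTotal": 6353629608,
    "MemFree": 15518496,
    "Buffers": 958328,
    "Cached": 1307809972,
}


def used_percent(total_kb, available_kb):
    """Percentage of memory considered used, given an 'available' figure."""
    return (total_kb - available_kb) / total_kb * 100


def available_candidates(meminfo):
    """Return several plausible 'available' figures, keyed by a descriptive name."""
    free = meminfo["MemFree"]
    candidates = {
        "free_only": free,
        "free_plus_buffers_cached": free + meminfo["Buffers"] + meminfo["Cached"],
    }
    # Newer kernels expose MemAvailable; use it when the field is present.
    if "MemAvailable" in meminfo:
        candidates["kernel_memavailable"] = meminfo["MemAvailable"]
    return candidates


def compare(meminfo):
    """Compute used percent for every candidate definition of 'available'."""
    total = meminfo["MemTotal"]
    return {
        name: used_percent(total, avail)
        for name, avail in available_candidates(meminfo).items()
    }


results = compare(T1_MEMINFO)
for name, percent in results.items():
    print(f"{name}: {percent:.1f}% used")
```

With these figures, counting only MemFree as available gives roughly 99.8% used, while also counting buffers and cache gives roughly 79.2% used, which is the kind of gap that can turn a healthy host into a memory-full alert.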

Posted: **Wed Jan 08, 2020 12:28 pm**

I tried the suggested plugin. It works only for the Nagios server itself and not the client; there is no option to check a remote client,
for example -H for hostname or -C for community string, and so on.

Posted: **Wed Jan 08, 2020 12:55 pm**

The plugin that @mbellerue mentioned should work. What you'll want to do is have the plugin run locally on the monitored host, and then trigger it from the Nagios server using check_ncpa.

https://www.nagios.org/ncpa/help.php#ac ... tive-check (see the Running Plugins with Arguments section)

Posted: **Wed Jan 08, 2020 1:50 pm**

I am running it locally using the guide you provided, but I am getting errors about the plugin not being found.

The first thing I did was download the plugin, add it to my default libexec directory, and make it executable.
I then ran it with the check_ncpa.py plugin as shown in the article you provided. This is the command I ran:

libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q "args=-d G,args=-w 20,args=-c 10 -v"
UNKNOWN: The plugin (check_linux_memory.sh) requested does not exist

When I run it without quoting the arguments, I get a different result:

libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q -d G -w 20 -c 10 -v
An error occurred:
need more than 1 value to unpack

What am I missing?

Posted: **Wed Jan 08, 2020 6:47 pm**

The first step is to execute the plugin manually on the remote machine to ensure it works and to remove NCPA from the equation. This will ensure you have the correct arguments. Can you do this and then paste the command executed and the output returned, please?

Posted: **Mon Jan 13, 2020 12:05 pm**

That's exactly what I did. See the command on the remote host:

libexec]# ./check_linux_memory.sh -d G -w 20 -c 10 -v
MEMORY OK - 61.1226% Free - Total:9.60493G Active:4.46127G Inactive:2.27097G Buffers:0G Cached:5.06926G |Free=61.1226;20;10;0 Active=4677980;0;0;0 Inactive=2381280;0;0;0 Buffers=0;0;0;0 Cached=5315504;0;0;0

This is what I am getting when I use the NCPA check with the plugin:

libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q "args=-d 'G',args=-w '20',args=-c '10'"
UNKNOWN: The plugin (check_linux_memory.sh) requested does not exist

Posted: **Mon Jan 13, 2020 2:05 pm**

I finally got it to work.

I was supposed to copy the script into /usr/local/ncpa/plugins on the client hosts, instead of /usr/local/nagios/libexec, for it to work.

libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q "args=-d 'G',args=-w '20',args=-c '10',args=-v"
MEMORY OK - 21.022% Free - Total:6059.29G Active:680.659G Inactive:585.039G Buffers:0.946941G Cached:1248.25G |Free=21.022;20;10;0 Active=713722448;0;0;0 Inactive=613458368;0;0;0 Buffers=992940;0;0;0 Cached=1308883352;0;0;0
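
For anyone scripting this, the working command above passes each plugin argument as its own `args=` entry, separated by commas, inside a single quoted `-q` value. A minimal Python sketch of building that string follows. The helper names `build_plugin_query` and `build_check_command` are hypothetical, and `TOKEN` is a placeholder for your own NCPA token.

```python
# Hypothetical helpers that reproduce the -q format from the working command
# in this thread; not part of check_ncpa.py itself.


def build_plugin_query(arg_pairs):
    """Join (flag, value) pairs into 'args=...' entries separated by commas.

    A value of None means the flag takes no value (for example -v).
    """
    parts = []
    for flag, value in arg_pairs:
        if value is None:
            parts.append(f"args={flag}")
        else:
            parts.append(f"args={flag} '{value}'")
    return ",".join(parts)


def build_check_command(host, token, plugin, arg_pairs):
    """Return the check_ncpa.py invocation as an argument list (no shell quoting needed)."""
    return [
        "./check_ncpa.py",
        "-H", host,
        "-t", token,
        "-M", f"plugins/{plugin}",
        "-q", build_plugin_query(arg_pairs),
    ]


plugin_args = [("-d", "G"), ("-w", "20"), ("-c", "10"), ("-v", None)]
command = build_check_command("t1", "TOKEN", "check_linux_memory.sh", plugin_args)
print(command)
```

The resulting list can be handed to `subprocess.run(command)` from the libexec directory. The plugin script itself still has to exist in the NCPA plugins directory on the client, as noted above.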

Posted: **Mon Jan 13, 2020 2:38 pm**

There we go! That looks better! Glad you were able to get this working. Are we calling this good? Should I lock the thread?

NCPA reading memory wrong for terabite
