Re: NCPA reading memory wrong for terabite
Posted: Mon Jan 06, 2020 3:28 pm
by mbellerue
Ah, I forgot that we updated t1 to NCPA 2.2.0. That sum matches up with my Cent6 VM running NCPA 2.2.0, so we should be good there.
According to the devs, NCPA just reads the numbers returned by psutil, and translates them from bytes to kilobytes/megabytes/gigabytes/etc. And psutil gets its information from /proc/meminfo. Can you give me the output of cat /proc/meminfo and vmstat -s from t1?
Also, you mentioned that this was working until you added more memory. Was memory added to all of the servers, or just this one? Do you know if the check was returning properly prior to adding memory, or was it just close enough that no one noticed?
Edit:
One more piece of information. Can you get uname -a from both t1 and r4?
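For anyone following along, here is a rough Python sketch of the kind of unit translation described above: take the kB values from /proc/meminfo and convert them to GiB. This is not NCPA's or psutil's actual code. The function names parse_meminfo and kb_to_gib are made up for illustration, and the sketch assumes that "kB" in /proc/meminfo means 1024 bytes.

```python
# Illustrative only: hypothetical helpers, not taken from NCPA or psutil.

SAMPLE = """\
MemTotal: 6353629608 kB
MemFree: 15518496 kB
HugePages_Total: 2359296
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Values that carry a 'kB' suffix are returned in kB; plain counters
    (for example HugePages_Total) are returned unchanged.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kb_to_gib(kb):
    """Convert kB (assumed to be 1024 bytes each) to GiB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("MemTotal: %.2f GiB" % kb_to_gib(info["MemTotal"]))
```

On a real host you would pass in the contents of /proc/meminfo instead of SAMPLE.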
Re: NCPA reading memory wrong for terabite
Posted: Mon Jan 06, 2020 5:16 pm
by rnjie
We noticed the wrong readings when we added memory. We had been getting memory-full alerts, and after adding the memory we still got the same alerts. We have deleted and re-added the monitoring, with the same results. See the requested info below.
t1 ~]# cat /proc/meminfo
MemTotal: 6353629608 kB
MemFree: 15518496 kB
Buffers: 958328 kB
Cached: 1307809972 kB
SwapCached: 0 kB
Active: 723788824 kB
Inactive: 602394044 kB
Active(anon): 656998028 kB
Inactive(anon): 558292844 kB
Active(file): 66790796 kB
Inactive(file): 44101200 kB
Unevictable: 44 kB
Mlocked: 44 kB
SwapTotal: 16777212 kB
SwapFree: 16777212 kB
Dirty: 4100 kB
Writeback: 0 kB
AnonPages: 27396864 kB
Mapped: 241085764 kB
Shmem: 1188439264 kB
Slab: 8392836 kB
SReclaimable: 7796916 kB
SUnreclaim: 595920 kB
KernelStack: 81520 kB
PageTables: 132319108 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 777672912 kB
Committed_AS: 1236852840 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 13108092 kB
VmallocChunk: 29515609200 kB
HardwareCorrupted: 0 kB
AnonHugePages: 6215680 kB
HugePages_Total: 2359296
HugePages_Free: 247
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 5120 kB
DirectMap2M: 2017280 kB
DirectMap1G: 6440353792 kB
t1 ~]# vmstat -s
6353629696 total memory
6339171840 used memory
722839040 active memory
602395264 inactive memory
14457532 free memory
958332 buffer memory
1307819392 swap cache
16777212 total swap
0 used swap
16777212 free swap
2896995903 non-nice user cpu ticks
35894 nice user cpu ticks
268831966 system cpu ticks
44076884912 idle cpu ticks
447219093 IO-wait cpu ticks
39864 IRQ cpu ticks
19323367 softirq cpu ticks
0 stolen cpu ticks
618628756279 pages paged in
101513455190 pages paged out
0 pages swapped in
0 pages swapped out
3133129834 interrupts
158006976 CPU context switches
1563405840 boot time
78251142 forks
t1 ~]# uname -a
Linux t1.transplace.com 2.6.32-696.13.2.el6.x86_64 #1 SMP Fri Sep 22 12:32:14 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
r4 ~]# uname -a
Linux r4.transplace.com 2.6.32-696.13.2.el6.x86_64 #1 SMP Fri Sep 22 12:32:14 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Re: NCPA reading memory wrong for terabite
Posted: Tue Jan 07, 2020 1:14 pm
by mbellerue
Alright, we've been digging through the psutil code quite a bit, and it looks like there are multiple paths psutil could take that would give different results for used percent. I will explain more about what we found below, but for right now I think the best way forward is to find a memory plugin that suits your needs and use that for this server. This one seems like a good candidate.
https://exchange.nagios.org/directory/A ... ck/details
What we've found in psutil is that there are multiple ways in which it can gather information on memory usage, and consequently a few ways in which it calculates memory usage. The path it chooses for its calculation can depend on which fields are available in /proc/meminfo, which is expected, but also on things like the CPU architecture of the system and whether the system is an LXC container. Here is a link to the psutil function in question if you're interested in following the code.
https://github.com/giampaolo/psutil/blo ... ux.py#L373
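To show how much the choice of formula can matter, here is a small Python sketch that computes "used percent" two different ways from the t1 /proc/meminfo values posted earlier in this thread. These two formulas are only examples chosen for illustration. They are not a transcription of what psutil does, and the function names are hypothetical.

```python
# Illustrative only: two possible definitions of "used percent".
# Neither function is copied from psutil.

# Values in kB, taken from the t1 /proc/meminfo output in this thread.
MEM_TOTAL = 6353629608
MEM_FREE = 15518496
BUFFERS = 958328
CACHED = 1307809972


def used_percent_counting_cache(total, free):
    """Treat everything that is not MemFree as used."""
    return 100.0 * (total - free) / total


def used_percent_excluding_cache(total, free, buffers, cached):
    """Treat buffers and page cache as reclaimable, so not used."""
    used = total - free - buffers - cached
    return 100.0 * used / total


if __name__ == "__main__":
    a = used_percent_counting_cache(MEM_TOTAL, MEM_FREE)
    b = used_percent_excluding_cache(MEM_TOTAL, MEM_FREE, BUFFERS, CACHED)
    print("counting cache as used:  %.2f%%" % a)
    print("excluding buffers/cache: %.2f%%" % b)
```

With the same input numbers the two definitions land roughly 20 percentage points apart, which is enough to move a check from OK to CRITICAL depending on the thresholds.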
Re: NCPA reading memory wrong for terabite
Posted: Wed Jan 08, 2020 12:28 pm
by rnjie
I tried the plugin. It works only for the Nagios server itself and not for the client; there is no option to check a remote client,
for example -H for hostname or -C for community string, and so on.
Re: NCPA reading memory wrong for terabite
Posted: Wed Jan 08, 2020 12:55 pm
by tacolover101
The plugin that @mbellerue mentioned should work. What you'll want to do is install the plugin locally on the monitored host, and then trigger it from the Nagios server using check_ncpa.
https://www.nagios.org/ncpa/help.php#ac ... tive-check (see the Running Plugins with Arguments section)
Re: NCPA reading memory wrong for terabite
Posted: Wed Jan 08, 2020 1:50 pm
by rnjie
I am running it locally using the guide you provided, but I am getting errors about the plugin not being found.
The first thing I did was download the plugin, add it to my default libexec directory, and make it executable.
I then ran it with the check_ncpa.py plugin as shown in the article you provided. This is the command I ran:
libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q "args=-d G,args=-w 20,args=-c 10 -v"
UNKNOWN: The plugin (check_linux_memory.sh) requested does not exist
When I run it without quoting the arguments, I get a different result:
libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q -d G -w 20 -c 10 -v
An error occurred:
need more than 1 value to unpack
What am I missing?
Re: NCPA reading memory wrong for terabite
Posted: Wed Jan 08, 2020 6:47 pm
by Box293
The first step is to execute the plugin manually on the remote machine to ensure it works and to remove NCPA from the equation. This will ensure you have the correct arguments. Can you do this and then paste the command executed and the output returned, please?
Re: NCPA reading memory wrong for terabite
Posted: Mon Jan 13, 2020 12:05 pm
by rnjie
That's exactly what I did. See the command on the remote host:
libexec]# ./check_linux_memory.sh -d G -w 20 -c 10 -v
MEMORY OK - 61.1226% Free - Total:9.60493G Active:4.46127G Inactive:2.27097G Buffers:0G Cached:5.06926G |Free=61.1226;20;10;0 Active=4677980;0;0;0 Inactive=2381280;0;0;0 Buffers=0;0;0;0 Cached=5315504;0;0;0
This is what I am getting when I use the NCPA check with the plugin:
libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q "args=-d 'G',args=-w '20',args=-c '10'"
UNKNOWN: The plugin (check_linux_memory.sh) requested does not exist
Re: NCPA reading memory wrong for terabite
Posted: Mon Jan 13, 2020 2:05 pm
by rnjie
I finally got it to work.
I was supposed to copy the script into /usr/local/ncpa/plugins on the client hosts, instead of /usr/local/nagios/libexec, for it to work.
libexec]# ./check_ncpa.py -H t1 -t plut0 -M 'plugins/check_linux_memory.sh' -q "args=-d 'G',args=-w '20',args=-c '10',args=-v"
MEMORY OK - 21.022% Free - Total:6059.29G Active:680.659G Inactive:585.039G Buffers:0.946941G Cached:1248.25G |Free=21.022;20;10;0 Active=713722448;0;0;0 Inactive=613458368;0;0;0 Buffers=992940;0;0;0 Cached=1308883352;0;0;0
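For reference, here is a minimal Python sketch that builds the same -q string as the working command above from a list of plugin arguments. The "args=...,args=..." layout is copied from that command. The helper names build_ncpa_args and build_check_command are hypothetical, and the token value is a placeholder. The sketch only assembles the command line; it does not run it.

```python
# Illustrative only: hypothetical helpers that assemble a check_ncpa.py
# command line in the same shape as the working command in this thread.


def build_ncpa_args(plugin_args):
    """Join plugin arguments as 'args=<arg>' items separated by commas."""
    return ",".join("args=" + arg for arg in plugin_args)


def build_check_command(host, token, plugin, plugin_args):
    """Return the argument list for check_ncpa.py (not executed here)."""
    return [
        "./check_ncpa.py",
        "-H", host,
        "-t", token,
        "-M", "plugins/" + plugin,
        "-q", build_ncpa_args(plugin_args),
    ]


if __name__ == "__main__":
    cmd = build_check_command(
        "t1",
        "YOUR_TOKEN",  # placeholder, use your own NCPA token
        "check_linux_memory.sh",
        ["-d 'G'", "-w '20'", "-c '10'", "-v"],
    )
    print(" ".join(cmd))
```

Keeping each plugin argument as its own list item makes it harder to accidentally merge two flags into one args= entry.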
Re: NCPA reading memory wrong for terabite
Posted: Mon Jan 13, 2020 2:38 pm
by mbellerue
There we go! That looks better! Glad you were able to get this working. Are we calling this good? Should I lock the thread?