Physical memory errors reported incorrectly

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
Locked
grberk
Posts: 5
Joined: Tue Jun 03, 2014 9:39 am

Physical memory errors reported incorrectly

Post by grberk »

I'm not a well-seasoned Nagios/Linux expert, so I'm at a point where I don't know what to look at. Maybe someone here can help!

I installed a new Nagios server last week, and it's running on Core 4.0.6 with Ubuntu Linux server 12.04. I was having little luck getting anything to work with the online documentation, because a lot of it was out of date, had different syntax, or just didn't work the way I was expecting it to. I looked for some install guides online and found one that was written just a few weeks prior discussing the same exact versions I was trying to get working, so I started working on this:
http://wellsie.net/p/248/

Everything mostly works, but I noticed that the memory alerts were not coming through (I have been testing Site24x7 with our ServiceDesk system for test monitoring). I was getting lots of alerts from servers that were high on physical memory usage, but nothing from Nagios. I noticed that the RAM was being collectively monitored with physical and virtual lumped together. After figuring out that I needed to change from CheckNT to NRPE for memory monitoring, I got it all setup. However, all of the servers I have currently setup in Nagios all show "Critical" in memory, but the numbers say otherwise:
http://imgur.com/Mulp0p4

If I go to the command-line and run ./check_nrpe -H 192.168.0.72 -p 5666 -c CheckMEM -a MaxWarn=80% MaxCrit=90% ShowAll type=physical, this is what I get:

Code: Select all

root@VERSA-NETMON02:/usr/local/nagios/libexec# ./check_nrpe -H 192.168.0.72 -p 5666 -c CheckMEM -a MaxWarn=80% MaxCrit=90% ShowAll type=physical
OK: physical memory: 1.27G|'physical memory %'=31%;80;90 'physical memory'=1.26G;3;3;0;4
So, it clearly shows memory usage is fine. That same server shows "critical" in the web page. I have the typical settings for warning (80%) and critical (90%). Here are my config settings that I have changed:

In my commands.cfg

Code: Select all

# 'check_mem' command definition
define command{
        command_name    check_mem
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$ MaxCrit=$ARG2$ ShowAll type=physical
        }
Service definition in one of my servers' config file:

Code: Select all

define service{
        use                     generic-service
        host_name               versa-dc01
        service_description     Memory Usage
        check_command           check_mem!80!90
        }
Have I missed something obvious? I can't understand why the console command shows correctly, but the web page is showing critical.
User avatar
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Physical memory errors reported incorrectly

Post by lmiltchev »

Can you show us the error that you are seeing in the GUI? Also, post the NSC.ini (nsclient.ini) file (hide sensitive info). Do you see any errors in the nsclient.log?
Be sure to check out our Knowledgebase for helpful articles and solutions!
grberk
Posts: 5
Joined: Tue Jun 03, 2014 9:39 am

Re: Physical memory errors reported incorrectly

Post by grberk »

The link to the GUI critical error is in my original post - http://imgur.com/Mulp0p4

There are no errors in the nsclient.log.

Here is the ncs.ini of the server I gave config file details of:

Code: Select all

# If you want to fill this file with all avalible options run the following command:
#   nscp settings --generate --add-defaults --load-all
# If you want to activate a module and bring in all its options use:
#   nscp settings --activate-module <MODULE NAME> --add-defaults
# For details run: nscp settings --help


; Undocumented section
[/modules]

; CheckDisk - CheckDisk can check various file and disk related things. The current version has commands to check Size of hard drives and directories.
CheckDisk = 1

; Event log Checker. - Check for errors and warnings in the event log. This is only supported through NRPE so if you plan to use only NSClient this wont help you at all.
CheckEventLog = 1

; Check External Scripts - A simple wrapper to run external scripts and batch files.
CheckExternalScripts = 1

; Helper function - Various helper function to extend other checks. This is also only supported through NRPE.
CheckHelpers = 1

; Check NSCP - Checkes the state of the agent
CheckNSCP = 1

; CheckSystem - Various system related checks, such as CPU load, process state, service state memory usage and PDH counters.
CheckSystem = 1

; CheckWMI - CheckWMI can check various file and disk related things. The current version has commands to check Size of hard drives and directories.
CheckWMI = 1

; NRPE server - A simple server that listens for incoming NRPE connection and handles them.
NRPEServer = 1

; NSClient server - A simple server that listens for incoming NSClient (check_nt) connection and handles them. Although NRPE is the preferred method NSClient is fully supported and can be used for simplicity or for compatibility.
NSClientServer = 1


; Undocumented section
[/settings/default]

; ALLOWED HOSTS - A comaseparated list of allowed hosts. You can use netmasks (/ syntax) or * to create ranges.
allowed hosts = <IP of my Nagios server>


; A list of aliases available. An alias is an internal command that has been "wrapped" (to add arguments). Be careful so you don't create loops (ie check_loop=check_a, check_a=check_loop)
[/settings/external scripts/alias]

; alias_cpu - Alias for alias_cpu. To configure this item add a section called: /settings/external scripts/alias/alias_cpu
alias_cpu = checkCPU warn=80 crit=90 time=5m time=1m time=30s

; alias_cpu_ex - Alias for alias_cpu_ex. To configure this item add a section called: /settings/external scripts/alias/alias_cpu_ex
alias_cpu_ex = checkCPU warn=$ARG1$ crit=$ARG2$ time=5m time=1m time=30s

; alias_disk - Alias for alias_disk. To configure this item add a section called: /settings/external scripts/alias/alias_disk
alias_disk = CheckDriveSize MinWarn=10% MinCrit=5% CheckAll FilterType=FIXED

; alias_disk_loose - Alias for alias_disk_loose. To configure this item add a section called: /settings/external scripts/alias/alias_disk_loose
alias_disk_loose = CheckDriveSize MinWarn=10% MinCrit=5% CheckAll FilterType=FIXED ignore-unreadable

; alias_event_log - Alias for alias_event_log. To configure this item add a section called: /settings/external scripts/alias/alias_event_log
alias_event_log = CheckEventLog file=application file=system MaxWarn=1 MaxCrit=1 "filter=generated gt -2d AND severity NOT IN ('success', 'informational') AND source != 'SideBySide'" truncate=800 unique descriptions "syntax=%severity%: %source%: %message% (%count%)"

; alias_file_age - Alias for alias_file_age. To configure this item add a section called: /settings/external scripts/alias/alias_file_age
alias_file_age = checkFile2 filter=out "file=$ARG1$" filter-written=>1d MaxWarn=1 MaxCrit=1 "syntax=%filename% %write%"

; alias_file_size - Alias for alias_file_size. To configure this item add a section called: /settings/external scripts/alias/alias_file_size
alias_file_size = CheckFiles "filter=size > $ARG2$" "path=$ARG1$" MaxWarn=1 MaxCrit=1 "syntax=%filename% %size%" max-dir-depth=10

; alias_mem - Alias for alias_mem. To configure this item add a section called: /settings/external scripts/alias/alias_mem
alias_mem = checkMem MaxWarn=80% MaxCrit=90% ShowAll=long type=physical type=virtual type=paged type=page

; alias_process - Alias for alias_process. To configure this item add a section called: /settings/external scripts/alias/alias_process
alias_process = checkProcState "$ARG1$=started"

; alias_process_count - Alias for alias_process_count. To configure this item add a section called: /settings/external scripts/alias/alias_process_count
alias_process_count = checkProcState MaxWarnCount=$ARG2$ MaxCritCount=$ARG3$ "$ARG1$=started"

; alias_process_hung - Alias for alias_process_hung. To configure this item add a section called: /settings/external scripts/alias/alias_process_hung
alias_process_hung = checkProcState MaxWarnCount=1 MaxCritCount=1 "$ARG1$=hung"

; alias_process_stopped - Alias for alias_process_stopped. To configure this item add a section called: /settings/external scripts/alias/alias_process_stopped
alias_process_stopped = checkProcState "$ARG1$=stopped"

; alias_sched_all - Alias for alias_sched_all. To configure this item add a section called: /settings/external scripts/alias/alias_sched_all
alias_sched_all = CheckTaskSched "filter=exit_code ne 0" "syntax=%title%: %exit_code%" warn=>0

; alias_sched_long - Alias for alias_sched_long. To configure this item add a section called: /settings/external scripts/alias/alias_sched_long
alias_sched_long = CheckTaskSched "filter=status = 'running' AND most_recent_run_time < -$ARG1$" "syntax=%title% (%most_recent_run_time%)" warn=>0

; alias_sched_task - Alias for alias_sched_task. To configure this item add a section called: /settings/external scripts/alias/alias_sched_task
alias_sched_task = CheckTaskSched "filter=title eq '$ARG1$' AND exit_code ne 0" "syntax=%title% (%most_recent_run_time%)" warn=>0

; alias_service - Alias for alias_service. To configure this item add a section called: /settings/external scripts/alias/alias_service
alias_service = checkServiceState CheckAll

; alias_service_ex - Alias for alias_service_ex. To configure this item add a section called: /settings/external scripts/alias/alias_service_ex
alias_service_ex = checkServiceState CheckAll "exclude=Net Driver HPZ12" "exclude=Pml Driver HPZ12" exclude=stisvc

; alias_up - Alias for alias_up. To configure this item add a section called: /settings/external scripts/alias/alias_up
alias_up = checkUpTime MinWarn=1d MinWarn=1h

; alias_updates - Alias for alias_updates. To configure this item add a section called: /settings/external scripts/alias/alias_updates
alias_updates = check_updates -warning 0 -critical 0

; alias_volumes - Alias for alias_volumes. To configure this item add a section called: /settings/external scripts/alias/alias_volumes
alias_volumes = CheckDriveSize MinWarn=10% MinCrit=5% CheckAll=volumes FilterType=FIXED

; alias_volumes_loose - Alias for alias_volumes_loose. To configure this item add a section called: /settings/external scripts/alias/alias_volumes_loose
alias_volumes_loose = CheckDriveSize MinWarn=10% MinCrit=5% CheckAll=volumes FilterType=FIXED ignore-unreadable 

; default - Alias for default. To configure this item add a section called: /settings/external scripts/alias/default
default = 

[/settings/NRPE/server]

; COMMAND ARGUMENT PROCESSING - This option determines whether or not the we will allow clients to specify arguments to commands that are executed.
allow arguments = true

; COMMAND ALLOW NASTY META CHARS - This option determines whether or not the we will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow nasty characters = true

[/settings/external scripts]

; COMMAND ARGUMENT PROCESSING - This option determines whether or not the we will allow clients to specify arguments to commands that are executed.
allow arguments = true

; COMMAND ALLOW NASTY META CHARS - This option determines whether or not the we will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow nasty characters = true
I just now saw the line in here that says
alias_mem = checkMem MaxWarn=80% MaxCrit=90% ShowAll=long type=physical type=virtual type=paged type=page..... is there something I need to do with that? It looks pretty much the same as what the config on my server looks like.
User avatar
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Physical memory errors reported incorrectly

Post by lmiltchev »

What happens if you change this line:

Code: Select all

command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$ MaxCrit=$ARG2$ ShowAll type=physical
to this?

Code: Select all

command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckMEM -a MaxWarn=$ARG1$% MaxCrit=$ARG2$% ShowAll type=physical
Be sure to check out our Knowledgebase for helpful articles and solutions!
grberk
Posts: 5
Joined: Tue Jun 03, 2014 9:39 am

Re: Physical memory errors reported incorrectly

Post by grberk »

Holy crap. That was it! It's reporting fine now. I can't believe out of all the documentation and help files, either that wasn't shown, or I completely missed it.

Thank you! Thank you! Thank you!
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Physical memory errors reported incorrectly

Post by tmcdonald »

Former Nagios employee
Locked