Nagios XI monitoring engine down

Locked
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Nagios XI monitoring engine down

Post by rajasegar »

Nagios XI 2012R2.9
RHEL 6.5 x64
Manual Install
Firefox 23

1) The Nagios monitoring engine went down unexpectedly over the weekend.

This is the last entry in the NPCD log. I suppose the Nagios monitoring engine died at the same time.
[06-08-2014 00:13:06] NPCD: WARN: MAX load reached: load 12.310000/10.000000 at i=1
Please advise how to investigate this issue.

Restarting the nagios services and npcd did not solve all the problems: I could not run any immediate checks, and performance data was not updating.
I had to reboot the machine to get back to normal.

2) How can I monitor the parameters in this dashboard via the command line?
09-06-2014 09-08-08 AM.png
5 x Nagios 5.6.9 Enterprise Edition
RHEL 6 & 7
rrdcached & ramdisk optimisation
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Nagios XI monitoring engine down

Post by slansing »

How did you shut down the machine? If you did it incorrectly, you may need to follow this now as well:

http://assets.nagios.com/downloads/nagi ... tabase.pdf

As for performance data not being graphed at the point in time you are referencing, that is because you hit NPCD's load limit. The limit is in place so that NPCD cannot eat all of your resources while crunching performance data.
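
To see how often that limit was hit, the warnings can be pulled out of the NPCD log. This is only a minimal sketch, assuming every warning follows the format of the entry quoted above; the function name `npcd_load_warnings` and the log path in the usage comment are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Print "date time load threshold" for each "MAX load reached" warning
# read from standard input. Assumes the line format quoted in this thread:
# [06-08-2014 00:13:06] NPCD: WARN: MAX load reached: load 12.310000/10.000000 at i=1
npcd_load_warnings() {
    sed -n 's|^\[\([^]]*\)] NPCD: WARN: MAX load reached: load \([0-9.]*\)/\([0-9.]*\) .*$|\1 \2 \3|p'
}

# Example (the log path is an assumption; adjust it to your install):
#   npcd_load_warnings < /usr/local/nagios/var/npcd.log
```

Lines that do not match the warning format are skipped, so the function can be fed the whole log.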
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios XI monitoring engine down

Post by rajasegar »

I am sure the Nagios XI installation scripts added proper shutdown scripts for shutdowns/reboots, right?
Anyway, my XI DB is offloaded to another VM, and it is 95% - 99% idle most of the time.
There is not a single error message in /var/log/mysqld.log.

1) I still want to find out why my monitoring engine went down.
2) How can I monitor the monitoring engine statistics?

Thanks.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios XI monitoring engine down

Post by scottwilkerson »

Nagios XI comes with a wizard that can check XI systems; you could run it for localhost to get the command.

It uses the check_nagiosxiserver.php plugin, which uses the username and backend ticket found in the Backend API Component:

Admin -> Manage Components -> Backend API Component -> Settings
check_nagiosxiserver - Copyright (c) 2010 Nagios Enterprises, LLC.
Portions Copyright(c) others (see source code).

Usage:
check_nagiosxiserver.php <option>

Options:

--address=<addres> The address of the Nagios XI server

--url=<url> The URL used to access the Nagios XI web interface

--username=<username> The username used for accessing the server

--ticket=<ticket> The ticket used for accessing the server

--timeout=<seconds> Seconds before plugin times out (default=<? echo $timeout;?>)

--debug=<0/1> Enables/disables debugging output

--mode=<mode> Operating mode of the plugin. Valid modes include:
    daemons   Checks the status of the core Nagios XI daemons to ensure
              they're running properly.
    jobs      Checks the status of the core Nagios XI jobs to ensure
              they're running properly.
    iowait    Checks the I/O wait CPU statistics.
    load      Checks the 1,5,15 minutes load statistics.

--warn=<warning> The warning values used for some modes (iowait, load)

--crit=<critical> The critical values used for some modes (iowait, load)


This plugin checks the status of a remote Nagios XI server.
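
As a rough illustration of how those options combine, the sketch below only prints the command lines for the `daemons` and `jobs` modes, which per the usage text do not use --warn/--crit. The plugin path, the URL, the username, the ticket placeholder, the helper name `xi_check_cmd` and the choice to invoke the plugin through `php` are all assumptions; only the option names come from the usage text.

```shell
#!/bin/sh
# Hypothetical helper: print the command line for one plugin mode.
# Every value below is a placeholder (assumption); override them through
# the environment. Only the option names come from the plugin's usage text.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_nagiosxiserver.php}"
XI_ADDRESS="${XI_ADDRESS:-localhost}"
XI_URL="${XI_URL:-http://localhost/nagiosxi/}"
XI_USER="${XI_USER:-nagiosadmin}"
XI_TICKET="${XI_TICKET:-REPLACE_WITH_BACKEND_TICKET}"

xi_check_cmd() {
    # $1 = mode: daemons, jobs, iowait or load
    printf 'php %s --address=%s --url=%s --username=%s --ticket=%s --mode=%s\n' \
        "$PLUGIN" "$XI_ADDRESS" "$XI_URL" "$XI_USER" "$XI_TICKET" "$1"
}

# Print the commands for the two modes that take no thresholds.
for mode in daemons jobs; do
    xi_check_cmd "$mode"
done
```

Printing the command instead of running it makes it easy to compare against whatever the wizard generates before using it for real.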
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios XI monitoring engine down

Post by rajasegar »

Thanks. I will check this out. Do you have anything similar for Nagios Core?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios XI monitoring engine down

Post by scottwilkerson »

Maybe check_nagios will accomplish all you need:

# /usr/local/nagios/libexec/check_nagios -h
check_nagios v2.0.2 (nagios-plugins 2.0.2)
Copyright (c) 1999-2014 Nagios Plugin Development Team
        <[email protected]>

This plugin checks the status of the Nagios process on the local machine
The plugin will check to make sure the Nagios status log is no older than
the number of minutes specified by the expires option.
It also checks the process table for a process matching the command argument.


Usage:
check_nagios -F <status log file> -t <timeout_seconds> -e <expire_minutes> -C <process_string>

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 --extra-opts=[section][@file]
    Read options from an ini file. See
    https://www.nagios-plugins.org/doc/extra-opts.html
    for usage and examples.
 -F, --filename=FILE
    Name of the log file to check
 -e, --expires=INTEGER
    Minutes aging after which logfile is considered stale
 -C, --command=STRING
    Substring to search for in process arguments
 -t, --timeout=INTEGER
    Timeout for the plugin in seconds
 -v, --verbose
    Show details for command-line debugging (Nagios may truncate output)

Examples:
 check_nagios -t 20 -e 5 -F /usr/local/nagios/var/status.log -C /usr/local/nagios/bin/nagios

Send email to [email protected] if you have questions regarding use
of this software. To submit patches or suggest improvements, send email to
[email protected]
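
A small wrapper along these lines could run the example from the help text and report the result by name. It is only a sketch: the paths are copied from the help output above and may differ on your system (the status file location in particular), and `plugin_state` is a hypothetical helper name. The exit code mapping is the standard Nagios plugin convention.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name
# (standard plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN;
# anything else is treated as UNKNOWN here).
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Paths copied from the help output above; adjust them to your install.
CHECK=/usr/local/nagios/libexec/check_nagios
if [ -x "$CHECK" ]; then
    "$CHECK" -t 20 -e 5 -F /usr/local/nagios/var/status.log -C /usr/local/nagios/bin/nagios
    rc=$?
    echo "state: $(plugin_state "$rc")"
else
    echo "check_nagios not found at $CHECK"
fi
```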
rajasegar
Posts: 1018
Joined: Sun Mar 30, 2014 10:49 pm

Re: Nagios XI monitoring engine down

Post by rajasegar »

Yes, I found everything I need. Thanks.
Please close this case.