Difference between revisions of "Nagios XI:FAQs"

From Nagios Support Wiki

Revision as of 12:29, 6 September 2012

Answers to Frequently Asked Questions (FAQs) regarding Nagios XI can be found here.


FAQs

What Are FAQs? Frequently Asked Questions, or "FAQs", are answers to questions that are frequently asked in some context.


Common Problems - Try These Solutions First

Follow these steps if you are encountering problems with Nagios XI. These actions solve many commonly asked questions.

  • Clear your browser's cache to get the newest XI javascript code.
    Instructions on how to do it.
  • How To Reset Security Credentials (if performance graphs aren't displayed)
    Select the Reset Security Credentials option in the Admin section and click Update.
  • How To Reset File Permissions (if configuration changes are not taking effect)
    Instructions how.
  • Debugging Configuration Change Problems (if configuration changes are not taking effect)
    Write configuration file tool.

Hardware Requirements

Check out our general guidelines on the hardware requirements needed to run Nagios XI:

Nagios XI - Hardware Requirements

Supported Distributions

Nagios XI is currently supported with the following Linux distributions for both 32 and 64 bit installations:

  • CentOS 5/6
  • RHEL 5/6

Capabilities

Is Nagios XI capable of Distributed Monitoring?

Yes it is! There are multiple options for Distributed Monitoring with Nagios.

Nagios Fusion *New*

Nagios Core (the underlying monitoring engine) can be configured for distributed monitoring. For more information, read the Nagios Core documentation on distributed monitoring.

Using DNX With Nagios

Is it possible to use SMS alerts for a custom SMS gateway?

Yes! Nagios XI sends SMS alerts via email. Although we don't currently have a solution that lets users define custom SMS gateways, you can work around this by defining a contact whose email address is the gateway address that delivers the SMS message. Example email addresses:

 <phonenumber>@smsgateway.domain
 1235551234@messaging.sprintpcs.com (send SMS via sprint)
 1235551234@tmomail.net (send SMS via t-mobile)
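
As a small illustration of the address format above, the following sketch builds a gateway address from a phone number. The function name sms_address is hypothetical and not part of Nagios XI, and the gateway domain is whatever your carrier publishes.

```shell
# Hypothetical helper (not part of Nagios XI): build an email-to-SMS address.
# $1 = phone number (punctuation and spaces are stripped)
# $2 = the carrier's SMS gateway domain
sms_address() {
    number=$(printf '%s' "$1" | tr -cd '0-9')
    printf '%s@%s\n' "$number" "$2"
}

# Example: sms_address '123-555-1234' tmomail.net
```

The resulting address can then be entered as the email address of the contact that should receive SMS alerts.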

System Configuration Problems

Resetting The nagiosadmin Password

To reset the nagiosadmin password, run the following from the command line:

 /usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=<newpassword>

Note: If you would like to use special characters in your password, escape them with "\". For example, to set your new password to "$newpassword#", run:

 /usr/local/nagiosxi/scripts/reset_nagiosadmin_password.php --password=\$newpassword\#
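
To see what the shell actually hands to the script after escaping, a throwaway function can echo its argument back. This is only a sketch: show_arg is a hypothetical helper, not part of Nagios XI. It also shows that single quotes are an alternative to backslash escaping.

```shell
# Hypothetical helper: print the first argument exactly as the shell delivers it.
show_arg() { printf '%s\n' "$1"; }

show_arg --password=\$newpassword\#    # backslash-escaped
show_arg '--password=$newpassword#'    # single-quoted, no backslashes needed
```

Both calls print the same string, so either form should deliver the same password to the reset script.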
Problems Using Nagios XI With Proxies

We do not officially support Nagios XI when you install and use proxy software that restricts traffic to or from the Nagios XI server. There are several reasons for this. First, Nagios XI requires external access for package installation and updates. Package installation and updates may not work when proxies are used. Additionally, the Nagios XI code makes several internal HTTP calls to the local Nagios XI server to import configuration data, apply configuration changes, process AJAX requests, etc. These functions may not work properly when you deploy a proxy, which would result in a non-functional Nagios XI installation.

If you must install XI behind a proxy, two things need to be configured: the yum and wget configurations. Configure both before starting the installation process.

In /etc/yum.conf :

 proxy=http://someproxyserver:port/ # Shouldn't need to be quoted, remember the trailing slash
 proxy_username=myname  # The username you authenticate to your proxy with, if applicable
 proxy_password=mypass  # The password you provide to your proxy, if applicable

In /etc/wgetrc :

 http_proxy=http://myname:mypass@someproxyserver:port/ # All in one string this time
 no_proxy=localhost,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16 # Hosts to exclude from proxying

Quoting is not needed (or helpful) in any of these, but if you have special characters in passwords (especially : or @) and are having problems you probably need to escape them with backslashes.

Here is a proxy install solution reported by forum user TSCAdmin:

1. Before running any installation script install php-pear package manually

2. Set proxy for PHP Pear

 pear config-set http_proxy 'http://example.com:8080'

3. Run Nagios installation scripts sequentially

4. Unset system proxy before running E-importnagiosql script


Update Check Behind a Proxy

Update checks are known to fail for systems behind a proxy. We created a proxy component that should allow the update check to work behind most proxies. Install this component from the Admin->Manage Components page and then access the Admin->Proxy Configuration page to configure the proxy settings. [Proxy Component]

Installation and Upgrade Problems

CentOS 6 Installation Problems

Between the releases of Nagios XI 2011R1.7 and 2011R1.8, several changes were made to the CentOS 6 repo that created package conflicts, preventing the Nagios XI installation scripts from completing successfully. This usually becomes apparent when the "fullinstall" script fails with one of the following two messages:


 ERROR: Prerequisite program 'mysql' not found!
 which: no mysqladmin in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:   /usr/sbin:/sbin:/home/admin/bin)
 ERROR: Prerequisite program 'mysqladmin' not found!
 7 prerequisite(s) missing - exiting.

OR

 ls: cannot access /usr/local/nagiosxi/nom/checkpoints/nagioscore/*.gz: No such file or directory
 NO NOM SNAPSHOT FOUND!
 ERROR: NagiosQL import appears to have failed - exiting.  (Reason: Import files are still present in /usr/local/nagios/etc/import)

This issue can be resolved with the following solution if you're attempting to install with the 1.7 tarball. This problem will be resolved in the 2011R1.8 release of Nagios XI.

Before attempting any more installations, run:

 yum install centos-release-cr

Then remove any previous /tmp/nagiosxi directory that is in place, and unpack a fresh tarball:

 cd /tmp
 rm -rf nagiosxi
 tar zxf xi-2011r1.7.tar.gz
 cd nagiosxi
 ./fullinstall

If the installer still fails, contact XI support and attach the install.log file that's generated by the fullinstall script.


SourceGuardian Errors

After upgrading to 2009R1.2C, some users started getting an error about SourceGuardian. Add this line to your /etc/php.ini file:

 extension=ixed.5.1.lin

Once you make that change, restart Apache:

 service httpd restart
Resolving "DB Connect Error [nagiosxi]: Database connection failed"

The problem we identified with GNOME is that the PATH used to find the "service" command gets changed under GNOME. The PATH needs to be set correctly so that the installation scripts starting with 3-dbservers will run correctly. You can test whether the path is set correctly by trying the following commands:

service httpd restart
service postgresql restart

The important thing is that your PATH includes the "sbin" directories. Normally it would look like this, although this isn't the only "correct" answer possible:

/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
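
A quick way to check the current value is echo "$PATH". The sketch below automates the check for the two standard sbin directories. The function name path_has_sbin is hypothetical, and treating /sbin and /usr/sbin as the required entries is an assumption based on the example PATH above.

```shell
# Hypothetical helper: succeed only if the PATH string in $1 contains
# both /sbin and /usr/sbin as complete entries.
path_has_sbin() {
    case ":$1:" in
        *:/sbin:*) ;;
        *) return 1 ;;
    esac
    case ":$1:" in
        *:/usr/sbin:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: path_has_sbin "$PATH" || echo "PATH is missing /sbin or /usr/sbin"
```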
Resolving "NSP: Sorry Dave, I can't let you do that" Errors

Session protection was added in 2009R1.2C to prevent CSRF attacks. The code that implements it caused some users to see this error. The problem was due to the user's browser caching older versions of the XI javascript code. To fix it, clear your browser's cache. This is typically done (in Firefox) by holding down the Shift key and clicking Reload. See other well-documented procedures for clearing the browser cache.

The other possible cause of this is that the XI server's time is out of sync with the web browser. Try the following:

 yum install ntp
 ntpdate time.nist.gov


If that still doesn't fix the error, then you may have to specify your timezone in your /etc/php.ini file. Newer releases of PHP require this setting for your server to reflect the correct system time and timezone. To change this setting, edit the /etc/php.ini file with the following line:

 date.timezone = Etc/GMT-13

Change the timezone to match your location. The available zones are listed here: PHP Timezones

After changing the setting, restart your Apache server:

 service httpd restart
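
Before editing, it can help to confirm what php.ini currently sets. The following is a minimal sketch: php_timezone is a hypothetical helper, and it assumes the value is written unquoted on a single line, as in the example above.

```shell
# Hypothetical helper: print the value of the last uncommented
# date.timezone line in the ini file given as $1.
php_timezone() {
    sed -n 's/^[[:space:]]*date\.timezone[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Example: php_timezone /etc/php.ini
```

An empty result means no uncommented date.timezone line was found in that file.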
"HTTP 500 Error"/"PHP Parse error - Unexpected $end"

For those doing manual installations: some of the tools embedded in Nagios XI use the PHP short tags feature, which is not necessarily enabled on all web servers by default. To fix this issue, locate your php.ini file (/etc/php.ini on CentOS installations) and verify that "short_open_tag" is set to "On". We intend to use full tags in future versions, but some components and addons may still use short tags, so we recommend leaving this setting "On".

"ERROR: PostgresQL not running - exiting."

This error occurs only rarely, during a VM setup of Nagios XI. You may try restarting the server, but in some cases you will have to start the Nagios XI install from the beginning.

The following is an example of what it may look like:

 cp: cannot create regular file `/usr/local/nagiosxi/scripts': Read-only file system
 cp: cannot create regular file `/usr/local/nagiosxi/scripts': Read-only file system
 chown: cannot access `/usr/local/nagiosxi/scripts/reset_config_perms': No such file or directory
 chown: cannot access `/usr/local/nagiosxi/scripts/reset_config_perms.sh': No such file or directory
 chmod: cannot access `/usr/local/nagiosxi/scripts/reset_config_perms': No such file or directory
 chmod: cannot access `/usr/local/nagiosxi/scripts/reset_config_perms.sh': No such file or directory
 /tmp/nagiosxi
 Checking PostgresQL status...
 ERROR: PostgresQL not running - exiting.
 ERROR: Nagios XI database was not setup properly - exiting.
"ERROR: Please add the 'Optional' channel to your Red Hat systems subscriptions."

You need to add the Optional software channel so that Nagios XI can install the necessary prerequisites. To do so, first sign in to your Red Hat Network account at http://rhn.redhat.com/. Then click on the link corresponding to your system. Near the bottom-left corner of the page, click Alter Channel Subscriptions. Check the box labeled RHEL Server Optional and click Change Subscriptions. That's it! You should be able to run the installer again and complete your installation.

"Installation errors on customized corporate builds of CentOS or RHEL"

We have seen that when a company requires the use of its "standard build" of either OS, Nagios XI will not install successfully if the umask on the machine has been modified.

"Upgrade errors - root.crontab.orig: cannot overwrite existing file"

We have seen problems when upgrading if there are leftover files from previous upgrades.

This problem can be eliminated by running the following command:

 cat /dev/null > /tmp/nagiosxi/uninstall-crontab-root

After this you can proceed to run the upgrade script again.

Configuration Problems

Apply Configuration Fails: General Troubleshooting

If you receive an error while attempting to Apply Configuration stating that the configuration verification has failed, there is a syntax error or configuration conflict in the configuration that's been defined. You can isolate this issue by accessing the Core Config Manager->Configuration Snapshots page. You should see the most recent snapshot highlighted in red. View the text file from the snapshot to see which config file contained the error. You can then find that file in the associated tar.gz file and search for the problem based on the error message. The snapshot represents the information that is CURRENTLY in the CCM database and that Nagios attempted to save. You'll need to correct the issue through the Core Config Manager, then attempt to Apply Configuration again.

The Write Config Tool in the CCM is a manual tool for writing the DB information to the configuration files (it manually Applies Configuration). It's important to know that Nagios cannot start or restart with a bad configuration. The config verification must pass in order for Nagios to be able to restart successfully with the new configuration.

Configuration Applies, but still get "Configuration File Is Out Of Date" Error

If your configuration is applying successfully and the changes are visible in the XI interface, but you're still seeing an error message in the CCM that says "Configuration File Is Out Of Date", then you may have to specify your timezone in your /etc/php.ini file. Newer releases of PHP require this setting for your server to reflect the correct system time and timezone. To change this setting, edit the /etc/php.ini file with the following line:

 date.timezone = Etc/GMT-13

Change the timezone to match your location. The available zones are listed here: PHP Timezones

After changing the setting, restart your Apache server:

 service httpd restart


Apply Configuration Fails, No Configuration Problems

As of 2011 R1.7, extra sanity checks were added to the Apply Configuration functionality of Nagios XI to prevent false positives and also to prevent that page from stalling out endlessly. An example error that can show up is: "Backend login to the Core Config Manager failed"

There are a few different reasons an error like this can show up. The most common one is the use of a proxy that prevents "wget" from being able to resolve to "localhost" correctly. However, if you receive an error message when attempting to Apply Configuration other than "Configuration Error...," run the following commands and send the output file to the Nagios XI support team.

 cd /usr/local/nagiosxi/scripts
 ./reconfigure_nagios.sh &> reconfig.txt

Then also run the following command to begin capturing log output:

 tail -f /usr/local/nagiosxi/var/cmdsubsys.log &> cmd.txt

And attempt to Apply Configuration from the web interface. After the browser has returned some output to the screen, press Ctrl+C to stop the log tail, and send XI support the cmd.txt file and the reconfig.txt that was generated by the above instructions.

Apply Configuration Page Stalls Out, Never Completes

If you attempt to Apply Configuration and you're seeing the following output:

 * Configuration submitted for processing...
 * Waiting for configuration verification.................. 

and the configuration never applies, the page may be timing out. If you've recently updated XI, try restarting the server first. If you're currently running Nagios XI 2011R1.3, there is a known bug that can cause this issue. You'll need to upgrade to the latest version to resolve the issue. If that does not resolve the issue, try editing your PHP settings. Open the /etc/php.ini file in a text editor and make the following changes.

Scroll all the way down to the bottom of the file and remove the line that says:

 memory_limit = 64M      ; Increase the memory limit

Then scroll up to around line 300, and increase the numbers for the following configs.

 ;;;;;;;;;;;;;;;;;;;
 ; Resource Limits ;
 ;;;;;;;;;;;;;;;;;;;
 max_execution_time = 60     ; Maximum execution time of each script, in  seconds
 max_input_time = 60     ; Maximum amount of time each script may spend parsing request data
 memory_limit = 256M      ; Maximum amount of memory a script may consume 


After this, run:

 service httpd restart


Note: If you're running a large installation with several thousand hosts/services, you may need to increase these numbers more to allow enough time and memory for large configuration changes to take effect.
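
The instructions above remove a second memory_limit line from the bottom of php.ini, presumably because a later duplicate overrides the value set in the Resource Limits section. A quick way to spot such duplicates is sketched below. The helper name count_setting is hypothetical.

```shell
# Hypothetical helper: count the uncommented lines in ini file $1
# that set the directive named in $2.
count_setting() {
    grep -c "^[[:space:]]*$2[[:space:]]*=" "$1"
}

# Example: count_setting /etc/php.ini memory_limit
# A result above 1 means the directive is set more than once.
```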

If the issue persists after the above solutions, it could be caused by creating a local DNS entry for the Nagios XI server but failing to add that name to the Nagios XI server itself. For example, if you're accessing the XI server from the URL http://nagiosserver/nagiosxi, you need to verify that the XI server can also resolve that DNS name correctly. The local DNS entry for the XI server needs to be added to the /etc/hosts file.
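
One way to confirm that the name is present is to search the hosts file for it. The sketch below uses a hypothetical helper, hosts_has_name, that looks for an exact hostname or alias match and ignores comments. On the XI server itself, getent hosts nagiosserver is a more direct test of what the system resolver returns.

```shell
# Hypothetical helper: succeed if hosts-format file $1 lists the name $2
# as a hostname or alias (text after '#' is ignored).
hosts_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example: hosts_has_name /etc/hosts nagiosserver || echo "add nagiosserver to /etc/hosts"
```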

You can observe similar issues if you run out of disk space.

Configuration Applies, No Changes Take Place

This is generally due to permissions issues with the configuration file. Use the Write Config Tool in the Core Config Manager to see if you can manually write the DB information to the config files. If the Write Config Tool returns error messages related to permissions you can run the following script to correct the permission settings:

 /usr/local/nagiosxi/scripts/reset_config_perms

There is a known bug in XI 1.3E and F where this script was not automatically running when configurations were applied. If you're running a Nagios XI version earlier than 1.3g, we recommend updating to correct this issue.

Modifying The Contents Of /usr/local/nagios/etc
  • You can keep custom configuration files in the /usr/local/nagios/etc/static directory
  • Don't modify config files directly in /usr/local/nagios/etc, as they will be overwritten by the Core Config Manager
Unable To Delete Hosts

Hosts can only be deleted after all of their dependent services and associated relationships have been deleted. Make sure to delete any associated services or other objects before deleting the host.

Host Still Visible After Deletion: (Ghost Hosts)

If you have successfully deleted a host and all of its services from the Core Config Manager, but you're still seeing it in the status tables, then you most likely have multiple instances of Nagios running on your machine. To make sure all instances are stopped, type the following on the command line.

 killall nagios
 service nagios start


Host Still Visible In XI After Deletion From the CCM

Go to the Core Config Manager->Write Config Tool, and use that tool to manually write out the configuration data to file. Verify your configuration. If it verifies, go ahead and restart Nagios.

If the host and all of its services are completely deleted in the Core Config Manager, and the actual host config file is still there after using the Write Config Tool, then go ahead and delete the config file. The files will be located in the following directories.

/usr/local/nagios/etc/hosts
/usr/local/nagios/etc/services

On rare occasions the CCM will somehow lose a file. We haven't nailed down what causes it, but it is usually related to deleting the host.

Network status map parent/child relationship not updating(v1.3)

Underneath the Parents box in the CCM, make sure the "standard" radio button is selected. If "null" is selected your parent host selection doesn't get written to disk. We're working on a method of fixing the CCM so this doesn't happen with several fields.


Core Config Manager Problems
GUI Issues

Most of these are related to IE's implementation of JavaScript. If possible, use a browser that more closely implements the ECMAScript Language Specification.

If the Core Config Manager is not visible or components are missing from the page, the cause is generally a proxy. The following thread covers how to address this issue:
Nagios Core Config Manager not showing up.

Configuration Changes

If you make changes to your configuration and they are not reflected in XI, it may be due to file permissions. Here are two options to try:

  • Reset File Permissions

Execute the following command to reset your configuration file permissions.

 /usr/local/nagiosxi/scripts/reset_config_perms

You can also view if you have any permissions related issues by accessing the Admin->Check File Permissions page in the XI interface (v1.3g+).

Restoring Default Configuration

If you've somehow messed up your configurations irreparably, or simply want to reset a test system, you can restore the configuration to the defaults as shipped with XI. To do so, download these two files and transfer them (via SCP) to your XI server:
restore_defaults.sh
nagiosql_defaults.sql
Then, log into the console of your XI server, and in whatever directory you put those two files run these commands:

 chmod +x restore_defaults.sh
 ./restore_defaults.sh

This will delete all of your hosts and services and reload just the demo ones that were initially set up.

Making A Mass Change In The CCM

Changing The Field Entry For A Large Amount Of Objects

Occasionally admins need to change a specific setting for a huge quantity of services or hosts, and this change can't be made from a template. Although we highly recommend the use of templating whenever possible, sometimes it's just not possible to make the change there. Our unofficial solution for this is to write a SQL query that manually updates the DB fields you need to change. NOTE: Test your queries on a single test host/service first, and try this solution at your own risk; we are not responsible if you break something with this! Here's an example, posted by a user, of a change made to the check_interval for all 'Disk Monitor' services.

 mysql> use nagiosql;
 mysql> update tbl_service set check_interval=60 where service_description='Disk Monitor';
 mysql> select config_name, service_description, check_interval from tbl_service where service_description='Disk Monitor';

If the change you wanted was successful, Apply Configuration to write the changes to the config files.

Using Scripts To Make Changes in the CCM

Some admins make use of internal scripts to update and maintain their monitoring environment. Although we're only able to offer limited support on a situation like this, a useful script to know about is:

 /usr/local/nagiosxi/scripts/reconfigure_nagios.sh  

This is the command-line version of "Apply Configuration" in the XI interface. It will write the CCM DB info to the config files and restart Nagios.

To automate importing configs using scripts, you can simply place config files in the /usr/local/nagios/etc/import directory, and then run the reconfigure_nagios.sh script. This will handle the import to the DB, writing the configs, verification, and then restarting Nagios.
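
The import step described above could be wrapped in a small function like the following sketch. The name stage_configs and the choice to copy only *.cfg files are assumptions made for illustration. The default directory is the import path named above.

```shell
# Hypothetical helper: copy every .cfg file from directory $1 into the
# import directory ($2, defaulting to the XI import path).
stage_configs() {
    src_dir=$1
    import_dir=${2:-/usr/local/nagios/etc/import}
    cp "$src_dir"/*.cfg "$import_dir"/
}

# Example (on the XI server):
#   stage_configs /root/new-configs && /usr/local/nagiosxi/scripts/reconfigure_nagios.sh
```

Chaining with && means the reconfigure script runs only if the copy succeeded.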

Currently there is not a streamlined way to remove hosts and services from the Core Config Manager using scripts. We hope to have features like this implemented in 2012.

Performance Graph Problems

Performance Graphs Are Missing Or Not Displayed

This can happen for a variety of reasons, but there are several simple solutions that resolve this issue for most people:

  • Make sure you're using the latest version of Nagios XI. Old releases may have issues that will not necessarily be resolved from the below solutions. Upgrading Nagios XI


  • Verify that process_perfdata.pl has correct permissions. Make sure that the file /usr/local/nagios/libexec/process_perfdata.pl has execute permissions and is owned by nagios:nagios.

  • 2011 R1.8 fix. There is a known bug in this release where some XI installs have incorrect permissions on the performance data directory. This can be resolved by running the following command as the root user.

 chmod -R +x /usr/local/nagios/share/perfdata/


  • 1.6 and 1.7 RHEL/CentOS 6 Users. There were some hiccups with the repos which cause a necessary component for MRTG graphing to not be installed. This is a very simple fix. Log into the CLI of your Nagios XI server as root, and type:
   yum install bc

That should fix the graphing issues. Note that this does not apply to versions of Nagios XI later than 1.8.

  • Run the command manually. Try running the command that Nagios XI runs to check status of a device. For instance, when monitoring a router or switch, Nagios XI uses the check_rrdtraf plugin. Test running this plugin manually by navigating to your libexec directory and running a check, similar to the following:
   ./check_rrdtraf -f '/var/lib/mrtg/192.168.6.1_1.rrd' -w 1 -c 2

This should return something that looks like:

   OK - Current BW in: 1.57Kbps Out: 365.41bps|in=1.573002Kb/s;1;2 out=365.413424b/s;1;2

If it gives errors, then that is the problem. Fix the issues the error gives and then Nagios XI can start graphing performance data.
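
In the sample output above, everything after the "|" character is the performance data that ends up in the graphs. That is the standard Nagios plugin output layout. A minimal sketch for pulling that part out of a plugin's output line follows. The helper name perfdata_of is hypothetical.

```shell
# Hypothetical helper: print the performance data portion (the text after
# the first '|') of the plugin output line in $1; fail if there is none.
perfdata_of() {
    case $1 in
        *'|'*) printf '%s\n' "${1#*"|"}" ;;
        *) return 1 ;;
    esac
}

# Example: perfdata_of "$(./check_rrdtraf -f '/var/lib/mrtg/192.168.6.1_1.rrd' -w 1 -c 2)"
```

If the function fails, the plugin returned no performance data at all, which would explain an empty graph.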

  • Check perfdata directory permissions. Nagios XI needs to be able to write to its nagios/share/perfdata/ directory. Check the file permissions on that directory and its subdirectories. For example:
   ll /usr/local/nagios/share/perfdata

Should return something like this:

    drwxrwxrwx 2 nagios nagios 4096 Oct 18 17:01 192.168.5.1
    drwxrwxrwx 2 nagios nagios 4096 Oct 18 17:02 192.168.5.4
    drwxrwxrwx 2 nagios nagios 4096 Oct 17 15:36 imap.fusemail.net
    drwxrwxrwx 2 nagios nagios 4096 Oct 18 17:02 localhost

If those folders are not writable and readable by Nagios, then that is the problem, and you should set write and read access for Nagios. Please note that all files contained in these folders also need to be writable and readable by nagios.
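
Checking every file by eye is tedious on a large install. The sketch below lists entries whose owner lacks read or write permission. The helper name unwritable_entries is hypothetical, and it checks the permission bits only, so ownership by nagios still needs to be confirmed separately (for example with ls -l).

```shell
# Hypothetical helper: list files and directories under $1 whose owner
# does not have both read and write permission (mode bits 0600).
unwritable_entries() {
    find "$1" ! -perm -600
}

# Example: unwritable_entries /usr/local/nagios/share/perfdata
```

No output means every entry has at least owner read and write permission.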

  • Reset File Permissions

Execute the following command to reset your configuration file permissions.

 /usr/local/nagiosxi/scripts/reset_config_perms

You can also view if you have any permissions related issues by accessing the Admin->Check File Permissions page in the XI interface (v1.3g+).

  • Make sure you have not removed or renamed the nagiosadmin user. This user is the nagios equivalent to 'root user' and should never be removed.
  • Some users reported that editing the following lines in their /usr/local/nagios/etc/nagios.cfg file fixed their graphing issues:
 service_perfdata_file_processing_command=process-service-perfdata-file-bulk
 host_perfdata_file_processing_command=process-host-perfdata-file-bulk

Change To

 service_perfdata_file_processing_command=process-service-perfdata-file-pnp-bulk
 host_perfdata_file_processing_command=process-host-perfdata-file-pnp-bulk
  • Make sure your password for Nagios XI only contains alpha-numeric characters. Some users have reported graphs disappearing from using special characters, creating a permissions issue.
  • Performance graphs are pulled via an internal proxy, so users with their Nagios server behind their own proxy or using strict SSL settings may experience problems viewing graphs. If you're using an environment with a proxy or SSL and having issues viewing graphs, post the problem to our support forums and mention your use of a proxy or SSL right away.
  • Having an internal DNS hostname that is not defined on the XI server can also cause problems with the internal proxy call. If you've defined a custom DNS host entry for your XI server, make sure it's defined in your /etc/hosts file as well. For further information on this, contact our support team at support.nagios.com/forum.
Network Performance Graphs Are Displayed But Have No Data

2011R3.2 and 3.3 issue: graphs display but are empty. Try running the following commands to see whether an excessive number of performance data files has built up.

 cd /usr/local/nagios/var/spool/xidpe
 ls -f | wc -l

If the file count is very large, run the following commands, which should restore regular performance graphing.

 cd /usr/local/nagios/var/spool
 rm -rf xidpe
 mkdir xidpe
 chown nagios.nagios xidpe
 chmod 755 xidpe
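
Note that ls -f also lists the "." and ".." entries, so the count from the first pair of commands is two higher than the number of spooled files. A sketch that counts regular files only follows. The helper name spool_file_count is hypothetical, and the default path is the spool directory named above.

```shell
# Hypothetical helper: count the regular files under the spool directory
# ($1, defaulting to the xidpe spool path).
spool_file_count() {
    find "${1:-/usr/local/nagios/var/spool/xidpe}" -type f | wc -l
}

# Example: spool_file_count
```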

Only Switch and Router Graphs display but have no data

In more detail: you are monitoring a switch or router, but its bandwidth graphs are always zero when you know for sure they should have data. First, be absolutely sure that the graphs should have data.

  • Make sure the /var/lock/mrtg directory exists. This directory has been seen to disappear occasionally. Recreating it is trivial:
   mkdir /var/lock/mrtg
  • Make sure none of the mrtg.cfg entries are using SNMP v2c. Older versions of the Switch Wizard called mrtg with arguments for SNMPv2c, which MRTG does not use. Open up /etc/mrtg/mrtg.cfg and look for
   Target[www.hostaddress.com]: 1:SNMP_Community_String@www.hostaddress.com:::::1

Notice that after the multitude of colons there is a 1; this represents the SNMP version MRTG will use to poll the device. If this is instead 2c, change it to 2 and save the file. This needs to be done for every entry that was created with 2c.
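
With many targets, making this edit by hand is error-prone. The following sketch prints a corrected copy of the file to standard output without touching the original. The helper name fix_snmp_version is hypothetical, and it assumes each Target line ends with the SNMP version, as in the example above.

```shell
# Hypothetical helper: print the MRTG config in $1 with a trailing
# SNMP version of "2c" on Target lines rewritten to "2".
fix_snmp_version() {
    sed 's/^\(Target\[[^]]*]:.*:\)2c[[:space:]]*$/\12/' "$1"
}

# Example: fix_snmp_version /etc/mrtg/mrtg.cfg > /tmp/mrtg.cfg.new
```

Review the output (for instance with diff) and back up the original before replacing it.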


Can I Migrate Performance Data From A Different Install?

RRD performance data files are binary files in an architecture-dependent format, so a simple file transfer only works when the architecture matches on both machines. If you want to migrate files from a 32-bit to a 64-bit machine, you'll have to convert the data to XML and import it into RRDs on the new machine. Forum user srrhd was kind enough to supply the commands used for a working migration:

On the old 32bit machine:

 cd /usr/local/nagios/share/perfdata/
 for i in `find -name "*.rrd"`; do rrdtool dump $i > $i.xml; done
 tar -cvzf perfdata.tar.gz */*.rrd.xml
 for i in `find -name "*.rrd.xml"`; do rm -f $i; done

Then transfer the archive to the same directory on the new server. On the new x86_64 server:

 cd /usr/local/nagios/share/perfdata/
 for i in `find -name "*.rrd"`; do rm -f $i; done
 tar -xvzf perfdata.tar.gz
 for i in `find -name "*.rrd.xml"`; do rrdtool restore $i `echo $i | sed 's/\.xml$//'`; done
 for i in `find -name "*.rrd"`; do chown nagios:nagios $i; done
 for i in `find -name "*.rrd.xml"`; do rm -f $i; done
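
The restore loop above derives each .rrd file name by removing the .xml suffix from the name of the dump. The same step can be sketched with shell parameter expansion, which removes only a literal trailing ".xml" and leaves the rest of the path alone. The helper name rrd_name_from_xml is hypothetical.

```shell
# Hypothetical helper: map a dump file name to its RRD file name by
# stripping a trailing ".xml" (nothing else in the path is changed).
rrd_name_from_xml() {
    printf '%s\n' "${1%.xml}"
}

# Example: rrdtool restore "$f" "$(rrd_name_from_xml "$f")"
```

Because the match is anchored to the end of the name, host directories whose names happen to contain "xml" are left intact.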

Notification Problems

Basic Troubleshooting Steps

1. Check if Notifications are enabled globally - click on the "Monitoring Process" from the left panel menu and make sure you see a green dot next to the Notifications in the "Monitoring Engine Process" window. You can enable/disable Notifications by clicking on the "Action" button on the right hand side.

2. Check if Notifications are enabled for the user currently logged into Nagios XI - click on the username in the upper right corner next to "Logged in as: ...", then click on "Notification Preferences" under "Notification Options" from the left panel menu. Make sure that the "Enable Notifications" check-box is checked.

3. Check if Notifications are enabled for a particular host/service. If you are having issues with Notifications for a particular Host or Service, log into the Core Config Manager and click on "Hosts" or "Services" under "Monitoring" from the left panel menu. Find your Host or Service and click on the "Modify" Action button to the right. Click on the "Alert Settings" tab and verify that the "on" radio button next to "Notification Enabled" is selected.

Note: If you are having issues with many hosts and services, you should check the templates you are using - "xiwizard_generic_host" and "xiwizard_generic_service" should be the first ones to be checked. Any changes you make in these templates will affect all hosts and services that reference them. You can override this by modifying the host or service configuration itself. If you need to know more on the topic, please read the full explanation of Nagios object inheritance here: http://nagios.sourceforge.net/docs/3_0/objectinheritance.html

Nagios Admin Account Notifications Not Controlled Through XI
  • The nagiosadmin user was set to use the generic_template contact template, which resulted in notifications not being controlled through the XI interface.
    This can be corrected by changing the user's contact template to xi_generic_template in the Core Config Manager. This bug was corrected in 2009R1.2 and only affects systems on which an earlier version is or was installed.
Email Notifications Are Not Going Out

This can happen for a variety of reasons:

  • The nagiosadmin is set to use the generic_template contact template.
    This should be xi_generic_template, and can be modified by using the Core Config Manager. This bug was corrected in 2009R1.2 and only affects systems on which an earlier version is or was installed.
  • Outbound SMTP connections may be blocked by your border firewall
  • Unauthenticated SMTP relaying may be denied somewhere downstream - try switching email methods from Sendmail to SMTP in the admin section
Test Emails Fail, "Invalid address" Error

We identified a bug in 1.9 and some earlier versions where test emails to addresses like "root@localhost" or "user@xiserver" are not sent because they fail email address validation. The email address needs some sort of domain at the end to pass validation and send. The browser may falsely display a success message for users testing from their "Send Test Notification" page, but it will show an error message if the test is run from the Admin->Manage Email Settings->Send A Test Email page. This bug will be fixed in R1.10. As a workaround in the meantime, make sure the Nagios XI Sending Address in the Admin->Manage Email Settings page is set to an email address with an FQDN. The address listed below will also work:

 Nagios XI <root@localhost.localdomain>

Make sure initial setup for the Admin->Manage Email Settings page has been done and that you've pressed Update on the email settings.

This bug can be identified by a debug message showing up at the top of the test email page that says "Invalid address:".

This bug is specific to installations using PHP 5.2 or later.
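As a quick way to spot addresses that are likely to trip this bug, the rule described above (the address needs a domain with a dot in it) can be approximated in shell. This is an illustration only and is not the validator Nagios XI uses. The function name has_dotted_domain is made up for this sketch.

```shell
# Illustration only: this is NOT the validation code Nagios XI runs.
# Succeeds when the part after "@" contains a dot.
has_dotted_domain() {
    case "$1" in
        *@*.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example:
#   has_dotted_domain "root@localhost" || echo "likely to fail validation"
```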

XI Display Problems

Tables Displaying A Count, But No Results

A recent issue has been identified where characters outside of the ASCII table are being generated by some of the check plugins, which causes an issue with XI's XML generation. The result is a table with a returned count of services, but no actual table data. This issue can be verified by checking the following url:

 http://<serveraddress>/nagiosxi/backend/?cmd=getservicestatus

If this XML page returns an error, it should identify the line number of the issue which can be found in the page source. Below is a code patch that will be included in the next update of XI. Paste this code as a replacement to the xmlentities() function on line 30 of the /usr/local/nagiosxi/html/includes/utilsx.inc.php

 function xmlentities($string){
       $data=str_replace ( array ( '&', '"', "'", '<', '>' ), 
        array ( '&amp;' , '&quot;', '&apos;' , '&lt;' , '&gt;' ), $string );
       preg_match_all('/([\x09\x0a\x0d\x20-\x7e]'. // ASCII characters
       '|[\xc2-\xdf][\x80-\xbf]'. // 2-byte (except overly longs)
       '|\xe0[\xa0-\xbf][\x80-\xbf]'. // 3 byte (except overly longs)
       '|[\xe1-\xec\xee\xef][\x80-\xbf]{2}'. // 3 byte (except overly longs)
       '|\xed[\x80-\x9f][\x80-\xbf])+/', // 3 byte (except UTF-16 surrogates)
       $data, $clean_pieces );
       $clean_output = join('?', $clean_pieces[0] );
       return $clean_output;
       }
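To track down which check is producing the offending characters, plugin output can be tested for bytes outside the ASCII range from the command line. The following is a minimal sketch. The function name has_non_ascii is hypothetical, and the plugin path in the example is a placeholder.

```shell
# Hypothetical helper: succeeds if stdin contains any byte that is neither
# printable ASCII nor ordinary whitespace.
has_non_ascii() {
    LC_ALL=C grep -q '[^[:print:][:space:]]'
}

# Example (plugin name and arguments are placeholders):
#   /usr/local/nagios/libexec/some_plugin -H somehost | has_non_ascii \
#       && echo "plugin output contains non-ASCII bytes"
```

LC_ALL=C makes grep treat the input as raw bytes, so multi-byte UTF-8 sequences are flagged as well.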


Problems with Check Commands

How To Test Check Commands From The Command-line

Okay, you'll need to go through a few steps to establish what exactly is being run. Grab some paper to note settings as you go. Start by going to the Core Config Manager (under "Configure"), under Services in the left sidebar, find the service in question, and click the crossed tools "Configure" icon. On the "Common Settings" tab, note what it says for "Command view", the values of the eight ARG variables, and anything listed under "Additional templates". Now, in the left sidebar again, click "Templates -> Service templates", and find any that were listed on the previous step. If any of the ARG variables that were blank on the first page are filled in here, write down the value on the template. Repeat this step if any of the templates in turn have templates listed on their definitions. Similarly, if the Check command and Command view were blank, fill them in from the template.

Now, starting with what you had for "Command view", replace $USER1$ with /usr/local/nagios/libexec , and replace $HOSTADDRESS$ with the IP address of the host this service is associated with.

As an example, I have a host called "Server Room", with an IP address of 192.168.5.254, and am running a simple ping check against it. For "Check command" and "Command view" they're blank, $ARG5$ = -p 5, and for templates it has "xiwizard_websensor_ping_service". The template for xiwizard_websensor_ping_service has a "Check command" of "check_xi_service_ping" and a "Command view" of '$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ $ARG5$', with $ARG1$ = 3000.0, $ARG2$ = 80%, $ARG3$ = 5000.0, $ARG4$ = 100%, $ARG5$ = -p 8, and a template of "xiwizard_generic_service". The "xiwizard_generic_service" template has a check command of "check_xi_service_none" and a command view of '$USER1$/check_dummy 0 "Nothing to monitor"', with blank args and no additional template. Nothing gets filled in from this template because all of the values it defines are already defined in a higher-priority setting.

Here, the first step is to look at '$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ $ARG5$'. Step two fills in $ARG5$ from the service definition, and we get '$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ -p 5'. Step three gets args 1-4 from the xiwizard_websensor_ping_service template, giving '$USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5'. The $ARG5$ is left alone because it was already set. Step four does nothing - the last template doesn't have any new info. Step five is to fill in the macros, so you get '/usr/local/nagios/libexec/check_icmp -H 192.168.5.254 -w 3000.0,80% -c 5000.0,100% -p 5'. That's your full check command.

Now, log into your Nagios XI server as root, either on a direct terminal or through SSH. Enclose your command in single quotes like I've been doing here, put su -c before it and nagios after it, and hit enter. It should look something like this:


 [root@demo ~]# su -c '/usr/local/nagios/libexec/check_icmp -H 192.168.5.254 -w 3000.0,80% -c 5000.0,100% -p 5' nagios
 OK - 192.168.5.254: rta 50.903ms, lost 0%|rta=50.903ms;3000.000;5000.000;0; pl=0%;80;100;;
 [root@demo ~]#


Obviously that will be filled in with different details based on the check you're trying to run, but hopefully that demonstrates the progression of how to build the line.
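The substitution steps from the worked example can also be done mechanically. The sketch below uses sed with the values from that example. The variable names command_view and expanded are just for illustration, and you would substitute your own macro values.

```shell
# Command view from the xiwizard_websensor_ping_service template in the example.
command_view='$USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$,$ARG2$ -c $ARG3$,$ARG4$ $ARG5$'

# Replace each macro with the value collected for it.
# $ARG5$ comes from the service itself (-p 5), which overrides the template (-p 8).
expanded=$(printf '%s\n' "$command_view" | sed \
    -e 's|\$USER1\$|/usr/local/nagios/libexec|g' \
    -e 's|\$HOSTADDRESS\$|192.168.5.254|g' \
    -e 's|\$ARG1\$|3000.0|g' \
    -e 's|\$ARG2\$|80%|g' \
    -e 's|\$ARG3\$|5000.0|g' \
    -e 's|\$ARG4\$|100%|g' \
    -e 's|\$ARG5\$|-p 5|g')

printf '%s\n' "$expanded"
```

The printed line is the same full check command that was built by hand above, ready to be wrapped in su -c '...' nagios.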


Problems with $ Signs in the Check Command

(Solution posted by Dietmar Lang)

In your service definition file, you may need to pass a $ symbol as an argument to a service check. For example, MS SQL Server instances are named "MSSQL$INSTANCE1". The check_command in your service definition would look like this:

 check_nt!SERVICESTATE!-d SHOWALL -l MSSQL$INSTANCE 

This will not work.

For Nagios 3, add two backslashes before the dollar symbol and a second dollar symbol after it, so that the check_command looks like this:

 check_nt!SERVICESTATE!-d SHOWALL -l MSSQL\\$$INSTANCE


Windows Memory Check Values Doubled

(contributed by Forum user GreatWolfResorts)

This is a result of how the check_nt plugin calculates memory values. The preferred solution for most users seems to be to use the check_nrpe plugin to distinguish the memory types.

Quoted from GreatWolfResorts: I essentially created the following custom command:

check_xi_service_nrpe:

 $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c $ARG1$ -a MaxWarn=$ARG2$% MaxCrit=$ARG3$% $ARG4$ $ARG5$
 $ARG4$ = "type=physical"

Note: You will need to enable some NRPE commands in the nsc.ini file on the remote device. Specifically: allow_arguments=1

Alternatively, a full understanding of the check_nt MEMUSE command helps when reviewing the values returned. The value it reports for Windows refers to the sum of physical memory and swap files, that is, the entire available virtual memory. Windows regularly swaps program and data code from the main memory, even when it still has spare reserves. In this respect, the load of the entire virtual memory in Windows is a more important parameter to observe than physical memory or swap alone.

So in the end, the values returned weren't necessarily a bug in Nagios XI or NSClient++, but rather a view of the virtual memory of the machine.

Hope this helps!

Additional Documentation: [Enabling NRPE with NSClient]

Windows Event Log Check

WMI can be used to gather information from the Windows Event Log. Here are some example command definitions for use with check_wmi_plus.

Check the Windows System event log for errors in the last 4 hours. Warn on 1 occurrence, critical if 6 or more.

 check_xi_service_wmiplus!administrator!password!checkeventlog!-a system -o 2 -3 4 -w 1 -c 6

Check the Windows Application event log for errors in the last 1 hour. Warn on 3 occurrences, critical on 6 or more.

 check_xi_service_wmiplus!administrator!password!checkeventlog!-a application -o 2 -3 1 -w 3 -c 6
Linux Cached Memory Not Added to Free Memory

It is normal for Linux to "borrow" unused memory for disk caching. This may however create false "Warning" or "Critical" alerts, even though you are NOT low on memory. In order to fix this, we have modified the "custom_check_mem" script, which is part of our Linux agent install, by adding an optional flag [-n|--nocache]. When you use the "-n" flag, cached memory is counted as free memory.

 Usage: custom_check_mem [-w|--warning]<percent free> [-c|--critical]<percent free> [-n|--nocache]

If you are downloading a new copy of our Linux agent, the updated "custom_check_mem" will be included. If you already installed the Linux agent, you can just download the updated "custom_check_mem" from here.

Copy the new script over the old "custom_check_mem".

Go to the Core Config Manager->Monitoring->Services->Memory Usage->Modify and under the "Common Settings" tab modify the $ARG2$ field by adding a "-n" flag.

For example, if you had:

 -a '-w 20 -c 10'

change it to:

 -a '-w 20 -c 10 -n'

Click on "Save" and "Apply Configuration".

Note: One gotcha - make sure the "custom_check_mem" has Unix EOL before you copy it over.
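One way to check for and remove Windows line endings before copying the script is sketched below. The function names has_crlf and to_unix_eol are hypothetical. The dos2unix utility does the same job if it is installed.

```shell
# Hypothetical helpers for checking and fixing line endings.

# Succeeds if the file contains at least one carriage return.
has_crlf() {
    LC_ALL=C grep -q "$(printf '\r')" "$1"
}

# Strips all carriage returns, keeping the file's existing permissions.
to_unix_eol() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}

# Example:
#   has_crlf custom_check_mem && to_unix_eol custom_check_mem
```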

Other Issues

Upgrade to 2011R3.x Issues

If you're experiencing any of the following issues after an upgrade from Nagios XI 2011r2.x to 3.x:

  • Missing hosts or services or status data
  • Takes a VERY long time to Apply Configuration or restart the Nagios process
  • Unusually high CPU load
  • A flood of messages in the /var/log/messages related to ndo2db

Then you may need to manually set a few kernel settings on your system. In Nagios XI 2011r3.x+ the Ndoutils subcomponent now uses asynchronous writes to log status information to the database, and these messages are sent to the Linux kernel's message queue. Our upgrade scripts will tune the kernel settings automatically as of 2011r3.2, but in the event that you see the above symptoms on your system, we recommend applying the following settings to your system.

  • Open /etc/sysctl.conf with a text editor. Edit the file to match the following values:
 # Controls the default maximum size of a message queue, in bytes
 kernel.msgmnb = 131072000
 
 # Controls the maximum size of a message, in bytes
 kernel.msgmax = 131072000
 
 # Controls the maximum shared segment size, in bytes
 kernel.shmmax = 4294967295
 
 # Controls the total amount of shared memory, in pages
 kernel.shmall = 268435456
 
 # Controls the maximum number of message queues system-wide
 kernel.msgmni = 256000


Note: If you don't have these entries in the "/etc/sysctl.conf" file, just add them to the end of the file.

  • After these settings are saved to the file, run:
 sysctl -p

This applies the new settings. If the system still appears to be working improperly, reboot the machine.
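To confirm that the file and the running kernel agree, the values in /etc/sysctl.conf can be compared with what sysctl reports. The following is a minimal sketch. The function name sysctl_conf_value is hypothetical, and it assumes plain "key = value" lines as shown above.

```shell
# Hypothetical helper: print the value assigned to a key in a
# sysctl.conf-style file (the last assignment wins; comment lines are skipped).
sysctl_conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }
        {
            k = $1; gsub(/[ \t]/, "", k)
            if (k == key) { v = $2; gsub(/[ \t]/, "", v); val = v }
        }
        END { if (val != "") print val }
    ' "$2"
}

# Example: compare the file with the running kernel.
#   for key in kernel.msgmnb kernel.msgmax kernel.shmmax kernel.shmall kernel.msgmni; do
#       printf '%s file=%s running=%s\n' "$key" \
#           "$(sysctl_conf_value "$key" /etc/sysctl.conf)" "$(sysctl -n "$key")"
#   done
```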

Login Screen Keeps Redirecting To Itself

The web browser keeps redirecting to the login screen even after entering login credentials. This has been noticed in Internet Explorer.

Nagios XI uses cookies to save session state. These cookies are set to expire after 30 minutes. If the time on the Nagios XI server is incorrect, the cookies returned to the client's browser might appear to be expired due to the time difference between the client's computer and the Nagios XI server. Solution: Fix the time on the Nagios XI server to ensure it is correct.

Check Services Being Orphaned

Some users have encountered large numbers of warning messages that accumulate quickly that read as follows:

Warning: The check of service <Your Service> on host <Your Host> looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service..

This is most likely caused by multiple instances of Nagios running. To fix this, kill all instances of Nagios and then restart the process.

 killall -9 nagios

Then restart Nagios from the Admin menu of the web interface.

Related forum post can be read here.


If the issue persists after reboots and restarts of the Nagios service, then the issue is most likely caused by either a memory leak in embedded perl, or system ulimit restrictions. Symptoms can include the /tmp directory filling up quickly with check* files, and the following errors in the nagios log.

 [1331905537] Warning: The check of service 'SERVICE' on host 'NAMESERVER' looks like it was 
 orphaned (results never came back). I'm scheduling an immediate check of the service...
 [1331755699] Warning: The check of service 'SWAP' on host 'nameserver' could not be performed 
 due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.

Try the following solutions:

Edit /etc/security/limits.conf

 @nagios hard memlock 128     #locked memory
 @nagios soft memlock 128
 @nagios soft nofile 4096      #open files
 @nagios hard nofile 4096
 @nagios hard nproc 4096     #max user processes
 @nagios soft nproc 4096
 @nagios hard stack 20480     #stack size
 @nagios soft stack 20480

and restart the server. Run

 ulimit -a 

to verify that the new settings are in place.


And also update the settings in your nagios.cfg file to match the following:

 enable_embedded_perl=0
 use_embedded_perl_implicitly=0
Postgresql: Postmaster CPU Is High or "Transaction wraparound limit" in log

Although Nagios XI performs routine database maintenance on the postgres data tables, if you notice either a high CPU usage for the postmaster process, or a repeated error message in the /var/lib/pgsql/data/pg_log file that says "transaction ID wrap limit is 2147484146", then you may need to perform a manual VACUUM of the postgres databases. Run the following commands from the command line:

 psql nagiosxi nagiosxi
 VACUUM;
 VACUUM ANALYZE;
 VACUUM FULL;
 \q

You will see messages like the following when running the above commands:

 WARNING:  skipping "pg_authid" --- only table or database owner can vacuum it

This is normal. You may need to run the above commands more than once if the CPU usage from postmaster is extremely high.

Next, vacuum the tables as the postgres user.

 psql postgres postgres
 VACUUM;
 VACUUM ANALYZE;
 VACUUM FULL;
 \q
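If you need to repeat the vacuum several times, the same statements can be run non-interactively with psql's -d, -U and -c options. The wrapper below is a sketch only. The function name vacuum_db and the PSQL override variable are hypothetical, and the override exists solely so the command lines can be previewed without touching the database.

```shell
# Hypothetical wrapper: run the three VACUUM statements against one database.
# Set PSQL to another command name to preview what would be run.
vacuum_db() {
    db="$1"
    user="$2"
    for stmt in 'VACUUM;' 'VACUUM ANALYZE;' 'VACUUM FULL;'; do
        ${PSQL:-psql} -d "$db" -U "$user" -c "$stmt" || return 1
    done
}

# Example, matching the two sessions above:
#   vacuum_db nagiosxi nagiosxi
#   vacuum_db postgres postgres
```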

XI Component/Addon Problems

Website Wizard Content Check Failure

Some users have reported website content checks being blocked by the "dotDefender" application. See the forum thread "Website Wizard Content Check Failure" for the solution.

Plugin/Component/Wizard Installation Problems

When plugins, components or wizards are not installed through the proper menus, this creates problems in Nagios XI, such as "wiping out" all wizards, so they can not be viewed in the Web interface, blank pages in the Web browser and other weird behaviors.

One common mistake is installing a component in place of the wizard and vice versa.

The proper way of doing it is: download the plugin, component or wizard you need to install, go to the "Admin" menu and then select the proper sub-menu from the left panel under the "System Extensions":

for plugins -> "Manage Plugins" -> "Browse" (select your plugin installation file) -> "Open" -> "Upload Plugin"

for components -> "Manage Components" -> "Browse" (select your component installation file) -> "Open" -> "Upload Component"

for wizards -> "Manage Config Wizards" -> "Browse" (select your wizard installation file) -> "Open" -> "Upload Wizard"

Note: Don't unzip the installation file prior to selecting it through "Browse". Also, don't rename the installation files. This will cause the installation to fail. The name of the file should be: "somename".zip. If you had a previous copy of the file and you download it again, your new file will be named "somename"(1).zip, which will not work.

If you already made a mistake and erroneously installed a component in place of the wizard or vice versa, here is what you should do:

Remove the problematic component/wizard by running the following in a terminal as root:

 # rm -rf /usr/local/nagiosxi/html/includes/components/"somecomponent"
 # rm -rf /usr/local/nagiosxi/html/includes/configwizards/"somewizard"

Try installing the component/wizard again.

If you have blank pages in the web browser, this usually means there is a PHP error. Run:

 # tail /var/log/httpd/error_log

right after loading that page to see what the errors are.

Sometimes, when you try to install a plugin you may receive an error message: "Plugin could not be installed - directory permissions may be incorrect". In order to check the permissions of your "libexec" directory, run in terminal:

 # ls -l /usr/local/nagios

The owner of "libexec" directory should be nagios:nagios and the permissions should be set to 775 (drwxrwxr-x). If this is not what you have, run in terminal:

 # chmod 775 /usr/local/nagios/libexec
 # chown nagios:nagios /usr/local/nagios/libexec

"Event Data Is Stale"

We've had a known bug relating to event data in versions 2009R1.4B-2011R1.1. This bug has been patched, and the fix is included in releases later than the versions listed above. If you're experiencing this error, and/or the nagios service is taking an excessively long time to start, you may have a corrupted MySQL table that needs repair. We suggest taking the following steps.

Stop the following services

 service nagios stop
 service ndo2db stop
 service mysqld stop

Run our repair script for MySQL tables.

 /usr/local/nagiosxi/scripts/repairmysql.sh nagios

Unzip and copy the following dbmaint file to /usr/local/nagiosxi/cron/. This will overwrite the previous version.

 cd /tmp
 wget http://assets.nagios.com/downloads/nagiosxi/patches/dbmaint.zip
 unzip dbmaint.zip
 chmod +x dbmaint.php
 cp dbmaint.php /usr/local/nagiosxi/cron

Run the following commands:

 service mysqld start
 rm -f /usr/local/nagiosxi/var/dbmaint.lock
 /usr/local/nagiosxi/cron/dbmaint.php

After running this script, restart services.

 service ndo2db start
 service nagios start

However, if you see any error output from this script, similar to this one:

 SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1293570334)
    SQL:         SQL Error [ndoutils] :</b> Table './nagios/nagios_logentries' is marked 
    as crashed and last (automatic?) repair failedCLEANING ndoutils TABLE 'notifications'...

you may need to run a force repair on the tables:

 service mysqld stop
 cd /var/lib/mysql/nagios
 myisamchk -r -f nagios_<corrupted_table>
 
 service mysqld start
 rm -f /usr/local/nagiosxi/var/dbmaint.lock
 /usr/local/nagiosxi/cron/dbmaint.php  

If problems persist, contact our support team at our support forums.

Bandwidth Usage for Offloaded MySQL

We don't have official benchmark documentation for the bandwidth usage of a Nagios server, but the following specs were recorded and submitted by a user for network traffic between a Nagios XI server and an offloaded MySQL server. Thanks Stephen Wallace for contributing this!

  • 500 hosts, 10 services each at a 5-minute interval (5500 checks)
  • Breaks down to around 18 checks per second
  • Produces around 3MB of network traffic daily between Nagios and MySQL
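The checks-per-second figure follows from the other numbers. The arithmetic below is a sketch that assumes the 5500 total counts one host check per host in addition to the ten services. The variable names are for illustration only.

```shell
hosts=500
services_per_host=10
interval_seconds=300            # 5-minute check interval

# One host check plus ten service checks per host.
total_checks=$(( hosts + hosts * services_per_host ))

# Integer division, so the result is rounded down.
checks_per_second=$(( total_checks / interval_seconds ))

echo "total checks: $total_checks"
echo "checks per second (rounded down): $checks_per_second"
```

Changing the three inputs gives a rough scaling estimate for a different environment, though the traffic figure itself was measured on one user's system.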

"Still have questions?"

If you haven't found an answer to your question, you can check the Nagios XI Manuals:

Nagios XI User Guide

Nagios XI Administrator Guide

