SNMP checks fails against Linux servers
Posted: Tue Feb 21, 2017 10:04 am
by vuservicedesk
Hi,
Our SNMP checks against Linux boxes are failing when issued from Nagios:
: ERROR: Description/Type table : No response from remote host
If we run the same check from outside Nagios (with snmpwalk), everything works fine.
We are experiencing this issue when running checks against a number of Linux servers.
The Nagios server runs on a physical machine and is up to date.
We checked performance and network utilisation on this box, and the server looks underutilised (nothing to worry about).
Could you please advise us what else we should check? What may be causing the issue?
Re: SNMP checks fails against Linux servers
Posted: Tue Feb 21, 2017 11:47 am
by bwallace
"No response from host" usually requires the following steps to get to the bottom of it:
1) Check which SNMP packages you have installed on the remote box. Run the following command and show us the output:
yum list installed | grep snmp
2) Is snmpd running?
service snmpd status
3) To rule out a wrongly configured snmpd.conf, do the following simple test. Make a backup of your original snmpd.conf file:
cd /etc/snmp
cp -p snmpd.conf snmpd.conf.orig
4) Open the snmpd.conf in a text editor, clear everything, and add one line only:
rocommunity public x.x.x.x
(where x.x.x.x is the Nagios XI server IP address.)
5) Restart snmpd:
service snmpd restart
6) Try snmpwalk from the Nagios XI server:
snmpwalk -v 2c -c public <client ip>
Was the walk successful?
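Steps 3 and 4 can be scripted so that the backup is never skipped. The following is only a minimal sketch, not a tested procedure: the function name `write_minimal_snmpd_conf` and its two arguments are made up for illustration, and it assumes a POSIX shell with write access to the config directory on the remote box.

```shell
# Back up snmpd.conf and replace it with a single rocommunity line (steps 3 and 4).
# $1: config directory (for example /etc/snmp), $2: Nagios XI server IP address.
# The function name and arguments are hypothetical, chosen for this sketch.
write_minimal_snmpd_conf() {
    conf_dir=$1
    nagios_ip=$2
    # Keep the first backup: a re-run must not overwrite the original config.
    if [ ! -e "$conf_dir/snmpd.conf.orig" ]; then
        cp -p "$conf_dir/snmpd.conf" "$conf_dir/snmpd.conf.orig" || return 1
    fi
    # Write the one-line test configuration from step 4.
    printf 'rocommunity public %s\n' "$nagios_ip" > "$conf_dir/snmpd.conf"
}
```

It would be called as `write_minimal_snmpd_conf /etc/snmp x.x.x.x`, followed by the restart in step 5. To roll back, copy snmpd.conf.orig over snmpd.conf and restart snmpd again.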
Re: SNMP checks fails against Linux servers
Posted: Wed Feb 22, 2017 7:03 am
by vuservicedesk
But as I said, snmpwalk works fine.
Some additional information:
- It works on those Linux machines from time to time.
- When it stops working, we have to wait a very long time or restart Nagios. After a Nagios restart it usually works again for a while (sometimes for days, sometimes for half an hour).
Re: SNMP checks fails against Linux servers
Posted: Wed Feb 22, 2017 1:35 pm
by ssax
When the issue is occurring, please send the output of these commands:
ps aux | grep nagios.cfg
ps aux | grep ndo
ps aux | grep defunct
ipcs -q
tail -n200 /var/log/messages /usr/local/nagios/var/nagios.log
su - nagios
ulimit -a
Also, please send a copy of your system profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.
Thank you
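Because the failure is intermittent, it may help to have the commands above ready in one script that can be run the moment the checks start failing. This is a rough sketch under a few assumptions: the helper names `run_section` and `collect_diagnostics` are invented here, the script is run as root, and `su - nagios -c 'ulimit -a'` stands in for the interactive `su - nagios` followed by `ulimit -a`.

```shell
# Print a header for a command, then run it.
# A failing command is reported but does not stop the remaining sections.
run_section() {
    printf '===== %s =====\n' "$*"
    "$@" 2>&1 || printf '(command exited with status %s)\n' "$?"
}

# Run the requested commands one after another.
collect_diagnostics() {
    run_section sh -c 'ps aux | grep nagios.cfg'
    run_section sh -c 'ps aux | grep ndo'
    run_section sh -c 'ps aux | grep defunct'
    run_section ipcs -q
    run_section tail -n200 /var/log/messages /usr/local/nagios/var/nagios.log
    # Non-interactive equivalent of "su - nagios" followed by "ulimit -a".
    run_section su - nagios -c 'ulimit -a'
}

# Example call (the output file name is only an illustration):
#   collect_diagnostics > /tmp/nagios-diag.txt
```

The headers make it easy to see which output belongs to which command when the file is pasted into a reply.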
Re: SNMP checks fails against Linux servers
Posted: Thu Feb 23, 2017 8:34 am
by vuservicedesk
The issue is occurring right now.
Note the long-pkgr-p001 server in the logs.
Here is the requested information:
ps aux | grep nagios.cfg
nagios 4328 0.5 0.1 83160 32832 ? Ss 10:04 1:09 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 4415 0.0 0.0 82636 3200 ? S 10:04 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
ps aux | grep ndo
nagios 2994 0.0 0.0 48752 2104 ? Ss Feb15 0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f
nagios 4384 0.0 0.0 48892 1856 ? S 10:04 0:04 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f
nagios 4385 0.6 0.0 49572 2736 ? S 10:04 1:22 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg -f
ps aux | grep defunct
(This returns nothing, so there are no zombie processes.)
ipcs -q
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x40000400 393216     nagios     600        0            0
tail -n200 /var/log/messages /usr/local/nagios/var/nagios.log
==> /var/log/messages <==
Feb 23 13:20:34 long-nagios-p001 ndo2db: Trimming hostchecks.
Feb 23 13:20:34 long-nagios-p001 ndo2db: Trimming eventhandlers.
Feb 23 13:20:37 long-nagios-p001 systemd: Created slice user-448804481.slice.
Feb 23 13:20:37 long-nagios-p001 systemd: Starting user-448804481.slice.
Feb 23 13:20:37 long-nagios-p001 systemd-logind: New session 110736 of user mmadmin.
Feb 23 13:20:37 long-nagios-p001 systemd: Started Session 110736 of user mmadmin.
Feb 23 13:20:37 long-nagios-p001 systemd: Starting Session 110736 of user mmadmin.
Feb 23 13:20:37 long-nagios-p001 dbus[1153]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Feb 23 13:20:37 long-nagios-p001 dbus-daemon: dbus[1153]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Feb 23 13:20:37 long-nagios-p001 dbus[1153]: [system] Successfully activated service 'org.freedesktop.problems'
Feb 23 13:20:37 long-nagios-p001 dbus-daemon: dbus[1153]: [system] Successfully activated service 'org.freedesktop.problems'
Feb 23 13:20:38 long-nagios-p001 dbus[1153]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Feb 23 13:20:38 long-nagios-p001 dbus-daemon: dbus[1153]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Feb 23 13:20:38 long-nagios-p001 systemd: Starting Fingerprint Authentication Daemon...
Feb 23 13:20:38 long-nagios-p001 dbus[1153]: [system] Successfully activated service 'net.reactivated.Fprint'
Feb 23 13:20:38 long-nagios-p001 dbus-daemon: dbus[1153]: [system] Successfully activated service 'net.reactivated.Fprint'
Feb 23 13:20:38 long-nagios-p001 systemd: Started Fingerprint Authentication Daemon.
Feb 23 13:20:38 long-nagios-p001 fprintd: Launching FprintObject
Feb 23 13:20:38 long-nagios-p001 fprintd: ** Message: D-Bus service launched with name: net.reactivated.Fprint
Feb 23 13:20:38 long-nagios-p001 fprintd: ** Message: entering main loop
Feb 23 13:20:41 long-nagios-p001 nagios: SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;3;PROCS WARNING: 403 processes with STATE = RSZDT
Feb 23 13:20:52 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;/ Disk Usage;OK;SOFT;4;/: 43%used(21496MB/50268MB) (<90%) : OK
Feb 23 13:21:01 long-nagios-p001 nagios: SERVICE ALERT: long-mam-p006;CPU Usage;OK;HARD;10;12 CPU, average load 63.8% < 80% : OK
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110737 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110737 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110738 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110738 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110739 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110739 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110740 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110740 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110742 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110742 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110741 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110741 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110745 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110745 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110744 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110744 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Started Session 110743 of user nagios.
Feb 23 13:21:01 long-nagios-p001 systemd: Starting Session 110743 of user nagios.
Feb 23 13:21:09 long-nagios-p001 fprintd: ** Message: No devices in use, exit
Feb 23 13:21:14 long-nagios-p001 nagios: job 836 (pid=15469): read() returned error 11
Feb 23 13:21:25 long-nagios-p001 su: (to nagios) mmadmin on pts/0
Feb 23 13:21:35 long-nagios-p001 ndo2db: Trimming timedevents.
Feb 23 13:21:35 long-nagios-p001 ndo2db: Trimming systemcommands.
Feb 23 13:21:35 long-nagios-p001 ndo2db: Trimming servicechecks.
Feb 23 13:21:35 long-nagios-p001 ndo2db: Trimming hostchecks.
Feb 23 13:21:35 long-nagios-p001 ndo2db: Trimming eventhandlers.
Feb 23 13:21:35 long-nagios-p001 systemd: Created slice user-448810726.slice.
Feb 23 13:21:35 long-nagios-p001 systemd: Starting user-448810726.slice.
Feb 23 13:21:35 long-nagios-p001 systemd-logind: New session 110746 of user jfadmin.
Feb 23 13:21:35 long-nagios-p001 systemd: Started Session 110746 of user jfadmin.
Feb 23 13:21:35 long-nagios-p001 systemd: Starting Session 110746 of user jfadmin.
Feb 23 13:21:38 long-nagios-p001 nagios: SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;HARD;4;PROCS WARNING: 404 processes with STATE = RSZDT
Feb 23 13:21:39 long-nagios-p001 dbus[1153]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Feb 23 13:21:39 long-nagios-p001 dbus-daemon: dbus[1153]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
Feb 23 13:21:39 long-nagios-p001 systemd: Starting Fingerprint Authentication Daemon...
Feb 23 13:21:39 long-nagios-p001 dbus[1153]: [system] Successfully activated service 'net.reactivated.Fprint'
Feb 23 13:21:39 long-nagios-p001 dbus-daemon: dbus[1153]: [system] Successfully activated service 'net.reactivated.Fprint'
Feb 23 13:21:39 long-nagios-p001 systemd: Started Fingerprint Authentication Daemon.
Feb 23 13:21:39 long-nagios-p001 fprintd: Launching FprintObject
Feb 23 13:21:39 long-nagios-p001 fprintd: ** Message: D-Bus service launched with name: net.reactivated.Fprint
Feb 23 13:21:39 long-nagios-p001 fprintd: ** Message: entering main loop
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110747 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110747 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110753 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110753 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110750 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110750 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110751 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110751 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110754 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110754 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110749 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110749 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110752 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110752 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110748 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110748 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Started Session 110755 of user nagios.
Feb 23 13:22:01 long-nagios-p001 systemd: Starting Session 110755 of user nagios.
Feb 23 13:22:09 long-nagios-p001 fprintd: ** Message: No devices in use, exit
Feb 23 13:22:16 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;Checks mounted shares;OK;SOFT;4;...............
Feb 23 13:22:36 long-nagios-p001 ndo2db: Trimming timedevents.
Feb 23 13:22:36 long-nagios-p001 ndo2db: Trimming systemcommands.
Feb 23 13:22:36 long-nagios-p001 ndo2db: Trimming servicechecks.
Feb 23 13:22:36 long-nagios-p001 ndo2db: Trimming hostchecks.
Feb 23 13:22:36 long-nagios-p001 ndo2db: Trimming eventhandlers.
Feb 23 13:22:59 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;OK;HARD;5;Virtual memory: 66%used(15625MB/23762MB) (<80%) : OK
Feb 23 13:22:59 long-nagios-p001 nagios: SERVICE FLAPPING ALERT: long-pkgr-p001;Linux Memory Usage;STARTED; Service appears to have started flapping (21.1% change >= 20.0% threshold)
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110756 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110756 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110757 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110757 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110758 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110758 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110760 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110760 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110759 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110759 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110764 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110764 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110763 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110763 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110761 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110761 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Started Session 110762 of user nagios.
Feb 23 13:23:01 long-nagios-p001 systemd: Starting Session 110762 of user nagios.
Feb 23 13:23:37 long-nagios-p001 ndo2db: Trimming timedevents.
Feb 23 13:23:37 long-nagios-p001 ndo2db: Trimming systemcommands.
Feb 23 13:23:37 long-nagios-p001 ndo2db: Trimming servicechecks.
Feb 23 13:23:37 long-nagios-p001 ndo2db: Trimming hostchecks.
Feb 23 13:23:37 long-nagios-p001 ndo2db: Trimming eventhandlers.
Feb 23 13:23:49 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;Memory Usage;OK;HARD;5;Virtual memory: 66%used(15634MB/23762MB) (<80%) : OK
Feb 23 13:23:49 long-nagios-p001 nagios: SERVICE FLAPPING ALERT: long-pkgr-p001;Memory Usage;STARTED; Service appears to have started flapping (23.4% change >= 20.0% threshold)
Feb 23 13:23:49 long-nagios-p001 nagios: SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Memory Usage;FLAPPINGSTART (OK);xi_service_notification_handler;Virtual memory: 66%used(15634MB/23762MB) (<80%) : OK
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110765 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110765 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110766 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110766 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110768 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110768 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110767 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110767 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110769 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110769 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110770 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110770 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110771 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110771 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110773 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110773 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Started Session 110772 of user nagios.
Feb 23 13:24:01 long-nagios-p001 systemd: Starting Session 110772 of user nagios.
Feb 23 13:24:08 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;CPU Usage;OK;HARD;5;24 CPU, average load 20.8% < 80% : OK
Feb 23 13:24:08 long-nagios-p001 nagios: SERVICE FLAPPING ALERT: long-pkgr-p001;CPU Usage;STARTED; Service appears to have started flapping (23.4% change >= 20.0% threshold)
Feb 23 13:24:08 long-nagios-p001 nagios: SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;CPU Usage;FLAPPINGSTART (OK);xi_service_notification_handler;24 CPU, average load 20.8% < 80% : OK
Feb 23 13:24:22 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;OK;HARD;5;/auto/Sending: 20%used(928135MB/4576399MB) (<80%) : OK
Feb 23 13:24:22 long-nagios-p001 nagios: SERVICE FLAPPING ALERT: long-pkgr-p001;/auto/Sending Disk Usage;STARTED; Service appears to have started flapping (23.4% change >= 20.0% threshold)
Feb 23 13:24:22 long-nagios-p001 nagios: SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;/auto/Sending Disk Usage;FLAPPINGSTART (OK);xi_service_notification_handler;/auto/Sending: 20%used(928135MB/4576399MB) (<80%) : OK
Feb 23 13:24:38 long-nagios-p001 ndo2db: Trimming timedevents.
Feb 23 13:24:38 long-nagios-p001 ndo2db: Trimming systemcommands.
Feb 23 13:24:38 long-nagios-p001 ndo2db: Trimming servicechecks.
Feb 23 13:24:38 long-nagios-p001 ndo2db: Trimming hostchecks.
Feb 23 13:24:38 long-nagios-p001 ndo2db: Trimming eventhandlers.
Feb 23 13:24:50 long-nagios-p001 nagios: SERVICE ALERT: long-pkgr-p001;Swap Usage;OK;HARD;5;Swap space: 1%used(48MB/8008MB) (<80%) : OK
Feb 23 13:24:50 long-nagios-p001 nagios: SERVICE FLAPPING ALERT: long-pkgr-p001;Swap Usage;STARTED; Service appears to have started flapping (23.2% change >= 20.0% threshold)
Feb 23 13:24:50 long-nagios-p001 nagios: SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Swap Usage;FLAPPINGSTART (OK);xi_service_notification_handler;Swap space: 1%used(48MB/8008MB) (<80%) : OK
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110775 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110775 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110774 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110774 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110776 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110776 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Created slice user-0.slice.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting user-0.slice.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110777 of user root.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110777 of user root.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110779 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110779 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110783 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110783 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110780 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110780 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110782 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110782 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110781 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110781 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110784 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110784 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110785 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110785 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Started Session 110778 of user nagios.
Feb 23 13:25:01 long-nagios-p001 systemd: Starting Session 110778 of user nagios.
Feb 23 13:25:11 long-nagios-p001 systemd: Removed slice user-0.slice.
Feb 23 13:25:11 long-nagios-p001 systemd: Stopping user-0.slice.
Feb 23 13:25:39 long-nagios-p001 ndo2db: Trimming timedevents.
Feb 23 13:25:39 long-nagios-p001 ndo2db: Trimming systemcommands.
Feb 23 13:25:39 long-nagios-p001 ndo2db: Trimming servicechecks.
Feb 23 13:25:39 long-nagios-p001 ndo2db: Trimming hostchecks.
Feb 23 13:25:39 long-nagios-p001 ndo2db: Trimming eventhandlers.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110786 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110786 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110787 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110787 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110788 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110788 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110789 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110789 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110790 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110790 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110791 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110791 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110792 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110792 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110793 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110793 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Started Session 110794 of user nagios.
Feb 23 13:26:01 long-nagios-p001 systemd: Starting Session 110794 of user nagios.
==> /usr/local/nagios/var/nagios.log <==
[1487850752] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;1;PROCS WARNING: 401 processes with STATE = RSZDT
[1487850785] SERVICE ALERT: long-ele-p003;CPU Usage;OK;HARD;5;32 CPU, average load 78.8% < 80% : OK
[1487850785] SERVICE NOTIFICATION: aradmin;long-ele-p003;CPU Usage;OK;xi_service_notification_handler;32 CPU, average load 78.8% < 80% : OK
[1487850785] SERVICE NOTIFICATION: readmin;long-ele-p003;CPU Usage;OK;xi_service_notification_handler;32 CPU, average load 78.8% < 80% : OK
[1487850785] SERVICE NOTIFICATION: rpadmin;long-ele-p003;CPU Usage;OK;xi_service_notification_handler;32 CPU, average load 78.8% < 80% : OK
[1487850785] SERVICE NOTIFICATION: shadmin;long-ele-p003;CPU Usage;OK;xi_service_notification_handler;32 CPU, average load 78.8% < 80% : OK
[1487850785] SERVICE NOTIFICATION: TAlsop;long-ele-p003;CPU Usage;OK;xi_service_notification_handler;32 CPU, average load 78.8% < 80% : OK
[1487850810] SERVICE ALERT: long-nagios-p001;Total Processes;OK;SOFT;2;PROCS OK: 396 processes with STATE = RSZDT
[1487850961] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851021] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851053] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851078] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851091] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 30 seconds.
[1487851112] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851134] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851163] SERVICE ALERT: long-mam-p003;CPU Usage;OK;HARD;5;24 CPU, average load 77.0% < 80% : OK
[1487851171] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851192] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851228] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;1;WARNING - 42 slow queries in 299 seconds (0.14/sec)
[1487851228] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851285] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;2;WARNING - 8 slow queries in 57 seconds (0.14/sec)
[1487851287] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851287] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Swap Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851343] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;3;WARNING - 8 slow queries in 58 seconds (0.14/sec)
[1487851389] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 30 seconds.
[1487851402] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;OK;SOFT;4;OK - 5 slow queries in 59 seconds (0.08/sec)
[1487851406] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;1;PROCS WARNING: 403 processes with STATE = RSZDT
[1487851423] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851428] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;1;No answer from host
[1487851450] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851460] Auto-save of retention data completed successfully.
[1487851464] SERVICE ALERT: long-nagios-p001;Total Processes;OK;SOFT;2;PROCS OK: 400 processes with STATE = RSZDT
[1487851479] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851487] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;2;No answer from host
[1487851508] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851537] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851545] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;3;No answer from host
[1487851566] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851580] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851586] SERVICE ALERT: long-pkgr-p001;Memory Usage;OK;SOFT;4;Virtual memory: 66%used(15630MB/23762MB) (<80%) : OK
[1487851598] SERVICE ALERT: long-pkgr-p001;CPU Usage;OK;SOFT;4;24 CPU, average load 16.1% < 80% : OK
[1487851614] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;OK;SOFT;4;/auto/Sending: 20%used(928069MB/4576399MB) (<80%) : OK
[1487851615] SERVICE ALERT: long-misc-p001;Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.247".
[1487851673] SERVICE ALERT: long-misc-p001;Memory Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.247".
[1487851687] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;3;CHECK_NRPE: Socket timeout after 30 seconds.
[1487851719] SERVICE ALERT: long-misc-p001;Memory Usage;OK;SOFT;3;Memory buffers: 0%used(0MB/5808MB) (<80%) : OK
[1487851763] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;1;PROCS WARNING: 403 processes with STATE = RSZDT
[1487851821] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;2;PROCS WARNING: 410 processes with STATE = RSZDT
[1487851877] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;3;PROCS WARNING: 408 processes with STATE = RSZDT
[1487851879] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487851935] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;HARD;4;PROCS WARNING: 403 processes with STATE = RSZDT
[1487851985] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;4;CHECK_NRPE: Socket timeout after 30 seconds.
[1487852175] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852192] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852201] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;1;No answer from host
[1487852222] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852233] SERVICE ALERT: long-nagios-p001;Total Processes;OK;HARD;4;PROCS OK: 396 processes with STATE = RSZDT
[1487852249] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852259] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;2;No answer from host
[1487852279] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852283] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[1487852283] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Checks mounted shares;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1487852306] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852317] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;3;No answer from host
[1487852327] SERVICE ALERT: long-misc-p001;Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.247".
[1487852337] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852364] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852375] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;4;No answer from host
[1487852376] SERVICE ALERT: long-misc-p001;Memory Usage;OK;SOFT;2;Memory buffers: 0%used(0MB/5808MB) (<80%) : OK
[1487852396] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852422] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852422] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Memory Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852433] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;HARD;5;No answer from host
[1487852433] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;CPU Usage;UNKNOWN;xi_service_notification_handler;No answer from host
[1487852455] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852455] SERVICE NOTIFICATION: VictorOps_SE;long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;notify_victorops;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852455] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852473] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;HARD;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852473] SERVICE NOTIFICATION: VictorOps_SE;long-pkgr-p001;/ Disk Usage;CRITICAL;notify_victorops;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852473] SERVICE NOTIFICATION: jfadmin;long-pkgr-p001;/ Disk Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852473] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;/ Disk Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487852493] SERVICE FLAPPING ALERT: long-mam-p006;CPU Usage;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)
[1487852603] SERVICE ALERT: lonjira01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1487852651] SERVICE ALERT: lonjira01;HTTP;OK;SOFT;2;HTTP OK: HTTP/1.1 302 Moved Temporarily - 496 bytes in 0.002 second response time
[1487852676] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;OK;HARD;5;Virtual memory: 66%used(15638MB/23762MB) (<80%) : OK
[1487852851] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;OK;HARD;5;...............
[1487852851] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Checks mounted shares;OK;xi_service_notification_handler;...............
[1487853010] SERVICE ALERT: long-pkgr-p001;Memory Usage;OK;HARD;5;Virtual memory: 66%used(15635MB/23762MB) (<80%) : OK
[1487853010] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Memory Usage;OK;xi_service_notification_handler;Virtual memory: 66%used(15635MB/23762MB) (<80%) : OK
[1487853025] SERVICE ALERT: long-pkgr-p001;CPU Usage;OK;HARD;5;24 CPU, average load 15.5% < 80% : OK
[1487853025] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;CPU Usage;OK;xi_service_notification_handler;24 CPU, average load 15.5% < 80% : OK
[1487853041] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;OK;HARD;5;/auto/Sending: 20%used(928053MB/4576399MB) (<80%) : OK
[1487853041] SERVICE NOTIFICATION: VictorOps_SE;long-pkgr-p001;/auto/Sending Disk Usage;OK;notify_victorops;/auto/Sending: 20%used(928053MB/4576399MB) (<80%) : OK
[1487853041] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;/auto/Sending Disk Usage;OK;xi_service_notification_handler;/auto/Sending: 20%used(928053MB/4576399MB) (<80%) : OK
[1487853077] SERVICE ALERT: long-pkgr-p001;Swap Usage;OK;HARD;5;Swap space: 1%used(47MB/8008MB) (<80%) : OK
[1487853077] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Swap Usage;OK;xi_service_notification_handler;Swap space: 1%used(47MB/8008MB) (<80%) : OK
[1487853129] SERVICE ALERT: long-ele-p002;CPU Usage;WARNING;SOFT;1;32 CPU, average load 86.3% > 80% : WARNING
[1487853187] SERVICE ALERT: long-ele-p002;CPU Usage;WARNING;SOFT;2;32 CPU, average load 86.6% > 80% : WARNING
[1487853245] SERVICE ALERT: long-ele-p002;CPU Usage;WARNING;SOFT;3;32 CPU, average load 85.1% > 80% : WARNING
[1487853304] SERVICE ALERT: long-ele-p002;CPU Usage;WARNING;SOFT;4;32 CPU, average load 83.5% > 80% : WARNING
[1487853361] SERVICE ALERT: long-ele-p002;CPU Usage;WARNING;HARD;5;32 CPU, average load 82.8% > 80% : WARNING
[1487853361] SERVICE FLAPPING ALERT: long-ele-p002;CPU Usage;STARTED; Service appears to have started flapping (20.1% change >= 20.0% threshold)
[1487853361] SERVICE NOTIFICATION: aradmin;long-ele-p002;CPU Usage;FLAPPINGSTART (WARNING);xi_service_notification_handler;32 CPU, average load 82.8% > 80% : WARNING
[1487853361] SERVICE NOTIFICATION: readmin;long-ele-p002;CPU Usage;FLAPPINGSTART (WARNING);xi_service_notification_handler;32 CPU, average load 82.8% > 80% : WARNING
[1487853361] SERVICE NOTIFICATION: rpadmin;long-ele-p002;CPU Usage;FLAPPINGSTART (WARNING);xi_service_notification_handler;32 CPU, average load 82.8% > 80% : WARNING
[1487853361] SERVICE NOTIFICATION: shadmin;long-ele-p002;CPU Usage;FLAPPINGSTART (WARNING);xi_service_notification_handler;32 CPU, average load 82.8% > 80% : WARNING
[1487853361] SERVICE NOTIFICATION: TAlsop;long-ele-p002;CPU Usage;FLAPPINGSTART (WARNING);xi_service_notification_handler;32 CPU, average load 82.8% > 80% : WARNING
[1487853362] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;OK;HARD;4;/: 43%used(21524MB/50268MB) (<90%) : OK
[1487853387] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;1;12 CPU, average load 82.9% > 80% : WARNING
[1487853425] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;1;PROCS WARNING: 403 processes with STATE = RSZDT
[1487853482] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;2;PROCS WARNING: 406 processes with STATE = RSZDT
[1487853505] SERVICE ALERT: long-mam-p006;CPU Usage;OK;SOFT;2;12 CPU, average load 64.0% < 80% : OK
[1487853517] SERVICE ALERT: lonbms01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1487853540] SERVICE ALERT: long-nagios-p001;Total Processes;OK;SOFT;3;PROCS OK: 399 processes with STATE = RSZDT
[1487853564] SERVICE ALERT: lonbms01;HTTP;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 955 bytes in 0.001 second response time
[1487853658] SERVICE ALERT: long-ele-p002;CPU Usage;OK;HARD;5;32 CPU, average load 70.5% < 80% : OK
[1487854082] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;1;WARNING - 40 slow queries in 297 seconds (0.13/sec)
[1487854101] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;1;12 CPU, average load 86.0% > 80% : WARNING
[1487854140] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;2;WARNING - 12 slow queries in 58 seconds (0.21/sec)
[1487854197] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;3;WARNING - 9 slow queries in 57 seconds (0.16/sec)
[1487854220] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;2;12 CPU, average load 86.2% > 80% : WARNING
[1487854255] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;SOFT;4;WARNING - 10 slow queries in 58 seconds (0.17/sec)
[1487854313] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;WARNING;HARD;5;WARNING - 6 slow queries in 58 seconds (0.10/sec)
[1487854313] SERVICE NOTIFICATION: nagiosadmin;long-mysql-p001;MySQL Slow Queries;WARNING;xi_service_notification_handler;WARNING - 6 slow queries in 58 seconds (0.10/sec)
[1487854339] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;3;12 CPU, average load 87.4% > 80% : WARNING
[1487854458] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;4;12 CPU, average load 84.8% > 80% : WARNING
[1487854470] SERVICE ALERT: lonbms01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1487854520] SERVICE ALERT: prodapp01;VZ check import for vm;WARNING;SOFT;1;NRPE: Unable to read output
[1487854529] SERVICE ALERT: lonbms01;HTTP;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
[1487854576] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;5;12 CPU, average load 85.3% > 80% : WARNING
[1487854577] SERVICE ALERT: lonbms01;HTTP;OK;SOFT;3;HTTP OK: HTTP/1.1 200 OK - 955 bytes in 0.001 second response time
[1487854694] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;6;12 CPU, average load 86.7% > 80% : WARNING
[1487854812] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;7;12 CPU, average load 86.6% > 80% : WARNING
[1487854817] SERVICE ALERT: prodapp01;VZ check import for vm;OK;SOFT;2;vm: No XMLs waiting
[1487854923] SERVICE ALERT: long-ebkr-p002;Linux Memory Usage;OK;HARD;5;Virtual memory: 66%used(2171MB/3282MB) (<80%) : OK
[1487854930] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;8;12 CPU, average load 87.2% > 80% : WARNING
[1487855048] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;SOFT;9;12 CPU, average load 86.4% > 80% : WARNING
[1487855061] Auto-save of retention data completed successfully.
[1487855064] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855122] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855166] SERVICE ALERT: long-mam-p006;CPU Usage;WARNING;HARD;10;12 CPU, average load 85.8% > 80% : WARNING
[1487855167] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855180] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855207] SERVICE ALERT: long-mysql-p001;MySQL Slow Queries;OK;HARD;5;OK - 13 slow queries in 298 seconds (0.04/sec)
[1487855207] SERVICE NOTIFICATION: nagiosadmin;long-mysql-p001;MySQL Slow Queries;OK;xi_service_notification_handler;OK - 13 slow queries in 298 seconds (0.04/sec)
[1487855238] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855271] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;1;CHECK_NRPE: Socket timeout after 30 seconds.
[1487855295] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855411] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855422] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;1;No answer from host
[1487855442] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855465] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855469] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855471] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;1;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855480] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;2;No answer from host
[1487855499] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855527] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855528] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;2;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855539] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;3;No answer from host
[1487855557] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855570] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 30 seconds.
[1487855585] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855585] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855597] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;SOFT;4;No answer from host
[1487855615] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855642] SERVICE ALERT: long-pkgr-p001;Memory Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855642] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Memory Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855643] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;SOFT;4;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855656] SERVICE ALERT: long-pkgr-p001;CPU Usage;UNKNOWN;HARD;5;No answer from host
[1487855656] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;CPU Usage;UNKNOWN;xi_service_notification_handler;No answer from host
[1487855674] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855674] SERVICE NOTIFICATION: VictorOps_SE;long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;notify_victorops;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855674] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;/auto/Sending Disk Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855701] SERVICE ALERT: long-pkgr-p001;Swap Usage;CRITICAL;HARD;5;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855701] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Swap Usage;CRITICAL;xi_service_notification_handler;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855763] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;CRITICAL;SOFT;3;ERROR: Description/Type table : No response from remote host "10.9.10.10".
[1487855780] SERVICE ALERT: lonbms01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
[1487855828] SERVICE ALERT: lonbms01;HTTP;OK;SOFT;2;HTTP OK: HTTP/1.1 200 OK - 955 bytes in 0.001 second response time
[1487855868] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;CRITICAL;SOFT;3;CHECK_NRPE: Socket timeout after 30 seconds.
[1487855924] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;1;PROCS WARNING: 407 processes with STATE = RSZDT
[1487855982] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;2;PROCS WARNING: 407 processes with STATE = RSZDT
[1487856041] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;SOFT;3;PROCS WARNING: 403 processes with STATE = RSZDT
[1487856052] SERVICE ALERT: long-pkgr-p001;/ Disk Usage;OK;SOFT;4;/: 43%used(21496MB/50268MB) (<90%) : OK
[1487856061] SERVICE ALERT: long-mam-p006;CPU Usage;OK;HARD;10;12 CPU, average load 63.8% < 80% : OK
[1487856098] SERVICE ALERT: long-nagios-p001;Total Processes;WARNING;HARD;4;PROCS WARNING: 404 processes with STATE = RSZDT
[1487856136] SERVICE ALERT: long-pkgr-p001;Checks mounted shares;OK;SOFT;4;...............
[1487856179] SERVICE ALERT: long-pkgr-p001;Linux Memory Usage;OK;HARD;5;Virtual memory: 66%used(15625MB/23762MB) (<80%) : OK
[1487856179] SERVICE FLAPPING ALERT: long-pkgr-p001;Linux Memory Usage;STARTED; Service appears to have started flapping (21.1% change >= 20.0% threshold)
[1487856229] SERVICE ALERT: long-pkgr-p001;Memory Usage;OK;HARD;5;Virtual memory: 66%used(15634MB/23762MB) (<80%) : OK
[1487856229] SERVICE FLAPPING ALERT: long-pkgr-p001;Memory Usage;STARTED; Service appears to have started flapping (23.4% change >= 20.0% threshold)
[1487856229] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Memory Usage;FLAPPINGSTART (OK);xi_service_notification_handler;Virtual memory: 66%used(15634MB/23762MB) (<80%) : OK
[1487856248] SERVICE ALERT: long-pkgr-p001;CPU Usage;OK;HARD;5;24 CPU, average load 20.8% < 80% : OK
[1487856248] SERVICE FLAPPING ALERT: long-pkgr-p001;CPU Usage;STARTED; Service appears to have started flapping (23.4% change >= 20.0% threshold)
[1487856248] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;CPU Usage;FLAPPINGSTART (OK);xi_service_notification_handler;24 CPU, average load 20.8% < 80% : OK
[1487856262] SERVICE ALERT: long-pkgr-p001;/auto/Sending Disk Usage;OK;HARD;5;/auto/Sending: 20%used(928135MB/4576399MB) (<80%) : OK
[1487856262] SERVICE FLAPPING ALERT: long-pkgr-p001;/auto/Sending Disk Usage;STARTED; Service appears to have started flapping (23.4% change >= 20.0% threshold)
[1487856262] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;/auto/Sending Disk Usage;FLAPPINGSTART (OK);xi_service_notification_handler;/auto/Sending: 20%used(928135MB/4576399MB) (<80%) : OK
[1487856290] SERVICE ALERT: long-pkgr-p001;Swap Usage;OK;HARD;5;Swap space: 1%used(48MB/8008MB) (<80%) : OK
[1487856290] SERVICE FLAPPING ALERT: long-pkgr-p001;Swap Usage;STARTED; Service appears to have started flapping (23.2% change >= 20.0% threshold)
[1487856290] SERVICE NOTIFICATION: mmadmin;long-pkgr-p001;Swap Usage;FLAPPINGSTART (OK);xi_service_notification_handler;Swap space: 1%used(48MB/8008MB) (<80%) : OK
[/quote]
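The log timestamps above are Unix epoch seconds, which makes it hard to line the failures up with other events. A minimal sketch for making them readable, assuming GNU `date` (for `-d @SECONDS`); the function name `humanize_log` is made up for this example:

```shell
# Rewrite "[epoch] message" log lines as "[YYYY-MM-DD HH:MM:SS UTC] message".
# Assumption: GNU date is available (the -d "@SECONDS" form is a GNU extension).
humanize_log() {
  local line ts rest
  while IFS= read -r line; do
    case "$line" in
      "["[0-9]*"] "*)
        ts="${line#\[}"       # drop the leading "["
        ts="${ts%%\]*}"       # keep only the digits before the first "]"
        rest="${line#*\] }"   # everything after the first "] "
        printf '[%s UTC] %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
        ;;
      *)
        # Lines without an epoch prefix are passed through unchanged.
        printf '%s\n' "$line"
        ;;
    esac
  done
}
```

It reads the log on standard input, for example `humanize_log < nagios.log`.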
ulimit -a (for nagios user)
[quote]
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 126795
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[/quote]
Profile attached.
Re: SNMP checks fails against Linux servers
Posted: Thu Feb 23, 2017 5:31 pm
by ssax
I'm not seeing anything that stands out. Do you guys have a host or network IPS device that could be interfering with the connection? (I've seen this before.)
Does this happen for ALL of the Linux hosts that use this check? Do all the services fail for the host, or just this one specific check? Can you set up another SNMP service that queries something else, to determine whether it's an SNMP issue or related to the specific check/OID that the plugin is calling?
Do you have a ping check or a host check that is successful while the issue is occurring? I want to rule out network connectivity.
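One way to try the "query something else" suggestion is to probe a single scalar OID instead of the table the failing plugin walks. The sketch below is only an illustration: `probe_once` is a hypothetical helper, the 5-second timeout and zero retries are arbitrary choices, and the OID is the standard sysUpTime.0. It assumes `snmpget` from net-snmp-utils is installed on the Nagios server.

```shell
# sysUpTime.0 from the standard SNMPv2-MIB: a single scalar value, so a failure
# here cannot be blamed on the table OIDs that the disk/memory checks walk.
SYSUPTIME_OID="1.3.6.1.2.1.1.3.0"

# Run one SNMP v2c GET and print "<epoch> <host> OK" or "<epoch> <host> FAIL".
# $1 = target host, $2 = community string (both supplied by the caller).
probe_once() {
  local host="$1" community="$2"
  if snmpget -v 2c -c "$community" -t 5 -r 0 "$host" "$SYSUPTIME_OID" >/dev/null 2>&1; then
    echo "$(date +%s) $host OK"
  else
    echo "$(date +%s) $host FAIL"
  fi
}
```

Calling `probe_once 10.9.10.10 public` in a loop as the nagios user while the checks are failing would show whether a plain GET also times out; the community string `public` here is just the placeholder used earlier in this thread.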
Re: SNMP checks fails against Linux servers
Posted: Fri Feb 24, 2017 4:56 am
by vuservicedesk
We have no network IPS device.
All SNMP checks are affected.
Ping works during this time.
snmpwalk from the Nagios box to those Linux servers works during this time.
Not every Linux box is affected. The affected boxes stop responding to Nagios SNMP checks at random.
Re: SNMP checks fails against Linux servers
Posted: Fri Feb 24, 2017 2:54 pm
by ssax
Hmm, please pull up the XI server's (localhost) services for CPU and memory and show the perfdata graph over a time period that contains those errors. Do you see anything that stands out?
I can't really think of any reason why net-snmp would fail for all of them (unless it's related to load, open files, limits, etc.). Please send the output of these commands:
Code:
rpm -qa | grep net-snmp
ps aux | grep nagios | grep -v grep | wc -l
su - nagios
ulimit -a
Thank you
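The point of those commands is to compare what the nagios user is actually using against its limits. A rough sketch of that comparison follows; the helper names are invented for this example, and `ps -u nagios` counts only processes owned by the nagios user, which is narrower than the `grep nagios` above.

```shell
# Print usage against a limit, e.g. "75/4096 (1%)".
# $1 = current count, $2 = limit as reported by ulimit (a number or "unlimited").
limit_usage() {
  local used="$1" limit="$2"
  if [ "$limit" = "unlimited" ]; then
    echo "$used/unlimited"
  else
    # Integer percentage, rounded down.
    echo "$used/$limit ($((used * 100 / limit))%)"
  fi
}

# Compare the nagios user's process count with its "max user processes" limit.
# Run this as the nagios user (su - nagios) so that ulimit -u reflects that user.
nagios_proc_usage() {
  limit_usage "$(ps -u nagios -o pid= | wc -l | tr -d ' ')" "$(ulimit -u)"
}
```

With the figures reported in this thread (75 processes, a limit of 4096), `limit_usage 75 4096` prints `75/4096 (1%)`, which would suggest the process limit is not the bottleneck.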
Re: SNMP checks fails against Linux servers
Posted: Mon Feb 27, 2017 10:49 am
by vuservicedesk
Code:
rpm -qa | grep net-snmp
net-snmp-agent-libs-5.7.2-24.el7_2.1.x86_64
net-snmp-libs-5.7.2-24.el7_2.1.x86_64
net-snmp-devel-5.7.2-24.el7_2.1.x86_64
net-snmp-utils-5.7.2-24.el7_2.1.x86_64
net-snmp-5.7.2-24.el7_2.1.x86_64
net-snmp-perl-5.7.2-24.el7_2.1.x86_64
Code:
ps aux | grep nagios | grep -v grep | wc -l
75
The ulimit output for the nagios user is in my previous post.
As I said in my first post, the Nagios server is underutilized.
Load is rarely above 1 (on a 10-core CPU).
The server uses only a fraction of its memory: about 2 GB out of 32 GB.
Re: SNMP checks fails against Linux servers
Posted: Mon Feb 27, 2017 2:14 pm
by avandemore
You have multiple checks failing at around the same times, and some of them aren't SNMP. They are all reporting network issues.
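To see that pattern in the log posted earlier, the timeout-style alerts can be counted per host and service. This is a minimal sketch, not an official Nagios tool: `summarize_timeouts` is a made-up name, and the three message patterns are simply the ones that appear in the excerpt (SNMP "No response from remote host", "No answer from host", and the NRPE/HTTP "Socket timeout").

```shell
# Read a Nagios log on stdin and print "<count> <host>;<service>" for every
# SERVICE ALERT line whose plugin output looks like a timeout.
summarize_timeouts() {
  awk -F';' '
    /SERVICE ALERT:/ && /No response from remote host|No answer from host|Socket timeout/ {
      # Field 1 looks like "[epoch] SERVICE ALERT: hostname"; keep the hostname.
      host = $1
      sub(/^.*SERVICE ALERT: /, "", host)
      # Field 2 is the service description.
      count[host ";" $2]++
    }
    END {
      for (key in count) print count[key], key
    }
  ' | LC_ALL=C sort -k2
}
```

Run it as `summarize_timeouts < /usr/local/nagios/var/nagios.log`; that path is the usual default and is an assumption here. In the excerpt above it would list the SNMP services on long-pkgr-p001 alongside the NRPE "Checks mounted shares" check and the HTTP checks on lonjira01 and lonbms01.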
What is the output of: