/check_uptime.pl show incorrect uptime.
-
- Posts: 28
- Joined: Sun May 31, 2015 3:02 pm
Hello everyone
We have recently come across an issue where Nagios reports the wrong value for check_uptime.pl. When this happens, it looks as if a device has gone down (restarted), but when we check the device itself (mainly Cisco), its uptime is more than a year. I can't find any relevant article that could help us resolve the issue.
You can see that when we run the command on Nagios, it returns the wrong value compared to what we see when checking directly on the end device.
Any advice to fix the issue?
Thank you.
MK
Nagios CLI:
[root@demo libexec]# ./check_uptime.pl -c tiger -l 1 -u 10000 172.16.16.1
OK: UP for 3 days | uptime=3;
Device:
172.16.16.1 uptime is 1 year, 19 weeks, 2 days, 19 hours, 34 minutes
Uptime for this control processor is 1 year, 19 weeks, 2 days, 19 hours, 37 minutes
System returned to ROM by reload
System restarted at 17:59:38 GMT Fri Mar 14 2014
System image file is "flash:packages.conf"
Last reload reason: Reload command
define command{
command_name check_snmp_uptime
command_line $USER1$/check_uptime.pl -c $ARG1$ -l $ARG2$ -u $ARG3$ $HOSTADDRESS$
}
define service {
use active-msc,graphed-service
notification_period 24x7
notification_options w,u,c
service_description sys-uptime
check_command check_snmp_uptime3!tiger!1!10000
hostgroup Voice-Servers-linux
notification_interval 0
}
Re: /check_uptime.pl show incorrect uptime.
It's possible that your Cisco device is improperly reporting its uptime. You can verify this by manually running snmpget against the appropriate OID.
Replace -v2c with the SNMP version you're running, public with your community string, and 192.168.1.1 with the IP address of the Cisco device in question.
snmpget -v2c -c public -mALL 192.168.1.1 1.3.6.1.6.3.10.2.1.3.0
Re: /check_uptime.pl show incorrect uptime.
Hi
OK, so it returns a result similar to what Nagios reported.
[root@ukdcnetmon01 libexec]# snmpget -v2c -c tiger -mALL 172.16.16.1 1.3.6.1.2.1.1.3.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (39903247) 4 days, 14:50:32.47
It seems to come down to this issue/limitation:
sysUpTime is a 32-bit counter and will roll over after 496 days.
https://supportforums.cisco.com/discuss ... ime#573246
Users there suggest using 1.3.6.1.6.3.10.2.1.3.0 instead, which does not roll over, but how can we make Nagios check this OID and not the one above? Using the new OID, I get this:
SNMP-FRAMEWORK-MIB::snmpEngineTime.0 = INTEGER: 42251161 seconds
To quote from the link above:
sysUpTime is a 32-bit counter and will roll over after 496 days.
But you can poll snmpEngineTime (.1.3.6.1.6.3.10.2.1.3) which returns the uptime in seconds and should not roll over for 135 years...
Thanks.
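The rollover figure quoted above can be checked with some quick arithmetic. This is only a sketch, assuming sysUpTime is an unsigned 32-bit TimeTicks value counted in hundredths of a second and that the counter has wrapped exactly once on this device; the function names are made up for illustration.

```python
# Sketch only: assumes sysUpTime is an unsigned 32-bit TimeTicks value
# (hundredths of a second). Function names are hypothetical.
TICKS_PER_SECOND = 100
WRAP_TICKS = 2 ** 32
SECONDS_PER_DAY = 86400


def rollover_days():
    """Days until a 32-bit TimeTicks counter wraps back to zero."""
    return WRAP_TICKS / TICKS_PER_SECOND / SECONDS_PER_DAY


def unwrapped_days(ticks, wraps=1):
    """Uptime in days if the counter has wrapped `wraps` times (an assumption)."""
    return (ticks + wraps * WRAP_TICKS) / TICKS_PER_SECOND / SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(rollover_days(), 1))           # wrap period in days
    print(round(unwrapped_days(39903247), 1))  # reading from the snmpget above
```

The wrap period works out to about 497 days, close to the 496 quoted. Adding one wrap to the 39903247 ticks returned by snmpget gives roughly 501.7 days, which is broadly in line with the 1 year, 19 weeks shown on the device itself.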
Re: /check_uptime.pl show incorrect uptime.
mkhan12282,
Try using the attached version of the script (updated for the new OID you suggested) and let me know the results of your testing.
Thanks!
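For readers without the attachment, here is a rough idea of what a check built on the new OID might do with the snmpget output. This is not the attached script, just a minimal sketch: parse_engine_time and format_status are hypothetical helpers, the input is the snmpEngineTime line posted earlier in the thread, and the status line simply copies the format of the original plugin output.

```python
import re

SECONDS_PER_DAY = 86400


def parse_engine_time(line):
    """Extract the seconds value from an snmpget line such as
    'SNMP-FRAMEWORK-MIB::snmpEngineTime.0 = INTEGER: 42251161 seconds'."""
    match = re.search(r"INTEGER:\s*(\d+)", line)
    if match is None:
        raise ValueError("no INTEGER value found in: %r" % line)
    return int(match.group(1))


def format_status(seconds):
    """Build a status line in the same shape as the original plugin output."""
    days = seconds // SECONDS_PER_DAY
    return "OK: UP for %d days | uptime=%d;" % (days, days)


if __name__ == "__main__":
    sample = "SNMP-FRAMEWORK-MIB::snmpEngineTime.0 = INTEGER: 42251161 seconds"
    print(format_status(parse_engine_time(sample)))
```

No thresholds are applied here, since the thread does not say how the plugin interprets its -l and -u arguments.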
- Attachments
- check_uptime.pl (26.02 KiB)
Re: /check_uptime.pl show incorrect uptime.
Hi
Thanks for the file.
Please see below. It now fetches the real uptime, but the "Duration" column still does not show the real value. Can we update this too?
Regards
MK
- Attachments
- nagios1.PNG (4.54 KiB)
Re: /check_uptime.pl show incorrect uptime.
Since you installed the new plugin, has the service status of that check changed? It's likely that the duration is displaying properly. Duration is not tied to the uptime check specifically: it is how long it has been since the service state last changed. Since you recently replaced your check_uptime plugin, I assume your duration was reset by state changes for that service. Am I missing something here?
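To make the Duration point concrete: the column is derived from the time of the last state change, not from anything the plugin returns. A minimal sketch, assuming you have the service's last state change as a Unix timestamp (Nagios records one as last_state_change in its status data); format_duration is a hypothetical helper and the "Nd Nh Nm Ns" layout only approximates what the web interface shows.

```python
def format_duration(last_state_change, now):
    """Time since the last state change, formatted as 'Nd Nh Nm Ns'.

    Both arguments are Unix timestamps in seconds. The uptime value
    returned by the plugin plays no part in this calculation.
    """
    elapsed = max(0, int(now - last_state_change))
    days, rest = divmod(elapsed, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return "%dd %dh %dm %ds" % (days, hours, minutes, seconds)


if __name__ == "__main__":
    # A service whose state changed 7503 seconds ago.
    print(format_duration(1000, 1000 + 7503))
```

So a device can be up for over a year while Duration reads a couple of hours, if the service changed state when the plugin was swapped.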
Re: /check_uptime.pl show incorrect uptime.
You are right.
I believe it has fixed the issue, so thank you.
I will go ahead and replace the plugin file with yours on the rest of our Nagios servers.
Can you please leave this ticket open until the end of this week and close it if you don't hear from me?
Regards
MK
Re: /check_uptime.pl show incorrect uptime.
mkhan12282 wrote: Can you please leave this ticket open for end of this week and close if you don't hear from me?
Sure can!
Former Nagios employee
Re: /check_uptime.pl show incorrect uptime.
Hello team
We noticed that after updating check_uptime.pl (the version provided earlier), Nagios shows the uptime of the Nagios host itself. That is, if we look up a device in Nagios, the system uptime shown is that of the Nagios host, not of the device we looked up.
However, if we run snmpget with the old OID, we get the correct uptime.
Examples:
This shows the actual device uptime.
[root@PR-VMNAGIOS01 libexec]# snmpget -v1 -c password -mALL 10.x.x.x 1.3.6.1.2.1.1.3.0
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (77175710) 8 days, 22:22:37.10
This shows the Nagios host's uptime instead of the actual device's.
[root@PR-VMNAGIOS01 libexec]# ./check_uptime.pl -c password -l 1 -u 10000 10.x.x.x
Unknown option: u
OK: Linux PR-VMNAGIOS01.xxxxx.internal 2.6.32-431.5.1.el6.x86_64 - up 2 hours 5 minutes
Can you please assist us?
Thank you.
MK
Re: /check_uptime.pl show incorrect uptime.
Try running that plugin manually as the nagios user with the -v flag for verbose output, and post the results. Also, what is the -u option doing here? I did not see it in the plugin help output.
Former Nagios employee