Hello,
Several times per day I get an unknown state from all my check_snmp_process checks. This started after I enabled the memory and CPU checks.
When it returns an unknown state the service returns this: 11 process matching apache2 (> 0), Mem : 8.2Mb OK, No data for CPU (1 line(s)):UNKNOWN.
I checked the tmp file and it seems to have been wiped.
root@xena:/tmp# cat tmp_Nagios_proc.192.168.100.13.apache2
1329121305:2309
I can't find anywhere in check_snmp_process.pl where it deletes the file. My guess is that it somehow fails to read the old values, so when it writes a new tmp file it only has the current value to write, which would explain why the file contains only one line.
After 5 new checks it returns to an OK state.
Has anyone experienced this and found a solution?
Regards
Magnus
check_snmp_process unknown state
-
[email protected]
- Posts: 6
- Joined: Wed Feb 01, 2012 3:35 pm
Re: check_snmp_process unknown state
I have analyzed this further and found this code in check_snmp_process.pl:
$found_value= ($res_cpu-$file_values[$j][1]) / ($timenow - $file_values[$j][0] );
if ($found_value <0) { # in case of program restart
$j=0;$found_value=undef; # don't look for more values
$n_rows=0; # reset file
}
This clears the history file, which in my opinion is bad, at least when it makes the service turn into an unknown state.
My services haven't been restarted when they turn into an unknown state. However, I think that when, for example, a user disconnects from a samba share, you get one smb process less. Since that process's CPU counter (processor time in centiseconds) disappears from the total, found_value can become negative even though nothing has restarted.
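A small worked example of that effect, written in Python rather than Perl and not taken from check_snmp_process.pl. It assumes the plugin stores the sum of the CPU counters of all matching processes; the per-process numbers and the 60-second interval are made up for illustration, and only the formula mirrors the quoted line.

```python
def cpu_rate(old_total, new_total, old_time, new_time):
    """Counter delta divided by elapsed seconds, as in the quoted Perl line."""
    return (new_total - old_total) / (new_time - old_time)

# Hypothetical per-process CPU counters in centiseconds, keyed by process index.
before = {101: 900, 102: 800, 103: 609}  # three smb processes
after = {101: 905, 102: 803}             # process 103 exited, nothing restarted

rate = cpu_rate(sum(before.values()), sum(after.values()),
                1329121305, 1329121365)

# The summed counter dropped because one process left, so the rate is negative
# and the "program restart" branch would reset the history file.
print(rate < 0)
```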
The correct way to handle this, I think, is to record which process index (the SNMP table index of each process) has which CPU value and match them on the next check. If an old value exists for an index and that index isn't running anymore, discard that process's CPU value from the old result before comparing it to the new result. This would misreport the CPU utilization when a process that has been running for some time since the last check disappears, but it would still be better than the current situation.
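A minimal sketch of that idea, in Python rather than Perl and not the script's actual code. The function names and the dictionary layout are hypothetical. Two details go beyond the description above and are assumptions: a process seen for the first time is counted from zero, and a counter lower than its old value is treated as a reused index.

```python
def cpu_delta_by_index(old, new):
    """Sum CPU counter increases, matching processes by index.

    old, new: dicts mapping process index -> CPU counter (centiseconds).
    Indexes missing from `new` (exited processes) are simply ignored, so
    their old counters are discarded instead of making the delta negative.
    """
    delta = 0
    for index, value in new.items():
        previous = old.get(index, 0)  # assumption: new processes count from zero
        if value >= previous:
            delta += value - previous
        else:
            # Assumption: a lower counter means the index was reused.
            delta += value
    return delta


def cpu_percent(old, new, old_time, new_time):
    """Centiseconds used per elapsed second, i.e. percent of one CPU."""
    return cpu_delta_by_index(old, new) / (new_time - old_time)


before = {101: 900, 102: 800, 103: 609}
after = {101: 905, 102: 803}  # process 103 has exited
print(cpu_delta_by_index(before, after))
```

As noted above, the CPU time that process 103 used between the two checks is lost, so the result is an underestimate, but it never goes negative.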
Since I don't like Perl very much, I don't think I will alter check_snmp_process.pl. If I have the energy I might rewrite it in Ruby.
For now I have just commented out the following to avoid holes in the graphs and a lot of unnecessary unknown alarms.
#if ($found_value <0) { # in case of program restart
#$j=0;$found_value=undef; # don't look for more values
#$n_rows=0; # reset file
#}
Regards
Magnus
Re: check_snmp_process unknown state
I'm glad you found the root cause of your problem. I would suggest reporting it to the script author, though if you make a fixed version, be sure to share it on the Nagios Exchange!
-
[email protected]
- Posts: 6
- Joined: Wed Feb 01, 2012 3:35 pm
Re: check_snmp_process unknown state
Of course I would share it. I will talk to the person who made the script.