check_snmp_process_wizard.pl lag?
I have a handful of Linux servers that I'm using this check (v2c) on.
I am checking two processes on these servers.
Of all of the service checks I have running across our system, these are the only two that seem to have a lag.
They will constantly show up as being down, but for no more than about 30 seconds. When I check, they are running on the server in question just fine.
I'm wondering if this is a known issue and if so, can I run a different, better optimized check instead?
Re: check_snmp_process_wizard.pl lag?
Are the servers in question under heavy load? You may have to increase the timeout on the check to accommodate a server with strict preemption under heavy load.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: check_snmp_process_wizard.pl lag?
abrist wrote: Are the servers in question under heavy load? You may have to increase the timeout on the check to accommodate a server with strict preemption under heavy load.
Not really. Here's an example on one of the boxes that just alerted then went away:
Code: Select all
# uptime
10:47:34 up 176 days, 22:05, 1 user, load average: 0.73, 0.88, 0.86
Code: Select all
# uptime
10:52:22 up 31 days, 9:42, 1 user, load average: 0.68, 0.83, 0.85
Re: check_snmp_process_wizard.pl lag?
What services are you checking?
Former Nagios employee
Re: check_snmp_process_wizard.pl lag?
It's a process that is specific to our application. It's not a standard Linux process. Basically, an image capture and transfer process.
Re: check_snmp_process_wizard.pl lag?
Try passing a longer timeout than default (try 30 seconds or so):
Code: Select all
-t, --timeout=INTEGER
Seconds before connection times out (default: 10)
Former Nagios employee
Re: check_snmp_process_wizard.pl lag?
I have upped this to 60 seconds but I'm still getting the alerts.
Re: check_snmp_process_wizard.pl lag?
Can we see the config file for one of the checks? Go to the CCM and click the "disk" image next to one of these service checks. Post the file in code wraps.
Former Nagios employee
Re: check_snmp_process_wizard.pl lag?
Hoping I've copied everything you would need.
Code: Select all
define service {
host_name {removed for brevity}
service_description Video Capture
use xiwizard_linuxsnmp_process
hostgroup_name All Controllers - Ramps,All Controllers
display_name Video Capture
servicegroups Techs
check_command check_xi_service_snmp_linux_process!-C roadway --v2c -n 'ves_cap_trx' -w0,2 -t 60!!!!!!!
register 1
}
Code: Select all
define service {
name xiwizard_linuxsnmp_process
service_description xiwizard_linuxsnmp_process
display_name Linux SNMP Process Check
use xiwizard_generic_service
check_command check_xi_service_snmp_linux_process!!!!!!!!
register 0
}
Code: Select all
define service {
name xiwizard_generic_service
service_description xiwizard_generic_service
display_name Generic Service Check
check_command check_xi_service_none
is_volatile 0
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
check_period xi_timeperiod_24x7
parallelize_check 1
obsess_over_service 1
check_freshness 0
freshness_threshold 1800
event_handler host-notify-by-email
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_interval 60
first_notification_delay 60
notification_period xi_timeperiod_24x7
notifications_enabled 1
contacts nagiosadmin
contact_groups admins, Techs
failure_prediction_enabled 1
register 0
}
Code: Select all
define command {
command_name check_xi_service_snmp_linux_process
command_line $USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ $ARG1$
}
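For anyone who wants to time the check by hand, here is a rough sketch of how the service's check_command and the command definition above combine into an actual command line. Nagios splits check_command on "!" and substitutes the first argument for $ARG1$. The plugin directory and the host address below are assumptions (a typical $USER1$ value and a documentation-only IP), not values from this thread.

```python
import shlex

# Values copied from the service and command definitions posted above.
CHECK_COMMAND = ("check_xi_service_snmp_linux_process!"
                 "-C roadway --v2c -n 'ves_cap_trx' -w0,2 -t 60!!!!!!!")
COMMAND_LINE = "$USER1$/check_snmp_process_wizard.pl -H $HOSTADDRESS$ $ARG1$"


def expand_check_command(check_command, command_line, host_address,
                         plugin_dir="/usr/local/nagios/libexec"):
    """Return the argv list the check would run with.

    plugin_dir is an ASSUMED value for $USER1$; check resource.cfg
    on your own system for the real one.
    """
    # "name!arg1!arg2..." -> command name plus positional arguments.
    _name, *args = check_command.split("!")
    line = (command_line
            .replace("$USER1$", plugin_dir)
            .replace("$HOSTADDRESS$", host_address)
            .replace("$ARG1$", args[0]))
    # shlex.split honours the single quotes around the process name.
    return shlex.split(line)


# 192.0.2.10 is a placeholder address, not one of the real hosts.
argv = expand_check_command(CHECK_COMMAND, COMMAND_LINE, "192.0.2.10")
print(" ".join(shlex.quote(a) for a in argv))
```

Running the printed command as the nagios user from the monitoring server, ideally prefixed with `time`, shows whether the plugin itself is slow or whether it fails intermittently.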
scottwilkerson, DevOps Engineer, Nagios Enterprises
Re: check_snmp_process_wizard.pl lag?
jbennett wrote: I'm wondering if this is a known issue and if so, can I run a different, better optimized check instead?
Everything looks correct, but because this is an SNMP check it uses UDP, which is connectionless and does not retransmit lost packets, so packets can get dropped. That is likely what is happening.
With the config you posted, though, it shouldn't be sending notifications if the process is only down for 30 seconds; it should make 5 check attempts, retrying at 1-minute intervals, before sending a notification.
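To illustrate the retry behaviour described here, below is a simplified sketch of how max_check_attempts 5 turns a run of failed checks into a hard problem state. It is a toy model, not the Nagios scheduler: the function name is hypothetical, and it only counts consecutive failures and reports when the count reaches the limit.

```python
def first_hard_failure(results, max_check_attempts=5):
    """Return the index of the check that makes the problem HARD, or None.

    results is a sequence of booleans (True = check returned OK).
    Simplified model: a problem goes HARD once max_check_attempts
    consecutive checks have failed; any OK result resets the count.
    """
    consecutive = 0
    for index, ok in enumerate(results):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= max_check_attempts:
            return index
    return None


# A single dropped reply followed by successful retries never goes HARD.
blip = [True, False, True, True]
# Five failures in a row go HARD on the fifth failed check.
outage = [True] + [False] * 5

print(first_hard_failure(blip))
print(first_hard_failure(outage))
```

With retry_interval 1, the four retries after the first failure take about 4 minutes, so a 30-second blip should stay in a soft state under this model.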