[Nagios-devel] BUG/PATCH/WORKAROUND: Problem with Nagios state retention
Posted: Fri Apr 07, 2006 3:54 am
As many people have noticed over time, some installations have a
persistent problem where Nagios does not write out the state retention
file. The problem is not in the state retention code as such. Nagios
carefully writes the data to a temporary file first (good), but for the
state retention file it uses a compiled-in temporary file path instead of
the one set by the 'temp_file' configuration variable (bad).
To determine whether this problem affects your installation, check whether
the user running Nagios has write permission to the compiled-in 'tempfile'
location, e.g.:
$ strings -a `which nagios` | grep tempfile
/var/log/nagios/tempfile
$ if [ ! -w /var/log/nagios ] ; then echo "I cannot write." ; fi
I cannot write.
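The one-liner above only tests the directory. A slightly fuller check, as a minimal sketch: the function name can_write_tempfile is made up for illustration, and it assumes that a writable existing file, or a writable parent directory when the file does not exist yet, is what Nagios needs. Run it as the user that Nagios runs as.

```shell
#!/bin/sh
# can_write_tempfile PATH (hypothetical helper, not part of Nagios)
# Succeeds if the current user could overwrite PATH, or could create it
# because its parent directory exists and is writable.
can_write_tempfile() {
    target=$1
    dir=$(dirname "$target")
    if [ -e "$target" ]; then
        # File already exists: it must itself be writable.
        [ -w "$target" ]
    else
        # File does not exist yet: the directory must allow creating it.
        [ -d "$dir" ] && [ -w "$dir" ]
    fi
}
```

For example, with the path reported by strings above:
can_write_tempfile /var/log/nagios/tempfile || echo "I cannot write."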
If, for whatever reason, you cannot give the user running Nagios write
permission to the compiled-in 'tempfile' location (usually for political
reasons, or to avoid odd issues with multiple Nagios installations on one
host), a viable workaround is to define a Nagios service that copies the
status file over the state retention file, since the two have a compatible
format:
define service {
        host_name               localhost
        service_description     state-retention
        check_command           copy_status_to_retention
        normal_check_interval   3
        max_check_attempts      3
        retry_check_interval    3
        check_period            24x7
        check_freshness         0
        obsess_over_service     0
        passive_checks_enabled  0
        notification_interval   120
        notification_period     24x7
        notification_options    n
        contact_groups          default
}

define command {
        command_name    copy_status_to_retention
        command_line    /bin/cp $STATUSDATAFILE$ $RETENTIONDATAFILE$ && exit 0 || exit 2
}
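A plain cp can leave a half-written retention file behind if it is interrupted. One possible variation, as a sketch only: copy to a temporary name next to the destination and rename it into place, which is the same write-then-rename idea Nagios itself uses. The function name and the ".tmp.PID" suffix are assumptions for illustration. The return codes 0 and 2 match the OK and CRITICAL plugin states used by the command above.

```shell
#!/bin/sh
# copy_status_to_retention SRC DST (hypothetical wrapper, not shipped with Nagios)
# Copies SRC to a temporary file beside DST, then renames it over DST so
# that DST is never left half-written.
copy_status_to_retention() {
    src=$1
    dst=$2
    tmp="$dst.tmp.$$"    # assumed naming: same directory as DST, so mv is a rename
    if cp "$src" "$tmp" && mv -f "$tmp" "$dst"; then
        echo "OK - copied $src to $dst"
        return 0
    fi
    rm -f "$tmp"         # clean up a partial copy, if any
    echo "CRITICAL - could not copy $src to $dst"
    return 2
}
```

Saved as a script that ends with
copy_status_to_retention "$1" "$2"; exit $?
it could replace the /bin/cp command line, called with $STATUSDATAFILE$ and $RETENTIONDATAFILE$ as its two arguments. The user running Nagios then needs write permission to the directory holding the retention file.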
The attached patch (against both 2.0b4 and the latest 2.1) ensures that
the state retention code uses the file named by the 'temp_file' variable
in the configuration instead of the compiled-in 'tempfile'.
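With the patch applied, the path that matters is the configured one, so it is worth checking which 'temp_file' the main configuration file actually sets. A minimal sketch, assuming the usual "name=value" layout of the main config file; the function name configured_temp_file is made up for illustration.

```shell
#!/bin/sh
# configured_temp_file CFG (hypothetical helper)
# Prints the value of the temp_file directive in a Nagios main config file.
# Commented-out lines are ignored; if the directive appears more than once,
# the last occurrence is printed.
configured_temp_file() {
    sed -n 's/^[[:space:]]*temp_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}
```

The printed path can then be compared with the compiled-in one reported by strings, and tested for write permission in the same way.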
--==--
Bruce.
And now, back to tracking down excessive latency problems.
Attachment: xrddefault.c.patch (xrddefault patch)

*** xdata/xrddefault.c	2006/04/07 10:26:00	1.1
--- xdata/xrddefault.c	2006/04/07 10:33:03
***************
*** 118,123 ****
--- 118,133 ----
  		if(temp_ptr==NULL)
  			continue;
  
+ 		/* temp file definition */
+ 		if( ! strcmp(temp_ptr,"temp_file") ){
+ 			temp_ptr=my_strtok(NULL,"\n");
+ 			if(temp_ptr==NULL)
+ 				continue;
+ 
+ 			strncpy(xrddefault_temp_file,temp_ptr,sizeof(xrddefault_temp_file)-1);
+ 			xrddefault_temp_file[sizeof(xrddefault_temp_file)-1]='\x0';
+                 }
+ 
  		/* skip lines that don't specify the host config file location */
  		if(strcmp(temp_ptr,"xrddefault_retention_file") && strcmp(temp_ptr,"state_retention_file"))
  			continue;
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]