Re: [Nagios-devel] Possible patch to cure CGI's not finding data for objects in status.dat
Posted: Fri Aug 07, 2009 3:00 pm
Putting the web server on the Nagios server box would not work for us. Our
web server is currently overloaded with 4 CPUs. CPU usage typically averages
over 80%, with bursts to 100% occurring multiple times a minute and lasting
up to 20 seconds. We are using DNX with 3 client servers (each with 4 CPUs),
the main Nagios server (with 4 CPUs), and a DB server (with 4 CPUs). There
are actually times when we would probably use 100% of 16 CPUs on the web
server if it had 16 CPUs, because we have more than 15 status.cgi and
extinfo.cgi processes running at once using 30MB status.dat and
objects.cache files. What would help us most is to go to 3.x with a DB (like
Merlin or similar), but until we can properly migrate, we are stuck with 2.7.
We haven't tried CIFS, so that is something I guess we should look at as
well.
We are looking at possibly using rsync to keep the files up to date from the
Nagios server to the web server (using an in-memory tmpfs on the web server,
which might lower our CPU usage). If we get rid of the problem with the
last_update value in status.dat, then the rsync would happen very quickly,
because the percentage of changes to the file will be pretty minimal from
one rsync to the next. We were looking at the fsync() issue to make sure the
file would be complete before we rsync; otherwise the rsync would just copy
incomplete data. We weren't looking at NFS as the cause of the lack-of-data
problem, but I guess it should be looked at now.
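One common way to make sure a reader (or an rsync run) never picks up a half-written file is to write to a temporary file, flush and sync it, and only then rename it over the real name. The following is a minimal sketch of that pattern, not code from the Nagios source; the function name `write_file_atomically` and the `.tmp` suffix are assumptions made for illustration. It relies on POSIX rename() replacing the target atomically when both paths are on the same local filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write `len` bytes to `path` so that readers never see a partial file.
 * Returns 0 on success, -1 on failure.
 * Hypothetical helper for illustration; not taken from the Nagios source. */
int write_file_atomically(const char *path, const char *data, size_t len)
{
    char tmp_path[4096];
    int n = snprintf(tmp_path, sizeof(tmp_path), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp_path))
        return -1;

    FILE *fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;

    int ok = fwrite(data, 1, len, fp) == len;
    /* fflush() moves stdio's user-space buffer into the kernel ... */
    ok = ok && fflush(fp) == 0;
    /* ... and fsync() asks the kernel to commit it to stable storage. */
    ok = ok && fsync(fileno(fp)) == 0;
    /* Always close the stream, even after an earlier failure. */
    ok = (fclose(fp) == 0) && ok;

    /* Only a complete temporary file is ever renamed over the real name. */
    if (!ok || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

With this approach an rsync that starts at any moment should see either the previous complete file or the new complete file under the final name. Whether the fsync() step is worth its cost depends on the setup; it matters for durability across a crash more than for visibility to local readers.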
Thanks for the suggestions.
Cary
________________________________________
From: Andreas Ericsson [[email protected]]
Sent: Friday, August 07, 2009 2:30 AM
To: Nagios Developers List
Subject: Re: [Nagios-devel] Possible patch to cure CGI's not finding data for objects in status.dat
Cary Petterborg wrote:
> In response to your request for details of our system: We are running
> SuSE 9 writing to a ReiserFS (with a separate web server reading the
> status.dat, etc. from an NFS mount off the main Nagios server). Our
> status.dat file is 37MB, and objects.cache is 32MB. If you need more
> details than this, please let me know what you need.
>
I blame NFS. Don't use it for sync()-sensitive data, as caching happens
on multiple levels. The patch hurts the normal case (webserver on same
system as Nagios) though, so I'd prefer if it wasn't applied.
>
> I may be wrong in this next information, but I did homework on it
> before proceeding to try to implement the fix on our system, and I'm
> taking the information from what I found. The fsync() call is the
> more important function call in the fix. fclose() almost always
> guarantees fflush(), but it doesn't guarantee that it will be written
> to the disk immediately, especially if the program doesn't exit.
It doesn't have to be written to disk. After the fclose() the kernel
will cache the data so the next reader will still see the full file
contents no matter if it's actually committed to disk or not.
fsync() and fflush() are primarily meant to make sure data stays
intact across power outages.
NFS breaks this sometimes. CIFS is a better option, I think.
What happens if you use a webserver on the same host?
What happens if you use CIFS instead of NFS?
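
The point about fclose() can be checked on a local filesystem with a small experiment. The sketch below writes a file, closes it without ever calling fsync(), then reopens it and counts the bytes a reader can see. The function name `bytes_visible_after_fclose` is hypothetical, and the result only describes a reader on the same host; it says nothing about what an NFS client with its own caching will observe.

```c
#include <stdio.h>

/* Writes `len` bytes to `path`, closes the stream WITHOUT calling fsync(),
 * then reopens the file and returns how many bytes a reader can see.
 * Returns -1 on error. Hypothetical helper for illustration only. */
long bytes_visible_after_fclose(const char *path, const char *data, size_t len)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;

    size_t written = fwrite(data, 1, len, out);
    /* fclose() flushes stdio's buffer into the kernel; no fsync() here. */
    if (fclose(out) != 0 || written != len)
        return -1;

    /* A second, independent stream plays the role of the CGI reader. */
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;

    long seen = 0;
    while (fgetc(in) != EOF)
        seen++;
    fclose(in);
    return seen;
}
```

On a local filesystem the returned count should equal `len`, since the reader is served from the kernel's cache whether or not the data has reached the disk yet.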
--
Andreas Ericsson [email protected]
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
_______________________________________________
Nagios-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/lis ... gios-devel
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: Andreas Ericsson [[email protected]]