[Nagios-devel] Patch submission for comments : CGI speed improvement (XNG)
Posted: Tue Jun 07, 2005 6:35 am
Hi all,
Here is a patch I am submitting for your comments. Its purpose is to solve
the CGI performance problem encountered on configurations with several
thousand hosts/services.
History :
I currently have a configuration with about 2000 hosts, 3400 services, and
their dependencies (more than 6000), and this configuration is expected to
grow by a factor of 3 in the near future. On the machine I am using now
(quite slow, a Sun E250 with only 1 processor), a request for
'status.cgi?host=all' takes about 5 seconds of CPU time. With the daemon and
the checks also running, this CGI execution takes about 25 s of elapsed time.
Two new machines have been ordered, with 4 processors each, but as more and
more things are monitored, more and more people want to access the web
interface, and whatever processor I have in my server, I will quickly meet
the same performance problem. And I don't consider adding CPU and memory to
be the solution to every performance problem.
This is why I started thinking about a way to improve the response time of
CGIs, and especially, to lower the impact of an increase in the number of
concurrent web accesses.
My first step was to look at nagios-db. I finally decided not to use it,
mostly because :
    - I want to use the original nagios UI (especially with the nuvola
      skin).
    - With postgres materialized views, I would have to choose between
      data freshness and CGI response time. I need a solution that shows
      'realtime' data with an acceptable response time.
The system I am submitting today was designed with these goals in mind :
    - No modification to the CGI code.
    - Less than 1 s of CPU time on my server for a 'status.cgi?host=all'
      request.
    - A minimal number of changes in the nagios code (except xdata).
    - Full compatibility with the current communication system. The
      objects.cache and status.dat files remain the same, in order to keep
      compatibility with all the add-ons that read their information from
      these files.
Profiling some of the CGIs confirmed that the two main performance problems
are reading the configuration and reading the status data (88 % for a full
status.cgi, and more than 95 % for extinfo.cgi). That's why I designed a new
system to store and retrieve the data. I kept flat files because I don't see
any interesting alternative. We could use shared memory, but I don't expect
much improvement in terms of performance, and it brings a new problem : you
cannot know in advance how much space you will need (for status data).
The new communication system uses two files, like the current one, but the
format of these files is designed to be read very quickly by the CGIs. They
contain the objects in binary struct form, the hashtables (which don't have
to be recomputed), and everything needed to restore the object and status
environment as fast as possible. I am not explaining the format further today
because I want to know whether anybody is interested before writing real
documentation.
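To make the idea of "objects in binary struct form" concrete, here is a minimal sketch in C. It is not the format used by the patch, which this post does not describe. The names host_record, cache_header, CACHE_MAGIC, write_cache and read_cache are all hypothetical, and the record is deliberately free of pointers so that it can be written and read back unchanged. It also leaves out the stored hashtables mentioned above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fixed-size record. Real nagios objects contain pointers,
   which would have to be turned into offsets before being written. */
typedef struct {
    char name[64];
    int  current_state;
    long last_check;
} host_record;

/* Hypothetical file header: an identifier plus the number of records. */
typedef struct {
    unsigned int magic;
    unsigned int count;
} cache_header;

#define CACHE_MAGIC 0x584E4731u  /* arbitrary value chosen for this sketch */

/* Writer side (the daemon): dump the header, then the whole array.
   Returns 0 on success, -1 on failure. */
int write_cache(const char *path, const host_record *hosts, unsigned int count)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    cache_header hdr = { CACHE_MAGIC, count };
    int ok = fwrite(&hdr, sizeof hdr, 1, fp) == 1 &&
             fwrite(hosts, sizeof *hosts, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Reader side (a CGI): one read for the header, one for all records.
   No text parsing takes place. Returns a malloc'd array the caller must
   free, or NULL on failure or if the file holds no records. */
host_record *read_cache(const char *path, unsigned int *count_out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    cache_header hdr;
    host_record *hosts = NULL;
    if (fread(&hdr, sizeof hdr, 1, fp) == 1 &&
        hdr.magic == CACHE_MAGIC && hdr.count > 0) {
        hosts = malloc((size_t)hdr.count * sizeof *hosts);
        if (hosts != NULL &&
            fread(hosts, sizeof *hosts, hdr.count, fp) != hdr.count) {
            free(hosts);   /* short read: the file is truncated */
            hosts = NULL;
        }
    }
    fclose(fp);
    if (hosts != NULL)
        *count_out = hdr.count;
    return hosts;
}
```

A file written this way is only valid for readers built with the same struct layout on the same architecture, which is acceptable when the daemon and the CGIs run on the same machine.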
Here are some performance facts :
The request I use is 'status.cgi?host=all'.
                               Original CPU   New CPU     Factor of
                               time (ms)      time (ms)   improvement
Reading object configuration   2450           80          30 x
Reading status data            1640           50          33 x
Rest of code                   530            480         -10 %
Total CPU time                 4620           610         7.6 x
In this request, the overall improvement (7.6x) is relatively low because
much computation is needed to display the page. But for something like
extinfo.cgi on one host, there is so little computation that the overall
improvement is nearly 30x.
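The factors quoted above can be recomputed from the raw timings. The following C sketch does only that arithmetic. The function names speedup, share_percent and print_report are hypothetical and are not part of the patch.

```c
#include <stdio.h>

/* How many times faster the new code is: old time divided by new time. */
double speedup(double old_ms, double new_ms)
{
    return old_ms / new_ms;
}

/* Share of the total time, in percent, spent in one part of the request. */
double share_percent(double part_ms, double total_ms)
{
    return 100.0 * part_ms / total_ms;
}

/* Prints the factors for the timings quoted in this post. */
void print_report(void)
{
    printf("object configuration : %.1f x\n", speedup(2450.0, 80.0));
    printf("status data          : %.1f x\n", speedup(1640.0, 50.0));
    printf("total                : %.1f x\n", speedup(4620.0, 610.0));
    /* Time spent reading configuration and status data in the original code. */
    printf("reading share        : %.1f %%\n",
           share_percent(2450.0 + 1640.0, 4620.0));
}
```

Calling print_report() from a main function prints the factors. The share of the original time spent reading data comes out a little above 88 %, which agrees with the profiling figure given earlier for a full status.cgi.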
Now, the next step is to see if you find it interesting enough to include it
in a future versi
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]