recurring_downtime

onegative
Posts: 173
Joined: Tue Feb 17, 2015 12:06 pm

recurring_downtime

Post by onegative »

G'day Nagios Support,

I am running into a situation where the cron job for recurring downtime appears to be exceeding the once an hour execution.

01 * * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1

I assume this schedule is there to ensure that any newly defined recurring downtime gets scheduled within a reasonable time frame. Is this true?

The issue I end up with happens about once a week: the swap eventually gets consumed to the point where I have to restart the application to release the swap and clear the alarm. It seems we go through the entire recurringdowntime.cfg every hour to keep it in sync, which is clearly shown in the log.
For example:
Downtime exists with start_time: 1557910500, and duration 7500 seconds ..
NOT SCHEDULING
Downtime exists with start_time: 1557910500, and duration 7500 seconds ..
NOT SCHEDULING
Downtime exists with start_time: 1557910500, and duration 7500 seconds ..
NOT SCHEDULING
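
To put a number on how much of each run is spent re-checking downtime that already exists, the log can be summarized with standard tools. This is a rough sketch, assuming the log lines look exactly like the excerpt above; the function names `count_skipped` and `count_existing_by_start` are hypothetical helpers, not part of Nagios XI.

```shell
#!/bin/sh
# Summarize a recurring-downtime log file passed as the first argument.
# The post writes this log to /usr/local/nagiosxi/var/recurringdowntime.log.

# Count how many entries were skipped because the downtime already existed.
count_skipped() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'NOT SCHEDULING' "$1"
}

# Group "Downtime exists" lines by start_time, most frequent first.
count_existing_by_start() {
    grep 'Downtime exists with start_time' "$1" \
        | sed 's/.*start_time: \([0-9]*\),.*/\1/' \
        | sort | uniq -c | sort -rn
}

# Example (run on the XI server):
#   count_skipped /usr/local/nagiosxi/var/recurringdowntime.log
```

Because the cron entry appends to the log, the counts cover every run recorded in the file, not just the most recent one.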

This seems an inefficient way of handling recurring downtime...this appears to be just one of the reasons scaling the product is difficult.

Anyway, I wanted to get some feedback about reducing the recurrence to maybe once every 4 hours. Is there any reason this would adversely affect any other component or operation within Core or XI itself?

Let me know and as always thanks for your help,
Danny
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: recurring_downtime

Post by npolovenko »

Hello, @onegative. Yes, this script verifies existing downtime and puts it in Core for execution. You can change the cron interval to 4 hours. Keep in mind that a new recurring downtime entry is not actually scheduled until the cron job runs, which may then take up to 4 hours. So you may not be able to schedule recurring downtime that starts, say, in 3 hours.
How many recurring downtime entries do you currently have scheduled?
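
For reference, a four-hourly schedule changes only the hour field of the entry quoted in the first post. The following is a minimal sketch, assuming the cron daemon accepts step values such as `*/4` (Vixie cron and cronie do); `set_four_hourly` is a hypothetical helper name, and which cron file holds the entry is not stated in this thread.

```shell
#!/bin/sh
# Rewrite the hour field of the recurring_downtime.php cron entry from
# "*" to "*/4". Reads cron lines on stdin and writes the result to stdout,
# so the original file is never modified in place.
set_four_hourly() {
    sed '/recurring_downtime\.php/ s|^\([0-9][0-9]*\)[[:space:]][[:space:]]*\*[[:space:]]|\1 */4 |'
}

# Resulting entry, which runs at 00:01, 04:01, 08:01, 12:01, 16:01 and 20:01:
#   01 */4 * * * nagios /usr/bin/php -q /usr/local/nagiosxi/cron/recurring_downtime.php >> /usr/local/nagiosxi/var/recurringdowntime.log 2>&1
```

Editing the line by hand works just as well; the helper only makes the change explicit and easy to review before it is applied.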
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
onegative

Re: recurring_downtime

Post by onegative »

Howdy,

The entries are defined per HostGroup, so with hosts and services included I estimate the number at "13,121 total records".

Danny
npolovenko

Re: recurring_downtime

Post by npolovenko »

@onegative, I see. You can change the cron interval to 4 hours and see if that improves the swap usage.
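
To compare swap usage before and after the interval change, something like the following could be run periodically. It is a sketch only, assuming a Linux host where `/proc/meminfo` and the `VmSwap` field in `/proc/<pid>/status` are available; the function names are hypothetical.

```shell
#!/bin/sh
# Print used swap in kB, read from a meminfo-format file
# (defaults to /proc/meminfo when no argument is given).
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {print total - free}' "${1:-/proc/meminfo}"
}

# List the processes holding the most swap as "<kB> <name>",
# largest first. The optional argument limits the row count (default 10).
top_swap_processes() {
    for f in /proc/[0-9]*/status; do
        # Processes may exit while we iterate, so ignore read errors.
        awk '/^Name:/ {name = $2} /^VmSwap:/ {print $2, name}' "$f" 2>/dev/null
    done | sort -rn | head -n "${1:-10}"
}

# Example: swap_used_kb; top_swap_processes 5
```

If the top entries turn out to be `php` or `httpd` processes, that would support the theory that the hourly run is involved; if not, the swap growth likely has another source.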

Can you show me the output of this command?

time /usr/bin/php /usr/local/nagiosxi/cron/recurring_downtime.php
The next time all the swap gets consumed, please upload the /var/log/messages log that covers the time when it happened, as well as the Apache error and ssl_error logs located in the /var/log/httpd/ folder.

You can also generate and send in a system profile.
To send us your system profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" menu.
Click the "Download Profile" button.
Save the profile.zip file and send it to me in a private message.
onegative

Re: recurring_downtime

Post by onegative »

@npolovenko

So now I have an issue with "Scheduled Downtime" link...it only produces a white page within the frame. Looking at the access_log it indicates the following error:

xxx.xxx.xxx.7 - - [29/May/2019:10:59:00 -0700] "GET /nagiosxi/includes/components/xicore/downtime.php HTTP/1.1" 500 - "https://nagiosxi.mcis.washington.edu/nagiosxi/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
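
A quick way to see whether other pages are also returning 500 is to filter the access log by status code. This sketch assumes the Apache combined log format shown above, where the status is the ninth whitespace-separated field; `list_500s` is a hypothetical helper, and the exact access log file name under /var/log/httpd/ depends on the local Apache configuration.

```shell
#!/bin/sh
# Print "<timestamp> <request path>" for every HTTP 500 response in an
# Apache combined-format access log passed as the first argument.
list_500s() {
    # $4 is "[day/month/year:time" (strip the leading bracket),
    # $7 is the request path, $9 is the status code.
    awk '$9 == 500 {print substr($4, 2), $7}' "$1"
}

# Example: list_500s /path/to/access_log
```

The access log only records that the request failed. The reason is normally written to the Apache error log at the same timestamp, so the two should be read side by side.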

I have verified that localhost, the short name and the FQDN all exist.
I have verified permissions of the php files.

Just not sure what else to check.

Danny
onegative

Re: recurring_downtime

Post by onegative »

I was just about to send you the system profile, but when I attempt to download it, it returns this:

PROFILE BUILD FAILED

Array
(
)

CODE: 1

If I just click view profile it produces it on screen.

Danny
onegative

Re: recurring_downtime

Post by onegative »

Seeing this in the profile:

Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:443... connected.
ERROR: cannot verify localhost's certificate, issued by '/C=US/ST=MI/L=Ann Arbor/O=Internet2/OU=InCommon/CN=InCommon RSA Server CA':
Unable to locally verify the issuer's authority.
ERROR: no certificate subject alternative name matches
requested host name 'localhost'.
To connect to localhost insecurely, use `--no-check-certificate'.
onegative

Re: recurring_downtime

Post by onegative »

@npolovenko

It should also be noted that only the Admin accounts are getting the blank white page when clicking "Scheduled Downtime". If I click the link as a regular user, I get the page. What gives? Can anyone suggest a resolution?

Danny
npolovenko

Re: recurring_downtime

Post by npolovenko »

@onegative, this article should help you fix the "PROFILE BUILD FAILED" error.
https://support.nagios.com/kb/article.p ... ategory=44

As for the blank page when accessing Scheduled Downtime as an admin user, this could be because admin users have access to ALL hosts and services, so Nagios needs to fetch a lot of information. A regular user would normally have access to only selected hosts and services, so everything loads faster.

The blank screen usually means either Apache doesn't have sufficient resources or the page is timing out.
Please follow this article and increase settings in the php.ini file. Double all settings recommended in the article.
https://support.nagios.com/kb/article/n ... e-611.html
When done, run service httpd restart.
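
Before changing anything, it helps to record the current limits so the doubled values can be checked afterwards. The linked article is not quoted here, so the directive names below (`memory_limit`, `max_execution_time`, `max_input_time`) are an assumption about which settings matter; they are standard php.ini directives, but the article may list others. `show_php_limits` is a hypothetical helper.

```shell
#!/bin/sh
# Print the active (uncommented) resource-limit lines from a php.ini file
# passed as the first argument.
# ASSUMPTION: these three directives are the relevant ones; adjust the
# list to match the settings named in the article.
show_php_limits() {
    grep -E '^[[:space:]]*(memory_limit|max_execution_time|max_input_time)[[:space:]]*=' "$1"
}

# Example: show_php_limits /path/to/php.ini
# "php --ini" reports which php.ini the command-line PHP binary loads;
# the Apache module may use a different file.
```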

The system profile should contain logs that will allow us to narrow down this issue, especially if you generate it right after you see the error/white screen.
onegative

Re: recurring_downtime

Post by onegative »

@npolovenko

There has to be something wrong here...the apache user can execute the sudo command as expected from the command line and produce the profile.zip without any errors...please observe below.

sudo -u apache /usr/local/nagiosxi/html/includes/components/profile/getprofile.sh
-------------------Fetching Information-------------------
Please wait.......
Creating nagios.txt...
Creating perfdata.txt...
Creating npcd.txt...
Creating cmdsubsys.txt...
Creating event_handler.txt...
Creating eventman.txt...
Creating perfdataproc.txt...
Creating sysstat.txt...
Creating systemlog.txt...
Creating apacheerrors.txt...
Creating mysqllog.txt...
Creating a sanatized copy of config.inc.php...
Creating memorybyprocess.txt...
Creating filesystem.txt...
Dumping PS - AEF to psaef.txt...
Creating top log...
Copying objects.cache
Copying MRTG Configs
tar: Removing leading `/' from member names
Counting Performance Data Files
Counting MRTG Files
Getting Network Information
Getting ipcs Information
Adding latest snapshot to: /var/log/httpd
updating: profile/ (stored 0%)
updating: profile/nagios.txt (deflated 90%)
updating: profile/perfdata.txt (deflated 91%)
updating: profile/npcd.txt (deflated 95%)
updating: profile/cmdsubsys.txt (deflated 94%)
updating: profile/event_handler.txt (deflated 94%)
updating: profile/eventman.txt (deflated 92%)
updating: profile/perfdataproc.txt (deflated 95%)
updating: profile/sysstat.txt (deflated 95%)
updating: profile/systemlog.txt (deflated 93%)
updating: profile/apacheerrors.txt (deflated 78%)
updating: profile/database_host.txt (deflated 3%)
updating: profile/config.inc.php (deflated 68%)
updating: profile/memorybyprocess.txt (deflated 82%)
updating: profile/filesystem.txt (deflated 70%)
updating: profile/psaef.txt (deflated 82%)
updating: profile/top.txt (deflated 83%)
updating: profile/1559275326.tar.gz (deflated 12%)
updating: profile/objects.cache (deflated 98%)
updating: profile/mrtg.tar.gz (stored 0%)
updating: profile/File_Counts.txt (deflated 47%)
updating: profile/ip_addr.txt (deflated 55%)
updating: profile/ipcs.txt (deflated 69%)
Zipping logs directory...
Backup and Zip complete!
I will PM you the profile.zip.

Let me know what you think.

Thanks,
Danny