Jobs Subsystem Overview
The jobs subsystem of Nagios Log Server runs on every Log Server instance and is responsible for running scheduled jobs. Jobs can be scheduled to run on a specific instance (local jobs) or on any single instance (global jobs). Additionally, a job can be scheduled to run just once or at a given frequency, e.g. daily or hourly.
Example Jobs
- Local
  - apply_config - Scheduled for EACH instance (via instance jobs) to create a config snapshot, write the Logstash configs, and restart Logstash
  - change_timezone - Changes the timezone on the local instance
  - create_snapshot - Creates a config snapshot
  - delete_snapshot - Deletes a config snapshot
  - restore_snapshot - Restores a config snapshot
  - stop_service - Stops a specific service on the local instance
  - start_service - Starts a specific service on the local instance
  - restart_service - Restarts a specific service on the local instance
- Global
  - run_alerts - Runs every 20 seconds to send alerts
  - backup_maintenance - Runs every day to perform index maintenance and backups
  - cleanup - Deletes completed tasks more than 1 day old from the jobs queue
Many global jobs can be run from the Admin > System > Command Subsystem page.
Architecture Components And Execution Flow
The jobs subsystem is started every minute by a cron job defined in /etc/cron.d/nagioslogserver and runs as the nagios user.
This cron job executes a loop that runs every 5 seconds and performs the following actions:
- Query the Elasticsearch index for the list of local jobs scheduled to be executed on this instance.
  - Execute the function named in the command field
  - Update the Audit Report with the results (_type = JOBS)
- Query the Elasticsearch index for the list of global jobs that need to be executed.
  - Execute the function named in the command field
  - Update the Audit Report with the results (_type = JOBS)
NOTE: Global jobs are jobs that ANY single instance may process; they are NOT executed by every instance.
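The timing of the loop described above can be sketched in shell. This is only an illustration, assuming one cron invocation covers a 60-second window split into 5-second passes. The names run_window, process_local_jobs, process_global_jobs and the SLEEP variable are hypothetical, and the real work is done by the product's PHP code rather than by a script like this.

```shell
#!/bin/sh
# Hypothetical sketch of the polling loop; not the product's actual code.

# Command used to pause between passes; override (e.g. SLEEP=:) for a dry run.
SLEEP="${SLEEP:-sleep}"

# Placeholders: the real subsystem queries Elasticsearch and runs the
# function named in each job's command field.
process_local_jobs()  { echo "pass $1: local jobs for this instance"; }
process_global_jobs() { echo "pass $1: global jobs any instance may take"; }

# run_window INTERVAL WINDOW
# Runs WINDOW / INTERVAL passes, pausing INTERVAL seconds after each one.
run_window() {
    interval="$1"
    window="$2"
    passes=$((window / interval))
    pass=1
    while [ "$pass" -le "$passes" ]; do
        process_local_jobs "$pass"
        process_global_jobs "$pass"
        $SLEEP "$interval"
        pass=$((pass + 1))
    done
}

# Example: twelve passes, five seconds apart, within a one-minute cron slot.
# run_window 5 60
```

Each pass handles local jobs first and global jobs second, matching the order of the list above.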
The general flow of execution of the jobs subsystem works as follows:
- The jobs.php controller runs as a background process and executes the commands in the process_jobs() method. The jobs.php script is located at /var/www/html/nagioslogserver/application/controllers/jobs.php and runs under cron every minute. The cron job is defined in /etc/cron.d/nagioslogserver.
- The jobs.php script executes the functions listed in the process_jobs() method. These functions are defined in cmdsubsys_helper.php, located at /var/www/html/nagioslogserver/application/helpers/cmdsubsys_helper.php.
- The poller cron saves the output of the run in /var/www/html/nagioslogserver/var/jobs.log.
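A quick way to review the saved output is to search the log for lines that look like failures. The sketch below is a hedged example: scan_jobs_log is a hypothetical helper, the search terms are assumptions about what a failure might look like, and the log path is passed as an argument because it may differ between installations.

```shell
#!/bin/sh
# Hypothetical helper; the search terms below are assumptions, not a
# documented log format.

# scan_jobs_log LOGFILE
# Prints matching lines with their line numbers. Returns 0 if any line
# matched, 1 if none matched, and 2 if the file cannot be read.
scan_jobs_log() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    grep -n -i -E 'error|exception|fatal' "$logfile"
}

# Example:
# scan_jobs_log /var/www/html/nagioslogserver/var/jobs.log
```

An exit status of 1 only means that none of the assumed terms appeared; it does not prove the jobs ran successfully.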
Troubleshooting Problems
Some potential problems with the jobs subsystem, along with troubleshooting information, are listed below:
Problem: Daily backups are not being processed, or alerts are not being run at the designated interval
Potential Causes:
- The jobs script may not be running. Run the following from the command line to see if the script is running:
  - ps axuw | grep jobs
- There may be a problem with the cron job. Check the cron file /etc/cron.d/nagioslogserver to ensure the job is not commented out. Execute the following from the command line to look for possible errors:
  - tail /var/log/cron
- Check the /usr/local/nagioslogserver/var/jobs.log log file for errors.
- The nagios user account could be expired. You can check this with the following from the command line:
  - chage -l nagios
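The check for a commented-out cron entry can be partly automated. The following is a minimal sketch, assuming the cron entry mentions the word jobs somewhere on its line (an assumption, since the entry itself is not reproduced here). The helper name cron_entry_active is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the default "jobs" pattern is an assumption about
# how the cron entry is written.

# cron_entry_active CRONFILE [PATTERN]
# Returns 0 if CRONFILE has at least one line that is not commented out
# and contains PATTERN (default: jobs), 1 if it has none, and 2 if the
# file cannot be read.
cron_entry_active() {
    cronfile="$1"
    pattern="${2:-jobs}"
    [ -r "$cronfile" ] || return 2
    awk -v pat="$pattern" '
        !/^[ \t]*#/ && index($0, pat) { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$cronfile"
}

# Example:
# if cron_entry_active /etc/cron.d/nagioslogserver; then
#     echo "jobs cron entry is active"
# else
#     echo "jobs cron entry is missing or commented out"
# fi
```

When checking for the running script, note that ps axuw | grep jobs also lists the grep process itself; writing the pattern as '[j]obs' is a common way to exclude it.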
Final Thoughts
For any support related questions please visit the Nagios Support Forums at: