livestatus not picking up comments after nagios restart
Hi,
Not sure if anyone else has come across this, but the issue I've been having is that when reloading the Nagios configuration files (or stopping/starting the Nagios process), a select few comments aren't accessible through the mklivestatus backend. Reload again, and they return!
The Nagios core front end can see these comments the entire time, but mklivestatus sometimes can't see them.
Removing the comments and re-adding them sometimes fixes the issue, but sometimes different comments (on the same host) disappear.
This affects both host and service comments, and the only non-standard thing I've noticed is that the host name contains hyphens.
Tried on mk-livestatus-1.2.0p2, mk-livestatus-1.2.1i2 and mk-livestatus-1.2.1i4.
Currently trying to replicate the issue in a test environment.
Re: livestatus not picking up comments after nagios restart
I have not heard of this bug yet. If you can reproduce it in your test environment with a limited number of variables, definitely let us know. I dug through mk's changelog and it does not look like there have been any recent issues with comments. You may want to ask around their community as well.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: livestatus not picking up comments after nagios restart
Thanks for the response
After further investigation, the cause appears to be duplicated comment IDs.
Some host, service, downtime or acknowledgement comments share the same comment ID. mklivestatus can only show one of these, whereas the Nagios web interface shows all of them (we can also see them using NDO as a backend).
I'm trying to work out which situations cause comment IDs to be duplicated, to understand the issue better.
sudo grep -E '^[[:space:]]*comment_id=' /var/lib/nagios3/retention.dat | sort | uniq -c | sort -n
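The pipeline above only counts `comment_id=` lines; it does not say whether a repeated ID belongs to a host comment, a service comment or both. A small parser can report that. This is a minimal sketch that assumes retention.dat stores comments as `hostcomment {` / `servicecomment {` blocks, each with one `comment_id=` line and closed by `}`; that layout is an assumption, and the function name `shared_comment_ids` is hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>  // for std::istringstream when feeding test text
#include <string>
#include <vector>

// Returns every comment_id that appears in more than one comment block,
// mapped to the block types ("hostcomment"/"servicecomment") that use it.
// Assumed (unverified) retention.dat layout:
//   hostcomment {
//   comment_id=42
//   ...
//   }
std::map<unsigned long, std::vector<std::string>>
shared_comment_ids(std::istream& in) {
    std::map<unsigned long, std::vector<std::string>> seen;
    std::string line;
    std::string block;  // empty while outside a comment block
    while (std::getline(in, line)) {
        // Trim surrounding whitespace so indented files parse too.
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;
        const auto last = line.find_last_not_of(" \t\r");
        line = line.substr(first, last - first + 1);

        if (line == "hostcomment {" || line == "servicecomment {") {
            block = line.substr(0, line.find(' '));
        } else if (line == "}") {
            block.clear();
        } else if (!block.empty() && line.rfind("comment_id=", 0) == 0) {
            // Only lines starting with comment_id= inside a comment block,
            // so next_comment_id= elsewhere in the file is ignored.
            seen[std::stoul(line.substr(11))].push_back(block);
        }
    }
    // Keep only IDs used by more than one comment block.
    for (auto it = seen.begin(); it != seen.end();) {
        if (it->second.size() < 2) it = seen.erase(it);
        else ++it;
    }
    return seen;
}
```

Usage would be along the lines of opening the file with `std::ifstream` (from `<fstream>`) and passing the stream in; each returned entry lists which block types share that ID.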
Re: livestatus not picking up comments after nagios restart
Good to know, keep us updated. You may want to report your findings to the mk community as well.
Re: livestatus not picking up comments after nagios restart
abrist wrote: Good to know, keep us updated. You may want to report your findings to the mk community as well.
This is what I posted there yesterday: http://lists.mathias-kettner.de/piperma ... 08628.html
The issue looks to be that Nagios can create a hostcomment and a servicecomment that have the same comment_id.
I trawled through the Livestatus code a bit overnight, and it certainly looks like they treat comment_id as unique, since that is the key it uses to add and delete comments (I think; my C/C++ isn't that good, I'm a Perl person). From Livestatus TableDownComm.cc:
Code:
void TableDownComm::addComment(nebstruct_comment_data *data) {
    unsigned long id = data->comment_id;
    if (data->type == NEBTYPE_COMMENT_ADD || data->type == NEBTYPE_COMMENT_LOAD) {
        add(new Comment(data));
    }
    else if (data->type == NEBTYPE_COMMENT_DELETE) {
        // deletion is looked up by comment_id alone
        remove(id);
    }
}
// ...
void TableDownComm::add(DowntimeOrComment *data)
{
    // entries are keyed by _id alone (the comment's comment_id)
    _entries_t::iterator it = _entries.find(data->_id);
    // might be update -> delete previous data set
    if (it != _entries.end()) {
        delete it->second;
        _entries.erase(it);
    }
    _entries.insert(make_pair(data->_id, data));
}
So we're a bit stumped. We don't know whether Nagios is doing something odd by creating duplicate comment_ids (where it shouldn't), or Livestatus is failing to handle duplicate comment_ids (where it should).
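A stripped-down model of the keying problem may help here. This is not Livestatus code: `CommentType`, `Comment`, `add_by_id` and `add_by_type_and_id` are hypothetical names, and the enum values are the comment_type values quoted in this thread (1 = host, 2 = service). A table keyed only by comment_id, as in the `add()` excerpt, keeps just one of two comments that share an ID; a table keyed by the (comment_type, comment_id) pair keeps both.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplified model of a comment table; not Livestatus code.
// comment_type values as quoted in the thread.
enum CommentType { HOST_COMMENT = 1, SERVICE_COMMENT = 2 };

struct Comment {
    CommentType type;
    unsigned long id;
    std::string text;
};

// Keyed by comment_id alone: a second comment with the same id replaces
// the first, even when it is of the other type.
using ById = std::map<unsigned long, Comment>;
void add_by_id(ById& table, const Comment& c) {
    table[c.id] = c;  // overwrite, like "might be update -> delete previous"
}

// Keyed by (comment_type, comment_id): host comment 7 and service
// comment 7 are separate entries.
using ByTypeAndId = std::map<std::pair<CommentType, unsigned long>, Comment>;
void add_by_type_and_id(ByTypeAndId& table, const Comment& c) {
    table[{c.type, c.id}] = c;
}
```

Adding host comment 7 and then service comment 7 leaves one entry (the service comment) in the first table and two in the second. Whether Livestatus ought to key this way depends on whether Nagios intends comment_ids to be unique across both types, which is exactly the open question here.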
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: livestatus not picking up comments after nagios restart
It appears that Livestatus isn't taking the comment_type into account,
so it is entirely possible to have two comments with the same comment_id but different comment_types.
Code:
comment_type = Indicates whether this is a host or service comment
1 = Host comment
2 = Service comment
Re: livestatus not picking up comments after nagios restart
scottwilkerson wrote: This appears that livestatus isn't taking into account the comment_type
So it is very possible to have 2 comment_id's that are the same with different comment_type's
Code:
comment_type = Indicates whether this is a host or service comment
1 = Host comment
2 = Service comment
Yep, I was wondering if it was something like that.
I've hunted around to see whether comment_ids are supposed to be globally unique or only unique within the host or service comment lists, and I haven't been able to find anything. I've tried looking through the Nagios source to find out myself, but again, my C/C++ skills aren't that great (I think I did one class 20-odd years ago).
Cheers for the quick response.
Re: livestatus not picking up comments after nagios restart
No problem. If you could add that to your post on the mk list, that would be great; I'm sure they will get the bug sorted soon.
Re: livestatus not picking up comments after nagios restart
scottwilkerson wrote: No problem. If you could add that to your post on the mk list that would be great, I am sure they will get the bug sorted soon...
Done.
What's confusing me is that only one next_comment_id is stored in the status.dat and retention.dat files. However, adding a comment calls xcddefault_add_new_host/service_comment, which in turn calls find_host/service_comment(next_comment_id) to find the next available host or service comment_id. That part is fine, but is there any reason why next_comment_id would be reset to a low value while there are still comments in the system?
Currently our highest comment_id is 10953, yet next_comment_id=502. We definitely haven't put anywhere near 4.3 billion comments through the system to roll the unsigned long int next_comment_id over, not yet anyway.
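To make the question concrete, here is a toy model of the allocation described above. It assumes, as the post describes and without checking the Nagios source line by line, that both comment types share one next_comment_id while each lookup only considers comments of its own type. `CommentStore`, `exists` and `add` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified model of the described allocation; not Nagios code.
enum CommentType { HOST_COMMENT = 1, SERVICE_COMMENT = 2 };

struct Comment {
    CommentType type;
    unsigned long id;
};

struct CommentStore {
    std::vector<Comment> comments;
    unsigned long next_comment_id = 1;  // one counter shared by both types

    // Stands in for find_host_comment()/find_service_comment() as described:
    // only comments of the requested type are checked.
    bool exists(CommentType type, unsigned long id) const {
        for (const Comment& c : comments)
            if (c.type == type && c.id == id) return true;
        return false;
    }

    // Skip ids already used by this type, then hand out the counter value.
    unsigned long add(CommentType type) {
        while (exists(type, next_comment_id)) ++next_comment_id;
        const unsigned long id = next_comment_id++;
        comments.push_back({type, id});
        return id;
    }
};
```

With a service comment 502 already loaded and the counter sitting at 502, adding a host comment hands out 502 again, because the host-side lookup never sees the service comment. Under these assumptions, a counter that ends up below the highest ID in use would produce exactly the host/service collisions seen in retention.dat.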
It's almost like this bit of code from xdata/xcddefault.c isn't always initializing the comments properly (or is being called before the retention.dat file is parsed) and is resetting next_comment_id back to 1:
Code:
/* initialize comment data */
int xcddefault_initialize_comment_data(char *main_config_file) {
    comment *temp_comment = NULL;

    /* find the new starting index for comment id if its missing*/
    if(next_comment_id == 0L) {
        for(temp_comment = comment_list; temp_comment != NULL; temp_comment = temp_comment->next) {
            if(temp_comment->comment_id >= next_comment_id)
                next_comment_id = temp_comment->comment_id + 1;
        }
    }

    /* initialize next comment id if necessary */
    if(next_comment_id == 0L)
        next_comment_id = 1;

    return OK;
}
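One detail of the quoted initializer is worth spelling out: the scan over existing comments only runs when next_comment_id is 0. If retention.dat supplies a nonzero but stale value (such as next_comment_id=502 while comment 10953 exists), the scan is skipped and the stale value is kept. The sketch below reproduces that logic on plain IDs, next to a hypothetical defensive variant that always moves the counter past the highest ID in use. Both functions are illustrations with made-up names, not Nagios code or a proposed patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mirrors the quoted logic: existing ids are only scanned when the
// counter loaded from retention data is 0.
unsigned long init_like_quoted(unsigned long next_comment_id,
                               const std::vector<unsigned long>& existing_ids) {
    if (next_comment_id == 0) {
        for (unsigned long id : existing_ids)
            if (id >= next_comment_id) next_comment_id = id + 1;
    }
    if (next_comment_id == 0) next_comment_id = 1;
    return next_comment_id;
}

// Hypothetical defensive variant (not from Nagios): never return a value
// at or below an id that is already in use.
unsigned long init_defensive(unsigned long next_comment_id,
                             const std::vector<unsigned long>& existing_ids) {
    for (unsigned long id : existing_ids)
        next_comment_id = std::max(next_comment_id, id + 1);
    return next_comment_id == 0 ? 1 : next_comment_id;
}
```

Called with a loaded counter of 502 and existing IDs that include 10953, the first function returns 502 unchanged while the second returns 10954. That fits the numbers reported above, though it does not by itself explain how the counter reached 502 in the first place.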
Sorry if I'm asking stupid questions; I'm just trying to understand what's going on, partly for my own sake but also so I can pass this on to the problem managers who want us to get our daily reporting (hosts and services with notifications disabled that have no comments) working.
Re: livestatus not picking up comments after nagios restart
I'm going to have our Core developer take a look at this....