Opsview/Nagios MySQL DB corruption

Seems the runtime.nagios_servicechecks InnoDB table is prone to corruption if the server crashes.

Often /var/log/opsview/opsviewd.log has errors such as:
[2018/11/12 19:21:24] [import_ndologsd] [WARN] Failed to import 1542029786.120211
[2018/11/12 19:21:24] [import_ndologsd] [FATAL] Error for 1542029791.083171: Can't call method "execute" on an undefined value at /usr/local/nagios/bin/../lib/Opsview/Utils/NDOLogsImporter.pm line 1163.

 

Other InnoDB tables are also prone – to find out which, run:

mysqlcheck -u <user> -p runtime

Which will display all the “good” tables up to the bad one.  To find the bad one, in the MySQL CLI, run:

MySQL> use runtime;

mysql> show tables;
+----------------------------------------+
| Tables_in_runtime |
+----------------------------------------+
| nagios_acknowledgements |
| nagios_commands |
| nagios_commenthistory |
| nagios_comments |
| nagios_configfiles |
| nagios_configfilevariables |
| nagios_conninfo |
| nagios_contact_addresses |
| nagios_contact_notificationcommands |
| nagios_contactgroup_members |
| nagios_contactgroups |
| nagios_contactnotificationmethods |
| nagios_contactnotifications |
| nagios_contacts |
| nagios_contactstatus |
| nagios_customvariables |
| nagios_customvariablestatus |
| nagios_database_version |
| nagios_dbversion |
| nagios_downtimehistory |
| nagios_eventhandlers |
| nagios_externalcommands |
| nagios_flappinghistory |
| nagios_host_contactgroups |
| nagios_host_contacts |
| nagios_host_parenthosts |
| nagios_hostchecks |
| nagios_hostdependencies |
| nagios_hostescalation_contactgroups |
| nagios_hostescalation_contacts |
| nagios_hostescalations |
| nagios_hostgroup_members |
| nagios_hostgroups |
| nagios_hosts |
| nagios_hoststatus |
| nagios_instances |
| nagios_logentries |
| nagios_notifications |
| nagios_objects |
| nagios_processevents |
| nagios_programstatus |
| nagios_runtimevariables |
| nagios_scheduleddowntime |
| nagios_schema_version |
| nagios_service_contactgroups |
| nagios_service_contacts |
| nagios_servicechecks |
| nagios_servicedependencies |
| nagios_serviceescalation_contactgroups |
| nagios_serviceescalation_contacts |
| nagios_serviceescalations |
| nagios_servicegroup_members |
| nagios_servicegroups |
| nagios_services |
| nagios_servicestatus |
| nagios_statehistory |
| nagios_systemcommands |
| nagios_timedeventqueue |
| nagios_timedevents |
| nagios_timeperiod_timeranges |
| nagios_timeperiods |
| opsview_contact_hosts |
| opsview_contact_objects |
| opsview_contact_services |
| opsview_contacts |
| opsview_database_version |
| opsview_host_objects |
| opsview_host_services |
| opsview_hostgroup_hosts |
| opsview_hostgroups |
| opsview_hosts |
| opsview_hosts_matpaths |
| opsview_monitoringclusternodes |
| opsview_monitoringservers |
| opsview_performance_metrics |
| opsview_servicechecks |
| opsview_servicegroups |
| opsview_topology_map |
| opsview_viewports |
| schema_version |
| snmptrapdebug |
| snmptrapexceptions |
| snmptrapruledebug |
+----------------------------------------+
83 rows in set (0.00 sec)

 

(Sometimes you might have to add innodb_force_recovery = 4 to the [mysqld] section of my.cnf and restart MySQL. Note - THIS IS DANGEROUS! Stop Opvsiew first. Remember to remove this line, and restart MySQL before restarting Opsview)

 

 

Resolution is to get as much data out to a duplicate table, drop table, and then duplicate back…

 

Stop Opsview:

/etc/init.d/opsview stop
/etc/init.d/opsview-agent stop
/etc/init.d/opsview-web stop

 

mysql -u root -p

use runtime;
create table nagios_servicechecksnew like nagios_servicechecks;
insert nagios_servicechecksnew select * from nagios_servicechecks where servicecheck_id not in (select servicecheck_id from nagios_servicechecksnew);

This will error after a while with a SQL Server crash. Run same command below, but starting limit high, and slowly reducing to 1.

insert nagios_servicechecksnew select * from nagios_servicechecks where servicecheck_id not in (select servicecheck_id from nagios_servicechecksnew) limit 1;

Delete table, and then duplicate info back

drop table nagios_servicechecks;
create table nagios_servicechecks like nagios_servicechecksnew;
insert nagios_servicechecks select * from nagios_servicechecksnew where servicecheck_id not in (select servicecheck_id from nagios_servicechecks);
drop table nagios_servicechecksnew;

 

Start Opsview

/etc/init.d/opsview start
/etc/init.d/opsview-agent start
/etc/init.d/opsview-web start