Major Incident Management AI application goes live!

What is Incident Management?

Incident management (IM) is the process of restoring a service that is not functioning properly or is unavailable. A service disruption can be as small as an employee being unable to print, or as large as payment transactions no longer being processed or an entire company being unable to log in to its systems. An incident is a major incident if service recovery is critical to the performance of an organisation.

IM protects an organisation's ability to do business and avoids the costs of lost productivity. Certain services, such as internet banking or iDEAL, are so important to society that they are subject to legal uptime requirements (with penalties for non-compliance). Issues with these services often escalate to major incidents.

Improving system availability

The new AI application improves the availability of systems by:

Ensuring uniform process execution by experts
Reducing major incident handling time
Preventing incidents from escalating to major incidents

The new application will contribute to a critical business process for the entire bank.

‘Model = Application’ using advanced AI

The structuring methodology of Knowledge Values helped create a goal-oriented model of the (major) incident process. The advanced Match™ AI and Data Enterprise Platform implements the principle that model = application. As a result, Major Incident Managers (MIMs) can use the model immediately as a guide through the process. Because the application is used consistently within the IM process, uniform process execution is now guaranteed.

ServiceNow integration reduces handling time

Once an incident has escalated to a major incident, responsibility for service recovery is transferred to a MIM. To support the MIM, the application is integrated with the incident administration system ServiceNow. Information about the incident is automatically retrieved and presented to the MIM. This includes the affected service, its importance to the organisation and the IT architecture that may be involved. Employees can also enter other essential incident information. The result is an overview of the incident that the MIM can use to kick-start service recovery.

Automatic progress reporting to stakeholders

The application automatically generates reports during the major incident process (see figure). Benefits:

Siloed information becomes common knowledge. This reduces the risk of costly, inappropriate actions based on misinformation or misunderstandings, and less time is spent distributing information.
Management is always up-to-date on the progress of service recovery. The MIM can focus on service recovery and spend less time informing stakeholders.
Each step of the service recovery is automatically recorded in a retrospective document, so the incident can easily be analysed afterwards. Without this feature, reconstructing what happened is hard, because precise documentation is rarely possible in the middle of a major incident.

Preventing major incidents

The application should contribute to the goal of preventing incidents from escalating to major incidents, or even from occurring at all. The retrospective document helps identify structural improvements based on lessons learned during the incident.

A possible future step is to implement the application in the normal incident process. The expectation is that its information-gathering functionality will enable employees in the incident process to resolve incidents themselves. As a result, fewer incidents would escalate to major incidents. This is the ideal situation, because the best way to handle a major incident is to prevent it.

What’s next?

The disruptive flexibility of The Match™ Technology Platform allows feedback from MIMs to be integrated into the model quickly. This means that MIM knowledge can be continuously built into the application and applied directly during major incidents. The documentation generated automatically during the process will show stakeholders an unprecedented level of control that will contribute strongly to system and service availability.