Major Incident Definition

A major incident is defined by any of the following statements:

  • When a single event results in total service failure for several customers or services.
  • When a managed IT customer has a total loss or failure of a service at either a critical location, multiple locations, or around a critical time in their business calendar.
  • Any incident deemed a major incident by a member of the Timico management team.
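
The three criteria above could be expressed as a simple check, for example when triaging tickets. This is a minimal sketch, not an official implementation: the `Incident` class, all of its field names, and the `SEVERAL` threshold are hypothetical, and the definition itself does not say how many customers or services count as "several".

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical; they mirror the three criteria above.
    total_service_failure: bool = False
    affected_customers_or_services: int = 0
    managed_it_customer: bool = False
    critical_location: bool = False
    multiple_locations: bool = False
    critical_business_time: bool = False
    declared_by_management: bool = False


# "Several" is not quantified in the definition; 2 is an assumed placeholder.
SEVERAL = 2


def is_major_incident(incident: Incident) -> bool:
    """Return True if any one of the three criteria is met."""
    # Criterion 1: total service failure for several customers or services.
    widespread = (
        incident.total_service_failure
        and incident.affected_customers_or_services >= SEVERAL
    )
    # Criterion 2: a managed IT customer loses a service at a critical
    # location, at multiple locations, or at a critical business time.
    managed_critical = (
        incident.managed_it_customer
        and incident.total_service_failure
        and (
            incident.critical_location
            or incident.multiple_locations
            or incident.critical_business_time
        )
    )
    # Criterion 3: a member of the management team declares it.
    return widespread or managed_critical or incident.declared_by_management
```

Because the criteria are alternatives, a management declaration alone is enough to classify an incident as major, regardless of the other fields.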

Major Incident Team

During a major incident, there are several key roles that need to be fulfilled. The roles may be fulfilled by one or more individuals.

  • Major Incident Manager
    Runs the major incident and is responsible for assigning people to the roles within the major incident team
  • Technical Lead
    Responsible for coordinating and managing the engineers working on the technical resolution of the major incident
  • SLA Manager
    Responsible for communications, documenting the timeline during a major incident, and producing the incident report following confirmed resolution of the major incident
  • Change Manager
    Responsible for ensuring any changes are reviewed and approved prior to implementation
  • Problem Manager
    Responsible for ensuring that the service improvement opportunities are followed up within the agreed time frames as detailed in the incident report
  • Disaster Recovery Manager
    Responsible for advising on whether the business disaster recovery plan should be considered

Major Incident Communications

During a major incident, different levels of communication are required, both to progress the incident and to update interested parties. The SLA Manager will agree the forms of communication required during the major incident. Communication methods could include:

  • Telephone call updates
  • ServiceNow incident updates
  • Status Page updates
  • SMS message updates
  • Conference bridge updates

Updates will be provided at least every hour for the duration of a major incident.
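
The hourly update commitment could be tracked with a small helper. This is a minimal sketch: the function names are hypothetical, and it assumes the one-hour interval is measured from the time of the previous update.

```python
from datetime import datetime, timedelta

# Maximum gap between updates, taken from the commitment above.
MAX_UPDATE_INTERVAL = timedelta(hours=1)


def next_update_due(last_update: datetime) -> datetime:
    """Latest time by which the next update should be sent."""
    return last_update + MAX_UPDATE_INTERVAL


def is_update_overdue(last_update: datetime, now: datetime) -> bool:
    """True once more than one hour has passed since the last update."""
    return now > next_update_due(last_update)
```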

Incident Report

Following confirmed resolution of a major incident, the SLA Manager will aim to issue an interim Root Cause Analysis (RCA) within 5 standard business days, and a full report once the required information is available.
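
The target date for the interim RCA could be calculated as follows. This is a minimal sketch under stated assumptions: a standard business day is taken to be Monday to Friday, counting starts on the day after resolution is confirmed, and any public holidays are supplied by the caller. The function name and the `holidays` parameter are hypothetical.

```python
from datetime import date, timedelta

# Target from the commitment above: interim RCA within 5 business days.
INTERIM_RCA_BUSINESS_DAYS = 5


def interim_rca_due(resolved_on: date, holidays: frozenset = frozenset()) -> date:
    """Return the date that is 5 business days after resolution.

    Assumes Monday to Friday are business days and that counting
    starts on the day after resolution is confirmed.
    """
    due = resolved_on
    remaining = INTERIM_RCA_BUSINESS_DAYS
    while remaining > 0:
        due += timedelta(days=1)
        # weekday() returns 0-4 for Monday-Friday, 5-6 for the weekend.
        if due.weekday() < 5 and due not in holidays:
            remaining -= 1
    return due
```

For example, an incident resolved on a Friday would have its interim RCA due on the following Friday, since the intervening weekend is not counted.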