
Incident Management Process

1                  Introduction

The process defined in this document makes use of the workflow and activities defined in the ITS Global Incident Management Process [1] and the ITS Global Critical Incident Management Process [2]. Before applying the process in the GSOC, readers are expected to read and understand the ITS Global Incident Management Process [1], the ITS Global Critical Incident Management Process [2], the ITS UK Incident Management Process [5] and the ITS UK Critical Incident Management Process [6].

1.1              Purpose

This document describes how IT incidents that fall within the scope of the KPMG Global Security Operations Centre (GSOC) are managed to ensure minimal disruption to GSOC services, to the satisfaction of the customer, and in compliance with the GSOC terms of reference and any agreed service levels.

The document also identifies points of interaction with ITS Global, ITS UK and Third Parties, and provides direction on how the GSOC will interact with them to ensure efficient resolution of incidents.

1.2              Scope

This document covers IT incidents that disrupt, or could disrupt, one or more GSOC services, including incidents reported directly by users, technical staff or vendors, and those identified through monitoring tools.

1.3              Ownership

Responsibility for the ownership and ongoing management of this document, including the processes it contains, rests primarily with the GSOC Director.

1.4              Audience

The intended audience for this document is the GSOC Team and third-party service providers.

1.5              Exceptions

All requests for exceptions to the processes contained in this document should be directed to the GSOC Director, who will decide on each request depending on its nature and scope.

1.6              Reporting Violations

Any violations of this policy should be reported directly to the following email address:

<<gsoc-manager@kpmg.com>>

1.7              Key Definitions

IT Incident: an unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident [4].

Critical IT Incident: an IT incident with an urgent business priority that requires a response above and beyond that given to normal incidents.

Urgency: a measure of how long it will be until an Incident, Problem or Change has a significant impact on the business.

Priority: a category used to identify the relative importance of an Incident, Problem or Change. Priority is based on impact and urgency and is used to identify required times for actions to be taken.

1.8              Responsibilities

The following roles have responsibilities for the respective components of the incident management process. This list is not intended to be an exhaustive description of each role's responsibilities:

1.8.1          GSOC Director

The GSOC Director has overall accountability for all services provided by the GSOC and for all IT incidents that occur within the GSOC.

1.8.2          Tooling Engineer

The Tooling Engineer is responsible for investigating and resolving any IT incidents that occur within the GSOC. The Tooling Engineer will also coordinate the resolution of incidents that require the cooperation of other stakeholders, such as ITS Global, ITS UK and third-party support teams.

1.8.3          Incident Manager

Incident Manager is an assigned role, primarily performed by the GSOC Operations Manager, who has the authority to delegate the role to another member of the GSOC. The Incident Manager has the overall responsibility of:

1.8.4          GSOC Operations Manager

The GSOC Operations Manager (GOM) serves as the Incident Manager for both critical and non-critical incidents, and may assign another member of the GSOC to act as Incident Manager.

1.8.5          Third-party support teams

Third-party support teams have the following responsibilities:

1.8.6          All Analysts

All analysts may be tasked with receiving initial incident requests, i.e. providing first-line support. They may also be called upon to assist the Tooling Engineer in investigating and resolving incidents.

1.9              Upstream (Dependent) Processes

1.10             Downstream (Affected) Processes

  • GSOC Change Management Process
  • GSOC Service Management Process
  • GSOC Problem Management Process

2                  Process Overview

This process makes use of the workflows and activities defined in the ITS Global Incident Management Process [1] and the ITS Global Critical Incident Management Process [2]. These activities are executed within the context of the GSOC and by the roles defined within it. This section describes the deltas (GSOC-specific modifications) needed to apply these processes within the GSOC.

2.1              Goals

The following are the goals of the GSOC IT Incident Management Process:

  • Identification of the underlying cause of an incident and of the best means of resolving and preventing it
  • Restoration of the service as quickly as possible following an incident, while ensuring that all details are recorded
  • Reduction of the impact of incidents on the GSOC
  • Reduction of the number of problems resulting from repeated occurrences of incidents

2.2              Process Workflow

The GSOC Incident Management Process follows the ITS Global Incident Management Process workflow, with the modifications shown in red in Figure 1.

2.2.1          Assigning Priorities

The Priority assigned to a record for the resolution of an Incident depends upon:

  •  The Impact on the business: size, scope and complexity of the Incident
  •  The Urgency to the business: time within which resolution is required
  •  The resource availability
  •  The expected effort in resolving or completing a task.

The assignment of priority values follows the Global Service Desk Prioritization and SLA Definition [3].
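As an illustration only, the lookup below sketches how impact and urgency could be combined into a priority. The three-point scale, the labels P1 to P5 and the matrix values are hypothetical placeholders, not the values defined in the Global Service Desk Prioritization and SLA Definition [3]. The sketch also ignores resource availability and expected effort, which the list above names as further factors.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point scale for impact and urgency (1 = highest)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical (impact, urgency) -> priority matrix. The authoritative values
# are defined in the Global Service Desk Prioritization and SLA Definition [3].
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "P1",
    (Level.HIGH, Level.MEDIUM): "P2",
    (Level.HIGH, Level.LOW): "P3",
    (Level.MEDIUM, Level.HIGH): "P2",
    (Level.MEDIUM, Level.MEDIUM): "P3",
    (Level.MEDIUM, Level.LOW): "P4",
    (Level.LOW, Level.HIGH): "P3",
    (Level.LOW, Level.MEDIUM): "P4",
    (Level.LOW, Level.LOW): "P5",
}


def assign_priority(impact: Level, urgency: Level) -> str:
    """Look up the priority label for a given impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


# Example: a high-impact, medium-urgency incident.
print(assign_priority(Level.HIGH, Level.MEDIUM))  # P2
```

Keeping the matrix as data rather than as a formula means it can be replaced wholesale with the real values from [3] without touching the lookup logic.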

2.2.2          Role Mapping

The ITS Global Incident Management Process [1] defines a number of roles. These roles map to the GSOC roles defined in the Responsibilities section as follows.

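| Role | Process Owner | Incident Manager | Service Desk Analyst | Support Engineer | Incident Requester |
|---|---|---|---|---|---|
| GSOC Director | Yes | | | | |
| GSOC Operations Manager | | Yes | | | |
| Tooling Engineer | | | | Yes | |
| All Analysts | | | Yes | | Yes |
| Member firms | | | | | Yes |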

Figure 1: IT Incident Management workflow – showing deltas and interaction points

2.2.3          Incident sources (1.1a)

In addition to ITS Global support, incidents may also come from ITS UK and from members of the GSOC. These sources will use the channels, and their incidents will be processed, as defined in Section 2.4 (Interactions with External Parties).

2.2.4          Event monitoring (1.1c)

In the initial version (version one) of the GSOC, event monitoring to identify IT incidents will be performed manually by the Tooling Engineer until the GSOC is sufficiently mature.

2.2.5          Invoking the Service Request Management Process (2.2a)

Within the GSOC, any IT incident identified as needing a change to one or more services will result in a service request being raised. This invokes the GSOC Service Request Management Process, which is defined as part of the GSOC Service Management Process.

2.2.6          Invoking the Problem Management Process (2.8a)

IT incidents that consistently reappear indicate an underlying problem. Such incidents will therefore be treated as a problem and will trigger the GSOC Problem Management Process.
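The process does not quantify what "consistently reappear" means. The sketch below assumes, hypothetically, that three or more incidents sharing a signature (for example, the same affected service and symptom) within 30 days should be flagged for the GSOC Problem Management Process. The signature field, threshold and window are illustrative assumptions, not GSOC policy.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentRecord:
    ticket_id: str
    signature: str      # hypothetical grouping key, e.g. "service:symptom"
    opened_at: datetime


# Hypothetical values: the process does not define a threshold or window.
RECURRENCE_THRESHOLD = 3
LOOKBACK = timedelta(days=30)


def recurring_signatures(incidents, now, threshold=RECURRENCE_THRESHOLD, lookback=LOOKBACK):
    """Return the signatures seen at least `threshold` times in the lookback window."""
    window_start = now - lookback
    # Count only incidents opened inside [now - lookback, now].
    counts = Counter(
        inc.signature
        for inc in incidents
        if window_start <= inc.opened_at <= now
    )
    return {signature for signature, n in counts.items() if n >= threshold}
```

Each signature returned would be a candidate for raising a problem record; the decision itself remains with the people running the GSOC Problem Management Process.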

2.2.7          Implementing Changes (4.1a)

All changes identified during diagnosis that need to be implemented must go through the GSOC Change Management Process; in the GSOC this step is mandatory rather than optional. How the changes required to fix the incident are verified and implemented will be determined through the GSOC Change Management Process.
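The three workflow deltas above (2.2a, 2.8a and 4.1a) can be summarised as a routing decision taken after diagnosis. This is a minimal sketch assuming a hypothetical IncidentAssessment record; it only lists which GSOC processes would be invoked and does not model the processes themselves.

```python
from dataclasses import dataclass


@dataclass
class IncidentAssessment:
    """Hypothetical summary of what diagnosis revealed about an incident."""
    needs_service_change: bool  # a change to one or more GSOC services is needed (2.2a)
    is_recurring: bool          # the incident keeps reappearing (2.8a)
    changes_identified: bool    # diagnosis identified changes to implement (4.1a)


def downstream_processes(a: IncidentAssessment) -> list[str]:
    """Return the GSOC processes to invoke, in the order the deltas are described."""
    processes = []
    if a.needs_service_change:
        processes.append("GSOC Service Request Management Process")
    if a.is_recurring:
        processes.append("GSOC Problem Management Process")
    if a.changes_identified:
        # Mandatory in the GSOC: identified changes always go through Change Management.
        processes.append("GSOC Change Management Process")
    return processes
```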

2.3              IT Incident escalation (6.5a)

IT incidents within the GSOC will follow a simplified escalation process as described in this section.

2.3.1          Escalation Mechanisms

The GSOC will utilise one of the following mechanisms for escalating IT incidents.

2.3.1.1         Direct Communication-based Escalations

Depending on the nature of the IT incident, escalation may be performed through direct communication by email. The information contained in the email should be captured and used to create an incident record.

2.3.1.2         ITS Global Service Desk Tool

The ITS Global Service Desk Tool (hereafter, the Service Desk Tool) must be used to escalate incidents that are recorded in it.

2.3.1.3         Other IT Incident Management Systems

The Tooling Engineer will use the incident management system provided by the entity to which an incident is to be escalated, submitting a new ticket for the issue. This may require manual data entry or a tool-supported export of the data from the system used within the GSOC to the system used by the receiving entity.

2.3.2          Criteria for escalations and escalation paths

The Tooling Engineer will handle all IT incidents. The following matrix defines the types of IT incident that may lead the Tooling Engineer to trigger an escalation request, together with the applicable escalation options.

| Condition | Description | Escalate to | Escalation mechanism |
|---|---|---|---|
| Workload overwhelming | The Tooling Engineer is unable to keep up with the resolution of incidents due to increased workload | GOM | Email |
| GSOC process not working | One or more of the GSOC processes necessary to resolve an incident is unavailable or not working | GOM | Email |
| Limited capability | An IT incident cannot be resolved because of a lack of expertise in a technology | GOM | Email |
| Communication breakdown | Communication with an entity outside the GSOC cannot be established or has broken down | GOM | Email |
| Resolution depends on ITS Global/UK incident | The resolution of an IT incident depends on ITS Global or ITS UK resolving another incident, and ITS Global or ITS UK is unable to resolve it within their defined SLAs | GOM | Email |
| Solution does not exist | The solution to an incident does not exist due to technological limitations | GOM | Email |
| Assigned Tooling Engineer unable to resolve issue | The Tooling Engineer initially assigned an IT incident is unable to resolve the issue due to limited expertise | Tooling Engineer | Service Desk Tool |
| L1/L2 Analyst | The L1/L2 Analyst is unable to resolve an IT incident | Tooling Engineer | Service Desk Tool |
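The escalation matrix above is essentially a lookup table from condition to escalation target and mechanism. The sketch below encodes it that way, assuming the condition names are used verbatim as keys; the function name route_escalation is hypothetical, and an unknown condition raises an error rather than silently falling back to a default route.

```python
from typing import NamedTuple


class EscalationRoute(NamedTuple):
    escalate_to: str
    mechanism: str


# Transcribed from the escalation matrix; keys are the condition names.
ESCALATION_MATRIX = {
    "Workload overwhelming": EscalationRoute("GOM", "Email"),
    "GSOC process not working": EscalationRoute("GOM", "Email"),
    "Limited capability": EscalationRoute("GOM", "Email"),
    "Communication breakdown": EscalationRoute("GOM", "Email"),
    "Resolution depends on ITS Global/UK incident": EscalationRoute("GOM", "Email"),
    "Solution does not exist": EscalationRoute("GOM", "Email"),
    "Assigned Tooling Engineer unable to resolve issue": EscalationRoute(
        "Tooling Engineer", "Service Desk Tool"
    ),
    "L1/L2 Analyst": EscalationRoute("Tooling Engineer", "Service Desk Tool"),
}


def route_escalation(condition: str) -> EscalationRoute:
    """Return where, and how, to escalate an incident for a given condition."""
    try:
        return ESCALATION_MATRIX[condition]
    except KeyError:
        # No default route: an unlisted condition needs a human decision.
        raise ValueError(f"No escalation path defined for: {condition!r}") from None
```

Raising on an unknown condition reflects that the matrix is the complete list of escalation paths; anything outside it is a gap in the process rather than something to guess at.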

2.4              Interactions with External Parties

2.4.1          Interactions Matrix

The GSOC will interact with ITS Global, ITS UK, third-party support teams and member firms to resolve incidents. The matrix in Figure 2 shows the incident types and the responsibilities for managing the related interactions. These activities are carried out through the workflows described in the following sections.

Figure 2: Interactions matrix showing incident types and responsibilities

2.4.2          Incoming interactions (1.1b)

IT incidents within the GSOC may originate from various sources. How the GSOC processes an incident will depend on the type of source and the channel used to report it. Figure 3 shows the workflow for the various sources and the channels available to them.

Figure 3: Interactions with incident sources

Incidents will be identified either by personnel within the GSOC or by entities outside it. Incidents identified internally may be reported by email or entered directly into the Service Desk Tool by the person who identified the incident.

For incidents reported by email, the GSOC Service Desk will extract the information and create a ticket in the Service Desk Tool. This invokes the activities of the GSOC IT Incident Management Process.
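The extraction step performed by the GSOC Service Desk can be sketched with Python's standard email package. This is a minimal sketch assuming a single-part plain-text email; the ticket field names (short_description, reporter, description, channel) are hypothetical and do not reflect the Service Desk Tool's actual schema.

```python
from email import message_from_string
from email.utils import parseaddr


def ticket_fields_from_email(raw_email: str) -> dict[str, str]:
    """Map a single-part plain-text email onto hypothetical ticket fields."""
    msg = message_from_string(raw_email)
    if msg.is_multipart():
        # Out of scope for this sketch: multipart and HTML emails.
        raise ValueError("only single-part plain-text emails are handled")
    # parseaddr splits "Name <address>" into (name, address); keep the address.
    _, reporter_address = parseaddr(msg.get("From", ""))
    return {
        "short_description": (msg.get("Subject") or "").strip(),
        "reporter": reporter_address,
        "description": msg.get_payload().strip(),
        "channel": "email",
    }
```

Creating the ticket from the returned fields would use whatever interface the Service Desk Tool provides, which is not described in this process.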

Incidents identified by entities outside the GSOC have two possible channels: reporting directly through the Service Desk Tool account provided by ITS Global, or using one of the channels defined in the GSOC Communication Process.

Incidents that are identified by ITS Global, and which need to be resolved by the GSOC, will be escalated through the Service Desk Tool as specified in the GSOC Communication Process.

Depending on whether the source of an incident is external or internal to the GSOC, the channels available for reporting the incident will differ.

2.4.3          GSOC initiated interactions (3.4b)

The GSOC may initiate interactions with parties external to the GSOC. Figure 4 shows the workflow for incidents that have been determined to require support from such parties. The decision to seek external support, and which entity to involve, will be determined by the GSOC Service Support Model [7] and the interactions matrix defined in Section 2.4.1.

Figure 4: Interactions with external support mechanisms

During the IT incident investigation phase, an incident may be identified as requiring support from an external party. In such cases, the GSOC will record the incident in the Service Desk Tool. ITS Global will receive it and determine whether it needs to be routed to other entities, based on the support structures defined in the GSOC Service Support Model [7].

The GSOC will engage directly with the third parties if necessary to provide details as required. However, any updates will still be maintained within the Service Desk Tool. Communication with the third parties will be performed in line with the GSOC Communication Process and upon closing the incident, the Service Desk Tool will be updated as appropriate.

2.5              Critical IT Incident Management

The GSOC will use the workflow and follow the activities defined in the ITS Global Critical Incident Management Process [2] to manage critical incidents within the scope of the GSOC. The following sections define the necessary deltas.

2.5.1          Role mapping in Critical Incidents

The roles defined in the ITS Global Critical Incident Management Process [2] map to the GSOC roles as follows:

| Role | Process Owner | Critical Incident Manager | Technical Response Team (TRT) | Support Engineer | Support Group Lead |
|---|---|---|---|---|---|
| GSOC Director | Yes | | | | |
| GSOC Operations Manager | | Yes | | | Yes |
| Tooling Engineer | | | Yes | Yes | |
| 3rd Party Service Provider | | | Yes | | |
| L1/L2/L3 Analyst | | | Yes | | |

2.5.2          Approvals (3.12.10.a and 3.12.4.a)

The ITS Global Critical Incident Management Process [2] specifies that ITS Global will approve the business communication plans. However, when this process is applied within the GSOC context, all communications will be performed in accordance with the GSOC Communications Process and will be approved by the GSOC Director or another member of the GSOC delegated with the authority to do so.

2.5.3          Declaring a critical incident

The authority to declare a critical incident resides with the GSOC Operations Manager. The Tooling Engineer may recommend that an incident be declared critical if they believe it meets the criteria defined in the next section; however, the final decision rests with the GSOC Operations Manager.

Figure 5: Critical incidents workflow

2.5.4          Criteria for defining a critical incident

Any urgent-priority incident is a candidate for becoming a critical incident. The following additional criteria should be considered before declaring a critical incident:

  • The importance of the business function affected by the incident
  • The number and type of people affected
  • The number of Member Firms affected
  • The number and type of GSOC services affected
  • The elapsed time of an incident that would have a significant impact on GSOC employees or would result in an inability to sustain business for an extended period
  • An incident that could lead to a security incident if not resolved within a given time frame, where that time frame is nearing expiry or has expired
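The criteria above are judgement calls, but recording them explicitly can help the Tooling Engineer frame a recommendation for the GSOC Operations Manager. The sketch below reduces each criterion to a yes/no flag, which is a simplification, and assumes a hypothetical rule: an urgent-priority incident that meets at least one additional criterion is recommended for review. The final decision still rests with the GSOC Operations Manager.

```python
from dataclasses import dataclass


@dataclass
class CriticalityAssessment:
    """Yes/no flags for the criteria listed above (a deliberate simplification)."""
    urgent_priority: bool
    important_business_function: bool = False
    many_or_key_people_affected: bool = False
    multiple_member_firms_affected: bool = False
    multiple_gsoc_services_affected: bool = False
    elapsed_time_significant: bool = False
    security_deadline_near_or_passed: bool = False


def criteria_met(a: CriticalityAssessment) -> list[str]:
    """Return the names of the additional criteria that apply."""
    checks = {
        "important business function affected": a.important_business_function,
        "many or key people affected": a.many_or_key_people_affected,
        "multiple member firms affected": a.multiple_member_firms_affected,
        "multiple GSOC services affected": a.multiple_gsoc_services_affected,
        "elapsed time significant": a.elapsed_time_significant,
        "security deadline near or passed": a.security_deadline_near_or_passed,
    }
    return [name for name, met in checks.items() if met]


def recommend_critical(a: CriticalityAssessment) -> bool:
    """Hypothetical rule: urgent priority plus at least one additional criterion.

    This only frames the Tooling Engineer's recommendation; the GSOC
    Operations Manager makes the final decision.
    """
    return a.urgent_priority and len(criteria_met(a)) > 0
```

Returning the list of matched criteria, not just a yes/no answer, gives the GSOC Operations Manager the reasoning behind the recommendation when weighing cost against urgency and impact.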

Resolving critical incidents requires additional resources, so following the critical incident procedures increases costs. Good judgement is therefore expected when declaring a critical incident: the cost of support should be weighed against the urgency and impact of the incident on the GSOC.

2.5.5          Post incident review (3.12.18.a)

Once an incident has been closed, the Tooling Engineer and the GSOC Operations Manager will review it to identify opportunities to prevent it from recurring and to document any lessons learnt. They will schedule the review according to the criticality of the incident.

3                  References

[1] ITS Global Incident Management Process

[2] ITS Global Critical Incident Management Process

[3] Global Service Desk Prioritization and SLA Definition

[4] ITIL V3 Glossary of Terms, Definitions and Acronyms

[5] ITS UK Incident Management Process

[6] ITS UK Critical Incident Management Process

[7] GSOC Service Support Model