Security Incident Management Process
This document is a process guide and reference for KPMG GSOC personnel for the security incident management process. The purpose of the process is to ensure rapid triage, accurate investigation, and effective response for the highest priority security incidents, while ensuring coverage (triage, investigation, and response) of as many lower priority incidents as possible.
This process document does not go into elaborate detail, provide low-level technical procedures, or address all potential outcomes or failure cases. It does not go into depth about Triage or Investigation, which are handled by separate processes. Analysts are expected to maintain and rely on technical work notes and controls to guide their execution of this process, and are expected to use their best judgment when minor adaptations are needed.
This document is ultimately owned by the GSOC Director. He or she is responsible for ensuring that this is updated and maintained in response to feedback from GSOC Analysts.
This document is intended for all GSOC members (see Section 2.8 – Responsibilities).
This document should be reviewed at least every 6 months, or at any time that the Constraints or Assumptions (see Appendix) are believed to have changed.
Exceptions to this process can be temporarily authorized by L3 analysts. Documentation of any process exceptions must be provided to GSOC management for potential process modifications.
Failure to adhere to this process must be reported to the lead shift L2 analyst, with a courtesy copy to the L3 and GSOC Operations Manager.
The following roles have overall responsibility for elements of this process. Please note that this is not a comprehensive listing of the responsibilities of each role, but represents each role’s specific responsibilities in support of the Security Incident Management process.
The GSOC Director is ultimately responsible for the proper functioning of the Security Incident Management process, and for ensuring that supporting processes are also healthy. He or she must ensure that this process is reviewed and adjusted on at least a 6-month basis, and authorize process adjustments on a more frequent basis if needed.
The GSOC L1 Analyst is the primary user of the Security Incident Management process. He or she will triage, investigate, and coordinate remediation of all non-escalated security incidents detected via automated or non-automated means, escalate when the Security Incident Management process is not functioning properly, and identify and communicate issues with the Security Incident Management process.
The GSOC L2 Analyst will accept escalations of security incidents when escalated to him or her, triage, investigate, and coordinate remediation of escalated security incidents, provide support to the L1 Analyst in the event of the Security Incident Management process not functioning properly, and identify and communicate issues with the Security Incident Management process.
The GSOC L3 Analyst provides day-to-day oversight of the proper functioning of the security incident management process, ensuring that L1 and L2 analysts are appropriately managing their assigned incident loads. He or she will accept escalations of security incidents when escalated to him or her, triage, investigate, and coordinate remediation of escalated security incidents, provide support to the L1 and L2 Analysts in the event of the Security Incident Management process not functioning properly, and identify and communicate issues with the Security Incident Management process.
The GSOC Tooling Engineer will ensure that security incident detection content will maximize the value of automation to improve the detection, investigation, and remediation of security incidents throughout the Security Incident Management process. This will include the inclusion of threat intelligence indicators and internal device information (e.g., active directory identities, VIP lists) into detection and prioritization content. He or she will also ensure that information about false positives identified by Analysts during the Security Incident Management process is incorporated into updated content rules.
The GSOC Threat/Intel Analyst will ensure that continuously updated threat intelligence (e.g., indicator lists) is provided to support GSOC Tooling Engineer-designed automation. The GSOC Threat/Intel Analyst will also provide internal GSOC intelligence notes as needed to support the Security Incident Management process.
- Content Management Process
- Context Management Process
- Detection Optimisation Process
- Triage Process
- Escalation Process
- Investigation Process
- Intelligence Management Process
- Reporting Service Process
- Content Management Process
- Detection Optimisation Process
- Triage Process
- Investigation Process
- Escalation process
The Security Incident Management process takes as input automated and manually detected security incidents, which are then triaged and investigated by GSOC analysts, and either identified as false positives, or provided to member firms for remediation and eventual closure. Throughout the triage and investigation phases, the GSOC may interact with member firms to support these processes. During the remediation phase, the GSOC supports the member firm’s remediation.
The goal of the security incident management process is to ensure that the following criteria have been met:
- Security Incidents are triaged and investigated in accordance with their respective processes.
- Properly prioritized security incidents are provided to member firms for remediation.
- Member firms are supported throughout the remediation process.
- Metrics are maintained which enable the tracking of the effectiveness and duration of incidents through each phase of the process.
The triage phase is conducted in accordance with the Triage Process. Triage is arguably the most important phase of the Security Incident Management Process, as it ensures that the highest-value security incidents are identified. Security incidents are expected to complete this phase rapidly, and to leave it triaged (prioritized, owner identified), identified as false positives and closed, or forwarded to the investigation phase.
The investigation phase is conducted in accordance with the Investigation Process. The investigation process absorbs the majority of effort of GSOC analysts. The output of the investigation phase is a security incident that has been sufficiently analysed to communicate remediation guidance to member firms, or has been identified as a false positive and closed.
This phase of the security incident management process is primarily executed by member firms. The GSOC is expected to provide sound initial guidance to member firms to support remediation, provide additional support in response to member firm concerns about the remediation process, and ensure accurate information-gathering following completion of remediation.
Several requirements support a successful remediation process, including:
- Consistent, predictable communication to member firms (governed by the Communication Process)
- Sufficient context and remediation guidance provided to member firms to enable successful remediation (described below)
- Consistent information-collection from member firms to both allow confirmation of successful incident remediation (or false positive identification), and to gather and record any information associated with the security incident.
To ensure that member firms are empowered to appropriately resolve security incidents, they require sufficient context about the security incident to respond effectively. The following information must be provided to member firms as part of the initial hand-off of a security incident for remediation. Information is subdivided into essential, optional, and prohibited.
Essential Security Incident Remediation Context
- Original Raw Security Event. Any security event information that was originally used to detect the security incident from member firm data sources (i.e., the raw log event from Security Analytics forwarded by a member firm log collector).
- Associated Raw Security Events. Additional security event information identified during triage or investigation that is believed to be associated with the security incident, and was directly derived from member firm data sources (i.e., additional raw log events from Security Analytics forwarded by a member firm log collector). In some cases, such as when there is a very large number of associated log events, it may be more effective to provide a summary of the additional data, rather than forward them to the member firm.
- Content Rule Description. A brief description of the content rule used to detect the event and define it as an incident.
- Priority Reasoning. Priority of the security incident, and the specific criteria that were used to choose the priority. Often, this will be contextual data about the member firm (identity of VIPs, admins, or high-priority systems) that resulted in a security incident receiving higher prioritization. Providing this as part of the remediation process is essential to allow the member firm to refute or refine the prioritization if the information is incorrect or incomplete.
- Supporting Intelligence Source. If the Security Incident was identified due to it being on an intelligence feed, or if corroborating information was derived from an intelligence feed, the source of the information should be included in the context. This intelligence may be open source, commercial, or internal. In the event that the intelligence source is TLP:Red, then that fact should be stated and the intelligence information not provided as part of the initial security incident context.
Optional Security Incident Context
- Possible False Positive Information. Where the analyst believes that this security incident could be identified as a false positive, additional information that would help confirm or refute this should be provided.
- Association with Other Security Incidents. In the event that this incident is part of a pattern with this particular member firm, or part of a broader/larger event, then information about previous security incidents (tracking numbers, etc.) should be provided if the context could help the member firm better understand next steps associated with the incident. Note: references to security incidents in other member firms should not be provided in detail; specifically, the member firm details are not to be provided.
Prohibited Security Incident Context
- Potentially privacy-related information. In the event that the security incident includes information that the security analyst believes may raise privacy issues (i.e., identification of a specific user’s personal information), this information should not be included in the initial context. A reference to the potentially privacy-related information should be included, but the actual data should not be provided.
- Security event log information from other member firms. Under no circumstances should event log information from other member firms be sent as part of the information to a member firm. Reference to these log events can be provided (i.e., stating that the incident is similar to that experienced by other member firms), but must not be included as context.
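The essential, optional, and prohibited categories above can be sketched as a simple hand-off record with a completeness check. This is a minimal illustration only: the class name, field names, and helper functions are hypothetical and do not correspond to any existing GSOC tool, and the TLP:Red handling reflects one reading of the rule above (state that the source is TLP:Red, withhold the intelligence itself).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HandoffContext:
    # Essential items (hypothetical field names)
    original_raw_event: str
    content_rule_description: str
    priority: str
    priority_reasoning: str
    associated_events: List[str] = field(default_factory=list)
    intelligence_source: Optional[str] = None
    intelligence_is_tlp_red: bool = False
    intelligence_detail: Optional[str] = None
    # Optional items
    false_positive_notes: Optional[str] = None
    related_incident_ids: List[str] = field(default_factory=list)


ESSENTIAL_TEXT_FIELDS = (
    "original_raw_event",
    "content_rule_description",
    "priority",
    "priority_reasoning",
)


def missing_essentials(ctx: HandoffContext) -> List[str]:
    """Return the names of essential text fields that are empty."""
    return [name for name in ESSENTIAL_TEXT_FIELDS if not getattr(ctx, name).strip()]


def intelligence_for_handoff(ctx: HandoffContext) -> Optional[str]:
    """Return the intelligence text that may be sent to the member firm."""
    if ctx.intelligence_source is None:
        return None
    if ctx.intelligence_is_tlp_red:
        # Assumed reading: state the TLP:Red fact, withhold the intelligence.
        return "Intelligence source is TLP:Red; details withheld"
    if ctx.intelligence_detail:
        return f"{ctx.intelligence_source}: {ctx.intelligence_detail}"
    return ctx.intelligence_source
```

A record with no fields for privacy-related data or other member firms' logs keeps the prohibited items out of the hand-off by construction; an analyst would still need to check free-text fields manually.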
Member firms may not always have sufficient internal capability to determine next steps to address remediation. The GSOC must provide a recommendation for remediation steps. In general, initial remediation guidance will be high level, but follow-on remediation guidance may be more detailed in response to member firm feedback or support requests.
Key Remediation Considerations
It is important to remember that the GSOC has no authority over member firms. The GSOC can recommend remediation steps, but cannot mandate or direct them. If a member firm refuses to remediate an incident, chooses a different remediation mechanism, or refuses to provide information requested, this fact should be documented in the incident record.
Other potential outcomes include a member firm responding to a request by stating that a security incident is a false positive, with no additional information provided to explain why it is a false positive. In this case, at least one request for supporting data must be made. If the response to the request is still insufficient, the security incident should still be documented as a false positive, and the incident record annotated to reflect the member firm response.
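The handling of an unexplained false positive claim can be summarised as a small decision function. This is a sketch under assumptions: the function name, parameters, and returned action labels are hypothetical and only restate the rule in the preceding paragraph.

```python
def next_action_for_fp_claim(has_supporting_data: bool, requests_made: int) -> str:
    """Decide the next step when a member firm declares a false positive."""
    if has_supporting_data:
        # The firm explained the claim: close normally as a false positive.
        return "close_as_false_positive"
    if requests_made < 1:
        # At least one request for supporting data must be made.
        return "request_supporting_data"
    # Still insufficient after a request: close, and annotate the record
    # to reflect the member firm response.
    return "close_as_false_positive_with_annotation"
```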
Initial Remediation Guidance
- Proportional. The remediation guidance should be proportional to the security incident priority. Higher priority security incidents require higher levels of confidence regarding remediation, and should also include more detail.
- High-level. The initial remediation guidance should be no more than 4-5 bullets, with an offer of additional support or detailed guidance if desired by the member firm.
- Data requests. In the event that there is specific information required by the GSOC to either enhance intelligence, confirm remediation, or support other incident investigations, this request should be included in the initial guidance.
Follow-on Remediation Support
The GSOC is expected to support member firm requests for assistance with additional detail/information required to support remediation. In general, this is not expected to involve exhaustive, detailed guidance. Remediation guidance should clarify or expand on previously provided recommendations.
In the event that a member firm is requiring too much analyst effort to effectively remediate the security incident, and this is conflicting with security analyst efforts on other incidents, then this should be escalated as needed to resolve.
Remediation Confirmation
Following report of closure from the member firm, there should be a quick review of log events to confirm that there is no evidence that the security incident remains open. If there has been any data provided by the member firm regarding the remediation closure, this should be reviewed to see if there is any conflicting information which could indicate a failure to remediate the security incident. If the analyst believes there is sufficient data to indicate a failure to successfully remediate the security incident, then the analyst should contact the member firm to communicate the probable remediation failure, and continue remediation support.
In general, confirmation that an incident has been successfully remediated is the minimum response required for security incident closure. However, additional data may sometimes be needed.
- False Positive Feedback. In the event that the member firm declares a security incident as a false positive, it is important to gather feedback from the member firm to confirm what specifically allowed it to be confirmed as a false positive, and ensure that feedback is included as input to the tooling engineer as part of the content management and detection optimization process.
- Context Feedback. Often, the member firm’s response will reveal that the GSOC’s understanding of the member firm is flawed, and that there is some misunderstood context associated with the member firm. In this case, the analyst should request supporting/amplifying information from the firm and ensure that feedback is recorded internally in accordance with the context management process.
There are multiple potential contingencies that will invoke exceptions to the security incident management process.
There are two major mechanisms that will end or modify the security incident management process at any point within the process. At any point (triage, investigation, or remediation), additional information may be discovered or provided which either identifies the security incident as a false positive, or results in a change in prioritization.
In the case of false positives, the details behind the false positives should be documented when discovered, and before security incident closure. If the reason for the false positive is easily solved by a content change, a content change request should be initiated in accordance with the content management process.
In the case of priority modifications, the modification will result in a shift of the incident handler, from L1 to L2 or vice versa. The reason for the priority modification should be documented in the ticket and, where appropriate, a change to content recommended via the content management process.
In many situations, there may be a significant backlog of security incidents which remain in investigation and remediation, beyond the analyst team’s ability to resolve the incidents in a reasonable timeframe. In this situation, the priority for operations is as follows:
- Triage. Ensure that at least one analyst continues to triage new security incidents as they come in, checking the new incident queue as often as possible, no less than once per hour.
- Prioritize. Handle existing security incidents in priority order first. Higher priority security incidents (P1 over P2, P2 over P3), regardless of whether they are in investigation or remediation, will always have priority of effort.
- Remediate. If security incidents are of equivalent priority, remediation phase activities always take priority over investigation phase activities.
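The Prioritize and Remediate rules above amount to a two-part sort key: priority first, then phase. The sketch below illustrates that ordering only; the `Incident` record and its field names are hypothetical, and continuing triage of new incidents is assumed to be handled separately by the designated analyst.

```python
from dataclasses import dataclass
from typing import List

# Lower rank is handled first.
PRIORITY_RANK = {"P1": 1, "P2": 2, "P3": 3}
# At equal priority, remediation work precedes investigation work.
PHASE_RANK = {"remediation": 0, "investigation": 1}


@dataclass
class Incident:
    incident_id: str
    priority: str  # "P1", "P2", or "P3"
    phase: str     # "remediation" or "investigation"


def backlog_order(incidents: List[Incident]) -> List[Incident]:
    """Return the backlog sorted by priority, then by phase."""
    return sorted(
        incidents,
        key=lambda i: (PRIORITY_RANK[i.priority], PHASE_RANK[i.phase]),
    )
```

Because Python's sort is stable, incidents with the same priority and phase keep their original relative order, for example their arrival order.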
In the event that a security incident backlog is growing at a rate that prevents continued periodic triage at least once per hour, that P2 or P1 incidents are unable to be acted on, or that P3 incidents remain unopened or unhandled by GSOC staff for one entire shift (12 hours), it is appropriate to escalate to GSOC management to resolve the impasse.
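The three escalation triggers in the preceding paragraph can be expressed as a simple check. This is a minimal sketch: the function name, parameters, and reason labels are hypothetical, and how "unable to be acted on" is measured (here, a count supplied by the caller) is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional

MAX_TRIAGE_GAP = timedelta(hours=1)   # triage at least once per hour
SHIFT_LENGTH = timedelta(hours=12)    # one entire shift


def escalation_reasons(
    now: datetime,
    last_triage_check: datetime,
    blocked_p1_p2_count: int,
    oldest_unhandled_p3_at: Optional[datetime],
) -> List[str]:
    """Return the backlog conditions that justify escalating to management."""
    reasons = []
    if now - last_triage_check > MAX_TRIAGE_GAP:
        reasons.append("triage_gap")
    if blocked_p1_p2_count > 0:
        reasons.append("high_priority_blocked")
    if oldest_unhandled_p3_at is not None and now - oldest_unhandled_p3_at >= SHIFT_LENGTH:
        reasons.append("p3_unhandled_full_shift")
    return reasons
```

An empty list means none of the stated conditions is met; any non-empty result indicates that escalation to GSOC management is appropriate.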
Contingency mechanisms to resolve large security incident backlogs include the following methods. These all require authorization from various levels of GSOC Management (L3, Operations Manager, or Director).
These contingency mechanisms are not considered to be “business as usual” but represent a departure from normal operations in response to an unusual event or significant failure in technology or process. The use of these mechanisms should be avoided unless deemed to be absolutely necessary.
- Group Incident Remediation. In the event that there are a large number of similar incidents affecting a common member firm, a large group may be simultaneously handed off to the member firm and remediated as a group. This requires L3 authorization.
- Group Incident Closure. Large numbers of similar lower priority incidents that have a high likelihood of being false positives may be closed without remediation or hand-off to a member firm. This requires GSOC Manager of Operations authorization.
- Group Incident Deletion. In extreme cases (content error, major outbreak), large numbers of incidents may have to be arbitrarily deleted (from SECOPS, not the SAW) to support adequate backlog management. This requires GSOC Director authorization. This is an extremely unusual case, warranted only if the number/type of security incidents is such that their continued presence in SECOPS dramatically undermines the value of SECOPS statistics or affects system stability.
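The authorization requirements for the three contingency mechanisms can be captured in a lookup table. This is an illustrative sketch: the mechanism keys and the helper function are hypothetical, and the check requires an exact role match because the process does not state whether a more senior role may authorize on behalf of a junior one.

```python
# Authorizing role per contingency mechanism, as listed above.
REQUIRED_AUTHORIZER = {
    "group_incident_remediation": "L3",
    "group_incident_closure": "GSOC Manager of Operations",
    "group_incident_deletion": "GSOC Director",
}


def is_authorized(mechanism: str, approver_role: str) -> bool:
    """Return True if the approver holds the role required for the mechanism."""
    required = REQUIRED_AUTHORIZER.get(mechanism)
    # Unknown mechanisms are never authorized.
    return required is not None and approver_role == required
```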
In some cases, remediation backlogs associated with specific member firms may grow without closure/response by member firms. Significant growth of remediation backlogs may require the escalation process to resolve. Authority constraints of the GSOC mean that it is not the responsibility of the GSOC to force member firms to resolve open backlogs.
The primary feedback mechanisms for the Security Incident Management Process are:
- Overall Effectiveness. Reporting Service Process
- Content Issues. Content Management Process
- False Positive Rates. Detection Optimization Process
- Context Issues. Context Management Process
- System Issues. IT Incident Management Process
- Process Issues. Continuous Improvement Process, SECOPS Customization Process
- Constraints and Assumptions
The purpose of this appendix is to describe significant constraints and assumptions that are the key drivers for the design and content of this process. The purpose of identifying these key constraints and assumptions is to ensure that, when constraints change or assumptions are disproven, the processes are examined to ensure that they still apply and are optimized for the goals of the GSOC.
The current expectation for steady state false positive rates is that no more than 50% of all automated security incident detections that enter the Security Incident Management process will be false positives, and that no more than 5% of all security incidents passed to member firms will be false positives. These false positive rate requirements will significantly influence analyst decisions during triage, and during any follow-on analysis. These represent proposed limits, and may change over time. It is expected that, during the on-boarding process and during initial operations, false positive rates will fluctuate significantly prior to stabilizing. These expected rates represent long-term goals.
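The two steady-state limits can be checked with simple arithmetic. The sketch below assumes that incident counts are available from reporting; the function names and parameters are hypothetical, and the thresholds are the proposed limits stated above.

```python
from typing import Dict

MAX_DETECTION_FP_RATE = 0.50  # automated detections entering the process
MAX_HANDOFF_FP_RATE = 0.05    # incidents passed to member firms


def fp_rate(false_positives: int, total: int) -> float:
    """Return the false positive fraction, or 0.0 when there are no incidents."""
    if total == 0:
        return 0.0
    return false_positives / total


def within_targets(
    detected_total: int,
    detected_fp: int,
    handed_off_total: int,
    handed_off_fp: int,
) -> Dict[str, bool]:
    """Report whether each false positive rate is within its proposed limit."""
    return {
        "detection": fp_rate(detected_fp, detected_total) <= MAX_DETECTION_FP_RATE,
        "handoff": fp_rate(handed_off_fp, handed_off_total) <= MAX_HANDOFF_FP_RATE,
    }
```

For example, 90 false positives out of 200 detections (45%) is within the detection limit, while 6 false positives out of 100 handed-off incidents (6%) exceeds the hand-off limit.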
The KPMG GSOC must conduct investigations following initial detection so that sufficient information can be passed to a member firm for it to know what action to take.
The KPMG GSOC may not take on Security Incident Remediation/Response efforts. This is the sole responsibility of member firms; the KPMG GSOC may only provide advisory support to remediation/response.
The current definition for P4 security incidents will likely result in a very large number of P4 security incidents. Accordingly, it is not expected that P4 priority security incidents will be included in the Security Incident Management process. It may well be that, during triage, a P4 security incident is associated with a P3 or higher security incident that is being examined for triage, but it is not intended to be the subject of the Security Incident Management process.