In this, the second in a series exploring Business Continuity Planning, we examine Steps 2 and 3: Risk Assessment and Strategy/Plan Development.
BCP continued...
In Part 1 we defined BCP and the five steps leading to business continuity assurance. We also examined Step 1, analyzing the business to identify critical people, processes, and technology. Once the analysis is complete, you're ready to move to Step 2, assessing risk.
Step 2 - Assess Risks
The first step in a BCP risk assessment is identifying internal and external threats. Next, look at the critical components of your organization to determine vulnerabilities of each to the identified threats. Finally, determine the business impact of the partial or complete loss of each critical operational component. Some of the areas to address include:
- Loss of short term revenue
- Loss of long term revenue
- Loss of investor confidence
- Loss of key employees
- Loss of facilities or other key fixed assets
As you work through these and other possible business issues, try to change a limited set of variables through the use of scenario planning. Scenario planning enables management and key employees to work through several kinds of business continuity interruptions, and helps determine if the team has considered all critical recovery requirements. Your list of scenarios might include:
- One or more facilities are untenable, but the information processing infrastructure is still operational. This could be caused by:
- Chemical spills
- Blizzard
- Floods
- One or more facilities no longer exist due to fire, hurricane, explosion, etc.
- All facilities are operational, but a supplier is temporarily shut down because of a catastrophic event
- The central data center is no longer operational, but all other facility functions are capable of normal operation
- In the case of a catastrophic event, many key employees or their families are affected. One or more of these employees might be unable to help with your recovery efforts. So the following questions should be answered as part of BCP (from John Burtle's Beware the Complex Plan...):
- Who is prepared to do what? What activities and conditions will they tolerate?
- Who is not prepared to do certain things?
- What are the general reservations, or things the entire team is reluctant to do?
- List what people can do above and beyond their normal duties. For example, who may have a 4-wheel drive vehicle or unique skills other than those used daily at the office?
Additional scenarios are found in John Burtle's Some thoughts on exercise scenarios and plot lines.
Using the results of scenario planning activities, build a quantitative or qualitative risk assessment chart. The resulting risk scores help with prioritization of process recovery.
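One common way to build a quantitative chart is to score each threat/component pair as likelihood multiplied by impact. The sketch below is illustrative only: the 1 to 5 scales, the sample entries, and the names `Risk` and `risk_score` are assumptions, not values or methods prescribed by this series.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    component: str   # critical operational component
    threat: str      # internal or external threat
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (minor) to 5 (severe); hypothetical scale


def risk_score(risk: Risk) -> int:
    """Score a risk as likelihood x impact (a common convention, assumed here)."""
    return risk.likelihood * risk.impact


# Illustrative entries only; real values come from your scenario planning.
risks = [
    Risk("Main office", "Blizzard", 4, 2),
    Risk("Central data center", "Fire", 2, 5),
    Risk("Order processing", "Supplier shutdown", 3, 3),
]

# Highest score first: these are the first candidates for recovery planning.
ranked = sorted(risks, key=risk_score, reverse=True)

for r in ranked:
    print(f"{risk_score(r):>2}  {r.component}: {r.threat}")
```

A qualitative chart works the same way with High/Medium/Low labels in place of numbers. Either way, the ordering matters more than the absolute scores.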
Finally, list all key processes in a matrix that includes, at a minimum, the following information:
- The process owner
- Key individuals required to produce the desired outcome(s)
- The technology required to execute each process and any manual tasks used as workarounds
- The maximum number of hours or days the organization can survive without the output of the process
- Any special considerations resulting from scenario planning
- Dependencies (what processes must be operational to support one or more other processes)
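The matrix described above can be kept as simple structured records. The following is a minimal sketch, assuming Python 3.9 or later; the field names, sample processes, and the `recovery_order` helper are hypothetical. It uses the standard library's `graphlib.TopologicalSorter` to derive an order in which every process is recovered only after the processes it depends on.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ProcessEntry:
    name: str
    owner: str
    key_people: list[str]
    technology: list[str]
    max_downtime_hours: int          # longest the business can go without the output
    depends_on: list[str] = field(default_factory=list)
    notes: str = ""                  # special considerations from scenario planning


def recovery_order(matrix: list[ProcessEntry]) -> list[str]:
    """Return process names so that dependencies come before their dependents."""
    graph = {p.name: set(p.depends_on) for p in matrix}
    return list(TopologicalSorter(graph).static_order())


# Illustrative entries only.
matrix = [
    ProcessEntry("Shipping", "Logistics manager", ["Dock lead"], ["WMS"], 24,
                 depends_on=["Order entry"]),
    ProcessEntry("Order entry", "Sales manager", ["Order clerk"], ["ERP"], 8,
                 depends_on=["Network"], notes="Paper order forms as workaround"),
    ProcessEntry("Network", "IS manager", ["Network engineer"], ["Core switch"], 4),
]

order = recovery_order(matrix)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if two processes depend on each other, which is itself a useful finding to resolve before an event occurs.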
After compiling all assessment information, you're ready to begin developing a recovery strategy and plan.
Step 3 - Strategy and Plan Development
Strategy
Before developing your recovery plan, review the risk assessment matrix. Select the appropriate business continuity strategy for each risk. The strategies you develop for each system directly impact recovery. Possible strategies fall into one of three categories:
- Accept the risk
- Transfer the risk
- Reduce the risk to an acceptable level
Accepting the risk means taking no steps to prevent or mitigate the impact of a continuity event. However, planning should include clear recovery steps, steps that minimize business interruption via quick, efficient recovery activities.
Transferring risk includes purchasing business interruption insurance. It's important not to be too short-sighted. Your insurance carrier might pay for short term losses, but you may never recover from the long term effects of the loss of customer or investor confidence.
Reducing risk is typically accomplished by reducing or eliminating vulnerabilities, including:
- A single point of failure, such as a server, router, switch, or firewall
- Lack of proper documentation to rebuild one or more components of a system
- Insufficient skills within the technical teams to quickly recover from system failures
- Lack of agreements with vendors that obligate them to respond within a defined time-frame
- Lack of an overall technology recovery plan, or the presence of an untested plan
- Lack of documented manual processes that can be initiated if automated systems fail
- Lack of cross-training programs that ensure more than one person possesses a critical skill set
- Lack of involvement of non-IS personnel in recovery testing
Strategies for dealing with these and other potential business continuity weaknesses can take many forms. For example, single points of failure can be mitigated by maintaining one or more duplicate components "on the shelf," helping reduce downtime by eliminating equipment acquisition cycles. Another method is implementing redundant components. This provides for minimal downtime through automatic fail-over, from a broken device to one that is either on standby or in a load balancing relationship. Another way to mitigate risk is including the proper maintenance of system build documentation in all project plans. Regardless of the vulnerabilities you identify, ensure you mitigate them so you can recover each system before maximum tolerable downtime is reached.
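To check whether a mitigation actually brings recovery within maximum tolerable downtime, you can add up the estimated duration of each recovery step and compare the total to the MTD. The sketch below assumes hypothetical step names, hour estimates, and a 24-hour MTD; the `meets_mtd` function is an illustration, not part of any standard.

```python
def meets_mtd(recovery_steps_hours: dict[str, float], mtd_hours: float) -> bool:
    """Return True if the summed recovery steps fit within the MTD."""
    return sum(recovery_steps_hours.values()) <= mtd_hours


MTD_HOURS = 24  # hypothetical MTD for the process this server supports

# No spare: the equipment acquisition cycle dominates recovery time.
no_spare = {"acquire replacement server": 72, "build server": 6, "restore data": 4}

# Spare "on the shelf": the acquisition cycle is eliminated.
shelf_spare = {"build server": 6, "restore data": 4}

# Redundant component with automatic fail-over: near-zero downtime.
redundant = {"automatic fail-over": 0.1}

for label, steps in [("no spare", no_spare),
                     ("shelf spare", shelf_spare),
                     ("redundant", redundant)]:
    total = sum(steps.values())
    status = "within MTD" if meets_mtd(steps, MTD_HOURS) else "exceeds MTD"
    print(f"{label}: {total} h, {status}")
```

With these assumed numbers, only the unmitigated case exceeds the MTD. The shelf spare is the cheaper fix, while redundancy buys the largest margin.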
Planning
Now that the risks are identified, and you've documented strategies for dealing with them, you're ready to build your recovery plans. The following are some recommended steps for creating a successful plan:
- Create a clear communication plan. When a business continuity event occurs, communication is probably the most important recovery activity. All stakeholders must be kept informed of the type of event, the impact on the business overall, and the impact on their teams or departments. Understanding the scope of an event helps managers determine the best course of action to maintain the critical processes for which they're responsible. Other points of contact for inclusion in the communication plan include:
- Fire services
- Law enforcement
- Shareholders
- Press
- Customers
- Insurance carriers
- Vendors
- Create recovery teams. Looking at your recovery requirements, create a team for each specific recovery area. For example, select a team of individuals who will travel to your hot site to rebuild your data center. Another consideration is a team assigned to set up a temporary office environment with phones, workstations, fax machines, and other office equipment necessary to perform day-to-day activities.
- Create easy-to-follow checklists. When first responding to a business continuity incident, your response teams shouldn't be encumbered with lengthy, verbose technical or process documentation. Rather, they should follow checklists that quickly guide them through the initial stages of recovery. Reacting quickly during the first few hours is critical to positioning your organization for a successful recovery. Completion of checklists should result in:
- Notification of critical personnel
- Identification of incident type
- Identification of incident scope
- Business impact mitigation
- Initiation of process and technology recovery efforts, if necessary
- Create system/process recovery documentation. In addition to lists of forms and other items necessary to implement temporary manual processes, this step requires the creation of detailed documentation that results in the recovery of all delivery systems. Examples include:
- Server and workstation build documents
- Application and data recovery documents
- Manual process instructions
- Plan for worst-case scenarios. Creating documentation for each possible scenario might not be practical, because your business continuity teams are usually engaged in day-to-day operational activities when they're not working on BCP. In such cases, develop all recovery documentation with the intent to recover from catastrophic events. If your teams are properly trained, they will be able to adapt the plans on the fly to lesser incidents. Regular testing will help develop the necessary awareness and flexibility.
In Part 3, we'll walk through how to test the plan.
Key Terms
Maximum Tolerable Downtime (MTD) - MTD is the period during which a specific business process can be down without significant or irrecoverable business impact. Every effort should be made to ensure a process is recovered before it exceeds its MTD.
Business Continuity Planning
Effective business continuity planning is necessary if you want to reduce the overall impact of inevitable system failures. The focus of this series is how to plan for system-level service interruptions.