Abstract
This paper discusses contingency planning for an organization's IT resources. In particular, it discusses the six contingency planning steps; the recovery options for IT resources; the recommended testing requirements; and a recommendation for a proposed 24-month cycle business contingency test plan. The paper concludes that a contingency plan should cover all of an organization’s resources and that the contingency plan for its IT resources should complement and support those of the other resources.
Introduction
Contingency Planning Steps
Identification of Mission- or Business-critical Functions
Identification of the Resources that Support the Critical Functions
Anticipation of the Potential Contingencies or Disasters
Selection of the Contingency Planning Strategies
Implementation of the Contingency Strategies
Testing and Revision of the Contingency Strategies
Recovery Options
Processing Capability
Automated Applications and Data
Computer-based Services
Physical Infrastructure
Recommended Testing Requirements
Recommendation for a Proposed 24-Month Cycle Business Contingency Testing Plan
Conclusion
References
Service Restoration and Business Continuity
Introduction
Contingency planning for IT resources is important because disruptions in IT operations can disrupt the organization’s critical mission and business functions (NIST, 2007). Such disruptions can be caused by storms, fires, hardware failures, or power outages. Contingency planning is also closely associated with incident handling, which mostly addresses malicious technical threats such as viruses and hackers. According to Seese (2010), a good contingency plan not only enables the organization to survive a disaster but can also impress the organization’s customers, which in turn helps the organization make a profit.
This paper focuses on contingency planning for an organization’s IT resources and discusses the planning steps for handling these contingencies, the recovery options, the recommended testing requirements, and a recommendation for a proposed 24-month cycle business test plan.
Contingency Planning Steps
Contingency planning must consider contingencies of every kind: small and large, mundane and sensational (Seese, 2010). It consists of the following six steps: (1) identification of mission- or business-critical functions; (2) identification of the resources that support the critical functions; (3) anticipation of the potential contingencies or disasters; (4) selection of the contingency planning strategies; (5) implementation of the contingency strategies; and (6) testing and revision of the strategies (NIST, 2007).
Identification of Mission- or Business-critical Functions
The mission- or business-critical functions can be identified through the development of a business plan, which is then used to support contingency planning (NIST, 2007). The business plan serves not only to identify the critical missions and business functions but also to assign priorities to them. Since fully redundant capabilities for every function would be very costly, it is important to prioritize the most critical functions. This way, even if not all functions can continue during a disaster (i.e., a very destructive event), the organization can still operate without compromising its most important areas, which enables it to cope until the problem is resolved.
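The prioritization described above can be illustrated with a minimal sketch. The function names, the priority scale (1 = most critical), the cost figures, and the budget are all hypothetical and are not taken from the cited sources; the sketch only shows one simple way to decide which functions receive redundancy when not all of them can be covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    priority: int           # hypothetical scale: 1 = most critical
    redundancy_cost: float  # hypothetical cost of keeping the function running in a disaster


def select_protected_functions(functions, budget):
    """Protect functions in priority order, skipping any that no longer fit the budget."""
    protected = []
    remaining = budget
    for fn in sorted(functions, key=lambda f: (f.priority, f.redundancy_cost)):
        if fn.redundancy_cost <= remaining:
            protected.append(fn.name)
            remaining -= fn.redundancy_cost
    return protected


# Illustrative data only.
functions = [
    BusinessFunction("payroll", priority=2, redundancy_cost=40_000),
    BusinessFunction("order processing", priority=1, redundancy_cost=60_000),
    BusinessFunction("internal newsletter", priority=5, redundancy_cost=5_000),
    BusinessFunction("customer support", priority=3, redundancy_cost=30_000),
]
protected = select_protected_functions(functions, budget=100_000)
print(protected)
```

With these assumed figures, the budget is exhausted by the two highest-priority functions, so the lower-priority ones would have to wait until normal operations resume.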
Identification of the Resources that Support the Critical Functions
Next, the resources that support the critical functions must be identified, together with the time frames in which they are needed (daily, weekly, monthly, etc.) and the effects that their unavailability would have on the organization’s mission and operations (NIST, 2007). How the various departments’ resources interact to support the organization’s strategic objectives must also be determined, and non-IT resources must be included in the contingency plan. The people tasked with analyzing the needed resources must therefore understand how the functions are performed and how they depend on other resources in the organization. With this information, the organization is better able to prioritize its resources, especially since not all resources are essential to the operation of critical functions.
More specifically, six categories of resources must be analyzed: human resources; processing capability; automated applications and data; computer-based services; physical infrastructure; and documents and papers (NIST, 2007). The contingency planning teams should include personnel who represent these resource categories. In this regard, the teams must include a technology management group, a facilities management group, and the business-oriented groups, although they may also include people involved in public affairs, physical security, computer security, safety, training, personnel, and financial management.
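One way to make the dependency analysis concrete is to record which items rely on which resources and then trace the effect of losing a single resource. The following is a minimal sketch under assumed data: the resource names and the dependency map are hypothetical examples, not an inventory from the cited sources.

```python
from collections import deque

# Hypothetical map: resource -> items that directly depend on it.
dependents = {
    "power": ["data center"],
    "data center": ["database server", "LAN"],
    "database server": ["order processing"],
    "LAN": ["order processing", "payroll"],
}


def affected_by(resource, dependents):
    """Return everything that directly or indirectly depends on `resource`."""
    seen = set()
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for item in dependents.get(current, []):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    return seen


impact_of_lan_loss = affected_by("LAN", dependents)
impact_of_power_loss = affected_by("power", dependents)
print(sorted(impact_of_lan_loss))
print(sorted(impact_of_power_loss))
```

A resource whose loss reaches many critical functions, such as power in this assumed map, is a natural candidate for higher priority in the plan.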
Anticipation of the Potential Contingencies or Disasters
After identifying the resources that support the critical functions, it is necessary to identify the problems that could occur. Developing scenarios helps the organization create a plan that addresses the various problems that can take place, and these scenarios should include both big and small problems (NIST, 2007). The contingency planning teams must also conduct research and be imaginative so that a comprehensive set of scenarios can be developed.
Selection of the Contingency Planning Strategies
After developing the scenarios of when contingencies may be encountered, it is now necessary to create a plan on how to recover the necessary resources. When considering the various options for recovery, it is also important to identify the controls that are in place to minimize and prevent contingencies. Since not all contingencies can be prevented, the efforts to prevent and recover from such contingencies must be coordinated.
According to NIST (2007), contingency planning consists of three parts, namely “emergency response, recovery, and resumption.” Emergency response includes all the initial steps taken to minimize damage and to protect lives. Recovery pertains to the actions taken to continue support for the critical functions. Finally, resumption is the return to normal operations.
When selecting a recovery strategy, cost and feasibility must be considered for each of the resource categories. The risks should also be assessed so that the costs of the various options can be estimated and an optimal strategy developed. However, Kendrick (2009) asserts that it is better to deal with the effects of risks than with their causes, as risks are inevitable no matter how hard the organization tries to prevent them. Moreover, some of the causes of these risks cannot be controlled.
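The weighing of cost against risk can be sketched as a simple expected-cost comparison. This is only one possible way to frame the comparison, not a method prescribed by the cited sources, and every figure below (annual strategy cost, yearly probability of a disaster, loss, and the fraction of loss each option avoids) is a hypothetical assumption.

```python
def expected_annual_cost(strategy_cost, annual_probability, loss_if_unmitigated, loss_reduction):
    """Yearly strategy cost plus the expected yearly loss that remains after mitigation."""
    residual_loss = loss_if_unmitigated * (1 - loss_reduction)
    return strategy_cost + annual_probability * residual_loss


# Hypothetical figures: 5% yearly chance of a disaster costing 2,000,000 if unmitigated.
options = {
    "no strategy": expected_annual_cost(0, 0.05, 2_000_000, 0.0),
    "cold site": expected_annual_cost(20_000, 0.05, 2_000_000, 0.6),
    "hot site": expected_annual_cost(90_000, 0.05, 2_000_000, 0.95),
}
best = min(options, key=options.get)
print(options, best)
```

Under these assumed numbers the cheapest option overall is neither the most protective nor the least expensive one, which is why the cost estimates and the risk assessment need to be considered together.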
Implementation of the Contingency Strategies
In order to successfully implement the selected contingency strategies, the necessary preparations must be made, the strategies must be documented, and the employees must be trained (NIST, 2007). These tasks do not usually have to be performed in sequence but are mostly works in progress.
In particular, the preparations that need to be made are those that enable the contingency plan to support the resources and protect the critical functions. A common example is the development of procedures for backing up the organization’s applications and files. Another example is the preparation of the agreements and contracts that may be necessary for the implementation of a contingency strategy, which may include the renegotiation of existing service contracts and the purchase of equipment. In addition, information about the following must be obtained and kept: hardware assets; software assets; IT staff and human resources; system data assets; third-party service providers; and the network, printers, and other peripherals (Childs & Dietrich, 2003).
These preparations must be kept up-to-date, considering how rapidly computer systems, redundant equipment, and backup services change. Equipment must be regularly maintained and, if necessary, replaced, and people should be designated for the various tasks that need to be performed in the event of a contingency. It should also be decided how many contingency plans are to be created and who is responsible for creating them. Moreover, the contingency plan must be formally documented and kept up-to-date in order to reflect changes to the system and to the other factors involved. The plans must then be kept in a safe place and, if possible, at multiple sites or locations. Finally, the contingency plans must be written in procedural form and in simple language so that any employee can carry them out, especially when the person who created the plan is unavailable.
The employees must also be trained to perform their contingency-related tasks. Training should be provided to new personnel, and refresher training should be provided to existing personnel. Most importantly, the employees should be trained to respond effectively in emergencies, as these situations leave no room for double-checking the procedures. Emergency drills should also be performed to give the employees practice.
Testing and Revision of the Contingency Strategies
Testing should be performed regularly, both to detect or anticipate flaws in the implementation of the contingency plans and because the resources needed to support the organization’s critical functions are bound to change. For the same reason, responsibility for keeping the contingency plan updated should be assigned to specific people.
Some of the types of testing include simulations of disasters, analyses, and reviews (NIST, 2007). In particular, a review can be performed to ensure accuracy of the contingency plan’s documentation. This can be done while in the process of updating the documentation. As well, the review can include testing of whether the employees are familiar with the emergency procedures and testing of whether file restoration from backup tapes is possible.
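The file-restoration check mentioned above can be sketched as a comparison of checksums before backup and after restore. In this minimal sketch, plain file copies stand in for the organization's actual backup and restore jobs, and the file name and directory layout are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_test(source, backup_dir, restore_dir):
    """Back up a file, restore it, and confirm the restored copy matches the original."""
    backup_copy = Path(backup_dir) / Path(source).name
    restored_copy = Path(restore_dir) / Path(source).name
    shutil.copy2(source, backup_copy)         # stand-in for the real backup job
    shutil.copy2(backup_copy, restored_copy)  # stand-in for the real restore job
    return sha256_of(source) == sha256_of(restored_copy)


with tempfile.TemporaryDirectory() as workdir:
    work = Path(workdir)
    (work / "backup").mkdir()
    (work / "restore").mkdir()
    sample = work / "ledger.csv"  # hypothetical sample file
    sample.write_text("id,amount\n1,100\n")
    restore_ok = restore_test(sample, work / "backup", work / "restore")

print(restore_ok)
```

Comparing digests rather than file sizes or timestamps means the test fails if even a single byte of the restored file differs from the original.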
On the other hand, analyses of the entire contingency plan or of portions of it are best done by an employee who was not involved in the creation of the plan but who has a good understanding of the critical functions and the resources that support them. The analyst mentally follows the flow of the plan to ensure that the process has no flaws. The analyst can also help fill gaps by interviewing resource and functional managers and asking for their feedback.
Finally, contingency or disaster simulations not only help identify flaws in the contingency plan but also allow the employees to practice the emergency response procedures. Although these simulations can be quite costly, they are worthwhile, especially for the more critical functions. Examples include fire and earthquake drills.
Once the required tests have been identified, a test plan must be created and included in the contingency plan (Swanson et al., 2002).
Recovery Options
Although contingency plans should cover both the IT and non-IT resources, this section focuses on the recovery options particularly for the IT resources. More specifically, these resources are those in the categories of processing capability; automated applications and data; computer-based services, and the physical infrastructure (NIST, 2007). Although the physical infrastructure does not entirely consist of IT resources, the IT resources still need a stable physical infrastructure in which to be stored or in which to operate.
Processing Capability
Processing capability resources refer to the data centers, personal computers, workstations, minicomputers, and local area networks (NIST, 2007).
The options for recovering the processing capability include a hot site, a cold site, a redundant site, reciprocal agreements, and hybrids (NIST, 2007). A hot site is a building that is already equipped with processing capability and other services. A cold site is a building for housing processors that can easily be adapted for use. A redundant site is a site that has the same configuration and equipment as the primary site. As full redundancy can be very costly, partial redundancy is also possible, where some spare LAN servers or personal computers are maintained.
A reciprocal agreement is an arrangement in which two organizations agree to back each other up. However, this has proven to be an ineffective strategy, as personnel changes and difficulties in keeping plans and systems up-to-date can prevent both organizations from keeping their end of the agreement (NIST, 2007). Finally, hybrids are recovery options that combine the aforementioned alternatives. An example is having a hot site as a backup for a redundant site or a reciprocal agreement. It may also be necessary to create new contracts for the replacement of equipment.
Automated Applications and Data
These resources pertain to the applications and the hardware that are necessary for the processing of data (NIST, 2007). It should be ensured that the hardware and the applications are compatible, especially when the applications are run on various machines. As well, the compatibility between the application software and the operating system must be ensured, and this would include considerations for the compatibility of their configurations and versions.
Recovery options for these resources include offsite storage and regular backup. With these options, the frequency of performing the backup should be determined, as well as the frequency for storing data off-site and the manner by which data is transported to the other site.
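The two frequencies mentioned above, how often backups are made and how often they are moved off-site, can be written down as an explicit schedule. The sketch below assumes daily backups and a weekly off-site transfer; both intervals and the start date are hypothetical choices, not recommendations from the cited sources.

```python
from datetime import date, timedelta


def backup_schedule(start, days, backup_every=1, offsite_every=7):
    """List (day, action) pairs for a backup cycle that starts on `start`."""
    schedule = []
    for offset in range(days):
        if offset % backup_every != 0:
            continue  # no backup planned on this day
        day = start + timedelta(days=offset)
        if offset % offsite_every == 0:
            schedule.append((day, "backup + off-site transfer"))
        else:
            schedule.append((day, "backup"))
    return schedule


plan = backup_schedule(date(2024, 1, 1), days=14)
offsite_days = [day for day, action in plan if "off-site" in action]
print(offsite_days)
```

The gap between two off-site transfers bounds how much data could be lost if the primary site and its local backups were destroyed together, so the interval should reflect how much loss the critical functions can tolerate.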
Computer-based Services
While an organization uses many types of computer-based services for its operations, two of the most important computer-based services are the information and communications services (NIST, 2007). Communications include both data and voice communications while information services include information sources that are external to the organization. These can include bulletin boards, news services, and online private and public databases.
Service providers often offer contingency services, such as the rerouting of calls to a new location or the rerouting of traffic by data communications carriers. Some hot sites are also able to receive both voice and data communications. In addition, if one service provider is down, it may be possible to use another.
Physical Infrastructure
Physical infrastructure refers to the work environment, along with the necessary utilities and equipment (NIST, 2007). This includes resources, such as electricity, space, terminals, personal computers, fax machines, telephones, desks, sewage, water, venting, cooling, and heating.
As a contingency strategy, cold and hot sites can be used for office space. Alternatively, contractual arrangements can be made for security services, office space, and furniture when a contingency occurs. If one of the contingency strategies selected is offsite transfer, then procedures must be in place for the smooth transition either back to the main operating facility or to a new facility. In addition, the physical infrastructure’s protection must be included in the emergency response plan. Examples include the protection of the physical infrastructure from fire or from water damage.
Recommended Testing Requirements
Every element of the contingency plan must be tested to ensure that the individual recovery procedures are accurate and that the plan as a whole is effective (ISMF, n.d.). In particular, the following areas should be tested: notification procedures; restoration of normal operations; system performance with the use of alternate equipment; external and internal connectivity; coordination among the recovery teams; and system recovery on an alternate platform from the backup media (ISMF, n.d.).
The training provided to the employees with regard to their contingency tasks must also complement the testing. Training should be extensive enough to enable the employees to perform the contingency plan procedures without needing the actual guide. The employees should therefore be trained, and their knowledge tested, in the following areas: purpose of the plan; individual responsibilities; team-specific processes; security requirements; reporting procedures; and cross-team communication and coordination (ISMF, n.d.).
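Coverage of the testing areas listed above can be tracked with a simple checklist. The sketch below uses the six areas named in the text; the test results are hypothetical, and treating a failed test the same as a missing one is an assumption made here for simplicity.

```python
REQUIRED_AREAS = [
    "notification procedures",
    "restoration of normal operations",
    "system performance with alternate equipment",
    "external and internal connectivity",
    "coordination among recovery teams",
    "system recovery on an alternate platform from backup media",
]


def areas_needing_attention(results):
    """Return required areas that have no passing test on record."""
    passed = {area for area, ok in results.items() if ok}
    return [area for area in REQUIRED_AREAS if area not in passed]


# Hypothetical results: True = test passed, False = test failed.
results = {
    "notification procedures": True,
    "external and internal connectivity": False,
    "coordination among recovery teams": True,
}
gaps = areas_needing_attention(results)
print(gaps)
```

Keeping the required areas in one list makes it easy to see at a glance which parts of the plan have never been exercised.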
Recommendation for a Proposed 24-Month Cycle Business Contingency Testing Plan
The following tests are recommended for a 24-month cycle business contingency testing plan:
Conclusion
This paper discussed the steps that should be taken when planning for contingencies, particularly those that are related to an organization’s IT resources. More specifically, this paper discussed the planning steps that consist of identifying mission- or business-critical functions; identifying the resources that support these functions; anticipating the potential contingencies; selecting the contingency planning strategies; implementing these strategies; and testing and revising these strategies.
Since this paper focused on contingency planning for IT resources, the recovery options discussed were for the restoration of an organization's processing capability, automated applications and data, computer-based services, and the physical infrastructure. The recommended testing requirements were also discussed, along with a recommendation for a proposed 24-month cycle business contingency test plan.
Although this paper covered the main considerations for IT contingency planning, it should be kept in mind that all of an organization's resources must be considered in contingency planning, as they all interact with each other to support the organization's operations. As such, the IT resource contingency plan must complement the contingency plans for the other organizational resources.
References
Childs, D. R. & Dietrich, S. (2003). Contingency planning and disaster recovery: A small business guide. John Wiley & Sons.
ISMF. (n.d.). Contingency planning guide. Retrieved from http://www.moct.gov.sy/ICTSandards/en_pdf/5.pdf.
Kendrick, T. (2009). Identifying and managing project risk: Essential tools for failure-proofing your project. AMACOM Div American Mgmt Assn.
NIST. (2007, July 6). Chapter 11: Preparing for contingencies and disasters. In Special Publication 800-12: An Introduction to Computer Security - The NIST Handbook. Retrieved from http://csrc.nist.gov/publications/nistpubs/800-12/800-12-html/chapter11-printable.html.
Seese, M. (2010). Scrappy business contingency planning: How to bullet-proof your business and laugh at volcanoes, tornadoes, locust plagues, and hard drive crashes. Happy About.
Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J. & Thomas, R. (2002). Contingency
planning guide for information technology system: Recommendations of the National
http://www.au.af.mil/au/awc/awcgate/nist/sp800-34.pdf.