Business Continuity Plan for MDL
Introduction
Military Delivery Logistics (MDL) is a certified, U.S. Department of Defense approved contractor that provides the full range of logistics and product distribution services to Federal, Defense, Intelligence, State, and Local government customers. Its vendor network gives access to more than 200,000 military-standard and custom-engineered products. MDL is based in Santa Monica, California. It relies on a complex IT infrastructure to deliver its logistics services and holds many terabytes of critical data, of which about five terabytes are mission critical and must be highly available. Santa Monica is in Los Angeles County, California, the county with the largest number of declared disasters since 1964.
Business Continuity Plan
Business continuity planning (BCP) consists of creating and validating a plan that will maintain business as usual (BAU) before, during, and after a disruption or a disaster. It focuses on managing exposure to the internal and external threats that can disrupt a business, through risk management and the documentation of plans and processes to maintain BAU. The key drivers for BCP are how much disruption the organization can tolerate and the budget that is available. Disaster Recovery (DR) is a part of business continuity and is focused on the people, technologies, and processes involved in the critical business operations. While DR mostly covers the IT portion of business continuity, it also includes non-technology assets, people, and processes. Some of the components of DR planning merge with those of BC planning. Figure 1 shows the relationship between BCP and DRP. BC/DR planning includes 1) Project initiation, 2) Risk assessment, 3) Business impact analysis, 4) Mitigation strategy development, 5) Training, testing, and auditing, and 6) Plan maintenance.
Figure 1: Business continuity and disaster recovery cycle
Source:
Though BCP and DRP can be taken up as separate projects, it is advisable to manage them as one integrated project, as there might otherwise be issues at the handover points. Success requires the support of top management, the involvement of the users, an experienced project manager, clearly defined scope and objectives, and sound project management processes. Since the BC/DR plan addresses all the critical aspects of the organization and needs the participation of members from all key areas, top management backing is needed to pull people away from their other tasks so that they can contribute to this project. Defining clear objectives for the BC/DR planning project is important; in this case, the objective is to protect the five terabytes of mission critical data and maintain BAU seamlessly. Some of the plans that need to be developed as part of BC/DR planning are: 1) Business Continuity Plan (BCP), 2) Business Recovery or Resumption Plan (BRP), 3) Continuity of Operations Plan (COOP), 4) Continuity of Support Plan or IT Contingency Plan, 5) Disaster Recovery Plan (DRP), 6) Crisis Communication Plan, 7) Occupant Emergency Plan (OEP), and 8) Cyber Incident Response Plan.
Risk Assessment
The risk management process completed earlier would have identified the assets, completed the threat analysis and vulnerability assessment, and populated a risk register with mitigation actions. The risk register would include the threats, the current controls, and the risk rating based on threat likelihood and impact. After this process, some residual risk remains, for which contingency plans have to be put in place. Threats that can be considered are: 1) Natural (such as hurricane, tornado, earthquake, flood, and fire), 2) Human (such as human error, sabotage, injection of malicious code, and terrorist attack), and 3) Environmental (such as equipment failure, software bugs, and outages).
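The way a risk register can surface residual risk may be easier to see in a small sketch. The example below assumes the common convention of scoring risk as likelihood multiplied by impact on 1 to 5 scales; the `mitigation` fraction, the threshold of 5.0, and all of the sample values are hypothetical and are not taken from MDL's actual register.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    threat: str
    category: str      # "natural", "human", or "environmental"
    likelihood: int    # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int        # assumed scale: 1 (negligible) to 5 (catastrophic)
    mitigation: float  # assumed fraction of risk removed by current controls (0.0 to 1.0)

    def inherent_risk(self) -> int:
        # Risk before controls, using the likelihood x impact convention.
        return self.likelihood * self.impact

    def residual_risk(self) -> float:
        # Risk that remains after the current controls are applied.
        return self.inherent_risk() * (1.0 - self.mitigation)


# Hypothetical sample entries, one per threat category named in the text.
register = [
    RiskEntry("earthquake", "natural", 3, 5, 0.4),
    RiskEntry("malicious code injection", "human", 4, 4, 0.75),
    RiskEntry("equipment failure", "environmental", 4, 3, 0.5),
]


def needs_contingency(entries, threshold=5.0):
    """Return threats whose residual risk meets the threshold, highest first."""
    ranked = sorted(entries, key=lambda e: e.residual_risk(), reverse=True)
    return [e.threat for e in ranked if e.residual_risk() >= threshold]


print(needs_contingency(register))
```

With these illustrative numbers, the earthquake and equipment failure entries stay above the threshold after controls, so they are the ones that would need contingency plans.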
Business Impact Assessment
Business Impact Assessment (BIA) identifies the critical business functions and the impact of losing those functions. MDL is a logistics company that helps vendors supply material to the military in both peacetime and wartime. In the case of a war, it is very important that the military has the required material at the right location at the right time, which requires a great deal of coordination. The software applications that run this coordination are mission critical, as any downtime in them could mean that the U.S. could lose a war. Even when there is no war, these systems are critical because the entire system should always be in a ready state. Since the security of the nation is at stake, these systems are extremely critical and should be highly available with no downtime. The information security triad (CIA) of Confidentiality, Integrity, and Availability is highly applicable to these applications.
The management has to decide on the maximum tolerable outage (MTO), also called the maximum tolerable downtime (MTD), for these applications. The interdependencies are identified and similar MTOs (or MTDs) are applied to the dependent systems. The impact on operations is assessed, priorities are identified, recovery time requirements are developed, and the financial, legal, and operational impacts are identified. Apart from the mission critical components, other functions are categorized as essential (vital), necessary (important), and desirable (minor). Figure 2 shows the business recovery timeline, where RTO is the recovery time objective (the time available for recovering the systems), WRT is the work recovery time required to verify system and data integrity, and RPO is the recovery point objective (the extent of data loss that can be tolerated, usually expressed as the time between the last backup and the current state of the data). In the present scenario, the RPO is zero, so the data should be highly available and highly redundant and should have no single point of failure, as any loss could have a catastrophic impact on wartime efforts.
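As a rough illustration of how these objectives relate, the sketch below models the timeline with Python's `timedelta`. It follows the usual convention for this kind of timeline that MTD is the sum of RTO and WRT. The 15-minute RTO and 45-minute WRT are hypothetical values chosen for the example; only the zero RPO comes from the scenario above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # time allowed to recover the systems
    wrt: timedelta  # time allowed to verify system and data integrity
    rpo: timedelta  # maximum tolerable window of lost data

    @property
    def mtd(self) -> timedelta:
        # Conventional relation: MTD = RTO + WRT.
        return self.rto + self.wrt


def missed_objectives(obj, restore_time, verify_time, data_loss_window):
    """Return the names of the objectives missed in a real or simulated outage."""
    missed = []
    if restore_time > obj.rto:
        missed.append("RTO")
    if verify_time > obj.wrt:
        missed.append("WRT")
    if restore_time + verify_time > obj.mtd:
        missed.append("MTD")
    if data_loss_window > obj.rpo:
        missed.append("RPO")
    return missed


# Hypothetical targets; only RPO = 0 is stated in the scenario.
mission_critical = RecoveryObjectives(
    rto=timedelta(minutes=15),
    wrt=timedelta(minutes=45),
    rpo=timedelta(0),
)

# A drill that recovered in time but lost five seconds of data still fails RPO.
print(missed_objectives(
    mission_critical,
    restore_time=timedelta(minutes=10),
    verify_time=timedelta(minutes=30),
    data_loss_window=timedelta(seconds=5),
))
```

The example makes the consequence of a zero RPO concrete: any measurable data loss counts as a miss, which is why periodic backups alone cannot satisfy it and continuous replication is needed.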
Figure 2: Business recovery timeline
Source:
Mitigation strategy development
The mission critical nature of the applications and data, as well as the strict requirement to preserve the CIA triad, has resulted in the adoption of virtualization-based (cloud) disaster recovery for MDL. A virtual server encapsulates everything, including the operating system, applications, and data, so it can be copied or backed up to an offsite data center in minutes. An alternative hot site with the minimum required infrastructure is maintained. In the event of a disaster, the virtual server is immediately transferred to the second site accurately and safely, without having to reload each component of the server. This cloud-based disaster recovery can also replicate the full network, saving the time otherwise needed to configure the network, firewall rules, VPNs, and VLANs. This drastically cuts the time needed to start operations at the alternative site. Redundancy ensures that if the outage is a local or component outage rather than a site-wide outage, the alternative components are brought online through failover procedures. Adopting disaster recovery as a service (DRaaS) avoids upfront costs, as it follows a pay-per-use model.
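The distinction between a component outage and a site-wide outage can be sketched as a simple decision function. This is only an illustration under stated assumptions: the health map, the component names, and the rule that "no healthy instance of any component" means a site-level failover are all hypothetical, and a real DRaaS platform would apply its own health checks and policies.

```python
from enum import Enum
from typing import Dict, List


class Action(Enum):
    NONE = "no action"
    LOCAL_FAILOVER = "fail over to a redundant component at the primary site"
    SITE_FAILOVER = "start the replicated virtual servers at the hot site"


def choose_action(health: Dict[str, List[bool]]) -> Action:
    """Pick a response from per-component health flags.

    `health` maps a component name to the health of each redundant
    instance at the primary site (True means healthy).
    """
    # Assumed rule: if any component has no healthy instance left,
    # local redundancy is exhausted and the alternate site takes over.
    if any(not any(instances) for instances in health.values()):
        return Action.SITE_FAILOVER
    # Some instance failed, but every component still has a healthy spare.
    if any(not all(instances) for instances in health.values()):
        return Action.LOCAL_FAILOVER
    return Action.NONE


# Hypothetical snapshot: one database node is down, its spare is healthy.
snapshot = {"database": [False, True], "firewall": [True, True]}
print(choose_action(snapshot).value)
```

Keeping the site-level check first reflects the priority in the text: local failover is only a valid answer while every component still has a working alternative.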
Other risk mitigation strategies were developed as part of risk management and include the establishment of physical, technical, and administrative controls. Alternate site selection is a multi-criteria decision making (MCDM) problem, for which methods such as the Decision Making Trial and Evaluation Laboratory (DEMATEL) and the Analytic Network Process (ANP) are used. There are three types of recovery options: 1) As needed, 2) Prearranged, and 3) Pre-established.
As needed: Resources are acquired at the time of the disaster at the prevailing market rates. No contracts are required beforehand. However, in the case of a widespread disaster such as an earthquake, material prices may skyrocket, or the material may not be available at any price.
Prearranged: Arrangements are made in advance, through vendor agreements, for the quick shipment of supplies when they are needed. The contracts could be at a higher price, but costs can be contained because they are known in advance and provisions can be made. The contracts should be with reputable vendors who can deliver, and they should have strict SLAs.
Pre-established: The alternative options are purchased, configured, and implemented in advance and are used only to recover from a disaster. This may be the cheaper option initially, as the purchases can be timed, but in the long run it might prove expensive, as these systems have to be upgraded on a par with the production systems and then discarded as they become obsolete, possibly without ever having been used. The options here include a cold site (which is started up from nothing), a warm site (where the infrastructure can be used for testing and other non-production uses), and a hot site (which can be used for load balancing production services). Other types of alternate sites are fully mirrored sites, mobile sites, and reciprocal sites.
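To make the multi-criteria nature of alternate site selection concrete, the sketch below ranks the three pre-established site types with a plain weighted-sum score. This is deliberately a much simpler stand-in for DEMATEL or ANP, which also model interdependencies between criteria. The criteria, weights, and ratings are hypothetical illustrations, with ratings on a 1 to 10 scale where higher is better.

```python
def rank_sites(sites, weights):
    """Rank candidate sites by weighted-average score, best first."""
    total_weight = sum(weights.values())
    scores = {
        name: sum(weights[c] * ratings[c] for c in weights) / total_weight
        for name, ratings in sites.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights: recovery speed dominates because the RPO is zero.
weights = {"recovery_speed": 5, "cost": 2, "distance_from_primary": 3}

# Hypothetical ratings (1 = worst, 10 = best) for each site type.
sites = {
    "hot site": {"recovery_speed": 9, "cost": 3, "distance_from_primary": 6},
    "warm site": {"recovery_speed": 6, "cost": 6, "distance_from_primary": 6},
    "cold site": {"recovery_speed": 2, "cost": 9, "distance_from_primary": 6},
}

for name, score in rank_sites(sites, weights):
    print(f"{name}: {score:.1f}")
```

Under these assumed weights the hot site ranks first despite its cost, which matches the choice described for MDL. Changing the weights to favor cost would reorder the ranking, which is the sensitivity that a formal MCDM method is meant to examine.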
BC/DR phases
Figure 3: Phases of business continuity and disaster recovery
Source:
The activation phase of the BC/DR plan covers the period during and immediately after the disaster. Parameters are set that have to be satisfied before the BC/DR plan is activated. The plan defines who is authorized to activate it, how activation is authorized, and the precise steps to be taken to initiate the BC/DR activities. These include initial response, notification, problem assessment, the escalation process, disaster declaration, plan implementation, and determination of the disaster level (minor, intermediate, or major). The BC/DR teams that are needed include the crisis management team, damage assessment team, notification team, emergency response team, and crisis communication team.
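The activation parameters and authorization rules described above could be captured in a form like the following sketch. The three disaster levels come from the text; the classification rule, the four-hour threshold, and the role names are hypothetical placeholders that a real plan would define for itself.

```python
from enum import IntEnum


class DisasterLevel(IntEnum):
    MINOR = 1
    INTERMEDIATE = 2
    MAJOR = 3


# Hypothetical authorization table: higher levels need more senior sign-off.
ACTIVATION_AUTHORITY = {
    DisasterLevel.MINOR: {"it_operations_lead", "bc_coordinator", "crisis_management_team"},
    DisasterLevel.INTERMEDIATE: {"bc_coordinator", "crisis_management_team"},
    DisasterLevel.MAJOR: {"crisis_management_team"},
}


def classify(mission_critical_affected: bool, estimated_outage_minutes: int) -> DisasterLevel:
    """Assign a disaster level from the problem assessment (assumed rule)."""
    if mission_critical_affected:
        # Any hit on mission critical systems is treated as major here.
        return DisasterLevel.MAJOR
    if estimated_outage_minutes > 240:  # hypothetical four-hour threshold
        return DisasterLevel.INTERMEDIATE
    return DisasterLevel.MINOR


def can_activate(role: str, level: DisasterLevel) -> bool:
    """Check whether a role is authorized to activate the plan at this level."""
    return role in ACTIVATION_AUTHORITY[level]


level = classify(mission_critical_affected=True, estimated_outage_minutes=30)
print(level.name, can_activate("bc_coordinator", level))
```

Writing the rules down in an explicit table like this has the side benefit that the paper walk-through can check them line by line instead of relying on individual judgment during an incident.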
Recovery Tasks: The following recovery tasks are performed while waiting for business to resume at the original site. The required resources are identified. These include computers, telecommunication equipment, internet links, office equipment and supplies, and contact lists. Access to the site is provided for employees and vendors as required. The alternative work site is notified and activated. The contact lists are shared with the different BCP teams. Depending on the arrangements with the vendors (as needed, prearranged, or pre-established), the material is ordered so that the original site can be readied while the hot site is being used for recovery. The latest copy of the virtual server is copied to the hot site if it is not already synchronized, so that the mission critical operations can continue without disruption. The VLAN and firewall configuration information is copied as well, so that very little time is lost in getting the alternative facility ready. The damage at the original site is assessed, and any salvageable material is salvaged and shipped to the alternative site if required. The BCP results are logged and reviewed.
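The check for whether the hot-site copy is "already synchronized" can be illustrated with a checksum comparison. The sketch below uses SHA-256 digests from Python's standard `hashlib` module; the function names and the idea of comparing whole image files are assumptions made for illustration, since real virtualization and DRaaS platforms normally track replication state themselves.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replica_is_current(primary_image, replica_image):
    """True if the replica exists and matches the primary byte for byte."""
    replica = Path(replica_image)
    if not replica.exists():
        return False
    return file_digest(primary_image) == file_digest(replica)


def needs_copy(primary_image, replica_image):
    """Decide whether the image must be copied to the hot site."""
    return not replica_is_current(primary_image, replica_image)
```

The same digest comparison can serve the regular backup tests as well, since a mismatch between a stored digest and a freshly computed one indicates a corrupted copy.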
Testing the BC/DR plan
Once the BC/DR plan is completed, it can be tested using four common methods: paper walk-through, functional exercise, field exercise, and full interruption. For the paper walk-through, it is important to develop multiple realistic scenarios. Evaluation criteria are then developed so that success can be measured. A meeting is convened with all the BCP teams (CMT, ERT, and so on) participating; each BCP team is provided with its part of the BC/DR plan, while the CMT is provided with the complete plan. Flow charts can be created if the staff find them useful. The participants are divided by team, checklists for key processes are used, notes are taken, and training needs are identified. At the end of the walk-through, a summary is developed and the lessons learned are documented. Functional exercises test the functionality of the plan. Two teams participate: one team responds to the scripted scenarios according to the BC/DR plan, while the other team plays the part of ordinary staff members who are confused by the situation. In a field exercise, realistic scenarios are used to practice and test the BC/DR plan. To test the full functionality, a full-scale interruption test is conducted, in which all the mission critical functions are interrupted, all components of the BC/DR plan are exercised, and the alternate site is activated.
Conclusion
MDL, as a logistics company serving the military, needs a robust risk management plan, BCP, and DRP. The BC/DR plan, which is part of risk management, is prepared and tested. Because situations keep changing, the BC/DR plan should be maintained by testing its components regularly; otherwise, it will fail when the time comes to execute it. The entire staff has to be trained on the BC/DR plan, and the contact lists have to be stored so that they are available to everyone in the case of a disaster. The backups are tested regularly to ensure that the data is not corrupted. Together, these measures will help ensure that MDL is able to provide uninterrupted service even in the case of a disaster.
References
Ingraham, C. (2015, January 21). Earthquakes, floods and volcanoes: the most disaster-prone places in America. Retrieved from washingtonpost.com: https://www.washingtonpost.com/news/wonk/wp/2015/01/21/earthquakes-floods-and-volcanoes-the-most-disaster-prone-places-in-america/
Lexington Institute. (2005). Implementing logistics transformation: a new model for the military supply chain. Arlington, VA: Lexington Institute.
Marek, Z. (2013, December 10). RPO, RTO, WRT, MTDWTH?! Retrieved from defaultreasoning.com: http://defaultreasoning.com/2013/12/10/rpo-rto-wrt-mtdwth/
Maxava. (2013, August 19). Military-strength disaster recovery. Retrieved from mcpressonline.com: http://www.mcpressonline.com/managed-services-/-saas/military-strength-disaster-recovery.html
Snedaker, S., & Rima, C. (2014). Business continuity and disaster recovery planning for IT professionals (2nd ed.). Waltham, MA: Syngress.
Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J., & Thomas, R. (2002). NIST SP 800-34: contingency planning guide for information technology systems. Washington, DC: U.S. Government Printing Office.
Yang, C.-L., Yuan, B. J., & Huang, C.-Y. (2015). Key determinant derivations for information technology disaster recovery site selection by the multi-criterion decision making method. Sustainability, 7, 6149-6188. doi:10.3390/su7056149