Disaster Recovery (DR) readiness:
A DR readiness plan defines the most critical business processes and identifies the critical IT systems (applications, servers, databases, and supporting infrastructure) associated with those business processes. When a good DR plan is in place and enforced, all business-critical systems, including data, should be recoverable with little loss to overall business operations.
The biggest challenge is identifying which systems and applications are critical and determining how fast they have to be back online and running.
- What are the things and systems that you consider to be critical for running your business? What is the priority order for recovering these systems? (It is important to rank the systems.)
- Identify systems that are part of the DR plan strategy that might require protection within one of the other strategies (DR plan for fire, DR plan for electrical outage, etc).
- Keep in mind that the required speed of recovery can dictate some of the backup decisions.
- Which reports, documents, call lists, and operation guides that might be required for system recovery are stored off-site with the DR backup media? Many DR plans fail because a document or one of its components was overlooked.
- What are the considerations for choosing the current disaster recovery site?
- What are the various ways in which these critical systems and functions can fail?
- What do you do should such a situation arise?
- How long can you tolerate the disruption? What is in your control, and what is out of your control?
- Who is responsible for ensuring that such a situation does not arise, and if it does arise again, how do you tackle it with minimal financial and other resource loss? Define the capabilities that a DR team member must possess to ensure viable recovery. “Assess their strengths and weaknesses in terms of knowledge, skill and performance, in order to ensure the best team members are trained and ready and implement succession planning for all levels of personnel.”
- If such a situation does occur again, what are the alternatives you can use to ensure business continuity?
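The ranking questions above can be made concrete with a small inventory. The following is a minimal sketch only: the system names, priorities, and hour figures are hypothetical, and the field `rto_hours` assumes the commonly used notion of a recovery time objective (how long a disruption can be tolerated), a term this text does not itself define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    name: str
    priority: int              # 1 = recover first (hypothetical ranking)
    rto_hours: float           # assumed maximum tolerable downtime
    est_recovery_hours: float  # assumed estimate of time needed to restore


def recovery_order(systems):
    """Sort by business priority, then by the tightest tolerable downtime."""
    return sorted(systems, key=lambda s: (s.priority, s.rto_hours))


def rto_gaps(systems):
    """Names of systems whose estimated recovery time exceeds what is tolerable."""
    return [s.name for s in recovery_order(systems)
            if s.est_recovery_hours > s.rto_hours]


# Hypothetical inventory, for illustration only.
inventory = [
    System("file-server", priority=3, rto_hours=24, est_recovery_hours=8),
    System("order-database", priority=1, rto_hours=4, est_recovery_hours=6),
    System("email", priority=2, rto_hours=8, est_recovery_hours=2),
]

for s in recovery_order(inventory):
    print(s.priority, s.name)
print("Recovery too slow for:", rto_gaps(inventory))
```

A system that shows up in `rto_gaps` is one where the backup or standby approach may need to change, since the speed of recovery drives those decisions.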
Following the preliminary investigation(s), you can deduce that some of the critical areas of IT include:
- Documentation of system installation, upgrades, frequency of outages, modifications, and maintenance must be kept in paper form or on disk media so that it can be used for developing future guidelines and standards.
- Business critical data should be backed up regularly, at least once a day.
- Housing for computer systems, especially hardware, must meet data center requirements, with sufficient controls such as fire suppression and humidity control to keep the environment within the equipment specifications.
- Appropriate IT support vendor agreements must be in place and up-to-date to ensure convenient support in case of failure.
- Hardware redundancy, replication, and uninterruptible power supplies are all critical to system reliability, recoverability, and availability.
- IT personnel and organization employees must be available to provide the necessary support for performing and monitoring reliability, recoverability, and availability requirements across the internal organization. This involves training personnel in the concepts, principles, and practices of IT reliability, reliability measures, and information asset protection.
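The daily backup requirement in the list above lends itself to an automated check. The following is a minimal sketch, assuming the time of each data set's last successful backup is already available from whatever backup tool is in use. The data set names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# "At least once a day": a backup older than this counts as stale.
MAX_BACKUP_AGE = timedelta(days=1)


def stale_backups(last_backup_times, now, max_age=MAX_BACKUP_AGE):
    """Return the names of data sets whose latest backup is older than max_age."""
    return sorted(name for name, taken_at in last_backup_times.items()
                  if now - taken_at > max_age)


# Hypothetical values, for illustration only.
now = datetime(2024, 1, 10, 12, 0)
last_backups = {
    "order-database": datetime(2024, 1, 10, 2, 0),  # 10 hours old
    "file-server": datetime(2024, 1, 8, 23, 0),     # 37 hours old
}

print(stale_backups(last_backups, now))  # ['file-server']
```

Passing `now` explicitly keeps the check easy to test and to run against historical records.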
More importantly, for the measures above to succeed, commitment from senior members of the organization is vital. This means aligning IT objectives with organizational goals, consistently accepting that these goals and objectives are achievable, and allocating sufficient budget to support the reliability assessment plan, the DR (recoverability) assessment plan, the security strategy, and the availability assessment and enforcement plans.
IT professionals often create procedures and enforce policies, such as a disaster recovery plan, a business contingency plan, and a crisis communication plan, that are geared toward dealing with situations once they occur (a reactive approach). However, a more proactive approach should be implemented in certain situations to help prevent disasters. For example, you can install a standby power generator that starts automatically when there is a power failure or outage, use surge protectors to prevent power surges from destroying equipment, and put more effort into teaching end users how to respond to error messages and what kind of communication protocol to follow.
References:
Arregoces, M. (2006). Data Center Fundamentals. Cisco Press
Buyya, R. (1999). High Performance Cluster Computing, Volume 1. Prentice Hall-PTR
Keyes, Jessica. (2005). Implementing the IT Balanced Scorecard. CRC Press.
Marcus, E. & Stern, H. (2003). Blueprints for High Availability. 2nd Ed. Wiley
Sommerville, Ian. (2000). Software Engineering 6th Ed. Addison Wesley
Thejendra, B.S. (2008). Disaster Recovery and Business Continuity. IT Governance Ltd
Top tips for disaster recovery readiness
http://www.itpeopleindia.com/20020408/management2.shtml
“Tactical Software Reliability Guidebook” Technology Transfer, SEMATECH
http://ismi.sematech.org/docubase/document/2967agen.pdf