The goals of an IT department include aligning IT plans with business objectives, establishing measures of IT effectiveness, directing employee efforts toward IT objectives, improving the performance of technology, and achieving balanced results across stakeholder groups.

As the IT Director of an organization, you should continually assess the current state of your IT department from the perspective of reliability, availability, and Disaster Recovery (DR) readiness. First understand the entire system and evaluate how each component affects the overall system's reliability, recoverability, serviceability, performance, security, and manageability. Then investigate and evaluate the critical elements of the system in priority order, from most to least critical; a critical element is one whose failure affects the entire system.

System reliability is the probability that a system will be operational over a specified time in a given environment for a given purpose without failing. That is, it is a form of guarantee that a given system's services will be delivered as specified. System availability, by contrast, is the probability that a system will be operational and able to deliver the requested services at a given point in time (without unplanned outages). The two are related but distinct: availability takes into account the time that the system is out of service, so even an unreliable system can have high availability if its restart time is short.
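
To make the distinction concrete, here is a minimal sketch in Python. It assumes a constant failure rate (the exponential model) and uses the common steady-state approximation of availability as MTBF / (MTBF + MTTR). Neither the model nor the figures come from the text above; the function names and the example system are hypothetical and serve only as an illustration.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of running for mission_hours without any failure.

    Assumption: failures occur at a constant rate (exponential model).
    """
    return math.exp(-mission_hours / mtbf_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is operational.

    Assumption: steady state, with mean time between failures (MTBF)
    and mean time to repair or restart (MTTR).
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: fails about once a day, restarts in about 30 seconds.
mtbf = 24.0
mttr = 30 / 3600

week_reliability = reliability(7 * 24, mtbf)
steady_availability = availability(mtbf, mttr)

print(f"Chance of a failure-free week: {week_reliability:.4%}")
print(f"Steady-state availability:     {steady_availability:.4%}")
```

Under these assumed numbers the system almost never gets through a week without a failure, so it is unreliable, yet it is operational more than 99.9% of the time because each restart is so short.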

System reliability and availability also go hand in hand with disaster recovery and business continuity plans. For example, a system failure during working hours that might take several hours to rectify calls for a disaster recovery action; keeping the system functional (available) while the error is being corrected is a form of business continuity planning. Therefore, evaluate the reliability goals and assess the efficacy of the measurement effort so that the required corrective actions can be carried out. The following analysis will identify whether the IT reliability goals are met.

Perform a reliability assessment to:
  • Identify assessment steps that are consistent with the reliability measurement plan that best fits an organization’s needs.
  • Check the consistency of the acceptance criteria and the sufficiency of the tests to demonstrate satisfactorily that the reliability goals/objectives have been achieved, or that the final goals are achievable.
  • Identify the organization or personnel responsible for determining the final acceptance of the reliability requirements. Is the necessary support for performing and monitoring reliability requirements available from across the internal organization?
  • Evaluate the impact of system-wide design decisions on reliability, i.e. on the system's failures, and how to address those failures.
  • Review past documentation on the steps taken in assessing reliability, which can assist in developing future guidelines and standards.
  • If available, evaluate the educational materials for training personnel in the concepts, principles, and practices of IT reliability and reliability measures.
“A well-developed reliability program plan should provide the roadmap to achieving high reliability by stressing proactive reliability tasks over reactive tasks.”
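
As an illustration of checking acceptance criteria against a reliability goal, the sketch below computes measured availability from a log of unplanned outages and compares it with a target. The outage records, the 30-day window, the 99.9% target, and the function names are all invented for the example; a real assessment would use the organization's own measurement plan and data.

```python
from datetime import datetime, timedelta

# Hypothetical unplanned outages (start, end) within a 30-day window.
OUTAGES = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 45)),
    (datetime(2024, 1, 17, 14, 30), datetime(2024, 1, 17, 16, 0)),
]
PERIOD = timedelta(days=30)
TARGET_AVAILABILITY = 0.999  # assumed acceptance criterion


def measured_availability(outages, period):
    """Fraction of the period during which the system was not in an outage."""
    downtime = sum((end - start for start, end in outages), timedelta())
    return 1 - downtime / period


def meets_goal(outages, period, target):
    """True if the measured availability satisfies the acceptance criterion."""
    return measured_availability(outages, period) >= target


observed = measured_availability(OUTAGES, PERIOD)
goal_met = meets_goal(OUTAGES, PERIOD, TARGET_AVAILABILITY)
print(f"Measured availability: {observed:.4%}")
print(f"Goal of {TARGET_AVAILABILITY:.1%} met: {goal_met}")
```

With these invented figures, 135 minutes of downtime in 30 days gives roughly 99.69% availability, which falls short of the assumed 99.9% target and would call for corrective action.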