A distributed system is a computer system consisting of a collection of computers that share several characteristics. The first is the use of a common network. The computers may also share software that helps them coordinate their activities over large distances. Finally, the computers share system resources, which together provide an integrated computing capacity.
Fault tolerance must be considered carefully in order to prevent data loss and catastrophic failures in a distributed system. It refers to the capability of the system to remain in operation even under undesirable and challenging conditions, in both its internal and external operating environment. Distributed systems should also be highly available, so that services can still be provided and operations restored when parts of the system fail. Finally, the systems should be recoverable, so that information can be retrieved after a failure.
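One way to picture availability under failure is a client that falls back to a backup replica when the primary does not answer. The following is a minimal sketch in Python, not a production design. The names call_with_failover, AllReplicasFailed, broken_primary and healthy_backup are hypothetical and chosen only for illustration, and each replica is assumed to be a plain function that either returns a result or raises an exception.

```python
from typing import Callable, List, Sequence


class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request (hypothetical name)."""


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # assumption: any exception means the replica failed
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed")


def broken_primary() -> str:
    # Simulates a primary node that cannot be reached.
    raise ConnectionError("primary unreachable")


def healthy_backup() -> str:
    # Simulates a backup node that is still in operation.
    return "served by backup"


result = call_with_failover([broken_primary, healthy_backup])
print(result)  # prints "served by backup"
```

The service stays available as long as at least one replica answers. Only when every replica fails does the caller see an error.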
Several types of failure can occur in the operation of distributed systems. A hardware failure is the failure of a single, particular component in a distributed system. The second type is a network failure, which occurs in a particular link of the distributed system (Butts & Shenoi, 2013, p. 12). An application failure arises when an application operates inefficiently or stops operating altogether. The fourth type is a synchronization failure, which results from poor or incorrect synchronization of data or information across the various components of a distributed system.
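A synchronization failure of the kind described above can be illustrated by comparing a digest of the data held by each component. The sketch below is an assumption-laden illustration rather than a method taken from the cited sources. It assumes every replica's data is available as bytes and treats the most common digest as the correct one. The name find_divergent_replicas and the sample replica names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the replica's data."""
    return hashlib.sha256(data).hexdigest()


def find_divergent_replicas(replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of replicas whose data differs from the majority.

    Assumption: the most common digest is the correct state. On a tie,
    the digest encountered first is treated as the reference.
    """
    if not replicas:
        return []
    digests = {name: digest(data) for name, data in replicas.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical example: node-c missed an update and is out of sync.
state = {
    "node-a": b"balance=100",
    "node-b": b"balance=100",
    "node-c": b"balance=90",
}
out_of_sync = find_divergent_replicas(state)
print(out_of_sync)  # prints ['node-c']
```

Comparing digests instead of full copies keeps the check cheap, at the cost of only showing that replicas differ, not where.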
The types of failure that may occur in centralized systems include software failure and synchronization failure caused by the sharing of information. The linking of information in a centralized system may fail because of poor or incorrect coordination and synchronization of data or information in the system. A software failure may also bring down the entire centralized system if the software fails to coordinate the various tasks and operations. Software failure and synchronization failure are, therefore, the most common failures in centralized systems.
A hardware failure may occur because of the malfunction of one or more hardware components in a distributed system. This failure often results from electromechanical defects that affect the electronic circuits in the system. The failure of components such as mice, keyboards or monitors may lead to inefficient or incorrect operation of the distributed system. Such failures may disrupt the whole process of operation, leading to an irregular flow of information or processes in the distributed system. Hardware failures may also result from power supply failures, which interrupt the whole system. A hardware failure can lead to complete termination of the system, since the hardware underpins the whole process of communication. Hardware failures can be isolated through normal troubleshooting (Tessone et al., 2013, p. 24). They can also be isolated by first ruling out all software failures, so that the remaining issues can be attributed to the hardware. Monitoring the system helps isolate hardware failures by identifying the fault-tolerant processes in the system, and it helps detect failures by determining how frequently faults occur. The failure of hardware components in distributed systems can be fixed by replacing the components that cause the electromechanical problems. The use of feature flags is also helpful when dealing with hardware failures, because it keeps activities flowing in the system. Dual routers, dual power supplies and data backups are very effective in mitigating hardware failures in distributed systems. Dual disk controllers also play an important role in solving hardware problems in these systems.
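The monitoring described above is often implemented with heartbeats: each node reports in periodically, and a node that stays silent for too long is suspected to have failed. The following is a minimal sketch of that idea in Python. The class name HeartbeatMonitor, the node names and the five-second timeout are all assumptions made for illustration. A silent node may have a hardware fault, but it may equally be cut off by a network failure, so the monitor only reports a suspicion.

```python
import time
from typing import Dict, List, Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat from each node (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def record_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        """Store the time at which a node last reported in."""
        self._last_seen[node] = time.monotonic() if now is None else now

    def suspected_failures(self, now: Optional[float] = None) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if current - seen > self.timeout_seconds
        )


# Explicit timestamps are passed in so the example is deterministic.
monitor = HeartbeatMonitor(timeout_seconds=5.0)
monitor.record_heartbeat("router-1", now=0.0)
monitor.record_heartbeat("router-2", now=8.0)
suspects = monitor.suspected_failures(now=10.0)
print(suspects)  # prints ['router-1']
```

With redundant components such as dual routers, a suspicion raised for one router can be used to shift traffic to the other while the faulty part is replaced.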
An application failure is a failure in a distributed system caused by problems with software functionality. The failure of a particular piece of software, such as the operating system, may lead to difficulties in the operation of the whole system, because it disrupts the flow of commands and instructions to the various parts of the system. Software failures are a concern in distributed systems because they may terminate many operations and processes, causing considerable inconvenience. Application failures can be detected through troubleshooting, which identifies the applications that are not operating efficiently and the specific problems associated with them. Application failures can be resolved by installing software that helps prevent failures or problems in the various applications in the system. Compatible software should be installed in order to help remedy such complications or issues.
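Detection and recovery from an application failure can be sketched as a small supervisor that restarts a task when it crashes. This is a swapped-in illustration of a restart strategy, not the remedy the text describes (installing compatible software). The names run_with_restarts and flaky_task are hypothetical, as are the restart limit and the delay.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_restarts(
    task: Callable[[], T],
    max_restarts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Run a task, restarting it after a failure up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return task()
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and surface the last failure
            restarts += 1
            time.sleep(delay_seconds)  # pause before the next attempt


# Hypothetical application that crashes twice before it runs correctly.
calls = {"count": 0}


def flaky_task() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("application crashed")
    return "ok"


result = run_with_restarts(flaky_task)
print(result, calls["count"])  # prints "ok 3"
```

Bounding the number of restarts matters: an application that fails on every attempt should be reported for troubleshooting rather than restarted forever.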
References
Butts, J., & Shenoi, S. (Eds.). (2013). Critical Infrastructure Protection VII: 7th IFIP WG 11.10 International Conference, ICCIP 2013, Washington, DC, USA, March 18-20, 2013, Revised Selected Papers. Berlin, Heidelberg: Springer.
Aguilera, M. K. (Ed.). (2012). Distributed Computing: 26th International Symposium, DISC 2012, Salvador, Brazil, October 16-18, 2012, Proceedings. Heidelberg: Springer.
Tessone, C. J., Garas, A., Guerra, B., & Schweitzer, F. (2013). How big is too big? Critical shocks for systemic failure cascades. Journal of Statistical Physics. doi:10.1007/s10955-013-0723-y