Oliner, Adam, and Jon Stearley. “What Supercomputers Say: A Study of Five System Logs.” 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), Edinburgh, 25-28 June 2007. Washington, DC: IEEE Computer Society, 2007. 575-584. Print.
Computer logs are something every system administrator deals with, yet they have not been studied thoroughly, even though a sound understanding and proper analysis of logs and their content would improve system monitoring and the prediction of possible failures. In their paper, Oliner and Stearley set out to study and analyze the logs and alerts generated by five of the largest supercomputers, Liberty (512 processors), Spirit (1,028), Thunderbird (9,024), Red Storm (10,880), and Blue Gene/L (131,072), and to make recommendations for further research in the area. Theirs is the first study to examine raw logs from several supercomputers with a focus on the real behavior of the systems, since earlier work lacked data about how large computing systems actually behave. The authors first collect logs from the five supercomputers, then identify alerts, and finally propose a filtering algorithm, apply it to the alerts, and analyze the results. In total they analyze about 112 GB of log data, which made their study the largest of its kind at the time. Finally, Oliner and Stearley give recommendations for addressing four problems: fault detection, failure prediction, root-cause attribution, and the quantification of RAS (reliability, availability, and serviceability).
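To show why the choice of threshold matters for alert filtering, here is a minimal sketch of a threshold-based filter of the general kind the paper proposes. It is not Oliner and Stearley's implementation: the `Alert` fields, the category labels, the sample data, and the 5-second default for `threshold` are hypothetical and chosen only for illustration. The sketch keeps an alert only if no alert of the same category, from any node, appeared within the preceding `threshold` seconds.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Alert:
    time: float      # seconds since the start of the log (hypothetical field)
    source: str      # node that emitted the message (hypothetical field)
    category: str    # alert category label (hypothetical field)


def filter_alerts(alerts: Iterable[Alert], threshold: float = 5.0) -> List[Alert]:
    """Drop an alert if the same category was seen, on any source,
    within the previous `threshold` seconds (hypothetical default)."""
    last_seen: Dict[str, float] = {}  # category -> time of most recent alert
    kept: List[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.time):
        prev = last_seen.get(alert.category)
        if prev is None or alert.time - prev > threshold:
            kept.append(alert)
        # Update on every alert, kept or dropped, so a continuous
        # burst of one category collapses into a single reported alert.
        last_seen[alert.category] = alert.time
    return kept


# Illustrative, made-up alerts: two ECC alerts 2 s apart on different nodes,
# one PANIC alert, and a later ECC alert.
alerts = [
    Alert(0.0, "n1", "ECC"),
    Alert(2.0, "n2", "ECC"),
    Alert(3.0, "n3", "PANIC"),
    Alert(9.0, "n1", "ECC"),
]

print([a.time for a in filter_alerts(alerts)])                  # [0.0, 3.0, 9.0]
print([a.time for a in filter_alerts(alerts, threshold=10.0)])  # [0.0, 3.0]
```

Raising the threshold from 5 to 10 seconds silently merges the ECC alert at 9 s into the earlier burst. A larger threshold removes more redundancy but risks hiding distinct failures that happen to share a category, which is why the threshold is a sensitive design choice.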
The study by Oliner and Stearley is the first large-scale study of logs and alerts from computing systems. Its main weaknesses lie in the proposed filtering algorithm, whose mechanism and threshold are open to question, and in a limited understanding of the mechanisms that generate alerts. Overall, it is strong research that lays the groundwork and sets directions for further studies in the area. The authors clearly state their purpose, describe their solution, discuss the weaknesses of their study, and give detailed recommendations for future research and possible solutions.
Works Cited
Oliner, Adam, and Jon Stearley. “What Supercomputers Say: A Study of Five System Logs.” 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), Edinburgh, 25-28 June 2007. Washington, DC: IEEE Computer Society, 2007. 575-584. Print.