Unveiling ALMA software behavior using a decoupled log analysis framework
18 July 2014
ALMA Software is a complex distributed system installed on more than one hundred computers, and it interacts with more than one thousand hardware device components. A normal observation follows a flow that interacts with almost that entire infrastructure in a coordinated way. The Software Operation Support team (SOFTOPS) comprises specialized engineers who analyze the generated software log messages on a daily basis to detect bugs and failures and to predict potential failures. These log messages can reach up to 30 GB per day. We describe a decoupled and non-intrusive log analysis framework, together with the tools implemented on it, that identifies well-known problems, measures the time taken by specific tasks and detects abnormal behavior in the system, so that the engineers can be alerted to take corrective action. The main advantage of this approach over others is that the analysis itself does not interfere with the performance of the production system, which allows multiple analyzers to run in parallel. In this paper we describe the selected framework and show the results of some of the implemented tools.
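The abstract does not specify the framework's internals. As a rough illustration of the decoupled idea, the following Python sketch parses an offline copy of log lines once and hands the same read-only records to several independent analyzers that run concurrently. One counts known-problem patterns, one measures task durations between start and end markers, and one flags sources with an abnormal number of errors. The log line format, the class names (`KnownProblemAnalyzer`, `TaskDurationAnalyzer`, `ErrorRateAnalyzer`), the marker strings and the threshold are all hypothetical assumptions made for illustration. They are not ALMA's actual log format or the authors' tools.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Hypothetical log line format (an assumption, not ALMA's real format):
#   "<ISO timestamp> <LEVEL> <source> <message>"
LINE_RE = re.compile(r"^(\S+) (\w+) (\S+) (.*)$")


def parse(line):
    """Parse one log line into a dict, or return None if it does not fit the format."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts, level, source, msg = m.groups()
    try:
        time = datetime.fromisoformat(ts)
    except ValueError:
        return None  # first field is not a valid timestamp
    return {"time": time, "level": level, "source": source, "msg": msg}


class KnownProblemAnalyzer:
    """Counts messages matching regexes that describe known problems (hypothetical)."""

    def __init__(self, patterns):
        self.patterns = {name: re.compile(p) for name, p in patterns.items()}

    def run(self, records):
        counts = {name: 0 for name in self.patterns}
        for r in records:
            for name, pat in self.patterns.items():
                if pat.search(r["msg"]):
                    counts[name] += 1
        return counts


class TaskDurationAnalyzer:
    """Measures seconds between a start marker and the next end marker."""

    def __init__(self, start_marker, end_marker):
        self.start_marker = start_marker
        self.end_marker = end_marker

    def run(self, records):
        durations, start = [], None
        for r in records:
            if self.start_marker in r["msg"]:
                start = r["time"]
            elif self.end_marker in r["msg"] and start is not None:
                durations.append((r["time"] - start).total_seconds())
                start = None
        return durations


class ErrorRateAnalyzer:
    """Flags sources whose ERROR count exceeds a threshold (a crude anomaly check)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def run(self, records):
        counts = Counter(r["source"] for r in records if r["level"] == "ERROR")
        return sorted(src for src, n in counts.items() if n > self.threshold)


def analyze(lines, analyzers):
    """Parse an offline copy of the logs once, then run every analyzer concurrently.

    The analyzers only read the shared records, so they never touch the
    production system and do not interfere with one another.
    """
    records = [r for r in map(parse, lines) if r is not None]
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(a.run, records) for name, a in analyzers.items()}
        return {name: f.result() for name, f in futures.items()}


# Made-up sample data; the unparseable line is skipped by parse().
SAMPLE_LOG = [
    "2014-07-18T10:00:00 INFO Scheduler observation started",
    "2014-07-18T10:00:05 ERROR ComponentA timeout talking to device",
    "2014-07-18T10:00:07 ERROR ComponentA timeout talking to device",
    "not a log line",
    "2014-07-18T10:02:30 INFO Scheduler observation finished",
]

if __name__ == "__main__":
    results = analyze(SAMPLE_LOG, {
        "known_problems": KnownProblemAnalyzer({"device_timeout": r"timeout"}),
        "task_durations": TaskDurationAnalyzer("observation started", "observation finished"),
        "abnormal_sources": ErrorRateAnalyzer(threshold=1),
    })
    for name, value in results.items():
        print(name, value)
```

Because each analyzer only reads the parsed records, adding another analyzer does not affect the others or the system that produced the logs. Threads keep the sketch simple. Under CPython's GIL, CPU-bound analyzers would likely benefit from `ProcessPoolExecutor` instead.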
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Juan Pablo Gil, Alexis Tejeda, Tzu-Chiang Shen, Norman Saez, "Unveiling ALMA software behavior using a decoupled log analysis framework", Proc. SPIE 9152, Software and Cyberinfrastructure for Astronomy III, 91521G (18 July 2014); doi: 10.1117/12.2055352; https://doi.org/10.1117/12.2055352
