18 July 2014 Unveiling ALMA software behavior using a decoupled log analysis framework
ALMA Software is a complex distributed system installed on more than one hundred computers, which interacts with more than one thousand hardware device components. A normal observation follows a flow that exercises almost that entire infrastructure in a coordinated way. The Software Operation Support team (SOFTOPS) comprises specialized engineers who analyze the generated software log messages on a daily basis to detect bugs and failures and to predict eventual failures. These log messages can reach up to 30 GB per day. We describe a decoupled and non-intrusive log analysis framework, and the tools implemented with it, to identify well-known problems, measure the time taken by specific tasks, and detect abnormal behavior in the system in order to alert the engineers to take corrective action. One of the main advantages of this approach is that the analysis itself does not interfere with the performance of the production system, which allows multiple analyzers to run in parallel. In this paper we describe the selected framework and show the results of some of the implemented tools.
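As a rough illustration of the kind of analyzer the abstract mentions (measuring the time taken by a task and flagging abnormal cases), the following minimal Python sketch works on an offline copy of log lines, so it never touches the production system. The log line layout, the marker strings, the source names, and the function names `parse_line`, `task_durations`, and `flag_slow` are all assumptions made for this example; they are not the ALMA log format or the authors' tools.

```python
from datetime import datetime

# Hypothetical log line layout (an assumption, not the ALMA format):
#   "<ISO timestamp> <source> <free-text message>"
def parse_line(line):
    timestamp, source, message = line.rstrip("\n").split(" ", 2)
    return datetime.fromisoformat(timestamp), source, message

def task_durations(lines, start_marker, end_marker):
    """Pair each start marker with the next end marker from the same source."""
    started = {}    # source -> time of the pending start message
    durations = []  # (source, elapsed seconds), in order of completion
    for line in lines:
        when, source, message = parse_line(line)
        if start_marker in message:
            started[source] = when
        elif end_marker in message and source in started:
            elapsed = (when - started.pop(source)).total_seconds()
            durations.append((source, elapsed))
    return durations

def flag_slow(durations, limit_seconds):
    """Return the tasks that took longer than the given limit."""
    return [(source, secs) for source, secs in durations if secs > limit_seconds]

# Invented sample data, used only to exercise the functions above.
SAMPLE_LOG = [
    "2014-07-18T10:00:00.000 antenna01 tuning started",
    "2014-07-18T10:00:01.000 antenna02 tuning started",
    "2014-07-18T10:00:02.500 antenna01 tuning finished",
    "2014-07-18T10:00:09.000 antenna02 tuning finished",
]

if __name__ == "__main__":
    measured = task_durations(SAMPLE_LOG, "tuning started", "tuning finished")
    for source, secs in flag_slow(measured, limit_seconds=5.0):
        print(f"{source}: task took {secs:.1f} s, above the 5.0 s limit")
```

Because each analyzer of this kind only reads log lines, several of them could run side by side over the same copy of the data, which is the property the abstract attributes to a decoupled approach.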
© (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Juan Pablo Gil, Alexis Tejeda, Tzu-Chiang Shen, Norman Saez, "Unveiling ALMA software behavior using a decoupled log analysis framework", Proc. SPIE 9152, Software and Cyberinfrastructure for Astronomy III, 91521G (18 July 2014); doi: 10.1117/12.2055352; https://doi.org/10.1117/12.2055352

