The ALMA Observatory is a challenging project in many ways. The hardware and software pieces were often designed
specifically for ALMA, based on overall scientific requirements. The observatory is still in its construction
phase, but already started Early Science observations with 16 antennas in September 2011, and has currently
(June 2012) 39 accepted antennas, with 1 or 2 new antennas delivered every month. The finished array will
integrate up to 66 antennas in 2014.
The on-line software is a critical part of the operations: it controls everything from the low level real-time
hardware and data processing up to the observations scheduler and data storage. Many pieces of the software are
eventually affected by a growing number of antennas, as more processes are integrated into the distributed system,
and more data flows to the Correlator and Database. Although some early scalability tests were performed in
a simulated environment, the system proved to be very dependent on real deployment conditions and several
unforeseen scalability issues have been found in the last year, starting with a critical number of about 15
antennas. Processes that grow with the number of antennas tend to quickly demand more powerful machines,
unless alternatives are implemented.
This paper describes the practical experience of dealing with (and hopefully preventing) blocking scalability
issues during the construction phase, while the expectant users push the system to its limits. This may also be
a very useful example for other upcoming radio-telescopes with a large number of receivers.