The Large Synoptic Survey Telescope (LSST) is an 8.4m optical survey telescope being constructed on Cerro Pachón in Chile. The data management system being developed must be able to process the nightly alert data, an expected 20,000 transient alerts per minute, in near real time, and construct annual data releases at the petabyte scale. The development team consists of more than 90 people working at six different sites across the US, developing an integrated set of software to realize the LSST science goals. In this paper we discuss our agile software development methodology and our API and developer decision-making process. We also discuss the software tools that we use for continuous integration and deployment.
Agile methodologies are current best practice in software development. They are favored for, among other reasons, preventing premature optimization by taking a somewhat short-term focus, and allowing frequent replans/reprioritizations of upcoming development work based on recent results and current backlog. At the same time, funding agencies prescribe earned value management accounting for large projects which, these days, inevitably include substantial software components. Earned Value approaches emphasize a more comprehensive and typically longer-range plan, and tend to characterize frequent replans and reprioritizations as indicative of problems. Here we describe the planning, execution and reporting framework used by the LSST Data Management team that navigates these opposing tensions.
The Large Synoptic Survey Telescope (LSST) program is jointly funded by the NSF, the DOE, and private institutions
and donors. From an NSF funding standpoint, the LSST is a Major Research Equipment and Facilities Construction (MREFC)
project. The NSF funding process requires proposals and D&D reviews to include activity-based budgets and schedules;
documented basis of estimates; risk-based contingency analysis; cost escalation and categorization.
"Out of the box," the commercial tool Primavera P6 contains approximately 90% of the planning and estimating
capability needed to satisfy R&D phase requirements, and it is customizable/configurable for the remainder with relatively
little effort. We describe the customization/configuration and use of Primavera for the LSST Project Management
Control System (PMCS), assess our experience to date, and describe future directions.
Examples in this paper are drawn from the LSST Data Management System (DMS), which is one of three main
subsystems of the LSST and is funded by the NSF. By astronomy standards the LSST DMS is a large data management
project, processing and archiving over 70 petabytes of image data, producing over 20 petabytes of catalogs annually, and
generating 2 million transient alerts per night. Over the 6-year construction and commissioning phase, the DM project is
estimated to require 600,000 hours of engineering effort. In total, the DMS cost is approximately 60% hardware/system
software and 40% labor.
The Large Synoptic Survey Telescope (LSST) project is a proposed large-aperture, wide-field, ground-based telescope
that will survey half the sky every few nights in six optical bands. LSST will produce a data set suitable for answering a
wide range of pressing questions in astrophysics, cosmology, and fundamental physics. The 8.4-meter telescope will be
located in the Andes mountains near La Serena, Chile. The 3.2 Gpixel camera will take 6.4 GB images every 15
seconds, resulting in 15 TB of new raw image data per night. An estimated 2 million transient alerts per night will be
generated within 60 seconds of when the camera's shutter closes. Processing such a large volume of data, converting the raw images into a faithful representation of the universe, automatically assessing data quality, automatically discovering moving or transient sources, and archiving the results in a form useful to a broad community of users is a major challenge. We present an overview of the planned computing infrastructure for LSST. The cyberinfrastructure required
to support the movement, storing, processing, and serving of hundreds of petabytes of image and database data is
described. We also review the sizing model that was developed to estimate the hardware requirements to support this
environment beginning during project construction and continuing throughout the 10 years of operations.
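The image and nightly data volumes quoted above can be checked with simple arithmetic. The sketch below assumes a 16-bit (2-byte) raw readout and roughly ten hours of usable observing per night; neither assumption comes from the abstract itself.

```python
# Back-of-the-envelope check of the LSST raw-data rates quoted above.
# Assumptions (not from the text): 2 bytes per pixel, ~10 hours of
# observing per night, one exposure per 15-second cadence.

GPIXELS = 3.2e9          # camera pixels (from the text)
BYTES_PER_PIXEL = 2      # assumed 16-bit raw readout
CADENCE_S = 15           # seconds between exposures (from the text)
NIGHT_HOURS = 10         # assumed usable observing time per night

image_bytes = GPIXELS * BYTES_PER_PIXEL
images_per_night = NIGHT_HOURS * 3600 / CADENCE_S
night_bytes = image_bytes * images_per_night

print(f"per image: {image_bytes / 1e9:.1f} GB")   # ~6.4 GB, matching the text
print(f"per night: {night_bytes / 1e12:.1f} TB")  # ~15 TB, matching the text
```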
The time domain has been identified as one of the most important areas of astronomical research for the next
decade. The Virtual Observatory is in the vanguard with dedicated tools and services that enable and facilitate
the discovery, dissemination and analysis of time domain data. These range in scope from rapid notifications of
time-critical astronomical transients to annotating long-term variables with the latest modelling results. In this
paper, we will review the prior art in these areas and focus on the capabilities that the VAO is bringing to bear
in support of time domain science. In particular, we will focus on the issues involved with the heterogeneous
collections of (ancillary) data associated with astronomical transients, and the time series characterization and
classification tools required by the next generation of sky surveys, such as LSST and SKA.
The Large Synoptic Survey Telescope (LSST) project has evolved from just a few staff members in 2003 to about 100 in
2010; the affiliation of four founding institutions has grown to 32 universities, government laboratories, and industry.
The public-private collaboration aims to complete the estimated $450 M observatory in the 2017 timeframe. During the design phase of the project, from 2003 to the present, the management structure has been remarkably stable. At the same
time, the funding levels, staffing levels and scientific community participation have grown dramatically. The LSSTC
has introduced project controls and tools required to manage the LSST's complex funding model, technical structure and
distributed work force. Project controls have been configured to comply with the requirements of federal funding
agencies. Some of these tools for risk management, configuration control and resource-loaded schedule have been
effective and others have not. Technical tasks associated with building the LSST are distributed into three subsystems:
Telescope & Site, Camera, and Data Management. Each sub-system has its own experienced Project Manager and
System Scientist. Delegation of authority is enabling and effective; it encourages a strong sense of ownership within the
project. At the project level, subsystem management follows the principle that there is one Board of Directors, Director,
and Project Manager who have overall authority.
The astronomical time domain is entering an era of unprecedented growth. LSST will join current and future surveys at
diverse wavelengths in exploring variable and transient celestial phenomena characterizing astrophysical domains from
the solar system to the edge of the observable universe. Adding to the large but relatively well-defined load of a project
of the scale of the Large Synoptic Survey Telescope will be many challenging issues of handling the dynamic empirical
interplay between LSST and contingent follow-up facilities worldwide. We discuss concerns unique to this telescope,
while exploring consequences common to emerging observational time domain paradigms.
The Data Management system for the LSST will have to perform near-real-time calibration and analysis of acquired
images, particularly for transient detection and alert generation; annual processing of the entire dataset for precision
calibration, object detection and characterization, and catalog generation; and support of user data access and analysis.
Images will be acquired at roughly a 17-second cadence, with alerts generated within one minute. The ten-year survey
will result in tens of petabytes of image and catalog data and will require ~250 teraflops of processing to reduce.
The LSST project is carrying out a series of Data Challenges (DC) to refine the design, evaluate the scientific and
computational performance of candidate algorithms, and address the challenging scaling issues that the LSST dataset
will present. This paper discusses the progress of the DCs to date and plans for future DCs.
Algorithm development must address dual requirements for the efficient use of computational resources and the accurate,
reliable processing of the deep and broad survey data. The DCs incorporate both existing astronomical images and
image data resulting from detailed photon-level simulations. The data is used to ensure that the system can scale to the
LSST field of view and 3.2 gigapixel camera scale and meet the scientific data quality requirements. Future DCs, carried
out in conjunction with the LSST Science Collaborations, are planned to deliver data products verified by computer-aided analysis and actual applications as suitable for high-quality science.
The LSST Data Management System is built on an open source software framework that has middleware and
application layers. The middleware layer provides capabilities to construct, configure, and manage pipelines on
clusters of processing nodes, and to manage the data the pipelines consume and produce. It is not in any way specific
to astronomical applications. The complementary application layer provides the building blocks for constructing
pipelines that process astronomical data, both in image and catalog forms. The application layer does not directly
depend upon the LSST middleware, and can readily be used with other middleware implementations. Both layers
have object-oriented designs that make the creation of more specialized capabilities relatively easy through class inheritance.
This paper outlines the structure of the LSST application framework and explores its usefulness for constructing
pipelines outside of the LSST context, two examples of which are discussed. The classes that the framework provides
are related within a domain model that is applicable to any astronomical pipeline that processes imaging data.
Specifically modeled are mosaic imaging sensors; the images from these sensors and the transformations that result
as they are processed from raw sensor readouts to final calibrated science products; and the wide variety of catalogs
that are produced by detecting and measuring astronomical objects in a stream of such images. The classes are
implemented in C++ with Python bindings provided so that pipelines can be constructed in any desired mixture of
C++ and Python.
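To illustrate the kind of pipeline construction such a layered, object-oriented design enables, here is a minimal Python sketch in the spirit of the framework. All names here (Exposure, Stage, Pipeline, BiasSubtractStage) are hypothetical illustrations, not the actual LSST classes.

```python
# Hypothetical sketch of composable pipeline stages; the class and method
# names are illustrative only, not the LSST application framework API.

class Exposure:
    """A raw or partially processed image plus its metadata."""
    def __init__(self, pixels, metadata=None):
        self.pixels = pixels
        self.metadata = dict(metadata or {})

class Stage:
    """Base class: subclasses override process() to transform an Exposure."""
    def process(self, exposure):
        raise NotImplementedError

class BiasSubtractStage(Stage):
    """Example specialization created by subclassing the base Stage."""
    def __init__(self, bias_level):
        self.bias_level = bias_level

    def process(self, exposure):
        pixels = [p - self.bias_level for p in exposure.pixels]
        out = Exposure(pixels, exposure.metadata)
        out.metadata["bias_subtracted"] = True
        return out

class Pipeline:
    """Runs a configurable sequence of stages over an exposure."""
    def __init__(self, stages):
        self.stages = list(stages)

    def run(self, exposure):
        for stage in self.stages:
            exposure = stage.process(exposure)
        return exposure

pipe = Pipeline([BiasSubtractStage(bias_level=100)])
result = pipe.run(Exposure([1100, 1150, 1200]))
print(result.pixels)  # [1000, 1050, 1100]
```

New capabilities plug in by subclassing Stage, which is the "creation of more specialized capabilities through class inheritance" that the abstract describes.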
Large ground-based and space-based telescopes are expected to make exciting discoveries in the upcoming decade.
These large projects start their construction phase many years before first-light and continue to operate for many years
after first-light and usually span multiple countries. The file-storage cyberinfrastructure ("file-storage CI") of these large-scale projects has to evolve over several years from a conceptual prototype to a highly flexible data distribution network.
During this long period the file-storage CI has to transition through multiple stages, starting with a conceptual prototype
before first-light, to a large-scale distributed network in production, and finally into a persistent archive once the project
is decommissioned. While the project makes these transitions, the file-storage CI has to incorporate several requirements
including but not limited to: Technology Evolution, due to changes in Cyberinfrastructure (CI) software or hardware
during the lifetime of the project; International Partnerships that are updated during the various phases of the project;
and the Data Lifecycle that exists in the project. The file-storage and management software's architecture has to be designed
with significant consideration of these requirements for these large projects. In this paper, we provide the generic
requirements for file-storage and management cyberinfrastructure in a large project similar to LSST before first-light.
The LSST Data Management System (DMS) processes the incoming stream of images that the camera system generates
to produce transient alerts and to archive the raw images, periodically creates new calibration data products that other
processing functions will use, creates and archives an annual Data Release (a static self-consistent collection of data
products generated from all survey data taken from the date of survey initiation to the cutoff date for the Data Release),
and makes all LSST data available through an interface that uses community-based standards and facilitates user data
analysis and production of user-defined data products with supercomputing-scale resources.
This paper discusses DMS distributed processing and data, and DMS architecture and design, with an emphasis on the
particular technical challenges that must be met. The DMS publishes transient alerts in community-standard formats (e.g.
VOEvent) within 60 seconds of detection. The DMS processes and archives over 50 petabytes of exposures (over the 10-
year survey). Data Releases include catalogs of tens of trillions of detected sources and tens of billions of astronomical
objects, 2000-deep co-added exposures, and calibration products accurate to standards not achieved in wide-field survey
instruments to date. These Data Releases grow in size to tens of petabytes over the survey period. The expected data
access patterns drive the design of the database and data access services. Finally, the DMS permits interactive analysis
and provides nightly summary statistics describing DMS output quality and performance.
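As an illustration of the alert output path described above, the following sketch serializes a transient detection as a minimal VOEvent-style XML packet. It shows only the general shape of such a packet; it omits most of the schema (Who, WhereWhen, citations) and is not a schema-valid LSST alert.

```python
# Minimal sketch of a VOEvent-style alert packet. Illustrative only:
# the IVORN and parameter names are made up for this example, and a
# real packet carries many more required elements.
import xml.etree.ElementTree as ET

def make_alert(ivorn, ra_deg, dec_deg, mag):
    root = ET.Element("VOEvent", {
        "version": "2.0",
        "role": "observation",
        "ivorn": ivorn,
    })
    # The <What> section carries the measured parameters of the event.
    what = ET.SubElement(root, "What")
    for name, value, unit in [("ra", ra_deg, "deg"),
                              ("dec", dec_deg, "deg"),
                              ("mag", mag, "mag")]:
        ET.SubElement(what, "Param",
                      {"name": name, "value": str(value), "unit": unit})
    return ET.tostring(root, encoding="unicode")

packet = make_alert("ivo://example/lsst#0001", 150.1, -30.5, 21.3)
print(packet)
```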
The Large Synoptic Survey Telescope (LSST) is an 8.4m (6.5m effective), wide-field (9.6 square degree), ground-based telescope with a 3.2 Gpixel camera. It will survey over 20,000 square degrees with 1,000 re-visits over 10 years in six visible
bands, and is scheduled to begin full scientific operations in 2016. The Data Management System will acquire and
process the images, issue transient alerts, and catalog the world's largest database of optical astronomical data. Every 24
hours, 15 terabytes of raw data will be transferred via redundant 10 Gbps fiber optics down from the mountain summit at
Cerro Pachon, Chile to the Base Facility in La Serena for transient alert processing. Simultaneously, the data will be
transferred at 2.5Gbps over fiber optics to the Archive Center in Champaign, Illinois for archiving and further scientific
processing and creation of scientific data catalogs. Finally, the Archive Center will distribute the processed data and
catalogs at 10 Gbps to a number of Data Access Centers for scientific, educational, and public access. Redundant storage and network bandwidth are built into the design of the system. The current networking acquisition strategy involves leveraging existing dark fiber for the links within Chile, between Chile and the U.S., and within the U.S. A significant number of carriers and networks are involved in the acquisition, deployment, and operation of this capability, and coordinating them is itself a challenge.
Advanced protocols are being investigated during our Research and Development phase to address anticipated
challenges in effective utilization. We describe the data communications requirements, architecture, and acquisition
strategy in this paper.
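A rough feasibility check of the quoted link rates against the nightly data volume, assuming (optimistically) that the full line rate is sustained with no protocol overhead:

```python
# Time to move one night (15 TB) of raw data over the links quoted
# above, assuming sustained line-rate throughput (real links deliver
# less after protocol overhead).

NIGHT_TB = 15                  # raw data per night, from the text
bits = NIGHT_TB * 1e12 * 8     # total bits to move

for label, gbps in [("summit-to-base 10 Gbps", 10),
                    ("base-to-archive 2.5 Gbps", 2.5)]:
    hours = bits / (gbps * 1e9) / 3600
    print(f"{label}: {hours:.1f} h per night of data")
```

Both links can in principle drain a night of data well within 24 hours, which is consistent with the simultaneous-transfer design described above.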
The Large Synoptic Survey Telescope (LSST) is planned to start construction in early 2009 and achieve first light in late 2012. The LSST Data Management System (DMS) has the responsibility to:
1) process the stream of raw images (15 TB/night) generated during observing to create and archive the nightly data products;
2) reprocess archived data products to incorporate pipeline improvements and generate longer-term data products;
3) provide a public interface that makes available all generated data products.
The DMS must perform these duties throughout the multi-decade lifetime of the survey and its data products. It is given that computing hardware undergoes generational changes every 3 to 5 years, software engineering paradigms shift every decade, and astronomy data reduction and analysis algorithms are in constant evolution. Thus, if the useful life of the LSST Data Products is even 2 decades, the raw data will be completely re-processed at least 20 times with improved
algorithms, the computing on which this is executed will be completely changed at least 4 times, and the software engineering paradigm and software architecture will completely change at least once. Managing this evolution in the DMS will require strategies in all areas of LSST Data Management, including:
1) a layered system architecture;
2) stable interfaces preserving backward compatibility;
3) plug-and-play components for pipeline construction;
4) extendable data and metadata types for catalog construction;
5) open interfaces for resource registration and access;
6) provenance and preservation mechanisms.
This paper describes how we plan to employ these strategies and the expected benefits.
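Strategy 6 (provenance and preservation) can be illustrated with a small sketch: each derived product carries a record of its inputs, software version, and configuration, sealed with a checksum so that a release can later be audited or regenerated with the same inputs. The interfaces shown are hypothetical, not the actual DMS design.

```python
# Illustrative provenance record for a derived data product. The
# function and field names are hypothetical, not the DMS interfaces.
import hashlib
import json

def record_provenance(product_id, inputs, software_version, config):
    record = {
        "product": product_id,
        "inputs": sorted(inputs),        # order-independent input list
        "software_version": software_version,
        "config": config,
    }
    # Deterministic serialization so identical lineage yields an
    # identical checksum, regardless of input ordering.
    blob = json.dumps(record, sort_keys=True)
    record["checksum"] = hashlib.sha256(blob.encode()).hexdigest()
    return record

prov = record_provenance(
    product_id="coadd/patch-42",
    inputs=["raw/visit-001", "raw/visit-002"],
    software_version="pipeline-3.1",
    config={"kernel": "lanczos3"},
)
print(prov["checksum"][:12])
```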
The project for the proposed Large Synoptic Survey Telescope (LSST) performed more than two years of data
collection, site evaluation, and analysis to support the selection of its prime site. The LSST assessment was based on using an existing site with existing infrastructure and historical performance information. A large and diverse set of comparative information was compiled for potential sites using results from other site campaigns, measurements from existing large telescopes, new astro-climate measurements, logistical and feasibility information, and existing satellite and climate databases. Several analyses were performed on these data, including the assessment of survey performance using the LSST operations simulator. An independent site selection committee of experts provided recommendations to the Project, leading to three finalist sites: one in Mexico and two in northern Chile.
The finalist sites were assessed thoroughly with additional data collection from all-sky cameras and site proposals.
Cerro Pachon in Chile was selected to be the site for LSST after a difficult decision between the high quality final
candidates. This paper describes the data, analysis and approach used to support the site evaluation.
The 8.4m Large Synoptic Survey Telescope (LSST) is a wide-field telescope facility that will add a qualitatively new capability in astronomy. For the first time, the LSST will provide time-lapse digital imaging of faint astronomical objects across the entire sky. The LSST has been identified as a national scientific priority by diverse national panels, including multiple National Academy of Sciences committees. This judgment is based upon the LSST's ability to address some of the most pressing open questions in astronomy and fundamental physics, while driving advances in data-intensive science and computing. The LSST will provide unprecedented 3-dimensional maps of the mass distribution in the Universe, in addition to the traditional images of luminous stars and galaxies. These mass maps can be used to better understand the nature of the newly discovered and utterly mysterious Dark Energy that is driving the accelerating expansion of the Universe. The LSST will also provide a comprehensive census of our solar system, including potentially hazardous asteroids as small as 100 meters in size. The LSST facility consists of three major subsystems: 1) the telescope, 2) the camera, and 3) the data processing system. The baseline design for the LSST telescope is an 8.4m 3-mirror design with a 3.5 degree field of view, resulting in an A-Omega product (etendue) of 302deg<sup>2</sup>m<sup>2</sup>. The camera consists of a 3-element transmissive corrector producing a 64cm diameter flat focal plane. This focal plane will be populated with roughly 3 billion 10μm pixels. The data processing system will include pipelines to monitor and assess the data quality, detect and classify transient events, and establish a large searchable object database. We report on the status of the designs for these three major LSST subsystems along with the overall project structure and management.
Semi-active systems are becoming increasingly attractive for structural control applications because they offer some of the best features of both the passive and active systems. This paper examines one such system in which a passive tuned liquid column damper is converted into a variable damping semi-active system. Different semi-active algorithms which are based on the clipped-optimal strategy and fuzzy control theory are used to simulate such a system. The main objective of this paper is to show the applicability of such a system and to discuss the semi-active algorithms needed to achieve performance that is comparable to active systems.