The Astrophysical Virtual Observatory Project (AVO: http://www.eso.org/projects/avo/) will conduct a research and demonstration program on the scientific requirements and technologies necessary to build a VO for European astronomy. The AVO has been jointly funded by the European Commission and six European organizations for a three-year Phase-A work program valued at 5 million Euro. The Phase-A program will focus its work in three areas -- science requirements, archive interoperability and GRID/database technologies. The AVO project, the US NVO and UK ASTROGRID projects have been working closely together over the past nine months to reach consensus on essential technical directions and standards that will make an International Virtual Observatory possible. An International Virtual Observatory Alliance (IVOA) was formed in June 2002 among all currently funded and proposed VO projects. The IVOA has adopted a roadmap for IVO developments over the next three years that will feature coordinated demonstrations of VO capabilities on specific science programs, and international agreements on key interoperability standards and tools.
AstroGrid is the UK's contribution to the world-wide drive towards a Virtual Observatory (VO). I describe the project, its relation to other VO projects and other e-Science projects, and its current status. I then examine the concepts and science drivers behind the Virtual Observatory and the Grid, and the technical challenges which we face. The conception of the VO we arrive at is not one of a software monolith, but rather one of a framework which enables data centres to provide competing and co-operating data services, and which enables software providers to offer a variety of compatible analysis and visualisation tools and user interfaces. The first priority of the VO projects worldwide is to provide the infrastructure which will enable such creative diversity. AstroGrid is, however, also a consortium of data centres, which will pool resources within this framework, and we expect to develop an early working implementation of immediate use to astronomers.
The U.S. National Science Foundation is sponsoring the development of the infrastructure for the National Virtual Observatory via its Information Technology Research Program. This initiative combines expertise from astronomical observatories, data archive centers, university astronomy departments, and computer science and information technology groups at seventeen different organizations. This paper describes the nature of the project, our approach to managing and coordinating work across such a large collaboration, and the progress made thus far in the initial development activities (metadata standards, systems architecture, and science requirements definition).
For the Virtual Observatory to connect archives around the globe, some standardization is needed. It is not necessary to rework the internal structure of each archive to a common standard, but standards for interfaces to archives and for exchange of data are important.
We report on standardization work currently going on in the AVO and AstroGrid projects in the following areas:
- Exchange formats for tabular data;
- Semantic definitions for quantities in tabular data;
- Identification of user and authorization to use resources;
- Query interfaces to archives;
- Catalogues of data resources.
Discussion on standards is ongoing among all Virtual Observatory projects.
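As an illustration of the first two items (an exchange format for tabular data, plus semantic tags on its columns), the following is a minimal sketch, assuming a VOTable-like XML layout. The table contents, field names and UCD-style strings are invented for this example and are not taken from any project specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row table. Field names, UCD-style tags and values are
# invented for illustration only.
VOTABLE_XML = """<VOTABLE>
  <RESOURCE>
    <TABLE name="demo">
      <FIELD name="RA" datatype="double" unit="deg" ucd="POS_EQ_RA_MAIN"/>
      <FIELD name="DEC" datatype="double" unit="deg" ucd="POS_EQ_DEC_MAIN"/>
      <FIELD name="Vmag" datatype="float" unit="mag" ucd="PHOT_JHN_V"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>3.4</TD></TR>
          <TR><TD>83.82</TD><TD>-5.39</TD><TD>4.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return (field descriptions, rows) from a VOTable-like document."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    # Each FIELD element describes one column: name, unit, semantic tag.
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    names = [f["name"] for f in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        # This sketch assumes every column is numeric.
        cells = [float(td.text) for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))
    return fields, rows


fields, rows = read_table(VOTABLE_XML)
for f in fields:
    print(f["name"], f["unit"], f["ucd"])
print(rows)
```

The point of the sketch is that the column descriptions travel with the data, so a consumer does not need prior knowledge of the archive's internal structure.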
The Harvard-Smithsonian Center for Astrophysics (CfA) provides an ideal test-bed for the Virtual Observatory. CfA has expertise covering virtually all branches of observational astronomy and of astrophysical research, as well as data and information management (the Chandra X-ray Center -- CXC, and the Astrophysical Data System -- ADS). At CfA, we have a leading role in the U.S. National Virtual Observatory (NVO) team for the development of the VO Data Model(s), in collaboration with the European Astrophysical Virtual Observatory (AVO). To this end, we are validating our design with a local prototype, which will federate the CfA data archives, starting with the X-ray (Chandra) and optical (Telescope Data Center -- TDC) archives. This paper describes our approach and progress.
The European Grid of Solar Observations (EGSO) is a project to develop a virtual observatory for the solar physics community. As in all such projects, a vital component is a schema that adequately describes the data in the distributed data sets. Here, we discuss the schema in general terms, and present a draft example of a portion of a possible XML schema.
Building an automated classifier for high-energy sources provides an opportunity to prototype approaches to building the Virtual Observatory with a substantial immediate scientific return. The ClassX collaboration is combining existing data resources with trainable classifiers to build a tool that classifies lists of objects presented to it. In our first year the collaboration has concentrated on developing pipeline software that finds and combines information of interest and on exploring the issues that must be addressed for successful classification.
ClassX must deal with many key VO issues: automating access to remote data resources, combining heterogeneous data and dealing with large data volumes. While the VO must attempt to deal with these problems in a generic way, the clear science goals of ClassX allow us to act as a pathfinder exploring particular approaches to addressing these issues.
The yourSky custom astronomical image mosaicking software has a Web portal architecture that allows access via ordinary desktop computers with low bandwidth network connections to high performance and highly customizable mosaicking software deployed in a high performance computing and communications environment. The emphasis is on custom access to image mosaics constructed from terabytes of raw image data stored in remote archives. In this context, custom access refers to new technology that enables on the fly mosaicking to meet user-specified criteria for region of the sky to be mosaicked, datasets to be used, resolution, coordinate system, projection, data type and image format. The yourSky server is a fully automated end-to-end system that handles all aspects of the mosaic construction. This includes management of mosaic requests, determining which input images are required to fulfill each request, management of a data cache for both input image plates and output mosaics, retrieval of input image plates from massive remote archives, image mosaic construction on a multiprocessor system, and making the result accessible to the user on the desktop. The URL for yourSky is http://yourSky.jpl.nasa.gov.
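One step named above, determining which input images are required to fulfill a request, can be sketched as a footprint overlap test. This is only an illustration under simplifying assumptions and not the yourSky implementation: footprints are treated as flat RA/Dec rectangles in degrees, RA wrap-around at 0/360 and spherical geometry are ignored, and the plate names and coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned RA/Dec rectangle in degrees (flat-sky simplification)."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float

    def overlaps(self, other):
        # Two rectangles overlap when they overlap on both axes.
        return (self.ra_min < other.ra_max and other.ra_min < self.ra_max
                and self.dec_min < other.dec_max and other.dec_min < self.dec_max)


def plates_for_request(request, plates):
    """Return the names of all plates whose footprint touches the request."""
    return [name for name, footprint in plates.items()
            if footprint.overlaps(request)]


# Hypothetical plate footprints; names and coordinates are invented.
PLATES = {
    "plate_a": Box(10.0, 16.0, 0.0, 6.0),
    "plate_b": Box(15.0, 21.0, 0.0, 6.0),
    "plate_c": Box(40.0, 46.0, 0.0, 6.0),
}

request = Box(14.0, 17.0, 1.0, 2.0)
needed = plates_for_request(request, PLATES)
print(needed)
```

Only the plates returned by such a test would need to be fetched from the remote archive or looked up in the data cache before mosaic construction starts.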
A Topic Map is a structured network of hyperlinks that points into an information pool. Topic Maps have an existence independent of the information pool and hence different Topic Maps can form different layers above the same information pool and provide us with different views of it. We explore the use of Topic Maps with the Unified Column Descriptor (UCD) scheme developed within the framework of the ESO-CDS data mining project. UCD, with its multi-tier hierarchical structure, categorizes parameters reported in tables and catalogs. By using Topic Maps we show how columns from different catalogs with similar but not identical descriptions could be combined. A direct application for the Virtual Observatory community is that of merging catalogs in order to generate customized views of data.
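The idea of combining columns that are named differently but described alike can be sketched with plain dictionaries. This is a stand-in for illustration and not a Topic Map implementation: the catalogs, column names and UCD-style tags below are hypothetical, and columns are matched only when their tags are identical.

```python
# Hypothetical catalogs: column names differ, but each column carries a
# UCD-style tag describing what it contains.
CAT_A = {
    "columns": {"RAJ2000": "POS_EQ_RA_MAIN",
                "DEJ2000": "POS_EQ_DEC_MAIN",
                "Vmag": "PHOT_JHN_V"},
    "rows": [{"RAJ2000": 10.68, "DEJ2000": 41.27, "Vmag": 3.4}],
}
CAT_B = {
    "columns": {"ra": "POS_EQ_RA_MAIN",
                "dec": "POS_EQ_DEC_MAIN",
                "V": "PHOT_JHN_V",
                "note": "NOTE"},
    "rows": [{"ra": 83.82, "dec": -5.39, "V": 4.0, "note": "x"}],
}


def merged_view(catalogs):
    """Build one list of rows keyed by tag, using only tags all catalogs share."""
    shared = set.intersection(*(set(c["columns"].values()) for c in catalogs))
    view = []
    for cat in catalogs:
        # Map each shared tag back to this catalog's own column name.
        by_tag = {tag: name for name, tag in cat["columns"].items()
                  if tag in shared}
        for row in cat["rows"]:
            view.append({tag: row[name] for tag, name in sorted(by_tag.items())})
    return view


view = merged_view([CAT_A, CAT_B])
for row in view:
    print(row)
```

A hierarchical scheme would also allow matching on a common parent category rather than on identical tags; that refinement is left out of this sketch.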
Hera is a new experiment at the HEASARC (High Energy Astrophysics Science Archive Research Center) at the NASA Goddard Space Flight Center to provide a complete data analysis environment over the Internet for archival researchers. This new facility complements the existing Browse database search facility that is available on the Web. With Hera, users can search the HEASARC data archives with a Web browser and save any selected data set to their Hera disk space area. This only takes a few seconds compared to the many minutes or hours that it could take to download large data sets to the user's local machine. The user can then immediately log into one of the available Hera server machines and begin analyzing the data without having to install any local software except for a very small Hera client application program that runs on the user's local machine. Hera is currently most useful for expert users who are already familiar with analyzing high energy data sets with the HEASARC software. In the future we intend to make Hera more useful for the novice scientific user by providing more on-line help features to guide the user through the data analysis process.
Since 2001, we have been working at Yunnan Observatory on approaches to building a virtual observatory. We have completed a prototype distributed controller for a CCD camera, established a search engine for the 2MASS database, and completed a prototype system for archiving, browsing and searching H-alpha image data of the Sun. Building on these efforts, we are trying to use up-to-date information technologies to unite them and to integrate both astronomical databases and astronomical instruments into a Virtual Observatory.
The NOAO Science Archive (NSA) is a step toward building a comprehensive scientific archive of the optical and infrared data holdings of the National Optical Astronomy Observatory. Earlier efforts included the NOAO Save the Bits archive (more properly a data store) with current raw data holdings from telescopes at both Kitt Peak National Observatory and Cerro Tololo Inter-American Observatory of more than 3 million images, totaling in excess of 20 terabytes. The NOAO Science Archive builds on the foundation provided by the NOAO Deep-Wide Field Survey (NDWFS) Archive that offers sophisticated analysis tools -- as well as the coherent and extensive NDWFS data set. NSA is an initiative of the NOAO Data Products Program aimed at identifying scientifically useful datasets from the large and growing NOAO holdings and making these data available to the astronomical community, while providing tools for data discovery, mining and exploration. The goals for the NSA are: to immediately create a scientifically useful archive of NOAO Survey data, to develop in-house expertise in the relevant technologies, to identify and document requirements for NOAO's future comprehensive archive by providing a design study, and to create a high level of visibility and utility for both the NOAO Archive and NOAO Surveys (for example, with web services available at http://archive.noao.edu). The archive and associated NOAO assets are expected to grow into a resource of the National Virtual Observatory.
AVO Work Area 2 consists of deployment and demonstration of an interoperability prototype. Access to archives of all the partners (ESO, ESA, AstroGrid, Terapix, Jodrell Bank) is implemented via the CDS data federation and integration tools: VizieR and Aladin. The prototype is available for science usage, and further functionality, based in particular on the usage of Uniform Content Descriptors (UCDs) for data mining, will be developed. Case-by-case discussion with data providers will help to establish a set of practical recommendations for interoperability. Science requirements and new technologies studied by the other AVO Work Areas will also be tested. Discussions on standards are ongoing among all VO projects.
The Astrophysics Data System (ADS) is the search system of choice for astronomers world-wide. The searchable database contains over 2.6 million bibliographic records. In addition the ADS has over 2 million scanned article pages from about 280,000 articles, dating back as far as 1829. There are currently more than 10,000 regular users (more than 10 queries/month). ADS users issue almost 1 million queries per month and receive 40 million records and 1.2 million scanned article pages per month. One important aspect of the ADS is the system of links to other data providers. We currently have more than 3 million links to other on-line resources. The ADS is accessed from almost 100 countries, with the number of queries varying widely from country to country. In order to improve access from different parts of the world, we maintain 9 mirror sites of the ADS in Brazil, Chile, China, England, France, Germany, India, Japan, and Russia. Automatic procedures facilitate keeping these mirror sites up-to-date over the network. The ADS is funded by NASA Grant NCC5-189. The ADS is available at:
The National Virtual Observatory (NVO) will provide on-demand access to data collections, data fusion services and compute intensive applications. The paper describes the development of a framework that will support two key aspects of these objectives: a compute engine that will deliver custom image mosaics, and a "request management system," based on an e-business applications server, for job processing, including monitoring, failover and status reporting. We will develop this request management system to support a diverse range of astronomical requests, including services scaled to operate on the emerging computational grid infrastructure. Data requests will be made through existing portals to demonstrate the system: the NASA/IPAC Extragalactic Database (NED), the On-Line Archive Science Information Services (OASIS) at the NASA/IPAC Infrared Science Archive (IRSA); the Virtual Sky service at Caltech's Center for Advanced Computing Research (CACR), and the yourSky mosaic server at the Jet Propulsion Laboratory (JPL).
Science projects are data publishers. The scale and complexity of current and future science data changes the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever -- this gives rise to the data pyramid of versions and to data inflation where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.
The design of the link between data and its data description defines the flexibility and application. A static configuration refers to a situation where metadata are externally defined. Conversely, in dynamic cases the data descriptions are no longer frozen, but are explicitly formalized and stored at a certain hierarchy level. We used two levels in order to test dynamic design: catalogs as dynamic lists and images as dynamic items. We then considered three kinds of objects, always containing a static data part and possibly a dynamic data part. Using the object-oriented database Objectivity, we measured the retrieval speed for several configurations. When selection criteria apply to the static part of items, the retrieval speed is independent of the kind of data extracted (static or dynamic). However, when the selection criteria apply to dynamic parts, the speed decreases strongly. This clearly shows the strength of a static implementation of information, wherever possible, in order to guarantee fast data access. It also points out a serious limitation for data mining, where a priori knowledge is in general not available. Fortunately, a dynamic implementation at the level of lists could resolve the problem. The most elegant way, in the VO context, would be the usage of VOTable as data access layer interface.
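The distinction between selecting on static and on dynamic parts can be illustrated with a small in-memory sketch. It does not reproduce the Objectivity measurements: the `Item` class, its field names and the sample values are hypothetical, and the sorted list stands in for whatever index a database could build over a fixed schema.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Item:
    # Static part: the same fields exist for every item.
    ra: float
    dec: float
    # Dynamic part: self-described, may differ from item to item.
    dynamic: dict = field(default_factory=dict)


def select_static(items_sorted, ras, ra_min, ra_max):
    """Range query on a static field using a pre-sorted key list."""
    lo = bisect.bisect_left(ras, ra_min)
    hi = bisect.bisect_right(ras, ra_max)
    return items_sorted[lo:hi]


def select_dynamic(items, key, predicate):
    """Query on a dynamic field: every item has to be inspected."""
    return [it for it in items
            if key in it.dynamic and predicate(it.dynamic[key])]


# Hypothetical sample, already sorted by the static field `ra`.
ITEMS = [
    Item(10.0, 1.0),
    Item(20.0, 2.0, {"period_days": 0.5}),
    Item(30.0, 3.0, {"period_days": 12.0}),
    Item(40.0, 4.0, {"redshift": 0.1}),
]
RAS = [it.ra for it in ITEMS]

static_hits = select_static(ITEMS, RAS, 15.0, 35.0)
dynamic_hits = select_dynamic(ITEMS, "period_days", lambda p: p < 1.0)
print([it.ra for it in static_hits], [it.ra for it in dynamic_hits])
```

The static query can use the sorted key list because the field is known in advance, whereas the dynamic query cannot assume that a given key exists and falls back to a full scan.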
Current astronomical facilities on the WWW support anonymous access to public-domain resources with very limited workflows. To meet even current aspirations, the Virtual Observatory needs to operate extensive workflows that also include access to restricted resources.
AstroGrid (see http://www.astrogrid.org/), a UK eScience project with collaborating groups drawn from the major UK data archive centres, is currently creating the UK's virtual observatory (Lawrence, 2002, these proceedings). We present use cases from AstroGrid's survey of requirements that show a need for a pervasive infrastructure for identifying users and controlling access to facilities and data. We describe in outline AstroGrid's architecture for this infrastructure.
Web Services form a new, emerging paradigm to handle distributed access to resources over the Internet. There are platform independent standards (SOAP, WSDL), which make the developers' task considerably easier. This article discusses how web services could be used in the context of the Virtual Observatory. We envisage a multi-layer architecture, with interoperating services. A well-designed lower layer consisting of simple, standard services implemented by most data providers will go a long way towards establishing a modular architecture. More complex applications can be built upon this core layer. We present two prototype applications, the SdssCutout and the SkyQuery as examples of this layered architecture.
The era of extremely large, public databases in astronomy is upon us. Such databases will open (are opening!) the field to new research and new researchers. However it is important to be sure the resources are available to properly archive ground-based astronomical data, and include the necessary quality checks and calibrations. An NVO without a proper archive will have limited usefulness. This also implies that with limited resources, not all data can or should be archived. NASA already has a very good handle on US space-based astronomical data. Agencies and organizations that operate astronomical facilities, particularly ground-based observatories, need to plan and budget for these activities now. We should not underestimate the effort required to produce high quality data products that will be useful for the
We have constructed an archive system for NRAO telescopes using mainly tools available in the Astronomical Information Processing System (AIPS++). Since the tools are available to anyone using AIPS++, this amounts to a generic archive capability for any telescope for which the AIPS++ data conversion program exists. The rich tool set available in AIPS++ has enabled very rapid development: our entire effort took no more than about 1 FTE-year. Additional capabilities were required to connect AIPS++ to the web. The system is now being deployed at the NRAO as a prototype archive system for the Very Large Array with deployment for the Green Bank Telescope and Very Long Baseline Array planned for 2003.
The mining of Virtual Observatories (VOs) is becoming a powerful new method for discovery in astronomy. Here we report on the development of SkyDOT (Sky Database for Objects in the Time domain), a new Virtual Observatory, which is dedicated to the study of sky variability. The site will confederate a number of massive variability surveys and enable exploration of the time domain in astronomy. We discuss the architecture of the database and the functionality of the user interface. An important aspect of SkyDOT is that it is continuously updated in near real time so that users can access new observations in a timely manner. The site will also utilize high level machine learning tools that will allow sophisticated mining of the archive. Another key feature is the real time data stream provided by RAPTOR (RAPid Telescopes for Optical Response), a new sky monitoring experiment under construction at Los Alamos National Laboratory (LANL).
The Italian National "Galileo" Telescope (Telescopio Nazionale "Galileo" - TNG) is a 3.5m telescope located at La Palma, in the Canary Islands, which saw first light in 1998. Available TNG subsystems include four first-generation instruments, plus adaptive optics, meteo and seeing towers; the control and data handling systems are tightly coupled, allowing a smooth data flow while preserving integrity. As a part of the data handling systems, the production of a local "Archive at the Telescope" (AaT) is included, and the production of database tables and hard media for the TNG Long-Term Archive (LTA) is supported. The implementation of an LTA prototype was recently completed, and the implementation of its operational version is being planned by the Italian National Institute for Astrophysics (INAF).
A description of the AaT and prototype LTA systems is given, including their data handling/archiving and data retrieval capabilities. A discussion of system features and lessons learned is also included, with particular reference to the issues of completeness and data quality. These issues are of particular importance in view of the preparation of a national facility for the archives of data from ground-based telescopes, and its possible inclusion as a data provider in the Virtual Observatory framework.