Proc. SPIE. 10194, Micro- and Nanotechnology Sensors, Systems, and Applications IX
KEYWORDS: Analytics, Cancer, Data modeling, Imaging systems, Medical research, Buildings, Data acquisition, Data archive systems, Data processing, Biological research, Analytical research, Data centers, Scientific research, Information science, Computer security, Health informatics
We describe here the knowledge environment of the Early Detection Research Network (EDRN) for Cancer. It is an open source platform built by NASA’s Jet Propulsion Laboratory with contributions from the California Institute of Technology and the Geisel School of Medicine at Dartmouth. It uses tools like Apache OODT, Plone, and Solr, and borrows heavily from the ontological infrastructure of JPL’s Planetary Data System. It has accumulated data on hundreds of thousands of biospecimens and serves over 1300 registered users across the National Cancer Institute (NCI). The scalable computing infrastructure is built so that we can reach out to other agencies, provide homogeneous access, and provide seamless analytics support and bioinformatics tools through community engagement.
The time domain has been identified as one of the most important areas of astronomical research for the next decade. The Virtual Observatory is in the vanguard with dedicated tools and services that enable and facilitate the discovery, dissemination and analysis of time domain data. These range in scope from rapid notifications of time-critical astronomical transients to annotating long-term variables with the latest modelling results. In this paper, we will review the prior art in these areas and focus on the capabilities that the VAO is bringing to bear in support of time domain science. In particular, we will focus on the issues involved with the heterogeneous collections of (ancillary) data associated with astronomical transients, and the time series characterization and classification tools required by the next generation of sky surveys, such as LSST and SKA.
Studies of cosmic gamma-ray bursts (GRBs) and their host galaxies are now starting to provide interesting or even unique new insights in observational cosmology. Observed GRB host galaxies have a median magnitude R~25 mag, and show a range of luminosities, morphologies, and star formation rates, with a median redshift z~1. They represent a new way of identifying a population of star-forming galaxies at cosmological redshifts, which is mostly independent of the traditional selection methods. They seem to be broadly similar to the normal field galaxy populations at comparable redshifts and magnitudes, and indicate at most a mild luminosity evolution over the redshift range they probe. Studies of GRB optical afterglows seen in absorption provide a powerful new probe of the ISM in dense, central regions of their host galaxies, which is complementary to the traditional studies using QSO absorption line systems. Some GRB hosts are heavily obscured, and provide a new way to select a population of cosmological sub-mm sources. A census of detected optical transients may provide an important new way to constrain the total obscured fraction of star formation over the history of the universe. Finally, detection of GRB afterglows at high redshifts (z>6) may provide a unique way to probe the primordial star formation, massive IMF, early IGM, and chemical enrichment at the end of the cosmic reionization era.
A Topic Map is a structured network of hyperlinks that points into an information pool. Topic Maps have an existence independent of the information pool and hence different Topic Maps can form different layers above the same information pool and provide us with different views of it. We explore the use of Topic Maps with the Unified Column Descriptor (UCD) scheme developed in the frame of the ESO-CDS data mining project. UCD, with its multi-tier hierarchical structure, categorizes parameters reported in tables and catalogs. By using Topic Maps we show how columns from different catalogs with similar but not identical descriptions could be combined. A direct application for the Virtual Observatory community is that of merging catalogs in order to generate customized views of data.
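The idea of combining columns through their UCDs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the paper: the catalog names, column names, UCD-style labels, and the function build_topic_map are all hypothetical, and the hierarchy is assumed to be encoded as underscore-separated tiers in the label.

```python
from collections import defaultdict

# Hypothetical catalogs: column name -> UCD-style label (illustrative values only).
CATALOG_A = {"RAJ2000": "POS_EQ_RA_MAIN", "DEJ2000": "POS_EQ_DEC_MAIN", "Vmag": "PHOT_JHN_V"}
CATALOG_B = {"ra": "POS_EQ_RA_MAIN", "dec": "POS_EQ_DEC_MAIN", "Bmag": "PHOT_JHN_B"}


def build_topic_map(catalogs, depth=None):
    """Group columns from several catalogs under shared UCD topics.

    Assumption: tiers of the hierarchy are separated by underscores.
    depth limits how many leading tiers are compared; None means the
    full label must match.
    """
    topics = defaultdict(list)
    for cat_name, columns in catalogs.items():
        for col, ucd in columns.items():
            tiers = ucd.split("_")
            key = "_".join(tiers if depth is None else tiers[:depth])
            # Each topic points back into the "information pool" (catalog, column).
            topics[key].append((cat_name, col))
    return dict(topics)


catalogs = {"A": CATALOG_A, "B": CATALOG_B}
exact = build_topic_map(catalogs)            # identical descriptions only
coarse = build_topic_map(catalogs, depth=2)  # similar but not identical descriptions
```

With the full label, only columns carrying the same descriptor are grouped (the two right-ascension columns, for example). Truncating to two tiers groups the V and B magnitude columns under one photometry topic, which is one simple way to obtain a different "view" over the same catalogs.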
The yourSky custom astronomical image mosaicking software has a Web portal architecture that allows access via ordinary desktop computers with low bandwidth network connections to high performance and highly customizable mosaicking software deployed in a high performance computing and communications environment. The emphasis is on custom access to image mosaics constructed from terabytes of raw image data stored in remote archives. In this context, custom access refers to new technology that enables on the fly mosaicking to meet user-specified criteria for region of the sky to be mosaicked, datasets to be used, resolution, coordinate system, projection, data type and image format. The yourSky server is a fully automated end-to-end system that handles all aspects of the mosaic construction. This includes management of mosaic requests, determining which input images are required to fulfill each request, management of a data cache for both input image plates and output mosaics, retrieval of input image plates from massive remote archives, image mosaic construction on a multiprocessor system, and making the result accessible to the user on the desktop. The URL for yourSky is http://yourSky.jpl.nasa.gov.
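One of the steps listed above, determining which input images are required to fulfill a request, can be sketched as a footprint overlap test. This is a minimal illustration and not the yourSky implementation: the Box type, the plate names, and the function plates_for_request are hypothetical, footprints are assumed to be simple RA/Dec rectangles in degrees, and RA wraparound at 0/360 and projection effects are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rectangular sky footprint in degrees (assumed, simplified geometry)."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float


def overlaps(a: Box, b: Box) -> bool:
    """True if the two footprints share a region of non-zero area."""
    return (a.ra_min < b.ra_max and b.ra_min < a.ra_max
            and a.dec_min < b.dec_max and b.dec_min < a.dec_max)


def plates_for_request(request: Box, plates: dict) -> list:
    """Return the names of the input plates whose footprint overlaps the request."""
    return [name for name, box in plates.items() if overlaps(request, box)]


# Hypothetical archive index: plate name -> footprint.
plates = {
    "plate_001": Box(10.0, 16.0, -3.0, 3.0),
    "plate_002": Box(15.0, 21.0, -3.0, 3.0),
    "plate_003": Box(40.0, 46.0, 20.0, 26.0),
}
needed = plates_for_request(Box(14.0, 17.0, -1.0, 1.0), plates)
```

A production system would need spherical geometry and a spatial index rather than a linear scan; the sketch only shows where the selection step sits between the mosaic request and the retrieval of input plates.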
One major component of the VO will be catalogs measuring gigabytes and terabytes, if not more. Some mechanism like XML will be used for structuring the information. However, such mechanisms are not good for information retrieval on their own. For retrieval we use queries. Topic Maps, which have recently started becoming popular, are excellent for segregating information that results from a query. A Topic Map is a structured network of hyperlinks above an information pool. Different Topic Maps can form different layers above the same information pool and provide us with different views of it. This makes it easier to ask exact questions, aiding us in looking for gold needles in the proverbial haystack. Here we will discuss the specifics of what Topic Maps are and how they can be implemented within the VO framework.
Like every other field of intellectual endeavor, astronomy is being revolutionized by the advances in information technology. There is an ongoing exponential growth in the volume, quality, and complexity of astronomical data sets, mainly through large digital sky surveys and archives. The Virtual Observatory (VO) concept represents a scientific and technological framework needed to cope with this data flood. Systematic exploration of the observable parameter spaces, covered by large digital sky surveys spanning a range of wavelengths, will be one of the primary modes of research with a VO. This is where the truly new discoveries will be made, and new insights gained about the already known astronomical objects and phenomena. We review some of the methodological challenges posed by the analysis of large and complex data sets expected in the VO-based research. The challenges are driven by the size and the complexity of the data sets (billions of data vectors in parameter spaces of tens or hundreds of dimensions), by the heterogeneity of the data and measurement errors, including differences in basic survey parameters for the federated data sets (e.g., in the positional accuracy and resolution, wavelength coverage, time baseline, etc.), by various selection effects, and by the intrinsic clustering properties (functional form, topology) of the data distributions in the parameter spaces of observed attributes. Answering these challenges will require substantial collaborative efforts and partnerships between astronomers, computer scientists, and statisticians.