8 August 2016 Trident: scalable compute archives: workflows, visualization, and analysis
Abstract
The astronomy community has embraced Big Data processing challenges, e.g. those associated with time-domain astronomy, and has come up with a variety of novel and efficient data processing solutions. However, data processing is only a small part of the Big Data challenge. Efficient knowledge discovery and scientific advancement in the Big Data era require new and equally efficient tools: modern user interfaces for searching, identifying, and viewing data online without direct access to the data; tracking of data provenance; searching, plotting, and analyzing metadata; interactive visual analysis, especially of (time-dependent) image data; and the ability to execute pipelines on supercomputing and cloud resources with minimal user overhead or required expertise, even for novice computing users. The Trident project at Indiana University offers a comprehensive web and cloud-based microservice software suite that enables the straightforward deployment of highly customized Scalable Compute Archive (SCA) systems, including extensive visualization and analysis capabilities, with a minimal amount of additional coding. Trident seamlessly scales up or down in terms of data volumes and computational needs, and allows feature sets within a web user interface to be quickly adapted to meet individual project requirements. Domain experts only have to provide code or business logic for handling and visualizing their domain's data products and for executing their pipelines and application workflows. Trident's microservices architecture is made up of lightweight services connected by a REST API and/or a message bus; web interface elements are built using NodeJS, AngularJS, and HighCharts JavaScript libraries, among others, while backend services are written in NodeJS, PHP/Zend, and Python.
The software suite currently consists of (1) a simple workflow execution framework to integrate, deploy, and execute pipelines and applications; (2) a progress service to monitor workflows and sub-workflows; (3) ImageX, an interactive image visualization service; (4) an authentication and authorization service; (5) a data service that handles archival, staging, and serving of data products; and (6) a notification service that serves the statistical collation and reporting needs of various projects. Several additional components are under development. Trident is an umbrella project that evolved from the One Degree Imager, Portal, Pipeline, and Archive (ODI-PPA) project, which we had initially refactored toward (1) a powerful analysis/visualization portal for Globular Cluster System (GCS) survey data collected by IU researchers; (2) a data search and download portal for the IU Electron Microscopy Center's data (EMC-SCA); and (3) a prototype archive for the Ludwig Maximilian University's Wide Field Imager. The new Trident software has been used to deploy (1) a metadata quality control and analytics portal (RADY-SCA) for DICOM-formatted medical imaging data produced by the IU Radiology Center; (2) several prototype workflows for different domains; (3) a snapshot tool within IU's Karst Desktop environment; and (4) a limited component set to serve GIS data within the IU GIS web portal. Trident SCA systems leverage supercomputing and storage resources at Indiana University but can be configured to make use of any cloud/grid resource, from local workstations/servers to (inter)national supercomputing facilities such as XSEDE.
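To illustrate the idea of a progress service that monitors workflows and sub-workflows, the following is a minimal Python sketch, not Trident's actual implementation. It assumes that progress updates arrive as small dictionaries (as they might from a message bus or a REST call) and that a parent workflow's progress is the weighted average of its sub-workflows. The names `Task`, `ProgressTracker`, `register`, `handle_message`, and `status`, as well as the message fields `key` and `progress`, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    """One node in a workflow tree (hypothetical structure, for illustration)."""
    key: str
    progress: float = 0.0          # fraction complete, 0.0 to 1.0 (used by leaf tasks)
    weight: float = 1.0            # relative share of the parent's total work
    children: List["Task"] = field(default_factory=list)

    def rollup(self) -> float:
        """Return own progress for a leaf, else the weighted mean of the children."""
        if not self.children:
            return self.progress
        total_weight = sum(child.weight for child in self.children)
        weighted = sum(child.weight * child.rollup() for child in self.children)
        return weighted / total_weight


class ProgressTracker:
    """Keeps workflow trees in memory and applies incoming progress updates."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Task] = {}

    def register(self, key: str, parent: Optional[str] = None,
                 weight: float = 1.0) -> Task:
        """Add a workflow (no parent) or a sub-workflow (with a parent key)."""
        task = Task(key=key, weight=weight)
        self.tasks[key] = task
        if parent is not None:
            self.tasks[parent].children.append(task)
        return task

    def handle_message(self, message: dict) -> None:
        """Apply an update such as {"key": "wf.stage", "progress": 0.5}."""
        task = self.tasks[message["key"]]
        # Clamp to [0, 1] so a misbehaving pipeline cannot report 150 percent.
        task.progress = min(1.0, max(0.0, float(message["progress"])))

    def status(self, key: str) -> float:
        """Return the rolled-up progress of a workflow or sub-workflow."""
        return self.tasks[key].rollup()


# Usage: one workflow with two sub-workflows of unequal weight.
tracker = ProgressTracker()
tracker.register("wf")
tracker.register("wf.stage", parent="wf", weight=1.0)
tracker.register("wf.calibrate", parent="wf", weight=3.0)
tracker.handle_message({"key": "wf.stage", "progress": 1.0})
tracker.handle_message({"key": "wf.calibrate", "progress": 0.5})
print(tracker.status("wf"))  # (1*1.0 + 3*0.5) / 4 = 0.625
```

Keeping the roll-up in the tracker rather than in each pipeline means a pipeline only has to report its own fraction complete. A web interface could then poll a single key for the overall state of a workflow.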
© (2016) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Arvind Gopu, Soichi Hayashi, Michael D. Young, Ralf Kotulla, Robert Henschel, Daniel Harbeck, "Trident: scalable compute archives: workflows, visualization, and analysis", Proc. SPIE 9913, Software and Cyberinfrastructure for Astronomy IV, 99131H (8 August 2016); doi: 10.1117/12.2233111; https://doi.org/10.1117/12.2233111
PROCEEDINGS
12 PAGES