4 May 2010 The impact of the data archiving file format on the sharing of scientific data for use in popular computational environments
Abstract
The U.S. Army Research Laboratory (ARL) conducted an initial study on the performance of XML and HDF5 in three popular computational software environments: MATLAB, Octave, and Python, each of which pairs a high-level scripting language with tools designed for numerical processing. Although XML is usable for sharing and exchanging data, the initial results of the study indicated that it has clear limitations in a computational environment. Popular computational tools are unable to handle very large XML-formatted files, which limits the processing of large XML-archived data files. We show the breakdown points of XML-formatted files for various popular computational tools and explore how the performance of XML- and HDF5-formatted files in these environments depends on the hardware, operating system, and mathematical function. This study also explores the inverse file size relationship between HDF5 and XML data files. Several organizations, including ARL, use both XML and HDF5 for archiving and exchanging data. XML is best suited for storing "light" data (such as metadata), and HDF5 is best suited for storing "heavy" scientific data. Integrating and using both XML and HDF5 for data archiving offers the best solution for data providers and consumers to share information for computational and scientific purposes.
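The "light" versus "heavy" distinction can be illustrated with a minimal sketch. This is not the study's benchmark and it does not use HDF5 itself: packed little-endian doubles stand in for the kind of binary dataset storage HDF5 provides, and the element names (`dataset`, `sample`), the attributes, and the synthetic signal are all hypothetical. It only shows why text-tagged numeric arrays grow much larger than their binary equivalent, while a handful of metadata fields fit comfortably in XML.

```python
import struct
import xml.etree.ElementTree as ET


def to_xml_bytes(values):
    """Serialize samples as one XML element per value (text encoding)."""
    # Metadata fits naturally as attributes: this is the "light" part.
    root = ET.Element("dataset", name="signal", units="volts")
    for v in values:
        # repr() of a Python float round-trips exactly through float().
        ET.SubElement(root, "sample").text = repr(v)
    return ET.tostring(root, encoding="utf-8")


def to_binary_bytes(values):
    """Pack samples as raw little-endian 64-bit floats (stand-in for HDF5)."""
    return struct.pack(f"<{len(values)}d", *values)


def from_xml_bytes(blob):
    """Parse the XML document back into a list of floats."""
    return [float(e.text) for e in ET.fromstring(blob).iter("sample")]


def from_binary_bytes(blob):
    """Unpack raw 64-bit floats back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


# Hypothetical "heavy" payload: 10,000 synthetic samples.
values = [i * 0.001 for i in range(10_000)]
xml_blob = to_xml_bytes(values)
bin_blob = to_binary_bytes(values)

print("XML bytes:   ", len(xml_blob))
print("binary bytes:", len(bin_blob))
```

Both encodings recover the same values, but the XML form spends most of its bytes on tags and decimal text, and the whole document must be parsed before the numbers can be used. A real HDF5 file adds its own header and metadata overhead, so the sizes here are indicative only.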
© (2010) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Kelly Bennett, James Robertson, "The impact of the data archiving file format on the sharing of scientific data for use in popular computational environments", Proc. SPIE 7687, Active and Passive Signatures, 76870F (4 May 2010); doi: 10.1117/12.850609; https://doi.org/10.1117/12.850609
PROCEEDINGS
12 PAGES

