Digital histopathology images larger than 1 gigapixel are drawing increasing attention in the clinical,
biomedical research, and computer vision fields. Among the many observable features spanning multiple
scales in pathology images, nuclear morphology is one of the central criteria for diagnosis and grading.
As a result, it is also the most studied target in image computing. A large number of research papers have been
devoted to the problem of extracting nuclei from digital pathology images, which is the foundation of any
further correlation study. However, the validation and evaluation of nucleus extraction have yet to be formulated
rigorously and systematically. Some studies report a human-verified segmentation with thousands of nuclei,
whereas a single whole slide image may contain up to a million. The main obstacle lies in the difficulty of obtaining
such a large number of validated nuclei, which is essentially an impossible task for pathologists. We propose a
systematic validation and evaluation approach based on large-scale image synthesis. This could facilitate more
quantitatively validated studies in current and future histopathology image analysis.
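The appeal of synthesis-based validation is that the ground truth is known by construction, so a segmentation result can be scored without manual annotation. The following is a minimal sketch of that idea only: the disk-shaped "nuclei", the function names `synthesize_nuclei` and `dice`, and the choice of the Dice coefficient as the score are illustrative assumptions, not the synthesis model or metric described above.

```python
import random


def synthesize_nuclei(width, height, count, radius, seed=0):
    """Return (centers, mask) for a toy image with `count` disk-shaped nuclei.

    Hypothetical stand-in for a real synthesis model: the mask is the
    exact ground truth because we drew it ourselves.
    """
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    centers = []
    for _ in range(count):
        # Keep every disk fully inside the image.
        cx = rng.randrange(radius, width - radius)
        cy = rng.randrange(radius, height - radius)
        centers.append((cx, cy))
        for y in range(cy - radius, cy + radius + 1):
            for x in range(cx - radius, cx + radius + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    mask[y][x] = 1
    return centers, mask


def dice(a, b):
    """Dice overlap between two binary masks of equal shape."""
    inter = sum(p & q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total
```

In use, the output of any nucleus extraction algorithm run on the rendered synthetic image would be passed as the second argument to `dice`, with the synthesized mask as the first.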
High-resolution pathology images provide rich information about the morphological and functional characteristics of biological
systems, and are transforming the field of pathology into a new era. To facilitate the use of digital pathology
imaging for biomedical research and clinical diagnosis, it is essential to manage and query both whole slide images (WSI)
and analytical results generated from images, such as annotations made by humans and computed features and classifications
made by computer algorithms. There are unique requirements on modeling, managing and querying whole slide
images, including compatibility with standards, scalability, support of image queries at multiple granularities, and support
of integrated queries between images and derived results from the images. In this paper, we present our work on developing
the Pathology Image Database System (PIDB), which is a standards-oriented image database to support retrieval of
images, tiles, regions and analytical results, image visualization and experiment management through a unified interface
and architecture. The system is deployed for managing and querying whole slide images for In Silico brain tumor studies at
Emory University. PIDB is generic and open source, and can be easily used to support other biomedical research projects.
It has the potential to be integrated into a Picture Archiving and Communications System (PACS) with powerful query
capabilities to support pathology imaging.
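Queries at the tile and region granularities both reduce to mapping a rectangle in slide coordinates onto tile indices at some resolution level. The sketch below illustrates that mapping under stated assumptions: a power-of-two pyramid, square tiles of 256 pixels, and the function name `tiles_for_region` are all hypothetical and are not taken from PIDB.

```python
def tiles_for_region(x, y, width, height, level, tile_size=256):
    """Return (col, row) indices of tiles at `level` that overlap a region.

    The region is given in level-0 (full resolution) pixels. Assumes each
    pyramid level halves the resolution of the previous one.
    """
    if width <= 0 or height <= 0:
        return []
    scale = 2 ** level
    # Inclusive pixel bounds of the region, expressed in level coordinates.
    x0, y0 = x // scale, y // scale
    x1, y1 = (x + width - 1) // scale, (y + height - 1) // scale
    return [
        (col, row)
        for row in range(y0 // tile_size, y1 // tile_size + 1)
        for col in range(x0 // tile_size, x1 // tile_size + 1)
    ]
```

A region query can then fetch only the listed tiles rather than the whole slide, which is what makes multi-granularity retrieval practical on gigapixel images.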
Medical image based biomarkers are being established for therapeutic cancer clinical trials, where image assessment is
among the essential tasks. Large-scale image assessment is often performed by a large group of experts, who retrieve
images from a centralized image repository to workstations in order to mark up and annotate them. In such an environment, it is
critical to provide a high performance image management system that supports efficient concurrent image retrievals in a
distributed environment. There are several major challenges: high throughput of large scale image data over the Internet
from the server for multiple concurrent client users, efficient communication protocols for transporting data, and effective
management of data versioning for audit trails. We study the major bottlenecks for such a system, and propose and evaluate a
solution that uses hybrid image storage with solid state drives and hard disk drives, RESTful Web Services based protocols
for exchanging image data, and a database-based versioning scheme for efficient archiving of image revision history. Our
experiments show promising results for our methods, and our work provides a guideline for building enterprise-level high
performance medical image management systems.
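One common way to realize database-based versioning for audit trails is an append-only revision table, in which every save adds a new row and earlier rows are never modified. The sketch below shows that pattern with SQLite; the table name `annotation_revision`, its columns, and the helper functions are hypothetical and are not the schema used in the work described above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a revision store. The schema is an illustrative assumption."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS annotation_revision (
               annotation_id TEXT NOT NULL,
               version       INTEGER NOT NULL,
               author        TEXT NOT NULL,
               payload       TEXT NOT NULL,
               PRIMARY KEY (annotation_id, version))"""
    )
    return db


def save_revision(db, annotation_id, author, payload):
    """Append a new revision; existing rows are never updated or deleted."""
    with db:  # commits on success, rolls back on error
        row = db.execute(
            "SELECT COALESCE(MAX(version), 0) FROM annotation_revision "
            "WHERE annotation_id = ?",
            (annotation_id,),
        ).fetchone()
        version = row[0] + 1
        db.execute(
            "INSERT INTO annotation_revision VALUES (?, ?, ?, ?)",
            (annotation_id, version, author, payload),
        )
    return version


def latest(db, annotation_id):
    """Return (version, author, payload) of the newest revision, or None."""
    return db.execute(
        "SELECT version, author, payload FROM annotation_revision "
        "WHERE annotation_id = ? ORDER BY version DESC LIMIT 1",
        (annotation_id,),
    ).fetchone()


def history(db, annotation_id):
    """Return the audit trail as a list of (version, author), oldest first."""
    return db.execute(
        "SELECT version, author FROM annotation_revision "
        "WHERE annotation_id = ? ORDER BY version",
        (annotation_id,),
    ).fetchall()
```

Because old revisions stay in place, the audit trail is simply a query over the table, and the current state of an annotation is its highest-numbered row.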
Proprietary approaches to representing annotations and image markup are serious barriers to researchers sharing image
data and knowledge. The Annotation and Image Markup (AIM) project is developing a standards-based information model
for image annotation and markup in health care and clinical trial environments. The complex hierarchical structures of
the AIM data model pose new challenges for managing such data in terms of performance and support of complex queries.
In this paper, we present our work on managing AIM data through a native XML approach, and supporting complex image
and annotation queries through native extensions of the XQuery language. Through integration with xService, AIM
databases can now be conveniently shared through caGrid.
Biomedical database systems need not only to address the issues of managing complex data, but also to provide data
security and access control. These include not only system-level security, but also instance-level access
control, such as access to documents, schemas, or aggregations of information. The latter is becoming more important
as multiple users can share a single scientific data management system to conduct their research, while data have to be
protected before they are published or IP-protected. This problem is challenging because users' needs for data security vary
dramatically from one application to another, in terms of whom to share with, what resources to share, and at what
access level. We develop a comprehensive data access framework for SciPort, a biomedical data management system.
SciPort provides fine-grained, multi-level, space-based access control of resources at not only the object level (documents and
schemas), but also the space level (sets of resources aggregated hierarchically). Furthermore, to simplify the management
of users and privileges, a customizable role-based user model is developed. The access control is implemented efficiently
by integrating access privileges into the backend XML database, so that efficient queries are supported. The secure access
approach we take makes it possible for multiple users to share the same biomedical data management system with flexible
access management and high data security.
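To make the combination of role-based users and hierarchical spaces concrete, the sketch below resolves a user's access level on a space by walking up the hierarchy to the nearest grant held by each of the user's roles. This is an illustration under assumptions only: the class `AccessPolicy`, the level names, and the nearest-ancestor rule are hypothetical and are not SciPort's actual model or API.

```python
from dataclasses import dataclass, field

# Hypothetical ordered access levels; higher numbers include lower ones.
LEVELS = {"none": 0, "read": 1, "write": 2, "admin": 3}


@dataclass
class AccessPolicy:
    # (role, space path) -> level name; a space path is a tuple like ("lab", "trial1").
    grants: dict = field(default_factory=dict)
    # user -> set of role names
    user_roles: dict = field(default_factory=dict)

    def grant(self, role, space, level):
        self.grants[(role, tuple(space))] = level

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def level_for(self, user, space):
        """Highest level across the user's roles, using each role's nearest ancestor grant."""
        best = 0
        for role in self.user_roles.get(user, ()):
            path = tuple(space)
            while True:
                if (role, path) in self.grants:
                    best = max(best, LEVELS[self.grants[(role, path)]])
                    break
                if not path:
                    break
                path = path[:-1]  # move up one level in the space hierarchy
        return best

    def allowed(self, user, space, needed):
        return self.level_for(user, space) >= LEVELS[needed]
```

With this rule, a grant on a parent space covers everything beneath it, while a more specific grant on a child space (for example `"none"` on an unpublished subspace) overrides the inherited one for that role.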
Increased complexity of scientific research poses new challenges to scientific data management. Meanwhile, scientific
collaboration, which relies on integrating and sharing data from distributed institutions, is becoming increasingly important.
We develop SciPort, a Web-based platform for scientific data management and integration built on a central-server-based
distributed architecture, where researchers can easily collect, publish, and share their complex scientific
data across multiple institutions. SciPort provides an XML-based general approach to model complex scientific data by
representing them as XML documents. The documents capture not only hierarchical structured data, but also images and
raw data through references. In addition, SciPort provides an XML based hierarchical organization of the overall data
space to make it convenient for quick browsing. To provide generality, schemas and hierarchies are customizable with
XML-based definitions, so the system can be quickly adapted to different applications. While each institution can
manage documents on a Local SciPort Server independently, selected documents can be published to a Central Server to
form a global view of shared data across all sites. By storing documents in a native XML database, SciPort provides high
schema extensibility and supports comprehensive queries through XQuery. By providing a unified and effective means for
data modeling, data access and customization with XML, SciPort provides a flexible and powerful platform for sharing
scientific data for scientific research communities, and has been successfully used in both biomedical research and clinical