Stochastic Monte Carlo (MC) modelling is widely used to simulate light scattering in disperse composite media with complex internal structures, including biological tissues.1 The ability to imitate directly the influence of the structural and physiological properties of biological tissues on photon migration for a particular source-detector configuration makes MC a primary tool in biomedical optics and optical engineering. A number of MC codes have been developed in the past1–5 and applied extensively in various biomedical applications, including light dosimetry modelling for photodynamic therapy treatment planning,6 simulation of reflectance spectra of human skin,7 analysis of fluorescence excitation in skin tissues,8 imitation of photon migration in the brain,9 simulation of two-dimensional images of skin obtained by optical coherence tomography (OCT),10 and others. It has been demonstrated that the MC technique is able to provide a broad variety of practical solutions in a range of biomedical studies, from single cells to the biopsy of specific biological tissues and whole organs. However, simulation time was a significant concern in all of the MC models mentioned above.
The situation changed dramatically in 2005, when leading hardware manufacturers, faced with fundamental limitations (a rapidly approaching physical limit of transistor size) and unable to extract additional power from existing architectures, began offering processors with multiple cores. This progress in technology is known as the “multicore revolution.”11 Today nearly all personal computers have more than one processor core, making truly parallel computing available to developers. With the further development of multiprocessor architectures, a novel compute unified device architecture (CUDA) technology became available.11,12 It has been demonstrated that the repeatedly iterated parts of an algorithm can be executed in parallel on CUDA cores, yielding speedups of up to hundreds of times. CUDA is a parallel computing architecture spanning both hardware and software levels, developed by NVIDIA Corp., the world's leading manufacturer of graphics accelerators. Introduced in 2007, CUDA has undergone significant changes in line with the features that NVIDIA’s engineers have added to the graphics chip.11 NVIDIA CUDA technology provides unrestricted access to the computational resources of graphics cards, including processor cores and different types of memory (distinguished by capacity and speed), making the card a massively parallel coprocessor.12 Typical CUDA graphics multiprocessors follow a fundamentally different design philosophy from CPUs: they consist of a number of streaming processors logically divided into hundreds of CUDA cores that can execute thousands of threads simultaneously without context-switch performance losses, backed by extremely fast graphics double data rate (GDDR) memory. This philosophy, known as single program multiple data, makes graphics processing units (GPUs) specialized for compute-intensive, highly parallel computations, such as graphics rendering.
With the development of the CUDA framework (including a compiler, debugger, profiler, software development kit, and documentation), the petaflop-scale computational power of GPUs became available to a variety of industries and applications.11,12 As a result, GPU acceleration has significantly improved the speed and performance of MC simulation.13–16
Based on the next CUDA generation, the architecture code-named Fermi, introduced in 2010, a new concept of an MC model utilizing object-oriented programming (OOP) and GPU acceleration has been proposed.17 The OOP approach allows photon packets and the structural components of biological tissues (e.g., layers, vessels, tumors, cells, collagen fibers, etc.) to be described as objects. Thus, photon migration in the medium is represented as an interaction between objects: the photon packets and the medium (or multiple objects within the medium). Dividing the medium into objects makes it possible to develop realistic tissue models that represent the three-dimensional spatial variations of complex biological structures. Moreover, the OOP approach potentially enables the actual structure of a medium to be imported as an object from a particular imaging modality, e.g., electron microscopy, magnetic resonance imaging, or OCT.
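The object-based description above can be illustrated with a minimal Python sketch. It is not the O3MC implementation: the class names (`Layer`, `PhotonPacket`, `Medium`), the method names, and the layer values in the usage note are all hypothetical and chosen only to show how a photon packet can interact with whichever medium object contains it.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One structural object of the medium (hypothetical planar layer)."""
    name: str
    z_top: float     # depth of the upper boundary, mm
    z_bottom: float  # depth of the lower boundary, mm
    mu_s: float      # scattering coefficient, 1/mm
    mu_a: float      # absorption coefficient, 1/mm

    def contains(self, z: float) -> bool:
        return self.z_top <= z < self.z_bottom

    @property
    def mu_t(self) -> float:
        return self.mu_s + self.mu_a


@dataclass
class PhotonPacket:
    """A photon packet carrying a statistical weight at depth z."""
    z: float = 0.0
    weight: float = 1.0


class Medium:
    """The medium as a collection of objects the packet can interact with."""

    def __init__(self, layers):
        self.layers = list(layers)

    def layer_at(self, z: float):
        # Return the object that contains depth z, or None if outside.
        for layer in self.layers:
            if layer.contains(z):
                return layer
        return None

    def absorb(self, packet: PhotonPacket) -> float:
        # Deposit the fraction mu_a / mu_t of the packet weight locally.
        layer = self.layer_at(packet.z)
        if layer is None:
            return 0.0
        deposited = packet.weight * layer.mu_a / layer.mu_t
        packet.weight -= deposited
        return deposited
```

For example, with two illustrative layers, `Medium([Layer("upper", 0.0, 0.1, 9.0, 1.0), Layer("lower", 0.1, 2.0, 20.0, 0.5)])`, a packet at depth 0.05 mm interacts with the first layer only. Adding a vessel or tumor would mean adding another object with its own `contains` rule, leaving the packet logic unchanged.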
By integrating CUDA acceleration with modern web technologies, such as Microsoft Silverlight and the ASP.NET Framework, the online object-oriented MC (O3MC) computational tool was developed.17 The key idea behind the development of O3MC (Ref. 17) is to create a universal computational tool that simulates the results of real experiments typically used in major applications of biomedical optics and related areas, and that provides researchers with practical results in nearly real time.
OOP and GPU implementations speed up the procedure of MC simulation up to times.18 However, owing to the multiuser architecture of the online solution, concurrent simulations by multiple clients significantly degrade the performance of O3MC. For example, whereas a single user accessing O3MC can obtain results in 4.3 s on a TESLA M2090 GPU, 100 users accessing O3MC at the same time may be placed in a queue and wait for 10 to 15 min.
Therefore, within the framework of the further development of O3MC (available online at: www.biophotonics.ac.nz), we apply a peer-to-peer (P2P) network to deal with multiuser access. The proposed P2P network consists of a set of computers, called nodes or peers, which communicate and share their GPUs (Fig. 1). The peers in a P2P network are equal to one another, acting both as clients and as servers.19 The P2P approach has gained a lot of popularity in recent years, especially for multimedia content delivery and communication (e.g., BitTorrent, Skype, etc.). In the current development, for the first time to our knowledge, we apply a hybrid P2P network (Fig. 1) utilizing different types of peers for MC simulation.
The web server hosts the user interface of the online MC tool, accepts O3MC simulation requests from clients, and keeps track of the other nodes (Fig. 1). The nodes are responsible for sharing information about currently queued MC simulations, processing them on GPUs, and uploading, downloading, and hosting the outcomes (presented in a typical journal-paper format) among themselves without the need for the central server. To develop the P2P network (Fig. 1), the recently introduced P2P features of .NET 4.0 Windows Communication Foundation (WCF) were applied.20 This allows the P2P network to be integrated with the load-balancing part of O3MC, as both are written in managed code using the .NET framework application programming interface.17 Thus, a number of MC simulations can be executed simultaneously on peers without queuing. Following Ref. 16, we consider the simulation of fluence rate as a quantitative measure of MC performance (Fig. 2).
A medium with the following parameters was used in all simulations: the scattering coefficient , absorption coefficient , anisotropy factor , and refractive index . The time required to perform one simulation of photon packets differs from peer to peer: the TESLA M2090 GPU takes 0.253 s to compute the migration of the photon packets; the GTX480, 0.567 s; the FX580, 7.466 s; and the GT555M, 2.134 s. The current network of peers can be slower than, e.g., a realization of MC modeling on a GPU cluster.14 However, a P2P network configured with clusters as peers will surpass the performance of a standalone cluster.
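To illustrate how peers of such different speeds can share a queue of requests, the following minimal Python sketch hands each queued simulation to the peer that would finish it earliest. It is an assumption-laden illustration, not the O3MC load balancer: the function name `dispatch` and the greedy rule are hypothetical, the per-simulation times are the four values quoted above, and P2P network communication time is ignored.

```python
import heapq

# Per-simulation GPU times quoted in the text, in seconds.
PEER_TIMES = {
    "TESLA M2090": 0.253,
    "GTX480": 0.567,
    "FX580": 7.466,
    "GT555M": 2.134,
}


def dispatch(n_jobs, peer_times):
    """Greedily assign n_jobs identical simulations to heterogeneous peers.

    Each job goes to the peer with the earliest projected finish time.
    Returns (jobs per peer, time at which the last job finishes).
    """
    # Heap entries: (finish time if this peer takes the next job, peer name).
    heap = [(t, name) for name, t in peer_times.items()]
    heapq.heapify(heap)
    counts = {name: 0 for name in peer_times}
    makespan = 0.0
    for _ in range(n_jobs):
        finish, name = heapq.heappop(heap)
        counts[name] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + peer_times[name], name))
    return counts, makespan


def aggregate_throughput(peer_times):
    """Simulations per second if every peer is kept busy."""
    return sum(1.0 / t for t in peer_times.values())
```

Calling `dispatch(100, PEER_TIMES)` returns how many of 100 simultaneous requests each peer would process and when the last one completes. Because `aggregate_throughput` is a sum over peers, adding peers of comparable speed raises the idealized capacity in proportion to their number.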
Table 1. Comparison of P2PMC with other MC codes applied to the simulation of fluence rate for 3×10⁷ photon packets.16 The parameters of the medium are: μs = 10 mm⁻¹, μa = 0.05 mm⁻¹, g = 0.9, and n = 1.5. Note that in O3MC and P2PMC the simulation time does not depend on the absorption properties of the medium.
|Reference||Time (s)||Std deviation|
Owing to the different nature of the peers involved, the current P2P network operates with Gigaflops of floating-point operations per second for single-precision and with Gigaflops for double-precision computations. Thus, the best speedup in processing multiuser requests has been achieved using single-precision computing, whereas utilizing double precision for floating-point arithmetic operations in O3MC provides the most accurate modeling results (Table 1). This is explained by the fact that the Tesla M2090 GPUs used by the standalone version of O3MC fully support the IEEE standard for floating-point arithmetic (IEEE 754) as revised in 2008 (Ref. 21), whereas the FX580 and GT555M GPUs in the P2P network of the current study do not have this capability. Having more peers with better double-precision capabilities will increase the double-precision computation capacity of the entire network.
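The accuracy gap between the two precisions can be illustrated with a short Python sketch that accumulates many small contributions, as a scoring array does when photon-packet weights are deposited. This is a generic illustration of rounding-error growth, not the O3MC or P2PMC code; the helper names `to_float32` and `accumulate` are hypothetical, and single precision is emulated on the CPU by rounding through a 32-bit float after every addition.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value, n, single):
    """Add `value` to a running total n times.

    With single=True both the addend and the running total are rounded
    to 32-bit precision after every step, emulating a float accumulator.
    """
    addend = to_float32(value) if single else value
    total = 0.0
    for _ in range(n):
        total += addend
        if single:
            total = to_float32(total)
    return total


def accumulation_errors(value=0.1, n=100_000):
    """Absolute error of the single- and double-precision totals."""
    exact = value * n
    err_single = abs(accumulate(value, n, single=True) - exact)
    err_double = abs(accumulate(value, n, single=False) - exact)
    return err_single, err_double
```

Calling `accumulation_errors()` shows that the emulated single-precision total drifts from the exact sum by many orders of magnitude more than the double-precision one. Real GPU kernels may reduce this effect with techniques such as compensated summation, which are outside the scope of this sketch.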
In addition, P2PMC has also been validated by comparing the simulated diffuse reflectance for a semi-infinite scattering medium with the known analytical results tabulated by Giovanelli,22 the results of the adding-doubling method,23 MCML (Ref. 3), and O3MC (Ref. 17). The results of the comparison are presented in Table 2.
Table 2. Total diffuse reflectance Rd given by the analytical results,22 the adding-doubling method,23 MCML,3 O3MC,17 and P2PMC for 5×10⁴ photon packets. The parameters of the medium are: μs = 9 mm⁻¹, μa = 1 mm⁻¹, g = 0, and n = 1.5.
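For readers who wish to reproduce this kind of validation case, the following is a minimal Python sketch of a weighted MC estimate of diffuse reflectance for a semi-infinite, isotropically scattering medium (g = 0) with a refractive-index-mismatched surface. It is not the P2PMC, O3MC, or MCML code. It assumes normal incidence, tracks only depth and the direction cosine (sufficient for the total reflectance when g = 0), splits the packet weight at the surface according to the Fresnel reflectance, and excludes specular reflection from the estimate. The function names, the roulette threshold, and the survival chance are illustrative choices.

```python
import math
import random


def fresnel(n_in, n_out, cos_i):
    """Unpolarized Fresnel reflectance for light passing from n_in to n_out."""
    sin_t = (n_in / n_out) * math.sqrt(max(0.0, 1.0 - cos_i * cos_i))
    if sin_t >= 1.0:
        return 1.0  # total internal reflection
    cos_t = math.sqrt(1.0 - sin_t * sin_t)
    r_s = (n_in * cos_i - n_out * cos_t) / (n_in * cos_i + n_out * cos_t)
    r_p = (n_in * cos_t - n_out * cos_i) / (n_in * cos_t + n_out * cos_i)
    return 0.5 * (r_s * r_s + r_p * r_p)


def diffuse_reflectance(n_packets, mu_s=9.0, mu_a=1.0, n=1.5, seed=1):
    """Estimate diffuse reflectance of a semi-infinite medium with g = 0."""
    rng = random.Random(seed)
    mu_t = mu_s + mu_a
    albedo = mu_s / mu_t
    r_specular = ((n - 1.0) / (n + 1.0)) ** 2
    escaped = 0.0
    for _ in range(n_packets):
        # Launch at the surface, heading into the medium (z >= 0).
        z, uz, w = 0.0, 1.0, 1.0 - r_specular
        while True:
            step = -math.log(1.0 - rng.random()) / mu_t
            if uz < 0.0 and z + step * uz < 0.0:
                # The packet reaches the surface before the step ends.
                step -= z / (-uz)
                z = 0.0
                r = fresnel(n, 1.0, -uz)
                escaped += w * (1.0 - r)  # transmitted part leaves the medium
                w *= r                    # reflected part continues
                uz = -uz
            z += step * uz
            w *= albedo                   # absorb part of the weight
            uz = 2.0 * rng.random() - 1.0  # isotropic scattering
            if w < 1e-4:                  # Russian roulette
                if rng.random() < 0.1:
                    w /= 0.1
                else:
                    break
    return escaped / n_packets
```

Calling `diffuse_reflectance(50_000)` uses the number of photon packets and the medium parameters given in the caption above. The estimate carries statistical noise that shrinks roughly as the inverse square root of the number of packets.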
The developed MC technique was also comprehensively validated by comparison with the known exact solution by Milne25,26 and with the results of experimental studies of image transfer through the water solution of spherical microparticles of known size and density.27,28
Thus, in this letter, utilizing different types of CUDA GPUs, we consider the possibility of using a P2P network to provide multiuser access to an online MC simulation of photon migration in complex turbid media, such as biological tissues. The multiuser access capability was found to depend linearly on the number of peers. The best speedup in processing multiuser requests, in a range of 4 to 35 s depending on the particular peers involved, was achieved using single-precision computing, whereas double-precision floating-point arithmetic provides the best accuracy. Thus, with a growing number of peers (preferably double-precision capable), including those distributed worldwide, a P2P solution will allow multiple users to perform accurate and reasonably fast MC modeling (i.e., in about 5 s for photon packets, excluding the time for P2P network communication). Further development of such a network worldwide could form the basis of a computational platform for the biomedical optics and optical diagnostics community.