Although there has been progress in applying GPU technology to computed tomography (CT) reconstruction algorithms, much of the work has concentrated on optimizing reconstruction performance for smaller, medical-scale datasets. Industrial CT datasets can vary widely in size and number of projections. With recent advances in high-resolution cameras, the industrial CT community may soon need to pursue 100-megapixel detectors for CT applications. Simply adding GPUs is not a viable way to reconstruct such a massive dataset, because memory and storage bottlenecks would leave the GPUs idle for prolonged periods, negating the performance gains. Current reconstruction algorithms would also be insufficient because of various bottlenecks in the processor hardware. Past work has shown that, for large-scale datasets, CT reconstruction is an irregular problem on a GPU due to the massively parallel environment. This work proposes a high-performance, multi-GPU, modularized approach to reconstruction in which computation, memory transfers, and disk I/O are optimized to occur in parallel while accommodating the irregular nature of the computation kernel. Our approach uses a dynamic MIMD-type architecture in a hybrid CUDA and OpenMP environment. The modularized approach improved load balancing and performance such that a 1-trillion-voxel volume was reconstructed from 10,000 100-megapixel projections in less than a day.
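
To give a sense of the scale behind the memory and storage bottlenecks mentioned above, the following back-of-the-envelope sketch works out the data volumes implied by the stated figures (10,000 projections, 100 megapixels each, 1 trillion voxels, one day). The abstract does not state the sample format, the backprojection scheme, or the GPU model, so the 4-byte samples, the one-update-per-voxel-per-projection count, and the 16 GB device are illustrative assumptions only, and the function name `sizing` is hypothetical.

```python
# Back-of-the-envelope sizing for the reconstruction described in the abstract.
# Assumptions (NOT from the source): 4-byte float32 samples, a backprojection
# that updates every voxel once per projection, and a hypothetical 16 GB GPU.

NUM_PROJECTIONS = 10_000               # from the abstract
PIXELS_PER_PROJECTION = 100 * 10**6    # 100-megapixel detector, from the abstract
NUM_VOXELS = 10**12                    # 1 trillion voxels, from the abstract
SECONDS_PER_DAY = 24 * 60 * 60         # upper bound on the reported run time

BYTES_PER_SAMPLE = 4                   # assumption: float32 pixels and voxels
GPU_MEMORY_BYTES = 16 * 10**9          # assumption: hypothetical 16 GB device


def sizing():
    """Return the data sizes and minimum rates implied by the figures above."""
    projection_bytes = NUM_PROJECTIONS * PIXELS_PER_PROJECTION * BYTES_PER_SAMPLE
    volume_bytes = NUM_VOXELS * BYTES_PER_SAMPLE

    # Lower bound on the number of sub-volumes, if a device held only voxels
    # and no projection data at all (ceiling division).
    min_subvolumes = -(-volume_bytes // GPU_MEMORY_BYTES)

    # Assumed work: every voxel receives one update from every projection.
    voxel_updates = NUM_VOXELS * NUM_PROJECTIONS
    min_updates_per_second = voxel_updates / SECONDS_PER_DAY

    return {
        "projection_bytes": projection_bytes,
        "volume_bytes": volume_bytes,
        "min_subvolumes": min_subvolumes,
        "voxel_updates": voxel_updates,
        "min_updates_per_second": min_updates_per_second,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value:.4g}")
```

Under these assumptions the projections and the volume each occupy about 4 TB, far more than the memory of any single device, so the volume has to be processed in hundreds of pieces that are streamed between disk, host memory, and the GPUs. Finishing within a day would then require a sustained aggregate rate above roughly 10^11 voxel updates per second, which is why keeping the GPUs busy during transfers and disk I/O matters.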