Data volume and average data preparation time continue to trend upward with newer technology nodes. In the past decade, with file sizes measured in terabytes and network bandwidth requirements exceeding 40 GB/s, mask synthesis operations have expanded their cluster capacity to thousands and even tens of thousands of CPU cores. Efficient, scalable, and flexible management of this expensive, high-performance distributed computing system is required at every stage of geometry processing, from layout polishing through Optical Proximity Correction (OPC), Mask Process Correction (MPC), and Mask Data Preparation (MDP), to consistently meet tape-out cycle time goals. The MDP step, being the final stage in the entire flow, has to write all of the pattern data into one or more disk files. This extremely I/O-intensive step still accounts for a significant portion of the processing time and poses a major scalability challenge for the software. A comprehensive solution should combine high scalability for large jobs with low overhead for small jobs, which is the ideal behavior in a typical production environment. In this paper, we discuss methods to address the former requirement, emphasizing the efficient use of high-performance distributed file systems while minimizing the less scalable disk I/O operations. We also discuss dynamic resource management and efficient job scheduling to address the latter requirement. Finally, we demonstrate the use of a cluster management system to create a comprehensive data processing environment that supports large-scale data processing requirements.