Achieving a throughput of one wafer per minute per layer with a direct-write maskless lithography system, using 25 nm pixels for 50 nm feature sizes, requires data rates of about 10 Tb/s. In previous work, we have shown that lossless binary compression plays a key role in the system architecture for such a maskless writing system. Recently, we developed a new compression technique, Context-Copy-Combinatorial-Code (C4), specifically tailored to lithography data, which exceeds the compression efficiency of all other existing techniques, including BZIP2, 2D-LZ, and LZ77. In any practical direct-write lithography system that uses compression, the decoder for the chosen compression scheme must be replicated in hardware tens of thousands of times. As such, decoder implementation complexity has a significant impact on overall system complexity. In this paper, we explore the tradeoff between compression ratio and decoder buffer size for C4. Specifically, we present a number of techniques to reduce the complexity of C4. First, buffer compression is introduced as a method to reduce decoder buffer size by an order of magnitude without sacrificing compression efficiency. Second, linear prediction is used as a low-complexity alternative to both context-based prediction and binarization. Finally, allowing copy errors improves the compression efficiency of C4 at small buffer sizes. With these techniques in place, for a fixed buffer size, C4 achieves a significantly higher compression ratio than existing compression algorithms. We also present a detailed functional block diagram of the C4 decoding algorithm as a first step towards a hardware realization.
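The abstract does not specify which linear predictor C4 uses, so the following is only an illustrative sketch under stated assumptions. It assumes the common planar predictor a + b - c over the left (a), upper (b), and upper-left (c) neighbours, 5-bit gray levels (`max_val = 31`), and zero-valued pixels outside the image; the names `linear_predict`, `encode`, and `decode` are hypothetical. It shows how a linear predictor turns a layout image into a sparse binary error map plus a short list of error values, and how a decoder can rebuild the image losslessly using only three previously decoded neighbours per pixel.

```python
import numpy as np


def linear_predict(img: np.ndarray, max_val: int = 31) -> np.ndarray:
    """Predict every pixel as a + b - c, clipped to [0, max_val].

    Assumed (hypothetical) predictor: a = left, b = upper, c = upper-left
    neighbour; pixels outside the image count as 0.
    """
    # Add a zero row on top and a zero column on the left.
    padded = np.pad(img.astype(np.int32), ((1, 0), (1, 0)))
    a = padded[1:, :-1]   # left neighbour
    b = padded[:-1, 1:]   # upper neighbour
    c = padded[:-1, :-1]  # upper-left neighbour
    return np.clip(a + b - c, 0, max_val)


def encode(img: np.ndarray, max_val: int = 31):
    """Return a binary error map and the true values of mispredicted pixels."""
    errors = linear_predict(img, max_val) != img
    return errors, img[errors]  # values listed in row-major order


def decode(errors: np.ndarray, values, max_val: int = 31) -> np.ndarray:
    """Rebuild the image in raster order from the error map and error values."""
    h, w = errors.shape
    out = np.zeros((h, w), dtype=np.int32)
    remaining = iter(values)
    for i in range(h):
        for j in range(w):
            if errors[i, j]:
                out[i, j] = next(remaining)  # prediction was wrong: use stored value
            else:
                a = out[i, j - 1] if j else 0
                b = out[i - 1, j] if i else 0
                c = out[i - 1, j - 1] if i and j else 0
                out[i, j] = min(max(a + b - c, 0), max_val)
    return out


if __name__ == "__main__":
    # A toy 6 x 8 "layout" with one filled rectangle at full intensity.
    img = np.zeros((6, 8), dtype=np.int32)
    img[1:5, 2:6] = 31
    errors, values = encode(img)
    print(int(errors.sum()), "mispredicted pixels")  # prints "3 mispredicted pixels"
    assert np.array_equal(decode(errors, values), img)
```

On this toy rectangle, only three pixels near its corners are mispredicted, so the error map is almost entirely zeros. In a complete coder the error map and error values would be compressed further; that step is omitted here.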