Achieving a throughput of one wafer layer per minute with a direct-write maskless lithography system, using 22-nm pixels for 45-nm feature sizes, requires data rates of about 12 Tb/s. In our previous work, we developed a novel lossless compression technique, context copy combinatorial code (C4), specifically tailored to flattened, rasterized layout data. C4 exceeds the compression efficiency of all other existing techniques, including BZIP2, 2D-LZ, and LZ77, especially under the limited decoder buffer size required for hardware implementation. In this work, we present two variations of the C4 algorithm. The first variation, block C4, lowers the encoding time of C4 by several orders of magnitude while also lowering the decoder complexity. The second variation replaces the hierarchical combinatorial coding part of C4 with Golomb run-length coding, which significantly reduces the decoder power and area compared with block C4. We refer to this second algorithm as block Golomb context copy code (block GC3). We present detailed functional block diagrams of the block C4 and block GC3 decoders, along with their hardware performance estimates, as a first step toward implementing the writer chip for maskless lithography.
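
As a rough check on the data rate quoted above, the following minimal sketch estimates the raw pixel data rate for one wafer layer per minute. The 22-nm pixel size and the one-minute layer time come from the text; the 300-mm wafer diameter and the 5 bits per pixel (32 gray levels) are assumptions made here for illustration, and the function name `required_data_rate_tbps` is hypothetical.

```python
import math


def required_data_rate_tbps(
    wafer_diameter_m=0.300,   # ASSUMPTION: 300-mm wafer (not stated in the text)
    pixel_size_m=22e-9,       # 22-nm pixels, from the text
    bits_per_pixel=5,         # ASSUMPTION: 32 gray levels per pixel
    seconds_per_layer=60.0,   # one wafer layer per minute, from the text
):
    """Estimate the raw (uncompressed) data rate in Tb/s for one wafer layer."""
    # Area of a circular wafer in square meters.
    wafer_area_m2 = math.pi * (wafer_diameter_m / 2) ** 2
    # Number of square pixels needed to tile that area.
    pixel_count = wafer_area_m2 / pixel_size_m ** 2
    # Total bits per layer, spread over the layer time, converted to terabits.
    return pixel_count * bits_per_pixel / seconds_per_layer / 1e12


if __name__ == "__main__":
    print(f"{required_data_rate_tbps():.1f} Tb/s")
```

Under these assumptions the estimate comes out at roughly 12 Tb/s, in line with the figure above; a different wafer size or gray-level depth scales the result proportionally.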