An effective way to accelerate the finite-difference time-domain (FDTD) method is to use a graphics processing unit (GPU). This paper describes an implementation of the three-dimensional FDTD method with the convolutional perfectly matched layer (CPML) boundary condition on a Kepler (GK110) architecture GPU. We first optimize the FDTD domain decomposition method for the Kepler GPU. Several Kepler-specific optimizations are then studied and applied to the FDTD program. The optimized program achieves a speedup of up to 270.9 times over the sequential CPU version. Experiments show that the optimizations save 22.2% of the simulation time compared with the unoptimized GPU version. The solution is also faster than previously reported implementations.
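
The 22.2% saving in simulation time can also be read as a speedup factor over the unoptimized GPU version. The short calculation below is only a reading aid: the symbols $T_{\mathrm{base}}$ (run time of the unoptimized GPU version), $T_{\mathrm{opt}}$ (run time of the optimized GPU version), and $S$ (their ratio) are notation assumed here and are not taken from the paper.

\[
T_{\mathrm{opt}} = (1 - 0.222)\, T_{\mathrm{base}}
\quad\Longrightarrow\quad
S = \frac{T_{\mathrm{base}}}{T_{\mathrm{opt}}} = \frac{1}{0.778} \approx 1.29
\]

Under this reading, the optimizations make the GPU program roughly 1.29 times faster than the unoptimized GPU version. This figure is separate from the 270.9 times speedup, which is measured against the sequential CPU version.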