Stimulated emission depletion (STED) microscopy, one of the leading super-resolution techniques, offers state-of-the-art image resolution and has developed into a versatile fluorescence imaging tool over the past several years. The best lateral resolution currently offered by STED is around 20 nm, but in practical live-cell imaging the achievable resolution is typically around 100 nm, limited by phototoxicity. Many critical biological structures lie below this resolution level, so improving STED resolution through postprocessing would be invaluable. We propose a deep adversarial network that significantly improves STED resolution: it takes an STED image as input, relies on physical modeling to obtain training data, and outputs a “self-refined” counterpart image at a higher resolution. Specifically, we use prior knowledge of the STED point spread function (PSF) and structural information about the cells to generate simulated, labeled data pairs for network training. Our results suggest that 30-nm resolution can be recovered from a 60-nm-resolution STED image; in our simulations and experiments, the structural similarity index values between the label and the network output reached around 0.98, significantly higher than those obtained with the Lucy–Richardson deconvolution method or a state-of-the-art UNet-based super-resolution network.
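The PSF-based simulation of labeled training pairs described above can be sketched as follows. This is a minimal illustrative example, not the paper's pipeline: the pixel size, PSF full-widths at half-maximum, filament-like structure model, and photon count are all assumed parameters chosen only to show how a (lower-resolution input, higher-resolution label) pair could be generated from a known ground truth.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
PIXEL_NM = 10.0  # assumed pixel size in nanometers (illustrative)

def fwhm_to_sigma_px(fwhm_nm):
    """Convert a PSF full width at half maximum (nm) to a Gaussian sigma in pixels."""
    return fwhm_nm / (2.355 * PIXEL_NM)

def simulate_structure(size=128, n_filaments=6):
    """Draw random line-like 'filaments' as a crude stand-in for cellular structures."""
    img = np.zeros((size, size))
    for _ in range(n_filaments):
        x0, y0 = rng.uniform(0, size, 2)
        angle = rng.uniform(0, np.pi)
        t = np.linspace(-size, size, 4 * size)
        xs = np.clip((x0 + t * np.cos(angle)).astype(int), 0, size - 1)
        ys = np.clip((y0 + t * np.sin(angle)).astype(int), 0, size - 1)
        img[ys, xs] = 1.0
    return img

truth = simulate_structure()
# Label: ground truth blurred with the target (higher-resolution) Gaussian PSF.
label = gaussian_filter(truth, fwhm_to_sigma_px(30.0))
# Input: the same structure blurred with a wider, STED-like PSF plus shot noise.
blurred = gaussian_filter(truth, fwhm_to_sigma_px(60.0))
noisy_input = rng.poisson(blurred * 200.0) / 200.0

print(noisy_input.shape, label.shape)
```

Repeating this process over many randomized structures would yield the paired dataset on which an image-to-image network can be trained without requiring experimentally acquired ground truth.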