In recent years, the traditional automotive industry has also entered the field of autonomous driving in search of new breakthroughs. From simple pedestrian and vehicle detection, through instance segmentation of traffic scenes, to the goal of fully intelligent driving, deep learning has come to dominate each task. Researchers have tried to build vehicle control systems that rely entirely on computers, and have proposed the fully convolutional network (FCN), which replaces fully connected layers with convolutional ones and thereby moves from image classification to dense pixel-wise prediction. This was the first step toward using neural networks for scene instance segmentation, and it is also a key step for intelligent driving.

However, the results of the fully convolutional network are not ideal. An important problem is that pooling layers lose part of the location information while aggregating context. For the dense prediction required by image instance segmentation, the context information (location information) of each pixel is indispensable to its final classification. Later researchers therefore proposed three structures to address this problem: the dilated (atrous) convolution structure, the encoder-decoder structure, and the spatial pyramid structure.

This paper analyzes the working principles and characteristics of these structures and compares the differences between the resulting networks. It then incorporates Mask R-CNN to construct a new network structure for image instance segmentation. The main innovations of this paper are as follows:

1. Introduce the generative adversarial network into the field of image segmentation. Following the idea of the conditional generative adversarial network, the original image is used as the input of the generator, and the generator produces the desired instance segmentation result.
The original image is then paired either with the instance segmentation result produced by the generator or with the manually labeled segmentation result, and the pair is fed to the discriminator. The network is trained until the discriminator cannot distinguish the generator's output from the manual annotation, at which point the generator produces a satisfactory segmentation result.

2. Introduce superpixel information of the image. The boundary information obtained by superpixel segmentation is fed into the generator network as a segmentation condition. For each original input image, a superpixel segmentation method extracts the fine contours of the image, and the superpixel segmentation result is then stacked with the original image to form the input of the generator network.

3. Construct a new image segmentation structure. Image translation models often struggle to produce good results at object boundaries, so this paper changes the output layer of the generator to K channels, where K is the number of classes. The generator adopts the Encoder-Decoder structure of DeconvNet, removes the fully connected layers to reduce the number of model parameters, and replaces the pooling-index mechanism with a direct stacking structure.
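The input construction described in innovations 1 and 2 can be illustrated with a small NumPy sketch. It assumes that a superpixel label map is already available from some superpixel algorithm (SLIC is one common choice, though the text does not name one), that images are stored as (H, W, C) arrays, and that stacking means channel-wise concatenation. The function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def superpixel_boundaries(labels: np.ndarray) -> np.ndarray:
    """Mark pixels whose right or lower neighbour lies in a different superpixel."""
    boundary = np.zeros(labels.shape, dtype=bool)
    boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # horizontal label changes
    boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]   # vertical label changes
    return boundary.astype(np.float32)

def generator_input(image: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Stack the superpixel boundary map onto the image: (H, W, C) -> (H, W, C + 1)."""
    edges = superpixel_boundaries(labels)[..., None]
    return np.concatenate([image, edges], axis=-1)

def one_hot(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an integer label mask (H, W) into a K-channel map (H, W, K)."""
    return np.eye(num_classes, dtype=np.float32)[mask]

def discriminator_input(image: np.ndarray, seg_map: np.ndarray) -> np.ndarray:
    """Pair the image with a K-channel segmentation map: (H, W, C + K).

    seg_map is either the generator output or the one-hot manual annotation.
    """
    return np.concatenate([image, seg_map], axis=-1)

# Toy usage with a 4x4 RGB image, two superpixels, and K = 3 classes (assumed values).
image = np.zeros((4, 4, 3), dtype=np.float32)
labels = np.array([[0, 0, 1, 1]] * 4)
annotation = np.zeros((4, 4), dtype=np.int64)
g_in = generator_input(image, labels)
d_in_real = discriminator_input(image, one_hot(annotation, 3))
```

In this sketch the discriminator always sees the original image alongside a segmentation map, which is what makes the setup conditional: it judges whether the map is plausible for that particular image.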
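The training goal in innovation 1, a discriminator that can no longer tell generated segmentations from manual annotations, is commonly expressed with a binary cross-entropy loss. The following is a minimal sketch under that assumption; the text does not state the exact loss used, and the function names are illustrative. Here `d_real` and `d_fake` stand for the discriminator's output probabilities on (image, annotation) pairs and (image, generated segmentation) pairs respectively.

```python
import numpy as np

def bce(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over all elements."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Push annotated pairs toward 1 ("real") and generated pairs toward 0 ("fake")."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_adversarial_loss(d_fake: np.ndarray) -> float:
    """The generator is rewarded when the discriminator labels its output as real."""
    return bce(d_fake, np.ones_like(d_fake))
```

When the discriminator outputs 0.5 for every pair, it has no ability to separate the two sources, and `discriminator_loss` equals 2 ln 2. In practice an adversarial term like this is often combined with a per-pixel classification loss, but whether the paper does so is not stated here.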
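Innovation 3 can be sketched in the same spirit. The code below is an illustration only and is not the paper's architecture: a real network would use learned convolution and deconvolution layers, whereas here the decoder step is plain nearest-neighbour upsampling. It shows two ideas from the text under those assumptions: stacking same-resolution encoder features onto the decoder features (in place of unpooling with stored pooling indices), and reading a per-pixel class from a K-channel output.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling on an (H, W, C) feature map; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample_2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling by a factor of two in both spatial axes."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decode_with_stacking(encoder_feat: np.ndarray, decoder_feat: np.ndarray) -> np.ndarray:
    """Upsample decoder features and stack the matching encoder features on top.

    The encoder features still carry the location detail that pooling discarded.
    """
    return np.concatenate([upsample_2x(decoder_feat), encoder_feat], axis=-1)

def class_map(logits: np.ndarray):
    """Softmax over the K output channels, then a per-pixel argmax."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)
```

Emitting K channels lets each pixel receive a distribution over classes rather than a colour value, so an ambiguous boundary pixel is resolved by a classification decision instead of a blended output.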