Depth perception in unstructured scene images is an important problem for computer-vision applications. This paper proposes a deep-learning method combined with a self-attention mechanism to infer the depth of unstructured indoor targets, effectively addressing the blurred image detail and insufficient layering that arise in depth inference for unstructured scenes. First, a deep encoder-decoder model is trained on large 3D datasets to learn the depth of indoor scenes; the trained model performs well on typical structured indoor scenes. Second, a soft self-attention mechanism extracts disparity information between the upper and lower row sequences of the input image, and this information is used to correct the depth map obtained in the first step, improving its accuracy. Finally, to obtain objects with clear, well-defined boundaries in the depth map, nearest-neighbor regression is applied to correct object contours. Experimental results show that the proposed method infers depth effectively for unstructured indoor scenes: the recovered objects exhibit distinct texture structure, strong geometric features, clear contour edges, and fine layering, and the misleading depth-inference errors in reflective and highlight areas are eliminated.
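The two correction stages described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the row-wise formulation of the soft self-attention, and the use of a local median as a stand-in for the nearest-neighbor regression step are all assumptions made for clarity.

```python
import numpy as np

def soft_self_attention_refine(depth, row_features):
    """Refine a coarse depth map with row-wise soft self-attention.

    Each image row attends to every other row via a scaled dot-product
    over per-row feature descriptors; the depth rows are then re-weighted
    by the resulting attention distribution. (Illustrative sketch only;
    `row_features` is a hypothetical (H, d) descriptor matrix.)
    """
    scores = row_features @ row_features.T / np.sqrt(row_features.shape[1])
    # Numerically stable softmax over each row of the score matrix.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ depth  # attention-weighted combination of depth rows

def nn_contour_correct(depth):
    """Sharpen object contours by replacing each pixel's depth with the
    median of its 3x3 spatial neighborhood -- a simple local stand-in
    for the nearest-neighbor regression step described in the text."""
    H, W = depth.shape
    out = depth.copy()
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - 1), min(H, i + 2)
            j0, j1 = max(0, j - 1), min(W, j + 2)
            out[i, j] = np.median(depth[i0:i1, j0:j1])
    return out

# Usage on a toy 4x4 depth map with identity row descriptors:
coarse = np.linspace(0.0, 1.0, 16).reshape(4, 4)
refined = soft_self_attention_refine(coarse, np.eye(4))
corrected = nn_contour_correct(refined)
```

In a real system the encoder-decoder network would produce `coarse`, and the row descriptors would come from learned feature maps rather than an identity matrix.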