Advances in computing power have driven rapid development in artificial intelligence, and intelligent information technology has attracted growing attention in the manufacturing industry. However, existing research pays little attention to problems specific to manufacturing, mainly because datasets for rare scenarios are difficult to acquire. In steel plate sorting, corner point detection plays a crucial role in production efficiency, particularly when laser cutting leaves steel plates adhered to one another. Given the scarcity of seam-cut steel plate image data and the strong generalization ability of the cross-modal model GLIP, this study applies cross-modal large models from other fields to this task. First, we established a steel plate dataset with corner point information and fine-tuned the GLIP model using weakly supervised learning. Then, the inference results of the large teacher model are used as inputs to the lightweight student model YOLOv8, forming a framework for lightweight industrial deployment. In our experiments, we first compared the effects of different amounts of data on the GLIP model and demonstrated that the 20-shot model performs comparably to the full-shot model. In addition, YOLOv8 can recognize corner points that were neither manually annotated nor labeled by the GLIP model, demonstrating excellent generalization performance. Comparative verification showed the advantages of GLIP in terms of time consumption, volume of manually labeled data, and deployment scale. This study makes full use of a sparsely labeled dataset and cross-modal large models, integrating them with a lightweight object detection model to reduce labeling costs and improve production efficiency. Finally, we propose directions for future work.