Manually labeling steel surface defects at the pixel level is time-consuming. In this study, we aim to train a steel surface defect detection model on a dataset that is only weakly labeled at the image level. To this end, we propose a class activation map (CAM) method based on the vision transformer (ViT) that fuses the attention map with the semantic map. We also introduce an object background discrimination module (OBDM) to alleviate the problem of irrelevant background activation. Experimental results show that our method outperforms other CAM methods on the task of steel surface defect detection.
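
As a rough illustration of what fusing an attention map with a semantic map can look like, the following minimal sketch combines two patch-level maps into a single activation map and thresholds it into a defect mask. The abstract does not state the fusion operator, so the element-wise product used here is an assumption, as are the function names `fuse_cam` and `localize`, the min-max normalization, and the threshold value; the OBDM is not modeled.

```python
import numpy as np


def _min_max(x):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    x = x - x.min()
    peak = x.max()
    return x / peak if peak > 0 else x


def fuse_cam(attention_map, semantic_map):
    """Fuse two (h, w) patch-level maps into one activation map.

    attention_map: class-token attention over the patch grid (assumed input).
    semantic_map:  per-patch score for the target class (assumed input).
    The element-wise product is an illustrative choice, not the
    fusion rule confirmed by the source.
    """
    attention_map = np.asarray(attention_map, dtype=float)
    semantic_map = np.asarray(semantic_map, dtype=float)
    if attention_map.shape != semantic_map.shape:
        raise ValueError("maps must share the same patch grid shape")
    return _min_max(_min_max(attention_map) * _min_max(semantic_map))


def localize(cam, threshold=0.5):
    """Return a boolean mask of patches whose activation meets the threshold."""
    return cam >= threshold


# Toy 2x2 patch grid: only the bottom-right patch is strong in both maps.
attention = [[0.0, 1.0], [2.0, 4.0]]
semantic = [[4.0, 0.0], [2.0, 4.0]]
fused = fuse_cam(attention, semantic)
mask = localize(fused)
```

With a product, a patch is highlighted only when both maps agree, which is one simple way to suppress regions that the attention map alone would activate.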