[Keywords]
[Abstract]
The self-attention-based Vision Transformer (ViT) model has demonstrated powerful feature extraction and pattern representation capabilities in both natural language processing and computer vision. Because synthetic aperture radar (SAR) image features differ markedly from natural object image features, this paper proposes a ViT-based method for SAR image target classification and recognition, exploring the feasibility and effectiveness of self-attention-based deep learning models for intelligent SAR image processing. The ViT architecture is similar to that of natural language processing models and offers simple configuration, good scalability, and out-of-the-box usability. The model consists of five main components: image patch splitting, patch projection embedding, position embedding, a sequence of self-attention modules, and a fully connected classifier. The public MSTAR dataset is used as the experimental dataset, and its training samples are augmented; the ViT model is trained on the augmented dataset until the network converges, achieving low error and high recognition accuracy on the validation set. The trained ViT model is then used to classify the SAR image test samples. The results show that the ViT model achieves high accuracy and good generalization for SAR image classification, and that self-attention-based deep learning methods have broad application prospects in automated SAR image processing.
[Keywords]
[Abstract]
The self-attention-based Vision Transformer (ViT) model has shown powerful feature extraction and pattern representation abilities in both natural language processing (NLP) and computer vision. Because SAR image features differ markedly from natural object image features, a method using the ViT model is proposed for SAR image target classification to explore the feasibility and effectiveness of self-attention models in intelligent SAR image processing. In this paper the ViT architecture is similar to that of earlier NLP models, and has the advantages of simple configuration, good scalability and out-of-the-box deployment. The ViT model is mainly composed of five components: image patch splitting, patch projection embedding, position embedding, a sequence of self-attention modules, and a multilayer perceptron (MLP) classification head. The public MSTAR dataset is selected as the experimental dataset, and its training samples are augmented. The ViT model is trained on the augmented dataset by minimizing the training loss and maximizing the classification accuracy on the validation dataset to ensure the convergence of the network. The trained ViT model is then used to classify the SAR images in the testing dataset. The experimental results show that the ViT model achieves high accuracy and good generalization ability for SAR image classification, and that the self-attention method can play an important role in the field of automated SAR image processing.
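For illustration, the five components named in the abstract can be sketched as a minimal PyTorch module. This is only a hedged sketch: the input size (128x128 single-channel chips), the 16x16 patch size, the embedding width, encoder depth, head count, and the ten-class output are assumptions chosen to fit MSTAR-style data, not the configuration reported in this paper.

# Minimal ViT sketch (PyTorch) mapping onto the five components in the abstract.
# All hyperparameters below are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class SimpleViT(nn.Module):
    def __init__(self, img_size=128, patch_size=16, in_chans=1,
                 num_classes=10, embed_dim=256, depth=6, num_heads=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # (1) image patch splitting + (2) patch projection embedding:
        # a strided convolution cuts non-overlapping patches and projects each to embed_dim.
        self.patch_embed = nn.Conv2d(in_chans, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        # (3) learnable position embedding, plus a class token prepended to the sequence
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        # (4) sequence of self-attention (Transformer encoder) blocks
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # (5) classification head acting on the class token
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                          # x: (B, 1, 128, 128)
        x = self.patch_embed(x)                    # (B, D, 8, 8)
        x = x.flatten(2).transpose(1, 2)           # (B, 64, D) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                  # class logits from the class token

# Usage example on random data shaped like single-channel SAR chips:
logits = SimpleViT()(torch.randn(2, 1, 128, 128))  # logits has shape (2, 10)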
[CLC number]
TN957.52
[Foundation item]