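[Keywords]
[Abstract]
The self-attention-based ViT model has achieved good results in target recognition in synthetic aperture radar (SAR) images, but its classification performance on SAR variant image datasets falls short of expectations, because SAR variant images have large intra-class differences and small inter-class differences, which make them difficult to classify. This paper is the first to apply an improved self-attention-based T2T-ViT model, which fuses characteristics of convolutional neural networks and ViT, to SAR variant image recognition. The model extracts and fuses local detail information and global distribution information, enabling the classification and recognition of targets with rich detail features and complex distributions. Several improvements were used during model development and training: first, the number of image channels is reduced to shrink the network; second, data augmentation is used during training to improve the model's accuracy and robustness; third, techniques such as network parameter initialization and learning-rate optimization and adjustment are used to improve T2T-ViT training. The experimental dataset is T72_variants_SAR, the variant subset of the public MSTAR dataset; after training converged, the model was used to classify the SAR test samples. The experimental results show that, compared with the ViT model, the T2T-ViT model achieves higher accuracy and stronger generalization. With the proposed method, the T2T-ViT model has a small parameter count, low computational cost, and fast training, which makes deployment on embedded systems possible and lays a technical foundation for putting the model into practical use.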
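To make the "local detail" part concrete: in the original T2T-ViT design, each Tokens-to-Token step reshapes the token sequence back into a 2-D map and applies an overlapping "soft split", so every new token concatenates a k×k neighbourhood of the previous tokens. The NumPy sketch below implements only that soft-split step (it mirrors what `torch.nn.Unfold` computes, transposed), not the attention layers between steps, and shows how token width scales with the number of input channels, which is one plausible reading of "reducing the number of image channels". The kernel, stride and padding (7, 4, 2) follow the first step of the original T2T-ViT; the 128×128 input size, the 64-wide projection and the helper names `soft_split` and `linear_params` are assumptions for illustration, not values or code from this paper.

```python
import numpy as np


def soft_split(img: np.ndarray, k: int, s: int, p: int) -> np.ndarray:
    """Overlapping unfold of a (C, H, W) array into tokens.

    Returns shape (num_tokens, C * k * k). Each token concatenates a k x k
    neighbourhood over all channels; with s < k, neighbouring tokens share
    pixels, which is how local detail is carried into the next token.
    """
    c, h, w = img.shape
    padded = np.pad(img, ((0, 0), (p, p), (p, p)))  # zero padding on H and W
    out_h = (h + 2 * p - k) // s + 1
    out_w = (w + 2 * p - k) // s + 1
    tokens = np.empty((out_h * out_w, c * k * k), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = padded[:, i * s:i * s + k, j * s:j * s + k]
            tokens[i * out_w + j] = patch.reshape(-1)  # channel-major flatten
    return tokens


def linear_params(in_dim: int, out_dim: int) -> int:
    """Weights plus biases of a dense layer projecting in_dim -> out_dim."""
    return in_dim * out_dim + out_dim


rng = np.random.default_rng(0)
sar_1ch = rng.random((1, 128, 128), dtype=np.float32)  # single-channel chip (assumed size)
sar_3ch = np.repeat(sar_1ch, 3, axis=0)                 # same chip copied to 3 channels

t1 = soft_split(sar_1ch, k=7, s=4, p=2)
t3 = soft_split(sar_3ch, k=7, s=4, p=2)
print(t1.shape, t3.shape)  # (1024, 49) (1024, 147)
print(linear_params(t1.shape[1], 64), linear_params(t3.shape[1], 64))  # 3200 9472
```

Tripling the channels leaves the token count unchanged but triples the width of every first-stage token, and with it the size of any layer that consumes those tokens. The abstract does not say whether the reduction applies to input channels or to internal feature channels, so this is only an illustration of the former.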
[Keywords]
[Abstract]
The self-attention-based ViT model has achieved good results in target recognition in synthetic aperture radar (SAR) images, but its classification performance on SAR variant image datasets falls short of expectations, because such datasets have large intra-class differences and small inter-class differences, which make classification difficult. An improved self-attention-based T2T-ViT model, which combines characteristics of convolutional neural networks and ViT, is applied to SAR variant image recognition. The model extracts and fuses local detail information and global distribution information, enabling the classification and recognition of targets with rich detail features and complex distributions. Several improvements are used during model development and training. First, the number of image channels is reduced to shrink the network. Second, data augmentation is used during training to improve the accuracy and robustness of the model. Third, techniques such as network parameter initialization and learning-rate optimization and adjustment are used to improve T2T-ViT training. The T72_variants_SAR subset of the MSTAR dataset is used; after the network converges, the model classifies the SAR test samples. The experimental results show that the T2T-ViT model achieves higher accuracy and stronger generalization than the ViT model. With the proposed method, T2T-ViT has fewer parameters, lower computational cost, and faster training, which makes it feasible to deploy the model on embedded systems and lays a technical foundation for its practical use.
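The abstract mentions parameter initialization and learning-rate adjustment but does not give the settings. A common recipe for training ViT-family models from scratch, and an assumption here rather than the authors' reported configuration, is truncated-normal weight initialization with standard deviation 0.02, combined with a linear learning-rate warm-up followed by cosine decay. The sketch below implements both with NumPy and the standard library; the function names and all numeric settings are hypothetical.

```python
import math

import numpy as np


def trunc_normal(shape, std=0.02, bound=2.0, rng=None):
    """Normal(0, std) samples, redrawing any value outside +/- bound * std."""
    rng = np.random.default_rng() if rng is None else rng
    out = rng.normal(0.0, std, size=shape)
    mask = np.abs(out) > bound * std
    while mask.any():
        # Resample only the out-of-range entries until all lie within the bound.
        out[mask] = rng.normal(0.0, std, size=int(mask.sum()))
        mask = np.abs(out) > bound * std
    return out


def warmup_cosine_lr(step, total_steps, base_lr, warmup_steps, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings, for illustration only.
w = trunc_normal((64, 49), rng=np.random.default_rng(0))
schedule = [warmup_cosine_lr(s, total_steps=1000, base_lr=1e-3, warmup_steps=100)
            for s in range(1000)]
print(np.abs(w).max() <= 2 * 0.02)                  # True
print(f"{schedule[0]:.1e} {max(schedule):.1e}")     # 1.0e-05 1.0e-03
```

Warm-up keeps the early, large attention-layer gradients from destabilizing a randomly initialized transformer, and the cosine tail lowers the step size smoothly as training converges. Note that `bound` here is measured in standard deviations; some libraries express the truncation limits in absolute units instead.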
[CLC number]
TN957.52
[Funding]