💡💡💡 Exclusive improvements in this post: we first reproduce the integration of EMA into RT-DETR, then combine it with different modules to create new designs: 1) combined with RepC3; 2) inserted directly as an attention module at different positions in the network; 3) efficiently combined with HGBlock.
One of these improvements is bound to suit your dataset, help you gain accuracy points, and serve as a basis for further innovation.
Recommendation index: five stars.

About the RT-DETR Magician column:
https://blog.csdn.net/m0_63774211/category_12497375.html
✨✨✨ Creative modifications and innovations on RT-DETR
Introduces cutting-edge innovations from top conferences to strengthen RT-DETR
Optimized on top of ultralytics and seamlessly integrated with YOLO

1. Introduction to RT-DETR

Paper: https://arxiv.org/pdf/2304.08069.pdf

RT-DETR (Real-Time DEtection TRansformer) is a real-time end-to-end detector built on the DETR architecture that achieves SOTA performance in both speed and accuracy.
Why RT-DETR? A major weakness of YOLO detectors is their reliance on NMS post-processing, which is usually hard to optimize and not robust enough, and therefore adds latency to the detector. To avoid this, the authors turned to DETR, a Transformer-based end-to-end object detector that needs no NMS post-processing. However, DETR-family detectors are much slower than YOLO-family detectors, so being NMS-free had not yet translated into a speed advantage. This motivated the exploration of a real-time end-to-end detector: building a brand-new real-time detector on DETR's strong architecture to eliminate, at the root, the latency that NMS imposes on real-time detectors.

RT-DETR is the first real-time end-to-end object detector. Concretely, it uses an efficient hybrid encoder that decouples intra-scale interaction from cross-scale fusion to process multi-scale features efficiently, and an IoU-aware query selection mechanism to improve the initialization of decoder queries. In addition, RT-DETR can flexibly adjust inference speed by using a different number of decoder layers, without retraining, which helps the practical deployment of real-time detectors. RT-DETR-L achieves 53.0% AP on COCO val2017 at 114 FPS on a T4 GPU, and RT-DETR-X achieves 54.8% AP at 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. RT-DETR-R50 achieves 53.1% AP at 108 FPS, and RT-DETR-R101 achieves 54.3% AP at 74 FPS, surpassing all DETR detectors with the same backbone in accuracy.
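For orientation, here is a minimal usage sketch of RT-DETR through the ultralytics API; the weights file, the `coco8.yaml` sample dataset, and the image URL are ultralytics' standard examples, and the hyperparameters are purely illustrative:

```python
# Minimal RT-DETR usage sketch via the ultralytics API.
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")                          # load pretrained RT-DETR-L weights
model.train(data="coco8.yaml", epochs=10, imgsz=640)   # fine-tune on a tiny sample dataset
metrics = model.val()                                  # evaluate; reports mAP among other metrics
results = model("https://ultralytics.com/images/bus.jpg")  # run inference on one image
```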
2. Introduction to EMA

Paper: https://arxiv.org/abs/2305.13563v1
Accepted at ICASSP 2023.

Modeling cross-channel relationships through channel dimensionality reduction can have side effects on extracting deep visual representations. The paper proposes a novel Efficient Multi-scale Attention (EMA) module. To preserve the information in every channel while reducing computational overhead, EMA reshapes part of the channels into the batch dimension and groups the channel dimension into multiple sub-features, so that spatial semantic features are evenly distributed within each feature group. The paper also proposes a new cross-spatial learning method and designs a multi-scale parallel sub-network to build both short- and long-range dependencies:

1) A general approach reshapes part of the channel dimension into the batch dimension, avoiding any form of dimensionality reduction through generic convolutions.

2) Besides building local cross-channel interactions within each parallel sub-network without channel reduction, the output feature maps of the two parallel sub-networks are fused through a cross-spatial learning method.

3) Compared with CBAM, NAM [16], SA, ECA, and CA, EMA not only achieves better results but is also more parameter-efficient.
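A shape-only illustration of the channel-to-batch trick in 1), with an assumed group count of 8 (the paper's default):

```python
# Shape-only illustration of EMA's channel-to-batch reshape.
import torch

x = torch.randn(2, 64, 32, 32)            # (B, C, H, W)
g = 8                                     # number of sub-feature groups
x_g = x.reshape(2 * g, 64 // g, 32, 32)   # (B*g, C/g, H, W): groups folded into the batch
                                          # dim, so no conv-based channel reduction is needed
```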
3. Adding EMA to RT-DETR

3.1 Create ultralytics/nn/attention/EMA.py
The code is given in: RT-DETR hands-on tutorial: how to add attention mechanisms at different positions in the network for innovative optimization, using EMA attention as the example - CSDN Blog
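The post defers the file's contents to that link. For reference, below is a sketch of what EMA.py could contain, following the official implementation released with the ICASSP 2023 paper; the class is named `EMA_attention` here only to match the YAML configs later in this post, and the `factor` grouping parameter (default 8, the paper's setting) is carried over from that reference code:

```python
# Sketch of the EMA module per the official ICASSP 2023 implementation;
# renamed EMA_attention to match the YAML configs below.
import torch
from torch import nn


class EMA_attention(nn.Module):
    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        assert channels % self.groups == 0  # channels must split evenly into groups
        self.softmax = nn.Softmax(-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (1, w)
        self.gn = nn.GroupNorm(channels // self.groups, channels // self.groups)
        self.conv1x1 = nn.Conv2d(channels // self.groups, channels // self.groups, 1)
        self.conv3x3 = nn.Conv2d(channels // self.groups, channels // self.groups, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        group_x = x.reshape(b * self.groups, -1, h, w)  # fold groups into the batch dim
        # 1x1 branch: directional (h and w) context without channel reduction
        x_h = self.pool_h(group_x)
        x_w = self.pool_w(group_x).permute(0, 1, 3, 2)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(group_x * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: local multi-scale context
        x2 = self.conv3x3(group_x)
        # cross-spatial learning: each branch attends over the other's spatial map
        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        x12 = x2.reshape(b * self.groups, c // self.groups, -1)
        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        x22 = x1.reshape(b * self.groups, c // self.groups, -1)
        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(b * self.groups, 1, h, w)
        return (group_x * weights.sigmoid()).reshape(b, c, h, w)
```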
3.3 How to combine EMA_attention with RT-DETR for creative improvements
3.3.1 Combining with RepC3
```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, EMA_attentionC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, EMA_attentionC3, [256]] # X3 (21), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, EMA_attentionC3, [256]] # F4 (24), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, EMA_attentionC3, [256]] # F5 (27), pan_blocks.1

  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
```
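This config swaps every `RepC3` in the FPN/PAN for `EMA_attentionC3`. The post's implementation of that module sits behind the tutorial link above; a plausible sketch, mirroring the structure of ultralytics' stock `RepC3` and re-weighting its output with EMA (the exact fusion point is an assumption):

```python
# Hedged sketch of EMA_attentionC3: ultralytics' RepC3 structure with an
# EMA block re-weighting the fused output. The fusion point is an assumption.
import torch.nn as nn
from ultralytics.nn.modules import Conv, RepConv
from ultralytics.nn.attention.EMA import EMA_attention  # the file from section 3.1


class EMA_attentionC3(nn.Module):
    """RepC3 bottleneck followed by EMA attention."""

    def __init__(self, c1, c2, n=3, e=1.0):
        super().__init__()
        c_ = int(c2 * e)
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.m = nn.Sequential(*(RepConv(c_, c_) for _ in range(n)))  # re-param conv stack
        self.cv3 = Conv(c_, c2, 1, 1) if c_ != c2 else nn.Identity()
        self.attn = EMA_attention(c2)  # re-weight the fused features

    def forward(self, x):
        return self.attn(self.cv3(self.m(self.cv1(x)) + self.cv2(x)))
```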
3.3.2 Inserting EMA directly as an attention module at different positions in the network

```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, EMA_attention, [256]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 15 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 17, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 18, Y4, lateral_convs.1
  - [-1, 1, EMA_attention, [256]] # 19

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 21 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (23), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 24, downsample_convs.0
  - [[-1, 19], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (26), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 27, downsample_convs.1
  - [[-1, 13], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (29), pan_blocks.1

  - [[23, 26, 29], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
```
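Note that inserting the two `EMA_attention` layers (13 and 19) shifts every later index, which is why the `Concat` skips now point at 19 and 13 and the decoder reads from layers 23/26/29 instead of 21/24/27. Because EMA preserves the tensor shape, no channel arguments elsewhere need to change; a quick sanity check (tensor sizes are illustrative):

```python
# Sanity check: EMA_attention is shape-preserving, so it can slot in after any
# head Conv without disturbing downstream channel counts.
import torch
from ultralytics.nn.attention.EMA import EMA_attention  # the file from section 3.1

attn = EMA_attention(256)
x = torch.randn(1, 256, 20, 20)     # e.g. the Y5 lateral feature at 640x640 input
assert attn(x).shape == x.shape     # output shape is identical to the input
```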
3.3.3 Efficient combination with HGBlock

```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
  - [-1, 6, HGBlock_EMA_attention, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1

  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
```
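Here only the final backbone stage is touched: `HGBlock_EMA_attention` replaces the stage-4 `HGBlock`, while the head stays stock. Again the post's own code is behind the tutorial link; one plausible sketch simply subclasses ultralytics' `HGBlock` and re-weights its output with EMA (the fusion point is an assumption):

```python
# Hedged sketch of HGBlock_EMA_attention: the stock ultralytics HGBlock with
# EMA applied to its output. The fusion point is an assumption.
import torch.nn as nn
from ultralytics.nn.modules import HGBlock
from ultralytics.nn.attention.EMA import EMA_attention  # the file from section 3.1


class HGBlock_EMA_attention(HGBlock):
    """HGBlock whose output is re-weighted by EMA attention."""

    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        super().__init__(c1, cm, c2, k, n, lightconv, shortcut, act)
        self.attn = EMA_attention(c2)  # c2 output channels, unchanged by EMA

    def forward(self, x):
        return self.attn(super().forward(x))
```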
4. Summary

This post reproduced the integration of EMA into RT-DETR and combined it with different modules to create new designs:

1) combined with RepC3;

2) inserted directly as an attention module at different positions in the network;

3) efficiently combined with HGBlock.