Prompt-based polyp segmentation during endoscopy | Literature Digest: latest papers in deep learning for medical AI

Title


Prompt-based polyp segmentation during endoscopy


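Introduction

The incidence of gastrointestinal cancer is rising, and patients are trending younger (Bray et al., 2018). Screening for early-stage cancer of the digestive tract is therefore essential. The survival rate of colorectal cancer (CRC) patients exceeds 95% at stage I but falls sharply to below 35% at stages IV and V (Bernal et al., 2012). Endoscopy is now widely used in clinical practice and has become the standard method for screening digestive tract diseases (Alfarone et al., 2022). Some polyps are easily overlooked by endoscopists, with serious consequences for patients. A recent meta-analysis (Zhao et al., 2019) showed that 26% of adenomas are missed during colonoscopy. Although some adenomas are missed because the mucosal surface is insufficiently exposed (East et al., 2007), an image-based retrospective study (Yamada et al., 2019) showed that 14% of adenomas go unrecognized by physicians even when they have been visualized. The polyp miss rate in colonoscopy is typically 17%-28% (Yamada et al., 2019). Untreated polyps readily progress to gastrointestinal tumors and cancer, so accurate, real-time polyp segmentation during diagnosis is essential.

Because polyps are heterogeneous in size, appearance, and location, accurate diagnosis depends on the endoscopist’s experience and is challenging. Koh et al. (2023) showed that artificial intelligence (AI) technology can raise the polyp detection rate in endoscopic diagnosis and treatment. Methods such as PraNet, FCBFormer, and HarDNet (Fan et al., 2020; Sanderson and Matuszewski, 2022; Huang et al., 2021) have achieved notable results in polyp segmentation. However, relying solely on AI-based segmentation algorithms still tends to miss certain ulcers, erosions, and early malignant tumors. The Segment Anything Model (SAM) (Kirillov et al., 2023) has shown an outstanding ability to segment objects in natural landscape images from user-provided prompts such as points or bounding boxes. Polyp segmentation that combines AI with endoscopists’ prompts is therefore an important research direction.

In this study, we propose a novel prompt-based polyp segmentation method (PPSM) that segments polyps precisely and assists in early-stage cancer diagnosis during endoscopy. During colonoscopy, endoscopists naturally focus their attention on suspicious lesion areas (Wallace and Kiesslich, 2010). PPSM therefore takes the endoscopist’s ocular attention, non-uniform dot matrices, and polyp features as prompts to imitate the endoscopist’s diagnostic process. First, we propose a prompt-based polyp segmentation network (PPSN) composed of a prompt encoding module (PEM), a feature extraction encoding module (FEEM), and a mask decoding module (MDM). The PEM encodes the prompts to guide the FEEM in feature extraction and the MDM in mask generation, so that PPSN can segment polyps efficiently. Second, endoscopists’ attention data are used as prompts, which makes prompt data obtainable in real-world settings and also improves the accuracy of PPSN. To strengthen the stability of PPSN, we generate non-uniform dot matrix prompts that compensate for frame loss during eye tracking. In addition, we introduce a SAM-based data augmentation method to enrich the prompt dataset and improve the adaptability of PPSN.

On the Kvasir-SEG (Jha et al., 2020), CVC-ClinicDB (Bernal et al., 2015), SUN-SEG (Misawa et al., 2021), and PolypGen (Ali et al., 2023) datasets, PPSM achieves high precision scores of 0.952, 0.991, 0.993, and 0.987, respectively, which indicates its effectiveness in segmenting polyps accurately. The maximum frame rate of PPSM reaches 233. Cross-training and cross-testing on the four datasets show that PPSM generalizes well. Part of the code and the method for generating the prompt dataset are available at https://github.com/XinZhenRen/PPSM.

In summary, the contributions of this study are:

1. An accurate, real-time prompt-based polyp segmentation network with strong generalization. The masks generated by PPSN have clear boundaries and no internal holes, and are only slightly affected by the endoscopic working environment. Five kinds of prompts markedly improve the segmentation performance of PPSN, simplify training, and reduce the need for large amounts of eye-movement data.
2. To improve accuracy, the endoscopist’s experience is incorporated into PPSN by using the endoscopist’s ocular attention data as prompts to guide polyp segmentation. Compared with prompt input methods such as mouse clicks, our method is more real-time and more practical during endoscopy. It also handles the case where ocular attention data are missing. To the best of our knowledge, PPSM is the first polyp segmentation method that combines AI with endoscopists’ prompts.
3. A SAM-based data augmentation method that enriches the prompt dataset and improves the adaptability of PPSN. The prompts generated by SAM include positive and negative samples, which help PPSN reject false polyps and suppress interference from the endoscopic working environment.
4. A disposable electronic endoscope and an image processor with a real-time auxiliary diagnosis function for early cancer.

The rest of the paper is organized as follows: Section 2 discusses related work; Section 3.1 presents the overall architecture of PPSM; Section 3.2 details the PPSN; Section 3.3 discusses the proposed SAM-based data augmentation method; Sections 3.4 and 3.5 present the real-time prompt strategy; Section 4 presents the experimental results.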

Abstract


Accurate judgment and identification of polyp size is crucial in endoscopic diagnosis. However, the indistinct boundaries of polyps lead to missegmentation and missed cancer diagnoses. In this paper, a prompt-based polyp segmentation method (PPSM) is proposed to assist in early-stage cancer diagnosis during endoscopy. It combines endoscopists’ experience and artificial intelligence technology. Firstly, a prompt-based polyp segmentation network (PPSN) is presented, which contains the prompt encoding module (PEM), the feature extraction encoding module (FEEM), and the mask decoding module (MDM). The PEM encodes prompts to guide the FEEM in feature extraction and the MDM in mask generation, so that PPSN can segment polyps efficiently. Secondly, endoscopists’ ocular attention data (gazes) are used as prompts, which enhances PPSN’s accuracy in segmenting polyps and allows prompt data to be obtained effectively in real-world settings. To reinforce the PPSN’s stability, non-uniform dot matrix prompts are generated to compensate for frame loss during eye tracking. Moreover, a data augmentation method based on the segment anything model (SAM) is introduced to enrich the prompt dataset and improve the PPSN’s adaptability. Experiments demonstrate the PPSM’s accuracy and real-time capability. The results from cross-training and cross-testing on four datasets show the PPSM’s generalization. Based on the research results, a disposable electronic endoscope with a real-time auxiliary diagnosis function for early cancer and an image processor have been developed.


Method


3.1. Overall architecture

Fig. 1 is the schematic drawing of PPSM. The prompt-based polyp segmentation network processes the endoscope images guided by endoscopists’ ocular attention prompts. This process generates masks for potential lesion areas. Then, the masks are overlaid on the original endoscopic images for auxiliary diagnosis. The original endoscopic images and the auxiliary diagnosis images are displayed on two medical displays.

Specifically, the PPSM is divided into four parts: data acquisition, data augmentation, polyp segmentation, and auxiliary diagnosis. In the first part, there is a disposable electronic endoscope for capturing images, an eye tracker for capturing the endoscopist’s ocular data, and public datasets. In the second part, there is prompt data augmentation, real-time prompt strategy, prompts, and an endoscope mainframe (image processor). In the third part, there is a prompt-based polyp segmentation network (PPSN). In the fourth part, the original endoscope image and the auxiliary diagnosis image are displayed on two monitors to assist endoscopists. Algorithms and software are all integrated into the endoscope mainframe.

PPSN is the core of PPSM. Prompts, prompt data augmentation, real-time prompt strategy, and hardware devices assist PPSN in segmenting polyps. Prompts can be divided into training prompts and diagnostic prompts to complete the corresponding tasks. Training prompts such as circumcircles, polygons, scribbles, and points are generated by the publicly available datasets. To enhance the diversity of training prompts, a SAM-based augmentation method is proposed. This augmentation method employs masks generated by SAM as a novel type of prompt. In practical applications, endoscopists’ ocular attention or non-uniform dot matrices generated based on the probability of polyps appearing are employed as prompts. The disposable electronic endoscope, image processor, and eye-tracking device (eye tracker) with clinical utility are developed. Fig. 2 illustrates the overall architecture of the PPSM, with yellow denoting prompts, purple representing devices, blue signifying modules, red representing output results, and green and gray representing algorithms/strategies.

3.2. Prompt-based polyp segmentation network

The proposed PPSN consists of the PEM, the FEEM, and the MDM. The PEM receives prompts related to polyps as inputs. It encodes prompts and transmits this information to the FEEM and the MDM.

The FEEM takes the original endoscopic image as input and extracts lesion features with the help of the prompts from the PEM. The MDM then generates the polyp mask from the features extracted by the FEEM and the prompts from the PEM. The three modules cooperate through simple information exchange, so that feature extraction and mask generation stay prompt-guided throughout, which improves segmentation accuracy and efficiency.

Conclusion


This paper has developed a prompt-based polyp segmentation method that demonstrates promising performance in polyp segmentation. Suspicious lesion areas tend to draw more attention from endoscopists, so the proposed PPSM takes the endoscopist’s ocular attention, non-uniform dot matrix, or polyp features as prompts to assist in early-stage cancer diagnosis during endoscopy. To a certain extent, it addresses the subjectivity of endoscopist-only diagnosis and the missegmentation issues of AI-only approaches. The proposed PPSN achieves excellent performance and strong adaptability on four datasets, since the PEM encodes the prompts, guiding the FEEM to extract features in a targeted manner and instructing the MDM to generate masks directionally. Endoscopists’ ocular attention data and the non-uniform dot matrix prompts are incorporated as inputs to the PEM. This addresses the challenge of insufficient prompt data in real-world scenarios and enhances the stability of practical applications. The data augmentation method based on the SAM enriches the prompt dataset and enhances the adaptability and generalization of PPSN. Based on the above research results, a disposable electronic endoscope with a real-time auxiliary diagnosis function for early cancer and an image processor have been developed for endoscopy. During the experiments, the movement of endoscopists during actual endoscopy reduced the effectiveness of our eye tracking, which in turn affected the performance of the PPSN. Therefore, developing more stable eye trackers and better eye-tracking algorithms is a key direction for optimizing the PPSM. We plan to extend the capabilities of the PPSM to other medical tasks, such as assisting doctors in the detection of cervical intraepithelial neoplasia, carcinoma of the urinary bladder, and respiratory tract lesions. Additionally, we are developing disposable cell (microscopic) colposcopes, disposable electronic bronchoendoscopes, and disposable electronic pyeloscopes.
In the future, using text prompts to train networks with diagnostic capabilities similar to those of endoscopists is a promising direction. This method could assist in analyzing lesion locations and providing accurate diagnoses. Furthermore, recording the patient’s preoperative information, intraoperative video, postoperative outcomes, and follow-up results, and generating a retrospective report, are highly significant in the medical field.

In summary, the prompt-based segmentation method proposed in the paper offers a new paradigm for automated lesion detection during endoscopy. By combining eye-movement data with AI models, it has the potential to improve the accuracy and efficiency of early cancer diagnosis. Future work will further optimize the hardware and algorithms and extend the method to a broader range of medical scenarios.

Figure


Fig. 1. Schematic drawing. The prompt-based polyp segmentation network processes the endoscope images guided by endoscopists’ ocular attention prompts. This process generates masks for potential lesion areas. Then, the masks are overlaid on the original endoscopic images for auxiliary diagnosis. The original endoscopic images and the auxiliary diagnosis images are displayed on two medical displays.
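The overlay step described in this caption can be illustrated with a minimal sketch. It assumes simple alpha blending of a highlight color into the masked pixels; the function name `overlay_mask`, the color, and the `alpha` value are hypothetical choices for illustration, not details taken from the paper.

```python
def overlay_mask(image, mask, color=(0, 200, 0), alpha=0.5):
    """Blend `color` into `image` wherever `mask` is 1.

    image: H x W nested list of (r, g, b) tuples.
    mask:  H x W nested list of 0/1 values (the predicted polyp mask).
    Returns a new image, so the original stays available for the
    first display while the blended copy goes to the second one.
    """
    out = []
    for img_row, mask_row in zip(image, mask):
        row = []
        for pixel, m in zip(img_row, mask_row):
            if m:
                # Standard alpha blend, applied per channel.
                pixel = tuple(round((1 - alpha) * p + alpha * c)
                              for p, c in zip(pixel, color))
            row.append(pixel)
        out.append(row)
    return out


# Tiny 2 x 3 grey image with a three-pixel "polyp" mask.
image = [[(100, 100, 100)] * 3 for _ in range(2)]
mask = [[0, 1, 0],
        [0, 1, 1]]
aux = overlay_mask(image, mask)
```

Returning a copy rather than editing in place mirrors the two-display setup: the unmodified frame and the auxiliary frame are both needed at the same time.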


Fig. 2. The architecture of the proposed prompt-based polyp segmentation method. Yellow denotes prompts, purple represents devices, blue signifies modules, red represents output results, and green and gray represent algorithms/strategies. Specifically, the PPSM is divided into four parts: data acquisition, data augmentation, polyp segmentation, and auxiliary diagnosis. In the first part, there is a disposable electronic endoscope for capturing images, an eye tracker for capturing the endoscopist’s ocular data, and public datasets. In the second part, there is prompt data augmentation, real-time prompt strategy, prompts, and an endoscope mainframe (image processor). In the third part, there is a prompt-based polyp segmentation network (PPSN). In the fourth part, the original endoscope image and the auxiliary diagnosis image are displayed on two monitors to assist endoscopists. Algorithms and software are all integrated into the endoscope mainframe. PPSN is the core of PPSM. Prompts, prompt data augmentation, real-time prompt strategy, and hardware devices assist PPSN in segmenting polyps. Prompts can be divided into training prompts and diagnostic prompts to complete the corresponding tasks. Training prompts such as circumcircles, polygons, scribbles, and points are generated by the publicly available datasets. To enhance the diversity of training prompts, a SAM-based augmentation method is proposed. This augmentation method employs masks generated by SAM as a novel type of prompt. In practical applications, endoscopists’ ocular attention or non-uniform dot matrices generated based on the probability of polyps appearing are employed as prompts. The disposable electronic endoscope, image processor, and eye-tracking device (eye tracker) with clinical utility are developed.


Fig. 3. The framework of the prompt polyp segmentation network. The proposed PPSN consists of the PEM, the FEEM, and the MDM. The PEM receives prompts related to polyps as inputs. It encodes prompts and transmits this information to the FEEM and the MDM. There are uncomplicated information interactions among the PEM, the FEEM, and the MDM. The FEEM takes the original images captured by the endoscope as its input. It extracts features from the original image with the prompts from the PEM. Finally, the MDM purposefully generates the masks according to the features from the FEEM and the prompts from the PEM. The PEM is designed to guide the FEEM towards purposeful feature extraction and the MDM towards mask generation.


Fig. 4. The framework of the prompt encoding module. The input of the PEM is a prompt tensor with dimensions H × W × C = 352 × 352 × 5, specifically related to polyps. The PEM consists of convolutional layers, normalization layers, activation function layers, and pooling layers. The outputs at three different layers of the PEM (referred to here as the first, second, and third PEM outputs) are transmitted to the FEEM and the MDM.


Fig. 5. The final state of the prompt dataset. The dataset consists of two parts: training and practice. The prompts for the PEM differ across stages. The training part includes circumcircle prompt, polygon prompt, scribble prompt, point prompt and SAM’s mask result prompts. The SAM’s mask result prompts are generated according to SAM results and the criteria of Table 1. Other prompts are generated according to ground truth. The practical part consists of prompts generated based on endoscopists’ ocular attention.


Fig. 6. The types of prompts. Original: Original endoscopic image. (a): Circumcircle prompt. (b): Polygon prompt. (c): Scribble prompt. (d): Point prompt. (e): SAM’s mask result prompt. (f): Endoscopists’ ocular attention prompt. (g): Non-uniform dot matrix prompt.

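Notes on Fig. 6. Original: the original endoscopic image. (a) Circumcircle prompt: a circle circumscribing the polyp, with a random function introduced to increase diversity. (b) Polygon prompt: the largest bounding polygon of the polyp, with random offsets added so that vertices and edges are more random. (c) Scribble prompt: curves that outline the polyp foreground (inside) and the background (outside), separating lesion from normal tissue. (d) Point prompt: an arbitrary point inside the polyp (not the center point), indicating the core of the suspicious region. (e) SAM’s mask result prompt: masks generated by SAM are converted into prompts, including positive and negative samples (for example, regions inside and outside the polyp). (f) Endoscopists’ ocular attention prompt: the endoscopist’s gaze points are captured in real time and mapped to regions of interest in the image. (g) Non-uniform dot matrix prompt: a dot matrix generated from the probability distribution of polyp occurrence, which compensates for frame loss during eye tracking and improves stability.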
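As a rough illustration of how two of these training prompts could be derived from a ground-truth mask, the sketch below builds a circumcircle prompt and a point prompt. It is not the authors' generator: the helper names are hypothetical, the circle is centred on the mask centroid (so it encloses the polyp but is not the minimal enclosing circle), and `jitter` is an assumed stand-in for the random function mentioned for prompt (a).

```python
import math
import random


def mask_pixels(mask):
    """Return (row, col) coordinates of all foreground pixels in a 0/1 mask."""
    return [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


def circumcircle_prompt(mask, jitter=0.0, rng=random):
    """Circle centred on the mask centroid that encloses every mask pixel.

    `jitter` randomly enlarges the radius by up to that fraction
    (an assumption made here to add diversity).
    """
    pts = mask_pixels(mask)
    cy = sum(y for y, _ in pts) / len(pts)
    cx = sum(x for _, x in pts) / len(pts)
    radius = max(math.hypot(y - cy, x - cx) for y, x in pts)
    radius *= 1.0 + rng.uniform(0.0, jitter)
    return cy, cx, radius


def point_prompt(mask, rng=random):
    """Pick an arbitrary (not necessarily central) pixel inside the polyp."""
    return rng.choice(mask_pixels(mask))


def rasterize_circle(shape, circle):
    """Draw the filled circle as an H x W 0/1 prompt channel."""
    h, w = shape
    cy, cx, r = circle
    return [[1 if math.hypot(y - cy, x - cx) <= r else 0 for x in range(w)]
            for y in range(h)]


rng = random.Random(0)  # fixed seed so the example is repeatable
gt = [[1 if 2 <= y <= 3 and 2 <= x <= 4 else 0 for x in range(8)]
      for y in range(8)]
circle = circumcircle_prompt(gt, jitter=0.2, rng=rng)
circle_channel = rasterize_circle((8, 8), circle)
py, px = point_prompt(gt, rng=rng)
```

Each prompt type rasterized this way could occupy one channel of the prompt tensor; whether the five channels of the 352 × 352 × 5 input map one-to-one onto the five training prompt types is an assumption, not something the captions state.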

Fig. 7. The framework of the feature extraction encoding module. The input of the FEEM is an original endoscopic image tensor with dimensions H × W × C = 352 × 352 × 3. The FEEM comprises convolutional layers, normalization layers, activation function layers, and pooling layers. The first, second, and third outputs from different layers of the PEM are concatenated at various layers of the FEEM. The output of the FEEM is then transmitted to the MDM.


Fig. 8. The framework of the mask decoder module. The input of the MDM is the output of the FEEM. The input is first convolved and then concatenated with the third PEM output, followed by further convolution, upsampling, and an additional convolution. The intermediate output is then combined with the second PEM output using the same process. After concatenation with the first PEM output, the upsampling operation is replaced by a convolution operation to generate the mask. The output of the MDM is a mask.
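The decoding order in this caption can be traced with a shape-only sketch. It performs no real convolution; it only tracks (H, W, C) tuples through the stated sequence of operations. All channel widths and the spatial sizes of the PEM and FEEM outputs are assumptions chosen so that two 2× upsamplings return a 352 × 352 mask; the paper's actual values are not given in this excerpt.

```python
def conv(shape, out_channels):
    """Shape after a stride-1, 'same'-padded convolution (assumed)."""
    h, w, _ = shape
    return (h, w, out_channels)


def upsample(shape, factor=2):
    """Shape after spatial upsampling by `factor`."""
    h, w, c = shape
    return (h * factor, w * factor, c)


def concat(a, b):
    """Channel-wise concatenation; spatial sizes must agree."""
    assert a[:2] == b[:2], f"spatial mismatch: {a} vs {b}"
    return (a[0], a[1], a[2] + b[2])


def mdm_shapes(feem_out, pem1, pem2, pem3):
    """Follow the MDM steps described in the Fig. 8 caption."""
    x = conv(feem_out, 128)      # input is first convolved
    x = concat(x, pem3)          # concatenated with the third PEM output
    x = conv(x, 64)              # further convolution
    x = upsample(x)              # upsampling
    x = conv(x, 64)              # additional convolution
    x = concat(x, pem2)          # same process with the second PEM output
    x = conv(x, 32)
    x = upsample(x)
    x = conv(x, 32)
    x = concat(x, pem1)          # concatenated with the first PEM output
    x = conv(x, 16)
    x = conv(x, 1)               # convolution instead of upsampling -> mask
    return x


# Hypothetical shapes: PEM outputs at full, 1/2 and 1/4 resolution.
mask_shape = mdm_shapes(
    feem_out=(88, 88, 256),
    pem1=(352, 352, 16),
    pem2=(176, 176, 32),
    pem3=(88, 88, 64),
)
```

The `concat` assertion makes the main constraint explicit: each PEM output must match the decoder's spatial resolution at the point where it is injected.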


Fig. 9. Segmentation results of SAM on landscape images. The SAM showcases its excellent capability to segment objects in natural landscape images. While the SAM cannot be directly applied to endoscopic scenes, the masks it produces can serve as a novel type of prompt. A SAM-based data augmentation method is proposed to enrich the prompt dataset.


Fig. 11. The areas that endoscopists focus on when viewing the endoscopic images. The first row images are original endoscopic images. The images in the second row highlight the areas that endoscopists focus on.


Fig. 12. The hardware equipment used for the experiment. A disposable electronic endoscope, an image processor, and an eye-tracking device (eye tracker) are developed. The disposable electronic endoscope collects image signals and transmits them to the image processor. The eye tracker captures the endoscopist’s ocular data and transmits them to the image processor. The image processor processes the image signals and the endoscopist’s ocular data, and transmits the original endoscopic image and the auxiliary image to two displays, respectively.


Table


Table 1 Screening criteria for SAM-generated masks. The SAM-generated masks are divided into four types: (1) inside the GT, (2) outside the GT, (3) across the GT, and (4) containing and contained. Masks of the first and second types are retained. Masks of the third type are deleted. For the fourth type, the internal masks are retained.

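In other words: masks of types (1) and (2), which lie entirely inside or entirely outside the ground truth (GT), are kept; masks of type (3), which cover both GT and background and could introduce confusion, are removed; for type (4), the inner masks are kept and the outer masks are removed to keep the prompts accurate.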
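The screening rule can be sketched with masks represented as sets of pixel coordinates. This is one possible reading of Table 1, not the authors' code: the function names are hypothetical, type (4) is interpreted as one retained mask strictly containing another, and the positive/negative labels assume that masks inside the GT serve as positive samples and masks outside it as negative samples.

```python
def relation_to_gt(mask, gt):
    """Classify a non-empty mask (set of (y, x) pixels) against the GT set."""
    if mask <= gt:
        return "inside"       # type (1): entirely within the GT
    if not (mask & gt):
        return "outside"      # type (2): no overlap with the GT
    return "across"           # type (3): straddles the GT boundary


def screen_masks(masks, gt):
    """Apply the Table 1 criteria and label the surviving masks."""
    # Drop empty masks and every mask that crosses the GT boundary.
    kept = [m for m in masks if m and relation_to_gt(m, gt) != "across"]
    result = []
    for m in kept:
        # Type (4): if this mask strictly contains another kept mask,
        # it is the outer one of a nested pair, so it is removed.
        if any(other < m for other in kept):
            continue
        label = "positive" if relation_to_gt(m, gt) == "inside" else "negative"
        result.append((label, m))
    return result


gt = {(y, x) for y in range(2, 6) for x in range(2, 6)}
inner = {(3, 3)}
outer_nested = {(3, 3), (3, 4), (2, 2)}   # inside the GT, contains `inner`
background = {(0, 0), (0, 1)}             # entirely outside the GT
across = {(1, 2), (2, 2)}                 # straddles the GT boundary
screened = screen_masks([outer_nested, inner, background, across], gt)
```

Using sets keeps the subset and overlap tests to single operators (`<=`, `&`, `<`); on real 352 × 352 masks the same logic would more likely be written with array operations.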

Table 2 The dot matrix prompts. The dot matrix prompts are generated according to the frequency of polyps to enhance the network’s stability in the absence of endoscopist’s ocular data. The first row images are heatmaps of the frequency of polyps in different datasets. The pictures in the other rows represent the dot matrix prompts generated at different sampling frequencies.
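A minimal sketch of the idea behind these prompts, under stated assumptions: the frequency heatmap is taken to be the per-pixel sum of ground-truth masks over a dataset, and dots are drawn with probability proportional to that frequency. The function names are hypothetical, and `n_points` is only a stand-in for the sampling frequency mentioned in the caption.

```python
import random


def frequency_heatmap(gt_masks):
    """Count, per pixel, how many ground-truth masks mark it as polyp."""
    h, w = len(gt_masks[0]), len(gt_masks[0][0])
    heat = [[0] * w for _ in range(h)]
    for mask in gt_masks:
        for y in range(h):
            for x in range(w):
                heat[y][x] += mask[y][x]
    return heat


def dot_matrix_prompt(heat, n_points, rng):
    """Sample dots with probability proportional to polyp frequency.

    Pixels where polyps never occurred have weight 0 and get no dot,
    so the resulting dot matrix is denser where polyps are more common.
    """
    h, w = len(heat), len(heat[0])
    cells = [(y, x) for y in range(h) for x in range(w)]
    weights = [heat[y][x] for y, x in cells]
    prompt = [[0] * w for _ in range(h)]
    # Sampling is with replacement, so fewer than n_points dots may appear.
    for y, x in rng.choices(cells, weights=weights, k=n_points):
        prompt[y][x] = 1
    return prompt


rng = random.Random(0)  # fixed seed so the example is repeatable
masks = [
    [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]],
    [[0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]],
]
heat = frequency_heatmap(masks)
prompt = dot_matrix_prompt(heat, n_points=3, rng=rng)
```

A prompt of this kind needs no eye-tracking input, which is why it can stand in when gaze frames are lost.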


Table 3 Statistics and characteristics of the enhanced datasets. ✓: Include, ▴: Generate. Due to the varying image sizes across the datasets, all images are resized to 400 × 400 pixels, and the pixel data are subsequently processed and analyzed.
