Title
题目
UniSAL: Unified Semi-supervised Active Learning for histopathological image classification
UniSAL:用于组织病理学图像分类的统一半监督主动学习方法
01
Introduction
文献速递介绍
Histopathological images play a key role in cancer screening, diagnosis, and treatment decision-making, helping to improve patient survival (Schiffman et al., 2015). Whole Slide Images (WSIs) contain rich histopathological information and are routinely used in clinical practice. Histopathological image classification on WSIs, in particular, can provide pathologists with more accurate, efficient, and objective diagnostic support. For example, in colorectal and lung cancer (Chen et al., 2021a; Yamashita et al., 2021), automated classification models can assist in identifying cancerous regions, facilitating early detection and supporting precise treatment planning, thereby improving patient outcomes. Pathological examination, the gold standard for early cancer detection, relies on pathologists' visual assessment of the tissue microenvironment and the degree of tumor progression, so automatic recognition of different tissue types in histopathological images is an important component of digital pathology. In recent years, deep neural networks such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown excellent performance in histopathological image classification (Araújo et al., 2017; Shao et al., 2021; Lu et al., 2021).
However, training a high-performing network depends on a large-scale annotated training set. For histopathological images, annotation is time-consuming and labor-intensive due to the extremely large image size and the expertise required, so more efficient annotation strategies are urgently needed to reduce manual effort. In this context, improving classification performance under a limited annotation budget has become a research focus in digital pathology.
To reduce annotation cost and improve efficiency, Active Learning (AL) (Wang et al., 2023b), which algorithmically selects the most valuable samples from an unlabeled pool for annotation, has attracted increasing attention. Existing AL methods typically select samples based on model uncertainty estimation (Beluch et al., 2018; Wang et al., 2016; Yoo and Kweon, 2019; Chakrabarty et al., 2020; Zhang et al., 2022), representativeness-based selection rules (Sener and Savarese, 2018; Wan et al., 2021; Qu et al., 2023), or a combination of the two (Wang et al., 2023c; Lughofer, 2012; Wu et al., 2021). Uncertainty-based methods usually assess sample uncertainty from the trained model's predictions (e.g., prediction entropy), but a single model is prone to over-confident predictions (Mehrtash et al., 2020) and ignores similarity among samples, which limits its ability to select the most valuable samples and may introduce redundancy. For instance, two highly similar samples with high uncertainty may both be selected, yet annotating both yields little more information gain than annotating only one, wasting the annotation budget. Representativeness-based methods (Sener and Savarese, 2018; Yang et al., 2015) obtain core samples of the training set via clustering or distance metrics, guaranteeing diversity under a given sample budget and reducing redundancy among selected samples, but they may overlook sample informativeness. Moreover, existing AL methods often train the model only on labeled images, which not only causes overfitting due to the small labeled set (Gao et al., 2020) but also limits the model's sample-selection ability, since it never sees the full dataset (Gao et al., 2020; Zhang et al., 2022). Recently, some studies (Sinha et al., 2019; Gaillochet et al., 2022; Mahmood et al., 2022; Gao et al., 2020; Zhang et al., 2022) have tried to exploit unlabeled images to improve feature learning for active learning, but these methods either rely on an extra feature extractor unrelated to the task-specific model (Sinha et al., 2019), or consider only a single dimension, i.e., model uncertainty (Gaillochet et al., 2022; Gao et al., 2020) or representativeness (Mahmood et al., 2022), limiting the effectiveness of the selected samples for the task-specific deep learning model.
To address these issues, we propose the Unified Semi-supervised Active Learning (UniSAL) framework, which uses unlabeled images simultaneously for model training and for selecting the most valuable samples, so as to efficiently train a high-performing model under a limited annotation budget. Specifically: First, UniSAL integrates active learning and Semi-Supervised Learning (SSL) in a unified dual-network architecture, making full use of both labeled and unlabeled data. Unlike conventional methods (Zhang et al., 2022; Wang et al., 2023c) that treat AL and SSL as independent stages, our method enables their synergistic interaction: AL drives the selection of the most informative samples, while SSL improves feature representations through pseudo-label training and contrastive learning, forming mutual enhancement. Second, for model training, we propose Dual-view High-confidence Pseudo Training (DHPT), which exploits both labeled and unlabeled data: two parallel networks generate complementary high-confidence pseudo labels for each other, strengthening the model's ability to learn from unlabeled data; a pseudo label-guided class-wise contrastive learning strategy is further introduced to actively push samples of different classes well apart in the latent feature space, reinforcing the model's discriminative ability while making learning more robust and improving the utilization of unlabeled data. Third, the trained networks capture uncertainty via inter-network disagreement to efficiently detect informative samples, and identify representative samples in the well-separated latent feature space, ensuring that selected samples are both uncertain and highly informative. As a result, the method retrieves a small but critical set of query samples, optimizing model performance at minimal annotation cost.
The main contributions of this paper are:
• We propose UniSAL, a unified semi-supervised active learning framework for histopathological image classification, in which DHPT enables model training and sample selection to proceed simultaneously.
• We propose a dual-view disagreement-based uncertainty selection method that efficiently fetches the low-confidence samples most informative to the model in each training round.
• We propose a pseudo label-guided class-wise contrastive learning strategy, integrated into DHPT at low extra computational cost, to learn a well-separated feature space that aids representative sample selection.
Experimental results on three public pathological image datasets show that UniSAL reduces the annotation cost to around 10% while maintaining performance comparable to full annotation. After a single query, model accuracy improves by about 20 percentage points over random selection, and UniSAL surpasses several state-of-the-art active learning methods under different annotation budgets.
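To make the entropy-based uncertainty criterion mentioned above concrete (the baseline used by uncertainty-based AL methods, not UniSAL's own selector), here is a minimal NumPy sketch; the function names and toy probabilities are illustrative assumptions:

```python
import numpy as np

def entropy_scores(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per sample; higher means more uncertain."""
    eps = 1e-12  # avoid log(0)
    return -(probs * np.log(probs + eps)).sum(axis=1)

def select_most_uncertain(probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` samples with the highest predictive entropy."""
    scores = entropy_scores(probs)
    return np.argsort(scores)[::-1][:budget]

# Toy example: three unlabeled samples, two classes.
probs = np.array([
    [0.99, 0.01],  # confident prediction -> low entropy
    [0.55, 0.45],  # ambiguous prediction -> high entropy
    [0.90, 0.10],
])
picked = select_most_uncertain(probs, budget=1)
```

Note how this criterion considers each sample in isolation: two near-duplicate ambiguous samples would both score highly, which is exactly the redundancy problem the text describes.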
Abstract
摘要
Histopathological image classification using deep learning is crucial for accurate and efficient cancer diagnosis. However, annotating a large amount of histopathological images for training is costly and time-consuming, leading to a scarcity of available labeled data for training deep neural networks. To reduce human effort and improve efficiency for annotation, we propose a Unified Semi-supervised Active Learning framework (UniSAL) that effectively selects informative and representative samples for annotation. First, unlike most existing active learning methods that only train from labeled samples in each round, dual-view high-confidence pseudo training is proposed to utilize both labeled and unlabeled images to train a model for selecting query samples, where two networks operating on different augmented versions of an input image provide diverse pseudo labels for each other, and pseudo label-guided class-wise contrastive learning is introduced to obtain better feature representations for effective sample selection. Second, based on the trained model at each round, we design a novel uncertain and representative sample selection strategy. It contains a Disagreement-aware Uncertainty Selector (DUS) to select informative uncertain samples with inconsistent predictions between the two networks, and a Compact Selector (CS) to remove redundancy of selected samples. We extensively evaluate our method on three public pathological image classification datasets, i.e., the CRC5000, Chaoyang and CRC100K datasets, and the results demonstrate that our UniSAL significantly surpasses several state-of-the-art active learning methods, and reduces the annotation cost to around 10% to achieve a performance comparable to full annotation.
基于深度学习的组织病理学图像分类对癌症的准确高效诊断至关重要。然而,标注大量组织病理学图像用于训练既昂贵又耗时,导致可用于训练深度神经网络的标注数据匮乏。为减少人工工作量并提高标注效率,我们提出了统一半监督主动学习框架(UniSAL),该框架可有效选择信息丰富且具代表性的样本进行标注。首先,不同于大多数现有主动学习方法仅在每轮中从标注样本训练,我们提出双视图高置信度伪训练,利用标注和未标注图像训练模型以选择查询样本——两个网络对输入图像的不同增强版本进行处理,相互提供多样化伪标签,并引入伪标签引导的类内对比学习,以获取更优特征表示用于有效样本选择。其次,基于每轮训练的模型,我们设计了新颖的不确定性与代表性样本选择策略,包含分歧感知不确定性选择器(DUS)以选择两网络预测不一致的信息丰富不确定性样本,以及紧凑选择器(CS)以消除所选样本的冗余。我们在三个公共病理图像分类数据集(CRC5000、Chaoyang和CRC100K)上进行了广泛评估,结果表明UniSAL显著超越了多种最先进的主动学习方法,并将标注成本降低至约10%,即可实现与全标注相当的性能。
Method
方法
As illustrated in Fig. 1, the proposed UniSAL framework for efficient histopathological image annotation consists of two parts: (1) a Dual-view High-confidence Pseudo Training (DHPT) paradigm for leveraging labeled and unlabeled data for model training and feature learning; (2) a novel Uncertain and Representative Sample Selection (URS) module to select the most informative and representative samples for querying based on the trained models and learned features at each round.
如图1所示,所提出的用于高效组织病理学图像标注的UniSAL框架包含两个部分:(1)双视图高置信度伪训练(DHPT)范式,用于利用标注和未标注数据进行模型训练与特征学习;(2)新型不确定与代表性样本选择(URS)模块,用于基于每轮训练的模型及学习到的特征,筛选最具信息性和代表性的样本进行查询。
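As a rough illustration of the cross pseudo-labeling idea behind DHPT (two parallel networks exchanging only high-confidence pseudo labels), the following NumPy sketch shows one unlabeled batch being filtered by a confidence threshold; the function name `cross_pseudo_labels` and the threshold value are our own assumptions, not the paper's exact implementation:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_pseudo_labels(logits_a, logits_b, tau=0.95):
    """Each network pseudo-labels the unlabeled batch for the *other*
    network, keeping only predictions whose confidence exceeds tau."""
    prob_a, prob_b = softmax(logits_a), softmax(logits_b)
    # f_A supervises f_B with its confident predictions, and vice versa.
    labels_for_b, mask_for_b = prob_a.argmax(1), prob_a.max(1) >= tau
    labels_for_a, mask_for_a = prob_b.argmax(1), prob_b.max(1) >= tau
    return (labels_for_a, mask_for_a), (labels_for_b, mask_for_b)

# Toy batch of two unlabeled samples, two classes.
logits_a = np.array([[5.0, 0.0],   # confident -> kept as pseudo label for f_B
                     [0.1, 0.0]])  # ambiguous -> filtered out
logits_b = np.array([[4.0, 0.0],
                     [0.0, 0.2]])
(labels_a, mask_a), (labels_b, mask_b) = cross_pseudo_labels(logits_a, logits_b)
```

In training, only the masked-in samples would contribute to each network's pseudo-label loss; in the real method the two networks additionally see different augmented views of the input.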
Conclusion
结论
Compared to existing active learning methods (Sener and Savarese, 2018; Gal et al., 2017; Jin et al., 2022), the key distinction of UniSAL lies in its unified framework of active learning (AL) and semi-supervised learning (SSL). Existing approaches that combine AL and SSL typically treat these tasks as separate stages or focus on one aspect at a time. For example, BoostMIS (Zhang et al., 2022) uses FixMatch (Sohn et al., 2020) during the training phase to incorporate semi-supervised learning, while in the selection phase, it perturbs features to identify uncertain samples for annotation. These two stages operate independently, without direct interaction or mutual enhancement. In contrast, UniSAL innovatively integrates AL and SSL within a unified dual-network architecture, enabling their synergistic interaction to enhance performance. Specifically, compared with existing works, UniSAL differs in its mechanisms for, and use of, the dual-network architecture design, contrastive learning, and sample selection.
First, the dual-network architecture plays a crucial role in effectively utilizing unlabeled data and enhancing sample selection. Though methods such as CPS (Chen et al., 2021b) and DivideMix (Li et al., 2020) employ a dual-network architecture for pseudo-label training and noisy-label learning, respectively, their dual networks are not leveraged for sample selection. In contrast, the dual networks in DHPT are combined with contrastive feature learning, aiding representative sample selection. Additionally, the discrepancy between the two networks is naturally leveraged for uncertain sample selection, seamlessly integrating AL and SSL within a unified framework. Although BoostMIS (Zhang et al., 2022) and Scribble2Label (Lee and Jeong, 2020) use thresholds to obtain confident pseudo labels, they rely on EMA to update the teacher model, with gradients only backpropagating to the student model. In contrast, our approach employs a dual-network architecture where both networks are on an equal footing, mutually updating each other using high-quality pseudo labels simultaneously. Besides, the two networks in our method are symmetric, so their discrepancy can well represent uncertain samples, whereas the discrepancy in an asymmetric teacher-student architecture usually reflects only the unreliable part of the student.
Second, UniSAL regularizes the feature space through pseudo label-guided class-wise contrastive learning, ensuring that the selected samples are more representative of the underlying data distribution. While CCSSL (Yang et al., 2022) uses class-wise contrastive learning to mitigate confirmation bias and enhance robustness against noisy or out-of-distribution labels, our approach integrates this technique within a unified AL-SSL framework. Instead of focusing on noise reduction, our method aims to improve feature separability, thereby enabling more effective and representative sample selection.
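The pseudo label-guided class-wise contrastive objective discussed above can be illustrated with a minimal SupCon-style loss computed on pseudo labels. This is a simplified NumPy sketch (no projection head or augmented views), with function names of our own choosing:

```python
import numpy as np

def classwise_contrastive_loss(feats, pseudo_labels, temperature=0.5):
    """SupCon-style loss on pseudo labels: pull same-(pseudo-)class
    features together, push different classes apart."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize
    sim = f @ f.T / temperature                               # scaled cosine similarity
    n = len(pseudo_labels)
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not pos:
            continue  # no positive pair for this anchor
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        loss += -np.mean([sim[i, j] - log_denom for j in pos])
        count += 1
    return loss / max(count, 1)

# Well-clustered pseudo classes give a lower loss than mismatched labels,
# so minimizing it pushes classes apart in the feature space.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_good = classwise_contrastive_loss(feats, [0, 0, 1, 1])
loss_bad = classwise_contrastive_loss(feats, [0, 1, 0, 1])
```

The resulting well-separated feature space is what the Compact Selector later exploits to judge representativeness.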
This integration of class-wise contrastive learning within active learning enhances the quality of the selected samples, making the selection process more efficient.
Third, during the sample selection phase, unlike methods (Sener and Savarese, 2018; Gal et al., 2017) that focus solely on uncertainty or representativeness, UniSAL combines disagreement-aware uncertainty-based selection with compactness-based selection to address both aspects simultaneously, ensuring the acquisition of more valuable samples. Although a similar concept exists in Query-by-Committee (QBC) (Settles, 2009), our approach improves upon it by incorporating random data augmentations before calculating disagreement. This augmentation not only captures a broader range of uncertainty but also enhances the selection of informative samples, making the process more effective than traditional QBC methods.
The experiments show UniSAL's notable performance gains, yet it faces some challenges. First, UniSAL increases the computational cost of training and inference due to the use of two networks. However, given the bias of a single network, the mutual supervision between two networks can achieve more robust results, and the inter-network disagreement can be well leveraged to filter out unreliable predictions. Another issue not addressed in this paper is the cold-start problem in active learning. At the initial stage, where all the training images are unannotated, we simply randomly selected a small subset (e.g., 1%) for annotation to start training the networks. However, random selection may not be optimal for identifying the most valuable samples in the first query round (Liu et al., 2023).
A potential solution is to leverage unsupervised or self-supervised training on the entire unlabeled dataset, or models pre-trained on other datasets (Chen et al., 2024; Vorontsov et al., 2024), to obtain a feature representation of the samples, which helps to identify representative ones for querying before any annotations.
In summary, we propose a novel semi-supervised active learning algorithm, UniSAL. A dual-view high-confidence pseudo training strategy is proposed to leverage both labeled and unlabeled samples, which not only improves model performance but also enables effective sample selection for querying. The disagreement between the two networks is used to select uncertain samples, while a well-separated feature space based on pseudo label-guided contrastive learning helps select representative samples. Results on three public pathological image classification datasets demonstrate that our method reduces annotation costs to only 10% of the full dataset while maintaining performance comparable to fully annotated models, and it significantly outperforms state-of-the-art active learning approaches. Our proposed UniSAL shows strong potential for clinical deployment, as it enables the training of high-performing models with limited annotated data. This capability not only streamlines the diagnostic workflow by reducing the dependency on scarce expert annotations but also accelerates the development of robust, clinically applicable systems. In future research, it is of interest to extend UniSAL to pathological image segmentation and object detection tasks, and to explore cold-start active learning for better selecting the initial set of samples to be annotated.
与现有主动学习方法(Sener和Savarese,2018;Gal等人,2017;Jin等人,2022)相比,UniSAL的核心区别在于其将主动学习(AL)与半监督学习(SSL)整合到统一框架中。现有结合AL和SSL的方法通常将二者视为独立阶段,或一次仅聚焦于其中一个方面。例如,BoostMIS(Zhang等人,2022)在训练阶段采用FixMatch(Sohn等人,2020)融入半监督学习,而在样本选择阶段则通过扰动特征来识别需标注的不确定性样本。这两个阶段独立运行,缺乏直接交互或相互增强机制。
相比之下,UniSAL创新性地在统一双网络架构中集成AL与SSL,通过二者的协同交互提升性能。具体而言,与现有研究相比,UniSAL在以下方面展现出不同的机制与应用逻辑:
1. 双网络架构设计:通过并行网络(f_A与f_B)的协同训练,利用双视图高置信度伪训练(DHPT)实现标注数据与未标注数据的联合学习,而非仅依赖单一模型或独立阶段。
2. 对比学习机制:引入伪标签引导的类内对比学习策略,推动不同类别样本在特征空间中充分分离,强化模型判别能力,而现有方法多未结合此类特征优化策略。
3. 样本选择策略:通过分歧感知不确定性选择器(DUS)与紧凑性选择器(CS)的结合,同步考虑模型预测分歧与特征空间代表性,避免冗余样本选择,而传统方法常单独依赖不确定性或代表性单一维度。
这种一体化设计使UniSAL能够在有限标注预算下更高效地学习特征表示,并精准筛选高价值样本,实现性能超越。
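The two-step selection described in the conclusion, disagreement-aware uncertainty filtering followed by compactness-based redundancy removal, might be sketched as follows. The greedy cosine-similarity filter, its threshold, and the function name are illustrative assumptions rather than the paper's exact DUS/CS procedure:

```python
import numpy as np

def select_queries(preds_a, preds_b, feats, budget, sim_thresh=0.95):
    """Two-step query selection.
    Step 1 (uncertainty): keep samples on which the two networks disagree.
    Step 2 (compactness): greedily skip any candidate whose feature is too
    similar (cosine) to an already selected one, reducing redundancy."""
    candidates = np.where(preds_a != preds_b)[0]
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # unit vectors
    selected = []
    for idx in candidates:
        if all(f[idx] @ f[j] < sim_thresh for j in selected):
            selected.append(idx)
        if len(selected) == budget:
            break
    return [int(i) for i in selected]

# Networks disagree on samples 1 and 3; their features are near-duplicates,
# so the compact step keeps only one of them.
preds_a = np.array([0, 0, 1, 1])
preds_b = np.array([0, 1, 1, 0])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.01, 1.0]])
queries = select_queries(preds_a, preds_b, feats, budget=2)
```

This mirrors the motivation in the text: disagreement flags informative samples, and the feature-space similarity check prevents the annotation budget from being spent on near-identical images.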
Figure
图
Fig. 1. Illustration of our UniSAL for active learning. In each round of querying, the labeled and unlabeled images are used by Dual-view High-confidence Pseudo Training (DHPT) between two parallel networks (𝑓𝐴 and 𝑓𝐵) to enhance the model's performance, and contrastive learning is used to obtain well-separated feature representations. Then, a Disagreement-aware Uncertainty Selector (DUS) and a Compact Selector (CS) based on the predictions and features from the two networks respectively are used to select the most uncertain and representative samples.
图1. 主动学习框架UniSAL示意图。在每轮查询中,双并行网络(𝑓𝐴和𝑓𝐵)通过双视图高置信度伪训练(DHPT)同时利用标注与未标注图像提升模型性能,并结合对比学习获取可分离性强的特征表示。随后,基于两网络的预测结果与特征输出,分别通过分歧感知不确定性选择器(DUS)和紧凑性选择器(CS)筛选最具不确定性与代表性的样本。
Fig. 2. Accuracy (%) obtained by different AL methods with 9 query rounds on the CRC5000 dataset. The shaded area represents the standard deviation over 5 runs.
图2. 不同主动学习(AL)方法在CRC5000数据集上经9轮查询后的准确率(%)。阴影区域表示5次独立实验的标准差。
Fig. 3. Comparison of effectiveness of sample selection between TAAL (Gaillochet et al., 2022) and UniSAL after the first query. Initial refers to the initial labeled set based on random selection. Note that class 5 has a low sample number in the initial set, leading to low recall of that class in the initial model. Our method effectively fetches more samples of class 5 to query (a), which improves its recall after the first query round (b).
图3. TAAL(Gaillochet等人,2022)与UniSAL在首次查询后样本选择有效性对比。“Initial”表示基于随机选择的初始标注集。注意:初始集中第5类样本数量较少,导致初始模型中该类召回率较低。我们的方法在(a)中有效选取更多第5类样本进行查询,使首次查询后(b)该类召回率显著提升。
Fig. 4. Visual comparison of selected samples between random selection, TAAL (Gaillochet et al., 2022) and UniSAL after the first query on the CRC5000 dataset. (a) shows the similarity matrix of the query batch based on cosine similarity in the feature space, where blue and red denote low and high similarity values, respectively. (b) shows some samples in the query batch, where dashed blue rectangles highlight similar images.
图4. 在CRC5000数据集上首次查询后,随机选择、TAAL(Gaillochet等人,2022)和UniSAL所选样本的可视化对比。(a) 展示了基于特征空间余弦相似度的查询批次样本相似性矩阵,其中蓝色和红色分别表示低和高相似度值。(b) 展示了查询批次中的部分样本,蓝色虚线框标注了相似图像。
Fig. 5. Comparison with state-of-the-art AL and SSL methods on the Chaoyang dataset. The query batch size is 2% of the training set (80). Note that before the first query (round 0), our method already outperforms the others by learning from both labeled and unlabeled images. The shaded area represents the standard deviation over 5 runs.
图5. 在朝阳数据集上与最先进的主动学习(AL)和半监督学习(SSL)方法的对比。每轮查询批量为训练集的2%(80个样本)。请注意,在首次查询前(第0轮),我们的方法通过同时学习标注和未标注图像,性能已超越其他方法。阴影区域表示5次独立实验的标准差。
Fig. 6. Accuracy (%) obtained by different AL methods with 4 query rounds on the CRC100K dataset. The shaded area represents the standard deviation over 5 runs.
图6. 不同主动学习(AL)方法在CRC100K数据集上经4轮查询后的准确率(%)。阴影区域表示5次独立实验的标准差。
Fig. 7. Ablation study of the proposed modules on the CRC5000 dataset. Baseline means supervised learning from the annotated samples in each query round, and upper bound refers to fully supervised learning with the entire training set being annotated.
图7. 在CRC5000数据集上对所提模块的消融研究。基线(Baseline)表示每轮查询中仅从标注样本进行监督学习,而上界(upper bound)指对整个训练集进行全标注的监督学习
Fig. 8. Performance comparison of existing AL methods with and without the proposed DHPT.
图8. 现有主动学习(AL)方法在引入与未引入所提双视图高置信度伪训练(DHPT)时的性能对比
Fig. 9. t-SNE visualization of the CRC5000 training set in the feature space obtained by different methods. (b), (c), (d) and (f) are shown for the 9th query round.
图9. 不同方法在CRC5000训练集特征空间中的t-SNE可视化。(b)、(c)、(d)和(f)展示的是第9轮查询时的结果。
Table
表
Table 1. Accuracy (%) obtained by different SSL (first section) and AL (second section) methods with 9 query rounds on the CRC5000 dataset. The query batch size is 1% of the training set (26). The first five AL methods have the same performance before the first query as they only use the labeled set 𝐷^𝑙_𝑞 for training.
表1. 不同半监督学习(SSL,第一部分)与主动学习(AL,第二部分)方法在CRC5000数据集上经9轮查询后的准确率(%)。每轮查询批量为训练集的1%(26个样本)。前五种AL方法在首次查询前性能相同,因其仅使用标注集𝐷^𝑙_𝑞进行训练。
Table 2. Average and standard deviation of query time (per round) and total time across 9 rounds on the CRC5000 dataset.
表2 在CRC5000数据集上9轮查询的平均查询时间(每轮)及总时间的平均值和标准差。