Paper link: CLIP in medical imaging: A survey - ScienceDirect
Project page: github.com
The English here is typed by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post leans toward personal notes, so read with caution.
Table of Contents
1. Takeaways
2. Section-by-section reading
2.1. Abstract
2.2. Introduction
2.3. Background
2.3.1. Contrastive language-image pre-training
2.3.2. Variants of CLIP
2.3.3. Medical image–text dataset
2.4. CLIP in medical image–text pre-training
2.4.1. Challenges of CLIP pre-training
2.4.2. Multi-scale contrast
2.4.3. Data-efficient contrast
2.4.4. Explicit knowledge enhancement
2.4.5. Others
2.4.6. Summary
2.5. CLIP-driven applications
2.5.1. Classification
2.5.2. Dense prediction
2.5.3. Cross-modal tasks
2.5.4. Summary
2.6. Comparative analysis
2.7. Discussions and future directions
2.8. Conclusion
1. Takeaways
(1) These notes only record what stands out in this survey; the basics of CLIP and of medical imaging are skipped (see the original paper for those). The survey is simply too long to transcribe in full.
(2) Why does the figure style vary across the paper? Did each author draw one panel and then stitch them together?
(3) This post is note-oriented; the survey introduces a large number of different models.
2. Section-by-section reading
2.1. Abstract
① States that CLIP is significant for medical imaging and that the survey sets out to explore it
2.2. Introduction
① Limitations: poor performance on out-of-distribution data
② The trend of CLIP-related papers (left) and the share of medical imaging among those papers (right):
③ How CLIP is used:
2.3. Background
2.3.1. Contrastive language-image pre-training
① How CLIP works (if you have not read it, the original CLIP paper is clear and easy to follow):
② Performance of CLIP in the medical field:
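As a reminder of the mechanism everything in this survey builds on, here is a minimal NumPy sketch of CLIP's symmetric contrastive objective; the batch size, embedding dimension, and temperature value are illustrative only:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (N, N) similarity logits
    n = len(logits)

    def xent(l):
        # stable cross-entropy; matched pairs sit on the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # average the image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

A matched batch (each image paired with its own report) yields a low loss, while shuffling the text side against the images drives the loss up, which is exactly the signal the pre-training exploits.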
2.3.2. Variants of CLIP
① Introduces several variants, but without figures it is hard to remember them or to see the differences at a glance
2.3.3. Medical image–text dataset
① Open medical datasets:
2.4. CLIP in medical image–text pre-training
① Representative CLIP-based medical models:
2.4.1. Challenges of CLIP pre-training
① Challenges of CLIP in the medical imaging field:
Modality-influenced; both local and global image/text analysis are needed
Scarce data (wasn't zero-shot generalization supposed to be strong? why is data said to be scarce again)
Professional knowledge is required
2.4.2. Multi-scale contrast
① GLoRIA matches words in the text with image sub-regions:
② LoVT further assigns different weights to different sentences
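The local-alignment idea behind these multi-scale methods can be sketched as attention-pooling image regions per word and scoring each word against its pooled visual context. This is a toy version only; the real GLoRIA/LoVT objectives add learned encoders and localized contrastive losses on top:

```python
import numpy as np

def word_region_scores(word_emb, region_emb, tau=0.1):
    """Local alignment sketch: each word attends over image regions,
    then is scored against its attention-pooled visual context."""
    sim = word_emb @ region_emb.T                 # (words, regions)
    a = sim / tau
    a = a - a.max(axis=1, keepdims=True)          # stable softmax over regions
    attn = np.exp(a) / np.exp(a).sum(axis=1, keepdims=True)
    context = attn @ region_emb                   # word-specific visual context
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    return (w * c).sum(axis=1)                    # cosine score per word
```

A word whose embedding points at some region gets a score near 1; a word pointing away from every region scores low, which is what lets the loss reward fine-grained word-to-region correspondence rather than only a global image-report match.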
2.4.3. Data-efficient contrast
① Blindly pushing all negative pairs apart may destroy the relatedness of similar diseases:
② Add descriptions or shuffle sentences
③ Use medical imaging videos
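One way to avoid blindly pushing all negatives apart is to soften the contrastive targets with inter-report similarity, so near-duplicate reports (same disease, different patients) share probability mass instead of being treated as hard negatives. This is a hypothetical sketch of that idea, not any specific paper's loss; both temperatures are made-up values:

```python
import numpy as np

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def soft_contrastive_loss(img_emb, txt_emb, tau=0.07, tau_t=0.5):
    """Contrastive loss whose targets are softened by text-text similarity,
    so similar reports are not pushed apart as hard negatives."""
    norm = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)
    img, txt = norm(img_emb), norm(txt_emb)
    logits = img @ txt.T / tau
    # soft targets: rows of a softmax over report-report similarity,
    # replacing the one-hot (identity) targets of plain CLIP
    targets = np.exp(log_softmax(txt @ txt.T / tau_t))
    return -(targets * log_softmax(logits)).sum(axis=1).mean()
```

With one-hot targets two reports of the same disease repel each other; with softened targets their mutual similarity shows up in the target row, so the gradient no longer forces them apart.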
2.4.4. Explicit knowledge enhancement
① Combined with graphs or knowledge graphs (KG):
2.4.5. Others
~
2.4.6. Summary
~
2.5. CLIP-driven applications
2.5.1. Classification
① CLIP-based models for image classification:
(1)Zero-shot classification
① Diagnosis example (wow, it can even be framed as binary classification):
② How Xplainer works (impressive, so this is how CLIP is played with these days):
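The binary-diagnosis trick boils down to comparing the image embedding against a positive and a negative prompt embedding and taking a softmax. A minimal sketch, where the prompt wordings in the docstring and the embeddings are placeholders for real encoder outputs:

```python
import numpy as np

def zero_shot_prob(img_emb, pos_emb, neg_emb, tau=0.07):
    """P(disease) from a softmax over image similarity to a positive prompt
    (e.g. 'a chest X-ray with pneumonia') and a negative one ('... without')."""
    norm = lambda v: v / np.linalg.norm(v)
    img = norm(img_emb)
    s = np.array([img @ norm(pos_emb), img @ norm(neg_emb)]) / tau
    s = s - s.max()                        # numerical stability
    p = np.exp(s) / np.exp(s).sum()
    return p[0]                            # probability of the positive prompt
```

No labeled training data is needed; the class boundary comes entirely from the wording of the two prompts, which is why prompt design matters so much in the medical setting.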
(2)Context optimization
① Example of context optimization:
There is no real explanation here, so it does not let readers get started quickly, haha
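Since the survey gives little explanation here, a toy sketch of CoOp-style context optimization may help: the prompt for each class is `[ctx_1 ... ctx_M, CLASS-NAME]`, and only the M context vectors are learned. The mean-pooling "text encoder" and the class names below are made up for illustration; the real method uses CLIP's frozen transformer:

```python
import numpy as np

rng = np.random.default_rng(0)

M, D = 4, 8
context = rng.normal(scale=0.02, size=(M, D))   # learnable, shared across classes
name_emb = {"pneumonia": rng.normal(size=D),     # frozen class-name embeddings
            "normal": rng.normal(size=D)}

def class_weights(context):
    """Build one normalized class embedding per prompt [ctx_1..ctx_M, CLASS]."""
    prompts = [np.vstack([context, name_emb[n]]).mean(axis=0) for n in name_emb]
    w = np.stack(prompts)
    return list(name_emb), w / np.linalg.norm(w, axis=1, keepdims=True)

def classify(img_emb, context, tau=0.07):
    """Per-class logits for one image embedding, CLIP zero-shot style."""
    names, w = class_weights(context)
    img = img_emb / np.linalg.norm(img_emb)
    return names, img @ w.T / tau
```

During fine-tuning only `context` would receive gradient updates while both encoders stay frozen, which is what makes the approach work with the small labeled sets typical of medical imaging.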
2.5.2. Dense prediction
①Methods:
(1)Detection
① Lists the relevant models
(2)2D medical image segmentation
① Fine-tune CLIP on 2D medical image datasets
(3)3D medical image segmentation
①Examples:
(4)Others
2.5.3. Cross-modal tasks
① Representative models:
(1)Generation
① Automatically generate medical reports or medical images
(2)Medical visual question answering
① Example (the construction looks rather odd):
(3)Image–text retrieval
① Current models focus on global image features
②X-TRA:
2.5.4. Summary
~
2.6. Comparative analysis
① How the Multi-modality Large Language Model (MLLM) differs from CLIP:
②Performance of CLIP on different image sets:
2.7. Discussions and future directions
①Inter-disease similarity:
② Challenges:
inconsistency between pre-training and application
incomprehensive evaluation of refined pre-training
challenges of volumetric imaging
limited scope of refined CLIP pre-training
debiasing in CLIP models
enhancing adversarial robustness of CLIP
exploring the potential of metadata
incorporation of high-order correlations
beyond image–text alignment
2.8. Conclusion
~