fastGEO v1.7.0 大更新,支持PCA、差异分析、火山图、热图、差异箱线图、去批次等分析

前言

之前一篇文章【fastGEO V1.6.1 这个版本强的可怕,GEO数据自动下载、探针注释、Shiny App】介绍了fastGEO用于GEO数据下载和探针注释的核心功能。

虽然是付费50获取安装包(刚开始是20),但也深受欢迎,说明这个R包的确好用。

尤其是我自己,用了开发的这个R包后做数据挖掘方便了太多太多。

需要的小伙伴可以直接联系我付费50获取,价格只会涨不会跌,购买后永久免费获取最新版本。

也收到了一些用户的反馈,希望加入一些分析功能,其实这些在之前的推文【fastGEO v1.01,快速下载GEO表达谱、差异分析、火山图、热图】也展示过,只不过还没有放在R包里。

所以这一次更新,把PCA、差异分析、火山图、热图、差异箱线图、去批次这些分析通通加到R包里,仅使用我的R包就可以快速完成这些流程。

更新展示

格式懒得改了,可以移步到Gitee里在浏览,或点击【阅读原文】,效果更佳:fastGEO安装和使用教程。

下游分析

get_GEO_group
  • 根据样本信息提取并整理表达谱和分组信息
  • 获取表达谱和分组信息后就可以做一系列的分析了
  • 指定作为分组依据的临床信息表的列名, 及每个分组的匹配模式和分组名称即可
  • 可指定两组或多组
obj = download_GEO("GSE18105", out_dir = "test/00_GEO_data_GSE18105")
Find local annotation file, will be used preferably!
INFO [2025-08-02 15:00:34] Step1: Download GEO data ...
INFO [2025-08-02 15:00:34] Querying dataset: GSE18105 ...- Use local curl- Found 1 GPL- Found 111 samples, 39 metas.- Writting sample_anno to test/00_GEO_data_GSE18105/GSE18105_sample_anno.csv - Normalize between arrays ...- Successed, file save to test/00_GEO_data_GSE18105/GSE18105_GPL570.RData.
INFO [2025-08-02 15:00:55] Step2: Annotate probe GPL570 ...
INFO [2025-08-02 15:00:55] Use built-in annotation file ...
- All porbes matched!
- All porbes annotated!
INFO [2025-08-02 15:00:55] Removing duplicated genes by method: max ...
INFO [2025-08-02 15:01:20] Done.
expM = obj$expM
sample_anno = obj$sample_anno
hd(expM)
dim: 22881 × 111 
A data.frame: 5 × 5
GSM452552GSM452553GSM452554GSM452555GSM452556
<dbl><dbl><dbl><dbl><dbl>
A1BG 3.8841673.8059433.7793923.9612853.980595
A1BG-AS1 4.7556355.0219914.9928245.0026804.811076
A1CF 7.6039068.9070488.0351918.6039417.788007
A2M10.9725957.6647119.7034059.6846169.976007
A2M-AS1 4.3499034.1279304.2403154.5377124.449675
hd(sample_anno)
dim: 111 × 39 
A data.frame: 5 × 5
titlegeo_accessionstatussubmission_datelast_update_date
<chr><chr><chr><chr><chr>
GSM452552patient 100, cancer, LCMGSM452552Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452553patient 101, cancer, LCMGSM452553Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452554patient 102, cancer, LCMGSM452554Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452555patient 103, cancer, LCMGSM452555Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452556patient 104, cancer, LCMGSM452556Public on Feb 04 2010Sep 14 2009Aug 28 2018
table_GEO_clinical(obj$sample_anno)
$characteristics_ch1metastasis: metastasis metastasis: metastatic recurrence 26                                18 metastasis: none 67 $characteristics_ch1.1tissue: cancer, homogenized         tissue: cancer, LCM 17                          77 
tissue: normal, homogenized 17 
# 两组
group_list = get_GEO_group(obj, group_name = "characteristics_ch1.1", "cancer, homogenized" = "Cancer", "normal, homogenized" = "Normal")
group = group_list$group 
table(group)
group
Cancer Normal 17     17 
SID = group_list$SID
hd(SID)
dim: 34 
  1. 'GSM452646'
  2. 'GSM452647'
  3. 'GSM452648'
  4. 'GSM452649'
  5. 'GSM452650'
expM = group_list$expM
hd(expM)
dim: 22881 × 34 
A data.frame: 5 × 5
GSM452646GSM452647GSM452648GSM452649GSM452650
<dbl><dbl><dbl><dbl><dbl>
A1BG4.2183754.102014 4.0801644.0171333.958630
A1BG-AS14.8134364.965083 4.8545584.6858864.796381
A1CF8.8662997.978485 8.9470889.3733719.222967
A2M6.5133928.51780811.5186168.2768357.645857
A2M-AS13.8807563.910804 5.1145584.0815893.817199
sample_anno = group_list$sample_anno
hd(sample_anno)
dim: 34 × 39 
A data.frame: 5 × 5
titlegeo_accessionstatussubmission_datelast_update_date
<chr><chr><chr><chr><chr>
GSM452646patient 006, cancer, homogenizedGSM452646Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452647patient 011, cancer, homogenizedGSM452647Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452648patient 024, cancer, homogenizedGSM452648Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452649patient 027, cancer, homogenizedGSM452649Public on Feb 04 2010Sep 14 2009Aug 28 2018
GSM452650patient 028, cancer, homogenizedGSM452650Public on Feb 04 2010Sep 14 2009Aug 28 2018
# 多组
group_list = get_GEO_group(obj, group_name = "characteristics_ch1.1", "cancer, homogenized" = "Cancer", "normal, homogenized" = "Normal","cancer, LCM" = "Cancer_LCM")
table(group_list$group)


Cancer Cancer_LCM Normal
17 77 17

run_PCA
  • 执行PCA分析并绘图
  • 默认添加椭圆、默认不添加样本标签
  • 可输出图片(PDF/PNG), 返回ggplot对象, 可自行修改
group = group_list$group 
expM = group_list$expM
SID = group_list$SID
table(group)
groupCancer Cancer_LCM     Normal 17         77         17 
run_PCA(expM, group, out_name = "./test/00_GEO_data_GSE18105/01_PCA")



# 修改属性
run_PCA(expM, group, title = "PCA Plot", legend.title = "Group", text.size = 30, pt.size = 5, key.size = 5, color = c("red", "green", "blue"))



  • 添加label, 适合样本较少的情况, 可以更清晰地展示离群样本名
  • 这里每组只提取前5个样本展示
SID2 = unlist(lapply(group_list$SID_list, function(x) x[1:5]))
group2 = group[match(SID2, SID)]
expM2 = expM[, SID2]
table(group2)                     
group2Cancer Cancer_LCM     Normal 5          5          5 
run_PCA(expM2, group2, label = TRUE)



run_DEG_limma
  • 快速进行limma差异分析, 适合标准化的芯片数据或高通量数据
  • 可设置阈值, 默认 |log2FC| > 1, Padj < 0.05
  • 可设置不使用P值矫正
  • 可添加基因label, 可自定义基因
DEG_tb = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", out_name = "./test/00_GEO_data_GSE18105/02_DEG")
Number of Up regulated genes: 1006 
Number of Down regulated genes: 1096 
Number of Not Change regulated genes: 20779 
head(DEG_tb)
A data.frame: 6 × 7
logFCAveExprtP.Valueadj.P.ValBDEG
<dbl><dbl><dbl><dbl><dbl><dbl><chr>
FOXQ15.7163548.77703013.7020082.117724e-158.075941e-1224.97073Up
CLDN14.6408078.50236919.3303546.543012e-201.497107e-1534.74020Up
LGR54.4854297.35480310.2233196.637827e-121.421583e-0917.14899Up
KRT234.4453036.613692 8.9371071.917386e-101.589992e-0813.84479Up
DPEP14.2603638.24049010.1225208.572521e-121.662270e-0916.89826Up
CEMIP4.1478648.04887712.2184765.487202e-145.879765e-1121.82719Up
# 最低阈值
DEG_tb2 = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", log2FC = 0.5, padj = FALSE, out_name = "./test/00_GEO_data_GSE18105/03_DEG")
Number of Up regulated genes: 2969 
Number of Down regulated genes: 2724 
Number of Not Change regulated genes: 17188 
plot_volcano_limma
  • 根据差异表格绘制火山图
group_list = get_GEO_group(obj, group_name = "characteristics_ch1.1", "cancer, homogenized" = "Cancer", "normal, homogenized" = "Normal","cancer, LCM" = "Cancer_LCM")
table(group_list$group)


Cancer Cancer_LCM Normal
17 77 17

group = group_list$group 
expM = group_list$expM
SID = group_list$SID
table(group)
groupCancer Cancer_LCM     Normal 17         77         17 
DEG_tb = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", out_name = "./test/00_GEO_data_GSE18105/02_DEG")
Number of Up regulated genes: 1006 
Number of Down regulated genes: 1096 
Number of Not Change regulated genes: 20779 
plot_volcano_limma(DEG_tb, out_name = "./test/00_GEO_data_GSE18105/04_volcano")
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

# 显示阈值
plot_volcano_limma(DEG_tb, caption = TRUE)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

# 选择 top logFC 前10的基因进行显示
plot_volcano_limma(DEG_tb, caption = TRUE, method = "logFC")
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

# 不显示基因名
plot_volcano_limma(DEG_tb, caption = TRUE, label = FALSE)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.

# 自定义label基因, 每组随机挑选5个
set.seed(1234)
gene_slect = as.character(aggregate(rownames(DEG_tb), by = list(DEG_tb$DEG), function(x) sample(x, 5))[, 2])
gene_slect
  1. 'CPM'
  2. 'STX16'
  3. 'TTC27'
  4. 'FFAR4'
  5. 'RP11-31K23.2'
  6. 'MCM2'
  7. 'GNA11'
  8. 'PIK3IP1'
  9. 'RFC3'
  10. 'CAPN9'
  11. 'TFPI'
  12. 'ASCC3'
  13. 'DMXL2'
  14. 'GK5'
  15. 'SLC2A1'
plot_volcano_limma(DEG_tb, caption = TRUE, label = gene_slect)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

  • 如果差异分析的阈值发生改变, 这里也要进行一样的设置
# 最低阈值
DEG_tb2 = run_DEG_limma(expM, group, Case = "Cancer", Control = "Normal", log2FC = 0.5, padj = FALSE, out_name = "./test/00_GEO_data_GSE18105/03_DEG")
Number of Up regulated genes: 2969 
Number of Down regulated genes: 2724 
Number of Not Change regulated genes: 17188 
plot_volcano_limma(DEG_tb2, log2FC = 0.5, padj = FALSE)
[1m[22mScale for [32my[39m is already present.
Adding another scale for [32my[39m, which will replace the existing scale.
Warning message:
"[1m[22mDuplicated aesthetics after name standardisation: [32msegment.colour[39m"

plot_heatmap_DEG
  • 根据差异分析结果和表达谱绘制差异基因热图
# 仅支持输入两种分组
group2 = group[group != "Cancer_LCM"]
expM2 = expM[, group != "Cancer_LCM"]
table(group2)
group2
Cancer Normal 17     17 
plot_heatmap_DEG(expM2, DEG_tb, group2, out_name = "./test/00_GEO_data_GSE18105/04_heatmap")



set_image(7, 10)
plot_heatmap_DEG(expM2, DEG_tb, group2, ntop = 30, w = 15)



plot_boxplot
gene_select = c(head(rownames(DEG_tb), 3), tail(rownames(DEG_tb), 3))
gene_select
  1. 'FOXQ1'
  2. 'CLDN1'
  3. 'LGR5'
  4. 'GCG'
  5. 'ZG16'
  6. 'CLCA4'
source("W:/02_Study/R_build/build/fastGEO/functions/analysis.R")
set_image(14, 6)
plot_boxplot(expM2, group2, genes = gene_select,out_name = "./test/00_GEO_data_GSE18105/06_boxplot", w = 10, h = 5)



set_image(14, 6)
plot_boxplot(expM2, group2, genes = gene_select, breaks = c("Normal", "Cancer"), # 修改x轴顺序ylab = "log2(FPKM + 1)", # 修改y轴标题color = c("blue", "red"), # 修改颜色w = 10, h = 5)



# 多组
set_image(14, 6)
plot_boxplot(expM, group, breaks = c("Normal", "Cancer", "Cancer_LCM"), # 修改x轴顺序comparisons = list(c("Cancer", "Normal"), c("Cancer_LCM", "Normal")), # 手动指定分组genes = gene_select)



run_DEG_deseq2
  • 快速使用DESeq2进行差异分析, 适合count数据
  • 有count数据优先使用DESeq2
load2("./test/TCGA/00_TCGA_data.RData")
Loading objects:expMexpM_FPKMgroupsample_anno
hd(expM)
dim: 59427 × 562 
A data.frame: 5 × 5
TCGA-60-2712-01A-01R-0851-07TCGA-56-7221-01A-11R-2045-07TCGA-21-A5DI-01A-31R-A26W-07TCGA-43-7657-01A-31R-2125-07TCGA-94-7033-01A-11R-1949-07
<int><int><int><int><int>
5_8S_rRNA 0 0 0 0 0
5S_rRNA 0 0 0 4 4
7SK 0 3 0 0 0
A1BG 7 1 0 6 5
A1BG-AS18134314124
table(group)
groupLUAD Normal 511     51 
sum(group == "Normal")

51

# 取子集测试
set.seed(1234)
SID_tumor = sample(colnames(expM)[group == "LUAD"], sum(group == "Normal"))
expM2 = expM[, c(SID_tumor, colnames(expM)[group == "Normal"])]
group2 = group[match(colnames(expM2), colnames(expM))]
table(group2)
group2LUAD Normal 51     51 
DEG_tb_count = run_DEG_deseq2(expM2, group2, Case = "LUAD", Control = "Normal", out_name = "./test/TCGA/TCGA_LUAD_DEG")
estimating size factorsestimating dispersionsgene-wise dispersion estimatesmean-dispersion relationshipfinal dispersion estimatesfitting model and testing-- replacing outliers and refitting for 3453 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)estimating dispersionsfitting model and testing

Number of Up regulated genes: 6032 
Number of Down regulated genes: 11035 
Number of Not Change regulated genes: 37154 
count_to_exp
  • 用于将count矩阵转为标准化的表达谱, 可用于绘制PCA、热图和箱线图等
  • 不依赖于基因长度, 省去计算TPM/FPKM的麻烦
  • 使用 DESeq2 的 VST 标准化方法
expM_vst = count_to_exp(expM2)
hd(expM_vst)
dim: 59427 × 102 
A data.frame: 5 × 5
TCGA-58-A46K-01A-11R-A24H-07TCGA-68-A59J-01A-21R-A26W-07TCGA-66-2753-01A-01R-0980-07TCGA-77-8136-01A-11R-2247-07TCGA-46-3765-01A-01R-0980-07
<dbl><dbl><dbl><dbl><dbl>
5_8S_rRNA1.8852981.8852981.8852981.8852981.885298
5S_rRNA3.1842271.8852982.5866421.8852981.885298
7SK1.8852981.8852981.8852981.8852981.885298
A1BG3.4522683.8816633.2503464.2692593.959024
A1BG-AS15.4023775.8044805.6396236.7559545.297581
expM_FPKM2 = expM_FPKM[, colnames(expM2)]
hd(expM_FPKM2)
dim: 59427 × 102 
A data.frame: 5 × 5
TCGA-58-A46K-01A-11R-A24H-07TCGA-68-A59J-01A-21R-A26W-07TCGA-66-2753-01A-01R-0980-07TCGA-77-8136-01A-11R-2247-07TCGA-46-3765-01A-01R-0980-07
<dbl><dbl><dbl><dbl><dbl>
5_8S_rRNA0.00000000.00000000.000000000.00000000.0000000
5S_rRNA1.71874520.00000000.631337140.00000000.0000000
7SK0.00000000.00000000.000000000.00000000.0000000
A1BG0.12578330.17976580.081612320.27774680.1615653
A1BG-AS10.84149010.94553280.870187151.59129640.5865966
  • 测试vst算法与TCGA官网提供的FPKM的相关性
  • 结果显示Sperman相关性达到0.984, 说明相关性极高, 可以等价
# 加载必需的包(确保版本兼容)
if (!require("dplyr", quietly = TRUE)) {install.packages("dplyr")library(dplyr)
}
if (!require("ggplot2", quietly = TRUE)) {install.packages("ggplot2")library(ggplot2)
}# 确保两个矩阵的基因和样本匹配
common_genes <- intersect(rownames(expM_vst), rownames(expM_FPKM2))
common_samples <- intersect(colnames(expM_vst), colnames(expM_FPKM2))expM_vst_filtered <- expM_vst[common_genes, common_samples]
expM_FPKM_filtered <- expM_FPKM2[common_genes, common_samples]# 转换为长格式(不使用rownames_to_column)
# 处理VST数据:手动添加基因名列
vst_df <- as.data.frame(expM_vst_filtered)
vst_df$gene <- rownames(vst_df)  # 手动将行名转为gene列
vst_long <- tidyr::pivot_longer(vst_df, cols = -gene, names_to = "sample", values_to = "vst_value")# 处理FPKM数据:手动添加基因名列
fpkm_df <- as.data.frame(expM_FPKM_filtered)
fpkm_df$gene <- rownames(fpkm_df)  # 手动将行名转为gene列
fpkm_long <- tidyr::pivot_longer(fpkm_df, cols = -gene, names_to = "sample", values_to = "fpkm_value")# 合并数据
combined_data <- inner_join(vst_long, fpkm_long, by = c("gene", "sample"))# 计算相关性
cor_pearson <- cor(combined_data$vst_value, combined_data$fpkm_value, method = "pearson")
cor_spearman <- cor(combined_data$vst_value, combined_data$fpkm_value, method = "spearman")
hd(combined_data)
dim: 6061554 × 4 
A data.frame: 5 × 4
genesamplevst_valuefpkm_value
<chr><chr><dbl><dbl>
15_8S_rRNATCGA-58-A46K-01A-11R-A24H-071.8852980
25_8S_rRNATCGA-68-A59J-01A-21R-A26W-071.8852980
35_8S_rRNATCGA-66-2753-01A-01R-0980-071.8852980
45_8S_rRNATCGA-77-8136-01A-11R-2247-071.8852980
55_8S_rRNATCGA-46-3765-01A-01R-0980-071.8852980
# 绘制散点图
set.seed(1234)
combined_data = data.frame(combined_data)
combined_data2 = combined_data[sample(1:nrow(combined_data), 10000), ]
# 计算相关性
cor_pearson <- cor(combined_data2$vst_value, combined_data2$fpkm_value, method = "pearson")
cor_spearman <- cor(combined_data2$vst_value, combined_data2$fpkm_value, method = "spearman")
ggplot(combined_data2, aes(x = fpkm_value, y = vst_value)) +geom_point(alpha = 0.3, size = 1) +geom_smooth(method = "lm", color = "red", se = FALSE) +labs(x = "FPKM Expression",y = "VST Normalized Expression",title = paste0("Correlation between VST and FPKM\n","Pearson: ", round(cor_pearson, 3), " | ","Spearman: ", round(cor_spearman, 3))) +theme_minimal() +theme(plot.title = element_text(hjust = 0.5))
[1m[22m`geom_smooth()` using formula = 'y ~ x'

  • 绘制的热图也看不出差异
plot_heatmap_DEG(expM_vst, DEG_tb_count, group2)



plot_heatmap_DEG(expM_FPKM2, DEG_tb_count, group2)



run_ComBat
  • 去除多个数据集之间的批次效应, 并绘制去批次前后的PCA图
# 下载处理 GSE108109 芯片数据
obj = download_GEO("GSE108109", out_dir = "./test/run_ComBat")
INFO [2025-08-02 21:47:24] Step1: Download GEO data ...
INFO [2025-08-02 21:47:24] Querying dataset: GSE108109 ...- Use local curl- Found 1 GPL- Found 111 samples, 34 metas.- Writting sample_anno to ./test/run_ComBat/GSE108109_sample_anno.csv - Normalize between arrays ...- Successed, file save to ./test/run_ComBat/GSE108109_GPL19983.RData.
INFO [2025-08-02 21:47:35] Step2: Annotate probe GPL19983 ...
INFO [2025-08-02 21:47:36] Use built-in annotation file ...
- All porbes matched!
- Warnning: 732/25582 probes fail to annotated!
INFO [2025-08-02 21:47:36] Removing duplicated genes by method: max ...
INFO [2025-08-02 21:48:01] Done.
obj_list = get_GEO_group(obj, group_name = "source_name_ch1","Membranous Nephropathy" = "MN","Living donor" = "Control")
expM_GSE108109 = obj_list$expM
group_GSE108109 = obj_list$group
#下载处理 GSE216841 高通量
sample_anno = get_GEO_pheno("GSE216841")
INFO [2025-08-02 21:52:25] Querying dataset: GSE216841 ...- Use local curl- Found 1 GPL- Found 34 samples, 40 metas.- Writting sample_anno to 00_GEO_data/GSE216841_sample_anno.csv - Can't found expression profile, please seehttps://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE216841 - Return sample annotation!
hd(sample_anno)
dim: 34 × 40 
A data.frame: 5 × 5
titlegeo_accessionstatussubmission_datelast_update_date
<chr><chr><chr><chr><chr>
GSM6696604Normal control 1GSM6696604Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696605Normal control 2GSM6696605Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696606Normal control 3GSM6696606Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696607Normal control 4GSM6696607Public on Jan 25 2023Oct 28 2022Jan 25 2023
GSM6696608Normal control 5GSM6696608Public on Jan 25 2023Oct 28 2022Jan 25 2023
expM_raw = read.faster("./test/run_ComBat/GSE216841_MNd_ncounts_annot.txt.gz")
hd(expM_raw)
dim: 20321 × 37 
A data.frame: 5 × 5
ensembl_gene_idhgnc_GRCh38p12descriptionN_CTRL1N_CTRL2
<chr><chr><chr><dbl><dbl>
ENSG00000000003ENSG00000000003TSPAN6 tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858] 92.874228309.359955
ENSG00000000005ENSG00000000005TNMD tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757] 1.020596 1.041616
ENSG00000000419ENSG00000000419DPM1 dolichyl-phosphate mannosyltransferase subunit 1, catalytic [Source:HGNC Symbol;Acc:HGNC:3005]255.148977 1.041616
ENSG00000000457ENSG00000000457SCYL3 SCY1 like pseudokinase 3 [Source:HGNC Symbol;Acc:HGNC:19285] 373.538103887.456841
ENSG00000000460ENSG00000000460C1orf112chromosome 1 open reading frame 112 [Source:HGNC Symbol;Acc:HGNC:25565] 110.224358289.569251
countM = expM_raw[, -c(1:3)]
name_list = c("N_CTRL" = "Normal control", "MEMBR" = "Idiopathic membranous nephropathy", "MCD" = "Minimal change disease")
new_name = paste(name_list[paste0(gsub("[0-9]", "", colnames(countM)))], gsub("[A-Z_]", "", colnames(countM)))
colnames(countM) = sample_anno$geo_accession[match(new_name, sample_anno$title)]
countM = ceiling(countM)
countM = data.frame(SYMBOL = expM_raw$hgnc_GRCh38p12, countM)
countM = aggregate(. ~ SYMBOL, data = countM, function(x) x[which.max(x)])
countM = col2rownames(countM)
expM = count_to_exp(countM)
converting counts to integer mode

group = subString(sample_anno[colnames(countM), "title"], -1, " ", rev = TRUE, collapse = " ")
table(group)
group
Idiopathic membranous nephropathy            Minimal change disease 12                                14 Normal control 8 
SID_MN = colnames(expM)[group == "Idiopathic membranous nephropathy"]
SID_Control = colnames(expM)[group == "Normal control"]
expM_GSE216841 = expM[, c(SID_MN, SID_Control)]
group_GSE216841 = rep_by_len(c("MN", "Control"), list(SID_MN, SID_Control))
expM_list = list2(expM_GSE108109, expM_GSE216841)
names(expM_list) = subString(names(expM_list), 2, "_")
names(expM_list)
  1. 'GSE108109'
  2. 'GSE216841'
group_merge = Reduce(c, sapply(paste0("group_", names(expM_list)), get))
table(group_merge)
group_merge
Control      MN 14      56 
set_image(9, 10)
obj = run_ComBat(expM_list, group_merge, out_name = "./test/run_ComBat/01_Combat")
Warning message in MASS::cov.trob(data[, vars]):
"Probable convergence failure"
Warning message in MASS::cov.trob(data[, vars]):
"Probable convergence failure"
Warning message in MASS::cov.trob(data[, vars]):
"Probable convergence failure"

# 提取结果
expM = obj$expM
group = obj$group
boxplot(expM)

table(group)
group
Control      MN 14      56 

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。
如若转载,请注明出处:http://www.pswp.cn/pingmian/91592.shtml
繁体地址,请注明出处:http://hk.pswp.cn/pingmian/91592.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

LLM 典型模型技术特性及项目落地全流程实践

在大语言模型(LLM)技术快速迭代的当下,开发者面临的核心挑战已从 “是否使用” 转变为 “如何正确选型并高效落地”。本文将系统剖析当前主流 LLM 的技术特性,结合实际项目架构,提供从模型选型、接口集成到性能优化的全流程技术方案,并附关键代码实现,为工业级 LLM 应用…

机器学习消融实验:方法论演进、跨领域应用与前沿趋势

一、定义与起源 消融实验&#xff08;Ablation Study&#xff09;是一种系统性移除或修改模型关键组件以评估其对整体性能贡献的实验方法论。其术语源于神经科学和实验心理学&#xff08;20世纪60-70年代&#xff09;&#xff0c;指通过切除动物脑区研究行为变化的实验范式。2…

北京-4年功能测试2年空窗-报培训班学测开-今天来聊聊我的痛苦

最近状态很不对劲&#xff0c;因为我很少花时间好好思考&#xff0c;只是处于执行状态&#xff0c;甚至也不太写笔记了&#xff0c;我原以为这样会更高效&#xff0c;现在想想&#xff0c;开始不愿花时间深思才是断弦的开始吧而且从结课后我有了隐瞒&#xff0c;我不想过多透露…

深度解析 | AI 幻觉的形成和应对路径

写这一篇的缘由一是因为我也在摸索如何降低 AI 幻觉提升 AI 工具使用效率&#xff0c;二是因为前两周在MIT学习时老师讲的一节课&#xff0c;刚好也解释了这个问题&#xff0c;所以一并做个总结&#xff0c;分享给大家。 近几年&#xff0c;大型语言模型&#xff08;LLM&#…

Java把word转HTML格式

Java把word转HTML格式&#xff0c;两种方式方式一&#xff1a;maven引入依赖,pom.xml<dependency><groupId>e-iceblue</groupId><artifactId>spire.office.free</artifactId><version>5.3.1</version> </dependency>然后代码读…

#C语言——学习攻略:探索字符函数和字符串函数(一)--字符分类函数,字符转换函数,strlen,strcpy,strcat函数的使用和模拟实现

&#x1f31f;菜鸟主页&#xff1a;晨非辰的主页 &#x1f440;学习专栏&#xff1a;《C语言学习》 &#x1f4aa;学习阶段&#xff1a;C语言方向初学者 ⏳名言欣赏&#xff1a;"编程的本质是理解问题&#xff0c;然后把它分解成可执行的步骤。" 目录 1. 字符分类函…

(吃饭)质数时间

题目描述如果把一年之中的某个时间写作 a 月 b 日 c 时 d 分 e 秒的形式&#xff0c;当这五个数都为质数时&#xff0c;我们把这样的时间叫做质数时间&#xff0c;现已知起始时刻是 2022 年的 a 月 b 日 c 时 d 分 e 秒&#xff0c;终止时刻是 2022 年的 u 月 v 日 w 时 x 分 y…

【RK3568 RTC 驱动开发详解】

RK3568 RTC 驱动开发详解一、Linux RTC 子系统架构​二、设备树配置​三、驱动四、时间相关命令实时时钟&#xff08;RTC&#xff09;是嵌入式系统中不可或缺的硬件模块&#xff0c;负责在系统断电后继续计时&#xff0c;为设备提供稳定的时间基准。本文将以瑞芯微 RK3568 平台…

文本编码检测库`chardet` 和 `uchardet`对比使用示例及注意事项

在处理未知编码的二进制数据时&#xff0c;chardet 和 uchardet 是两个非常实用的字符编码自动检测库&#xff0c;尤其适用于从卫星通信、文件、网络流等来源获取的未标明编码的文本数据。一、chardet&#xff08;Python版&#xff09; ✅ 简介 chardet 是一个用 Python 编写的…

[Windows]Postman-app官方历史版本下载方法

Postman-app官方历史版本下载方法最新版&历史版本官网地址最新版本下载历史版本下载禁止自动更新方法Postman最新版安装后必须要登录才能使用某些特定功能&#xff0c;多有不便&#xff0c;因此花了点时间整理了一下历史版本如何下载的方法&#xff0c;链接均为官网链接&am…

【Spring Boot 快速入门】三、分层解耦

目录分层解耦案例&#xff1a;将 emp.xml 中的数据解析并响应三层架构分层解耦IOC & DI 入门IOC 详解DI 详解分层解耦 案例&#xff1a;将 emp.xml 中的数据解析并响应 emp.xml 内容如下&#xff1a; <emps><emp><name>Tom</name><age>18…

井云科技2D交互数字人:让智能服务触手可及的实用方案

在如今的数字化时代&#xff0c;智能交互已成为各行业提升服务质量的重要方向。而井云 2D 交互数字人系统凭借其独特的技术优势&#xff0c;正逐渐成为众多企业实现智能服务升级的优选。它无需复杂的操作和高昂的成本&#xff0c;就能让数字人在各类线下场景中发挥重要作用&…

本地部署VMware ESXi,并实现无公网IP远程访问管理服务器

ESXi&#xff08;VMware ESXi&#xff09;是VMware公司推出的一款企业级虚拟化平台&#xff0c;基于裸机&#xff08;bare-metal&#xff09;安装的虚拟化操作系统。它可以在一台物理服务器上运行多个虚拟机&#xff0c;广泛应用于数据中心和云计算环境中。很多公司为了方便管理…

让科技之光,温暖银龄岁月——智绅科技“智慧养老进社区”星城国际站温情纪实

七月的风&#xff0c;带着夏日的热情&#xff0c;轻轻拂过邯郸星城国际社区葱郁的绿意。2025年7月30日&#xff0c;一个以“幸福晚景&#xff0c;乐享银龄—智慧养老进社区”为主题的活动&#xff0c;如一股暖流&#xff0c;浸润了社区的长者们。智绅科技怀揣着“科技赋能养老&…

Java单元测试和设计模式

单元测试 . 测试分类 什么是测试? 测试的目的是尽可能多的发现软件中存在的BUG,而不是为了隐藏BUG。事实上测试有很多种类,比如:边界测试,压力测试,性能测试等 黑盒测试 黑盒测试也叫功能测试,主要关注软件每个功能是否实现,并不关注软件代码是否有错误;测试人员…

UOS统信桌面系统解决编译错误:C compiler cc is not found指南

一、系统环境 1.操作系统版本2.编译环境 PC:~$ gcc --version gcc (Uos 8.3.0.13-deepin1) 8.3.0 Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY o…

深入理解 Docker 容器网络:为什么用 host 网络模式能解决连通性问题?

Docker 已经成为现代应用部署的标配&#xff0c;大家都知道它的网络隔离做得很好&#xff0c;既安全又灵活。不过&#xff0c;在实际用 Docker 部署服务的过程中&#xff0c;相信很多人都遇到过这样的情况&#xff1a;主机上能连通的外部服务&#xff0c;一到容器里却死活连不上…

Spring Boot 异常处理:从全局捕获到优化用户体验!

全文目录&#xff1a;开篇语**前言****1. Spring Boot 异常处理的基本概念****2. 使用 ExceptionHandler 局部处理异常****示例&#xff1a;局部异常处理****优化建议&#xff1a;****3. 使用 ControllerAdvice 和 RestControllerAdvice 进行全局异常处理****示例&#xff1a;全…

vue3.0 + TypeScript 中使用 axios 同时进行二次封装

项目背景是vite搭建的vue3.0 TypeScript 的项目&#xff0c;需要统一处理和统一维护就对axios进行了二次封装 axios的安装 npm install axios定义http文件夹然后内部定义index.ts文件&#xff0c;内部开始封装 import axios, {type AxiosInstance} from "axios";…

ESP32- 项目应用1 音乐播放器之sd的驱动配置 #1

音乐播放器 ESP32- 项目应用1 音乐播放器之sd的驱动配置 #1 文章目录 音乐播放器 1 sd卡介绍 1.1 SDCARD介绍 1.2 物理结构 1.3 协议说明 1.4 sd 卡模式 1.5 数据模式 1.6 sdio 初始化流程 1.7 SPI 模式下的 SD 卡初始化 2 原理图 2.1 sd原理图 2.2 esp32的接口 3 代码配置 3.…