Seurat新版教程:Guided Clustering Tutorial-（下）

2020 年 3 月 30 日
筆記

回顧

Seurat新版教程:Guided Clustering Tutorial-（上）

好了，最重要的一步來了，聚類分析。Seurat採用的是graph-based聚類方法，k-means方法在V3中已經不存在了。

聚類

# Cluster the cells  #Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!  pbmc <- FindNeighbors(pbmc, dims = 1:10)  pbmc <- FindClusters(pbmc, resolution = 0.5)  Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck    Number of nodes: 2638  Number of edges: 96033    Running Louvain algorithm...  0%   10   20   30   40   50   60   70   80   90   100%  [----|----|----|----|----|----|----|----|----|----|  **************************************************|  Maximum modularity in 10 random starts: 0.8720  Number of communities: 9  Elapsed time: 0 seconds    > table([email protected]) # 查看每一類有多少個細胞      0   1   2   3   4   5   6   7   8  697 483 480 344 271 162 155  32  14   # 提取某一類細胞。  > head(subset(as.data.frame([email protected]),[email protected]=="2"))                   [email protected]  AAACCGTGCTTCCG                 2  AAAGAGACGCGAGA                 2  AAAGCAGATATCGG                 2  AAAGTTTGTAGCGT                 2  AAATGTTGAACGAA                 2  AAATGTTGTGGCAT                 2

提取某一cluster細胞。

> pbmc  An object of class Seurat  13714 features across 2638 samples within 1 assay  Active assay: RNA (13714 features)   1 dimensional reduction calculated: pca  > subpbmc<-subset(x = pbmc,idents="2")  > subpbmc  An object of class Seurat  13714 features across 480 samples within 1 assay  Active assay: RNA (13714 features)   1 dimensional reduction calculated: pca    ?WhichCells  head(WhichCells(pbmc,idents="2"))    [1] "AAACCGTGCTTCCG" "AAAGAGACGCGAGA" "AAAGCAGATATCGG" "AAAGTTTGTAGCGT" "AAATGTTGAACGAA" "AAATGTTGTGGCAT"  # Look at cluster IDs of the first 5 cells  head(Idents(pbmc), 5)    AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG               1              3              1              2              6  Levels: 0 1 2 3 4 5 6 7 8    #提取部分細胞  > pbmc  An object of class Seurat  32738 features across 2700 samples within 1 assay  Active assay: RNA (32738 features)  > head(colnames(pbmc@assays$RNA@counts)[1:30])  [1] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" "AAACCGTGTATGCG" "AAACGCACTGGTAC"  > subbset<-subset(x=pbmc,cells=colnames(pbmc@assays$RNA@counts)[1:30])  > subbset  An object of class Seurat  32738 features across 30 samples within 1 assay  Active assay: RNA (32738 features)

當然，我們用的基本都是默認參數，建議？FindClusters一下，看看具體的參數設置，比如雖然是圖聚類，但是卻有不同的算法，這個要看相應的文獻了。

Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.

系統發育分析（Phylogenetic Analysis of Identity Classes）

#Constructs a phylogenetic tree relating the 'average' cell from each identity class.  # Tree is estimated based on a distance matrix constructed in either gene expression space or PCA spac    ?BuildClusterTree  pbmc<-BuildClusterTree(pbmc)  Tool(object = pbmc, slot = 'BuildClusterTree')      Phylogenetic tree with 9 tips and 8 internal nodes.    Tip labels:      0, 1, 2, 3, 4, 5, ...    Rooted; includes branch lengths.    PlotClusterTree(pbmc)

識別低質量樣本。

#Calculate the Barcode Distribution Inflection  ?CalculateBarcodeInflections  pbmc<-CalculateBarcodeInflections(pbmc)  SubsetByBarcodeInflections(pbmc)

可視化降維

非線性降維——這個目的是為了可視化，而不是特徵提取（PCA），雖然它也可以用來做特徵提取。

UMAP

#Run non-linear dimensional reduction (UMAP/tSNE)  # If you haven't installed UMAP, you can do so via reticulate::py_install(packages =  # 'umap-learn')  pbmc <- RunUMAP(pbmc, dims = 1:10)    UMAP(a=None, angular_rp_forest=False, b=None, init='spectral',     learning_rate=1.0, local_connectivity=1, metric='correlation',     metric_kwds=None, min_dist=0.3, n_components=2, n_epochs=None,     n_neighbors=30, negative_sample_rate=5, random_state=None,     repulsion_strength=1.0, set_op_mix_ratio=1.0, spread=1.0,     target_metric='categorical', target_metric_kwds=None,     target_n_neighbors=-1, target_weight=0.5, transform_queue_size=4.0,     transform_seed=42, verbose=True)  Construct fuzzy simplicial set  Construct embedding      completed  0  /  500 epochs      completed  50  /  500 epochs      completed  100  /  500 epochs      completed  150  /  500 epochs      completed  200  /  500 epochs      completed  250  /  500 epochs      completed  300  /  500 epochs      completed  350  /  500 epochs      completed  400  /  500 epochs      completed  450  /  500 epochs  > head(pbmc@[email protected]) # 提取UMAP坐標值。                      UMAP_1    UMAP_2  AAACATACAACCAC   1.7449461 -2.770269  AAACATTGAGCTAC   4.6293025 12.092374  AAACATTGATCAGC   5.2499804 -2.011440  AAACCGTGCTTCCG -11.9875174 -1.568554  AAACCGTGTATGCG   0.1626114 -8.743275  AAACGCACTGGTAC   2.9192858 -1.487868

t-SNE

pbmc <- RunTSNE(pbmc, dims = 1:10)   head(pbmc@[email protected])                     tSNE_1     tSNE_2  AAACATACAACCAC -11.701490   7.120466  AAACATTGAGCTAC -21.981401 -21.330793  AAACATTGATCAGC  -1.292978  23.822324  AAACCGTGCTTCCG  30.877776  -9.926240  AAACCGTGTATGCG -34.619197   8.773753  AAACGCACTGGTAC  -3.046157  13.013471

比較一下兩個可視化的結果。

# note that you can set `label = TRUE` or use the LabelClusters function to help label  # individual clusters  plot1<-DimPlot(pbmc, reduction = "umap",label = TRUE)+scale_color_npg()  plot2<-DimPlot(pbmc, reduction = "tsne",label = TRUE)+scale_color_npg()  CombinePlots(plots = list(plot1, plot2),legend="bottom")

可以看出，兩者的降維可視化的結構是一致的，UMAP方法更加緊湊——在降維圖上，同一cluster離得更近，不同cluster離得更遠，作為一種後來的算法有一定的優點，但是t-SNE結構也能很好地反映cluster的空間結構。

差異分析

# Finding differentially expressed features (cluster biomarkers)  # find all markers of cluster 1  cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)  head(cluster1.markers, n = 5)                p_val avg_logFC pct.1 pct.2    p_val_adj  IL32 1.894810e-92 0.8373872 0.948 0.464 2.598542e-88  LTB  7.953303e-89 0.8921170 0.981 0.642 1.090716e-84  CD3D 1.655937e-70 0.6436286 0.919 0.431 2.270951e-66  IL7R 3.688893e-68 0.8147082 0.747 0.325 5.058947e-64  LDHB 2.292819e-67 0.6253110 0.950 0.613 3.144372e-63

這是一種one-others的差異分析方法，就是cluster1與其餘的cluster來做比較，當然這個是可以指定的,參數就是ident.2。

# find all markers distinguishing cluster 5 from clusters 0 and 3  cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)  head(cluster5.markers, n = 5)                       p_val avg_logFC pct.1 pct.2     p_val_adj  FCGR3A        7.583625e-209  2.963144 0.975 0.037 1.040018e-204  IFITM3        2.500844e-199  2.698187 0.975 0.046 3.429657e-195  CFD           1.763722e-195  2.362381 0.938 0.037 2.418768e-191  CD68          4.612171e-192  2.087366 0.926 0.036 6.325132e-188  RP11-290F20.3 1.846215e-188  1.886288 0.840 0.016 2.531900e-184

看看輸出結果都是什麼。

?FindMarkers  data.frame with a ranked list of putative markers as rows,  and associated statistics as columns (p-values, ROC score, etc., depending on the test used (test.use)).  The following columns are always present:    avg_logFC: log fold-chage of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group  pct.1: The percentage of cells where the gene is detected in the first group  pct.2: The percentage of cells where the gene is detected in the second group  p_val_adj: Adjusted p-value, based on bonferroni correction using all genes in the dataset

我們還可以輸出一個總表。

# find markers for every cluster compared to all remaining cells, report only the positive ones  > pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)  Calculating cluster 0     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s  Calculating cluster 1     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 06s  Calculating cluster 2     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s  Calculating cluster 3     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 07s  Calculating cluster 4     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s  Calculating cluster 5     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 27s  Calculating cluster 6     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s  Calculating cluster 7     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 24s  Calculating cluster 8     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 19s    > head(pbmc.markers)                p_val avg_logFC pct.1 pct.2     p_val_adj cluster  gene  RPS12 2.008629e-140 0.5029988 1.000 0.991 2.754633e-136       0 RPS12  RPS27 2.624075e-140 0.5020359 0.999 0.992 3.598656e-136       0 RPS27  RPS6  1.280169e-138 0.4673635 1.000 0.995 1.755623e-134       0  RPS6  RPL32 4.358823e-135 0.4242773 0.999 0.995 5.977689e-131       0 RPL32  RPS14 3.618793e-128 0.4283480 1.000 0.994 4.962812e-124       0 RPS14  RPS25 5.612007e-124 0.5203411 0.997 0.975 7.696306e-120       0 RPS25    > pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)  # A tibble: 18 x 7  # Groups:   cluster [9]         p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene         <dbl>     <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>   1 1.96e-107     0.730 0.901 0.594 2.69e-103 0       LDHB   2 1.61e- 82     0.922 0.436 0.11  2.20e- 78 0       CCR7   3 7.95e- 89     0.892 0.981 0.642 1.09e- 84 1       LTB   4 1.85e- 60     0.859 0.422 0.11  2.54e- 56 1       AQP3   5 0.            3.86  0.996 0.215 0.        2       S100A9   6 0.            3.80  0.975 0.121 0.        2       S100A8   7 0.            2.99  0.936 0.041 0.        3       CD79A   8 9.48e-271     2.49  0.622 0.022 1.30e-266 3       TCL1A   9 2.96e-189     2.12  0.985 0.24  4.06e-185 4       CCL5  10 2.57e-158     2.05  0.587 0.059 3.52e-154 4       GZMK  11 3.51e-184     2.30  0.975 0.134 4.82e-180 5       FCGR3A  12 2.03e-125     2.14  1     0.315 2.78e-121 5       LST1  13 7.95e-269     3.35  0.961 0.068 1.09e-264 6       GZMB  14 3.13e-191     3.69  0.961 0.131 4.30e-187 6       GNLY  15 1.48e-220     2.68  0.812 0.011 2.03e-216 7       FCER1A  16 1.67e- 21     1.99  1     0.513 2.28e- 17 7       HLA-DPB1  17 7.73e-200     5.02  1     0.01  1.06e-195 8       PF4  18 3.68e-110     5.94  1     0.024 5.05e-106 8       PPBP

這裡可以注意一下only.pos 參數，可以指定返回positive markers 基因。test.use可以指定檢驗方法，可選擇的有：wilcox，bimod，roc，t，negbinom，poisson，LR，MAST，DESeq2。 cluster間保守conserved marker基因.

#Finds markers that are conserved between the groups   head(FindConservedMarkers(pbmc, ident.1 = 0, ident.2 = 1, grouping.var = "groups"))  Testing group g2: (0) vs (1)     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 04s  Testing group g1: (0) vs (1)     |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 05s              g2_p_val g2_avg_logFC g2_pct.1 g2_pct.2 g2_p_val_adj     g1_p_val g1_avg_logFC g1_pct.1 g1_pct.2  S100A4  6.687228e-33   -0.8410733    0.678    0.959 9.170864e-29 2.583278e-35   -0.9569347    0.687    0.945  MALAT1  2.380860e-16    0.2957331    1.000    1.000 3.265112e-12 7.501490e-20    0.3201783    1.000    1.000  VIM     1.192054e-14   -0.4921326    0.816    0.951 1.634783e-10 1.193930e-19   -0.4945881    0.798    0.945  ANXA2   3.115304e-13   -0.6406856    0.169    0.461 4.272327e-09 2.186333e-18   -0.7695240    0.164    0.504  ANXA1   5.226901e-18   -0.7544607    0.451    0.800 7.168173e-14 1.413468e-17   -0.6660324    0.507    0.803  S100A11 1.832303e-16   -0.6955104    0.288    0.665 2.512820e-12 1.208765e-17   -0.7757913    0.256    0.584          g1_p_val_adj     max_pval minimump_p_val  S100A4  3.542707e-31 6.687228e-33   5.166555e-35  MALAT1  1.028754e-15 2.380860e-16   1.500298e-19  VIM     1.637356e-15 1.192054e-14   2.387860e-19  ANXA2   2.998337e-14 3.115304e-13   4.372665e-18  ANXA1   1.938430e-13 1.413468e-17   1.045380e-17  S100A11 1.657700e-13 1.832303e-16   2.417530e-17

繪製基因小提琴圖，其中右邊的圖使用原始的count取log繪製的，其實趨勢還是蠻一致的。

plot1<-VlnPlot(pbmc, features = c("MS4A1", "CD79A"))+scale_color_npg()  # you can plot raw counts as well  plot2<- VlnPlot(pbmc, features = c("MS4A1", "CD79A"),ncol=1, same.y.lims=T,slot = "counts", log = TRUE)+scale_color_npg()  CombinePlots(plots = list(plot1, plot2))

plot1<-FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",                                 "CD8A"),min.cutoff = 0, max.cutoff = 4)  top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)  plot2<-DoHeatmap(pbmc, features = top10$gene) + NoLegend()+scale_color_npg()  #CombinePlots(plots = list(plot1, plot2))    library(gridExtra)  grid.arrange(plot1,plot2,ncol = 2, nrow = 1)

gridExtra 也是可以用來組合seurat繪製的圖片的，不足為奇，seurat本身用的ggplot2語法。

細胞周期分析

與細胞周期相關的基因。

> cc.genes  $`s.genes`   [1] "MCM5"     "PCNA"     "TYMS"     "FEN1"     "MCM2"     "MCM4"     "RRM1"     "UNG"      "GINS2"    "MCM6"  [11] "CDCA7"    "DTL"      "PRIM1"    "UHRF1"    "MLF1IP"   "HELLS"    "RFC2"     "RPA2"     "NASP"     "RAD51AP1"  [21] "GMNN"     "WDR76"    "SLBP"     "CCNE2"    "UBR7"     "POLD3"    "MSH2"     "ATAD2"    "RAD51"    "RRM2"  [31] "CDC45"    "CDC6"     "EXO1"     "TIPIN"    "DSCC1"    "BLM"      "CASP8AP2" "USP1"     "CLSPN"    "POLA1"  [41] "CHAF1B"   "BRIP1"    "E2F8"    $g2m.genes   [1] "HMGB2"   "CDK1"    "NUSAP1"  "UBE2C"   "BIRC5"   "TPX2"    "TOP2A"   "NDC80"   "CKS2"    "NUF2"    "CKS1B"  [12] "MKI67"   "TMPO"    "CENPF"   "TACC3"   "FAM64A"  "SMC4"    "CCNB2"   "CKAP2L"  "CKAP2"   "AURKB"   "BUB1"  [23] "KIF11"   "ANP32E"  "TUBB4B"  "GTSE1"   "KIF20B"  "HJURP"   "CDCA3"   "HN1"     "CDC20"   "TTK"     "CDC25C"  [34] "KIF2C"   "RANGAP1" "NCAPD2"  "DLGAP5"  "CDCA2"   "CDCA8"   "ECT2"    "KIF23"   "HMMR"    "AURKA"   "PSRC1"  [45] "ANLN"    "LBR"     "CKAP5"   "CENPE"   "CTCF"    "NEK2"    "G2E3"    "GAS2L3"  "CBX5"    "CENPA"

?CellCycleScoring  ## Not run:  # pbmc_small doesn't have any cell-cycle genes  # To run CellCycleScoring, please use a dataset with cell-cycle genes  # An example is available at http://satijalab.org/seurat/cell_cycle_vignette.html  pbmc <- CellCycleScoring(    object = pbmc,    g2m.features = cc.genes$g2m.genes,    s.features = cc.genes$s.genes  )  head(x = [email protected])[,c(7,8,9,10,11)] # 我是為了好看取了幾列來看，你大可不必。                   seurat_clusters groups     S.Score   G2M.Score Phase  AAACATACAACCAC               1     g1  0.10502807 -0.04507596     S  AAACATTGAGCTAC               3     g1 -0.02680010 -0.04986215    G1  AAACATTGATCAGC               1     g2 -0.01387173  0.07792135   G2M  AAACCGTGCTTCCG               2     g2  0.04595348  0.01140204     S  AAACCGTGTATGCG               6     g1 -0.02602771  0.04413069   G2M  AAACGCACTGGTAC               1     g2 -0.03692587 -0.08341568    G1    VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB","G2M.Score","S.Score"), ncol = 6)#+scale_color_npg()

在UMAP空間中繪製細胞周期信息。

umapem<-pbmc@[email protected]  metada= [email protected]  dim(umapem);dim(metada)    metada$bar<-rownames(metada)  umapem$bar<-rownames(umapem)  ccdata<-merge(umapem,metada,by="bar")  head(ccdata)  library(ggplot2)  plot<-ggplot(ccdata, aes(UMAP_1, UMAP_2,label=Phase))+geom_point(aes(colour = factor(Phase)))+    #plot<-plot+scale_colour_manual(values=c("#CC33FF","Peru","#660000","#660099","#990033","black","red", "#666600", "green","#6699CC","#339900","#0000FF","#FFFF00","#808080"))+    labs("@yunlai",x = "", y="")  plot=plot+scale_color_aaas()  +    theme_bw()+theme(panel.grid=element_blank(),legend.title=element_blank(),legend.text = element_text(color="black", size = 10, face = "bold"))  plot<-plot+guides(colour = guide_legend(override.aes = list(size=5))) +theme(plot.title = element_text(hjust = 0.5))    plot

探索

我們可以用SingleR或者celaref來定義每一個細胞的類型，而不只是cluster某某某了。其中SingleR與seurat是無縫銜接的，seurat對象可以讀到這個裏面。這裡先不寫，假定我們已經知道了各個類群：

# singleR  #Assigning cell type identity to clusters  new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono",                       "NK", "DC", "Platelet")  names(new.cluster.ids) <- levels(pbmc)  pbmc <- RenameIdents(pbmc, new.cluster.ids)  plot1<-DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()  plot2<-DimPlot(pbmc, reduction = "tsne", label = TRUE, pt.size = 0.5) + NoLegend()  grid.arrange(plot1,plot2,ncol = 2, nrow = 1)

平均表達譜函數AverageExpression

?AverageExpression    AverageExp<-AverageExpression(pbmc,features=unique(top10$gene))  Finished averaging RNA for cluster Naive CD4 T  Finished averaging RNA for cluster Memory CD4 T  Finished averaging RNA for cluster CD14+ Mono  Finished averaging RNA for cluster B  Finished averaging RNA for cluster CD8 T  Finished averaging RNA for cluster FCGR3A+ Mono  Finished averaging RNA for cluster NK  Finished averaging RNA for cluster DC  Finished averaging RNA for cluster Platelet  > typeof(AverageExp)  [1] "list"  > head(AverageExp$RNA)        Naive CD4 T Memory CD4 T CD14+ Mono          B      CD8 T FCGR3A+ Mono         NK         DC  Platelet  RPS3A   80.173486    61.905672 24.1161656 65.4588054 53.2413307   26.3218430 38.9542301 39.4926725 8.2843982  LDHB    13.697489    13.663320  2.5790368  2.8923463  7.1405019    1.3868494  4.4117033  3.1508971 2.0904628  CCR7     2.984692     1.293789  0.1020109  1.0788038  0.1631841    0.1413904  0.1196927  0.1473077 0.0000000  CD3D    10.724878    11.342980  0.4632153  0.3310177 13.9581981    0.2767605  1.1144081  0.2506665 0.0000000  CD3E     7.320622     7.807142  0.4356602  0.5741410  7.6701063    0.4992164  3.5389591  0.4322447 1.6960081  NOSIP    5.912196     5.196203  1.2965789  0.8373659  2.4063675    2.0503487  2.1314856  1.0916285 0.0804829    library(psych)  library(pheatmap)  coorda<-corr.test(AverageExp$RNA,AverageExp$RNA,method="spearman")  pheatmap(coorda$r)

記得保存數據以便下次使用。

saveRDS(pbmc, file = "D:\Users\Administrator\Desktop\RStudio\single_cell\filtered_gene_bc_matrices\hg19pbmc_tutorial.rds")

富集分析

有了基因列表其實就可以做富集分析了藉助enrichplot等R包可以做出很多好的探索。

require(org.Hs.eg.db)  library(topGO)  library(DOSE)  x=as.list(org.Hs.egALIAS2EG)  geneList<-rep(0,nrow(pbmc))  names(geneList)<-row.names(pbmc)  geneList<-geneList[intersect(names(geneList),names(x))]  newwallgenes=names(geneList)    for (ii in 1:length(geneList)){    names(geneList)[ii]<-x[[names(geneList)[ii]]][1]    }      gene_erichment_results=list()  for (c1 in as.character(unique(levels(pbmc.markers$cluster)))){    print(paste0("RUN ", c1))    testgenes<-subset(pbmc.markers,cluster==c1)$gene    gene_erichment_results[[c1]]=list()    testgeneList=geneList    testgeneList[which(newwallgenes %in% testgenes)]= 1    #gene_erichment_results=list()    tab1=c()    for(ont in c("BP","MF","CC")){      sampleGOdata<-suppressMessages(new("topGOdata",description="Simple session",ontology=ont,allGenes=as.factor(testgeneList),                                         nodeSize=10,annot=annFUN.org,mapping="org.Hs.eg.db",ID="entrez"))      resultTopGO.elim<-suppressMessages(runTest(sampleGOdata,algorithm="elim",statistic="Fisher"))        resultTopGO.classic<-suppressMessages(runTest(sampleGOdata,algorithm="classic",statistic="Fisher"))      tab1<-rbind(tab1,GenTable(sampleGOdata,Fisher.elim=resultTopGO.elim,Fisher.classic=resultTopGO.classic,orderBy="Fisher.elim",                                topNodes=200))    }    gene_erichment_results[[c1]][["topGO"]]=tab1    x<-suppressMessages(enrichDO(gene=names(testgeneList)[testgeneList==1],ont="DO",pvalueCutoff=1,pAdjustMethod="BH",universe=names(testgeneList),                                  minGSSize=5,maxGSSize=500,readable=T))    gene_erichment_results[[c1]][["DO"]]=x    dgn<-suppressMessages(enrichDGN(names(testgeneList)[testgeneList==1]))    gene_erichment_results[[c1]][["DGN"]]=dgn  }      gene_erichment_results[["8"]][["topGO"]][1:5,]    gene_erichment_results[["1"]][["topGO"]][1:5,]         GO.ID                                        Term Annotated Significant Expected Rank in Fisher.classic Fisher.elim Fisher.classic  1 GO:0019221         cytokine-mediated signaling pathway       516          22     4.15                      3     4.5e-05        1.4e-10  2 GO:0045059            positive thymic T cell selection        11           3     0.09                     55     7.9e-05        8.7e-05  3 GO:0050850 positive regulation of calcium-mediated ...        30           4     0.24                     61     9.1e-05        0.00010  4 GO:0033209 tumor necrosis factor-mediated signaling...        98           6     0.79                     70     0.00013        0.00016  5 GO:0002250                    adaptive immune response       301          11     2.42                     45     0.00040        3.8e-05

可視化

library(enrichplot)  dotplot(gene_erichment_results[[1]][["DGN"]], showCategory=30)

## categorySize can be scaled by 'pvalue' or 'geneNum'  p1<-cnetplot(gene_erichment_results[[1]][["DGN"]], categorySize="pvalue", foldChange=geneList)  p2<-cnetplot(gene_erichment_results[[1]][["DGN"]], foldChange=geneList, circular = TRUE, colorEdge = TRUE)    plot_grid(p1, p2, ncol=2)