腸型分析學習筆記

2020 年 3 月 3 日
筆記

腸型，Enterotype，是2011年在這篇文章中提出的，即將過去的2018年又有20多們腸道微生物的大佬對腸型的概念進行了回顧和確認。一直比較好奇怎樣來用程式碼分析腸型，今天找到了這個教程，放在這：

這是那篇原始的文章：Arumugam, M., Raes, J., et al. (2011) Enterotypes of the human gut microbiome, Nature,doi://10.1038/nature09944 在Google上一搜，作者竟然做了個分析腸型的教程在這，學習一下：http://enterotyping.embl.de/enterotypes.html 這是2018年大佬們的共識文章：這是國人翻譯的這篇文章，http://blog.sciencenet.cn/blog-3334560-1096828.html 當然，如果你只需要獲得自己的結果或者自己課題的結果，不需要跑程式碼的，有最新的網頁版分型，更好用，網址也放在這，同樣也是上面翻譯的那篇文章里提到的網址：http://enterotypes.org/ 只需要把菌屬的含量比例文件上就能很快得到結果。

下面我就邊學習邊做來嘗試著來個分析，並把程式碼放在這裡備忘。其實作者已經整理好了程式碼，我學習一下，爭取實現對手上的數據進行分析。

首先下載測試數據，

wget http://enterotyping.embl.de/MetaHIT_SangerSamples.genus.txt  wget http://enterotyping.embl.de/enterotypes_tutorial.sanger.R

跑跑示例數據，排排錯

我表示對R語言還只是一知半解的狀態，所以，先跑下，然後能用上自己的數據，當個工具用就暫知足啦。我是黑蘋果10.11的系統，運行這個軟體提示少了Xquartz，於是裝了個，windows和linux應該不需要。原程式碼中還提示『沒有"s.class"這個函數』，百度了一下發現有個老兄的新浪部落格說了是這個包，於是加了句library(ade4)就ok了。 Xquartz的下載地址Mac 10.6+：https://dl.bintray.com/xquartz/downloads/XQuartz-2.7.11.dmg

#Uncomment next two lines if R packages are already installed  #install.packages("cluster")  #install.packages("clusterSim")  library(cluster)  library(clusterSim)  #BiocManager::install("genefilter")  library(ade4)#Download the example data and set the working directory  #setwd('<path_to_working_directory>')  data=read.table("../MetaHIT_SangerSamples.genus.txt", header=T, row.names=1, dec=".", sep="t")  data=data[-1,]dist.JSD <- function(inMatrix, pseudocount=0.000001, ...) {   KLD <- function(x,y) sum(x *log(x/y))   JSD<- function(x,y) sqrt(0.5 * KLD(x, (x+y)/2) + 0.5 * KLD(y, (x+y)/2))   matrixColSize <- length(colnames(inMatrix))   matrixRowSize <- length(rownames(inMatrix))   colnames <- colnames(inMatrix)   resultsMatrix <- matrix(0, matrixColSize, matrixColSize) inMatrix = apply(inMatrix,1:2,function(x) ifelse (x==0,pseudocount,x)) for(i in 1:matrixColSize) {     for(j in 1:matrixColSize) {       resultsMatrix[i,j]=JSD(as.vector(inMatrix[,i]),                              as.vector(inMatrix[,j]))     }   }   colnames -> colnames(resultsMatrix) -> rownames(resultsMatrix)   as.dist(resultsMatrix)->resultsMatrix   attr(resultsMatrix, "method") <- "dist"   return(resultsMatrix)  }data.dist=dist.JSD(data)pam.clustering=function(x,k) { # x is a distance matrix and k the number of clusters   require(cluster)   cluster = as.vector(pam(as.dist(x), k, diss=TRUE)$clustering)   return(cluster)  }data.cluster=pam.clustering(data.dist, k=3)require(clusterSim)  nclusters = index.G1(t(data), data.cluster, d = data.dist, centrotypes = "medoids")nclusters=NULLfor (k in 1:20) {   if (k==1) {     nclusters[k]=NA   } else {     data.cluster_temp=pam.clustering(data.dist, k)     nclusters[k]=index.G1(t(data),data.cluster_temp,  d = data.dist,                           centrotypes = "medoids")   }  }plot(nclusters, type="h", xlab="k clusters", ylab="CH index",main="Optimal number of clusters")obs.silhouette=mean(silhouette(data.cluster, data.dist)[,3])  cat(obs.silhouette) #0.1899451#data=noise.removal(data, percent=0.01)## plot 1  obs.pca=dudi.pca(data.frame(t(data)), scannf=F, nf=10)  obs.bet=bca(obs.pca, fac=as.factor(data.cluster), scannf=F, nf=k-1)  dev.new()  s.class(obs.bet$ls, fac=as.factor(data.cluster), grid=F,sub="Between-class analysis", col=c(3,2,4))#plot 2  obs.pcoa=dudi.pco(data.dist, scannf=F, nf=3)  dev.new()  s.class(obs.pcoa$li, fac=as.factor(data.cluster), grid=F,sub="Principal coordiante analysis", col=c(3,2,4))

上圖，稍微調整下

, col=c(3,2,4)這個程式碼是給三個聚類上不同的顏色，還沒搞清楚怎麼給畫的圈上色來實現理好看的效果，相信對於熟悉R語言的同學是小菜一碟。, cell=0, cstar=0是不顯示圈和邊線，只顯示散點。不加這兩個參數，只用上面的程式碼，圖如下：

加上兩個參數的圖片，就和教程里的最後面的兩張圖一樣：

元旦要到了，提前祝大家元旦快樂！好好學習，天天給力！