limma做RNAseq差異分析
- 2020 年 4 月 1 日
- 筆記
limma是一個很強大的用於分析芯片的R包,也可以用於RNA-Seq的差異分析 以兩個組比較為例:首先輸入count表達矩陣,這裡也跟其他差異分析R包一樣,不要輸入已經標準化的數據。 本文主要參考:https://www.bioinfo-scrounger.com/archives/115/
library(limma) library(edge) counts <- read.csv("raw_counts.csv",row.names = 1) #Creates a DGEList dge_counts <- DGEList(counts = counts,remove.zeros = T) #Calculate normalization factors to scale the raw library sizes. #用TMM進行標準化 tmm_counts <- calcNormFactors(dge_counts) #count進行標準化以及轉化為log2的值 logCPM_counts <- cpm(tmm_counts , log=TRUE)
製作分組矩陣
#設置分組信息 group_list <- factor(c(rep("control",2), rep("treat",2))) design <- model.matrix(~group_list) colnames(design) <- levels(group_list) rownames(design) <- colnames(counts)
為了避免文庫大小在樣本間變化的影響,可以使用limma的voom方法進行處理
y = voom(logCPM_counts, design, plot = T)
對voom的描述
Transform count data to log2-counts per million (logCPM), estimate the mean-variance relationship and use this to compute appropriate observation-level weights. The data are then ready for linear modelling.
voom()作用是原始counts轉換為logCPM值,將所有計數加0.5,以避免取對數零。然後,將logCPM值矩陣進行標準化。在運行voom()之前,應對counts矩陣進行過濾以除去counts非常低的基因。為此,可以使用edgeR包中的filterByExpr函數。

image.png
因為我已經通過edgeR的TMM標準化,所以效果還行,不需要處理,如果效果不好可能由於數據沒有過濾好,如下圖:

image.png
對voom圖的解釋可以參考這裡: https://stats.stackexchange.com/questions/160255/voom-mean-variance-trend-plot-how-to-interpret-the-plot
差異分析:
#不需要voom轉化時: fit <- lmFit(logCPM_counts, design) fit <- eBayes(fit) DE_genes <- topTable(fit, coef = 2,p.value = 0.5, lfc = log2(1.5), number = Inf,sort.by="logFC")
#不進行TMM轉化,即不運行calcNormFactors(),直接進行voom轉化 y = voom(counts, design, plot = T) fit <- lmFit(y, design) fit <- eBayes(fit) DE_genes <- topTable(fit, coef = 2,p.value = 0.05, lfc = log2(1.5), number = Inf,sort.by="logFC")
歡迎關注~
參考: https://www.bioinfo-scrounger.com/archives/115/ https://www.jianshu.com/p/616de0ee881a https://mp.weixin.qq.com/s?__biz=MzI4NjMxOTA3OA==&mid=2247483987&idx=1&sn=aa2ca81e7fe128edaaedc47479c517c9&chksm=ebdf8adadca803cc31261a1ccabf8a6bdb835ce12b670cc01969fcfde51cfdb991d0619e7695&scene=21#wechat_redirect