Title: | Mitigating the Adverse Impact of Batch Effects in Sample Pattern Detection |
---|---|
Description: | Modifies the distance matrix obtained from data with batch effects, so as to improve the performance of sample pattern detection, such as clustering, dimension reduction, and construction of networks between subjects. The method has been published in Bioinformatics (Fei et al, 2018, <doi:10.1093/bioinformatics/bty117>). Also available on 'GitHub' <https://github.com/tengfei-emory/QuantNorm>. |
Authors: | Teng Fei, Tianwei Yu |
Maintainer: | Teng Fei <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.5 |
Built: | 2025-01-25 05:04:57 UTC |
Source: | https://github.com/tengfei-emory/quantnorm |
Brain RNA-Seq data for both human and mouse.
data(brain)
data(brain)
Large matrix with 15041 rows and 62 columns
Zhang, Ye, et al. "Purification and characterization of progenitor and mature human astrocytes reveals transcriptional and functional differences with mouse." Neuron 89.1 (2016): 37-53.
For data with known labels, this function constructs a connection matrix between unique labels, such as unique cell types. The returned matrix can be used for subject-wise network construction.
connection.matrix(mat, label, threshold = 0.15, closest = TRUE)
connection.matrix(mat, label, threshold = 0.15, closest = TRUE)
mat |
n*n dissimilarity (1-correlation) matrix (e.g. obtained by QuantNorm). |
label |
n-dimension vector for the labels of the n subjects. Replicates share the same label. |
threshold |
A number between 0 to 1. Two groups will be regarded as connected if average 1-correlation < threshold. |
closest |
True or False. Whether connect the closest group or not if the closest group cannot satisfy the threshold condition. |
Returns the connection matrix between unique labels.
Teng Fei. Email: [email protected]
Fei et al (2018), Mitigating the adverse impact of batch effects in sample pattern detection, Bioinformatics, https://doi.org/10.1093/bioinformatics/bty117.
library(network); library(ggplot2); library(sna); library(GGally) #drawing network graph data("ENCODE") #Assigning the batches based on species batches <- c(rep(1,13),rep(2,13)) #QuantNorm correction corrected.distance.matrix <- QuantNorm(ENCODE,batches,method='row/column', cor_method='pearson', logdat=FALSE,standardize = TRUE, tol=1e-4) #Constructing connection matrix mat <- connection.matrix(mat=corrected.distance.matrix,label=colnames(corrected.distance.matrix)) #Creating network object and plot ENCODE.net=network(mat, directed=FALSE) ENCODE.net %v% "Species" <- c(rep('Human',13),rep('Mouse',13)) p0 <- ggnet2(ENCODE.net,label=TRUE,color = 'Species', palette = "Set2", size = 3, vjust = -0.6,mode = "kamadakawai",label.size = 3, color.legend = 'Species')+theme(legend.position = 'bottom') plot(p0)
library(network); library(ggplot2); library(sna); library(GGally) #drawing network graph data("ENCODE") #Assigning the batches based on species batches <- c(rep(1,13),rep(2,13)) #QuantNorm correction corrected.distance.matrix <- QuantNorm(ENCODE,batches,method='row/column', cor_method='pearson', logdat=FALSE,standardize = TRUE, tol=1e-4) #Constructing connection matrix mat <- connection.matrix(mat=corrected.distance.matrix,label=colnames(corrected.distance.matrix)) #Creating network object and plot ENCODE.net=network(mat, directed=FALSE) ENCODE.net %v% "Species" <- c(rep('Human',13),rep('Mouse',13)) p0 <- ggnet2(ENCODE.net,label=TRUE,color = 'Species', palette = "Set2", size = 3, vjust = -0.6,mode = "kamadakawai",label.size = 3, color.legend = 'Species')+theme(legend.position = 'bottom') plot(p0)
Normalized ENCODE raw counts data for both human and mouse.
data(ENCODE)
data(ENCODE)
Large matrix with 10309 rows and 26 columns
Reproduced according to Gilad, Yoav, and Orna Mizrahi-Man. "A reanalysis of mouse ENCODE comparative gene expression data." F1000Research 4 (2015).
This function applies quantile normalization on the distance matrix (dissimilarity matrix) and return the corrected distance matrix.
QuantNorm(dat, batch, method = "row/column", cor_method = "spearman", tol = 0.01, max = 50, logdat = TRUE, standardize = FALSE)
QuantNorm(dat, batch, method = "row/column", cor_method = "spearman", tol = 0.01, max = 50, logdat = TRUE, standardize = FALSE)
dat |
The original p*n batch effect data with n subjects and p RNA-seq measurements or the n by n distance matrix. |
batch |
The vector of length n indicating which batch the subjects belong to. |
method |
Method for the quantile normalization. There are two options: "row/column" and "vectorize". |
cor_method |
Method to calculate the correlation matrix, can be 'spearman'(default), 'pearson' or 'kendall'. |
tol |
The tolerance for the iterative method "row/column", which is the Euclidean distance of the vectorized two dissimilarity matrices before and after each iteration. |
max |
Maximum number of the iteration if the tolerance is not reached. |
logdat |
Whether conducting log transformation to data or not. |
standardize |
Whether conducting standardization [(dat - mean)/sqrt(var)] to data or not. |
Returns the corrected 1-correlation matrix between subjects.
Teng Fei. Email: [email protected]
Fei et al (2018), Mitigating the adverse impact of batch effects in sample pattern detection, Bioinformatics, <https://doi.org/10.1093/bioinformatics/bty117>.
library(pheatmap) #drawing heatmap data("ENCODE") #load the ENCODE data #Before correction, the subjects are clustered by species pheatmap(cor(ENCODE)) #Assigning the batches based on species batches <- c(rep(1,13),rep(2,13)) #QuantNorm correction corrected.distance.matrix <- QuantNorm(ENCODE,batches,method='row/column', cor_method='pearson', logdat=FALSE, standardize = TRUE, tol=1e-4) pheatmap(1-corrected.distance.matrix)
library(pheatmap) #drawing heatmap data("ENCODE") #load the ENCODE data #Before correction, the subjects are clustered by species pheatmap(cor(ENCODE)) #Assigning the batches based on species batches <- c(rep(1,13),rep(2,13)) #QuantNorm correction corrected.distance.matrix <- QuantNorm(ENCODE,batches,method='row/column', cor_method='pearson', logdat=FALSE, standardize = TRUE, tol=1e-4) pheatmap(1-corrected.distance.matrix)