Moderated estimation of fold change and dispersion

6/6/2023

The rapid adoption of high-throughput sequencing (HTS) technologies for genomic studies has resulted in a need for statistical methods to assess quantitative differences between experiments. An important task here is the analysis of RNA sequencing (RNA-seq) data with the aim of finding genes that are differentially expressed across groups of samples. This task is general: methods for it are typically also applicable to other comparative HTS assays, including chromatin immunoprecipitation sequencing, chromosome conformation capture, or counting observed taxa in metagenomic studies.

Besides the need to account for the specifics of count data, such as non-normality and a dependence of the variance on the mean, a core challenge is the small number of samples in typical HTS experiments, often as few as two or three replicates per condition. Inferential methods that treat each gene separately suffer here from lack of power, due to the high uncertainty of within-group variance estimates. In high-throughput assays, this limitation can be overcome by pooling information across genes, specifically, by exploiting assumptions about the similarity of the variances of different genes measured in the same experiment.

Many methods for differential expression analysis of RNA-seq data perform such information sharing across genes for variance (or, equivalently, dispersion) estimation. edgeR moderates the dispersion estimate for each gene toward a common estimate across all genes, or toward a local estimate from genes with similar expression strength, using a weighted conditional likelihood. Our DESeq method detects and corrects dispersion estimates that are too low through modeling of the dependence of the dispersion on the average expression strength over all samples. BBSeq models the dispersion on the mean, with the mean absolute deviation of dispersion estimates used to reduce the influence of outliers. DSS uses a Bayesian approach to provide an estimate for the dispersion for individual genes that accounts for the heterogeneity of dispersion values for different genes. baySeq and ShrinkBayes estimate priors for a Bayesian model over all genes, and then provide posterior probabilities or false discovery rates (FDRs) for differential expression.

The most common approach in the comparative analysis of transcriptomics data is to test the null hypothesis that the logarithmic fold change (LFC) between treatment and control for a gene’s expression is exactly zero, i.e., that the gene is not at all affected by the treatment. Often the goal of differential analysis is to produce a list of genes passing multiple-test adjustment, ranked by P value. However, small changes, even if statistically highly significant, might not be the most interesting candidates for further investigation. Ranking by fold change, on the other hand, is complicated by the noisiness of LFC estimates for genes with low counts. Furthermore, the number of genes called significantly differentially expressed depends as much on the sample size and other aspects of experimental design as it does on the biology of the experiment, and well-powered experiments often generate an overwhelmingly long list of hits.

We, therefore, developed a statistical framework to facilitate gene ranking and visualization based on stable estimation of effect sizes (LFCs), as well as testing of differential expression with respect to user-defined thresholds of biological significance. Here we present DESeq2, a successor to our DESeq method. DESeq2 integrates methodological advances with several novel features to facilitate a more quantitative analysis of comparative RNA-seq data using shrinkage estimators for dispersion and fold change.
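The idea of pooling information across genes, described above, can be illustrated with a small simulation. The sketch below is not the DESeq2 procedure: it assumes normally distributed measurements rather than counts, and it shrinks each per-gene variance estimate toward a single common value with a fixed weight of 0.5, where the methods discussed here estimate the amount of shrinkage from the data. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_variance_estimates(n_genes=2000, n_reps=3, seed=0):
    """Simulate per-gene sample variances from only a few replicates.

    Assumption: each gene has a true variance drawn from a narrow range,
    and replicates are normally distributed (a simplification; RNA-seq
    data are counts).
    """
    rng = random.Random(seed)
    true_vars, raw_vars = [], []
    for _ in range(n_genes):
        true_var = rng.uniform(0.8, 1.2)
        sd = true_var ** 0.5
        values = [rng.gauss(0.0, sd) for _ in range(n_reps)]
        true_vars.append(true_var)
        raw_vars.append(statistics.variance(values))  # unbiased sample variance
    return true_vars, raw_vars


def shrink_toward_common(raw_vars, weight=0.5):
    """Move each per-gene estimate part of the way toward the across-gene mean.

    Assumption: a fixed weight; an empirical Bayes method would estimate it.
    """
    common = statistics.fmean(raw_vars)
    return [weight * v + (1.0 - weight) * common for v in raw_vars]


def mean_squared_error(estimates, truth):
    """Average squared distance between estimates and the true values."""
    return statistics.fmean([(e - t) ** 2 for e, t in zip(estimates, truth)])


if __name__ == "__main__":
    true_vars, raw_vars = simulate_variance_estimates()
    shrunk_vars = shrink_toward_common(raw_vars)
    print("MSE of per-gene estimates:", mean_squared_error(raw_vars, true_vars))
    print("MSE of shrunken estimates:", mean_squared_error(shrunk_vars, true_vars))
```

With three replicates per gene, the per-gene estimates scatter widely around the truth, and borrowing strength from the other genes pulls them closer on average. The benefit relies on the assumption stated in the text that variances of different genes in the same experiment are similar; if they differed greatly, shrinking toward one common value would introduce bias.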
