Why are there so many genes at the top of my volcano plot?

Learn why volcano plots for single cell RNA-seq experiments might have a pile-up of genes at the top of the plot

If you’ve ever wondered why certain genes appear clustered at the top of your volcano plot, with a y-value of 310, you’re not alone! This happens when genes have adjusted p-values so small that they can no longer be represented numerically, so they are all plotted at the same maximum value. Let’s break down what’s going on.

Understanding the volcano plot

Volcano plots are a commonly used tool in differential expression analysis to visually represent the results of your single cell RNA-seq experiment. On the plot:

  • The y-axis typically represents the -log10 of the adjusted p-value, a measure of statistical significance.
  • The x-axis shows the fold-change in gene expression between conditions, usually on a log2 scale.

The purpose of the volcano plot is to highlight genes that show significant changes in expression, allowing you to quickly identify potential biomarkers or interesting candidates for further investigation.
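As a rough illustration of the two axes described above, here is a minimal Python sketch that turns a fold-change and an adjusted p-value into volcano plot coordinates. The gene names, the numbers, and the `volcano_point` helper are made up for this example, and the log2 scale on the x-axis is an assumption about a typical setup, not a description of any specific tool.

```python
import math

# Hypothetical differential expression results:
# (gene, fold-change, adjusted p-value). All values are invented.
results = [
    ("GeneA", 4.0, 1e-50),
    ("GeneB", 0.5, 1e-3),
    ("GeneC", 1.0, 0.8),
]

def volcano_point(fold_change, p_adj):
    """Return (x, y) = (log2 fold-change, -log10 adjusted p-value)."""
    x = math.log2(fold_change)   # assumed log2 scale for the x-axis
    y = -math.log10(p_adj)       # smaller p-values give larger y-values
    return x, y

points = {gene: volcano_point(fc, p) for gene, fc, p in results}
for gene, (x, y) in points.items():
    print(f"{gene}: x = {x:+.2f}, y = {y:.2f}")
```

Genes with a large change and a small adjusted p-value (like the hypothetical GeneA) land far from the center and high on the plot, which is what makes them stand out.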

Why do I have genes piled up at the top of my plot?

When a gene’s adjusted p-value is extremely small (close to zero), it can’t be plotted at its true position. This is because of how computers store numbers: there’s a limit to how small a value can be before it is stored as exactly zero.

As a result, every gene whose adjusted p-value has become zero is drawn at the same height. This causes a “pile-up” of genes at the very top of the plot, all appearing with the same high significance.
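The rounding-to-zero behavior can be reproduced in a few lines. The sketch below uses Python rather than R, on the assumption that both store ordinary numbers as 64-bit double-precision values; the variable names are illustrative only.

```python
import math
import sys

# Smallest positive "normal" double-precision number.
# R reports the same value as .Machine$double.xmin.
smallest_normal = sys.float_info.min
print(smallest_normal)  # 2.2250738585072014e-308

# A value far below the representable range underflows to exactly zero.
tiny = 1e-200 * 1e-200  # mathematically 1e-400
print(tiny == 0.0)      # True

# The log of zero is not a finite number, so it has no position on a y-axis.
# Python raises an error here; R returns Inf for -log10(0).
try:
    math.log10(tiny)
except ValueError as err:
    print("log10(0) failed:", err)
```

Once a value has underflowed, its original magnitude is lost, so two genes with very different (but both extremely small) adjusted p-values become indistinguishable.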

Why do my genes max out on the y-axis at 310?

The y-axis of a volcano plot typically uses the -log10 transformation of the adjusted p-value, so smaller adjusted p-values give larger y-values. However, there is a practical limit: R, the programming language used to run the differential expression analysis, stores numbers in double precision, where the smallest normal positive value is about 2.225074e-308. Adjusted p-values below this limit lose precision and soon underflow to exactly zero, so their true value cannot be represented. Because the -log10 of zero is infinite, it cannot be drawn on the axis, and these values are capped for visualization.

Genes that fall into this category are so highly significant that their adjusted p-values are effectively zero. So that they still appear on the plot, their -log10 values are capped at 310. This is why you might see a cluster of genes appearing at the top of your volcano plot, all with a y-value of 310.
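The capping step described above could look something like the following Python sketch. This is an illustration under assumptions, not the actual implementation behind the plot: the `neg_log10_capped` helper, the `Y_CAP` constant, and the example p-values are hypothetical, and only the cap value of 310 comes from this article.

```python
import math

Y_CAP = 310.0  # cap value described in this article

def neg_log10_capped(p_adj, cap=Y_CAP):
    """Return -log10(p_adj), placing zero (underflowed) p-values at the cap."""
    if p_adj <= 0.0:
        # The true value is unknown; it is only known to be extremely small.
        return cap
    return min(-math.log10(p_adj), cap)

# Invented adjusted p-values; the last two have underflowed to zero.
adjusted_p_values = [0.05, 1e-100, 1e-300, 0.0, 0.0]
y_values = [neg_log10_capped(p) for p in adjusted_p_values]
print(y_values)
```

In this sketch the two zero-valued entries both end up at 310, which is the pile-up seen at the top of the plot, while the other genes keep their own y-values.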

Key takeaways

  • The y=310 value represents genes with extremely small adjusted p-values that are rounded to zero; the -log10 of these adjusted p-values is capped at 310 for plotting purposes.
  • This cap ensures that highly significant genes still appear at the top of the volcano plot, even though their exact significance cannot be shown.
  • The presence of multiple genes at this value simply reflects their very strong significance and is a result of limitations in how small an adjusted p-value can be represented.

We hope this helps clarify why you might see genes piling up at the top of your volcano plot!