Which peak calling algorithm is most appropriate for your genomic dataset?
Written by Joelle Lo
One of the challenges in analyzing genomic sequencing data is peak calling, which identifies regions of the genome that are enriched for bound proteins of interest, such as transcription factors. Peak calling algorithms, such as MACS2 and SEACR, are used to identify these binding sites.
What is MACS2?
Model-based Analysis of ChIP-seq (MACS; MACS2 is its second major version) is an algorithm that considers genome complexity when evaluating the significance of enriched ChIP regions. MACS can detect enrichment at transcription factor binding sites as well as across larger regions of interest. It can be run on a treated sample alone or alongside a control sample, and it removes redundant reads, which increases confidence in the peak calls.
MACS2 is considered the default peak caller for ChIP-seq and CUT&RUN datasets. Because its model was designed for deeply sequenced data, however, it may call more peaks as it attempts to distinguish signal from noise.
On the Pluto platform, you can choose between Broad and Narrow for your MACS2 peak calling when setting up a new analysis with the experiment wizard. Broad peak analysis works best for identifying regions enriched for histone modifications. Narrow peak analysis works best for identifying transcription factor binding sites.
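For readers who also run MACS2 themselves, the Broad versus Narrow choice roughly corresponds to whether the `--broad` flag is passed to `macs2 callpeak`. The sketch below only builds the command; it is not how Pluto runs MACS2 internally, and the helper name `macs2_callpeak_cmd`, the file names, and the sample names are hypothetical.

```python
from typing import List, Optional


def macs2_callpeak_cmd(
    treatment: str,
    name: str,
    control: Optional[str] = None,
    broad: bool = False,
    genome_size: str = "hs",   # MACS2 shortcut for the human genome
    paired_end: bool = True,
) -> List[str]:
    """Build (but do not run) a `macs2 callpeak` argument list."""
    cmd = [
        "macs2", "callpeak",
        "-t", treatment,                        # treated (ChIP / CUT&RUN) alignments
        "-n", name,                             # prefix for output files
        "-g", genome_size,                      # effective genome size
        "-f", "BAMPE" if paired_end else "BAM",
    ]
    if control is not None:
        cmd += ["-c", control]                  # optional control sample
    if broad:
        cmd.append("--broad")                   # broad regions, e.g. histone marks
    return cmd


# Hypothetical examples: narrow peaks for a transcription factor,
# broad peaks for a histone modification.
print(" ".join(macs2_callpeak_cmd("tf.bam", "tf_narrow", control="igg.bam")))
print(" ".join(macs2_callpeak_cmd("h3k27me3.bam", "h3k27me3_broad", broad=True)))
```

The list can be passed to `subprocess.run` on a machine where MACS2 is installed.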
What is SEACR?
Sparse Enrichment Analysis for CUT&RUN (SEACR) is a peak caller designed for paired-end CUT&RUN data. It calls peaks more precisely than algorithms designed for ChIP-seq, resulting in fewer false positives. Because SEACR was built around the precise position and fragment information in CUT&RUN data, it may not be sensitive enough to detect every peak in a dataset; this is the tradeoff for its higher precision.
On Pluto, you can choose between relaxed and stringent SEACR peak calling when setting up your analysis. These options set the signal threshold used to identify peaks. Relaxed uses a total signal threshold between the knee and the peak of the total signal curve, while stringent uses only the peak of the curve. The default for SEACR is stringent. A visual example of how this is calculated is shown in Figure 2 of Meers et al., which describes how SEACR works (below).
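Outside of Pluto, the relaxed or stringent mode is chosen as a positional argument to the SEACR shell script. The sketch below assumes the positional usage documented for SEACR 1.3 (signal bedgraph, control bedgraph or numeric threshold, `norm` or `non`, `relaxed` or `stringent`, output prefix); other SEACR versions may differ. The helper name `seacr_cmd` and the file names are hypothetical.

```python
from typing import List, Union


def seacr_cmd(
    signal_bedgraph: str,
    control: Union[str, float],     # control bedgraph path, or a numeric threshold
    output_prefix: str,
    mode: str = "stringent",        # SEACR's default mode per the text above
    normalize: bool = True,
    script: str = "SEACR_1.3.sh",   # assumed script name; adjust to your install
) -> List[str]:
    """Build (but do not run) a SEACR command, assuming SEACR 1.3 argument order."""
    if mode not in ("relaxed", "stringent"):
        raise ValueError("mode must be 'relaxed' or 'stringent'")
    return [
        "bash", script,
        signal_bedgraph,
        str(control),
        "norm" if normalize else "non",   # normalize control to target, or not
        mode,
        output_prefix,
    ]


# Hypothetical example: stringent (default) and relaxed calls on the same sample.
print(" ".join(seacr_cmd("target.bedgraph", "igg.bedgraph", "target_stringent")))
print(" ".join(seacr_cmd("target.bedgraph", "igg.bedgraph", "target_relaxed",
                         mode="relaxed")))
```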
Should I use MACS2 or SEACR?
MACS2 is a versatile peak calling tool that can be used with either ChIP-seq or CUT&RUN data, but it may call false-positive peaks. If you have a ChIP-seq dataset, start with MACS2 before trying SEACR (if desired).
SEACR may be more appropriate when you want to increase peak calling precision and reduce false positives, especially in datasets with higher signal-to-noise ratios, such as CUT&RUN and CUT&Tag. If you have a CUT&RUN dataset, start with SEACR.
The choice of peak caller depends on the goals of your analysis and on how confident you want to be when identifying a peak. Results from MACS2 and SEACR can also be compared to see which peaks are called by both methods, providing a clearer starting set of peaks to move forward with.
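One simple way to find peaks called by both methods is to keep the peaks from one caller that overlap at least one peak from the other. The following is a minimal sketch under the assumption that each peak set has been read into `(chrom, start, end)` tuples with BED-style half-open coordinates; the function name and the toy coordinates are illustrative only, not real results.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def peaks_called_by_both(a: Iterable[Peak], b: Iterable[Peak]) -> List[Peak]:
    """Return the peaks from `a` that overlap at least one peak in `b`."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))

    # Per chromosome: sorted start positions plus the running maximum end,
    # so each overlap query is a single binary search.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_end)

    shared = []
    for chrom, start, end in a:
        if chrom not in index:
            continue
        starts, max_end = index[chrom]
        n = bisect_left(starts, end)  # count of b peaks starting before `end`
        if n > 0 and max_end[n - 1] > start:
            shared.append((chrom, start, end))
    return shared


# Toy coordinates for illustration.
macs2_peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 40, 90)]
seacr_peaks = [("chr1", 150, 260), ("chr2", 90, 120)]
print(peaks_called_by_both(macs2_peaks, seacr_peaks))  # [('chr1', 100, 200)]
```

Here the chr2 peaks only touch at position 90, so under half-open coordinates they do not count as overlapping. Requiring a minimum overlap fraction is a stricter alternative that this sketch does not implement.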
References
- Zhang Y, Liu T, Meyer CA, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biology. 2008;9(9). doi:10.1186/gb-2008-9-9-r137
- Meers MP, Tenenbaum D, Henikoff S. Peak calling by sparse enrichment analysis for CUT&RUN chromatin profiling. Epigenetics & Chromatin. 2019;12(1). doi:10.1186/s13072-019-0287-4