Methods for the out-of-the-box CUT&RUN pipeline (nf-core/cutandrun)
Overview
On all plans in Pluto, scientists can run nf-core/cutandrun, a streamlined, community-validated CUT&RUN pipeline. Written in Nextflow and available as an open-source repository on GitHub, the pipeline is actively maintained and widely considered an industry standard for analyzing CUT&RUN data.
To run the pipeline, you'll upload a sample annotation table and raw FASTQ files. From there, you'll be prompted to select the following parameters:
- Genome build - Pluto will show you available genome builds based on the organism you selected when creating the experiment (e.g. GRCh38 / hg38 for human samples)
- Library type - If your experiment generated both R1 and R2 FASTQ files, select paired-end; otherwise, select single-end.
- Peak type - You'll select broad or narrow, depending on your antibody. Typically broad is used for histone marks and narrow is used for transcription factors.
- Peak caller - Choose MACS2 or SEACR, the two peak callers implemented as options in the nf-core/cutandrun pipeline.
- Peak calling p-value - Defaults to p < 0.0001. This sets the p-value threshold applied when calling consensus peaks.
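As a rough illustration of the library type rule above, the sketch below infers paired-end versus single-end from a list of FASTQ file names. It assumes the read number appears as `_R1` or `_R2` just before the file extension, which is a common naming convention but not something Pluto requires; the function name `infer_library_type` is hypothetical and is not part of Pluto or nf-core/cutandrun.

```python
import re

# Assumption: the read number appears as _R1 / _R2 (optionally followed by a
# numeric chunk such as _001) immediately before the FASTQ extension.
READ_TAG = re.compile(r"_R([12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$")


def infer_library_type(fastq_names):
    """Return 'paired-end' if both R1 and R2 files are present, else 'single-end'."""
    reads_seen = set()
    for name in fastq_names:
        match = READ_TAG.search(name)
        if match:
            reads_seen.add(match.group(1))
    return "paired-end" if reads_seen == {"1", "2"} else "single-end"


if __name__ == "__main__":
    print(infer_library_type(["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"]))
    print(infer_library_type(["sampleB_R1_001.fastq.gz"]))
```

If your files follow a different naming scheme, the pattern would need to be adjusted accordingly.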
Other notes on peak calling
Generally speaking, "peaks" are genomic regions where the signal in a target sample significantly exceeds the signal in its matched control. In the nf-core/cutandrun pipeline specifically, a peak is a region where at least one of the target samples significantly exceeds its matched IgG control. Consensus peaks are merged across all samples using BEDTools and annotated to the nearest gene transcriptional start site using HOMER. Peak counts are generated using featureCounts.
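To make the idea of merging peaks into consensus regions more concrete, here is a minimal sketch of an interval merge in plain Python. It is an illustration of the general technique only: the pipeline uses BEDTools, and its exact invocation and options are not shown here. The function name `merge_peaks` and the example coordinates are hypothetical, and the sketch assumes BED-style half-open coordinates in which overlapping or directly adjacent intervals are combined.

```python
from collections import defaultdict


def merge_peaks(peaks):
    """Merge overlapping or adjacent (chrom, start, end) intervals.

    Returns a list of merged intervals sorted by chromosome name and start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current region and start a new one.
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


if __name__ == "__main__":
    # Hypothetical peaks pooled from two samples.
    pooled = [
        ("chr1", 100, 200),
        ("chr1", 150, 300),
        ("chr1", 400, 500),
        ("chr2", 10, 20),
    ]
    for region in merge_peaks(pooled):
        print(region)
```

In this toy input, the two overlapping chr1 peaks collapse into a single region, while the non-overlapping peaks are kept as they are.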
The pipeline outputs:
- Dynamic methods section with references
- MultiQC report
- BAM/BAI files (and additional pipeline intermediates depending on your plan tier)
- Raw count matrix, which you'll then use for flexible downstream analysis on Pluto's canvas
Citable methods
[Single-end / Paired-end] FASTQ files were processed using the nf-core/cutandrun pipeline (v3.1) [1,2,3]. Adapter sequences were removed with Trim Galore [4]. Reads were aligned with Bowtie2 [5] to GRCh38 (NCBI, p.14, release 110), and duplicates were removed from control samples only using Picard [6]. Peaks were called with MACS2 [7] (normalization mode: CPM). Peaks were defined as regions where at least one of the target samples significantly exceeds its matched IgG control for a given genomic region. Consensus peaks were merged across all samples using BEDTools [8] and annotated to the nearest gene transcriptional start site using HOMER [9]. Peak counts were generated using featureCounts [10].
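For readers unfamiliar with the CPM (counts per million) normalization mode named in the methods text, the sketch below shows the arithmetic on a toy list of counts. It is only an illustration of the general formula, scaling each count by one million divided by the total; it is not the pipeline's own code, and the function name `counts_per_million` and the example values are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw counts so that they sum to one million.

    Returns a list of floats; if the total is zero, every value is 0.0.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [count * 1_000_000 / total for count in counts]


if __name__ == "__main__":
    raw = [10, 30, 60]  # hypothetical read counts for three regions
    print(counts_per_million(raw))
```

Because the scaling depends only on the total, samples sequenced to different depths become comparable on the same scale.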
Other parameters
Besides the user-editable parameters described above, the nf-core/cutandrun pipeline has numerous other parameters. View the full list on the nf-core website for more details.
In Pluto's Start-Up tiers, these parameters are set to reasonable and robust default values that have been shown to produce reliable results across a wide variety of experiments. Below is the full list of parameters with default values (you can also download the parameters as JSON):
{
"save_spikein_aligned": false,
"extend_fragments": true,
"custom_config_base": "https://raw.githubusercontent.com/nf-core/configs/master",
"dt_heatmap_gene_beforelen": 3000,
"plaintext_email": false,
"run_deeptools_qc": false,
"run_consensus_all": true,
"run_input_check": true,
"run_igv": true,
"aligner": "bowtie2",
"peakcaller": "<selected peak caller>",
"only_genome": false,
"seacr_norm": "non",
"macs2_pvalue": 0.05,
"gtf": "<path to file>",
"save_trimmed": false,
"clip_r1": 0,
"clip_r2": 0,
"skip_fastqc": false,
"spikein_genome": "Ecoli_K12_MG1655",
"bowtie2": "<path to file>",
"minimum_alignment_q_score": 20,
"run_cat_fastq": true,
"skip_peak_qc": false,
"only_preqc": false,
"trim_nextseq": 0,
"normalisation_binsize": 50,
"publish_dir_mode": "copy",
"input": "<input sample manifest>",
"only_alignment": false,
"dt_heatmap_peak_afterlen": 3000,
"dedup_target_reads": false,
"enable_conda": false,
"macs2_broad_cutoff": 0.00001,
"dt_qc_bam_binsize": 500,
"config_profile_url": "https://cloud.google.com/batch",
"use_private_ip": true,
"run_alignment": true,
"run_peak_qc": true,
"three_prime_clip_r1": 0,
"save_unaligned": false,
"three_prime_clip_r2": 0,
"fasta": "<path to user-selected genome>",
"custom_config_version": "master",
"run_peak_calling": true,
"save_align_intermed": true,
"replicate_threshold": 1,
"normalisation_c": 10000,
"macs_gsize": 2940000000,
"fragment_size": 100,
"schema_ignore_params": "genomes,callers,dedup_control_only,fragment_size,run_igv,run_multiqc,run_reporting,run_consensus_all,run_peak_calling,run_remove_dups,run_mark_dups,run_read_filter,run_alignment,run_trim_galore_fastqc,run_cat_fastq,run_input_check,run_genome_prep,run_peak_qc,run_deeptools_qc,run_deeptools_heatmaps,run_preseq",
"only_input": false,
"min_peak_overlap": 0.2,
"config_profile_description": "Google Cloud Batch API Profile",
"run_multiqc": true,
"google_debug": false,
"blacklist": "<path to user-selected genome>",
"skip_trimming": false,
"seacr_peak_threshold": 0.05,
"skip_removeduplicates": false,
"use_control": true,
"outdir": "<outdir>",
"genome": "<path to user-selected genome>",
"help": false,
"normalisation_mode": "CPM",
"run_reporting": true,
"run_coverage": true,
"save_reference": true,
"monochrome_logs": false,
"max_cpus": 32,
"igv_show_gene_names": true,
"skip_multiqc": false,
"skip_preseq": false,
"igg_scale_factor": 0.5,
"max_multiqc_email_size": "25.MB",
"use_spot": true,
"max_time": "240.h",
"skip_heatmaps": true,
"pipeline_type": "experiment",
"tracedir": "<outdir for pipeline info>",
"validate_params": true,
"config_profile_contact": "Hatem Nawar @hnawar",
"consensus_peak_mode": "all",
"run_trim_galore_fastqc": true,
"only_filtering": false,
"igenomes_base": "s3://ngi-igenomes/igenomes/",
"spikein_fasta": "<path to genome.fa>",
"dt_heatmap_gene_bodylen": 5000,
"skip_igv": false,
"run_remove_dups": true,
"google_bucket": false,
"multiqc_title": "<experiment ID>",
"max_memory": "128.GB",
"boot_disk": "100 GB",
"run_preseq": true,
"dt_heatmap_gene_afterlen": 3000,
"run_mark_dups": true,
"save_merged_fastq": true,
"skip_dt_qc": false,
"run_genome_prep": true,
"experiment_id": "<experiment ID>",
"only_peak_calling": false,
"skip_reporting": false,
"igenomes_ignore": false,
"run_deeptools_heatmaps": false,
"spikein_bowtie2": "<path to Bowtie2Index>",
"macs2_narrow_peak": false,
"min_frip_overlap": 0.2,
"dt_heatmap_peak_beforelen": 3000,
"run_read_filter": true,
"show_hidden_params": false,
"google_preemptible": true,
"location": "us-central1",
"callers": [
"<selected peak caller>"
],
"google_zone": "<workspace region>"
}
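If you download the parameters as JSON, a short script can help you review them. The sketch below parses a small excerpt of the list above, then reports which values are placeholders (strings wrapped in angle brackets, filled in per run) and which `skip_` steps are switched on. The function name `summarize_params` is hypothetical, and the excerpt is embedded as a string only to keep the example self-contained; in practice you would read the downloaded file instead.

```python
import json

# Excerpt of the parameter list above, embedded for a self-contained example.
PARAMS_EXCERPT = """
{
  "aligner": "bowtie2",
  "peakcaller": "<selected peak caller>",
  "gtf": "<path to file>",
  "skip_fastqc": false,
  "skip_heatmaps": true,
  "normalisation_mode": "CPM"
}
"""


def summarize_params(params_json):
    """Return (placeholder_keys, enabled_skip_keys) from a JSON parameter string."""
    params = json.loads(params_json)
    placeholders = sorted(
        key
        for key, value in params.items()
        if isinstance(value, str) and value.startswith("<") and value.endswith(">")
    )
    enabled_skips = sorted(
        key
        for key, value in params.items()
        if key.startswith("skip_") and value is True
    )
    return placeholders, enabled_skips


if __name__ == "__main__":
    placeholders, enabled_skips = summarize_params(PARAMS_EXCERPT)
    print("Filled in per run:", placeholders)
    print("Steps skipped:", enabled_skips)
```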