Methods and parameters for the out-of-the-box RNA-seq pipeline (nf-core/rnaseq) offered in Pluto
Overview
On all plans in Pluto, scientists can run nf-core/rnaseq, a streamlined, community-validated RNA-seq pipeline. Written in Nextflow and available as an open-source repository on GitHub, the pipeline is actively maintained and widely considered the industry standard for analyzing RNA-seq data.
To run the pipeline, you'll upload a sample annotation table and raw FASTQ files. From there, you'll be prompted to select 3 parameters:
- Genome build - Pluto will show you available genome builds based on the organism you selected when creating the experiment (e.g. GRCh38 / hg38 for human samples)
- Strandedness - If you know the strandedness of your experiment, select forward, reverse, or unstranded. If you don't know, simply select auto and the pipeline will infer strandedness.
- Library type - If your experiment generated both R1 and R2 FASTQ files, select paired-end; otherwise, select single-end.
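If you are unsure which library type applies, the rule above can be checked against your file names before uploading. The following is a minimal sketch, not part of Pluto or nf-core/rnaseq: the function name `infer_library_type` is hypothetical, and it assumes your files follow an Illumina-style naming convention such as `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` (optionally with a numeric suffix like `_001`). Files named differently would need a different pattern.

```python
import re
from collections import defaultdict

# Assumed naming convention (not a Pluto requirement):
#   <sample>_R1.fastq.gz, <sample>_R2.fastq.gz, optionally with a suffix like _001
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.(?:fastq|fq)(?:\.gz)?$"
)


def infer_library_type(filenames):
    """Return 'paired-end' if every sample has an R2 file, 'single-end' if none do."""
    if not filenames:
        raise ValueError("No FASTQ file names were provided")

    # Collect which read numbers (1 and/or 2) were seen for each sample.
    reads_by_sample = defaultdict(set)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"File name does not follow the assumed pattern: {name}")
        reads_by_sample[match["sample"]].add(match["read"])

    has_r2 = [("2" in reads) for reads in reads_by_sample.values()]
    if all(has_r2):
        return "paired-end"
    if not any(has_r2):
        return "single-end"
    # Some samples have R2 and some do not: the experiment needs a manual check.
    raise ValueError("Mixed single-end and paired-end samples detected")


if __name__ == "__main__":
    files = ["ctrl_1_R1.fastq.gz", "ctrl_1_R2.fastq.gz",
             "treated_1_R1.fastq.gz", "treated_1_R2.fastq.gz"]
    print(infer_library_type(files))
```

The sketch raises an error for mixed experiments rather than guessing, since a single library type is selected for the whole run.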
The pipeline outputs:
- Dynamic methods section with references
- MultiQC report
- BAM/BAI files (and additional pipeline intermediates depending on your plan tier)
- Raw count matrix, which you'll then use for flexible downstream analysis on Pluto's canvas
Citable methods
Raw FASTQ files were processed using the nf-core/rnaseq pipeline (v3.7) (1, 2). Adapter sequences were removed with Trim Galore (3). Reads were aligned with STAR (4) to GRCh38/hg38 (human) or GRCm38/mm10 (mouse) and quantified to gene counts using RSEM (5).
References
1. Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol (2020). doi: 10.1038/s41587-020-0439-x.
2. nf-core/rnaseq (latest version). https://nf-co.re/rnaseq. doi: 10.5281/zenodo.1400710.
3. FelixKrueger/TrimGalore (latest version). https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/. doi: 10.5281/zenodo.5127898.
4. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (2013). doi: 10.1093/bioinformatics/bts635.
5. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics (2011). doi: 10.1186/1471-2105-12-323.
Other parameters
Besides the 3 user-editable parameters above, the nf-core/rnaseq pipeline has numerous other parameters. View the full list on the nf-core website for more details.
In Pluto's Starter tiers, these parameters are set to reasonable and robust default values that have been shown to produce reliable results across a wide variety of experiments. Below is the full list of parameters with default values (you can also download the parameters as JSON):
{
"remove_ribo_rna": false,
"custom_config_base": "https://raw.githubusercontent.com/nf-core/configs/master",
"skip_deseq2_qc": true,
"gencode": false,
"umitools_dedup_stats": false,
"plaintext_email": false,
"gene_bed": "",
"save_reference": false,
"skip_markduplicates": false,
"ribo_database_manifest": "/.nextflow/assets/pluto-biosciences/pluto-pipelines-rnaseq/assets/rrna-db-defaults.txt",
"monochrome_logs": false,
"aligner": "star_rsem",
"max_cpus": 32,
"featurecounts_group_type": "gene_biotype",
"save_bbsplit_reads": false,
"skip_multiqc": false,
"skip_preseq": true,
"skip_dupradar": false,
"save_align_intermeds": true,
"gtf": <organism-specific GTF>,
"max_multiqc_email_size": "25.MB",
"use_spot": true,
"max_time": "240.h",
"save_trimmed": false,
"min_trimmed_reads": 10000,
"pipeline_type": "experiment",
"skip_fastqc": false,
"deseq2_vst": true,
"umitools_extract_method": "string",
"validate_params": true,
"config_profile_contact": "Hatem Nawar @hnawar",
"bam_csi_index": false,
"with_umi": false,
"skip_qc": false,
"version": false,
"trimmer": "trimgalore",
"publish_dir_mode": "copy",
"input": <user-created sample annotation file>,
"skip_bigwig": false,
"strandedness": "auto",
"igenomes_base": "s3://ngi-igenomes/igenomes",
"stringtie_ignore_gtf": false,
"config_profile_url": "https://cloud.google.com/batch",
"use_private_ip": true,
"skip_umi_extract": false,
"google_bucket": false,
"save_unaligned": false,
"featurecounts_feature_type": "exon",
"multiqc_title": <experiment ID>,
"custom_config_version": "master",
"max_memory": "128.GB",
"boot_disk": "100 GB",
"fasta": <organism-specific fasta>,
"hisat2_build_memory": "200.GB",
"skip_rseqc": false,
"save_non_ribo_reads": false,
"skip_alignment": false,
"skip_qualimap": false,
"star_index": "",
"star_ignore_sjdbgtf": false,
"skip_stringtie": true,
"gtf_extra_attributes": "gene_name",
"save_merged_fastq": true,
"rsem_index": <organism-specific index>,
"save_umi_intermeds": false,
"schema_ignore_params": "genomes",
"config_profile_description": "Google Cloud Batch API Profile",
"min_mapped_reads": 5,
"rseqc_modules": "bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication",
"experiment_id": <experiment ID>,
"google_debug": false,
"gtf_group_features": "gene_id",
"skip_trimming": false,
"igenomes_ignore": false,
"skip_pseudo_alignment": false,
"outdir": <out directory>,
"skip_bbsplit": true,
"genome": <user-selected genome build>,
"help": false,
"show_hidden_params": false,
"test_data_base": "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3",
"google_preemptible": true,
"location": <GCP region>,
"umitools_grouping_method": "directional",
"google_zone": <GCP zone>,
"skip_biotype_qc": true
}
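To see at a glance which optional steps the defaults turn off, the `skip_*` flags can be grouped by value. The sketch below is illustrative only: it embeds a small subset of the values listed above rather than the full file (the listing contains placeholders such as `<organism-specific GTF>`, so it is not valid JSON as shown), and the helper name `split_skip_flags` is hypothetical. A step that is "not skipped" may still depend on other parameters (for example, UMI extraction also depends on `with_umi`).

```python
import json

# Subset of the default parameters listed above, copied verbatim.
PARAMS_JSON = """
{
  "aligner": "star_rsem",
  "skip_deseq2_qc": true,
  "skip_preseq": true,
  "skip_stringtie": true,
  "skip_bbsplit": true,
  "skip_biotype_qc": true,
  "skip_fastqc": false,
  "skip_multiqc": false,
  "skip_rseqc": false,
  "skip_trimming": false
}
"""


def split_skip_flags(params):
    """Group skip_* parameters into (skipped, not_skipped) lists of step names."""
    prefix = "skip_"
    skipped = sorted(
        key[len(prefix):] for key, value in params.items()
        if key.startswith(prefix) and value is True
    )
    not_skipped = sorted(
        key[len(prefix):] for key, value in params.items()
        if key.startswith(prefix) and value is False
    )
    return skipped, not_skipped


params = json.loads(PARAMS_JSON)
skipped, not_skipped = split_skip_flags(params)

if __name__ == "__main__":
    print("Aligner:", params["aligner"])
    print("Skipped steps:", ", ".join(skipped))
    print("Steps not skipped:", ", ".join(not_skipped))
```

The same helper could be pointed at the downloaded parameters file with `json.load`, assuming that file contains concrete values in place of the placeholders.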