RNA-seq pipeline parameters

Methods and parameters for the out-of-the-box RNA-seq pipeline (nf-core/rnaseq) offered in Pluto

Overview

On all plans in Pluto, scientists can run nf-core/rnaseq, a streamlined, community-validated RNA-seq pipeline. Written in Nextflow and available as an open-source repository on GitHub, the pipeline is actively maintained and widely regarded as an industry standard for analyzing RNA-seq data.

To run the pipeline, you'll upload a sample annotation table and raw FASTQ files. From there, you'll be prompted to select three parameters:

  • Genome build - Pluto will show you available genome builds based on the organism you selected when creating the experiment (e.g. GRCh38 / hg38 for human samples)
  • Strandedness - If you know the strandedness of your experiment, select forward, reverse, or unstranded. If you don't know, simply select auto and the pipeline will infer strandedness.
  • Library type - If your experiment generated both R1 and R2 FASTQ files, select paired-end, otherwise select single-end.
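As a rough illustration of the library type choice, the sketch below infers paired-end versus single-end from a list of FASTQ file names. The `_R1`/`_R2` naming pattern and the function name `infer_library_type` are assumptions made for this example and are not part of Pluto or nf-core/rnaseq; adjust the pattern to match your own files.

```python
import re
from collections import defaultdict

# Assumed naming convention (hypothetical): <sample>_R1.fastq.gz, <sample>_R2.fastq.gz,
# optionally with a numeric suffix such as _001 and a .fq or uncompressed extension.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+)_(?P<read>R[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$"
)


def infer_library_type(filenames):
    """Return "paired-end" if every sample has R1 and R2, "single-end" if only R1."""
    if not filenames:
        raise ValueError("No FASTQ files provided")

    reads_by_sample = defaultdict(set)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognized FASTQ file name: {name}")
        reads_by_sample[match["sample"]].add(match["read"])

    read_sets = list(reads_by_sample.values())
    if all(reads == {"R1", "R2"} for reads in read_sets):
        return "paired-end"
    if all(reads == {"R1"} for reads in read_sets):
        return "single-end"
    # Some samples have both mates and others do not, or an R2 has no matching R1.
    raise ValueError("Inconsistent R1/R2 files across samples")


if __name__ == "__main__":
    files = ["ctrl_1_R1.fastq.gz", "ctrl_1_R2.fastq.gz",
             "treat_1_R1.fastq.gz", "treat_1_R2.fastq.gz"]
    print(infer_library_type(files))
```

A mixed set of files raises an error here rather than guessing, since a single run is given one library type.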

The pipeline outputs:

  • Dynamic methods section with references
  • MultiQC report
  • BAM/BAI files (and additional pipeline intermediates depending on your plan tier)
  • Raw count matrix, which you'll then use for flexible downstream analysis on Pluto's canvas

Citable methods

Raw FASTQ files were processed using the nf-core/rnaseq pipeline (v3.7) (1, 2). Adapter sequences were removed with Trim Galore (3). Reads were aligned with STAR (4) to GRCh38/hg38 (human) or GRCm38/mm10 (mouse) and quantified to gene counts using RSEM (5).

References

  1. Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol (2020). doi: 10.1038/s41587-020-0439-x.
  2. nf-core/rnaseq (latest version). https://nf-co.re/rnaseq. doi: 10.5281/zenodo.1400710.
  3. FelixKrueger/TrimGalore (latest version). https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/. doi: 10.5281/zenodo.5127898.
  4. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics (2013). doi: 10.1093/bioinformatics/bts635.
  5. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics (2011). doi: 10.1186/1471-2105-12-323.

Other parameters

Besides the three user-editable parameters above, the nf-core/rnaseq pipeline has many other parameters. The full list, with details on each, is available on the nf-core website.

In Pluto's Starter tiers, these parameters are set to reasonable and robust default values that have been shown to produce reliable results across a wide variety of experiments. Below is the full list of parameters with default values (you can also download the parameters as JSON):

{
    "remove_ribo_rna": false,
    "custom_config_base": "https://raw.githubusercontent.com/nf-core/configs/master",
    "skip_deseq2_qc": true,
    "gencode": false,
    "umitools_dedup_stats": false,
    "plaintext_email": false,
    "gene_bed": "",
    "save_reference": false,
    "skip_markduplicates": false,
    "ribo_database_manifest": "/.nextflow/assets/pluto-biosciences/pluto-pipelines-rnaseq/assets/rrna-db-defaults.txt",
    "monochrome_logs": false,
    "aligner": "star_rsem",
    "max_cpus": 32,
    "featurecounts_group_type": "gene_biotype",
    "save_bbsplit_reads": false,
    "skip_multiqc": false,
    "skip_preseq": true,
    "skip_dupradar": false,
    "save_align_intermeds": true,
    "gtf": <organism-specific GTF>,
    "max_multiqc_email_size": "25.MB",
    "use_spot": true,
    "max_time": "240.h",
    "save_trimmed": false,
    "min_trimmed_reads": 10000,
    "pipeline_type": "experiment",
    "skip_fastqc": false,
    "deseq2_vst": true,
    "umitools_extract_method": "string",
    "validate_params": true,
    "config_profile_contact": "Hatem Nawar @hnawar",
    "bam_csi_index": false,
    "with_umi": false,
    "skip_qc": false,
    "version": false,
    "trimmer": "trimgalore",
    "publish_dir_mode": "copy",
    "input": <user-created sample annotation file>,
    "skip_bigwig": false,
    "strandedness": "auto",
    "igenomes_base": "s3://ngi-igenomes/igenomes",
    "stringtie_ignore_gtf": false,
    "config_profile_url": "https://cloud.google.com/batch",
    "use_private_ip": true,
    "skip_umi_extract": false,
    "google_bucket": false,
    "save_unaligned": false,
    "featurecounts_feature_type": "exon",
    "multiqc_title": <experiment ID>,
    "custom_config_version": "master",
    "max_memory": "128.GB",
    "boot_disk": "100 GB",
    "fasta": <organism-specific fasta>,
    "hisat2_build_memory": "200.GB",
    "skip_rseqc": false,
    "save_non_ribo_reads": false,
    "skip_alignment": false,
    "skip_qualimap": false,
    "star_index": "",
    "star_ignore_sjdbgtf": false,
    "skip_stringtie": true,
    "gtf_extra_attributes": "gene_name",
    "save_merged_fastq": true,
    "rsem_index": <organism-specific index>,
    "save_umi_intermeds": false,
    "schema_ignore_params": "genomes",
    "config_profile_description": "Google Cloud Batch API Profile",
    "min_mapped_reads": 5,
    "rseqc_modules": "bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication",
    "experiment_id": <experiment ID>,
    "google_debug": false,
    "gtf_group_features": "gene_id",
    "skip_trimming": false,
    "igenomes_ignore": false,
    "skip_pseudo_alignment": false,
    "outdir": <out directory>,
    "skip_bbsplit": true,
    "genome": <user-selected genome build>,
    "help": false,
    "show_hidden_params": false,
    "test_data_base": "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3",
    "google_preemptible": true,
    "location": <GCP region>,
    "umitools_grouping_method": "directional",
    "google_zone": <GCP zone>,
    "skip_biotype_qc": true
}
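To see at a glance which optional steps are disabled, the `skip_*` flags can be filtered out of the defaults. The sketch below is illustrative only: it embeds a subset of the values listed above instead of reading a downloaded file, because placeholders such as `<experiment ID>` need real values before the full listing parses as JSON. The helper name `skipped_steps` is hypothetical.

```python
import json

# Subset of the defaults listed above; placeholder entries are omitted.
DEFAULTS_JSON = """
{
    "aligner": "star_rsem",
    "trimmer": "trimgalore",
    "strandedness": "auto",
    "skip_deseq2_qc": true,
    "skip_markduplicates": false,
    "skip_multiqc": false,
    "skip_preseq": true,
    "skip_dupradar": false,
    "skip_fastqc": false,
    "skip_qc": false,
    "skip_bigwig": false,
    "skip_umi_extract": false,
    "skip_rseqc": false,
    "skip_alignment": false,
    "skip_qualimap": false,
    "skip_stringtie": true,
    "skip_trimming": false,
    "skip_pseudo_alignment": false,
    "skip_bbsplit": true,
    "skip_biotype_qc": true
}
"""

SKIP_PREFIX = "skip_"


def skipped_steps(params):
    """Return the sorted names of steps whose skip_* flag is set to true."""
    return sorted(
        key[len(SKIP_PREFIX):]
        for key, value in params.items()
        if key.startswith(SKIP_PREFIX) and value is True
    )


params = json.loads(DEFAULTS_JSON)

if __name__ == "__main__":
    print(skipped_steps(params))
    # prints ['bbsplit', 'biotype_qc', 'deseq2_qc', 'preseq', 'stringtie']
```

Every other `skip_*` flag in the listing is false, so with these defaults the remaining steps, including trimming, alignment, FastQC, RSeQC, Qualimap, dupRadar, and MultiQC, are left enabled.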