Refine your single cell RNA-seq data by removing cluster(s)
Written by Caitlin Winkler, PhD
Refine clusters is the seventh and final step in the Preprocessing phase of the Experiment roadmap, and follows the Cluster step. The Refine clusters step is optional.
During this step, you have the opportunity to refine your data by removing cluster(s) before finalizing your workflow and proceeding to cluster annotation and downstream analysis.
You might want to remove cluster(s) from your data set if you've been able to identify cluster(s) that appear to be primarily driven by technical artifacts, noise, or low-quality cells. However, it is important to exercise caution when removing clusters, especially if they could potentially represent rare cell populations or subtypes that are biologically important.
Running the preprocess
To run the Refine clusters preprocess, first choose whether to remove any cluster(s) or skip the Refine clusters step. If you choose to remove cluster(s), select the resolution you want to remove clusters from, then select the clusters you want to remove.
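Conceptually, removing clusters amounts to filtering cells by their cluster label at the chosen resolution. The Python sketch below illustrates that idea only; it is not the platform's implementation. The data layout (cell records with labels keyed by hypothetical resolution names such as "res_0.5") and the function name remove_clusters are assumptions made for illustration.

```python
def remove_clusters(cells, resolution, clusters_to_remove):
    """Keep only cells whose label at `resolution` is not marked for removal."""
    drop = set(clusters_to_remove)
    return [cell for cell in cells if cell["clusters"][resolution] not in drop]


# Hypothetical cell records: a barcode plus cluster labels at two resolutions.
cells = [
    {"barcode": "AAAC", "clusters": {"res_0.5": "0", "res_1.0": "0"}},
    {"barcode": "AAAG", "clusters": {"res_0.5": "1", "res_1.0": "2"}},
    {"barcode": "AACT", "clusters": {"res_0.5": "2", "res_1.0": "3"}},
    {"barcode": "AAGT", "clusters": {"res_0.5": "1", "res_1.0": "1"}},
]

# Remove cluster "1" as defined at the (hypothetical) resolution "res_0.5".
kept = remove_clusters(cells, resolution="res_0.5", clusters_to_remove=["1"])
print([cell["barcode"] for cell in kept])
```

Cluster labels are specific to a resolution: in this toy data, the cell labeled "1" at one resolution carries a different label at the other, which is why the resolution is chosen before the clusters.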
Once you've selected your parameters, click Run preprocessing step. Feel free to navigate away from the modal window while the preprocess is running; you will get an email notification once the preprocess has successfully completed and you are able to move on to the next step.
Check out the Instructions & Tips tab in the modal window for more information about the preprocess, as well as recommendations on what to consider when removing cluster(s) from your data.
As with all single cell RNA-seq preprocesses, your sample-level metadata is readily available for reference within the modal window under the Samples tab.
Navigating the results
For each resolution, the Refine clusters step returns several plots and results tables that you can review, similar to the Cluster step. Most plots are interactive, and all results tables are downloadable.
For the dimensionality reduction plot(s) (Dim plot), you can toggle between different clusters by clicking the legend (one click will highlight the cluster, and clicking the cluster again will return the plot to the original view). The Results tab shows the marker genes identified per cluster.
The Dot plot is not interactive, but is scrollable, and highlights the top 3 marker genes identified per cluster. The Results tab shows the marker genes identified per cluster (this is the same results table as when viewing the Dim plot).
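If you download the marker gene table, a "top 3 per cluster" shortlist like the one in the Dot plot can be approximated offline. The sketch below is an illustration under stated assumptions: the column names (cluster, gene, avg_log2fc) are hypothetical, ranking by average log2 fold change is an assumed criterion (the platform's actual ranking is not described here), and the numbers are made-up illustrative values, not real results.

```python
from collections import defaultdict


def top_markers(marker_rows, n=3):
    """Return the n highest-scoring genes per cluster, best first."""
    by_cluster = defaultdict(list)
    for row in marker_rows:
        by_cluster[row["cluster"]].append(row)
    return {
        cluster: [
            r["gene"]
            for r in sorted(rows, key=lambda r: r["avg_log2fc"], reverse=True)[:n]
        ]
        for cluster, rows in by_cluster.items()
    }


# Hypothetical rows from a downloaded marker table (illustrative values only).
marker_rows = [
    {"cluster": "0", "gene": "CD3E", "avg_log2fc": 2.1},
    {"cluster": "0", "gene": "LTB", "avg_log2fc": 0.9},
    {"cluster": "0", "gene": "IL7R", "avg_log2fc": 1.8},
    {"cluster": "0", "gene": "CD2", "avg_log2fc": 1.2},
    {"cluster": "1", "gene": "CD79A", "avg_log2fc": 2.3},
    {"cluster": "1", "gene": "MS4A1", "avg_log2fc": 2.5},
]

top = top_markers(marker_rows)
print(top)
```

A cluster with fewer than three marker rows simply returns all of its genes.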
The Sample by cluster composition plot is interactive. You can hover over the bars to get additional composition info. You can also toggle between different clusters by clicking on the legend (one click will hide the cluster, a double-click will isolate the cluster; click or double-click on the cluster or sample again to return the plot to the original view). The Results tab shows the tabular results used to generate the Sample by cluster composition plot.
Similar to the Sample by cluster composition plot, the Cluster by sample composition plot is also interactive. You can hover over the bars to get additional composition info, and toggle between different samples by clicking on the legend (one click will hide the sample, a double-click will isolate the sample; click or double-click on the sample again to return the plot to the original view). The Results tab shows the tabular results used to generate the Cluster by sample composition plot.
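The two composition views can be thought of as the same cell counts grouped along different axes: once per sample (split by cluster) and once per cluster (split by sample). The sketch below shows one way such tables could be derived from per-cell labels. It is an assumption-based illustration, not the platform's method: the function name composition, the input layout, and the use of fractions rather than raw counts are all hypothetical.

```python
from collections import Counter, defaultdict


def composition(pairs):
    """For each group, return the fraction of its cells in each category.

    `pairs` is an iterable of (group, category) tuples, one per cell.
    """
    counts = defaultdict(Counter)
    for group, category in pairs:
        counts[group][category] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical per-cell labels as (sample, cluster) tuples.
cell_labels = [
    ("S1", "0"), ("S1", "0"), ("S1", "0"), ("S1", "1"),
    ("S2", "0"), ("S2", "1"), ("S2", "1"), ("S2", "1"),
]

# Group by sample, split by cluster.
per_sample = composition(cell_labels)
# Group by cluster, split by sample (swap the tuple order).
per_cluster = composition((cluster, sample) for sample, cluster in cell_labels)

print(per_sample)
print(per_cluster)
```

Reading both views together helps when deciding whether to remove a cluster: a cluster made up almost entirely of cells from one sample may reflect a sample-specific artifact, but it may also be a genuine population present only in that sample.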
Accepting the results
If you need or want to change any of the preprocess parameters, you can rerun the preprocess by updating your parameters and clicking the Apply new updates button. Once you are ready to move on in the workflow, click Accept results & proceed. This opens an additional confirmation window, where you can click Yes, accept & proceed to continue with the workflow or No, take me back to keep modifying the Refine clusters step.
What's next?
After you have successfully completed the Refine clusters step, and if you are happy with your preprocess workflow, the next step is to finalize the workflow. This will allow you to proceed to cluster annotation and downstream analysis!