Assigning scRNA-seq data to known and de novo cell types using CellAssign

doi:10.21203/rs.2.10442/v1

Method Article


This work is licensed under a CC BY 4.0 License

This protocol has been posted on Protocol Exchange, an open repository of community-contributed protocols sponsored by Nature Portfolio. These protocols are posted directly on the Protocol Exchange by authors and are made freely available to the scientific community for use and comment.

Version 1


Assigning cells to known or de novo cell types is an important step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. This protocol outlines how to use the CellAssign R package to make these assignments.

Computational biology and bioinformatics

scRNA-seq

cell types

cell type assignment

RNA-seq

microenvironment

cell type composition

Assigning cells to known or de novo cell types is an important step in the analysis of single-cell RNA-sequencing (scRNA-seq) data. CellAssign is a recently published statistical model of the over-expression of a set of marker genes for each pre-specified cell type. CellAssign computes the probability that each cell belongs to a given cell type, or to an “unknown” cell type (one whose expression does not match that expected for any of the specified cell types). These assignments can then be used to (i) study the cell type composition of each sample, (ii) focus on a given cell type for further analysis (e.g. unsupervised clustering), or (iii) remove nuisance cell types.

Software:

R computing environment (> version 3.5)

The devtools R package

The cellassign R package (https://github.com/Irrationone/cellassign)

1. Install TensorFlow within R:

install.packages("tensorflow")

tensorflow::install_tensorflow()

2. Install cellassign by running

devtools::install_github("Irrationone/cellassign")

and load by calling library(cellassign)

3. Prepare single-cell expression data in the form of a SingleCellExperiment object

https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html

We will assume this object is named “sce”. rowData(sce) should contain the fields “ID”, holding the Ensembl gene ID, and “Symbol”, holding the HGNC symbol.

4. Compute size factors using scran

library(scran)

sce <- computeSumFactors(sce)
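For intuition about what a size factor represents, the sketch below computes simple library-size factors in base R. This is a plainer normalization than the pooling-based approach scran uses, swapped in only for illustration, and `toy_counts` is made-up data rather than anything from the protocol.

```r
# Hypothetical genes-by-cells count matrix (3 genes, 4 cells); values are invented
toy_counts <- matrix(c(10, 0, 5,
                       20, 2, 8,
                        5, 1, 4,
                       40, 3, 17),
                     nrow = 3,
                     dimnames = list(c("PTPRC", "CD2", "EPCAM"),
                                     paste0("cell", 1:4)))

# Library size: total counts observed in each cell
lib_size <- colSums(toy_counts)

# Rescale so that the size factors average to 1 across cells
size_factors <- lib_size / mean(lib_size)
print(size_factors)
```

Cells with more total counts receive larger size factors, which is the quantity later passed to cellassign through the `s` argument.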

5. Specify marker gene data

This is in the form of a list, where the names of the list are the cell types and the contents are marker genes for the cell types. An example can be found in the CellAssign package by calling data(example_TME_markers). As a simple example, we can create one for T cells and epithelial cells:

marker_list <- list(t_cells = c("PTPRC", "CD2"), epithelial = "EPCAM")

Note that marker genes are not required to be mutually exclusive, nor to be unexpressed in other cell types.

6. Turn marker list into binary matrix using marker_list_to_mat

marker_mat <- marker_list_to_mat(marker_list)

Optional: an “unknown” cell type may be included by passing include_other = TRUE to marker_list_to_mat
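To make the shape of the binary marker matrix concrete, here is a minimal base-R sketch that builds an equivalent genes-by-cell-types matrix by hand. It is an illustration only, not the package's implementation; `manual_mat` is a hypothetical name, and the row order and any extra column produced by marker_list_to_mat may differ.

```r
marker_list <- list(t_cells = c("PTPRC", "CD2"), epithelial = "EPCAM")

# All marker genes, each listed once
genes <- unique(unlist(marker_list, use.names = FALSE))

# One row per gene, one column per cell type;
# 1 means "this gene is a marker for this cell type", 0 otherwise
manual_mat <- sapply(marker_list,
                     function(markers) as.integer(genes %in% markers))
rownames(manual_mat) <- genes
print(manual_mat)
```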

7. Match IDs to rows of the SingleCellExperiment

mm <- match(rownames(marker_mat), rowData(sce)$Symbol)

8. Subset SingleCellExperiment to markers only

sce_marker <- sce[mm,]
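Because match() returns NA for any marker symbol absent from the data set, it may be worth checking the result before subsetting. The sketch below shows one way to do this in base R; `symbols` and `markers` are hypothetical stand-ins for rowData(sce)$Symbol and rownames(marker_mat).

```r
# Hypothetical gene symbols, standing in for rowData(sce)$Symbol
symbols <- c("ACTB", "PTPRC", "GAPDH", "EPCAM")
# Hypothetical marker names, standing in for rownames(marker_mat)
markers <- c("PTPRC", "CD2", "EPCAM")

mm <- match(markers, symbols)

# Report markers that are missing from the data set
missing_markers <- markers[is.na(mm)]
if (length(missing_markers) > 0) {
  message("Markers not found: ", paste(missing_markers, collapse = ", "))
}

# Keep only the markers that were found, plus their row indices
found_markers <- markers[!is.na(mm)]
mm_found <- mm[!is.na(mm)]
```

With real objects, the same idea would mean restricting the marker matrix to `found_markers` and the expression object to the rows in `mm_found`, so that both cover the same genes.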

9. Run CellAssign

fit <- cellassign(exprs_obj = sce_marker,
                  marker_gene_info = marker_mat,
                  s = sizeFactors(sce_marker))

Note that covariates can be included at this point by passing an argument named “X” to cellassign. For more information see the vignette below.

10. View assigned cell types

print(fit$cell_type)

11. View maximum likelihood parameter estimates

print(fit$mle_params)

This includes the cell assignment probabilities in fit$mle_params$gamma
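As an illustration of how the assignment probabilities relate to a single label per cell, the base-R sketch below takes the most probable type for each cell from a made-up probability matrix. Here `gamma` is hypothetical data with cells in rows and labeled cell-type columns, and the 0.5 cutoff is an arbitrary choice for the example, not a recommendation from the protocol.

```r
# Hypothetical cells-by-types probability matrix, standing in for
# fit$mle_params$gamma; each row sums to 1
gamma <- matrix(c(0.90, 0.05, 0.05,
                  0.10, 0.85, 0.05,
                  0.40, 0.35, 0.25),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("cell", 1:3),
                                c("t_cells", "epithelial", "other")))

# Most probable cell type for each cell
best_type <- colnames(gamma)[max.col(gamma, ties.method = "first")]

# Probability attached to that call
best_prob <- apply(gamma, 1, max)

# Flag low-confidence calls (arbitrary illustrative cutoff)
confident <- best_prob >= 0.5

print(data.frame(cell = rownames(gamma), best_type, best_prob, confident))
```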

For a more detailed example, see the package vignette at https://irrationone.github.io/cellassign/introduction-to-cellassign.html

Common errors include:

Including cells in the SingleCellExperiment or gene expression matrix passed to “cellassign” that have no counts remaining after subsetting to marker genes only.

Not subsetting to marker genes only, i.e. passing a full SingleCellExperiment with all genes to “cellassign”. The marker matrix and expression data passed to cellassign should be for marker genes only.
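The first error above can be guarded against by dropping cells whose marker-gene counts sum to zero before fitting. A minimal base-R sketch follows, using an invented matrix `marker_counts` as a stand-in for the counts of the marker-only object.

```r
# Hypothetical marker-genes-by-cells count matrix; values are invented
marker_counts <- matrix(c(3, 0, 1,
                          0, 0, 0,
                          0, 2, 5),
                        nrow = 3,
                        dimnames = list(c("PTPRC", "CD2", "EPCAM"),
                                        c("cell1", "cell2", "cell3")))

# TRUE for cells that still have at least one marker count
keep <- colSums(marker_counts) > 0

# Drop the empty cells, keeping the matrix shape even if one cell remains
filtered <- marker_counts[, keep, drop = FALSE]
print(colnames(filtered))
```

The same logical vector could be used to subset the columns of a SingleCellExperiment, provided the size factors passed to cellassign are subset to the same cells.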

End to end, the protocol should take a beginner around 2 hours.

“cellassign” returns a “cellassign_fit” object that includes the cell type assignments and the maximum likelihood parameter estimates. With these, users can perform downstream analyses such as correlating cell type composition with phenotypes, or running further unsupervised analysis on subsets of cells.
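As one example of such a downstream analysis, cell type composition per sample can be tabulated in base R. The sketch below uses invented labels; `cell_type` stands in for the assignments returned by cellassign and `sample_id` for a hypothetical per-cell sample annotation.

```r
# Hypothetical per-cell labels and sample of origin
cell_type <- c("t_cells", "t_cells", "epithelial",
               "epithelial", "epithelial", "t_cells")
sample_id <- c("s1", "s1", "s1", "s2", "s2", "s2")

# Cells of each type in each sample
counts_tab <- table(sample_id, cell_type)

# Row-wise proportions: each sample's cell types sum to 1
composition <- prop.table(counts_tab, margin = 1)
print(round(composition, 2))
```

The resulting proportions could then be compared against sample-level phenotypes.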

S.P.S. is a founder, shareholder, and consultant of Contextual Genomics Inc.
