Prediction of intercellular communication networks using CellComm

doi:10.21203/rs.3.pex-1843/v1

Intercellular communication is important for tissue development and homeostasis, and when dysregulated contributes to a multitude of pathobiological processes. Cells communicate with each other by several mechanisms, including direct cell-cell contacts between membrane-tethered ligands and receptors on the cell surface, through secreted molecules that bind their cognate receptor on the receiving cell, or alternative modalities such as exosomes. The study of cell-cell communication networks using single-cell genomic approaches is now under intensive investigation, and innovative algorithms to interpret the data, infer how cells interact and identify the downstream effectors of a putative binding of a ligand to its cognate receptor are critically needed. Here, we describe a protocol to run CellComm, a data-driven systems biology algorithm that integrates single-cell RNA-sequencing, protein interaction networks, and gene regulatory networks to infer which cells within a heterogeneous tissue are actively communicating, as well as their downstream transcriptional programs. When spatial transcriptomics data is available, CellComm additionally identifies spatially-resolved cell-cell interactions within the tissue. This protocol is associated with our Nature Cell Biology paper describing our algorithm: CellComm infers cellular crosstalk that drives hematopoietic stem and progenitor cell development.

CellComm

scRNA-seq

spatial transcriptomics

cellular communication

intracellular signaling

Visium 10X

# Data

We applied CellComm to Spatial Transcriptomics data generated using the 10X Visium technology. This dataset was obtained from the article Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma (Cell 182, 1661–1662) published by Andrew Ji and collaborators in 2020. This dataset profiled/analyzed the tumor and tumor microenvironment (TME) of cutaneous squamous cell carcinoma (cSCC). In summary, this work identified a tumor-specific keratinocyte (TSK) population that interacts with a fibrovascular niche.

# Before starting...

More information about CellComm and additional tutorials can be found in https://github.com/edroaldo/fusca

Before you start you should create a folder for your analyses. It must contain the folders filtered_feature_bc_matrix (containing the files matrix.mtx.gz, barcodes.tsv.gz and features.tsv.gz), spatial (containing the files aligned_fiducials. jpg, detected_tissue_image.jpg, scalefactors_json.json, tissue_hires_image.png, tissue_lowres_image.png and tissue_positions_list.csv) and a folder where the results (especially the figures) will be stored (here we called results).

The files contained in folders filtered_feature_bc_matrix and spatial are the outputs of Visium 10X technology and, therefore, it is possible to analyze any Visium 10X dataset.

1. Install fusca package - Framework for Unified Single-Cell Analysis.

# Code:

install.packages("devtools")

library("devtools")

devtools::install_github("edroaldo/fusca")

2. Load libraries.

# Code:

library("dplyr")

library("fusca")

library("igraph")

library("scales")

library("ggplot2")

3. Set the 1 - working directory as your analysis folder and 2 - base directory as your results folder.

# Code

setwd("/workspace/Edroaldo/CellComm")

bd <- "results"

4. Create a CellRouter object containing expression and image information from ST data.

Here we use the function read10X to read or organize the objects in folder filtered_feature_bc_matrix. Then we create the CellRouter object using the CreateCellRouter function and add the tissue image and data related to it using the read10XImage function.

Used parameters the CreateCellRouter function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- min.genes = 100, defining the minimum number of genes that a cell must express to be considered.

- min.cells = 25, defining the minimum number of cells that a gene is expressed (*counts* > 0) to be considered.

- is.expr = 0, defining the minimum expression value (*counts*) a gene must have to be considered expressed.

Used parameters the read10XImage function:

- assay.type = "ST", defining spatial transcriptomics analysis.

# Code:

sample_names <- c("Sample1")

data <- read10X("filtered_feature_bc_matrix/")

image_paths <- c("spatial/tissue_lowres_image.png")

scalefactor_paths <- c("spatial/scalefactors_json.json")

tissue_paths <- c("spatial/tissue_positions_list.csv")

cellrouter <- CreateCellRouter(data, assay.type = "ST",

min.genes = 100, min.cells = 25, is.expr = 0)

cellrouter <- read10XImage(cellrouter, assay.type = "ST",

sample_names, image_paths, scalefactor_paths, tissue_paths)

5. Preprocess the expression data.

Here we use the function FindVariableGenes to identify the genes with the highest variance of expression, normalize and scale the expression data using the Normalize and scaleData functions and, for the last (scaleData), we use the 3000 most variable genes.

Used parameters the FindVariableGenes function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- method = "coefficient_variation", defining the method for the identification of the most variable genes

- pvalue = "0.05", defining the *p*-value threshold for the considered genes.

Used parameters the Normalize function:

- assay.type = "ST", defining spatial transcriptomics analysis.

Used parameters the ScaleData function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- genes.use = character, defining the genes to center/scale. Here we used the 3000 most variable genes.

- blocksize = numeric, defining the size of the blocks in which genes will be scaled.

# Code:

var.genes <- FindVariableGenes(cellrouter, assay.type = "ST",

method = "coefficient_variation", pvalue = 0.05)

var.genes <- var.genes[order(var.genes$adj.pvalue, decreasing = FALSE), ]

[email protected] <- rownames(var.genes[1:3000, ])

cellrouter <- Normalize(cellrouter, assay.type = "ST")

cellrouter <- scaleData(cellrouter, assay.type = "ST",

genes.use = [email protected], blocksize = nrow(cellrouter@assays$ST@ndata))

6. Dimensionality reduction and clustering.

Then we reduce the dimensionality of the data using linear methods, with Principal Component Analysis (PCA) using the computePCA function and non-linear methods, with the t-Distributed Stochastic Neighbor Embedding (t-SNE) using the computeTSNE function and the Uniform Manifold Approximation and Projection (UMAP) using the computeUMAP function. As a next step we identified transcriptional clusters using the findClusters function.

Used parameters the computePCA function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- seed = X, sets the seed of R‘s random number generator, which is useful for creating simulations or random objects/analysis that can be reproduced.

- num.pcs = 50, defining the number of principal components to compute.

- genes.use = character; defining the genes to compute the PCA. Here we used the 3000 most variable genes.

Used parameters the computeTSNE function:

- seed = X, sets the seed of R‘s random number generator, which is useful for creating simulations or random objects/analysis that can be reproduced.

- num.pcs = 8, defining the number of principal components to use in the tSNE calculation.

Used parameters the computeUMAP function:

- seed = X, sets the seed of R‘s random number generator, which is useful for creating simulations or random objects/analysis that can be reproduced.

- num.pcs = 8, defining the number of principal components to use in the UMAP calcularion.

- metric = "euclidean", defining the distances between data points to compute.

- n_neighbors = 15, defining the number of nearest neighbors.

- spread = 1, defining the effective scale of embedded points.

- min_dist = 0.1, defining how close the points appear in the final layout.

Used parameters the findClusters function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- num.pcs = 8, defining the number of principal components to use for clustering

- nn.type = "knn", defining the method to find the *k*-nearest neighbors graph.

- k = 20, defining the number of nearest neighbors to build a *k*-nearest neighbors graph.

# Code:

cellrouter <- computePCA(cellrouter, assay.type = "ST",

seed = 42, num.pcs = 50, genes.use = [email protected])

plot(cellrouter@pca$sdev)

cellrouter <- computeTSNE(cellrouter,

seed = 1, num.pcs = 8, max_iter = 1000)

cellrouter <- computeUMAP(cellrouter,

seed = 1, num.pcs = 8, metric = "euclidean", n_neighbors = 15, spread = 1, min_dist = 0.1)

cellrouter <- findClusters(cellrouter, assay.type = "ST",

num.pcs = 8, nn.type = "knn", k = 20)

plotReducedDimension(cellrouter, assay.type = "ST", reduction.type = "tsne", annotation = "population", annotation.color = "colors",

dotsize = 1.5, showlabels = TRUE, labelsize = 5, convex = FALSE)

plotReducedDimension(cellrouter, assay.type = "ST", reduction.type = "umap", annotation = "population", annotation.color = 'colors',

dotsize = 1.5, showlabels = TRUE, labelsize = 5, convex = FALSE)

7. Calculate the Tumor-Specific Keratinocytes (TSK) and Fibrovascular cell gene-sets score.

Used parameters the scoreGeneSets function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- genes.list = character, defining the genes to score.

- bins = 25, defining the bin number to divide the expression data.

# Code:

TSK <- c('FEZ1', 'IL24', 'INHBA', 'KCNMA', 'LAMC2', 'MAGEA4', 'MMP10', 'NT5E')

Fibrovascular <- c('CAV1', 'CD93', 'ENG', 'FN1', 'ITGB1')

genelist <- list(TSK = TSK, Fibrovascular = Fibrovascular)

cellrouter <- scoreGeneSets(cellrouter, assay.type = "ST", genes.list = genelist, bins = 25)

plotSignatureScore(cellrouter, assay.type = 'ST', names(genelist), reduction.type='umap', threshold = 2, columns = 2, dotsize=0.5)

plotSpatialClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "population", colors.column = "colors", point.size = 1.7,

annotate_centroids = FALSE, title = NULL)

plotSelectedClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "population", colors.column = "colors", point.size = 1.2,

annotate_centroids = FALSE, title = NULL,

selected_clusters = list(c(as.character(1:9))), facets = TRUE, num.cols = 3)

8. Transfer cell type annotations from the sc-RNAseq atlas of cSCC generated by Ji et al. 2020 to clusters identified from the ST data.

# Code:

tmp <- recode(as.character(cellrouter@assays$ST@sampTab$population),

"1"="Fibrovascular-like",

"7"='TSK',

"4"="Lymphoid-like1",

"5"="Lymphoid-like2",

"6"="Lymphoid-like2",

"5"="Basal-like", # !

"8"="KC-diff-like1",

"9"="KC-diff-like2",

"2"="KC-diff-like3",

"3"="KC-diff-like4")

names(tmp) <- rownames(cellrouter@assays$ST@sampTab)

df <- data.frame(cluster = cellrouter@assays$ST@sampTab$population, celltype = tmp)

rownames(df) <- rownames(cellrouter@assays$ST@sampTab)

cellrouter <- addInfo(cellrouter, assay.type = "ST", sample.name = sample_names[[1]],

metadata = df, colname = "celltype",

metadata.column = "celltype")

plotReducedDimension(cellrouter, assay.type = 'ST',

reduction.type = 'umap', annotation="celltype", annotation.color = 'celltype_color',

dotsize=1, showlabels = FALSE, labelsize=2.5, convex=FALSE)

plotSpatialClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", colors.column = "celltype_color",

point.size = 2, title = NULL, annotate_centroids = FALSE)

TSK <- c('FEZ1', 'IL24', 'LAMC2', 'MMP10', 'NT5E')

plotDRExpression(cellrouter, assay.type = "ST",

reduction.type = "umap", dims.use = c(1, 2),

genelist = TSK[1:5], title = "", columns = 5, dotsize = 1, threshold = 5)

TSK <- c('FEZ1', 'IL24', 'INHBA', 'KCNMA', 'LAMC2', 'MAGEA4', 'MMP10', 'NT5E')

gene_plots <- plotSpatialExpression(cellrouter, assay.type = "ST", sample.name = "Sample1",

genes=c('TNC', 'MMP9'),

point.size = 1.7)

gene_plots

9. Identify cluster-specific gene signatures and marker genes.

Used parameters the *findSignatures* function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- column = "celltype", defining the column with the groups in metadata.

- pos.only = TRUE, defining to get only the up-regulated genes.

- fc.threshold = 0.2, defining the fold change threshold.

# Code:

markers <- findSignatures(cellrouter, assay.type = "ST",

column = "celltype", pos.only = TRUE, fc.threshold = 0.2); gc();

markers.top <- markers %>% group_by(population) %>% top_n(10, fc)

markers.top <- as.data.frame(markers.top)

plot <- plotSignaturesHeatmap(cellrouter, assay.type = 'ST',

markers.top, genes.show = as.vector(markers.top$gene),

column.ann = 'celltype', column.color = 'celltype_color',

num.cells = 700, threshold = 2)

# **# CellComm in *cluster-cluster* communication**

Next, we used CellComm to infer spatially-informed cell-cell interactions by calculating an interaction score that takes into account the number of ligand-receptor interactions co-expressed between cell types and their spatial distances on the tissue.

10. Calculate the mean expression of ligands and receptors per cluster.

Here we accessed a list of ligand-receptor interactions published by Robin Browaeys and collaborators in 2020 (NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17, 159–162 (2020)) and averaged the expression of each ligand/receptor as well as the average mean expression of these in each transcriptional cluster of the spots using the computeValue and calculateObservedMean functions.

Used parameters the computeValue function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- genelist = character, defining the genes to use in the analysis.

- column = "celltype", defining the column with the groups in metadata.

- fun = "mean", statistical function to summary the gene expression data.

Used parameters the population.pairing function:

- mean.expr = mean.expr, object defined/calculated by the *computeValue* function.

- pairs = pairs, object with ligands/receptors pairs.

- ligands = ligands, defining the list of genes annotated as ligands.

- receptors = receptors, defining the list of genes annotated as receptors.

- threshold = 0.25, threshold to select the genes in the population according to their mean expression.

Used parameters the calculateObservedMean function:

- mean.expr = mean.expr, object defined/calculated by the *computeValue* function.

- interactions = interactions, object defined/calculated by the *population.pairing* function.

Used parameters the findSignatures function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- column = "celltype", defining the column with the groups in metadata.

- pos.only = TRUE, defining to get only the up-regulated genes.

- fc.threshold = 0.2, defining the fold change threshold.

# Code:

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))# from NicheNet

head(lr_network)

pairs <- lr_network

pairs$Pair.Name <- paste(pairs$from, pairs$to, sep = "_")

ligands <- unique(lr_network$from)

ligands <- intersect(ligands, rownames(cellrouter@assays$ST@ndata))

receptors <- unique(lr_network$to)

receptors <- intersect(receptors, rownames(cellrouter@assays$ST@ndata))

ligands.receptors <- unique(c(ligands, receptors))

mean.expr <- computeValue(cellrouter, assay.type = "ST",

genelist = ligands.receptors, "celltype", fun = "mean"); gc();

interactions <- population.pairing(mean.expr = mean.expr, pairs = pairs, ligands = ligands, receptors = receptors, threshold = 0.25)

interactions <- calculateObservedMean(mean.expr = mean.expr, interactions = interactions)

head(interactions)

markers <- findSignatures(cellrouter, assay.type = "ST",

column = "celltype", pos.only = TRUE, fc.threshold = 0.2); gc();

11. Calculate a null distribution of intracluster gene expression means.

Next, CellComm shuffles cell state labels to calculate a null distribution of intracluster gene expression means and calculate an empirical p-value, which is used to filter out ligand-receptor interactions observed by chance. This algorithm predicts potential cluster-cluster interaction networks based on ligand-receptor pairings observed more often than expected by chance.

Used parameters the clusterPermutation function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- nPerm = 1000, defining the number of permutations.

- interactions = interactions, object defined/calculated by the *population.pairing* function.

- genelist = character, defining the genes to use in the analysis.

- cluster.label = "celltype", defining the column with the groups in metadata.

Used parameters the calculatePvalue function:

- p = p, object defined/calculated by the *clusterPermutation* function.

- nPerm = 1000, defining the number of permutations.

- interactions = interactions, object defined/calculated by the *population.pairing* function.

# Code:

genelist <- unique(c(interactions$ligand, interactions$receptor))

p <- clusterPermutation(cellrouter, assay.type = "ST",

genelist = genelist, interactions = interactions, cluster.label = "celltype", nPerm = 1000)

interactions.p <- calculatePvalue(p, nPerm = 1000, interactions2 = interactions)

#saveRDS(interactions.p, file = paste(bd, "/interactions_with_Pvalue_1000_cluster.rds", sep = ""))

#interactions.p <- readRDS(paste(basedir, "/interactions_with_Pvalue_1000_cluster.rds", sep = ""))

tmp <- interactions.p[which(interactions.p$pvalue < 0.01),]

head(tmp)

12. Calculate a network based on the number of interactions (ligand/receptors), cluster centroids and distances between these clusters.

Used parameters the cellnetwork3 function:

- threshold = 5, defining the number of elements in the pair for it to be selected.

Used parameters the calculateCentroids function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- sample.name = "Sample1", defining the name of the sample.

- cluster.column = "celltype", defining the column with the groups in metadata.

- cluster.type = "Cluster", defining the column where the clusters are indicated. It could be either 'Cluster' or 'Subcluster'.

Used parameters the calculateDistanceMatrix function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- sample.name = "Sample1", defining the name of the sample.

- cluster.type = "Cluster", defining the column where the clusters are indicated. It could be either 'Cluster' or 'Subcluster'.

- spot_distance = 100, defining the distance between the center of the spots.

- normalize = FALSE, normalize the distances dividing by the total size of the tissue.

# Code:

my_matrix <- interactionmatrix(tmp)

head(matrix)

my_graph <- cellnetwork3(tmp, threshold = 5)

cellrouter <- calculateCentroids(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", cluster.type = "Cluster")

cellrouter <- calculateDistanceMatrix(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.type = "Cluster", spot_distance = 100, normalize = FALSE)

plots <- predictCellInteractions(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.type = "Cluster", graph = my_graph, distance.threshold = 0.75)

gridExtra::grid.arrange(grobs = plots, ncol = 3)

plotSelectedClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", colors.column = "celltype_color",

selected_clusters = list(c("TSK", "Fibrovascular-like", "KC-diff-like3", "Lymphoid-like2", "KC-diff-like4")),

point.size = 1.5, title = NULL, annotate_centroids = TRUE)

13. Comparison with the original publication.

We selected ligand-receptor pairs identified by CellComm based on the subset of ligands identified by the original publication, to make the analysis more comparable with the original publication. Specifically, we inferred ligand-receptor pairs between co-localized tumor-TME or TME-tumor interactions.

# Code:

x <- tmp[which(tmp$ligand %in% c("CCL2", "CCL5", "CTGF", "CXCL10", "CXCL16", "MMP9", "PGF", "TGFB1", "TIMP1", "TNC", "VEGFA")), ]

# Tumor -> Stroma

tumor <- c("TSK_Fibrovascular-like", 'TSK_Lymphoid-like2', "TSK_KC-diff-like4")

gene.pairs <- plotPairDotplot(x, interactionList = tumor, num.pairs = 50)

head(gene.pairs$markers)

print(gene.pairs$plot)

# Stroma -> Tumor

TME <- c("Fibrovascular-like_TSK", 'Lymphoid-like2_TSK', "KC-diff-like4_TSK")

gene.pairs <- plotPairDotplot(x, interactionList = TME, num.pairs = 50)

head(gene.pairs$markers)

print(gene.pairs$plot)

plotSelectedClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", colors.column = "celltype_color",

selected_clusters = list(c("TSK")), title = NULL, annotate_centroids = FALSE)

plots <- plotSpatialExpression(cellrouter, assay.type = "ST", sample.name = "Sample1",

point.size = 1.7, genes = c("LRP1", "MMP9", "SDC1", "TNC"))

gridExtra::grid.arrange(grobs = plots, ncol = 3)

dotplot(object = cellrouter, assay.type = "ST",

genes.use = c("LRP1", "MMP9", "SDC1", "TNC"), column = 'celltype', thr = 0.5)

# **# CellComm in *subcluster-subcluster* communication**

We used CellComm to subcluster spatially-resolved transcriptional clusters based on their spatial localization on the tissue, and performed a focused, spatially-resolved, high-resolution analysis of intercellular communication between TSKs, fibrovascular-like and lymphoid-like niche.

# Code:

cellrouter <- findSubclusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", subcluster.column = "Subpopulation",

method = "mclust")

plotSpatialSubclusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", subcluster.column = "Subpopulation",

selected_clusters = list(c("TSK")),

title = NULL, annotate_centroids = FALSE)

plotSpatialSubclusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", subcluster.column = "Subpopulation",

selected_clusters = list(c("Fibrovascular-like")),

title = NULL, annotate_centroids = FALSE)

plotSpatialSubclusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", subcluster.column = "Subpopulation",

selected_clusters = list(c("Lymphoid-like2")),

title = NULL, annotate_centroids = FALSE)

14. Calculate the mean expression of ligands and receptors per subcluster.

Here we accessed a list of ligand-receptor interactions published by Robin Browaeys and collaborators in 2020 (NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17, 159–162 (2020)) and averaged the expression of each ligand/receptor as well as the mean mean expression of these in each transcriptional subclusters of the spots using the *computeValueSubclusters* and *calculateObservedMean* functions.

Used parameters the *computeValueSubclusters* function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- genelist = character, defining the genes to use in the analysis.

- column = "celltype", defining the column with the groups in metadata.

- fun = "mean", statistical function to summary the gene expression data.

- clusters = character, defining the selected clusters.

Used parameters the *population.pairing* function:

- mean.expr = mean.expr.sub, object defined/calculated by the *computeValueSubclusters* function.

- pairs = pairs, object with ligands/receptors pairs.

- ligands = ligands, defining the list of genes annotated as ligands.

- receptors = receptors, defining the list of genes annotated as receptors.

- threshold = 0.25, threshold to select the genes in the population according to their mean expression.

Used parameters the *calculateObservedMean* function:

- mean.expr = mean.expr.sub, object defined/calculated by the *computeValueSubclusters* function.

- interactions = interactions, object defined/calculated by the *population.pairing* function.

Used parameters the *findSignatures* function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- column = "celltype", defining the column with the groups in metadata.

- pos.only = TRUE, defining to get only the up-regulated genes.

- fc.threshold = 0.2, defining the fold change threshold.

lr_network = readRDS(url("https://zenodo.org/record/3260758/files/lr_network.rds"))# from NicheNet

head(lr_network)

pairs <- lr_network

pairs$Pair.Name <- paste(pairs$from, pairs$to, sep = "_")

ligands <- unique(lr_network$from)

ligands <- intersect(ligands, rownames(cellrouter@assays$ST@ndata))

receptors <- unique(lr_network$to)

receptors <- intersect(receptors, rownames(cellrouter@assays$ST@ndata))

ligand.receptors <- unique(c(ligands, receptors))

mean.expr.sub <- computeValueSubclusters(cellrouter, assay.type = "ST",

genelist = ligand.receptors, column = "celltype", fun = "mean",

clusters = c("TSK", "Fibrovascular-like", "Lymphoid-like2")); gc();

interactions <- population.pairing(mean.expr = mean.expr.sub, pairs, ligands = ligands, receptors = receptors, threshold = 0.25)

interactions <- calculateObservedMean(interactions, mean.expr.sub)

head(interactions)

markers <- findSignatures(cellrouter, assay.type = 'ST',

column = 'Subpopulation', pos.only = TRUE, fc.threshold = 0.2); gc();

15. Calculate a null distribution of intrasubcluster gene expression means.

Next, CellComm shuffles cell state labels to calculate a null distribution of intrasubcluster gene expression means and calculate an empirical p-value, which is used to filter out ligand-receptor interactions observed by chance. This algorithm predicts potential subcluster-subcluster interaction networks based on ligand-receptor pairings observed more often than expected by chance.

Used parameters the clusterPermutationSubcluster function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- nPerm = 1000, defining the number of permutations.

- interactions = interactions, object defined/calculated by the *population.pairing* function.

- genelist = character, defining the genes to use in the analysis.

- cluster.label = "celltype", defining the column with the groups in metadata.

- clusters = character, defining the selected clusters.

Used parameters the calculatePvalue function:

- p = p, object defined/calculated by the clusterPermutation function.

- nPerm = 1000, defining the number of permutations.

- interactions = interactions, object defined/calculated by the *population.pairing* function.

genelist <- unique(c(interactions$ligand, interactions$receptor))

p <- clusterPermutationSubcluster(cellrouter, assay.type = "ST",

genelist = genelist, interactions = interactions, cluster.label = "celltype", nPerm = 1000,

clusters = c("TSK", "Fibrovascular-like", "Lymphoid-like2"))

interactions.p <- calculatePvalue(p, nPerm = 1000, interactions2 = interactions)

saveRDS(interactions.p, file = paste(bd, '/interactions_subcluster_with_Pvalue.rds', sep=""))

tmp <- interactions.p[which(interactions.p$pvalue < 0.01), ]

head(tmp)

16. Calculate a network based on the number of interactions (ligand/receptors), subcluster centroids and distances between these subclusters.

Used parameters the cellnetwork3 function:

- threshold = 5, defining the number of elements in the pair for it to be selected.

Used parameters the calculateCentroids function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- sample.name = "Sample1", defining the name of the sample.

- cluster.column = "Subpopulation", defining the column with the groups in metadata.

- cluster.type = "Subcluster", defining the column where the clusters are indicated. It could be either 'Cluster' or 'Subcluster'.

Used parameters the calculateDistanceMatrix function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- sample.name = "Sample1", defining the name of the sample.

- cluster.type = "Subcluster", defining the column where the clusters are indicated. It could be either 'Cluster' or 'Subcluster'.

- spot_distance = 100, defining the distance between the center of the spots.

- normalize = FALSE, normalize the distances dividing by the total size of the tissue.

# Code:

my_matrix <- interactionmatrix(tmp)

head(matrix)

my_graph <- cellnetwork3(tmp, threshold = 5)

cellrouter <- calculateCentroids(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "Subpopulation", cluster.type = "Subcluster")

cellrouter <- calculateDistanceMatrix(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.type = "Subcluster", spot_distance = 100, normalize = FALSE)

plots <- predictCellInteractions(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.type = "Subcluster", graph = my_graph, distance.threshold = 0.35) # Cluster: distance.threshold = 0.75

gridExtra::grid.arrange(grobs = plots, ncol = 3)

plotSelectedClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", colors.column = "celltype_color",

selected_clusters = list(c("Fibrovascular-like", "KC-diff-like3", "KC-diff-like4", "Lymphoid-like2", "TSK")),

point.size = 1.5, title=NULL, annotate_centroids=TRUE)

x <- tmp[which(tmp$ligand %in% c("CCL2", "CCL5", "CTGF", "CXCL10", "CXCL16", "MMP9", "PGF", "TGFB1", "TIMP1", "TNC", "VEGFA")), ]

17. Comparison with the original publication.

We selected ligand-receptor pairs identified by CellComm based on the subset of ligands identified by the original publication, to make the analysis more comparable with the original publication. Specifically, we inferred ligand-receptor pairs between co-localized tumor-TME or TME-tumor interactions.

# Code:

# Tumor -> Stroma

tumor <- c("TSK-1_Fibrovascular-like-4", "TSK-2_Fibrovascular-like-4", "TSK-3_Fibrovascular-like-4", "TSK-4_Fibrovascular-like-4")

gene.pairs_01 <- plotPairDotplot(x, interactionList = tumor, num.pairs = 50)

head(gene.pairs_01$markers)

plot(gene.pairs_01$plot)

# Stroma -> Tumor

TME <- c("Fibrovascular-like-4_TSK-1", "Fibrovascular-like-4_TSK-2", "Fibrovascular-like-4_TSK-3", "Fibrovascular-like-4_TSK-4")

gene.pairs_02 <- plotPairDotplot(x, interactionList = TME, num.pairs = 50)

head(gene.pairs_02$markers)

plot(gene.pairs_02$plot)

plotSelectedClusters(cellrouter, assay.type = "ST", sample.name = "Sample1",

cluster.column = "celltype", colors.column = "celltype_color",

selected_clusters = list(c("TSK")), title = NULL, annotate_centroids = FALSE)

plots <- plotSpatialExpression(cellrouter, assay.type = "ST", sample.name = "Sample1",

point.size = 1.7, genes = c("LRP1", "MMP9", "SDC1", "TNC"),

cluster.column = "Subpopulation", selected_clusters = c("Fibrovascular-like-4", "TSK-1"))

gridExtra::grid.arrange(grobs = plots, ncol = 3)

18. Predict a gene regulatory network (GRN) and create/define a protein-protein interaction network (PPI) for intracellular signaling.

Then we identified transcription factors and transcription regulators based on gene ontology annotations and defined two igraph objects (networks) containing the GRN and the PPI network using the *buildGRN* and *createPPI* functions. The PPI network used data from the *iRefIndex* database.

Used parameters the *buildGRN* function:

- assay.type = "ST", defining spatial transcriptomics analysis.

- species = "Hs", defining the species: *Hs* for *Homo Sapiens* or *Mm* for *Mus Musculus*.

- genes.use = character, defining the genes to to include in the gene regulatory network.

- blocksize = numeric, defining the size of the blocks in which genes will be scaled.

- zscore = 4, defining the threshold to identify putative regulatory interactions.

- filename = character, filename where the GRN data will be saved.

Used parameters the *createPPI* function:

- filename = character, protein interaction data.

- expr = cellrouter@assays$ST@ndata, gene expression in each cell.

# Code:

filename <- paste(bd, "/GRN_P4R1.R", sep = "")

grn.data <- buildGRN(cellrouter, assay.type = "ST",

species = "Hs", genes.use = rownames(cellrouter@assays$ST@ndata), blocksize = nrow(cellrouter@assays$ST@ndata), zscore = 4,

filename = filename)

filename <- "9606.mitab.05-29-2019_zcat.txt"

inetwork <- createPPI(filename = filename, expr = cellrouter@assays$ST@ndata)

filename <- paste(bd, "/9606.mitab.05-29-2019_zcat.R", sep = "")

save(inetwork, file = filename)

19. Create a weighted PPI network based on cell-type (cluster/subcluster-type) specific correlation coefficients.

Used parameters the *cellnetworksPPI2* function:

- ppi = inetwork$network, defining the network as calculated by the *createPPI* function.

- expr = cellrouter@assays$ST@ndata, defining the gene expression in each cell/spot.

- samples = cellrouter@assays$ST@sampTab, defining the metadata information of the cells/spots.

- column = "Subpopulation", defining the column with the groups in metadata.

- corThr1 = 0.10, values smaller then corThr1 in the correlation matrix will be replaced by 0. Therefore, there will be no connections between the genes with value 0.

- corThr2 = 0.25, keep gene-gene interactions with correlation higher than the threshold.

- dir.prefix = "", defining the prefix of the directory where the files will be saved.

# Code:

#inetwork <- get(load(paste(basedir, "/9606.mitab.05-29-2019_zcat.R", sep = "")))

corTables <- cellnetworksPPI2(ppi = inetwork$network, expr = cellrouter@assays$ST@ndata,

samples = cellrouter@assays$ST@sampTab, column = 'Subpopulation',

corThr1 = 0.10, corThr2 = 0.25, dir.prefix = "")

grn.data <- get(load(paste(bd, '/GRN_P4R1.R', sep=""))) # !

20. Find paths between sources (receptors) and targets (transcription factors).

We had filtered out 1 - statistically significant interactions between subclusters (using a null distribution of intrasubcluster gene expression means), 2 - ligands identified by the original publication (*CCL2*, *CCL5*, *CTGF*, *CXCL10*, *CXCL16*, *MMP9*, *PGF*, *TGFB1*, *TIMP1*, *TNC*, *VEGFA*), 3 - interactions between tumor (TSK-1, -2, -3 and -4) and stroma (Fibrovascular-like-4). Now we define possible *sources* (receptors) and *targets/sinks* (all transcription factors in the expression data) of intracellular signaling. Then, we identify pathways connecting source and target populations using the *findpaths.simpleRJava* function.

Next, CellComm calculates an activity score for each receptor-to-regulator path based on the assumption that critically important downstream transcriptional regulators have their regulons (predicted target genes in our reconstructed gene regulatory network) enriched in cell/cluster/subcluster type-specific signatures, implicating the transcriptional regulator as a major determinant of cell/cluster/subcluster type identity. This computational approach allows CellComm to predict putative signaling pathways and downstream transcriptional networks without prior knowledge of pathway structure. We used the *summarize.flow* function to Preprocess the output of *findpaths.simpleRJava* to a format acceptable by the function *activeSignaling* which in turn is responsible for identifying the transcriptional regulators for which their predicted regulons are significantly enriched in cluster or cell type-specific gene signatures. The *rankpaths* function computes and ranks pathways from cell surface receptors to downstream transcriptional regulators based.

Used parameters the findpaths.simpleRJava function:

- cellcomm = cellcomm, CellComm object.

- sources.targets = sts2Sreceptor[[interaction]], list of cell surface receptors and downstream transcriptional regulators.

- file = filename to save intermediary results.

- dir = subfolder to save intermediary results.

- maindir = basedir from which subfolders will be created.

Used parameters the summarize.flow function:

- files = files, list of files pointing to flow networks containing raw output from findpaths.SimpleRJava function.

Used parameters the activeSignaling function:

- data = cellrouter@assaysSST@ndata, normalized gene expression data.

- sampTab = cellrouter@assays$ST@sampTab, study metadata, containing a column identifying clusters or cell types.

- grn = grn, the gene regulatory network previously reconstructed from the gene expression data.

- markers = markers, result of running findSignatures. It is a table containing differential expression results.

- summary = summary, output of function summary.flow.

- fc = 0.2, fold change used as a node size parameter in igraph graph generated by CellComm and determine whether to show or not the label for that node.

Used parameters the rankpaths function:

- summary = summary, the output of summarize.flow function.

- apathways = apathways$pathways, the output of activeSignaling function.

# Code:

tfs <- intersect(grn.data$tfs, rownames(cellrouter@assays$ST@ndata))

x <- rbind(gene.pairs_01$markers, gene.pairs_02$markers)

head(x, n = 3)

sts2 <- list()

for(i in unique(x$celltypes)){

tmp <- x[which(x$celltypes == i),]

rs <- unique(tmp$receptor)

sts2[["receptor"]][[i]]$sources <- rs # receptors

sts2[["receptor"]][[i]]$targets <- tfs # TFs (all)

}

# Example

#head(sts2$receptor$`TSK-1_Fibrovascular-like-4`)

wd_package <- getwd()

setwd(wd_package)

maindir <- paste0(wd_package, '')

# tumor <- c("TSK-1_Fibrovascular-like-4", "TSK-2_Fibrovascular-like-4", "TSK-3_Fibrovascular-like-4", "TSK-4_Fibrovascular-like-4")

# TME <- c("Fibrovascular-like-4_TSK-1", "Fibrovascular-like-4_TSK-2", "Fibrovascular-like-4_TSK-3", "Fibrovascular-like-4_TSK-4")

interactionList <- c(tumor, TME)

cellcomm <- CreateCellComm()

for(interaction in interactionList){

interaction <- gsub(" ", ".", interaction)

print(interaction)

celltype <- strsplit(interaction, split = "[_]")[[1]][2]

cat("Using ", celltype, " coexpression network\n")

filename <- paste(celltype, ".txt", sep='')

dirname <- gsub(("[.]"), ".", interaction)

cellcomm <- findpaths.simpleRJava(cellcomm, sources.targets = sts2$receptor[[interaction]], file = filename, dir = dirname, maindir = maindir)

}

files <- list()

for(i in interactionList){

f <- paste(strsplit(i, split = "_")[[1]][2], ".txt", sep = "")

files[[i]] <- f

}

summary <- summarize.flow(files, prefix = "")

grn <- grn.data$GRN_table

colnames(grn) <- c("target", "reg", "zscore", "corr")

sampTab <- cellrouter@sampTab

apathways <- activeSignaling(data = cellrouter@assays$ST@ndata, sampTab = cellrouter@assays$ST@sampTab, grn = grn, markers = markers, summary = summary, fc = 0.2)

npaths <- rankpaths(summary, apathways$pathways)

options(repr.plot.width = 7, repr.plot.height = 7)

a1 <- signaling2TF(npaths, num.pathways = 20, interaction = tumor[1])

a2 <- signaling2TF(npaths, num.pathways = 20, interaction = tumor[2])

a3 <- signaling2TF(npaths, num.pathways = 20, interaction = tumor[3])

a4 <- signaling2TF(npaths, num.pathways = 20, interaction = tumor[4])

combined <- do.call(rbind, list(a1$df, a2$df, a3$df, a4$df))

g <- ggplot(combined, aes(path, interaction)) + geom_point(aes(colour=score, size=score)) + #geom_tile(aes(fill = score2)) +

scale_colour_gradientn("Activity score",colours=c("midnightblue","dodgerblue3","white","goldenrod1","darkorange2")) + theme_bw() +

xlab("") + ylab("") +

theme(legend.position="top", axis.text.x = element_text(size=rel(1), angle=45, hjust=1),

panel.grid.minor = element_blank(),

axis.ticks=element_blank(),#)+#,

panel.border=element_rect(fill = NA, colour=alpha('black', 1),size=1)) +

guides(colour = guide_colourbar(title.position="top", title.hjust = 0.5),

size = guide_legend(title.position="top", title.hjust = 0.5)) + coord_flip()

g

a5 <- signaling2TF(npaths, num.pathways = 15, interaction = TME[1])

a6 <- signaling2TF(npaths, num.pathways = 15, interaction = TME[2])

a7 <- signaling2TF(npaths, num.pathways = 15, interaction = TME[3])

a8 <- signaling2TF(npaths, num.pathways = 15, interaction = TME[4])

combined <- do.call(rbind, list(a5$df, a6$df, a7$df, a8$df))

g <- ggplot(combined, aes(path, interaction)) + geom_point(aes(colour=score, size=score)) + #geom_tile(aes(fill = score2)) +

scale_colour_gradientn("Activity score",colours=c("midnightblue","dodgerblue3","white","goldenrod1","darkorange2")) + theme_bw() +

xlab("") + ylab("") +

theme(legend.position="top", axis.text.x = element_text(size=rel(1), angle=45, hjust=1),

panel.grid.minor = element_blank(),

axis.ticks=element_blank(),#)+#,

panel.border=element_rect(fill = NA, colour=alpha('black', 1),size=1)) +

guides(colour = guide_colourbar(title.position="top", title.hjust = 0.5),

size = guide_legend(title.position="top", title.hjust = 0.5)) + coord_flip()

g

Step: 7

Problem: The function scoreGeneSets creates columns in the metadata table that, if run more than once, generates column names different from the desired one, with suffix “.1”, “.2”, etc.

Solution: If the function has been run more than once, it is suggested to delete the columns added in the metadata table.

Step: 19

Problem: Networks with too many elements can cause java to crash.

Solution: Use higher values in the network trimming parameters, corThr1 e corThr2.

Time end-to-end for a beginner user should be around 4 hours (the computation of the null distribution/empirical p is the most computationally demanding step and, consequently, takes longer).

1. Ji, A. L. et al. Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma. Cell. 182(2), 497-514 (2020).

This work was supported by the following grants from the National Institutes of Health (NIH): RC2DK120535 (GQD, TEN), U01HL134812 (GQD, TEN), R24DK092760 (GQD, TEN), R01DK098241 (TEN), F32DK122715, K01DK129409 (WWS). E.L.d.R would like to thank CAPES – Coordination for the Improvement of Higher Education Personnel, Brazil. C.K. was supported by a postdoctoral fellowship from the German Research Foundation (DFG; grant KU 3580/1-1).

Prediction of intercellular communication networks using CellComm

Status:

Version 1

Abstract

Procedure

Troubleshooting

Time Taken

References

Acknowledgements

Associated Publications

Status:

Version 1

Privacy Policy

Terms of Service

Cookie Settings