1. Introduction
Single-cell RNA sequencing (scRNA-seq) and spatial RNA sequencing are
widely used techniques for profiling gene expression in individual
cells; spatial RNA sequencing additionally records where the measured
cells are located in histological sections. These techniques allow
molecular biology to be studied at a resolution that cannot be matched
by bulk sequencing of cell populations. To better visualize dimensional
reduction results and spatial gene expression patterns in single-cell
or spatial experiment data, ggsc provides a set of layer functions
based on the ggplot2 grammar of graphics. It works with the
SingleCellExperiment and Seurat classes, which are widely used for
storing data from single-cell experiments.
2. Installation
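To install the ggsc package from Bioconductor, run the following code
in R:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggsc")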
3. Data pre-processing
Here we use example data from a single sample (sample 151673) of the
human dorsolateral prefrontal cortex (DLPFC), measured with the 10x
Genomics Visium platform. First, standard data pre-processing is
performed with the scater and scran packages.
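library(BiocParallel)
library(ExperimentHub)
library(STexampleData)
library(scater)
library(scran)
library(ggplot2)
library(ggsc)
# create ExperimentHub instance
eh <- ExperimentHub()
# query STexampleData datasets
myfiles <- query(eh, "STexampleData")
# load the example SpatialExperiment object (sample 151673)
spe <- myfiles[["EH7538"]]
# compute per-cell QC metrics; mitochondrial genes are identified by the "MT-" prefix
spe <- addPerCellQC(spe, subsets=list(Mito=grep("^MT-", rowData(spe)$gene_name)))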
colData(spe) |> head()
## DataFrame with 6 rows and 13 columns
## barcode_id sample_id in_tissue array_row
## <character> <character> <integer> <integer>
## AAACAACGAATAGTTC-1 AAACAACGAATAGTTC-1 sample_151673 0 0
## AAACAAGTATCTCCCA-1 AAACAAGTATCTCCCA-1 sample_151673 1 50
## AAACAATCTACTAGCA-1 AAACAATCTACTAGCA-1 sample_151673 1 3
## AAACACCAATAACTGC-1 AAACACCAATAACTGC-1 sample_151673 1 59
## AAACAGAGCGACTCCT-1 AAACAGAGCGACTCCT-1 sample_151673 1 14
## AAACAGCTTTCAGAAG-1 AAACAGCTTTCAGAAG-1 sample_151673 1 43
## array_col ground_truth cell_count sum detected
## <integer> <character> <integer> <numeric> <numeric>
## AAACAACGAATAGTTC-1 16 NA NA 622 526
## AAACAAGTATCTCCCA-1 102 Layer3 6 8458 3586
## AAACAATCTACTAGCA-1 43 Layer1 16 1667 1150
## AAACACCAATAACTGC-1 19 WM 5 3769 1960
## AAACAGAGCGACTCCT-1 94 Layer3 2 5433 2424
## AAACAGCTTTCAGAAG-1 9 Layer5 4 4278 2264
## subsets_Mito_sum subsets_Mito_detected subsets_Mito_percent
## <numeric> <numeric> <numeric>
## AAACAACGAATAGTTC-1 37 9 5.94855
## AAACAAGTATCTCCCA-1 1407 13 16.63514
## AAACAATCTACTAGCA-1 204 11 12.23755
## AAACACCAATAACTGC-1 430 13 11.40886
## AAACAGAGCGACTCCT-1 1316 13 24.22234
## AAACAGCTTTCAGAAG-1 651 12 15.21739
## total
## <numeric>
## AAACAACGAATAGTTC-1 622
## AAACAAGTATCTCCCA-1 8458
## AAACAATCTACTAGCA-1 1667
## AAACACCAATAACTGC-1 3769
## AAACAGAGCGACTCCT-1 5433
## AAACAGCTTTCAGAAG-1 4278
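# total counts vs. number of detected genes, coloured by whether the spot lies in the tissue
colData(spe) |> data.frame() |>
ggplot(aes(x = sum, y = detected, colour = as.factor(in_tissue))) +
geom_point()
# mitochondrial percentage vs. total counts, split by in_tissue
plotColData(spe, x='sum', y = 'subsets_Mito_percent', other_fields="in_tissue") + facet_wrap(~in_tissue)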
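The QC columns plotted above can be reproduced by hand, which helps when reading the plots. The following is a minimal base-R sketch on a made-up count matrix (the matrix, gene names, and variable names are hypothetical). It assumes, as in the addPerCellQC() call above, that mitochondrial genes are those whose symbol starts with "MT-"; in the real data the symbols live in rowData(spe)$gene_name rather than in the row names.

```r
# Minimal sketch: per-cell QC metrics computed by hand on a toy count matrix.
# Rows are genes, columns are cells; all values are made up.
counts <- matrix(
  c(10, 0, 5, 3,    # cell1
     0, 2, 0, 1,    # cell2
     4, 4, 1, 0),   # cell3
  nrow = 4,
  dimnames = list(c("MT-CO1", "MT-ND1", "GENE1", "GENE2"),
                  c("cell1", "cell2", "cell3"))
)

# Same pattern as in the vignette, applied here to the row names
is_mito <- grepl("^MT-", rownames(counts))
mito <- counts[is_mito, , drop = FALSE]

qc <- data.frame(
  sum = colSums(counts),                    # total counts per cell
  detected = colSums(counts > 0),           # genes with non-zero counts
  subsets_Mito_sum = colSums(mito),         # counts from mitochondrial genes
  subsets_Mito_detected = colSums(mito > 0) # mitochondrial genes detected
)
# Percentage of each cell's counts that come from mitochondrial genes
qc$subsets_Mito_percent <- 100 * qc$subsets_Mito_sum / qc$sum
qc
```

A high mitochondrial percentage combined with a low total count is a common warning sign for damaged cells, which is why the second plot above puts these two metrics against each other.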
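First, we keep only the spots that lie within the tissue. Cell-specific
biases are then normalized with the computeSumFactors method, using
the clusters from quickCluster so that similar cells are pooled
together, and log-transformed normalized expression values are
computed with logNormCounts.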
spe <- spe[, spe$in_tissue == 1]
clusters <- quickCluster(
spe,
BPPARAM = BiocParallel::MulticoreParam(workers=2),
block.BPPARAM = BiocParallel::MulticoreParam(workers=2)
)
spe <- computeSumFactors(spe, clusters = clusters, BPPARAM = BiocParallel::MulticoreParam(workers=2))
spe <- logNormCounts(spe)
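logNormCounts() computes log2-transformed normalized values, log2(count / size factor + 1), after centring the size factors to a mean of one. The sketch below shows that transformation in base R on a toy matrix. It is a simplification: it uses plain library-size factors in place of the pooling-based (deconvolution) factors that computeSumFactors() estimates, so its numbers would differ from those stored in spe.

```r
# Minimal sketch of log-normalization with library-size factors.
# NOTE: computeSumFactors() estimates size factors by pooling cells
# (deconvolution); simple library-size factors are used here instead.
counts <- matrix(
  c(10, 0, 5, 3,
     0, 2, 0, 1,
     4, 4, 1, 0),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)   # centred so that the mean is 1

# Divide each column (cell) by its size factor, add a pseudo-count of 1, take log2
normalized <- sweep(counts, 2, size_factors, "/")
logcounts <- log2(normalized + 1)
round(logcounts, 3)
```

Centring keeps the normalized values on roughly the same scale as the raw counts, and the pseudo-count of 1 maps zero counts to zero after the log transform.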
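Next, we perform dimensional reduction with the runPCA and runTSNE
functions provided in the scater package, and cluster the cells with a
graph-based method: a shared nearest-neighbor graph built on the
leading principal components and partitioned with the walktrap
algorithm.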
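Conceptually, the gene selection and PCA steps reduce each cell to a short vector of principal components computed from the most variable genes. A minimal base-R sketch on simulated data follows. It is an approximation: genes are ranked by their raw variance of log-expression, whereas modelGeneVar() and getTopHVGs() rank genes by the biological component left after removing a fitted mean-variance trend. The simulated matrix and all variable names are hypothetical.

```r
# Minimal sketch: select highly variable genes and run PCA in base R.
# Assumption: genes are ranked by raw variance, not by a trend-corrected
# biological component as in modelGeneVar()/getTopHVGs().
set.seed(1)
n_genes <- 200
n_cells <- 50
logcounts <- matrix(rnorm(n_genes * n_cells, mean = 2), nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Give the first 10 genes a structured difference between two groups of cells
group <- rep(c(0, 3), each = n_cells / 2)
logcounts[1:10, ] <- sweep(logcounts[1:10, ], 2, group, "+")

# Keep the top 15% most variable genes (rounded up here)
prop <- 0.15
gene_var <- apply(logcounts, 1, var)
top_hvgs <- names(sort(gene_var, decreasing = TRUE))[seq_len(ceiling(prop * n_genes))]

# PCA expects cells in rows, so transpose the genes-by-cells matrix
pca <- prcomp(t(logcounts[top_hvgs, ]), center = TRUE, scale. = FALSE)

# Keep the leading PCs; 13 mirrors the value chosen by getClusteredPCs() in the vignette
npcs <- 13
pcs <- pca$x[, seq_len(npcs), drop = FALSE]
dim(pcs)
```

Restricting PCA to variable genes concentrates the leading components on biological structure (here, the two simulated groups) rather than on noise from uninformative genes.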
# identify genes that drive biological heterogeneity in the data set by
# modelling the per-gene variance
dec <- modelGeneVar(spe)
# Get the top 15% genes.
top.hvgs <- getTopHVGs(dec, prop=0.15)
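# run PCA on the highly variable genes only
spe <- runPCA(spe, subset_row=top.hvgs)
# choose the number of PCs based on the clustering structure they support
output <- getClusteredPCs(reducedDim(spe), BPPARAM = BiocParallel::MulticoreParam(workers=2))
npcs <- metadata(output)$chosen
npcs
## [1] 13
# keep only the chosen PCs for the downstream steps
reducedDim(spe, "PCAsub") <- reducedDim(spe, "PCA")[,1:npcs,drop=FALSE]
# build a shared nearest-neighbour graph and cluster it with the walktrap algorithm
g <- buildSNNGraph(spe, use.dimred="PCAsub", BPPARAM = MulticoreParam(workers=2))
cluster <- igraph::cluster_walktrap(g)$membership
colLabels(spe) <- factor(cluster)
# t-SNE is stochastic, so fix the seed for reproducibility
set.seed(123)
spe <- runTSNE(spe, dimred="PCAsub", BPPARAM = MulticoreParam(workers=2))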
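The graph-based clustering step links cells that share nearest neighbours in PC space. The sketch below builds such a shared-nearest-neighbour (SNN) adjacency matrix in base R for two simulated groups of cells. Assumptions: edges are weighted by the raw number of shared neighbours, which is similar in spirit to buildSNNGraph(type = "number") but not the default rank-based weighting used above, and the function name snn_graph() is hypothetical.

```r
# Minimal sketch of a shared-nearest-neighbour (SNN) adjacency matrix.
# Assumption: weight = number of shared neighbours (not scran's default rank weighting).
snn_graph <- function(coords, k = 5) {
  n <- nrow(coords)
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                       # a cell is not its own nearest neighbour
  # indices of the k nearest neighbours of each cell
  knn <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])
  w <- matrix(0, n, n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # count shared members of the two neighbour sets; each cell is also
      # counted as part of its own set (a choice made for this sketch)
      w[i, j] <- w[j, i] <- length(intersect(c(i, knn[[i]]), c(j, knn[[j]])))
    }
  }
  w
}

set.seed(42)
coords <- rbind(
  matrix(rnorm(20), ncol = 2),              # 10 cells around (0, 0)
  matrix(rnorm(20, mean = 100), ncol = 2)   # 10 cells around (100, 100)
)
w <- snn_graph(coords, k = 5)
```

Cells in different groups never share neighbours, so no edges connect the two groups and any community detection method will separate them. To cluster the matrix as in the vignette, it could be converted with igraph::graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE) and passed to igraph::cluster_walktrap().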
Dimensional reduction plot
Here, we use the sc_dim function provided in the ggsc package to
visualize the t-SNE reduction result. Unlike many other packages, ggsc
implements the ggplot2 grammar of graphics, so visual elements are
overlaid by combining graphic layers. The sc_dim_geom_label layer adds
cell cluster labels to a dimensional reduction plot and can use
different text geoms through its geom argument, such as
geom_shadowtext from the shadowtext package or geom_text from the
ggplot2 package (the default).
sc_dim(spe, reduction = 'TSNE') +
sc_dim_geom_label(
geom = shadowtext::geom_shadowtext,
color='black',
bg.color='white'
)
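A label layer needs one position per cluster. A common choice, used here purely as an assumption for illustration (ggsc may compute label positions differently), is the per-cluster median of each embedding coordinate. The helper cluster_label_positions() and the toy coordinates below are hypothetical.

```r
# Minimal sketch: one label position per cluster, taken as the per-cluster
# median of each embedding coordinate (an assumption, not ggsc's code).
cluster_label_positions <- function(coords, labels) {
  df <- data.frame(x = coords[, 1], y = coords[, 2], label = labels)
  aggregate(cbind(x, y) ~ label, data = df, FUN = median)
}

# Toy 2-D embedding with two clusters of three cells each
coords <- cbind(c(0, 1, 2, 10, 11, 12), c(0, 0, 3, 5, 5, 8))
labels <- factor(c(1, 1, 1, 2, 2, 2))
pos <- cluster_label_positions(coords, labels)
pos
```

The median is less sensitive than the mean to a few outlying cells, which keeps a label inside the bulk of its cluster.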
Visualize ‘features’ on a dimensional reduction plot
To visualize gene expression on a dimensional reduction plot, ggsc
provides the sc_feature function, which highlights the expression
level of a feature (gene) in each cell.
genes <- c('MOBP', 'PCP4', 'SNAP25', 'HBB', 'IGKC', 'NPY')
target.features <- rownames(spe)[match(genes, rowData(spe)$gene_name)]
sc_feature(spe, target.features[1], slot='logcounts', reduction = 'TSNE')
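The target.features line above maps gene symbols to row names with match(), which returns NA for a symbol that is absent and only the first position when a symbol occurs more than once. The sketch below reproduces the lookup on a toy annotation (the IDs are placeholders, not real Ensembl IDs, and "NOT_A_GENE" is a deliberately missing symbol) and drops symbols that were not found before they reach a plotting function.

```r
# Minimal sketch of the symbol-to-row-name lookup on a toy annotation.
row_ids <- c("ID0001", "ID0002", "ID0003")   # placeholder row names
gene_name <- c("MOBP", "PCP4", "SNAP25")      # matching gene symbols

genes <- c("MOBP", "SNAP25", "NOT_A_GENE")
target.features <- row_ids[match(genes, gene_name)]

# Report and drop symbols that have no matching row
missing <- genes[is.na(target.features)]
if (length(missing) > 0) {
  message("Gene symbols not found: ", paste(missing, collapse = ", "))
}
target.features <- target.features[!is.na(target.features)]
target.features
```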
In addition, ggsc provides the sc_dim_geom_feature layer, which works
with sc_dim to display the cells expressing a gene together with the
cell cluster information.
sc_dim(spe, slot='logcounts', reduction = 'TSNE') +
sc_dim_geom_feature(spe, target.features[1], color='black')
sc_dim(spe, alpha=.3, slot='logcounts', reduction = 'TSNE') +
ggnewscale::new_scale_color() +
sc_dim_geom_feature(spe, target.features, mapping=aes(color=features)) +
scale_color_viridis_d()
It also provides sc_dim_geom_ellipse, which adds confidence ellipses
around the cell clusters, and sc_dim_geom_sub, which selects and
highlights specific clusters of cells.
selected.cluster <- c(1, 6, 8)
sc_dim(spe, reduction = 'TSNE') +
sc_dim_sub(subset=selected.cluster, .column = 'label')
sc_dim(spe, color='grey', reduction = 'TSNE') +
sc_dim_geom_sub(subset=selected.cluster, .column = 'label') +
sc_dim_geom_label(geom = shadowtext::geom_shadowtext,
mapping = aes(subset = label %in% selected.cluster),
color='black', bg.color='white')
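A confidence ellipse summarizes where most cells of a cluster fall in the embedding. The sketch below computes one for a single simulated cluster under a bivariate normal model: the unit circle is scaled by the Cholesky factor of the covariance and by the square root of the chi-squared quantile with two degrees of freedom. This construction is an assumption for illustration; sc_dim_geom_ellipse() may build its ellipses differently, and the helper name confidence_ellipse() is hypothetical.

```r
# Minimal sketch: points on a normal-theory confidence ellipse for one cluster.
confidence_ellipse <- function(xy, level = 0.95, segments = 100) {
  center <- colMeans(xy)
  R <- chol(cov(xy))                         # cov(xy) = t(R) %*% R
  radius <- sqrt(qchisq(level, df = 2))      # squared Mahalanobis radius = chi-squared quantile
  angles <- seq(0, 2 * pi, length.out = segments + 1)
  unit_circle <- cbind(cos(angles), sin(angles))
  # Map the unit circle onto the ellipse and shift it to the cluster centre
  ell <- sweep(radius * unit_circle %*% R, 2, center, "+")
  colnames(ell) <- c("x", "y")
  ell
}

# Simulated cluster of 200 cells in a 2-D embedding
set.seed(7)
xy <- cbind(rnorm(200, mean = 5, sd = 2), rnorm(200, mean = -3, sd = 1))
ell <- confidence_ellipse(xy)
head(ell)
```

Every point of the returned outline lies at the same Mahalanobis distance from the cluster centre, so under the normal model it encloses about 95% of the cluster's cells.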
Violin plot of gene expression
ggsc provides sc_violin to visualize the expression of specific genes
as violin plots. Because the genes share a common legend, their
expression can be compared more intuitively.
sc_violin(spe, target.features[1], slot = 'logcounts',
.fun=function(d) dplyr::filter(d, value > 0)
) +
ggforce::geom_sina(size=.1)
sc_violin(spe, target.features, slot = 'logcounts') +
theme(axis.text.x = element_text(angle=45, hjust=1))
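The .fun argument in the first sc_violin() call filters the plotting data before the violins are drawn. Judging from that call, the function receives a data frame with a value column. The sketch below is a base-R equivalent that avoids the dplyr dependency, applied to a hypothetical toy table (its features and label columns are assumptions, not the exact structure sc_violin() builds).

```r
# Minimal sketch: base-R equivalent of .fun = function(d) dplyr::filter(d, value > 0).
# which() drops rows where value is NA, as dplyr::filter() does.
keep_expressed <- function(d) d[which(d$value > 0), , drop = FALSE]

# Hypothetical long-format table standing in for sc_violin()'s plotting data
d <- data.frame(
  features = rep(c("MOBP", "PCP4"), each = 4),
  label = factor(rep(1:2, times = 4)),
  value = c(0, 1.2, 0, 2.5, 0.3, 0, 0, 1.1)
)
keep_expressed(d)
```

Assuming the column is named value as in the call above, passing .fun = keep_expressed should behave like the dplyr version, showing the distribution among expressing cells only.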
Spatial features
To visualize the spatial expression pattern of genes, ggsc provides
sc_spatial, which displays specific features (genes) together with the
tissue image information.
library(aplot)
f <- sc_spatial(spe, features = target.features,
slot = 'logcounts', ncol = 3,
image.mirror.axis = NULL,
image.rotate.degree = -90
)
f
pp <- lapply(target.features, function(i) {
sc_spatial(spe, features = i, slot = 'logcounts', image.rotate.degree = -90, image.mirror.axis = NULL)
})
aplot::plot_list(gglist = pp)
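The image.rotate.degree = -90 argument used above rotates the tissue image so that it lines up with the spots. The geometry of such a rotation is sketched below for 2-D coordinates in base R. Assumptions: positive angles are counter-clockwise in a y-up coordinate system and the rotation is around the centroid of the points; how sc_spatial() applies the rotation internally (direction, centre, and whether the image y axis points down) may differ. rotate_coords() is a hypothetical helper.

```r
# Minimal sketch: rotate 2-D coordinates by `degree` around their centroid.
rotate_coords <- function(xy, degree) {
  theta <- degree * pi / 180
  # Rotation matrix for column vectors: [cos -sin; sin cos]
  rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
  center <- colMeans(xy)
  centered <- sweep(xy, 2, center, "-")
  # Coordinates are stored as rows, so multiply by the transpose
  sweep(centered %*% t(rot), 2, center, "+")
}

xy <- rbind(c(1, 0), c(-1, 0))
rotate_coords(xy, -90)
```

A rotation preserves all distances between spots, so it only changes the orientation of the tissue in the plot, not the spatial pattern itself.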
Session information
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 4.3.1 (2023-06-16)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] aplot_0.1.10 ggsc_1.0.2
## [3] scran_1.30.0 scater_1.30.0
## [5] ggplot2_3.4.2 scuttle_1.12.0
## [7] STexampleData_1.9.0 SpatialExperiment_1.12.0
## [9] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
## [11] Biobase_2.62.0 GenomicRanges_1.54.1
## [13] GenomeInfoDb_1.38.0 IRanges_2.36.0
## [15] S4Vectors_0.40.1 MatrixGenerics_1.14.0
## [17] matrixStats_1.0.0 ExperimentHub_2.10.0
## [19] AnnotationHub_3.10.0 BiocFileCache_2.10.1
## [21] dbplyr_2.3.3 BiocGenerics_0.48.1
## [23] BiocParallel_1.36.0
##
## loaded via a namespace (and not attached):
## [1] RcppAnnoy_0.0.21 splines_4.3.1
## [3] later_1.3.1 ggplotify_0.1.1
## [5] bitops_1.0-7 filelock_1.0.2
## [7] tibble_3.2.1 polyclip_1.10-4
## [9] lifecycle_1.0.3 edgeR_4.0.1
## [11] globals_0.16.2 lattice_0.21-8
## [13] MASS_7.3-60 magrittr_2.0.3
## [15] plotly_4.10.2 limma_3.58.1
## [17] sass_0.4.6 rmarkdown_2.23
## [19] jquerylib_0.1.4 yaml_2.3.7
## [21] metapod_1.10.0 httpuv_1.6.11
## [23] Seurat_4.3.0.1 sctransform_0.3.5
## [25] spatstat.sparse_3.0-2 sp_2.0-0
## [27] reticulate_1.30 pbapply_1.7-2
## [29] cowplot_1.1.1 DBI_1.1.3
## [31] RColorBrewer_1.1-3 abind_1.4-5
## [33] zlibbioc_1.48.0 Rtsne_0.16
## [35] purrr_1.0.1 RCurl_1.98-1.12
## [37] yulab.utils_0.0.6 tweenr_2.0.2
## [39] rappdirs_0.3.3 GenomeInfoDbData_1.2.10
## [41] ggrepel_0.9.3 irlba_2.3.5.1
## [43] spatstat.utils_3.0-3 listenv_0.9.0
## [45] goftest_1.2-3 spatstat.random_3.1-5
## [47] dqrng_0.3.0 fitdistrplus_1.1-11
## [49] parallelly_1.36.0 DelayedMatrixStats_1.24.0
## [51] leiden_0.4.3 codetools_0.2-19
## [53] DelayedArray_0.28.0 ggforce_0.4.1
## [55] prettydoc_0.4.1 tidyselect_1.2.0
## [57] farver_2.1.1 ScaledMatrix_1.10.0
## [59] viridis_0.6.3 spatstat.explore_3.2-1
## [61] jsonlite_1.8.7 BiocNeighbors_1.20.0
## [63] ellipsis_0.3.2 progressr_0.13.0
## [65] ggridges_0.5.4 survival_3.5-5
## [67] ggnewscale_0.4.9 tools_4.3.1
## [69] ica_1.0-3 Rcpp_1.0.11
## [71] glue_1.6.2 gridExtra_2.3
## [73] SparseArray_1.2.0 xfun_0.39
## [75] dplyr_1.1.2 withr_2.5.0
## [77] BiocManager_1.30.22 fastmap_1.1.1
## [79] bluster_1.12.0 fansi_1.0.4
## [81] digest_0.6.33 rsvd_1.0.5
## [83] gridGraphics_0.5-1 R6_2.5.1
## [85] mime_0.12 colorspace_2.1-0
## [87] scattermore_1.2 tensor_1.5
## [89] spatstat.data_3.0-1 RSQLite_2.3.1
## [91] tidyr_1.3.0 utf8_1.2.3
## [93] generics_0.1.3 data.table_1.14.8
## [95] htmlwidgets_1.6.2 httr_1.4.6
## [97] S4Arrays_1.2.0 uwot_0.1.16
## [99] pkgconfig_2.0.3 gtable_0.3.3
## [101] blob_1.2.4 lmtest_0.9-40
## [103] XVector_0.42.0 shadowtext_0.1.2
## [105] htmltools_0.5.5 SeuratObject_4.1.3
## [107] scales_1.2.1 png_0.1-8
## [109] ggfun_0.1.1 knitr_1.43
## [111] reshape2_1.4.4 rjson_0.2.21
## [113] nlme_3.1-162 curl_5.0.1
## [115] zoo_1.8-12 cachem_1.0.8
## [117] stringr_1.5.0 BiocVersion_3.18.0
## [119] KernSmooth_2.23-22 miniUI_0.1.1.1
## [121] parallel_4.3.1 vipor_0.4.5
## [123] AnnotationDbi_1.64.1 pillar_1.9.0
## [125] grid_4.3.1 vctrs_0.6.3
## [127] RANN_2.6.1 promises_1.2.0.1
## [129] tidydr_0.0.5 BiocSingular_1.18.0
## [131] beachmat_2.18.0 xtable_1.8-4
## [133] cluster_2.1.4 beeswarm_0.4.0
## [135] evaluate_0.21 magick_2.7.4
## [137] cli_3.6.1 locfit_1.5-9.8
## [139] compiler_4.3.1 rlang_1.1.1
## [141] crayon_1.5.2 future.apply_1.11.0
## [143] labeling_0.4.2 plyr_1.8.8
## [145] ggbeeswarm_0.7.2 stringi_1.7.12
## [147] deldir_1.0-9 viridisLite_0.4.2
## [149] munsell_0.5.0 Biostrings_2.70.1
## [151] lazyeval_0.2.2 spatstat.geom_3.2-2
## [153] Matrix_1.6-0 patchwork_1.1.2
## [155] sparseMatrixStats_1.14.0 bit64_4.0.5
## [157] future_1.33.0 KEGGREST_1.42.0
## [159] statmod_1.5.0 shiny_1.7.4.1
## [161] interactiveDisplayBase_1.40.0 highr_0.10
## [163] ROCR_1.0-11 igraph_1.5.0
## [165] memoise_2.0.1 RcppParallel_5.1.7
## [167] bslib_0.5.0 bit_4.0.5