Spatially resolved transcriptomics technologies profile gene expression while preserving spatial context. Data analysis for spatial transcriptomics comprises three fundamental tasks: cell type clustering, gene imputation, and multi-sample integration. In this study, we present STGMVA, a comprehensive analysis toolkit that employs a spatiotemporal Gaussian mixture variational autoencoder to address these tasks. STGMVA has two stages: pretraining on gene expression and spatial location with a Gaussian mixture model, and learning embedding vectors with a variational graph autoencoder. Results show that STGMVA outperforms state-of-the-art approaches on a range of spatial transcriptomics datasets of different scales and resolutions. Notably, STGMVA achieves the highest clustering accuracy in human brain, mouse hippocampus, and mouse olfactory bulb tissues. In the gene imputation task, STGMVA enhances and denoises gene expression patterns. When integrating multiple tissue slices, STGMVA corrects batch effects and enables joint analysis.
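
A variational graph autoencoder operates on a graph over the measured spots, and the abstract does not state how that graph is constructed. As a minimal sketch, assuming a symmetric k-nearest-neighbor graph on the spatial coordinates (a common choice, not confirmed as STGMVA's method), the adjacency structure could be built as follows. The function name `knn_graph`, the parameter `k`, and the toy coordinates are hypothetical.

```python
import math


def knn_graph(coords, k):
    """Link each spot to its k nearest spots and symmetrize the result.

    Assumption: Euclidean distance on spatial coordinates; the abstract
    does not specify how STGMVA builds its graph.
    """
    n = len(coords)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of spots")
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        # Rank all other spots by distance to spot i; ties go to the lower index.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(coords[i], coords[j]), j),
        )
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the graph undirected
    return {i: sorted(js) for i, js in neighbors.items()}


# Hypothetical toy layout: four evenly spaced spots and one distant spot.
spots = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
graph = knn_graph(spots, k=1)
print(graph)
```

In this sketch the distant spot still receives an edge to its closest neighbor, so no spot is isolated. A distance cutoff instead of a fixed `k` would be an equally plausible assumption.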