Cynomolgus and Chinese rhesus macaque sequencing, assembly and analysis
Method Article
Cynomolgus and Chinese rhesus macaque genome assembly and analysis
https://doi.org/10.1038/protex.2011.264
This work is licensed under a CC BY-NC 3.0 License
This protocol has been posted on Protocol Exchange, an open repository of community-contributed protocols sponsored by Nature Portfolio. These protocols are posted directly on the Protocol Exchange by authors and are made freely available to the scientific community for use and comment.
Keywords: cynomolgus, Chinese rhesus macaque, Macaca, assembly, analysis
1. SOAPdenovo assembly
SOAPdenovo uses the de Bruijn graph algorithm both to simplify the assembly task and to reduce computational complexity. Low-quality reads were filtered out, and potential sequencing errors were removed by k-mer frequency-based error correction. We filtered the following types of reads:
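As an illustration of the k-mer frequency idea behind the error-correction step, the sketch below counts k-mers across a set of reads and flags reads that contain rare k-mers, which are the usual signature of a sequencing error. This is a minimal sketch and not SOAPdenovo's own corrector: the function names, the value of `k` and the `min_count` threshold are assumptions chosen for the toy example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_kmers(counts, min_count):
    """Return the k-mers seen fewer than min_count times (assumed threshold)."""
    return {kmer for kmer, c in counts.items() if c < min_count}


def reads_with_rare_kmers(reads, k, min_count):
    """Return indices of reads that contain at least one rare k-mer."""
    rare = low_frequency_kmers(count_kmers(reads, k), min_count)
    flagged = []
    for idx, read in enumerate(reads):
        kmers = (read[i:i + k] for i in range(len(read) - k + 1))
        if any(kmer in rare for kmer in kmers):
            flagged.append(idx)
    return flagged


# Toy data (made up): five identical reads plus one with a single substitution.
toy_reads = ["ACGTACGTAC"] * 5 + ["ACGTACCTAC"]
flagged_reads = reads_with_rare_kmers(toy_reads, k=4, min_count=3)
```

In real data the threshold would be chosen from the k-mer frequency histogram of the library rather than fixed in advance, and flagged reads would be corrected or trimmed rather than simply listed.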
2. RNA sequencing (RNA-seq)
1| Homogenise frozen tissues in Trizol reagent in a bead mill with 5 mm stainless steel beads.
2| Follow the Trizol procedure, including two alcohol precipitations and suspension of the final RNA pellet in RNase-free water.
3| Construct RNA sequencing libraries using an Illumina standard mRNA-Seq Prep Kit. Briefly: use oligo(dT) magnetic beads to purify the poly-A-containing mRNA molecules. Fragment the mRNA into short lengths at a controlled temperature, then prime it randomly during first-strand synthesis by reverse transcription. Follow this with second-strand synthesis using DNA polymerase I to create double-stranded cDNA fragments. Subject the double-stranded cDNA to end repair with Klenow and T4 DNA polymerases, then A-tail it with Klenow fragment lacking exonuclease activity.
4| Complete the library preparation by ligating Illumina Paired-End Sequencing adapters, size-selecting by gel electrophoresis and amplifying by PCR. Sequence the paired-end libraries on an Illumina Genome Analyzer for 100 bp at each end.
3. Gene prediction
Use BLAT to map the genes of Indian rhesus macaque (IR; MMUL_0_1) and human (Ensembl release 56) onto the two macaque genomes. Determine orthologous regions by best BLAT hit and synteny-based analysis, then apply "Exonerate":http://www.ebi.ac.uk/~guy/exonerate/ and "GENEWISE":http://www.ebi.ac.uk/Tools/Wise2/index.html to refine the gene model at each locus.
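The best-BLAT-hit step can be sketched as follows, assuming BLAT was run with its default PSL output and nucleotide queries. The score follows the formula given in the UCSC BLAT FAQ for nucleotide alignments; the function names and the example records are made up for illustration, and the synteny-based check is not covered.

```python
def psl_score(fields):
    """Score one PSL record as in the UCSC BLAT FAQ (nucleotide queries assumed)."""
    matches = int(fields[0])
    mismatches = int(fields[1])
    rep_matches = int(fields[2])
    q_num_insert = int(fields[4])
    t_num_insert = int(fields[6])
    return matches + (rep_matches >> 1) - mismatches - q_num_insert - t_num_insert


def best_hits(psl_lines):
    """Keep the highest-scoring target region for each query gene."""
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip PSL header lines, blank lines and anything that is not a record.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        score = psl_score(fields)
        q_name = fields[9]
        if q_name not in best or score > best[q_name][0]:
            # (score, target name, target start, target end, strand)
            best[q_name] = (score, fields[13], int(fields[15]), int(fields[16]), fields[8])
    return best


# Made-up PSL records: two candidate loci for geneA, one for geneB.
EXAMPLE_PSL = [
    "psLayout version 3",
    "",
    "900\t50\t0\t0\t1\t10\t2\t20\t+\tgeneA\t1000\t0\t960\tchr1\t5000000\t1000\t1980\t3\t300,300,360,\t0,300,600,\t1000,1310,1620,",
    "950\t10\t0\t0\t0\t0\t1\t15\t-\tgeneA\t1000\t0\t960\tchr2\t4000000\t2000\t2975\t2\t480,480,\t0,480,\t2000,2495,",
    "500\t5\t20\t0\t0\t0\t0\t0\t+\tgeneB\t600\t0\t525\tchr1\t5000000\t9000\t9525\t1\t525,\t0,\t9000,",
]
example_best = best_hits(EXAMPLE_PSL)
```

The target intervals selected this way would then be passed, with some flanking sequence, to Exonerate or GeneWise for gene-model refinement. Ties are resolved here in favour of the first record seen, which is an arbitrary choice.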
4. Assembly quality validation using the neutral indel model
The neutral indel model (ref. 1) can be used to validate the quality of the genome assemblies. When two closely related genome sequences are aligned, gaps split the alignment into successive blocks, termed inter-gap segments (IGS). Under a standard neutral model, the frequencies of IGS lengths are expected to follow a geometric distribution. Within neutrally evolving regions, incorrect indels introduced during the assembly process cause the observed IGS length distribution to depart from the geometric distribution: they generate an excess of short IGS over the number predicted by the neutral indel model. By quantifying this excess, several parameters of the clustered erroneous gaps in the genome alignments can be estimated, namely their proportion (ɛ), average density (D) and number (Ng).
1 Meader, S., Hillier, L. W., Locke, D., Ponting, C. P. & Lunter, G. Genome assembly quality: assessment and improvement using the neutral indel model. Genome Res. 20, 675-684 (2010).
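The excess of short IGS can be quantified roughly as follows. A geometric distribution is a straight line on a log-count scale, so the sketch fits that line over a window of intermediate IGS lengths and extrapolates it to shorter lengths. This is a minimal sketch under stated assumptions, not the published estimator: the fitting window (`fit_min`, `fit_max`), the unweighted least-squares fit and the simulated data are all illustrative choices, and the conversion of the excess into ɛ, D and Ng is not reproduced.

```python
import math
import random
from collections import Counter


def fit_geometric(lengths, fit_min, fit_max):
    """Least-squares fit of log(count) = intercept + slope * length
    over the window [fit_min, fit_max] (assumed to be error-free)."""
    counts = Counter(lengths)
    xs = [n for n in range(fit_min, fit_max + 1) if counts[n] > 0]
    ys = [math.log(counts[n]) for n in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, counts


def short_igs_excess(lengths, fit_min, fit_max):
    """Observed minus predicted number of IGS shorter than fit_min."""
    slope, intercept, counts = fit_geometric(lengths, fit_min, fit_max)
    observed = sum(counts[n] for n in range(1, fit_min))
    predicted = sum(math.exp(intercept + slope * n) for n in range(1, fit_min))
    return observed - predicted


def simulate_igs(n_segments, gap_rate, rng):
    """Hypothetical test data: geometric IGS lengths for a per-base gap rate."""
    log_q = math.log(1.0 - gap_rate)
    return [int(math.log(1.0 - rng.random()) / log_q) + 1
            for _ in range(n_segments)]


# Simulated example: a neutral alignment, then the same alignment with
# 10,000 extra short segments standing in for assembly-induced gaps.
rng = random.Random(0)
clean = simulate_igs(200_000, 0.02, rng)
with_errors = clean + [rng.randint(1, 10) for _ in range(10_000)]
excess_clean = short_igs_excess(clean, 20, 150)
excess_with_errors = short_igs_excess(with_errors, 20, 150)
```

Because the injected segments are all shorter than `fit_min`, they leave the fitted line unchanged and appear in full in the excess. With real alignments the window has to be chosen so that it avoids both the error-prone short lengths and the long segments enriched in conserved sequence.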
The authors declare no competing financial interests.