Following, we describe the step-by-step procedure and remark some key features that should be taken into consideration. All the steps of the protocol are summarized in Figure 1. Those steps marked with an asterisk include some comments in the troubleshooting section. In addition, we have applied part of this protocol to several model organisms (see Table 1). Columns with red numbers in Table 1 correspond to the different steps.
Step 1) Select an organism which genome had been sequenced and built. Sequencing should have been achieved near to completion.
Step 2) Determine in silico the number of sequences containing four perfect tandem telomeric repeats at ITSs. Similarly, determine the number of sequences containing five or more perfect tandem telomeric repeats. These numbers can be determined at the NCBI Mapviewer home (http://www.ncbi.nlm.nih.gov/mapview/), running a megablast search and using the Build reference database. The megablast search parameters are the following: automatically adjust parameters for short input sequences; expect threshold=10; max matches in a query range=0; match/mismatch scores=1,-2; gap cost=linear; word size=256 and without filters or repeat masking. The output of the megablast searches should appear ordered by chromosomes and by degree of score. Download the alignments as a Hit Table (text) and count the sequences with the highest evalues, which make perfect matches with the search sequences, using Microsoft Word. If the megablast search is performed with a sequence that contains four perfect tandem telomeric repeats, an specific ITS containing five perfect tandem repeats will reveal two overlapping sequences in the output and so on. The terminal perfect tandem telomeric repeat sequences should not be counted.
Step 3) Determine the mean telomeric length of the biological source to be studied if it had not been previously published. The Terminal Restriction Fragments protocol (TRFs)18 could be used for that purpose. Different tissues of the same organism might have different telomeric lengths. In principle, if the length of telomeres varies in different tissues of the same organism, the mean telomeric length of the biological sources used to perform the ChIP-seq experiments should be determined. However, this might not be required for all the organisms (see table 1). See troubleshooting 3.
Step 4) Determine the number of sequences containing four or more perfect tandem telomeric repeats at telomeres. For a specific biological source, essentially the same value is obtained with all the telomeric sequences independently of the number of repeats that they contain. This value is determined as follows: multiply the number of haploid chromosomes by 2 and by their average telomeric length expressed in bp. Then, divide the resulting value by the number of bp in each perfect telomeric repeat.
Step 5) Divide the number of sequences containing four perfect tandem telomeric repeats at ITS between the number of the same type of sequences at telomeres. Repeat this step using the numbers obtained with sequences containing five or more perfect telomeric repeats. Select those sequences that render ratios equal or lower than 2%. See troubleshooting 5-1.
A 2% ratio can be obtained with short stretches of perfect tandem telomeric repeats in different model organisms (see Table 1). This is because, whereas telomeres are essentially composed of perfect telomeric repeats, ITSs are usually composed of short stretches of perfect repeats interspersed with degenerated repeats. However, there are some organisms, like hamster, that contain more perfect telomeric repeats at internal chromosomal loci than at telomeres19,20. This protocol cannot be applied to them. See troubleshooting 5-2.
Step 6) Perform ChIP-seq experiments that render longer reads than the sequence or sequences selected to analyze telomeres. Check the quality of the results by standard methods. Previously published ChIP-seq experiments can also be used to analyze telomeric chromatin structure. In these cases, the number of telomeric reads can easily be calculated at NCBI, using the Sequence Reads Archive software (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?).
We recommend to check the suitability of the ChIP-seq experiments for telomeric chromatin structure analyses. Specific quality controls should be performed. It might happen that a ChIP-seq experiment is fine to analyze genome-wide aspects of chromatin organization but not to analyze telomeres. This is because single sequences are used for the analysis of telomeres and some bias can arise along the ChIP-seq experiments during the cloning, amplification or sequencing of the telomeric sequences. The special architecture and repetitive nature of telomeres might favour those problems. The study of ChIP-seq experiments that have been repeated at least twice, starting with different samples of the same biological source, should help in this task. If similar results are obtained in the different repetitions of the same experiment, we consider the resulting ChIP-seq data suitable for telomeric chromatin structure analyses. See troubleshooting 6-1.
If previously published ChIP-seq experiments are analyzed, we recommend calculating the percentage of four tandem telomeric repeat reads in the input samples. In addition, we also recommend to calculate the number of four tandem telomeric repeat reads versus the number of reads corresponding to another genomic sequence, also in the input samples. Both types of numbers should remain similar in different experiments. However, we would rather prefer to analyze repetitions of the same ChIP-seq experiment that had been started with different samples of the same biological source. In this way, the bias of the immunoprecipitated samples should also be controlled. See troubleshooting 6-2.
We also recommend performing telomeric analyses using more than one telomeric sequence for each ChIP-seq run, if possible. Similar results should be obtained with sequences containing different number of perfect telomeric repeats, as far as the quality of the reads is good. See troubleshooting 6-3.
Step 7) Count the number of telomeric reads in the input and immunoprecipitated DNA samples and normalize them against the total number of reads for each run. Determine the telomeric enrichments levels and represent them together with the results obtained for the rest of the genome. For example, the lg2 IP/I could be represented for the whole genome, including telomeres.
Applications and target audience:
This method can be applied to any model system, excluding those that contain many more perfect tandem telomeric repeats at ITSs than at telomeres. In addition, it could be used to analyze telomeric chromatin structure from previously published ChIP-seq experiments.
Advantages, limitations and adaptations
Our group has previously analyzed the chromatin structure of telomeres independently of ITSs using a frequently cutting restriction enzyme. However, the method described here is easier to achieve, can be applied to experiments already reported and to future ChIP-seq experiments.
This method cannot be applied to model systems that contain many more perfect tandem telomeric repeats at ITSs than at telomeres. However, in those cases, it could be applied to analyze the chromatin structure of ITSs instead of the chromatin structure of telomeres.