1) Process Raw Data
The input file for COHCAP is a table of beta values. Users should typically be able to produce this table using Genome Studio. Click "here":https://docs.google.com/uc?id=0B1xpw6_kQMKuVm1kS3V6dlJBbmc&export=download&revid=0B1xpw6_kQMKuQ25yS0RDQ3JlYWNKTEk0THRGZmxQQjVGTU40PQ for more detailed instructions on using this strategy. The main benefit to using this input file format is that any sort of pre-processing can be preformed upstream of COHCAP.
Although users can almost always find data matrices with beta values, it may not always be possible to find raw .idat files in the precise format that can be processed using Genome Studio. To avoid this need, this protocol will use the minfi package for pre-processing.
The cell line data set used in the COHCAP publication will be used as an example for this protocol.
1a) Download raw data: Because the "GEO entry":http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE42308 doesn't provide raw .idat files, please download the data from this link: "http://sourceforge.net/projects/cohcap/files/Protocol_Exchange_Example.zip/download":http://sourceforge.net/projects/cohcap/files/Protocol_Exchange_Example.zip/download
Please note that this file also provides a template for running this protocol as well as the benchmarks described in the introduction.
1b) Run minfi to create COHCAP input file
First, open R.
Next, set the working directory to the extracted "Protocol_Exchange_Example" folder. This can be accomplished using the setwd() command in R. The template script assumes that the .zip file has been extracted on the Desktop of a Windows computer (but it still requires some modification to the file path to work on your own computer).
Using the example data set, run the following code:
==== start code====
library(minfi)
idat.folder <- "5958154021"
RG.raw <- read.450k.exp(idat.folder)
methyl.norm <- preprocessIllumina(RG.raw, bg.correct = TRUE, normalize = "controls")
beta.table <- getBeta(methyl.norm)
probes <- rownames(beta.table)
output.table <- data.frame(SiteID=probes, beta.table)
beta.file <- "minfi.txt"
write.table(output.table, file=beta.file, sep="\t", quote=F, row.names=F)
====end code====
The purpose of the code above is to create a text file (minfi.txt) that will be used as the input file for COHCAP.
2) Annotation and Quality Control
==== start code====
library(COHCAP)
sample.file <- "COHCAP_sample_description.txt"
project.folder <- getwd()
project.name <- "COHCAP_450k_Protocol_Exchange"
beta.table <- COHCAP.annotate(beta.file, project.name, project.folder, platform="450k-UCSC")
COHCAP.qc(sample.file, beta.table, project.name, project.folder)
====end code====
The above code should produce a table of annotated beta values in the "Raw_Data" folder as well as the following quality control files in the "QC" folder:
2a) "COHCAP_450k_Protocol_Exchange_cluster.pdf":http://www.nature.com/protocolexchange/system/uploads/2961/original/COHCAP_450k_Protocol_Express_cluster.jpg?1390603806 : allows user to see if samples within the same group cluster together
2b) COHCAP_450k_Protocol_Exchange_descriptive statistics.txt: contains minimum, maximum, median and 25th/75th percentile beta values; intented to complement the sample histogram
2c) "COHCAP_450k_Protocol_Exchange_hist.pdf":http://www.nature.com/protocolexchange/system/uploads/2963/original/COHCAP_450k_Protocol_Express_hist.jpg?1390603831 : allows use to see if any samples appear to have an abnormal beta distribution (in which case, we would recommend those samples should be removed as outliers); the descriptive statistics file can help users identify the specific outliers(s)
2d) "COHCAP_450k_Protocol_Exchange_pca.pdf":http://www.nature.com/protocolexchange/system/uploads/2965/original/COHCAP_450k_Protocol_Express_pca.jpg?1390603854 : allows user to see if samples within the same group cluster together
2e) COHCAP_450k_Protocol_Exchange_pca.txt: includes all principal components, allowing use to produce additional figures it they are not satisfied with the default 2D PCA plot
3) CpG Site Analysis
==== start code====
filtered.sites <- COHCAP.site(sample.file, beta.table, project.name, project.folder, ref="parental")
====end code====
The code produces the following files:
3a) COHCAP_450k_Protocol_Exchange_CpG_site_filter.xlsx: table of differentially methylated CpG sites
3b) COHCAP_450k_Protocol_Exchange_wig folder: contains .wig files for visualization using IGV or the UCSC Genome Browser
Visualizing these files will be described in the next section, but you can at least confirm the following files have been created:
mutant.avg.beta.wig: average beta values per CpG site across all samples in the "mutant" group
mutant.vs.parental.delta.beta.wig: average delta beta values per CpG sites for the average beta across all samples in the "mutant" group subtracted by the average beta across all samples in the "parental" group
parental.avg.beta.wig:average beta values per CpG site across all samples in the "parental" group
4) CpG Island Analysis
==== start code====
filtered.islands <- COHCAP.avg.by.island(sample.file, filtered.sites, beta.table, project.name, project.folder, ref="parental")
====end code====
The code produces the following files:
4a) COHCAP_450k_Protocol_Exchange_CpG_island_filtered-Avg_by_Island.xlsx: statistics for differentially methylated CpG islands
4b) COHCAP_450k_Protocol_Exchange_Box_Plots: folder containing box-plots for differentially methylated CpG islands. For example, the box-plot for chr18:55862653-55862873 (mapped to gene NEDD4L, corresponding to the file "NEDD4L_chr18_55862653_55862873.pdf":http://www.nature.com/protocolexchange/system/uploads/2967/original/NEDD4L_chr18_55862653_55862873.jpg?1390852235 ) is shown below:
This CpG island will be used as an example to demonstrate how you can use the .wig files (in the "CpG_Site" folder) to visualize methylation patterns anywhere in the genome.
Visualizing CpG Site Methylation via IGV8 (Optional)
Step #1) Download IGV: "http://www.broadinstitute.org/software/igv/download":http://www.broadinstitute.org/software/igv/download
Step #2) Open IGV
Step #3) Make sure the genome reference is set to hg19
Step #4) Import the .wig files using "File --> Load from File". If desired, you can also load other tracks (such as CpG Islands) the same way.
Step #5) Enter the desired region of interest (can be coordinates in the form chrA:pos1-pos2 or a gene symbol). As an example, we will visualize the regions summarized in the box plot above (chr18:55862653-55862873)
Step #6) To make visulization easier, you may wish to expand the tracks by doing to "Tracks --> Fit Data to Window"
Step #7) At any time, you can export a screenshot of what you view in IGV using "File --> Save Image". Here is a screenshot for the region listed above (also see "Figure 7":http://www.nature.com/protocolexchange/system/uploads/2969/original/IGV_chr18_55862653_55862873.png?1390853127 ):
Visualizing CpG Site Methylation via UCSC Genome Browser9 (Optional)
Step #1) Go to the UCSC Genome Browser web portal: "http://genome.ucsc.edu/cgi-bin/hgGateway":http://genome.ucsc.edu/cgi-bin/hgGateway
Make sure you are using hg19 as your human reference sequence.
Step #2) Click the button for "add custom tracks"
Step #3) Upload your .wig files by clicking "Select File" and then "Submit". This will need to be done separately for each .wig file. Keep clicking "add custom tracks" until all necessary files are uploaded.
Unlike IGV, you do not need to do anything special to upload tracks like the UCSC CpG Island locations.
Step #4) Once all of the .wig files are uploaded, click "go to genome browser"
Step #5) Enter the desired region of interest (can be coordinates in the form chrA:pos1-pos2 or a gene symbol). As an example, we will visualize the regions summarized in the box plot above (chr18:55862653-55862873)
Step #6) The UCSC Genome Browser contains a lot of tracks to compare to your differentially methylated region. You can add tracks (listed below the genome view) by setting the pull-down value to anything except "hide" and clicking "refresh". For example, the UCSC CpG Islands track is in the "Regulation" section (and likely not set to be viewed by default)
Step #7) If you wish to view your genomic region more clearly, right-click on the image and select "View Image". This will open up a new window with a .png image that can be saved. For example, see a screenshot of the region below (also see "Figure 8":http://www.nature.com/protocolexchange/system/uploads/2971/original/UCSC_chr18_55862653_55862873.png?1390854865 ):
This region was selected because it was falsely identified by all the programs presented in "Table 2":http://www.nature.com/protocolexchange/system/uploads/2957/original/Table_2.jpg?1390347459 . As you can see from the figures above (and "Supplemental Figure S7":http://nar.oxfordjournals.org/content/suppl/2013/03/27/gkt242.DC1/nar-03334-met-n-2012-File006.pdf from the original COHCAP paper), visual inspection would likely confirm this region as a reasonable candidate. Nevertheless, the are some additional criteria that could be applied that would exclude this region:
Possible Criteria #1: Filter based upon genomic region. This is an intronic CpG island. If visualization of the CpG island shows no overlap with the gene promoter, this candidate may be less interesting (although, strictly speaking, there isn't a technical reason why this region shouldn't validate successfully)
Possible Criteria #2: Increase the minimum number of CpG sites per island (default = 4). A more densely covered region may be less likely to show contradictory results due to lack of consistent methylation changes at all possible CpG sites (not just the ones included on the 450k array).
In all cases, it is strongly recommended that you visualize your .wig files to identify the optimal start and stop coordinates for your differentially methylated region. The optimal boundaries often do not exactly correspond to the start and stop coordinates for the official CpG island (which is how the region is defined in COHCAP, even the CpG shores for a given CpG island).