PDMS chip fabrication:
• Place molds into a TMCS vapor chamber.
• Mix 30g Base + 6g curing agent (or other amount, keep 5:1 ratio) in a clean plastic cup.
• Mix for 1 minute (2200 rpm), degas for 2 minutes (2400 rpm) (TINKER mixer).
• Pour onto control layer mold and place mold in vacuum chamber.
• Mix 10g Base + 0.5g curing agent (20:1 ratio) in a clean plastic cup.
• Mix for 1 minute and degas for 2 minutes.
• Spin coat onto flow layer at 2400 rpm for 35 sec, ramp time 15 sec.
• Remove control layer mold from vacuum chamber, making sure that no bubbles remain on the surface.
• Place the control and flow layer in a 80oC convection oven and incubate for 30 minutes.
• Remove casts from oven, cut out control layer, punch holes, and align to flow layer.
• Put aligned device back into 80oC oven and incubate for at least 90 minutes.
• Remove devices from oven and punch holes in a flow layer. Use a puncher of a 2.5-4 mm diameter to make holes on “sample collectors” ( "Figure 1B":http://www.nature.com/protocolexchange/system/uploads/4883/original/compressed_figure1_ONP.jpg?1478899003 ) to which samples will be pipetted directly.
• Bond PDMS chips to glass slides and use within the following 10 to 30 min.
SMiLE-seq procedure:
1. Sample preparation
1.1. Set up the expression mix for the TFs as follows:
3 ul ITT mix (TnT® SP6 High-Yield Wheat Germ master mix)
100 ng plasmid DNA (the pF3A-eGFP or pF3A-mCherry expression vector10 containing the ORF of interest)
Nuclease-free ddH2O till 5 ul total volume
Incubate at 25oC for 3 hours or longer.
1.2. Synthesize the target dsDNA libraries:
Order randomized libraries as single stranded oligos as well as the oligo containing a Cy5 5‘-fusion: /5Cy5/CAA GCA GAA GAC GGC ATA CG from IDT, resuspend it to achieve concentrations of 200uM and 500uM respectively. Mix: 5 ul NEBuffer 2 (NEB), 5ul dNTPs, 0.5 ul Cy5 labeling primer (500 uM) (IDT), 1.5 ul library oligos (200 uM) (IDT), 37 ul ddH2O. Incubate as follows:
• 94oC - 5 min
• 50oC - 60 sec
• place tubes on ice
• add 1 ul of Klenow 3’ – 5’ exo-
• 37oC – 60 min
• keep at 0oC
Use MinElute to purify the obtained double-stranded libraries, elute in 12 ul of EB. Dilute the libraries 1:10 in ddH2O and add 50 ng of poly-dIdC (Sigma) to each 10 ul of diluted libraries.
1.3. Mix the DNA baits with TFs of interest in small PCR tubes:
2uL expressed non-purified TF, 2 uL diluted dsDNA library, and 2 uL of a partner TF (if applicable). Incubate the mixtures for 30 min for most of the factors except for KRAB ZFPs for which extended incubation times up to 180 min might be required.
2. Chip set up
2.1. Connect the microchip to the automated set-up:
Move the inverted microscope close to the WAGO automated set-up ( "Figure 1A":http://www.nature.com/protocolexchange/system/uploads/4883/original/compressed_figure1_ONP.jpg?1478899003 ). Place the clean freshly assembled microfluidic chip on the microscope table.
2.2. Connect the control tubing C1 to C11 to the chip as shown on "Figure 1B":http://www.nature.com/protocolexchange/system/uploads/4883/original/compressed_figure1_ONP.jpg?1478899003 . Set the operational control pressure of control tubing lines to ~12 psi using pressure gauges.
2.3. Open CoDeSyS software and load the custom “SMiLE-automated” script. (CoDeSyS is a package for industrial automation, which translates user-defined sample processing operations into a sequence of commands for microvalve control). Go to “PLC_Visu” tab. Optional: Use WebVisu mobile (WAGO Corp.) application for the remote set-up control.
3. SMiLE (Selective Microfluidics-based Ligand Enrichment)
3.1. Plug the tubes pre-filled with BSA-biotin, PBS, Neutravidin and anti-eGFP antibody to the Inlets F1, F2, F3 and F4 respectively ( "Figure 1B":http://www.nature.com/protocolexchange/system/uploads/4883/original/compressed_figure1_ONP.jpg?1478899003 ).
3.2. Activate the CoDeSys script using Online -> Connect and then Online -> Run. Start the chip processing by pressing the “Chip Priming” button on the “PLC_Visu” control panel. (This step will build up the surface chemistry needed for immunochemical protein pull-down.)
3.3. When the button “Load samples” jumps to red on the “PLC_Visu” panel, pipette the content of individual PCR tubes (TF of interest mixed with DNA baits) in individual chip wells of the sample collectors ( "Figure 1B":http://www.nature.com/protocolexchange/system/uploads/4883/original/compressed_figure1_ONP.jpg?1478899003 ).
3.4. Press the red “Load samples” button. This will activate the on-chip sample loading by capillary force.
3.5. When the “Elute” button jumps to red (approximately 40 min after loading), mix 20 uL of TE with 3 uL of Proteinase K (Life Technologies) and load the mixture in a tygon tube and plug the tube to the Inlet 1. Plug a clean empty tygon tube (DNA collection tube) to the Outlet 2.
3.6. Press the “Elute” button and wait 30 min.
3.7. Collect the TF-bound DNA from the tube plugged to the Outlet 2.
4. Library amplification and sequencing
4.1. Amplify the recovered DNA using HiFi KAPA polymerase as follows: for the PCR mix, use 10 uL of 5xHF KAPA buffer, 1.5 ul of dNTPs (supplied with the kit), 0.5 ul of primer GA2seq FW (10uM), 0.5 ul of primer GA2seq RV (10 uM), 0.5 ul of KAPA HiFi polymerase and then add eluted DNA and then top up with ddH2O for a total volume of 50 ul. PCR amplify the DNA using a 2 min 95oC hot start followed by 17 cycles of (98oC for 20 sec, 65oC for 15 sec, and 72oC for 90 sec) and 2 min at 72oC.
4.2. Purify the PCR product using a MinElute kit from QIAGEN and elute the DNA in 10 uL of EB.
4.3. Sequence the pooled libraries on a HiSeq or NextSeq500 instrument (Illumina) as 2-4% spike-ins to a sequencing lane loaded with other DNA libraries containing compatible Illumina adapters (typically originating from ChIP-seq or RNA-seq experiments).
Detailed information about primers, barcodes and libraries used in the original study can be retrieved from Isakova et al 2016, Nature Methods, Supplementary Table 6.
5. SMiLE-seq data analysis
5.1. Parsing of raw sequencing reads
Demultiplex and trim raw Illumina reads to 30 bp corresponding to the randomized DNA region using FASTX-tools (http://hannonlab.cshl.edu/fastx_toolkit/).
Count the identical reads, collapse them in one, and subsequently order according to occurrence from most to least abundant (using FASTX-tools).
Identify the consensus binding sequences (seeds) through MEME motif discovery12:
Command:
meme <collapsed_ordered_list.fasta> -mod zoops -dna -minw 4 -maxw 20 -nmotifs 10 -maxsize 1000000
Use non-collapsed reads to calculate read statistic and perform PWM training using HMM as described below.
5.2. HMM motif training
5.2.1. Define the initial parameters:
• Use a MEME-derived seed in IUPAC format. Modify the seed by adding 1 extra 'N' on each side to allow for a more flexible motif search using the HMM-based program (see optimization steps).
• Set the read length in bp corresponding to the length of the randomized region. For the presented scenario, this value is 30bp.
• Set the background probability of each nucleotide. This value depends on the library and is measured in step 5.2.2.
• Set the number of sequences to sample randomly to train the HMM. In our case, we used 10'000 or 25'000 sequences.
• Set the number of Baum-Welch iterations to train the HMM. Default is 20.
• Set the prior probability for a sequence to contain a binding site. This influences the initial state of the HMM. High values allow more "stringent training", i.e. to get a more information rich matrix. Use 0.5 as a default value.
5.2.2. Filter and randomly shuffle the input fasta file to remove:
• 'N' containing sequences
• sequences that are larger than the expected read length
• multiple instance of a sequence
Command:
fa2filtered.pl < read_length > < < fasta_file > | filtered2unique.pl > < filtered_fasta_file >
where:
< read_length > defines the read length in bp. Here, it is set at 30bp.
< fasta_file > defines the sequence library from which a PWM should be trained, in fasta format.
< filtered_fasta_file > defines the output file address.
5.2.3. Write an HMM and train it on a subset of randomly selected sequences using mamot13. This step can be done using ht_selex_opt.pl but is detailed here for clarity:
• Estimate the nucleotide background distribution.
Command:
bc.pl < < filtered_fasta_file >
where:
< filtered_fasta_file > are the filtered sequences from step 5.2.2.
• Add a given number of background bases on each side of the sequences. Additionally, this script turns 'U' to 'T' characters (in case of RNA sequences) and eliminates all non DNA characters from the sequence. The sequences are reformatted into a mamot usable format. At the end, only select a given number (between 10'000 and 25'000) of sequences for the training.
Command:
./padseq.pl < left_flank > < right_flank > < bckg > < < filtered_fasta_file > | head -n < n_seq > > < seq_file >
where:
< left_flank > sets the number of background nucleotides to add on the left of the sequence. Set it to “1”.
< right_flank > sets the number of background nucleotides to add on the right of the sequence. Set it to “1”.
< bckg > coma-separated background probabilities of each base (a,c,g,t), e.g. '0.25,0.25,0.25,0.25' which would be a uniform background. This has been measured at the first step of 5.2.3.
< filtered_fasta_file > defines sequences to use. This file has been created at step 5.2.2.
< n_seq > defines the number of sequences to sample from < filtered_fasta_file > for later training. Use a number between 10'000 and 25'000.
< seq_file > defines the output file.
5.2.4. Generate a file containing an HMM model in the format required by mamot. The model design is described in the method figure:
See figure in Figures section.Commands:
• to search a palyndromic motif
./cons2init_sym.pl < seed > < read_len > < bckg > < prior_TFBS_prob > > < hmm_model_file >
• to search a non-palyndromic motif
./cons2init.pl < seed > < read_len > < back > < prior_TFBS_prob > > < hmm_model_file >
where:
< seed > is the seed sequence.
< read_len > is the length of the reads in the sequence file (30bp).
< bckg > are comma-separated background probabilities for each base (a,c,g,t), e.g '0.25,0.25,0.25,0.25' for a uniform background. This has been measured at step 5.2.3.
< prior_TFBS_prob > is the prior probability of a sequence to contain a true binding site.
The number of nodes in the forward (F) and reverse (R) paths are given by the number of characters in the seed. The initial emission probabilities of a node are set according to the corresponding character in the seed. Here is a list of prior probabilities for each possible character:
A = 0.70 0.10 0.10 0.10
C = 0.10 0.70 0.10 0.10
G = 0.10 0.10 0.70 0.10
T = 0.10 0.10 0.10 0.70
R = 0.40 0.10 0.40 0.10
Y = 0.10 0.40 0.10 0.40
M = 0.40 0.40 0.10 0.10
K = 0.10 0.10 0.40 0.40
W = 0.40 0.10 0.10 0.40
S = 0.10 0.40 0.40 0.10
B = 0.10 0.30 0.30 0.30
D = 0.30 0.10 0.30 0.30
H = 0.30 0.30 0.10 0.30
V = 0.30 0.30 0.30 0.10
N = 0.25 0.25 0.25 0.25
The initial transition probability from START to its successor nodes depends on the prior probability of a sequence to contain a true binding site (< prior_TFBS_prob >).
If this is set to 0.8 then START->FB will be 0.4 (0.8/2), START->RB will be 0.4 (0.8/2) and START->II will be 0.2 (1-0.8).
5.2.5. Run mamot on the model generated at step 5.2.4.
Command:
mamot -B -w 1.0 -i < iter > -t -p -m < hmm_model_file > < seq_file > 2 > /dev/null
where:
< iter > is the number of Baum-Welch iterations used to train the model (20).
< hmm_model_file > is the file containing the HMM model.
< seq_file > is the file containing the sequences needed to train the model. This file has been generated at step 5.2.3.
The output file from mamot is called FinalModel.
5.2.6. Extract transition probabilities and PWMs from mamot results.
Command:
./extract_trans.pl ==>== FinalModel
./extract_pwm.pl ==<== FinalModel