SMiLE-seq: Selective Microfluidics-based Ligand Enrichment followed by sequencing

doi:10.1038/protex.2016.089

Method Article


This work is licensed under a CC BY-NC 3.0 License

This protocol has been posted on Protocol Exchange, an open repository of community-contributed protocols sponsored by Nature Portfolio. These protocols are posted directly on the Protocol Exchange by authors and are made freely available to the scientific community for use and comment.

Version 1


Selective Microfluidics-based Ligand Enrichment followed by sequencing (SMiLE-seq) is a rapid, semi-automated method for resolving the DNA binding specificities of full-length transcription factors (TFs). The core of SMiLE-seq is a cross-talk-free microfluidic platform that selects DNA specifically bound to TFs from a pool of randomized DNA. Coupled to high-throughput sequencing, this platform allows the characterization of TF DNA binding preferences at an unprecedented resolution in a single day. Unlike established in vitro technologies that also aim to determine TF binding specificities, SMiLE-seq operates at the microscale and requires minute amounts of biological material. Moreover, it produces specificity models that capture even low-affinity and transient molecular interactions and whose predictive power is equal or superior to that of previously reported motifs. Finally, SMiLE-seq enables motif detection for monomers, homodimers, and heterodimers. SMiLE-seq should therefore prove highly valuable in deriving unbiased quantitative specificity models for single and dimeric full-length TFs.

Computational biology and bioinformatics

Biotechnology

Molecular Biology

Microfluidics

Transcription Factor (TF)

DNA binding motif

gene regulation

in vitro methods

high-throughput sequencing

semi-automated technology.

In the past, in vitro binding models of TFs were defined using low-throughput techniques and thus had limited resolution and accuracy. With technological developments, the ability to measure and predict binding sites has improved. A large leap came in the form of the B1H assay¹, PBM² and HT-SELEX³ technologies, as these high-throughput assays have produced DNA binding specificity data for hundreds of TFs. Despite these significant advances, all currently available in vitro binding models together cover less than half of the known or predicted TFs⁴⁻⁷. One underlying reason is the laborious and technically complex nature of TF DNA binding analyses: TFs are often difficult to express or exhibit altered DNA binding preferences in an in vitro context. Another complicating factor is the ability of many TFs to bind DNA in obligate or facultative heterodimer configurations, whose DNA binding properties remain largely unexplored⁸.

We tackled this challenge by developing SMiLE-seq, a novel, semi-automated technology that enables the robust identification of the DNA binding specificities of TF monomers, homodimers and heterodimers. SMiLE-seq exploits the resolving power of a microfluidics-based technology, MITOMI⁹, to perform a rigorous on-chip isolation of interacting TF-DNA complexes in a reagent-efficient manner (Figure 1):

Figure 1. SMiLE-seq set-up. A. Each SMiLE-seq device consists of a PDMS chip (approximately 2 × 5 cm) bonded to a plasma-activated glass slide. The SMiLE-seq device is placed on the microscope table and is connected to the microcontroller-based control unit. The microscope camera, connected to an external display, enables chip observation during a SMiLE-seq experiment. B. Schematic design of a SMiLE-seq microchip. Blue and green colors denote flow and control layers, respectively. Each unit of the device is connected to the collector unit on one side and the capillary pump on the other¹¹. All units of the device are connected together by the continuous flow channel with four inlets (F1-F4) and three outlets (F5-F7). Switching between these two access modes is done using the control microvalves (C1-C11).

Each SMiLE-seq experiment starts with the in vitro expression of the TFs of interest and the generation of target DNA libraries. The expressed TFs are then mixed with the DNA libraries and loaded onto the microfluidic chip through passive capillary-based pumping. The antibody-captured, immobilized TF-DNA complexes are then trapped on-chip with the deflectable valves (paralleling the MITOMI workflow⁹), and the unbound material is removed through a washing step. The protein-DNA complexes are then disrupted by Proteinase K treatment and the bound DNA is collected by flushing PBS through the chip. Recovered DNA is then amplified and sequenced. The resulting read data are processed by a dedicated HMM-based motif discovery pipeline that identifies the binding preferences of the tested TFs de novo.

The advantages of SMiLE-seq include its ability to cover a wide affinity range, in contrast to HT-SELEX, which is biased towards high-affinity binders. This difference is explained by the distinct experimental procedures that the two assays use to isolate and recover bound DNA. Specifically, SMiLE-seq does not use difficult-to-calibrate, salt-heavy washing buffers to purify molecular complexes. These buffers are used to disrupt non-specific interactions, but in practice they tend to bias against medium- and low-affinity binders while not being entirely effective in eliminating non-specifically bound DNA molecules³. Another advantage is that SMiLE-seq is not limited by the length of the DNA baits, which is a drawback of the PBM assay².

Using SMiLE-seq, we were able to derive novel TF motifs and to refine DNA binding models for TF monomers and dimers belonging to different structural families. Importantly, we also demonstrated that SMiLE-seq can be used to study the DNA binding specificities of full-length TFs that have so far resisted a comprehensive DNA binding characterization.

Reagents and Equipment

For PDMS chip fabrication:

• flow and control layer molds (can be ordered from JD photo tools (http://www.jdphoto.co.uk/) or fabricated using photolithography techniques)

• Sylgard 184 silicone base and curing agent

• TINKER mixer compatible plastic cups

• Plastic Petri dishes (150 mm x 15 mm)

• TMCS (trimethylchlorosilane)

• Scalpels

• Biopsy punches

• Spin-coater

• O2 Plasma chamber

• Glass slides (VWR with cut edges, Cat # ECN 631-1550)

For SMiLE:

• automated control set-up: solenoid valves (Pneumadyne, Cat#MSV10-8) connected to a WAGO programmable logic controller (PLC), ModBus 750-881 (WAGO Corp.), to pressure gauges, and to an external compressed air source

• luer lock syringes with flat ends, plastic tubing (to connect the pieces of the set-up together and connectors) (Moesch.com, selected according to the tubing size)

• CoDeSyS package (freely available at http://www.codesys.com)

• Inverted microscope

• Tygon tubing (COLE PARMER, 30m tube tygon S54HL 0.51mm)

• Metal pins (Unimed, stainless steel tubing AISI 104, OD 0.65mm, ID 0.35mm and length 8mm)

• BSA-biotin (Pierce, Cat# 29130), PBS (Dulbecco), Neutravidin (Pierce, Cat# 31000), biotinylated anti-eGFP antibody (Abcam, Cat# ab6658), TE buffer (Life Technologies), Proteinase K (Life Technologies, Cat# 25530049)

• KAPA HiFi PCR kit (KAPA Biosystems)

• TnT® SP6 High-Yield Wheat Germ master mix, Promega Cat# L3261

• Single-stranded randomized libraries and Cy5-labeled extension primer (order from IDT)

• Klenow fragment, 3'-5' exo- (NEB, Cat# M0212)

• NEBuffer 2 (NEB)

• MinElute PCR purification kit (QIAGEN)

• Bench-top thermo cycler

• HiSeq or NextSeq500 Illumina sequencer

For SMiLE-seq data analysis:

Perl scripts:

• fa2filtered.pl

• bc.pl

• ht_selex_opt.pl

• padseq.pl

• cons2init_sym.pl

• cons2init.pl

• extract_trans.pl

• extract_pwm.pl

C programs:

• mamot (http://bcf.isb-sib.ch/mamot/)

Programs and Perl scripts can be found on our FTP site at:

ftp://ccg.vital-it.ch/pwmtools/

The bin directory on the FTP site contains binary files compiled on Linux/CentOS 6.

PDMS chip fabrication:

• Place molds into a TMCS vapor chamber.

• Mix 30g Base + 6g curing agent (or other amount, keep 5:1 ratio) in a clean plastic cup.

• Mix for 1 minute (2200 rpm), degas for 2 minutes (2400 rpm) (TINKER mixer).

• Pour onto control layer mold and place mold in vacuum chamber.

• Mix 10g Base + 0.5g curing agent (20:1 ratio) in a clean plastic cup.

• Mix for 1 minute and degas for 2 minutes.

• Spin coat onto flow layer at 2400 rpm for 35 sec, ramp time 15 sec.

• Remove control layer mold from vacuum chamber, making sure that no bubbles remain on the surface.

• Place the control and flow layers in an 80 °C convection oven and incubate for 30 minutes.

• Remove the casts from the oven, cut out the control layer, punch holes, and align it to the flow layer.

• Put the aligned device back into the 80 °C oven and incubate for at least 90 minutes.

• Remove the devices from the oven and punch holes in the flow layer. Use a punch of 2.5-4 mm diameter to make the holes for the “sample collectors” (Figure 1B), into which samples will be pipetted directly.

• Bond PDMS chips to glass slides and use within the following 10 to 30 min.
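The two PDMS mixes above are defined by base-to-curing-agent ratios (5:1 for the control layer, 20:1 for the flow layer), so the amounts can be rescaled to any batch size. The following is a minimal sketch of that arithmetic; the function name `pdms_masses` is hypothetical and not part of the protocol.

```python
def pdms_masses(total_g, base_parts, curing_parts=1.0):
    """Split a total PDMS mass into (base, curing agent) for a base:curing ratio."""
    if total_g <= 0 or base_parts <= 0 or curing_parts <= 0:
        raise ValueError("mass and ratio parts must be positive")
    unit = total_g / (base_parts + curing_parts)
    return base_parts * unit, curing_parts * unit


# Amounts quoted in the protocol, recovered from the ratios:
control_base, control_cure = pdms_masses(36.0, 5)   # control layer, 5:1
flow_base, flow_cure = pdms_masses(10.5, 20)        # flow layer, 20:1
print(control_base, control_cure)
print(flow_base, flow_cure)
```

For a different mold size, only the total mass needs to change; the ratios stay as stated in the protocol.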

SMiLE-seq procedure:

1. Sample preparation

1.1. Set up the expression mix for the TFs as follows:

3 µl ITT mix (TnT® SP6 High-Yield Wheat Germ master mix)
100 ng plasmid DNA (the pF3A-eGFP or pF3A-mCherry expression vector¹⁰ containing the ORF of interest)
Nuclease-free ddH2O to a total volume of 5 µl

Incubate at 25 °C for 3 hours or longer.
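Because the plasmid is specified by mass (100 ng) while the reaction is capped at 5 µl, the plasmid stock has to be concentrated enough to leave room for the 3 µl of ITT mix. The sketch below works out the pipetting volumes under that constraint; the function name `expression_mix` and the example stock concentration are illustrative assumptions.

```python
def expression_mix(plasmid_ng_per_ul, total_ul=5.0, itt_ul=3.0, plasmid_ng=100.0):
    """Return pipetting volumes (µl) for one in vitro expression reaction."""
    if plasmid_ng_per_ul <= 0:
        raise ValueError("plasmid concentration must be positive")
    plasmid_ul = plasmid_ng / plasmid_ng_per_ul
    water_ul = total_ul - itt_ul - plasmid_ul
    if water_ul < 0:
        # With the volumes stated in the protocol this happens below 50 ng/µl.
        raise ValueError("plasmid stock too dilute for a 5 µl reaction")
    return {"ITT mix": itt_ul, "plasmid": plasmid_ul, "ddH2O": water_ul}


# Hypothetical plasmid stock at 100 ng/µl:
print(expression_mix(100.0))
```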

1.2. Synthesize the target dsDNA libraries:

Order the randomized libraries as single-stranded oligos, together with the 5' Cy5-labeled primer /5Cy5/CAA GCA GAA GAC GGC ATA CG, from IDT. Resuspend them to 200 µM (library oligos) and 500 µM (Cy5 primer). Mix: 5 µl NEBuffer 2 (NEB), 5 µl dNTPs, 0.5 µl Cy5 labeling primer (500 µM) (IDT), 1.5 µl library oligos (200 µM) (IDT), 37 µl ddH2O. Incubate as follows:

• 94 °C - 5 min

• 50 °C - 60 sec

• place tubes on ice

• add 1 µl of Klenow 3'-5' exo-

• 37 °C - 60 min

• keep at 0 °C

Purify the resulting double-stranded libraries with the MinElute kit and elute in 12 µl of EB. Dilute the libraries 1:10 in ddH2O and add 50 ng of poly-dIdC (Sigma) to each 10 µl of diluted library.
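As a quick sanity check on the extension reaction, the listed volumes add up to 50 µl once the Klenow enzyme is included, which fixes the final primer and library concentrations. The sketch below performs this dilution arithmetic; the dictionary names are illustrative, and the dNTP concentration is omitted because the protocol does not state it.

```python
# Volumes (µl) of the extension reaction as listed in step 1.2, Klenow included.
COMPONENTS_UL = {
    "NEBuffer 2": 5.0,
    "dNTPs": 5.0,
    "Cy5 primer": 0.5,
    "library oligos": 1.5,
    "ddH2O": 37.0,
    "Klenow 3'-5' exo-": 1.0,
}

# Stock concentrations (µM) stated in the protocol.
STOCK_UM = {"Cy5 primer": 500.0, "library oligos": 200.0}


def final_concentrations_um():
    """Final concentration (µM) of each oligo in the complete reaction."""
    total_ul = sum(COMPONENTS_UL.values())
    return {name: STOCK_UM[name] * COMPONENTS_UL[name] / total_ul for name in STOCK_UM}


print(sum(COMPONENTS_UL.values()))
print(final_concentrations_um())
```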

1.3. Mix the DNA baits with TFs of interest in small PCR tubes:

2 µl expressed, non-purified TF, 2 µl diluted dsDNA library, and 2 µl of a partner TF (if applicable). Incubate the mixtures for 30 min. This is sufficient for most factors, except for KRAB ZFPs, for which extended incubation times of up to 180 min might be required.

2. Chip set up

2.1. Connect the microchip to the automated set-up:

Move the inverted microscope close to the WAGO automated set-up (Figure 1A). Place the clean, freshly assembled microfluidic chip on the microscope table.

2.2. Connect the control tubing C1 to C11 to the chip as shown in Figure 1B. Set the operational pressure of the control tubing lines to ~12 psi using the pressure gauges.

2.3. Open the CoDeSyS software and load the custom “SMiLE-automated” script. (CoDeSyS is a package for industrial automation that translates user-defined sample processing operations into a sequence of commands for microvalve control.) Go to the “PLC_Visu” tab. Optional: use the WebVisu mobile application (WAGO Corp.) for remote control of the set-up.

3. SMiLE (Selective Microfluidics-based Ligand Enrichment)

3.1. Plug the tubes pre-filled with BSA-biotin, PBS, Neutravidin and anti-eGFP antibody into Inlets F1, F2, F3 and F4, respectively (Figure 1B).

3.2. Activate the CoDeSyS script using Online -> Connect and then Online -> Run. Start the chip processing by pressing the “Chip Priming” button on the “PLC_Visu” control panel. (This step builds up the surface chemistry needed for immunochemical protein pull-down.)

3.3. When the “Load samples” button on the “PLC_Visu” panel turns red, pipette the content of each PCR tube (TF of interest mixed with DNA baits) into an individual chip well of the sample collectors (Figure 1B).

3.4. Press the red “Load samples” button. This will activate the on-chip sample loading by capillary force.

3.5. When the “Elute” button turns red (approximately 40 min after loading), mix 20 µl of TE with 3 µl of Proteinase K (Life Technologies), load the mixture into a Tygon tube and plug the tube into Inlet 1. Plug a clean, empty Tygon tube (DNA collection tube) into Outlet 2.

3.6. Press the “Elute” button and wait 30 min.

3.7. Collect the TF-bound DNA from the tube plugged into Outlet 2.

4. Library amplification and sequencing

4.1. Amplify the recovered DNA using KAPA HiFi polymerase as follows: for the PCR mix, use 10 µl of 5xHF KAPA buffer, 1.5 µl of dNTPs (supplied with the kit), 0.5 µl of primer GA2seq FW (10 µM), 0.5 µl of primer GA2seq RV (10 µM) and 0.5 µl of KAPA HiFi polymerase, then add the eluted DNA and top up with ddH2O to a total volume of 50 µl. PCR amplify the DNA using a 2 min hot start at 95 °C followed by 17 cycles of (98 °C for 20 sec, 65 °C for 15 sec, and 72 °C for 90 sec) and a final 2 min at 72 °C.

4.2. Purify the PCR product using a MinElute kit from QIAGEN and elute the DNA in 10 µl of EB.
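The PCR mix in step 4.1 is topped up with water to 50 µl, so the water volume depends on how much eluted DNA is added. The sketch below computes that top-up and the total programmed hold time of the cycling protocol (ramp times excluded). The eluted DNA volume used in the example is a hypothetical value, since the protocol does not fix it, and the function names are illustrative.

```python
# Fixed components (µl) of the 50 µl PCR mix from step 4.1.
FIXED_UL = {
    "5xHF KAPA buffer": 10.0,
    "dNTPs": 1.5,
    "GA2seq FW (10 uM)": 0.5,
    "GA2seq RV (10 uM)": 0.5,
    "KAPA HiFi polymerase": 0.5,
}


def water_to_add(eluted_dna_ul, total_ul=50.0):
    """Volume of ddH2O (µl) needed to reach the total reaction volume."""
    water = total_ul - sum(FIXED_UL.values()) - eluted_dna_ul
    if water < 0:
        raise ValueError("eluted DNA volume exceeds the available reaction volume")
    return water


def programmed_hold_seconds(cycles=17):
    """Sum of programmed hold times: hot start + cycles + final extension."""
    per_cycle = 20 + 15 + 90  # 98 °C, 65 °C and 72 °C steps, in seconds
    return 120 + cycles * per_cycle + 120


print(water_to_add(20.0))         # hypothetical 20 µl of eluted DNA
print(programmed_hold_seconds())  # seconds, excluding instrument ramping
```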

4.3. Sequence the pooled libraries on a HiSeq or NextSeq500 instrument (Illumina) as 2-4% spike-ins to a sequencing lane loaded with other DNA libraries containing compatible Illumina adapters (typically originating from ChIP-seq or RNA-seq experiments).

Detailed information about the primers, barcodes and libraries used in the original study can be found in Isakova et al. 2016, Nature Methods, Supplementary Table 6.

5. SMiLE-seq data analysis

5.1. Parsing of raw sequencing reads

• Demultiplex and trim raw Illumina reads to 30 bp, corresponding to the randomized DNA region, using FASTX-tools (http://hannonlab.cshl.edu/fastx_toolkit/).

• Count the identical reads, collapse them into one, and order them by occurrence from most to least abundant (using FASTX-tools).

• Identify the consensus binding sequences (seeds) through MEME motif discovery¹²:

Command:

meme <collapsed_ordered_list.fasta> -mod zoops -dna -minw 4 -maxw 20 -nmotifs 10 -maxsize 1000000

Use the non-collapsed reads to calculate read statistics and to perform PWM training using the HMM as described below.
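The collapsing step in 5.1 is performed with FASTX-tools in this protocol. For readers who want to see what the operation amounts to, here is a minimal Python sketch that counts identical reads and orders them from most to least abundant. It is not the FASTX implementation; the function names and the `>rank-count` header layout are assumptions made for illustration.

```python
from collections import Counter


def collapse_reads(reads):
    """Return (sequence, count) pairs ordered from most to least abundant."""
    counts = Counter(read.strip().upper() for read in reads if read.strip())
    return counts.most_common()


def to_fasta(collapsed):
    """Format collapsed reads as FASTA text with illustrative '>rank-count' headers."""
    lines = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


reads = ["ACGT", "TTTT", "ACGT", "GGGG", "ACGT", "TTTT"]
print(to_fasta(collapse_reads(reads)))
```

The collapsed, ordered list is only used for seed discovery with MEME; as stated above, the HMM training itself uses the non-collapsed reads.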

5.2. HMM motif training

5.2.1. Define the initial parameters:

• Use a MEME-derived seed in IUPAC format. Modify the seed by adding 1 extra 'N' on each side to allow for a more flexible motif search using the HMM-based program (see optimization steps).

• Set the read length in bp corresponding to the length of the randomized region. For the presented scenario, this value is 30bp.

• Set the background probability of each nucleotide. This value depends on the library and is measured in step 5.2.2.

• Set the number of sequences to sample randomly to train the HMM. In our case, we used 10'000 or 25'000 sequences.

• Set the number of Baum-Welch iterations to train the HMM. Default is 20.

• Set the prior probability for a sequence to contain a binding site. This influences the initial state of the HMM. High values make the training more stringent, i.e. they yield a more information-rich matrix. Use 0.5 as a default value.
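The initial parameters listed above can be collected in one place before running the pipeline. The following sketch does this with a small dataclass and shows the seed padding with one 'N' on each side. The class name, field names and the example seed CACGTG are hypothetical; the default values are the ones quoted in the list, and the uniform background is only a placeholder until the measured value from step 5.2.3 is available.

```python
from dataclasses import dataclass


def pad_seed(seed, n_flank=1):
    """Add n_flank 'N' characters on each side of an IUPAC seed."""
    return "N" * n_flank + seed.upper() + "N" * n_flank


@dataclass
class HmmTrainingParams:
    seed: str                                      # padded MEME-derived seed (IUPAC)
    read_length: int = 30                          # length of the randomized region, bp
    background: tuple = (0.25, 0.25, 0.25, 0.25)   # a,c,g,t; placeholder, measured later
    n_sequences: int = 10_000                      # sequences sampled for training
    iterations: int = 20                           # Baum-Welch iterations
    prior_tfbs_prob: float = 0.5                   # prior that a read holds a binding site

    def __post_init__(self):
        if not 0.0 < self.prior_tfbs_prob <= 1.0:
            raise ValueError("prior_tfbs_prob must be in (0, 1]")
        if abs(sum(self.background) - 1.0) > 1e-6:
            raise ValueError("background probabilities must sum to 1")


# Hypothetical seed used purely as an example:
params = HmmTrainingParams(seed=pad_seed("CACGTG"))
print(params)
```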

5.2.2. Filter and randomly shuffle the input fasta file to remove:

• 'N' containing sequences

• sequences that are larger than the expected read length

• multiple instances of a sequence

Command:

fa2filtered.pl <read_length> < <fasta_file> | filtered2unique.pl > <filtered_fasta_file>

where:

<read_length> defines the read length in bp. Here, it is set to 30 bp.

<fasta_file> defines the sequence library from which a PWM should be trained, in fasta format.

<filtered_fasta_file> defines the path of the output file.
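The filtering step is performed by the Perl scripts named above. The sketch below only illustrates the three removal criteria and the shuffling in Python; it is not a port of those scripts. The function name, the upper-casing of the input and the fixed random seed are assumptions.

```python
import random


def filter_reads(seqs, read_length=30, seed=0):
    """Drop reads containing 'N', reads longer than read_length, and duplicates,
    then return the remaining unique reads in random order."""
    kept = {}
    for seq in seqs:
        seq = seq.strip().upper()
        if not seq or "N" in seq or len(seq) > read_length:
            continue
        kept.setdefault(seq, None)  # dict keys keep one copy of each sequence
    unique = list(kept)
    random.Random(seed).shuffle(unique)
    return unique


example = ["ACGT", "ACGT", "ACNT", "ACGTACGT", "GGCC"]
print(filter_reads(example, read_length=4))
```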

5.2.3. Write an HMM and train it on a subset of randomly selected sequences using mamot¹³. This step can be done using ht_selex_opt.pl but is detailed here for clarity:

• Estimate the nucleotide background distribution.

Command:

bc.pl < <filtered_fasta_file>

where:

<filtered_fasta_file> contains the filtered sequences from step 5.2.2.
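Estimating the background distribution amounts to counting how often each base occurs in the filtered reads. The sketch below shows this calculation and formats the result as the comma-separated a,c,g,t string expected by the later steps. It is an illustration only; the output format of bc.pl itself is not documented here, and the function names are hypothetical.

```python
from collections import Counter


def background_frequencies(seqs):
    """Return the relative frequencies of A, C, G and T over all sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return tuple(counts[base] / total for base in "ACGT")


def format_background(freqs, digits=4):
    """Format frequencies as a comma-separated string in a,c,g,t order."""
    return ",".join(f"{value:.{digits}f}" for value in freqs)


freqs = background_frequencies(["AACC", "GGTT"])
print(format_background(freqs, digits=2))
```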

• Add a given number of background bases on each side of the sequences. This script also converts 'U' to 'T' characters (in case of RNA sequences), removes all non-DNA characters from the sequences, and reformats the sequences into a format usable by mamot. Finally, select a given number of sequences (between 10'000 and 25'000) for the training.

Command:

./padseq.pl <left_flank> <right_flank> <bckg> < <filtered_fasta_file> | head -n <n_seq> > <seq_file>

where:

<left_flank> sets the number of background nucleotides to add on the left of the sequence. Set it to “1”.

<right_flank> sets the number of background nucleotides to add on the right of the sequence. Set it to “1”.

<bckg> defines the comma-separated background probabilities of each base (a,c,g,t), e.g. '0.25,0.25,0.25,0.25' for a uniform background. These values were measured in the first part of step 5.2.3.

<filtered_fasta_file> defines the sequences to use. This file was created at step 5.2.2.

<n_seq> defines the number of sequences to take from <filtered_fasta_file> for later training. Use a number between 10'000 and 25'000.

<seq_file> defines the output file.
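To make the padding step concrete, the sketch below cleans a sequence (U to T, non-DNA characters removed) and adds flanking bases drawn from the background distribution. It does not reproduce the mamot input format written by padseq.pl. The function name is hypothetical, and treating only A, C, G and T as DNA characters is an assumption.

```python
import random


def pad_sequence(seq, left_flank, right_flank, bckg, rng=None):
    """Clean a read and add random background bases on both sides.

    bckg holds the background probabilities in a,c,g,t order.
    """
    rng = rng or random.Random()
    cleaned = seq.upper().replace("U", "T")
    cleaned = "".join(base for base in cleaned if base in "ACGT")
    left = "".join(rng.choices("ACGT", weights=bckg, k=left_flank))
    right = "".join(rng.choices("ACGT", weights=bckg, k=right_flank))
    return left + cleaned + right


rng = random.Random(1)
padded = pad_sequence("acgu-acg", 1, 1, (0.25, 0.25, 0.25, 0.25), rng)
print(padded, len(padded))
```

With one base on each side, as recommended above, a 30 bp read becomes 32 bp long.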

5.2.4. Generate a file containing an HMM model in the format required by mamot. The model design is described in the method figure:

See figure in Figures section.

Commands:

• to search for a palindromic motif

./cons2init_sym.pl <seed> <read_len> <bckg> <prior_TFBS_prob> > <hmm_model_file>

• to search for a non-palindromic motif

./cons2init.pl <seed> <read_len> <bckg> <prior_TFBS_prob> > <hmm_model_file>

where:

<seed> is the seed sequence.

<read_len> is the length of the reads in the sequence file (30 bp).

<bckg> are the comma-separated background probabilities for each base (a,c,g,t), e.g. '0.25,0.25,0.25,0.25' for a uniform background. These values were measured at step 5.2.3.

<prior_TFBS_prob> is the prior probability of a sequence to contain a true binding site.

The number of nodes in the forward (F) and reverse (R) paths is given by the number of characters in the seed. The initial emission probabilities of a node are set according to the corresponding character in the seed. The initial emission probabilities for each possible character are listed below (values in the order A, C, G, T):

A = 0.70 0.10 0.10 0.10
C = 0.10 0.70 0.10 0.10
G = 0.10 0.10 0.70 0.10
T = 0.10 0.10 0.10 0.70
R = 0.40 0.10 0.40 0.10
Y = 0.10 0.40 0.10 0.40
M = 0.40 0.40 0.10 0.10
K = 0.10 0.10 0.40 0.40
W = 0.40 0.10 0.10 0.40
S = 0.10 0.40 0.40 0.10
B = 0.10 0.30 0.30 0.30
D = 0.30 0.10 0.30 0.30
H = 0.30 0.30 0.10 0.30
V = 0.30 0.30 0.30 0.10
N = 0.25 0.25 0.25 0.25

The initial transition probability from START to its successor nodes depends on the prior probability of a sequence to contain a true binding site (<prior_TFBS_prob>).

If this is set to 0.8, then START->FB will be 0.4 (0.8/2), START->RB will be 0.4 (0.8/2) and START->II will be 0.2 (1-0.8).
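The initialization rules described above can be written down compactly. The sketch below maps a seed to its per-node initial emission probabilities using the values listed in this step, and splits the prior into the three START transitions. It only illustrates the rules and does not write a mamot model file; the function names are hypothetical, while the node names FB, RB and II are taken from the text.

```python
# Initial emission probabilities per IUPAC character, in A, C, G, T order,
# copied from the list in step 5.2.4.
IUPAC_EMISSIONS = {
    "A": (0.70, 0.10, 0.10, 0.10), "C": (0.10, 0.70, 0.10, 0.10),
    "G": (0.10, 0.10, 0.70, 0.10), "T": (0.10, 0.10, 0.10, 0.70),
    "R": (0.40, 0.10, 0.40, 0.10), "Y": (0.10, 0.40, 0.10, 0.40),
    "M": (0.40, 0.40, 0.10, 0.10), "K": (0.10, 0.10, 0.40, 0.40),
    "W": (0.40, 0.10, 0.10, 0.40), "S": (0.10, 0.40, 0.40, 0.10),
    "B": (0.10, 0.30, 0.30, 0.30), "D": (0.30, 0.10, 0.30, 0.30),
    "H": (0.30, 0.30, 0.10, 0.30), "V": (0.30, 0.30, 0.30, 0.10),
    "N": (0.25, 0.25, 0.25, 0.25),
}


def seed_emissions(seed):
    """One emission row (A, C, G, T) per seed character, i.e. per path node."""
    return [IUPAC_EMISSIONS[char] for char in seed.upper()]


def start_transitions(prior_tfbs_prob):
    """Split the binding-site prior over the forward and reverse paths."""
    if not 0.0 <= prior_tfbs_prob <= 1.0:
        raise ValueError("prior must lie between 0 and 1")
    half = prior_tfbs_prob / 2
    return {"FB": half, "RB": half, "II": 1.0 - prior_tfbs_prob}


print(seed_emissions("NCACGTGN")[:2])  # hypothetical padded seed
print(start_transitions(0.8))
```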

5.2.5. Run mamot on the model generated at step 5.2.4.

Command:

mamot -B -w 1.0 -i <iter> -t -p -m <hmm_model_file> <seq_file> 2> /dev/null

where:

<iter> is the number of Baum-Welch iterations used to train the model (20).

<hmm_model_file> is the file containing the HMM model.

<seq_file> is the file containing the sequences needed to train the model. This file was generated at step 5.2.3.

The output file from mamot is called FinalModel.
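When the training step is driven from a script rather than typed by hand, the mamot call can be assembled as an argument list. The sketch below does this with the flags copied verbatim from the command above and discards the standard error stream, as that command does. The function names and the example file names are hypothetical, and `run_mamot` assumes that a `mamot` binary is available on the PATH.

```python
import subprocess


def mamot_command(hmm_model_file, seq_file, iterations=20):
    """Build the argument list for the mamot training call of step 5.2.5."""
    return [
        "mamot", "-B", "-w", "1.0", "-i", str(iterations),
        "-t", "-p", "-m", hmm_model_file, seq_file,
    ]


def run_mamot(hmm_model_file, seq_file, iterations=20):
    """Run mamot, discarding stderr; raises CalledProcessError on failure."""
    cmd = mamot_command(hmm_model_file, seq_file, iterations)
    subprocess.run(cmd, stderr=subprocess.DEVNULL, check=True)


# Hypothetical file names, shown without executing anything:
print(" ".join(mamot_command("model.hmm", "train_seqs.txt")))
```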

5.2.6. Extract transition probabilities and PWMs from mamot results.

Command:

./extract_trans.pl < FinalModel

./extract_pwm.pl < FinalModel

1. Meng, X., Brodsky, M. H. & Wolfe, S. A. A bacterial one-hybrid system for determining the DNA-binding specificity of transcription factors. Nat. Biotechnol. 23, 988–994 (2005).
2. Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protoc. 4, 393–411 (2009).
3. Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).
4. Fulton, D. L. et al. TFCat: the curated catalog of mouse and human transcription factors. Genome Biol. 10, R29 (2009).
5. Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nat. Rev. Genet. 10, 252–263 (2009).
6. Kulakovskiy, I. V. et al. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res. 44, D116–D125 (2016).
7. Deplancke, B., Alpern, D. & Gardeux, V. The Genetics of Transcription Factor DNA Binding Variation. Cell 166, 538–554 (2016).
8. Jolma, A. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015).
9. Maerkl, S. J. & Quake, S. R. A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007).
10. Isakova, A., Berset, Y., Hatzimanikatis, V. & Deplancke, B. Quantification of Cooperativity in Heterodimer-DNA Binding Improves the Accuracy of Binding Specificity Models. J. Biol. Chem. 291, 10293–10306 (2016).
11. Zimmermann, M., Hunziker, P. & Delamarche, E. Valves for autonomous capillary systems. Microfluid. Nanofluidics 5, 395–402 (2008).
12. Bailey, T. L. & Elkan, C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 28–36 (1994).
13. Schütz, F. & Delorenzi, M. MAMOT: hidden Markov modeling tool. Bioinformatics 24, 1399–1400 (2008).

We would like to thank Prof. Sebastian Maerkl (EPFL) for his guidance in applying microfluidic technologies, Rene Dreos (EPFL) for helpful discussions on data analysis, and our lab members Dr. Daniel Alpern, Pernille Rainer, and Riccardo Dainese for performing validation experiments for our study. We also thank Drs. Keith Harshman and Bastien Mangeat for their assistance in sample sequencing, as well as the VITAL-IT infrastructure for supporting our computational analyses. This work has been supported by funds from the Swiss National Science Foundation (#31003A_162735 and #CRSII3_147684), by SystemsX.ch Special Opportunity Project 2015/323, and by institutional support from the Ecole Polytechnique Fédérale de Lausanne (EPFL).

The authors declare no competing financial interests.
