Protocol for the use of Polylox – endogenous barcoding for high resolution in vivo lineage tracing

doi:10.1038/protex.2017.092

Method Article

This work is licensed under a CC BY 4.0 License

This protocol has been posted on Protocol Exchange, an open repository of community-contributed protocols sponsored by Nature Portfolio. These protocols are posted directly on the Protocol Exchange by authors and are made freely available to the scientific community for use and comment.

*Authors Weike Pei, Thorsten B. Feyerabend, Jens Rössler and Xi Wang contributed to this work equally.

This protocol describes methods for the generation of highly variable genetic markers in cells in vivo and the experimental and computational procedures for retrieving those barcodes from cells. The method is based on Cre-recombinase dependent Polylox barcoding¹. In cells or mice carrying the artificial Polylox DNA recombination locus in their germline (Rosa26^Polylox), Cre recombinase can induce DNA recombination in the Polylox locus. The barcodes are retrieved by PCR amplification of the Polylox locus from purified cell populations or from single cells, followed by long-range DNA sequencing of the barcode libraries. The recombined Polylox sequences are computationally analyzed to yield barcodes. The generation probability of barcodes is computed to allow filtering for rare barcodes.

Cell biology

Computational biology and bioinformatics

Biological techniques

barcoding

Polylox

fate mapping

stem cells

single molecule real time sequencing

SMRT sequencing

Cre recombinase

tamoxifen

barcode sequence alignment

generation probabilities

Markov model

The development of organs and tissues depends on the differentiation of progenitor cells into mature functional cell populations. In the blood and immune system, hematopoietic stem cells (HSC) give rise to more than 10 different lineages, yet the ‘structure’ of hematopoiesis and the underlying pathways have remained enigmatic. One of the most direct methods for the in vivo identification of precursor-product relationships is genetic fate mapping (also termed lineage tracing). A heritable reporter gene is switched on in cells of a specific phenotype or stage, and this marker is perpetuated in all daughter cells of the initially marked cells. Lineage tracing is commonly done using fluorescent protein expression. However, the number of distinct colors is limited. To achieve higher resolution, which would be required for tissue deconvolution in complex organs, we developed the Polylox system, which allows genetic in vivo labeling of cells based on variable DNA recombination. An artificial DNA cassette consisting of ten loxP sites in alternating orientation and nine intervening DNA segments of known sequence is recombined upon transient Cre recombinase expression, with the DNA segments being inverted or excised depending on the orientation of the loxP sites recognized by Cre. The DNA segments of the Polylox locus are called ‘1’-‘9’ in their original orientation and ‘A’-‘I’ when inverted. The position and orientation of the segments translate into the so-called Polylox barcode. While Cre is expressed, the shuffling of the Polylox segments can proceed repeatedly, offering a theoretical diversity of ~1.8 million barcodes and a practical diversity of more than 600 thousand different recombined barcodes (we identified a locus that had undergone at least 6 successive recombination events).
The barcodes thus serve as cellular identifiers for high-resolution lineage tracing and are retrieved for analysis either by classical Sanger sequencing from individual cells, or from bulk populations by SMRT sequencing technology (Pacific Biosciences). To determine lineage relationships, barcode distributions are compared between different cell populations, and correlation coefficients are calculated as indicators of lineage proximity.
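The comparison of barcode distributions between populations can be illustrated with a minimal sketch. It assumes a Pearson correlation on per-population barcode frequencies; the protocol does not specify which correlation coefficient is used, and the function names and read counts below are invented for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def barcode_correlation(counts_a, counts_b):
    """Correlate two populations over the union of their barcodes.

    counts_a, counts_b: dicts mapping barcode string -> read count.
    A barcode missing from one population is counted as 0 there.
    """
    barcodes = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    freq_a = [counts_a.get(bc, 0) / total_a for bc in barcodes]
    freq_b = [counts_b.get(bc, 0) / total_b for bc in barcodes]
    return pearson(freq_a, freq_b)


# Toy read counts (hypothetical, not experimental data).
population_1 = {"1-2-3-4-5-6-7-8-9": 50, "1-2-3": 30, "A-2-3-4-9": 20}
population_2 = {"1-2-3-4-5-6-7-8-9": 25, "1-2-3": 15, "A-2-3-4-9": 10}
print(barcode_correlation(population_1, population_2))
```

Read counts are converted to frequencies first so that populations sequenced at different depths remain comparable.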

Two considerations are important for the successful use of the Polylox barcoding system: the choice of Cre driver and the calculation of barcode generation probabilities. First, a suitable Cre allele has to be inducible and tightly controlled to allow transient, non-leaky expression. We have so far successfully tested two tamoxifen-inducible Cre alleles: Rosa26^CreERT2 (B6.129-Gt(ROSA)26Sor^{tm1(cre/ESR1)Tyj/J}) for ubiquitous labeling and Tie2^MerCreMer (Tek^{tm1.1(icre/Esr1*)Hrr}) for barcode induction in hematopoietic stem cells. Second, Polylox barcodes are not all generated with the same probability. In particular, barcodes that require only one recombination event are more likely to be generated than more complex barcodes that require multiple recombination steps. This protocol therefore also provides a method for calculating barcode generation probabilities.
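To make the notion of "one recombination event" concrete, the following sketch enumerates all barcodes reachable from the unrecombined cassette in a single step. It rests on two assumptions that are not spelled out in this protocol: that Cre acting on two loxP sites in opposite orientation inverts the intervening DNA while sites in the same orientation cause excision, and that the alternating orientation of the sites is retained after each event. The function names are hypothetical.

```python
# Maps each segment label to its inverted counterpart ('1' <-> 'A', ... '9' <-> 'I').
INVERT = str.maketrans("123456789ABCDEFGHI", "ABCDEFGHI123456789")


def recombine(barcode, i, j):
    """Apply one Cre event between loxP sites i and j (0-based, i < j).

    barcode: list of segment labels. n segments are flanked by n + 1 sites,
    with site k located immediately before segment k.
    """
    if not 0 <= i < j <= len(barcode):
        raise ValueError("need 0 <= i < j <= number of segments")
    inner = barcode[i:j]
    if (j - i) % 2 == 1:
        # Odd distance: sites in opposite orientation (assumed) -> inversion.
        flipped = [seg.translate(INVERT) for seg in reversed(inner)]
        return barcode[:i] + flipped + barcode[j:]
    # Even distance: sites in the same orientation (assumed) -> excision.
    return barcode[:i] + barcode[j:]


def one_step_barcodes(barcode):
    """Set of barcode strings reachable by exactly one recombination event."""
    n_sites = len(barcode) + 1
    return {
        "-".join(recombine(barcode, i, j))
        for i in range(n_sites)
        for j in range(i + 1, n_sites)
    }


unrecombined = list("123456789")
print(len(one_step_barcodes(unrecombined)))
print("-".join(recombine(unrecombined, 0, 9)))  # inversion of the whole cassette
```

Under these assumptions only a small set of barcodes is one step away from the unrecombined locus, which is why such barcodes are expected to arise independently in many cells.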

Reagents for barcode induction

Tamoxifen (Sigma-Aldrich, T5648)

Progesterone (Sigma-Aldrich, P0130)

Peanut oil (Sigma-Aldrich, P2144)

Reagents for organ preparation

DPBS (Sigma, D8537-500ML)

FBS (Fetal Bovine Serum, Biochrom, S0115)

Ficoll-Paque PLUS (GE Healthcare, 17-1440-02)

Cell Strainer 40-μm mesh (Falcon, REF 352340)

Plastic Pasteur pipettes (Sarstedt, 86/1175-001)

Reagents for purifications of cells by FACS sorting

Mouse IgG (Jackson ImmunoResearch Laboratories, 015-000-003)

Antigen-specific fluorescent labeled antibodies

SYTOX Blue (Life Technologies, S34857)

5 ml tubes with 35 μm cell strainer snap cap (FALCON, REF352235)

Primer and PCR reagents

Primer #HL14 (5'-AATCAAGGGTCCCCAAACTCAC-3')

Primer #972 (5'-GAGGCAGCATCTGTCTACAAGAGATGG-3')

Primer #984 (5'-CATCGCATACCATACATAGGTGGAGG-3')

Primer #494 (5'-AGCTACAGCCTCGATTTGTGGTG-3')

Primer #2426 (5'-CGACGACACTGCCAAAGATTTC-3')

Primer #2427 (5'-CATACCTTAGAGAAAGCCTGTCGAG-3')

Primer #2450 (5'-TGTGGTATGGCTGATTATGATCAG-3')

Expand Long Template PCR system (Roche, 11759060001)

Proteinase K (Thermo Fisher Scientific, 25530-049)

dNTPs (Biozym, R0192)

Automated cell counter (e.g. Cellometer Auto 2000 (Nexcelom))

Cell sorter: BD FACS AriaIII (BD Biosciences)

PCR Thermocycler (e.g. T3000 Thermocycler, Biometra)

Qubit Fluorometer (Invitrogen)

Bioanalyzer (Agilent)

PacBio RS II or PacBio Sequel (Pacific Biosciences)

I. Barcode induction

a) Embryonic mice

Set up timed matings between Rosa26^{Polylox/Polylox} and Tie2^MCM/+ mice.
Nine days after the day of the plug (day 0.5), treat pregnant mice by oral gavage with a single dose of 2.5 mg tamoxifen and 1.25 mg progesterone to induce barcoding in the developing embryos at E9.5. Tamoxifen stock solution is prepared by dissolving 1 g tamoxifen in 4 ml absolute ethanol and 36 ml peanut oil at 55 °C. Stock solution can be stored at -20 °C.
The pups are delivered on E20.5 by caesarean section and raised by foster mothers.
Genotype the pups for the Tie2^MCM allele at 3-4 weeks after birth on genomic DNA prepared from tail biopsies using primers #972, #984 and #HL14. PCR conditions are as follows: 2 min at 94 °C; (20 s at 94 °C, 30 s at 62 °C, 1 min at 72 °C) 35 times; 5 min at 72 °C.
Successful tamoxifen induction of Rosa26^Polylox/+Tie2^MCM/+ mice is tested by PCR amplification of the Polylox locus from genomic DNA of Ficoll-separated peripheral blood leukocytes. The Polylox PCR is performed with the Expand Long Template PCR system using primers #2450 and #2427, annealing at the 5’ and 3’ anchor regions of the Polylox cassette, respectively. PCR conditions are as follows: 5 min at 95 °C; (30 s at 95 °C, 30 s at 56 °C, 5 min at 72 °C) repeat 35 times; 10 min at 72 °C. Separate PCR products by gel electrophoresis.

b) Adult mice

Prepare tamoxifen stock solution by dissolving 1 g tamoxifen in 4 ml absolute ethanol and 36 ml peanut oil at 55 °C.
Adult Rosa26^Polylox/+Tie2^MCM/+ mice are injected intraperitoneally with 1 mg tamoxifen on five consecutive days.
Successful tamoxifen induction of Rosa26^Polylox/+Tie2^MCM/+ mice is tested by PCR amplification of the Polylox locus as described above (Section Ia 5.)
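The dosing arithmetic implied by the stock recipe can be checked with a short sketch. It assumes that the ethanol and peanut oil volumes are additive (40 ml final volume) and that the stock is administered undiluted; neither point is stated in the protocol, and the function names are illustrative.

```python
TAMOXIFEN_MG = 1000.0          # 1 g tamoxifen
STOCK_VOLUME_ML = 4.0 + 36.0   # ethanol + peanut oil, assumed additive


def stock_concentration_mg_per_ml():
    """Tamoxifen concentration of the stock solution in mg/ml."""
    return TAMOXIFEN_MG / STOCK_VOLUME_ML


def dose_volume_ul(dose_mg):
    """Volume of undiluted stock (in microliters) containing dose_mg tamoxifen."""
    return dose_mg / stock_concentration_mg_per_ml() * 1000.0


print(stock_concentration_mg_per_ml())  # mg/ml
print(dose_volume_ul(2.5))              # single gavage dose for pregnant mice
print(dose_volume_ul(1.0))              # daily intraperitoneal dose for adult mice
```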

II. Organ preparation

On the day of analysis, mice are sacrificed by carbon dioxide inhalation. Peritoneal cells should be harvested by peritoneal lavage before any other abdominal organs are prepared.

a) Peritoneal Cavity

Harvest peritoneal exudate cells by lavage of the peritoneal cavity 5 times with 2 ml of 5% FACS buffer (DPBS with 5% vol/vol FBS).
A small opening and gentle lavage with plastic Pasteur pipettes help to avoid injury and blood cell contamination.

b) Spleen

Take out the spleen and cut the organ into small pieces in 5% FACS buffer using surgical scissors.
To obtain a single cell suspension of splenocytes, the minced spleen is passed through a 40-μm filter using 5% FACS buffer and the plunger of a 5-ml syringe.

c) Bone Marrow

Prepare femora and tibiae (humerus, pelvis and vertebrae can also be included to maximize bone marrow recovery).
Bone marrow cells are either flushed from the long bones with 5% FACS buffer, or all bones are crushed in a mortar in 5% FACS buffer.
Resuspend the extracted bone marrow cells by repeated pipetting and remove remaining bone pieces by filtering through a 40-μm mesh.

III. Purification of cells by FACS sorting

Determine the concentration of the cell suspensions, e.g. on an automated cell counter. This was done using Cellometer Auto 2000 Cell Viability Counter according to the manufacturer’s protocol.
Block Fc receptors by incubation of the cells (5x10⁶ cells/50 μl) for 15 min with 300 μg/ml whole mouse IgG.
Add cocktails with titrated concentrations of fluorescent dye-labeled antibodies in 5% FACS buffer and stain the cells for 45 min in the dark on ice. Afterwards, wash cells in an appropriate amount of 5% FACS buffer.
For dead cell exclusion, add e.g. 100 nM SYTOX Blue at least 5 min before analysis.
Filter the cell suspensions immediately before placing them on the FACS sorter to remove any cell clumps, e.g. by passing the suspension through the mesh of a snap cap cell strainer.
Stained cells are analyzed on a FACSAriaIII cell sorter running BD FACSDiva software and populations of interest are sorted into 20% FACS collection buffer (DPBS containing 20% FBS). Alternatively, cells can directly be sorted into PCR buffer.
Aliquots of the sorted cells should be reanalyzed to determine sort purity.
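Scaling the Fc receptor block to the counted cell number is simple arithmetic, sketched here for convenience. The sketch assumes that 300 μg/ml refers to the final IgG concentration in the blocking volume; the function name is hypothetical.

```python
CELLS_PER_BLOCK_VOLUME = 5e6   # cells per 50 microliters, as given in the protocol
BLOCK_VOLUME_UL = 50.0
IGG_UG_PER_ML = 300.0          # assumed to be the final concentration


def fc_block_plan(n_cells):
    """Return (blocking volume in microliters, mouse IgG in micrograms)."""
    volume_ul = n_cells / CELLS_PER_BLOCK_VOLUME * BLOCK_VOLUME_UL
    igg_ug = IGG_UG_PER_ML * volume_ul / 1000.0
    return volume_ul, igg_ug


volume_ul, igg_ug = fc_block_plan(2e7)
print(volume_ul, igg_ug)
```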

IV. PCR amplification of the Polylox locus

a) From single cells via nested PCR

Deposit individual cells by FACS sorting into 8-tube PCR strips, each tube containing 25 μl of 1x PCR buffer supplemented with 12.6 μg Proteinase K.
Perform lysis for 1 h at 55 °C, terminate at 95 °C for 10 min, and cool down to 4 °C before adding the remaining PCR reagents to a final volume of 50 μl.
Amplification of the Polylox cassette is done by nested PCR using the Expand Long Template PCR system:
- First round PCR: primer #2450 and primer #494 for 5 min at 95 °C; (30 s at 95 °C, 30 s at 56 °C, 5 min at 72 °C) repeat 35 times; 10 min at 72 °C.
- Second round PCR: Use 1-2 μl of the first PCR reaction as template and amplify with primers #2426 and #2427 for 5 min at 95 °C; (30 s at 95 °C, 30 s at 62 °C, 5 min at 72 °C) 35 times; 10 min at 72 °C.
The nested PCR products are analyzed by gel electrophoresis for product length, and sent for Sanger sequencing (e.g. GATC Biotech).
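For planning purposes, the net cycling time of the two nested PCR rounds can be estimated from the programs given above. This is only a sketch: it ignores ramping between temperatures and any final hold, so actual run times on a given thermocycler will be longer.

```python
def program_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Net duration of a PCR program in seconds, excluding ramp times."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


# Both rounds share the same durations; only the annealing temperature
# (56 vs. 62 degrees C) differs, which does not affect the net time.
first_round = program_seconds(5 * 60, [30, 30, 5 * 60], 35, 10 * 60)
second_round = program_seconds(5 * 60, [30, 30, 5 * 60], 35, 10 * 60)

print(first_round / 60, "min per round")
print((first_round + second_round) / 3600, "h for both rounds")
```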

b) Cell populations (10,000-50,000 cells) by direct PCR

Sort cells into 1.5 ml Eppendorf tubes containing 300 μl of 20% FACS buffer.
Use 2-5% of the sorted cells for reanalysis and harvest the remaining cells by centrifugation for 5 min at 1500 g and 4 ºC.
Turn the tube 180° in the rotor and spin again for 5 min at 1500 g.
Carefully remove the supernatant and resuspend the cells in 25 μl lysis buffer (1x PCR buffer 1 from the Expand Long Template PCR system in water and 12.6 μg Proteinase K). Digest at 55 °C for 1 hour and terminate the reaction at 95 °C for 10 min. DNA can be stored at -20 °C, but the experiment should be continued swiftly.
PCR amplification of the Polylox locus is as described above (Section Ia 5.)
Visualize PCR products (one third of the PCR reaction) by gel electrophoresis.

c) Cell populations (>50,000 cells) via phenol-chloroform DNA extraction

Sort cells into 2 ml of 20% FACS collection buffer and, after reanalysis for sort purity, harvest the cell pellet by centrifugation at 400 g for 10 min.
Carefully remove the supernatant, leaving 50-100 μl behind to avoid disturbing the cell pellet.
Resuspend the pellet in 500 μl freshly prepared proteinase K solution (10 mM Tris-HCl pH 8, 5 mM EDTA, 1% SDS, 0.3 M Na-acetate, 0.2 mg/ml proteinase K) and incubate at 37 °C for at least 3 hours.
From the lysate, purify genomic DNA by phenol-chloroform extraction. Briefly, extract once with an equal volume of a 1:1 mixture of phenol and chloroform. Then extract the aqueous phase at least twice with an equal volume of chloroform. Precipitate the DNA by adding 1/10 volume of Na-acetate (3M) pH 5 and 1 volume of isopropanol and 5-20 μg glycogen for 1 hour at –20 °C. Spin at 16000 g for 10 min and wash the DNA pellet twice with 75% ethanol. Dissolve the air-dried pellet in water. Measure DNA concentration.
The Polylox cassette is readily amplified from 100-200 ng of template DNA (representing 1.7 - 3.5 x10⁴ cells) by PCR using the conditions described above (Section Ia 5.)
PCR products (one third of the PCR reaction) are observed by gel electrophoresis.
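The relation between template mass and cell number quoted above can be reproduced with a small calculation. The value of about 5.8 pg genomic DNA per diploid mouse cell is an assumption chosen here because it is roughly consistent with the figures in the protocol; it is not stated in the source.

```python
PG_DNA_PER_DIPLOID_CELL = 5.8  # assumed value, picograms per cell


def cell_equivalents(template_ng, pg_per_cell=PG_DNA_PER_DIPLOID_CELL):
    """Approximate number of cells represented by template_ng of genomic DNA."""
    return template_ng * 1000.0 / pg_per_cell


for ng in (100, 200):
    print(ng, "ng ~", round(cell_equivalents(ng)), "cells")
```

Since each Rosa26^Polylox/+ cell carries one Polylox allele, the number of cell equivalents also approximates the number of Polylox template molecules entering the PCR.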

V. Polylox Sequencing

a) Sanger sequencing from single cells

Polylox amplicons from single cells were sequenced by primer walking, using a commercial service (e.g. GATC biotech, Germany). PCR products were either purified by QIAquick PCR Purification Kit, or DNA clean up was done by the provider of the sequencing service.

b) SMRT sequencing from cell populations

Use the entire PCR product that is left after gel electrophoresis for an initial AMPure PB bead clean up step. Perform according to the manufacturer’s protocol.
For quality control determine DNA concentration by Qubit measurement and perform a Bioanalyzer run according to the manufacturers’ protocols.
Continue library preparation for PacBio SMRT sequencing according to the “Amplicon Template Preparation and Sequencing” protocol released by the company. http://www.pacb.com/wp-content/uploads/Procedure-Checklist-Amplicon-Template-Preparation-and-Sequencing.pdf
Perform primer annealing and polymerase binding according to the manufacturer’s protocol.
Load the library by Magbead mode (PacBio RS II) or diffusion mode (PacBio Sequel) for SMRT sequencing. We usually recorded for 4 hours.

VI. Barcode identification

The SMRT sequencing yields data files with CCS reads. The following part explains the retrieval of Polylox barcodes from PacBio CCS Reads (RPBPBR).

a) Software and hardware

• Data (PacBio CCS reads, either in fasta or fastq format)

• Polylox adapters and segments (provided in the data folder of RPBPBR toolkit)

• Bowtie2 software (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)

• SAMtools (http://www.htslib.org/)

• RPBPBR toolkit (https://github.com/sunlightwang/RPBPBR)

• Hardware: Computer running either Linux or Mac OS X (10.6 Snow Leopard or later); at least 4 GB of RAM (8 GB per core preferred); at least a quad-core CPU

• Perl (https://www.perl.org/), installed by default on Linux and Mac OS X computers

b) Software setup

Commands given in this protocol are meant to be run at the UNIX shell prompt and are prefixed with a ‘$’ character.

To install the SAMtools, download the SAMtools (http://www.htslib.org/download/) and unpack the SAMtools tarball:

$ tar jxvf samtools-1.5.tar.bz2  

Then cd to the SAMtools source directory and build the samtools binary

$ cd samtools-1.5 

$ ./configure --prefix=/path/to/install

$ make

$ make install

Copy the samtools binary to some directory in your PATH (e.g. $HOME/bin): 

$ cp samtools $HOME/bin 

Or add the directory containing samtools binary to your PATH environment variable

$ export PATH=/path/to/install/bin:$PATH

To install Bowtie2, download the latest binary package for Bowtie2 (https://sourceforge.net/projects/bowtie-bio/files/bowtie2/) and unpack the Bowtie2 zip archive:

$ unzip bowtie2-2.3.2-legacy-macos-x86_64.zip

Copy the Bowtie executables to a directory in your PATH (e.g. $HOME/bin): 

$ cd bowtie2-2.3.2-legacy

$ cp bowtie2* $HOME/bin

Or add the directory containing bowtie2 binaries to your PATH environment variable

$ export PATH=/path/to/bowtie2/binary/directory:$PATH

To install the RPBPBR toolkit, clone the latest version from the RPBPBR GitHub site (https://github.com/sunlightwang/RPBPBR/)

$ git clone https://github.com/sunlightwang/RPBPBR.git

Add the directory containing RPBPBR binaries to your PATH environment variable

$ export PATH=/path/to/RPBPBR/bin/:$PATH

c) Procedure

To run RPBPBR on the example data files, cd to the RPBPBR example directory

$ cd /path/to/RPBPBR/example

Then execute RPBPBR on each example file:

$ RPBPBR test1.fastq test1 fastq

$ RPBPBR test2.fa test2 fasta

RPBPBR takes PacBio CCS reads (in either fasta or fastq format) and directly reports the read count of each barcode in the PacBio library of interest for downstream analysis. By default, RPBPBR uses 4 cores per process; however, the number of cores can be adjusted in the script. Using 4 cores, the running time of RPBPBR varies from < 1 hour to several hours depending on the number of reads to be processed.

Usage: RPBPBR < input.fasta/fastq > < out.prefix > < type:fasta/fastq > [keep-temp]

< input.fasta/fastq > required, the PacBio read file in fasta or fastq format.

< out.prefix > required, the prefix of output file, and also the name of a temporary directory to be created during the process.

< type:fasta/fastq > required, the format of the PacBio read file, only can be fasta or fastq, other formats not acceptable.

[keep-temp] optional, if not specified or with value 0, the temporary directory created during the process will be removed after the process is done; otherwise, it will be kept.
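When many read files have to be processed, the command line can be assembled programmatically. The sketch below only builds the argument list from the usage described above; the helper name, the inference of the format from the file extension, and the use of the file stem as output prefix are illustrative assumptions, not part of the RPBPBR toolkit.

```python
import shlex
from pathlib import Path

FASTA_EXTENSIONS = {".fa", ".fasta"}
FASTQ_EXTENSIONS = {".fq", ".fastq"}


def rpbpbr_command(read_file, keep_temp=False):
    """Build the RPBPBR argument list for one PacBio CCS read file."""
    path = Path(read_file)
    ext = path.suffix.lower()
    if ext in FASTA_EXTENSIONS:
        fmt = "fasta"
    elif ext in FASTQ_EXTENSIONS:
        fmt = "fastq"
    else:
        raise ValueError(f"cannot infer fasta/fastq format from {read_file!r}")
    # Positional arguments: input file, output prefix, format.
    cmd = ["RPBPBR", str(path), path.stem, fmt]
    if keep_temp:
        # Any value other than 0 keeps the temporary directory.
        cmd.append("1")
    return cmd


for read_file in ("test1.fastq", "test2.fa"):
    print(shlex.join(rpbpbr_command(read_file)))
```

With RPBPBR on the PATH, each list could be passed to `subprocess.run(cmd, check=True)` to execute the runs one after another.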

d) Output

Output file name: < out.prefix >.barcode.count.tsv

Output file is a tabular text file, each line gives the count (in the second column) of each barcode listed in the first column.

Total: total PacBio reads that have been processed.

Intact: the number of PacBio reads with both 5’ and 3’ adapter sequences.

Barcodes*: read from the 5’ end to the 3’ end, with barcode segments connected by hyphens.

In the barcode string, X represents non-recognized segments due to low sequencing quality.
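A minimal sketch of reading the count table into Python is given below. The exact file layout is an assumption: the sketch expects tab-separated lines with a label in the first column and an integer count in the second, treats "Total" and "Intact" as summary rows, and skips any line whose second column is not a number. The example text is invented; check it against a real output file before relying on it.

```python
def parse_barcode_counts(text):
    """Split an RPBPBR-style count table into summary rows and barcode counts."""
    summary, barcodes = {}, {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        key, value = fields[0].strip(), fields[1].strip()
        if not value.isdigit():
            continue  # header line or non-numeric entry
        if key in ("Total", "Intact"):
            summary[key] = int(value)
        else:
            barcodes[key] = int(value)
    return summary, barcodes


def has_unrecognized_segment(barcode):
    """True if any segment of the barcode was reported as X."""
    return "X" in barcode.split("-")


# Hypothetical file content for illustration only.
example = "Total\t1000\nIntact\t900\n1-2-3-4-5-6-7-8-9\t700\nA-2-3\t150\n1-X-3\t50\n"
summary, barcodes = parse_barcode_counts(example)
clean = {bc: n for bc, n in barcodes.items() if not has_unrecognized_segment(bc)}
print(summary, clean)
```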

VII. Barcode filtering

This section explains the computation of barcode generation probabilities based on which rare barcodes can be identified.

a) Software

• Data (Excel sheet from RPBPBR, see VI. Barcode identification)

• Matlab (R2013b)

• barcode_pipeline.m script

• data.mat

o barcode_library // all possible barcodes

o path_matrix // Pgen for a distinct # of recombinations

o min_list // list of minimal number of recombinations

b) Software setup

Copy barcode_pipeline.m and data.mat to the desired path.

Import the following files from experimental data into your Matlab Workspace:

• A list of found barcodes in cell format

• An m-by-n matrix of reads, where m is the number of barcodes and n the number of populations

• A list of population names in cell format

c) Procedure

To run barcode_pipeline.m type:

barcode_pipeline (< found_codes >, < found_reads >,< population_annotation >)

The pipeline's running time varies from < 1 min to 10 min depending on the number of barcodes.

Barcode_pipeline first checks all experimentally found barcodes against the complete list of possible barcodes, purging erroneous barcodes from the list. Then the minimal number of recombinations is determined for every barcode (by lookup in a pre-calculated table). From these minimal recombination numbers, a frequency distribution of recombination events is calculated, which is then used to calculate the generation probabilities for all barcodes.
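The three steps described above can be paraphrased in a short Python sketch. This is an interpretation of the text, not a translation of barcode_pipeline.m: it assumes that the recombination distribution is counted per barcode (not per read) and that the generation probability is the sum, over recombination numbers, of that distribution multiplied by the per-recombination probabilities stored in path_matrix. All data below are toy values.

```python
def purge(found_codes, library):
    """Keep only barcodes that occur in the library of possible barcodes."""
    return [code for code in found_codes if code in library]


def recombination_distribution(codes, min_recomb, max_events=10):
    """Normalized histogram of minimal recombination numbers (bins 0..max_events)."""
    hist = [0] * (max_events + 1)
    for code in codes:
        hist[min_recomb[code]] += 1
    total = sum(hist)
    return [h / total for h in hist]


def generation_probability(code, weights, path_probs):
    """Assumed form: Pgen = sum_k weights[k] * P(code | k recombinations)."""
    return sum(w * p for w, p in zip(weights, path_probs[code]))


# Toy stand-ins for barcode_library, min_list and path_matrix (invented values).
library = {"1-2-3-4-5-6-7-8-9", "A-2-3-4-5-6-7-8-9", "3-4-5-6-7-8-9"}
min_recomb = {"1-2-3-4-5-6-7-8-9": 0, "A-2-3-4-5-6-7-8-9": 1, "3-4-5-6-7-8-9": 1}
path_probs = {
    "1-2-3-4-5-6-7-8-9": [1.0] + [0.0] * 10,
    "A-2-3-4-5-6-7-8-9": [0.0, 0.02] + [0.0] * 9,
    "3-4-5-6-7-8-9": [0.0, 0.02] + [0.0] * 9,
}

found = ["1-2-3-4-5-6-7-8-9", "A-2-3-4-5-6-7-8-9", "3-4-5-6-7-8-9", "1-X-3"]
codes = purge(found, library)
weights = recombination_distribution(codes, min_recomb)
pgen = {code: generation_probability(code, weights, path_probs) for code in codes}
print(pgen)
```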

d) Output

Output is a Matlab-structure with the following fields:

• purged_codes // list of all possible codes in the data-set

• purged_reads // matrix containing the reads

• purged_freq // frequencies – normalized for populations

• recombination_frequencies // distribution of recombinations 0-10

• minimal_recom // list of minimal recombinations for all codes

• pgen // Pgen for every code

• annotation // population-annotation list
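Downstream use of these fields can be sketched as follows: read counts are normalized per population (as described for purged_freq), and barcodes are selected by a generation probability cutoff. The cutoff value and all numbers below are hypothetical; the protocol does not prescribe a threshold, and the function names are not part of the Matlab pipeline.

```python
def normalize_columns(reads):
    """Convert an m-by-n read matrix (barcodes x populations) to frequencies."""
    n_pop = len(reads[0])
    totals = [sum(row[j] for row in reads) for j in range(n_pop)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_pop)]
        for row in reads
    ]


def rare_barcodes(codes, pgen, threshold):
    """Barcodes whose generation probability lies below the chosen threshold."""
    return [code for code, p in zip(codes, pgen) if p < threshold]


# Toy data for illustration only.
codes = ["1-2-3-4-5-6-7-8-9", "A-2-3-4-5-6-7-8-9", "G-B-5"]
reads = [[80, 40], [15, 50], [5, 10]]  # columns: two hypothetical populations
pgen = [0.3, 0.01, 1e-6]

freq = normalize_columns(reads)
print(freq)
print(rare_barcodes(codes, pgen, threshold=1e-4))
```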

In vivo barcode induction takes 1-5 days, depending on the tamoxifen dosage and duration of application. Cell purification via FACS sorting is done within one day. DNA preparation and PCR amplification take one day. Library preparation and SMRT sequencing require 4-5 days. Computational analysis requires approximately 1 day.

To achieve a useful range of recombination, the specified tamoxifen dose may need to be adjusted depending on the Cre driver, the developmental stage and the tissue to be labeled. This can be evaluated from the fragmentation pattern of the Polylox locus after tamoxifen treatment. Cell harvest and Polylox PCR are the most critical steps. We strongly recommend testing the Polylox PCR on different cell numbers before starting experiments.

This protocol yields the recovered barcodes in the sampled cell populations, along with read counts and barcode generation probabilities.

W. Pei, T. B. Feyerabend, J. Rössler, X. Wang, D. Postrach, K. Busch, I. Rode, K. Klapproth, N. Dietlein, C. Quedenau, W. Chen, S. Sauer, S. Wolf, T. Höfer, and H.R. Rodewald. Polylox barcoding reveals hematopoietic stem cell fates realized in vivo. Nature 2017

The authors declare no competing financial interests.
