Wednesday, April 29, 2015

The reported off-target effects in the recent Liang et al human embryo #CRISPR paper are partly incorrect.

As widely reported last week, a group in China has published results of CRISPR editing experiments in human tripronuclear embryos (Liang et al, Protein & Cell 2015). The news blurb in Nature is worth a read for context. The paper follows on the heels of a statement published in Science by leaders in the CRISPR field and others, discouraging CRISPR experiments in human embryos at this time pending further discussion of the implications of such research.

In this post I won’t get into the ethical implications of the paper (which is more than I can deal with in one post anyway!). Here I’ll discuss the technical results. The Liang et al paper does not actually present much surprising data: it is not unexpected that CRISPR can induce targeted mutations in human cells, since it works in basically every species in which it’s been tried. Their target gene was HBB (beta-globin), and they attempted HDR using the familiar approach of coinjecting Cas9 mRNA, guide RNA, and a single-stranded oligo DNA donor.

Here are their four main points, paraphrased from the abstract:
  1. Efficiency of HDR was low.
  2. Edited embryos were mosaic.   
  3. Off-target mutations were evident.
  4. A separate, highly homologous gene (HBD) could serve as a donor template for repair, thus inadvertently introducing sequences from the other gene into the target gene.

Of these, points #1 and #2 were not surprising to those of us who have injected CRISPR reagents into mouse embryos; the results are not all that different from what we see in mice. The reported HDR efficiency was 14%, which is in line with at least some published mouse experiments (e.g. Singh et al 2014). For what it's worth, in our mouse core we are apparently seeing HDR efficiencies of around 10-20% across several experiments. Mosaicism has also been previously reported in CRISPR mice (Yen et al 2014). Point #4 was somewhat novel, but in retrospect not completely weird, since the HBD gene (delta-globin) is over 90% identical to HBB.

Ok, so regarding point #3, off-target mutations: I was very interested in this because the authors reported four distinct off-target (OT) mutations in human embryos associated with the single CRISPR guide RNA they used, and the media have already described this as substantially higher than OT rates in animal embryos (meaning, mouse embryos).

One of these OTs was particularly surprising, as it looked like a very poor match to the target protospacer indeed: although the 3’-most 11 bases (the “seed” region) matched the target, only 1 of the 9 bases at the 5’ end matched, for a total of 8 mismatches. Frankly, this really scared me, because if true it means that the current methods used to predict OTs are not nearly broad enough. This level of mismatch was much greater than in any OT I had seen before.

Bottom line: After looking at their data, I now firmly believe that only one of the four OT mutations was actually a new mutation caused by CRISPR. The other three were simply polymorphisms, already present in the germline, that the authors mistakenly classified as OTs.

Here is how they did their OT analysis, and my interpretation of the data. Tripronuclear human embryos were obtained from a fertility clinic; these can be identified microscopically at the 1-cell zygote stage. Because they were accidentally fertilized by two sperm, they are effectively triploid and are absolutely unable to survive to term as normal pregnancies, but they survive well enough for short-term CRISPR experiments, in which the embryos are only kept alive for a few days in vitro. Briefly, 86 tripronuclear embryos were injected; 71 survived the injection; 56 of these were GFP-positive (GFP was used as a reporter to show expression of the injected reagents) and were used for DNA analyses of on- and/or off-target effects. 28 of these embryos had on-target indel mutations and/or the desired HDR edit and were used for OT analysis.

Note that they had originally chosen this particular CRISPR target from 3 potential targets in their gene; one of these didn’t cut well and was not used further. For each of the other two, 7 sites were identified as the “top” potential OTs using the MIT tool. Of these two targets, one showed no OT mutations at its 7 potential OT sites when tested in 293T cells, so they decided to work with this CRISPR target.

Two of the 7 OTs were found to have mutations in the injected embryos. These were named G1-OT4 and G1-OT5 and are the first two of the four total OT mutations they claimed to identify (Figure 3A). T7 mismatch assays were used for this analysis. 293T cell transfections with the guide RNA had already shown a lack of mutations across the 7 OT sites; that is, they were negative by T7 assays. I’ll come back to these later.

They then did whole-exome sequencing on six of the embryos to identify potentially even more mutated OTs. From this data, they first called indels and SNVs (single nucleotide variants) and then searched for protospacer similarity, “allowing for ≤6 mismatches or perfect match of the last 10 nt 3′ of the gRNA”, anywhere within 100 bp of the indels. (I'm not sure whether they did anything more with the SNVs.) This identified two apparently new OTs, in the 3’ UTRs of the C1QC and TTR genes, each found in one embryo (Figure 3B). These were confirmed by T7 assays.
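The quoted search criterion is simple enough to sketch. Below is a minimal Python version, assuming a 20-nt protospacer, an ungapped comparison, no PAM requirement (the quoted criterion does not mention one), both strands scanned, and a window of ±100 bp around the indel. The guide and site sequences are hypothetical placeholders, not the HBB guide from the paper. The test site mimics the pattern described earlier: the last 11 bases match, but only 1 of the 9 bases at the 5’ end does.

```python
# Sketch of the exome OT search criterion quoted above:
# "<=6 mismatches or perfect match of the last 10 nt 3' of the gRNA".
# Assumptions (not from the paper): ungapped 20-nt comparison, no PAM check,
# both strands scanned, window = indel position +/- 100 bp.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(guide: str, site: str) -> int:
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for g, s in zip(guide, site) if g != s)


def passes_ot_filter(guide: str, site: str, max_mm: int = 6, seed_len: int = 10) -> bool:
    """True if the site has <= max_mm mismatches OR an exact match over the 3' seed."""
    if len(site) != len(guide):
        return False
    return count_mismatches(guide, site) <= max_mm or site[-seed_len:] == guide[-seed_len:]


def scan_near_indel(seq: str, indel_pos: int, guide: str, flank: int = 100):
    """Return (position, strand, mismatches) for every passing window near an indel."""
    start = max(0, indel_pos - flank)
    region = seq[start:indel_pos + flank]
    n = len(guide)
    hits = []
    for i in range(len(region) - n + 1):
        window = region[i:i + n]
        for strand, candidate in (("+", window), ("-", revcomp(window))):
            if passes_ot_filter(guide, candidate):
                hits.append((start + i, strand, count_mismatches(guide, candidate)))
    return hits


# Hypothetical guide and site, for illustration only.
guide = "GACCTGAGCTTAGCAATCGG"
site = "GTGGACTCG" + guide[-11:]  # 1/9 matches at the 5' end, last 11 nt match
print(count_mismatches(guide, site), passes_ot_filter(guide, site))  # 8 True
```

Because of the "or" in the criterion, a site with 8 mismatches still passes as long as its last 10 bases match, which is why seed-matching sites with poor 5’ homology can be picked up this way.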

So - what does the T7 mismatch assay really indicate?  It reveals heterozygosity within the PCR product.   Of course, new mutations can cause this.  But so can plain old polymorphisms.   This is a drawback of using mismatch assays when applied to polymorphic samples.     
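To see why a germline polymorphism can look just like a new mutation in this assay, here is a minimal sketch of the expected heteroduplex fraction after the denature/reanneal step. It assumes strands reanneal randomly in proportion to each allele's frequency in the PCR product, which is a simplification (annealing and T7 cleavage are not perfectly efficient in practice). The 10% mosaic example is purely illustrative.

```python
def heteroduplex_fraction(allele_fractions):
    """Expected fraction of reannealed duplexes that pair two different alleles.

    Assumes random reannealing: a duplex is a homoduplex with probability
    sum(f_i^2), so the heteroduplex (T7-cleavable) fraction is 1 - sum(f_i^2).
    """
    total = sum(allele_fractions)
    freqs = [f / total for f in allele_fractions]  # normalize in case inputs don't sum to 1
    return 1.0 - sum(f * f for f in freqs)


# Germline SNP in a triploid embryo: two copies of one allele, one of the other.
print(round(heteroduplex_fraction([2, 1]), 3))      # 0.444
# Heterozygous SNP in a diploid sample.
print(round(heteroduplex_fraction([1, 1]), 3))      # 0.5
# Hypothetical mosaic embryo: 10% of alleles carry a new CRISPR mutation.
print(round(heteroduplex_fraction([0.9, 0.1]), 3))  # 0.18
```

Under these assumptions a plain germline heterozygote in a triploid embryo gives a stronger mismatch signal than a mosaic new mutation at modest frequency, with no CRISPR involved.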

The next question is, simply, are there common human polymorphisms in the PCR products used in the OT analysis?  It’s easy to check this using the UCSC genome browser and the 1000 genomes site.   

For OT #1, a.k.a. “G1-OT4” (Fig. 1C and 3A; PCR, hg19, chr11:132761837-132762356; intron of OPCML), there are no known common polymorphisms within the PCR product close to the OT. The closest SNP, rs79549129, has a minor allele frequency (MAF) of 1.2% but zero in Asian populations. There are no other annotated variants near the OT with a significant MAF. The closest “common” SNP is rs2659601, but it lies about 50 bp from one end of the PCR product, and I don’t think it could produce the band sizes seen in the Fig. S3 T7 assays. From what I can tell from their Figs. S3 and S4, their T7 assays are compatible with new mutations induced by cleavage close to the CRISPR OT site. Thus, these look like “real” CRISPR OT effects at this site: 6 of 20 on-target embryos (30%) had mutations at this OT. So this looks like a real OT effect that replicates across embryos, though not in 293T cells.
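The band-size argument can be made concrete. T7 endonuclease I cuts a heteroduplex at or very near the mismatch, so a single mismatch x bp from one end of an amplicon of length L gives products of roughly x and L - x bp. A minimal sketch, assuming the amplicon coordinates above are 1-based and inclusive (as the UCSC browser displays them) and taking the approximate ~50 bp offset of rs2659601 from the text:

```python
def amplicon_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genomic interval (UCSC-style display)."""
    return end - start + 1


def t7_fragments(length: int, mismatch_offset: int):
    """Approximate T7E1 products for one mismatch mismatch_offset bp from the left end.

    Ignores the few-bp imprecision of T7E1 cleavage and assumes a single mismatch.
    """
    if not 0 < mismatch_offset < length:
        raise ValueError("mismatch must lie inside the amplicon")
    return (mismatch_offset, length - mismatch_offset)


# OT #1 amplicon (hg19 chr11:132761837-132762356), from the coordinates above.
length = amplicon_length(132761837, 132762356)
print(length)                     # 520
# A SNP ~50 bp from one end gives one short and one nearly full-length product.
print(t7_fragments(length, 50))   # (50, 470)
```

A mismatch near the middle of the amplicon would instead give two bands of similar size, so the observed band pattern helps locate where the heteroduplex lies.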

But then, polymorphisms become apparent in the other OTs...

For OT #2, a.k.a. “G1-OT5” (Fig. 1C and 3A; PCR, hg19 chr22:31000551-31001000; intron of TULP4), there are two common SNPs on either side of the OT: rs616358 (G/C) and rs628203 (T/C). Estimated haplotype frequencies in Southern Han Chinese are ~56% GT, 35% GC, 9% CT, and 0% CC. So we would expect to see plenty of heterozygosity in this PCR product, easily detectable by T7 assays, with something in the range of 50% of samples positive in a population-based sample from this geographic location; no CRISPR required. This OT was a false positive.

UCSC screen grab showing SNPs close to the OT (black bar in middle)
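The expected T7-positive rate from these two SNPs can be estimated directly from the haplotype frequencies. A minimal sketch, assuming haplotypes are drawn independently from the population (Hardy-Weinberg-style) and that any two different haplotypes form a cleavable heteroduplex: the chance that a sample is homozygous for one haplotype is the sum of each frequency raised to the ploidy, and the T7-positive probability is one minus that. Tripronuclear embryos carry three haplotypes, which pushes the expected rate above the diploid value.

```python
def prob_heterozygous(hap_freqs, ploidy=2):
    """P(not all `ploidy` sampled haplotypes are identical), assuming independent draws."""
    total = sum(hap_freqs)
    return 1.0 - sum((f / total) ** ploidy for f in hap_freqs)


# Southern Han Chinese haplotype frequencies for rs616358/rs628203 quoted above.
tulp4_haps = {"GT": 0.56, "GC": 0.35, "CT": 0.09, "CC": 0.0}
freqs = list(tulp4_haps.values())
print(f"diploid:  {prob_heterozygous(freqs, 2):.2f}")   # diploid:  0.56
print(f"triploid: {prob_heterozygous(freqs, 3):.2f}")   # triploid: 0.78
```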

This leaves the two additional OTs they discovered by whole-exome sequencing.  Remember their workflow: they called indels in their data sets, then looked for nearby partial matches to the CRISPR target.   However, they apparently did not filter out known polymorphisms first.
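The missing step is cheap to add. Here is a minimal sketch of filtering called indels against known variant intervals (for example, exported from dbSNP or 1000 Genomes for the captured region) before searching for protospacer matches. The record layout and all coordinates below are hypothetical, chosen only to show the overlap logic; intervals are treated as 0-based and half-open.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals on the same chromosome overlap if each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def drop_known_variants(calls, known):
    """Keep only calls that overlap no known polymorphism (naive O(n*m) check)."""
    return [c for c in calls if not any(overlaps(c, k) for k in known)]


# Hypothetical example: one call coincides with a known indel, one does not.
known = [Interval("chrA", 1000, 1017, "known_indel_1")]
calls = [Interval("chrA", 1005, 1010, "call_1"), Interval("chrA", 5000, 5003, "call_2")]
print([c.name for c in drop_known_variants(calls, known)])  # ['call_2']
```

For exome-scale call sets an interval tree or a sorted sweep would be faster, but the naive loop keeps the logic obvious.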

OT #3 was in the 3’ UTR of the TTR gene. Inspection of the OT sequence location (given in Fig. S6) on the UCSC browser clearly shows that rs143948820, a known 9-base indel, lies completely inside the OT. It turns out to be uncommon outside of Asia, but it has a MAF of ~2% in Southern Han Chinese. With a heterozygosity of ~4% in normal diploids, it’s entirely possible that 1 out of 6 triploid embryos would carry this variant. This OT was very likely a false positive.
UCSC screen grab; OT is black bar, indel variant is long red bar.
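The "entirely possible" can be put in numbers. A minimal sketch, assuming Hardy-Weinberg proportions, that each embryo's three haplotypes are independent draws from the Southern Han Chinese population, and that the six sequenced embryos are unrelated (all simplifications):

```python
def het_prob(maf: float, ploidy: int) -> float:
    """P(a sample carries both alleles) for a biallelic site under independent draws."""
    return 1.0 - maf ** ploidy - (1.0 - maf) ** ploidy


def prob_at_least_one(p: float, n: int) -> float:
    """P(at least one of n independent samples is positive)."""
    return 1.0 - (1.0 - p) ** n


maf = 0.02  # rs143948820 in Southern Han Chinese, per the text above
print(f"diploid het:      {het_prob(maf, 2):.4f}")                        # 0.0392
print(f"triploid het:     {het_prob(maf, 3):.4f}")                        # 0.0588
print(f">=1 of 6 embryos: {prob_at_least_one(het_prob(maf, 3), 6):.2f}")  # 0.30
```

So under these assumptions there is roughly a 30% chance that at least one of six triploid embryos carries this indel.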

Finally, OT #4 was in the 3’ UTR of the C1QC gene. As in the case above, this OT overlaps a known 17-base indel, rs142916975, which has a MAF of 38% in Southern Han Chinese. Heterozygosity should be close to 50%. In fact, I’m surprised they got this false positive in only 1 of their 6 samples. This is almost certainly a false positive.
UCSC screen grab; OT in black, indel variant in blue.
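How surprising is seeing this variant in only one embryo? A minimal sketch under the same simplifying assumptions (Hardy-Weinberg, independent haplotype draws, unrelated embryos, perfect detection), computing the binomial probability that at most 1 of 6 embryos is heterozygous for a variant with MAF 0.38:

```python
from math import comb


def het_prob(maf: float, ploidy: int) -> float:
    """P(a sample carries both alleles) for a biallelic site under independent draws."""
    return 1.0 - maf ** ploidy - (1.0 - maf) ** ploidy


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial P(X <= k) for n trials with success probability p."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


maf = 0.38  # rs142916975 in Southern Han Chinese, per the text above
for ploidy in (2, 3):
    p = het_prob(maf, ploidy)
    print(f"ploidy {ploidy}: het = {p:.2f}, expected carriers in 6 = {6 * p:.1f}, "
          f"P(<=1 of 6) = {prob_at_most(1, 6, p):.3f}")
```

Under these assumptions the chance of seeing at most one heterozygous embryo out of six is roughly 0.14 for diploids and only about 0.01 for triploids, which is why a single positive looks low rather than high.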

In summary, only 1 of the OTs holds up to scrutiny. Importantly, neither OT found by exome sequencing holds up. This flips their conclusion on its head: “Our whole-exome sequencing result only covered a fraction of the genome and likely underestimated the off-target effects in human 3PN zygotes.” While it’s certainly possible that more OTs could be found by whole-genome sequencing, the exome data was essentially negative. Note that they chose this CRISPR target because it had a low apparent OT rate in 293T cells. In retrospect, it was just luck that G1-OT5 had a negative T7 assay in 293T cells: the cell line could have been heterozygous at this site, but apparently it is not.

To their credit, the authors rightly restate that it will be critical moving forward to carefully analyze off-target effects in any human application of CRISPR. This is widely agreed upon (see below). However, this paper made some technical mistakes in this regard. While underestimating off-target effects could certainly have serious negative consequences for future CRISPR-based clinical treatments for genetic disease (and nobody wants that), overestimating them could create excessive hesitation within the broader scientific community to research the feasibility of such treatments.

As stated by Baltimore et al in their Science commentary:

“It is critical to implement appropriate and standardized benchmarking methods to determine the frequency of off-target effects and to assess the physiology of cells and tissues that have undergone genome editing.”

I don’t think I’m blowing smoke when I say we all need to get this right; the media quickly reported on the Liang et al conclusions:

From Wired:  “But—and this is a big but—using the technique without proper guidance could result in unforeseen consequences. The Chinese researchers, for example, found mutations in many of the embryos in genes other than the ones they’d targeted with CRISPR/Cas9.”

From Time, quoting Carl Zimmer from National Geographic:   “The experiment “came out poorly,” Zimmer says; in some cases, DNA was placed in the wrong spot and “off-target” mutations were discovered in the DNA.”

From the Washington Post:   “And in some of the embryos, the gene editing caused unintended mutations in other genes.”

From USA Today (emphasis is mine): “The team also found that the complex used in the procedure was also acting on other parts of the genome, leading to other bits of it mutating. That happened much more than in previous experiments on adult human cells and animal embryos — and could happen yet more if the whole genome were used, as it would be if the embryo were to be implanted.”

(OK, note that this last article specifically makes the comparison to the very observations I have blogged about here, in more detail than most people probably ever wanted to hear... My point is that, due to the technical problems in the Liang paper, I don’t think we can yet say the off-target effects were “much more than in previous experiments on adult human cells and animal embryos”.)

One final note: my analysis of this paper should not be interpreted to mean that I fully endorse CRISPR experimentation or applications in human embryos. I also applaud the authors' cautionary point that the incomplete efficiency of CRISPR editing in human embryos is a problem that any therapeutic application will need to address.

Whew, this was the longest post yet.   

Tuesday, April 14, 2015

Are there sequence preferences near the 3' end of the #CRISPR protospacer? Paper from the C. elegans field explores this.

When the first word in a paper title is "Dramatic", I certainly wonder whether I will agree after reading it... It's worth a blog post at any rate. This paper by Farboud, B. and Meyer, B.J. is titled "Dramatic Enhancement of Genome Editing by CRISPR/Cas9 Through Improved Guide RNA Design" (Genetics, Vol. 199, 959–971, April 2015).

As is true for other model organisms, CRISPR is very useful in nematodes for performing mutagenesis. In this paper the authors were inspired by the previous observation that the Cas9 protein physically associates with the PAM motif (NGG) somewhat promiscuously across DNA. This had also led to the discovery that, in vitro, a unique CRISPR target is cleaved at higher rates when the surrounding region is generally rich in GG dinucleotides than when the region is depleted of GG. In other words, general GG density probably "attracts" Cas9 and keeps more of it around, which in turn may enable faster recognition of the actual target.
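If local GG density matters, one crude way to quantify it for a candidate region is to count potential SpCas9 PAM sites on both strands (NGG on the given strand, plus CCN on the given strand for NGG on the opposite strand) per kilobase. This is a minimal sketch of such a count, not the metric used in the paper or in the earlier in vitro work; the test sequence is arbitrary.

```python
def pam_sites(seq: str):
    """Positions of potential SpCas9 PAMs on both strands.

    '+' : NGG on the given strand (reported at the position of the N).
    '-' : CCN on the given strand, i.e. NGG on the reverse strand (position of the first C).
    Overlapping matches are all counted.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i + 1:i + 3] == "GG":
            hits.append((i, "+"))
        if seq[i:i + 2] == "CC":
            hits.append((i, "-"))
    return hits


def pam_density_per_kb(seq: str) -> float:
    """Potential PAM sites per 1,000 bp of sequence."""
    return 1000.0 * len(pam_sites(seq)) / len(seq) if seq else 0.0


print(pam_sites("AGGTCCA"))  # [(0, '+'), (4, '-')]
```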

Using this idea, the authors tested whether simply keeping an extra GG motif near the actual PAM NGG motif would enhance CRISPR mutagenesis in worms. It turns out that if the PAM is followed (3') by another NGG, it doesn't help. However, if the 3 bases immediately 5' of the PAM are NGG (that is, the last 2 bases of the protospacer are GG), they saw a consistent, and yes, dramatic improvement in recovery rates of CRISPR mutants. This was a pretty striking finding and was validated across about 8 to 10 sites. For comparison they tested "shifted" targets, in which the protospacer was shifted 5' by 3 bases so that the last 3 bases of the original protospacer served as the PAM. These controls were almost uniformly poor in absolute mutagenesis, usually at zero, meaning that with their particular worm injection system the baseline rate is pretty low. The targets with GG at the protospacer 3' end usually had high mutation rates, in double-digit percentages.
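The GG-ending design and its shifted control can be expressed as a simple scan. A minimal sketch, assuming 20-nt protospacers and considering only the top strand: wherever the sequence contains GGNGG, the 20 nt ending at the first GG form a GG-ending target (PAM = the following NGG), and the 20 nt ending 3 bases earlier form the shifted control, whose PAM is the last 3 bases (NGG) of the GG-ending protospacer. This illustrates the design described above; it is not the authors' code.

```python
import re


def gg_targets_with_shifted_controls(seq: str, length: int = 20):
    """Find top-strand sites of the form <protospacer ending in GG> + NGG.

    For each GGNGG motif, return a dict with:
      'target'      : protospacer ending in GG, followed by an NGG PAM
      'pam'         : that PAM
      'control'     : protospacer shifted 3 nt 5'
      'control_pam' : the last 3 nt (NGG) of 'target', used as the control's PAM
    Sites too close to the 5' end of seq to fit the control are skipped.
    """
    seq = seq.upper()
    results = []
    # Zero-width lookahead so overlapping GGNGG motifs are all found.
    for m in re.finditer(r"(?=GG[ACGT]GG)", seq):
        j = m.start()              # index of the first G of the protospacer's terminal GG
        t_start = j + 2 - length   # target occupies seq[t_start : j + 2]
        c_start = t_start - 3      # control occupies seq[c_start : j - 1]
        if c_start < 0:
            continue
        results.append({
            "target": seq[t_start:j + 2],
            "pam": seq[j + 2:j + 5],
            "control": seq[c_start:j - 1],
            "control_pam": seq[j - 1:j + 2],
        })
    return results


for s in gg_targets_with_shifted_controls("ACT" * 7 + "GGAGG" + "TTT"):
    print(s["target"], s["pam"], "| control:", s["control"], s["control_pam"])
```

Each GGNGG motif thus yields a matched pair of targets offset by 3 nt, which is what makes the shifted design a convenient within-locus control.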

So is this observed in other animals, cells, or systems? Based on my own work and on what I see in the literature, in mice we definitely don't need a GG at the protospacer 3' end to get high-efficiency mutagenesis. There are still no consistent rules here, but there are certainly trends. Cas9 does seem to prefer purines in the last two bases of the protospacer, as this seems to enhance gRNA loading (Wang et al 2014). GC richness across the protospacer is definitely good (Gagnon et al 2014), which presumably correlates with G's at the protospacer 3' end. Interestingly, Doench et al 2014 observed a preference for a purine in the last base of the protospacer, but not much preference at the penultimate base (see their Fig 3a).
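The trends mentioned above are easy to tabulate for a list of candidate guides. A minimal sketch that reports, for a protospacer (PAM excluded), whether the last and penultimate bases are purines, whether it ends in GG, and its overall GC fraction. These are descriptive features only, not a validated efficiency model, and the example sequence is hypothetical.

```python
PURINES = set("AG")


def three_prime_features(protospacer: str) -> dict:
    """Descriptive 3'-end and composition features of a protospacer (PAM excluded)."""
    s = protospacer.upper()
    gc = sum(1 for b in s if b in "GC") / len(s)
    return {
        "last_purine": s[-1] in PURINES,         # base adjacent to the PAM
        "penultimate_purine": s[-2] in PURINES,  # second base from the PAM
        "ends_GG": s.endswith("GG"),
        "gc_fraction": round(gc, 2),
    }


print(three_prime_features("GACCTGAGCTTAGCAATCGG"))
# {'last_purine': True, 'penultimate_purine': True, 'ends_GG': True, 'gc_fraction': 0.55}
```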

On the other hand, I can't yet identify many published mammalian CRISPR targets that had a GG in the last bases of the protospacer, had hard mutagenesis rates reported, and had enough other targets in the same paper for a good comparison.

Thursday, April 2, 2015

New #CRISPR paper with more evidence that inhibiting NHEJ increases homology-mediated DNA repair. Improvements over SCR7.

Chu et al. recently reported in Nature Biotechnology a detailed study of inhibiting the NHEJ pathway to enhance HDR (homology-directed repair) in cell culture CRISPR editing experiments. This follows the observation by Singh et al that SCR7, an inhibitor of ligase IV, accomplishes this in mouse embryos, although the data set in the Singh paper was limited. This new paper is very detailed, using a traffic-light reporter system to quantify NHEJ and HDR under various conditions.

Since there's more than one way to inhibit NHEJ, the authors used several approaches. First, shRNA was used to silence ligase IV and/or KU70 or KU80; the latter two proteins make up the KU heterodimer, which is required for NHEJ. Second, they used SCR7. Third, they coexpressed the adenovirus proteins E1B55K and E4orf6, which are known to target ligase IV for degradation.

Before getting into more details, the bottom line: each of these approaches (in human HEK293 cells) both inhibited NHEJ and increased the rate of HDR, though to varying degrees. There was a clear trend: the more NHEJ was inhibited, the greater the increase in HDR efficiency. The most effective treatment was coexpression of E1B55K + E4orf6. Not only did this reduce NHEJ more than the other methods, it also gave the greatest HDR increase.

Figs. 1c, 2h, and 2j suggest ~7x, ~8x, and ~3.4-51x increases in HDR, respectively, with E1B55K + E4orf6 coexpression compared to identical CRISPR/Cas experiments without these factors.

Therefore, transient NHEJ inhibition is looking better and better as a way to enhance HDR in CRISPR applications. This will likely be very important for gene therapy approaches, as well as very useful for any HDR application, including of course mouse embryo injections. In fact, Chu et al go on to show that by combining flow-sorting of transfected cells with transient selection for an inserted antibiotic resistance gene, they were able to obtain HDR-edited cell clones at an essentially 100% rate (e.g. supplemental fig. 13).

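Suddenly I am very interested in these adenoviral E1B55K and E4orf6 genes... They are apparently derived from human adenovirus type 5 (a species C adenovirus), and Origene sells cDNAs and antibodies for them. E4orf6 is sometimes listed as E4orf6/7, but the two are distinct products of the E4 region, so it is worth checking which one a given clone encodes.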