Good experimental design is important when validating hits from RNAi screens. Off-target effects from single siRNAs and low-complexity siRNA pools (e.g. Dharmacon siGENOME) result in high false-positive rates that must be sorted out in validation experiments.
Dharmacon siGENOME pools (SMARTpools) have 4 siRNAs, and the most common form of validation is to test the pool siRNAs individually (deconvolution).
Rather than deconvoluting the pool, a better approach is to test with independent reagents. Should the phenotype be due to the seed effects of an siRNA in the siGENOME pool, the new designs (with presumably different seed sequences) should not show them. (Note that because they have their own potential complicating off-targets, an even better option would be to use a reagent like siPOOLs that minimises the likelihood of off-target effects).
Independent validation reagents was the approach used by Li et al. in a screen looking for enhancers of antiviral protein ZAP activity.
They first did a genome-wide (18,200 genes) screen with siGENOME pools, looking for pools that increased viral infection rate.
The biggest effect was with the positive control, ZAP (aka ZC3HAV1). Several other pools also stood out as giving large increases in viral infection (Fig 1B):
They identified 90 non-control genes with reproducible Z-scores above 3 in their replicate experiments (~0.5% of screened genes).
These 90 genes were then tested with 3 Ambion Silencer siRNAs. (They also included a few genes in the validation round based on pathway information and off-target analysis– more on this below.)
Of the 90 candidate hit genes, only 11 could be confirmed (Fig 2B, note that ZC3HAV1/ZAP is the positive control and JAK1 was added to the validation round based on pathway info. A gene was considered confirmed if 2 of 3 siRNAs had a Z-score > 3.):
We also see that only 1 of the 7 top hits from the first round (blue genes in the first figure) was confirmed. This is a common observation in RNAi screens: the strongest phenotypes are mostly due to off-target effects.
Off-target effects are difficult to interpret, even using advanced analysis programs like Haystack or GESS. The authors tested 4 genes identified by Haystack as targets for seed-based off-targeting. None of those genes could be confirmed in the validation round.
Our 2014 Nucleic Acids Research paper provides an excellent overview of the siPOOL technology. Google Scholar shows that our paper has been cited 64 times.
To put this into perspective, the 2012 PLoS One paper on C911 controls by Buehler et al. has 72 citations. C911 controls are probably the most effective way to determine whether a single-siRNA phenotype is due to an off-target effect.
These citation numbers show that siPOOLs have good mind share when researchers consider the issue of RNAi off-target effects.
We have noticed, however, that in some cases our NAR paper is cited to justify approaches that we do not endorse.
For example, two recent papers (1, 2) cite our paper as support for the use of Dharmacon ON-TARGETplus 4-siRNA pools to reduce the potential for off-target effects.
Our paper shows, however, that high-complexity siRNA pools (> 15 siRNAs) are needed to reliably reduce off-target effects.
This blogpost describes issues encountered in target validation and how to safeguard against poor reproducibility in RNAi experiments.
The importance of target validation
More than half of all clinical trials fail from a lack of drug efficacy. One of the major reasons for this is inadequate target validation.
Target validation involves verifying whether a target (protein/nucleic acid) merits the development of a drug (small molecule/biologic) for therapeutic application.
Failing to adequately validate a target can burden a pharma with roughly 800 million to 1.4 billion in drug development costs. Impact is not only monetary as largesite closures often result as companies struggle to save costs and a reduced production effort deprives patients of new medicines.
Performing target validation well
Special attention should therefore be given to performing target validation techniques well.
Many of these techniques involve inhibiting target expression to establish its relevance in a cellular or animal disease model. This can be performed with chemical probes, RNA interference (RNAi), genetic knock-outs, and even targeted protein degradation.
The reproducibility of these techniques however has been an issue of concern for drug developers. Less than half of all findings from peer-reviewed scientific publications was reported to be successfully reproduced.
Cellular phenotypes caused by a chemical or genetic perturbant should be considered to be off-target until proved otherwise, especially when the phenotypes were detected in a down assay and therefore could reflect a nonspecific loss of cellular fitness. It is only by performing rescue experiments that one can formally address whether the effects of a perturbant are on-target.
The comment highlights the issue of reagent non-specificity as a notable contribution towards poor reproducibility.
Certainly, for RNAi the wide-spread off-target effects of siRNAs has been observed in numerous publications. The mechanism being well-established to be based on microRNA-like seed-based recognition of non-target genes. The effect dominates over on-target effects in many large RNAi screens, illustrating the depth of the problem.
Reagent non-specificity is not restricted to RNAi. There have been multiple reports of non-specificity for gene editing technique, CRISPR, which can be read about in detail here, here and here. Recent publications continue to shed more light on its potential off-targets as we learn more about this relatively new technique.
Even chemical probes may have multiple targets. It is hence imperative that more than one target validation technique be used to avoid confirmation bias.
Target validation – a story from Pharma
Back in 2013, when siTOOLs was just starting out, a pharma approached us with a target validation problem.
They were obtaining different results with 3 different siRNAs in a cellular proliferation assay. Despite all 3 siRNAs potently downregulating the target gene, they produced different effects on cell viability.
Which siRNA tool to trust?
A whole-transcriptome expression analysis performed for the 3 siRNAs and a siPOOL designed against the same target revealed the reason for the large variability.
Despite all siRNA tools affecting the same target, the difference in extent of gene deregulation was astounding. With the greatest number of off-target effects, it was not surprising that siRNA 3 showed an impact on cell proliferation.
In contrast, siPOOLs had 5 to 25X less differentially expressed genes compared to the 3 commercial siRNAs against the same target. An expression analysis carried out for another gene target showed similar results i.e. siPOOLs having far less off-targets.
The target was dropped from development. A great example where failing early is a good thing, though it was not without costs from validating the multiple siRNAs.
The recommended target validation tool
Functioning like a pack of wolves, siPOOLs increase the chances of capturing large and difficult prey, while making full use of group diversity to compensate for individual weakness.
siPOOLs efficiently counter RNAi off-target effects by high complexity pooling of sequence-defined siRNAs. This enables individual siRNAs to be administered at much lower concentrations, below the threshold for stimulating significant off-target gene deregulation. Due to having multiple siRNAs against the same target gene, target gene knock-down is maintained and in fact becomes more efficient.
We still recommend using multiple target validation techniques. As a first evaluation however, siPOOLs are quick, easy and most of all, reliable.
Summary:Low-complexity siRNA pooling (e.g. Dharmacon siGENOME SMARTpools) does not prevent siRNA off-targets. It may in fact exacerbate off-target effects. Only high-complexity pooling (siPOOLs) can reliably ensure on-target phenotypes.
Low-complexity pooling increases the number of siRNA off-targets
One of the claims often made in favour of low-complexity pooling (e.g Dharmacon siGENOME SMARTpools) is that this pooling reduces the number of seed-based off-target effects compared to single siRNAs.
If this were true, we would expect different low-complexity siRNA pools for the same gene to give similar phenotypes. But this is not the case.
Published expression data shows that low-complexity pooling actually increases the number of off-targets.
Kittler et al. (2007) looked at the effect of combining differing number of siRNAs in low to medium complexity siRNA pools (siRNA pools sizes were: 1, 3, 5, 9, and 12).
Their work showed that the number of down-regulated genes (50% or greater silencing) actually increases when small numbers of siRNAs are combined. Only when larger numbers of siRNAs are combined does the number of off-targets start to drop:
[The figure is based on data from GEO dataset GSE6807. Down-regulated genes are those whose expression is reduced by 50% or more. Note that the orange point is taken from our 2014 NAR paper, as we are not aware of other published expression datasets with this many pooled siRNAs. A few caveats with combining these datasets are that they use different target genes, siRNA concentrations, and the data comes from a different expression platform.]
Low-complexity pooling: a bad solution for siRNA off-targets
Low-complexity pooling does not get rid of the main problem associated with single siRNAs: seed-based off-target effects. Based the above analysis, it can make it even worse. It also prevents use of the most effective computational measures against seed effects.
Redundant siRNA Activity (RSA) is a common on-target hit analysis method for single-siRNA screens. It checks how over-represented the siRNAs for a gene are at the top of a ranked screening list. If a gene has 2 or more siRNAs near the top of the list, it will score better than a gene that only has a single siRNA near the top of the list. This is one way to reduce the influence of strong off-target siRNAs.
Correcting single siRNA values by seed medians has also been shown to be an effective way to increase the on-target signal in screens. This correction is not effective for low-complexity pools, since each pool can contain 3-4 different seeds.
Off-target based hit detection algorithms (e.g. Haystack and GESS) are also only effective for single-siRNA screens. The advantage of these algorithms is that it permits the detection of hit genes that were not screened with on-target siRNAs. These algorithms are not effective for low-complexity pool screens.
Our recommendation: do not convert single siRNAs into low-complexity pools, rather use high-complexity siPOOLs to confirm hits
We do not recommend that screeners combine their single siRNA libraries into low-complexity pools (e.g. combining 3 Silencer Select siRNAs for the same target gene). If possible, it is better to screen the siRNAs individually and then apply seed-based correction, RSA and seed-based hit-detection algorithms.
The time saved by only screening one well per target may prove illusory when the deconvolution experiments show that the individual siRNAs have divergent phenotypes.
It is probably better to deal with off-target effects up front (by screening single siRNAs) than to be surprised by them later in the screen (during pool deconvolution).
Summary: Correcting for seed-based off-targets can improve the results from RNAi screening. However, the correlation between siRNAs for the same gene is still poor and the strongest screening hits remain difficult to interpret.
Seed-based off-target correction has little effect on reagent reproducibility
Given that seed-based off-targets are the main cause of phenotypes in RNAi screening, trying to correct for those effects makes good sense.
The dominance of seed-based off-targets means that independent siRNAs for the same gene usually show poor correlation.
If one could correct for the seed effect, the correlation between siRNAs targeting the same gene may improve.
One straightforward way to do seed correction is to subtract the ‘seed median’ from each siRNA. (The seed median is the median for all siRNAs having the given seed.)
This was the approach used by Grohar et al. in a recent genome-wide survey of EWS-FLI1 splicing (involved in Ewing sarcoma). They used the Silencer Select library, which has 3 siRNAs per target gene.
After seed correction, there is only minor improvement in the correlation between siRNAs targeting the same gene. The intra-class correlation (ICC) improves from 0.031 to 0.037. The ICC for siRNAs with the same 7-mer seed decreases from 0.576 to 0.261.
Although we have reduced the seed-based signal, it has not resulted in a correspondingly large improvement in the gene-based signal.
More sophisticated seed correction can improve reagent correlation
Grohar et al. used a simple seed-median subtraction method to correct their screening results.
A more sophisticated method (scsR) was developed by Franceshini et al. for seed-based correction of screening data. It corrects using the mean value for siRNAs with the same seed, and weighs the correction using the standard deviation the values. This allows seeds with a more consistent effect to contribute more to the data normalisation.
Applying the scsR method to the Grohar data, ICC for siRNAs targeting the same gene increases from 0.031 to 0.041. It is better than the increase with seed-median subtraction (0.037), but is still only a fairly minor improvement (plot created using random selection of 10,000 pairs of siRNAs that target the same gene):
Off-target correction increases double-hit rate in top siRNAs of RNAi screen
The following plot shows the count for single-hit and double-hit genes as we go through the top 1000 siRNAs (of ~60K screened in total). Double-hit means that the gene is covered by 2 (or more) hit siRNAs.
Despite the small improvement in reagent correlation, the double-hit rate is essentially the same using simple seed-median subtraction or the more advanced scsR method.
Furthermore, the number of double-hits is higher than what we’d expect by chance.
This shows that, despite the noise from off-target effects, there is some on-target signal that can be detected.
siRNAs with the strongest phenotypes remain difficult to interpret
Despite the fact that the double-hit count is higher than expected by change, most of the genes targeted by the strongest siRNAs are single-hits. siRNAs with the strongest phenotypes remain difficult to interpret.
Seed correction is best suited for single-siRNA libraries. Low-complexity pools, like siGENOME or ON-TARGETplus, are less amenable to effective seed correction since there are (usually) 4 different seeds per pool. This reduces the effectiveness of seed-based correction, even though seed-based off-target effects remain the primary determinant of observed phenotypes (as discussed here, here , and here).
The best way to correct for seed-based off-targets is to avoid them in the first place. Using more specific reagents, like high-complexity siPOOLs, is the key to generating interpretable RNAi screening results.
As discussed previously, deconvoluted Dharmacon siGENOME pools often give surprising results. (Deconvolution is the process of testing the 4 siRNAs in a pool individually. This is usually done in the validation phase of siRNA screens.)
One way to compare the relative contribution of target gene and off-target effects is to calculate the correlation between reagents having the same target gene or the same seed sequence. One of the first things we do when analysing single siRNA screens is to calculate a robust form of the intraclass correlation (rICC, see discussion at bottom for more about this).
Recently we were analysing deconvolution data from Adamson et al. (2012) and calculated the following rICC’s. (The phenotype measured was relative homologous recombination.)
Besides the order of magnitude difference between target gene and antisense seed correlation (which is commonly observed in RNAi screens), what stands out is the ~2-fold difference between the correlation by target gene and sense seed.
Very little of the the sense strand should be loaded into RISC, if the siRNAs were designed with appropriate thermodynamic considerations (the 5′ antisense end should be less stable than the 5′ sense end, to ensure that the antisense strand is preferentially loaded into RISC).
The above correlations suggest that some not insubstantial amount of sense strand is making it into the RISC complex.
Here is the distribution of delta-delta-G for siPOOLs and siGENOME siRNAs targeting the same 500 human kinases (see bottom of post for discussion of calculation). A positive delta-delta G means that the sense end is more thermodynamically stable than the antisense end, favouring the loading of the antisense strand into RISC.
This discrepancy in delta-delta G is also consistent with comparison of mRNA knockdown:
The siGENOME knockdown data comes from 774 genes analysed by qPCR in Simpson et al. (2008). The siPOOL knockdown data is from 223 genes where we have done qPCR validation.
Of note, the siGENOME pools were tested at 100 nM, whereas siPOOLs were tested at 1 nM.
(It should be mentioned that, although consistent with the observed differences in ddG, this is only an indirect comparison, and delta-delta G is not the only determinant of functional siRNAs.)
Notes on intraclass correlation
Intraclass correlation measures the agreement between multiple measurements (in this case, multiple siRNAs with the same target gene, or multiple siRNAs with the same seed sequence). One could also pair off all the repeated measures and calculate correlation using standard methods (parametrically using Pearson’s method, or non-parametrically using Spearman’s method). The main problem with such an approach is that there is no natural way to determine which measure goes in the x or y column. Correlations are normally between different variables (e.g. height and weight). In a case of repeated measures, there is no natural order, so the intraclass correlation (ICC) is the more correct way to measure the similarity of within-group measurements. As ICC depends on a normal distribution, datasets must first be examined, and if necessary, transformed beforehand.
Robust methods have the advantage of permitting the use of untransformed data, which is especially useful when running scripts across hundreds of screening dataset features. The algorithm we use calculates a robust approximation of the ICC by combining resampling and non-parametric correlation.
Here is the algorithm, in a nutshell:
Group observations (e.g. cell count) by the grouping variable (e.g. target gene or antisense seed)
Randomly assign one value of each group to the x or y column (groups with one 1 observation are skipped)
for example, if the grouping variable is target gene and siRNAs targeting PLK1 had the values 23, 30, 37, 45, the program would randomly choose 1 of the values for the x column and another for the y column
Calcule Spearman’s rho (non-parametric measure of correlation)
Repeat steps 1-3 a set number of times (e.g. 300) and store the calculated rho’s
Calculate mean of the rho values from 4. This is the robust approximation of the ICC (rICC).
Values from 4 are also used to calculate confidence intervals.
The program that calculates this is available upon request.
Notes on calculating delta-delta G
Delta-delta G was calculated using the Vienna RNA package, as detailed here: https://www.biostars.org/p/58979/ (in answer by Brad Chapman).
The delta-delta G was calculated using 3 terminal bps. We found that that ddG of the terminal 3 bps had the strongest correlation with observed knockdown. Others (e.g. Schwarz et al., 2003 and Khvorova et al., 2003) have also used the terminal 4 bps.
Want to receive regular blog updates? Sign up for our siTOOLs Newsletter:
A recent article in The Scientist asks whether, in light of a paper by Lin et al. showing phenotypic discrepancies between RNAi and CRISPR, this is not ‘the last nail in the coffin for RNAi as a screening tool’?
The paper in question found that a gene (MELK) that had been shown by many RNAi-based studies to be critical for several cancer types shows no effect when knocked out via CRISPR. They also report that in relevant published genome-wide screens, MELK was not at the top of the hit lists.
Does this mean that the papers that used RNAi were unlucky and off-target effects were responsible for their observed phenotypes?
Gray et al. identified MELK as a gene of interest based on microarray experiments. They then designed RNAi experiments to test its role in proliferation. Assuming that this study and the subsequent ones followed good RNAi experimental design (using reagents with varying seed sequences, testing the correlation between gene knockdown and phenotypic strength, etc.), we can be fairly confident that MELK is involved in proliferation. It might not be the most essential player, which would explain why it is not at the top of screening hit lists. And screening lists have the draw-back of enriching foroff-target hits.
Another possibility is that Lin et al. have observed a known complicating feature of knock-out screens: genetic compensation. Although they undertake experiments to address this issue, it could be that compensation takes place too quickly for their experiments to rule it out. Furthermore, they could have addressed this issue by testing knock-down reagents themselves, and checking whether genes they hypothesise as responsible for the supposed off-target effect in the published RNAi work are in fact down-regulated. C911 reagents could also be used to test for off-target effects. This is extra work, but given that they are disputing the results in many published studies, this seems justified.
As regards the role of RNAi in screening, The Scientist concludes with the following (suggesting that their answer to the question of whether this is the final nail is also No):
In the meantime, one obvious solution to the problem of target identification and validation is to use both CRISPR and RNAi to validate a target before it moves into clinical research, rather than relying on a single method. “We have CRISPR and short hairpin reagents for every gene in the human genome,” said Bernards. “So when we see a phenotype with CRISPR, we validate with short hairpin, and the other way around. I think that would be ideal.”
Although we agree that validating CRISPR hits with RNAi reagents is important (especially if drugability is a concern), one has to be careful with RNAi reagents, like single siRNAs/shRNAs or low-complexity pools, that are susceptible to seed-based off-target effects. For validating CRISPR screening hits, siPOOLs provide the best protection against unwanted off-target effects, saving you time, money, and disappointment during the validation phase.
Want to receive regular blog updates? Sign up for our siTOOLs Newsletter:
In our last blog entry, we discussed a classic RNAi screening paper from 2005 that showed that the top 3 screening hits were were due to off-target effects.
In this post, we analyse a more recent genome-wide RNAi screen by Hasson et al., looking in more detail at what proportion of top screening hits are due to on- vs. off-target effects.
Hasson et al. used the Silencer Select library, a second-generation siRNA library designed to optimise on-target knock down, and chemically modified to reduce off-target effects. Each gene is covered by 3 different siRNAs.
To begin the analysis, we ranked the screened siRNAs in descending order of % Parkin translocation, the study’s main readout.
We then performed a hypergeometric test on all genes covered by the ranked siRNAs. For example, if gene A has three siRNAs that rank 30, 44, and 60, we calculate a p-value for the likelihood of having siRNAs that rank that highly (more details provided at bottom of this post). It’s the underlying principle of the RSA algorithm, widely used in RNAi screening hit selection. If the 3 siRNAs for gene B have a ranking of 25, 1000, and 1500, the p-value will be higher (worse) than for gene A.
The same type of hypergeometric testing was done for the siRNA seeds in the ranked list. For example, if the seed ATCGAA was found in siRNAs having ranks of 11, 300, 4000, and 6000, we would calculate the p-value for those rankings. Seeds are over-represented in siRNAs at the top of the ranked list will have lower p-values.
After doing these hypergeometric tests, we had a gene p-value and a seed p-value for each row in the ranked list. We could then look at each row in the ranked list estimate whether the phenotypic is due to an on- or off-target effect by comparing the gene and seed p-values. [As a cutoff, we said that the effect is due to one of either gene or seed if the difference in p-value is at least two orders of magnitude. If the difference is less than this, the cause was considered ambiguous.]
After assigning the effect as gene/seed/ambiguous, we then calculated the cumulative percent of hits by effect at each position in the ranked list. Those fractions were then plotted as a stacked area chart (here, looking at the top 200 siRNAs from the screen):
The on-target effect is sandwiched between the massive ‘bun’ of off-target effects and ambiguous cause. We are reminded of these classic commercials from the 80s:
Want to receive regular blog updates? Sign up for our siTOOLs Newsletter:
Note on p-value calculations:
P-values were calculated using the cumulative hyper-geometric test (tests the probability of finding that many or more instances of members belonging to the particular group, in our case a particular gene or seed sequence). The p-value associated with a gene or seed is the best p-value for all the performed tests. For example, assume a gene had siRNAs with the following ranks: 5, 20, 1000. The first test calculates the p-value for finding 1 (of the 3) siRNAs when taking a sample of 5 siRNAs. The next test calculates the p-value for finding 2 (of 3) siRNAs when taking a sample of 20 siRNAs. And the last is the probability of getting 3 (of 3) siRNAs when taking a sample of 1000. If the best p-value came from the second test (2 of 3 siRNAs found in a sample size of 20), that is the p-value that the gene receives. This is also the approach used by the RSA (redundant siRNA activity) algorithm. One advantage of RSA is that it can compensate for variable knock down efficiency of the siRNAs covering a gene (e.g. if 1 of 3 gives little knockdown).
Classic Papers Series: Lin et al. show RNAi screen dominated by seed effects
Over the coming months, we will highlight a number of seminal papers in the RNAi field.
The first such paper is from 2005 by Lin et al. of Abbott Laboratories, who showed that the top hits from their RNAi screen were due to seed-based off-target effects, rather than the intended (and at that time, rather expected) on-target effect.
The authors screened 507 human kinases with 1 siRNA per gene, using a HIF-1 reporter assay to identify genes regulating hypoxia-induced HIF-1 response.
In the validation phase of their screen, they tested new siRNAs for hit genes, but found that they failed to reproduce the observed effect, even when using siRNAs that had a better on-target knock down than the pass 1 siRNAs.
Figure 1A. Left panel shows on-target knock down of pass 1 siRNA for GRK4 (O) and the new design (N). Centre panel shows Western blot of protein levels Right panel shows HIF-1 reporter activity for positive control (HIF1A) and the original (O) and new (N) siRNAs.
The on-target knock down is much-improved for the new design, yet its reporter activity is indistinguishable from negative control. Yet the pass 1 siRNA with poor knock down gives almost as strong a result as HIF1A (positive control).
By qPCR, they then showed that GRK4(O) and another one of the top 3 siRNAs silence HIF1A (the positive control gene). Using a number of different target constructs they also nicely show that it was due to seed-based targeting in the 3′ UTR.
Although the authors screened at a high initial concentration (100 nM), the observed off-targets persisted at 5 nM, suggesting that just screening at lower concentrations would not have improved their results.
The authors conclude:
In addition, due to the large percentage of the off- target hits generated in the screening, using a redundant library without pooling in the primary screen could significantly reduce the efforts required to eliminate off-target false positives and therefore, will be a more efficient design than using a pooled library.
Consistent with the work of Rossi et al. (discussed previously), another recent paper shows a lack of phenotypic response when knocking out a gene that gives a phenotypic response when knocked down.
Knocking out klf2a does not result in any discernible difference from wild-type (whereas knock-down has been shown to produce a range of cardiovascular phenotypes).
The authors conclude:
In summary, our work shows that even in the face of clear evidence of a potentially disruptive mutation induced in a gene of interest, it is currently very difficult to be certain that this leads to loss-of-function, and hence to be confident about the role of the gene in embryonic development.