Pooling for qPCR

I am comparing miRNA expression levels in 3 different groups but I am low on money and time. I have to get some preliminary results to get the actual research going so I decided to pool my samples and get the results for 372 miRNAs and then if any of them shows prominent differential expression, I will investigate the expression levels of those candidate miRNAs in larger groups. But I can't decide how to group my samples for pooling. Should I pick 3 individual samples for each of the 3 groups (9 samples in total) or should I extract the total RNA for each sample for each group, make 3 pools of RNA and then just perform triplicates for each pool? The time and money spent in both situations is going to be the same. I think the second method is more appropriate yet I fear it might produce some deflated p-values, resulting in false discovery.

Should I pick 3 individual samples for each of the 3 groups (9 samples in total) or should I extract the total RNA for each sample for each group, make 3 pools of RNA and then just perform triplicates for each pool?

Depends on the nature of your samples. In general, pooling averages out the expression of different individuals. Pooling is generally done when the RNA yield per individual sample is low. If that is not the case, then you should sample multiple individuals per group (three should be fine; more is better). This will tell you the inter-individual (biological) variation within a group. You can in turn set up three technical replicates per sample for the qPCR to measure technical variation. So, in total, you would have 27 PCR reactions per miRNA. This would not be very costly (unless you are using a TaqMan-like assay). Even then, working with a larger number of samples sometimes lets you save on the amount of reagent used per sample.
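A minimal Python simulation can make the worry about deflated p-values concrete. It rests on several hypothetical assumptions:

- Expression is on a log scale such as Cq.
- The biological SD is 1 cycle and the technical SD is 0.2 cycles.
- Each pool contains 10 individuals.
- Pooling is simplified to averaging on that log scale.

It compares only two groups that have no true difference and applies an ordinary two-sample t-test at the 5% level.

```python
import math
import random
import statistics

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t with 4 degrees of freedom


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for equal group sizes."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * 2 / n)


def measure_group(rng, design, pool_size, bio_sd, tech_sd):
    """Return the three values that enter the t-test for one group."""
    if design == "pooled":
        # One pool built from pool_size individuals (simplified as a mean), run in triplicate.
        pool = statistics.mean(rng.gauss(0.0, bio_sd) for _ in range(pool_size))
        return [pool + rng.gauss(0.0, tech_sd) for _ in range(3)]
    # Three individuals, each summarized by the mean of its own technical triplicate.
    return [rng.gauss(0.0, bio_sd) + statistics.mean(rng.gauss(0.0, tech_sd) for _ in range(3))
            for _ in range(3)]


def false_positive_rate(design, n_sims=2000, pool_size=10, bio_sd=1.0, tech_sd=0.2, seed=1):
    """Fraction of simulated experiments called 'significant' although no difference exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = measure_group(rng, design, pool_size, bio_sd, tech_sd)
        b = measure_group(rng, design, pool_size, bio_sd, tech_sd)
        if abs(t_statistic(a, b)) > T_CRIT_DF4:
            hits += 1
    return hits / n_sims


print("pooled, technical triplicates:", false_positive_rate("pooled"))
print("three individuals per group:  ", false_positive_rate("individual"))
```

Under these assumptions, the pooled design reports a "difference" far more often than 5% of the time. The triplicates of one pool only capture technical noise, while the two pools still differ biologically. The design with three individuals per group stays near the nominal 5%.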

Basic Principles of RT-qPCR

Quantitative reverse transcription PCR (RT-qPCR) is used when the starting material is RNA. In this method, total RNA or messenger RNA (mRNA) is first transcribed into complementary DNA (cDNA) by reverse transcriptase. The cDNA is then used as the template for the qPCR reaction. RT-qPCR is used in a variety of applications, including gene expression analysis, RNAi validation, microarray validation, pathogen detection, genetic testing, and disease research.

One-step vs. Two-step RT-qPCR

RT-qPCR can be performed in a one-step or a two-step assay (Figure 1, Table 1). One-step assays combine reverse transcription and PCR in a single tube and buffer, using a reverse transcriptase along with a DNA polymerase. One-step RT-qPCR only utilizes sequence-specific primers. In two-step assays, the reverse transcription and PCR steps are performed in separate tubes, with different optimized buffers, reaction conditions, and priming strategies.

Figure 1. One-Step vs. Two-Step RT-qPCR.

Table 1. Advantages and Disadvantages when using one-step versus two-step assays in RT-qPCR

One-step, advantages:
  • Less experimental variation since both reactions take place in the same tube
  • Fewer pipetting steps reduce the risk of contamination
  • Suitable for high-throughput amplification/screening
  • Fast and highly reproducible
One-step, disadvantages:
  • Impossible to optimize the two reactions separately
  • Less sensitive than two-step because the reaction conditions are a compromise between the two combined reactions
  • Detection of fewer targets per sample
Two-step, advantages:
  • A stable cDNA pool is generated that can be stored for long periods of time and used for multiple reactions
  • The target and reference genes can be amplified from the same cDNA pool without multiplexing
  • Optimized reaction buffers and reaction conditions can be used for each individual reaction
  • Flexible priming options
Two-step, disadvantages:
  • The use of several tubes and pipetting steps exposes the reaction to a greater risk of DNA contamination
  • Time consuming
  • Requires more optimization than one-step

Choosing total RNA vs. mRNA

When designing an RT-qPCR assay, it is important to decide whether to use total RNA or purified mRNA as the template for reverse transcription. mRNA may provide slightly more sensitivity, but total RNA is often used because it has important advantages as a starting material. First, fewer purification steps are required, which ensures a more quantitative recovery of the template and a better ability to normalize the results to the starting number of cells. Second, avoiding any mRNA enrichment steps avoids the possibility of skewed results due to different recovery yields for different mRNAs. Taken together, total RNA is more suitable in most cases, since relative quantification of the targets matters more for most applications than the absolute sensitivity of detection (1).

Primers for Reverse Transcription

Three different approaches can be used for priming cDNA reactions in two-step assays: oligo(dT) primers, random primers, or sequence specific primers (Figure 2, Table 2). Often, a mixture of oligo(dT)s and random primers is used. These primers anneal to the template mRNA strand and provide reverse transcriptase enzymes a starting point for synthesis.

Figure 2. Four different priming methods for the reverse transcription step in two-step assays of RT-qPCR.

Table 2. Primer considerations for the cDNA synthesis step of RT-qPCR. Combining random primers and anchored oligo(dT) primers improves the reverse transcription efficiency and qPCR sensitivity.

Oligo(dT) primers, advantages:
  • Generation of full-length cDNA from poly(A)-tailed mRNA
  • Good to use if little starting material is available
  • Anchor ensures that the oligo(dT) primer binds at the 5′ end of the poly(A) tail of mRNA
Oligo(dT) primers, disadvantages:
  • Only amplifies genes with a poly(A) tail
  • Truncated cDNA from priming at internal poly(A) sites* (2)
  • Bias towards the 3′ end*
Random primers, advantages:
  • Anneal to all RNAs (tRNA, rRNA, and mRNA)
  • Good to use for transcripts with significant secondary structure, or if little starting material is available
  • High cDNA yield
Random primers, disadvantages:
  • cDNA is made from all RNAs, which is not always desirable and can dilute the mRNA signal
  • Truncated cDNA
Sequence-specific primers, advantages:
  • Specific cDNA pool
  • Increased sensitivity
  • Use reverse qPCR primer
Sequence-specific primers, disadvantages:
  • Synthesis is limited to one gene of interest

Reverse Transcriptase Enzymes

Reverse transcriptase is the enzyme that makes DNA from RNA. Some reverse transcriptases have RNase H activity, which degrades the RNA strand of the RNA-DNA hybrid after transcription. If an enzyme lacks RNase H activity, RNase H may be added separately for better qPCR efficiency. Commonly used enzymes include Moloney murine leukemia virus (M-MLV) reverse transcriptase and avian myeloblastosis virus (AMV) reverse transcriptase. For RT-qPCR, it is ideal to choose a reverse transcriptase with high thermal stability. This allows cDNA synthesis to be performed at higher temperatures, which helps transcribe RNA with extensive secondary structure. At the same time, the enzyme retains full activity throughout the reaction and produces higher cDNA yields.

RNase H Activity of Reverse Transcriptase

RNase H activity degrades RNA from RNA-DNA duplexes to allow efficient synthesis of double-stranded DNA. However, with long mRNA templates, RNA may be degraded prematurely resulting in truncated cDNA. Hence, it is generally beneficial to minimize RNase H activity when aiming to produce long transcripts for cDNA cloning. In contrast, reverse transcriptases with intrinsic RNase H activity are often favored in qPCR applications because they enhance the melting of RNA-DNA duplex during the first cycles of PCR (Figure 3).

Figure 3. RNase H activity of reverse transcriptases. In qPCR, use a reverse transcriptase with RNase H activity.


Evaluation of COVID-19 RT-qPCR test in multi-sample pools

The recent emergence of SARS-CoV-2 led to a pandemic of unprecedented scale. Though diagnostic tests are fundamental to the ability to detect and respond, many health systems are already experiencing shortages of reagents associated with this test. Here, testing a pooling approach for the standard RT-qPCR test, we find that a single positive sample can be detected even in pools of up to 32 samples, with an estimated false negative rate of 10%. Detection of positive samples diluted in up to 64 samples may also be attainable, though it may require additional amplification cycles. As it uses the standard protocols, reagents, and equipment, this pooling method can be applied immediately in current clinical testing laboratories. We hope that implementing such a pool test for COVID-19 would expand current screening capacities. This would enable wider detection in the community as well as in close integral groups, such as hospital departments, army units, or factory shifts.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Yad Hanadiv - The Rothschild Foundation in Israel

Author Declarations

All relevant ethical guidelines have been followed, any necessary IRB and/or ethics committee approvals have been obtained, and details of the IRB/oversight body are included in the manuscript.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Pooling strategies for COVID-19 testing

David Austin
Grand Valley State University

While the COVID-19 pandemic has brought tragedy and disruption, it has also provided a unique opportunity for mathematics to play an important and visible role in addressing a pressing issue facing our society.

By now, it's well understood that testing is a crucial component of any effective strategy to control the spread of the SARS-CoV-2 coronavirus. Countries that have developed effective testing regimens have been able, to a greater degree, to resume normal activities, while those with inadequate testing have seen the coronavirus persist at dangerously high levels.

Developing an effective testing strategy means confronting some important challenges. One is producing and administering enough tests to form an accurate picture of the current state of the virus' spread. This means having an adequate number of trained health professionals to collect and process samples, along with an adequate supply of reagents and testing machines. Furthermore, results must be available promptly. A person who is unknowingly infected can transmit the virus to many others in a week, so results need to be available within days or even hours.

One way to address these challenges of limited resources and limited time is to combine samples from many patients strategically into testing pools, rather than testing samples from each individual patient separately. Indeed, some well-designed pooling strategies can decrease the number of required tests by a factor of ten; that is, it is possible to effectively test, say, 100 patients with a total of 10 tests.

On first thought, it may seem like we're getting something from nothing. How could 10 tests yield accurate results for 100 patients? This column will describe how some mathematical ideas from compressed sensing theory provide the key.

One of the interesting features of the COVID-19 pandemic is the rate at which we are learning about it. The public is seeing science done in public view and in real time, and new findings sometimes cause us to amend earlier recommendations and behaviors. This has made the job of communicating scientific findings especially tricky. So while some of what's in this article may require updating in the near future, our aim is rather to focus on mathematical issues that should remain relevant and trust the reader to update as appropriate.

Some simple pooling strategies

While the SARS-CoV-2 virus is new, the problem of testing individuals in a large population is not. Our story begins in 1943, when Robert Dorfman proposed the following simple method for identifying syphilitic men called up for induction through the wartime draft.

A 1941 poster encourages syphilis testing (Library of Congress)

Suppose we have samples from, say, 100 patients. Rather than testing each of the samples individually, Dorfman suggested grouping them into 10 pools of 10 each and testing each pool.

If the test result of a pool is negative, we conclude that everyone in that pool is free of infection. If a pool tests positive, then we test each individual in the pool.

In the situation illustrated above, two of the 100 samples are infected, and they fall in different pools. We therefore perform a total of 30 tests to identify them: 10 tests for the original 10 pools, followed by 20 tests, one for each member of the two infected pools. Here, the number of tests performed is 30% of the number required had we tested each individual separately.

Of course, there are situations where this strategy is disadvantageous. If there is an infected person in every pool, we end up performing the original 10 tests and follow up by then testing each individual. This means we perform 110 tests, more than if we had just tested everyone separately.
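The two-stage bookkeeping can be checked with a short Python sketch. It assumes a perfect test and assigns samples to pools in consecutive blocks of 10, which is an arbitrary illustrative choice. `dorfman_tests` is a hypothetical helper, not part of any library.

```python
def dorfman_tests(infected, n_pools=10, pool_size=10):
    """Count the tests Dorfman's two-stage scheme uses for a given set of infected samples.

    Samples are numbered 0 .. n_pools * pool_size - 1 and pooled in consecutive blocks
    (an illustrative assignment). A perfect test is assumed.
    """
    tests = n_pools  # round 1: one test per pool
    for pool in range(n_pools):
        members = range(pool * pool_size, (pool + 1) * pool_size)
        if any(m in infected for m in members):
            tests += pool_size  # round 2: retest every member of a positive pool
    return tests


print(dorfman_tests({3, 57}))                      # two infected samples in different pools
print(dorfman_tests({10 * i for i in range(10)}))  # one infected sample in every pool
```

The two calls reproduce the 30 and 110 tests described in the text.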

What's important is the prevalence $p$ of the infection: the expected fraction of infected individuals, or equivalently the probability that a randomly chosen individual is infected. If the prevalence is low, it seems reasonable that Dorfman's strategy can reduce the number of tests we expect to perform. As the prevalence grows, however, it may no longer be effective.

It's not too hard to find how the expected number of tests per person varies with the prevalence. If we arrange $k^2$ samples into $k$ pools of $k$ samples each, then the expected number of tests per person is $E_k = \frac{1}{k} + 1 - (1-p)^k$. When $k=10$, the expected number $E_{10}$ is shown on the right. Of course, testing each individual separately means we use one test per person, so Dorfman's strategy loses its advantage when $E_k \geq 1$. As the graph shows, when $p > 0.2$, meaning there are more than 20 infections per 100 people, we are better off testing each individual.
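The formula for $E_k$ can be evaluated directly. Setting $E_k = 1$ gives $(1-p)^k = 1/k$, so the break-even prevalence is $p^* = 1 - k^{-1/k}$. For $k = 10$, this is about 0.206, consistent with the $p > 0.2$ threshold quoted in the text. The Python sketch below assumes a perfect test and independent infections.

```python
def dorfman_expected_tests(p, k):
    """Expected tests per person, E_k = 1/k + 1 - (1 - p)^k, for k pools of k samples.

    Assumes a perfect test and independent infections with prevalence p.
    """
    return 1 / k + 1 - (1 - p) ** k


def breakeven_prevalence(k):
    """Prevalence at which E_k = 1, i.e. (1 - p)^k = 1/k, so p = 1 - (1/k)^(1/k)."""
    return 1 - (1 / k) ** (1 / k)


for p in (0.01, 0.04, 0.10, 0.20):
    print(f"p = {p:.2f}: E_10 = {dorfman_expected_tests(p, 10):.3f}")
print(f"break-even prevalence for k = 10: {breakeven_prevalence(10):.4f}")
```

At $p = 0.01$, this gives roughly 0.2 tests per person, a five-fold saving over individual testing.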

Fortunately, the prevalence of SARS-CoV-2 infections is relatively low in the general population. As the fall semester began, my university initiated a randomized testing program that showed the prevalence in the campus community to be around $p \approx 0.01$. Concern is setting in now that that number is closer to 0.04. In any case, we will assume that the prevalence of infected individuals in our population is low enough to make pooling a viable strategy.

Of course, no test is perfect. It's possible, for instance, that an infected sample will yield a negative test result. It's typical to characterize the efficacy of a particular testing protocol using two measures: sensitivity and specificity. The sensitivity measures the probability that a test returns a positive result for an infected sample. Similarly, the specificity measures the probability that a test returns a negative result when the sample is not infected. Ideally, both of these numbers are near 100%.

Using Dorfman's pooling method, the price we pay for lowering the expected number of tests below one is a decrease in sensitivity. Identifying an infected sample in this two-step process requires the test to correctly return a positive result both times we test it. Therefore, if $S_e$ is the sensitivity of a single test, Dorfman's method has a sensitivity of $S_e^2$. For example, a test with a sensitivity of 95% yields a sensitivity around 90% when tests are pooled.

There is, however, an increase in the specificity. If a sample is not infected, testing a second time increases the chance that we detect it as such. One can show that if the sensitivity and specificity are around 95% and the prevalence at 1%, then pooling 10 samples, as shown above, raises the specificity to around 99%.
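The sensitivity and specificity figures can be reproduced under explicitly assumed conditions:

- Infections are independent, and so are test errors.
- A pool containing at least one infected sample tests positive with probability equal to the single-test sensitivity.
- An all-negative pool tests positive with probability one minus the specificity.

This is a sketch under those simplifying assumptions, not a model of real dilution effects.

```python
def dorfman_accuracy(se, sp, p, k):
    """Return (sensitivity, specificity) of Dorfman's two-stage scheme with pools of size k.

    Assumptions: independent infections with prevalence p; independent test errors;
    a pool containing an infected sample tests positive with probability se;
    an all-negative pool tests positive with probability 1 - sp.
    """
    # An infected person is found only if the pool test AND the individual retest are positive.
    sensitivity = se * se
    # An uninfected person's pool truly contains the virus only if one of the other k - 1 is infected.
    other_infected = 1 - (1 - p) ** (k - 1)
    pool_positive = other_infected * se + (1 - other_infected) * (1 - sp)
    # The uninfected person is then wrongly called positive only if the retest also errs.
    false_positive = pool_positive * (1 - sp)
    return sensitivity, 1 - false_positive


sens, spec = dorfman_accuracy(se=0.95, sp=0.95, p=0.01, k=10)
print(f"sensitivity {sens:.4f}, specificity {spec:.4f}")
```

Take 95% sensitivity and specificity, 1% prevalence, and pools of 10. This gives a sensitivity of 0.9025 and a specificity of about 0.994, in line with the roughly 90% and 99% quoted in the text.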

Some modifications of Dorfman's method

It's possible to imagine modifying Dorfman's method in several ways. For instance, once we have identified the infected pools in the first round of tests, we could apply a pooling strategy on the smaller set of samples that still require testing.

A second possibility is illustrated below, where 100 samples are arranged in a square $10\times 10$ array. Each sample is included in two pools according to its row and column, so that a total of 20 tests are performed in the first round. In the illustration, the two infected samples lead to positive results in four of the pools: two rows and two columns.

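We know that the infected samples appear at the intersections of these two rows and two columns, which leads to a total of four tests in the second round. Once again, it's possible to express $E_k$, the expected number of tests per individual, in terms of the prevalence $p$. Suppose we have $k^2$ samples arranged in a $k\times k$ array. The first round costs $2k$ tests, or $2/k$ per person. A sample is retested when its row and its column both test positive. This always happens if it is infected; otherwise it happens only if another sample in its row and another in its column are infected. Assuming that the sensitivity and specificity are 100%, this gives $E_k = \frac{2}{k} + p + (1-p)\left(1-(1-p)^{k-1}\right)^2$.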

The graph at right shows the expected number of tests using the two-dimensional array, assuming $k=10$, in red, with the result using Dorfman's original method in blue. As can be seen, at low prevalence the expected number of tests is greater using the two-dimensional approach, since we invest twice as many tests in the first round of testing. However, each sample is included in two tests in the initial round. For an infected sample to be misidentified, both tests would have to return negative results. This makes the two-dimensional approach desirable: its sensitivity is greater than the sensitivity of the individual tests, and it still lowers the expected number of tests when the prevalence is low.
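To compare the two designs numerically, the formulas for $E_k$ can be evaluated side by side, assuming a perfect test and $k = 10$. A small bisection also locates the prevalence at which the two curves cross. Under these formulas, the crossing falls between 1% and 1.5%. The array is therefore the more expensive design only at quite low prevalence, such as $p = 0.01$, and becomes cheaper above the crossing. This is a sketch of the formulas only, not of the graph.

```python
def dorfman_expected_tests(p, k):
    """Dorfman: E_k = 1/k + 1 - (1 - p)^k (perfect test assumed)."""
    return 1 / k + 1 - (1 - p) ** k


def array_expected_tests(p, k):
    """k x k array: E_k = 2/k + p + (1 - p)(1 - (1 - p)^(k - 1))^2 (perfect test assumed)."""
    return 2 / k + p + (1 - p) * (1 - (1 - p) ** (k - 1)) ** 2


def crossover_prevalence(k, lo=0.001, hi=0.05, tol=1e-10):
    """Bisection for the prevalence where both designs need the same expected number of tests.

    Assumes the array is costlier at lo and cheaper at hi, which holds for k = 10.
    """
    def diff(p):
        return array_expected_tests(p, k) - dorfman_expected_tests(p, k)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for p in (0.005, 0.01, 0.02, 0.05):
    print(f"p = {p:.3f}: array {array_expected_tests(p, 10):.3f}, Dorfman {dorfman_expected_tests(p, 10):.3f}")
print(f"curves cross near p = {crossover_prevalence(10):.4f}")
```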

While it is important to consider the impact that any pooling strategy has on these important measures, our focus will, for the most part, take us away from discussions of specificity and sensitivity. See this recent Feature column for a deep dive into their relevance.

Theoretical limits

There has been some theoretical work on the range of prevalence values over which pooling strategies are advantageous. In the language of information theory, we can consider a sequence of test results as an information source having entropy $I(p) = -p\log_2(p) - (1-p)\log_2(1-p)$. In this framework, a pooling strategy can be seen as an encoding of the source to effectively compress the information generated.

Sobel and Groll showed that $E$, the expected number of tests per person, for any effective pooling method must satisfy $E \geq I(p)$. On the right, this theoretical limit is shown in red along with the expected number of tests under the Dorfman method with $k=10$.

Further work by Ungar showed that when the prevalence grows above the threshold $p \geq (3-\sqrt{5})/2 \approx 38\%$, we cannot find a pooling strategy that is better than simply testing everyone individually.
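The information-theoretic bound and Ungar's threshold are easy to evaluate. This Python sketch computes $I(p)$ in bits and compares it with Dorfman's $E_{10} = \frac{1}{10} + 1 - (1-p)^{10}$, assuming a perfect test.

```python
import math


def entropy_bits(p):
    """Binary entropy I(p) = -p log2 p - (1 - p) log2(1 - p), the Sobel-Groll lower bound on E."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def dorfman_expected_tests(p, k=10):
    """E_k = 1/k + 1 - (1 - p)^k for Dorfman's method (perfect test assumed)."""
    return 1 / k + 1 - (1 - p) ** k


UNGAR_THRESHOLD = (3 - math.sqrt(5)) / 2  # above this prevalence, no pooling beats individual tests

for p in (0.01, 0.04, 0.10):
    print(f"p = {p:.2f}: bound {entropy_bits(p):.3f}, Dorfman (k = 10) {dorfman_expected_tests(p):.3f}")
print(f"Ungar threshold: {UNGAR_THRESHOLD:.3f}")
```

At $p = 0.01$, the bound is about 0.081 tests per person, while Dorfman's method with $k = 10$ needs about 0.196. That gap leaves room for cleverer schemes. The threshold $(3-\sqrt{5})/2$ evaluates to about 0.382.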

RT-qPCR testing

While there are several different tests for the SARS-CoV-2 virus, at this time, the RT-qPCR test is considered the "gold standard." In addition to its intrinsic interest, learning how this test works will help us understand the pooling strategies we will consider next.

A sample is collected from a patient, often through a nasal swab, and dispersed in a liquid medium. The test begins by converting the virus' RNA molecules into complementary DNA through a process known as reverse transcription (RT). A sequence of amplification cycles, known as quantitative polymerase chain reaction (qPCR), then begins. Each cycle consists of three steps:

The liquid is heated close to boiling so that the transcribed DNA denatures into two separate strands.

Next, the liquid is cooled so that primers, which have been added to the liquid, attach to the DNA strands at either end of a specific target sequence, typically 100-200 nucleotides long. This sequence characterizes the complementary DNA of the SARS-CoV-2 virus and is distinctive enough to identify it, which gives the test its high specificity. A fluorescent marker is attached to the primer or, in many assay designs, to a probe that binds within the target sequence.

In a final warming phase, additional nucleotides attach to the primer to complete a copy of the complementary DNA molecule.

The RT-qPCR test takes the sample through 30-40 amplification cycles resulting in a significant increase in the number of DNA molecules, each of which has an attached fluorescent marker. After each cycle, we can measure the amount of fluorescence and translate it into a measure of the number of DNA molecules that have originated from the virus.

The fluorescence, as it depends on the number of cycles, is shown below. A sample with a relatively high viral load will show significant fluorescence at an early cycle. The curves below represent different samples and show how the measured fluorescence grows through the amplification cycles. Moving to the right, each curve is associated with a ten-fold decrease in the initial viral load of the sample. Taken together, these curves represent a range of a million-fold decrease in the viral load. In fact, the test is sensitive enough to detect a mere 10 virus particles in a sample.

The FDA has established a threshold, shown as the red horizontal line, above which we can conclude that the SARS-CoV-2 virus is present in the original sample. However, the test provides more than a binary positive/negative result: by matching the fluorescence curve from a particular sample to the curves above, we can infer the viral load present in the original sample. In this way, the test provides a quantitative measure of the viral load, which we will soon use in developing a pooling method.

Pooling samples from several individuals, only one of whom is infected, will dilute the infected sample. The effect is simply that the fluorescence response crosses the FDA threshold in a later cycle. There is a limit, however. Because noise can creep into the fluorescence readings around cycle 40, FDA standards state that only results from the first 39 cycles are valid.
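How much later the curve crosses the threshold follows from exponential amplification. Assume each cycle multiplies the target by $(1 + e)$, where an efficiency $e = 1$ means ideal doubling; real assays fall somewhat short, so this is an idealization. Then diluting a sample by a factor $d$ delays the threshold crossing by $\log_{1+e}(d)$ cycles. A ten-fold dilution, as between adjacent curves above, costs about 3.3 cycles, and a pool of 32 costs 5 cycles. The sketch uses a hypothetical undiluted Cq of 33.5 and the 39-cycle cutoff from the text. It also assumes the positive sample is mixed with negatives in equal volumes.

```python
import math


def cq_shift(dilution_factor, efficiency=1.0):
    """Extra cycles needed to reach the threshold after diluting the template.

    Assumes the target grows by a factor (1 + efficiency) per cycle, so
    N0 * (1 + e)^Cq = threshold, and dividing N0 by d adds log_{1+e}(d) cycles.
    """
    return math.log(dilution_factor) / math.log(1 + efficiency)


CUTOFF = 39        # last valid cycle according to the text
single_cq = 33.5   # hypothetical Cq of an undiluted, fairly weak positive sample

for pool_size in (1, 8, 32, 64):
    # one positive sample mixed with (pool_size - 1) negatives in equal volumes
    cq = single_cq + cq_shift(pool_size)
    print(f"pool of {pool_size:2d}: Cq ~ {cq:.1f}, detected within cutoff: {cq <= CUTOFF}")
```

For this weak hypothetical sample, a pool of 32 still crosses the threshold in time, but a pool of 64 does not. This is consistent with the observation that pools of 64 may require additional amplification cycles.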

Recent studies by Bilder and by Yelin et al. investigated the practical limits of pooling samples in the RT-qPCR test and found that a single infected sample can be reliably detected in a pool of up to 32. (A recent study by the CDC, however, raises concerns about detecting the virus using the RT-qPCR test past the 33rd amplification cycle.)

Non-adaptive testing strategies

Dorfman's pooling method and its variants described above are known as adaptive methods because they begin with an initial round of tests and use those results to determine how to proceed with a second round. Since the RT-qPCR test requires 3 - 4 hours to complete, the second round of testing causes a delay in obtaining results and ties up testing facilities and personnel. A non-adaptive method, one that produces results for a group of individuals in a single round of tests, would be preferable.

Several non-adaptive methods have recently been proposed and are even now in use, such as P-BEST. The mathematical ideas underlying these various methods are quite similar. We will focus on one called Tapestry.

We first collect samples from $N$ individuals and denote the viral load of sample $j$ by $x_j$. We then form these samples into $T$ pools in a manner to be explained a little later. This leads to a pooling matrix $A_i^j$, where $A_i^j = 1$ if the sample from individual $j$ is present in the $i^{\text{th}}$ test and 0 otherwise. The total viral load in the $i^{\text{th}}$ test is then $y_i = \sum_{j} A_i^j x_j$, which can be measured by the RT-qPCR test. In practice, there will be some uncertainty in measuring $y_i$, but it can be dealt with in the theoretical framework we are describing.

Now we have a linear algebra problem. We can express the $T$ equations that result from the tests as $\mathbf{y} = A\mathbf{x}$, where $\mathbf{y}$ is the known vector of test results, $A$ is the $T\times N$ pooling matrix, and $\mathbf{x}$ is the unknown vector of viral loads obtained from the patients.
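The linear model can be made concrete with a toy example. The $4\times 6$ pooling matrix and viral loads below are hypothetical and are not a Tapestry design. Each sample sits in exactly two tests, and each pair of tests shares exactly one sample, so $T = 4 < N = 6$.

```python
# Hypothetical pooling matrix: A[i][j] = 1 if sample j is in test i (4 tests, 6 samples).
A = [
    [1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
x = [0.0, 0.0, 7.5, 0.0, 0.0, 0.0]  # hypothetical viral loads: only sample 2 is positive


def pooled_results(A, x):
    """Noise-free test results y_i = sum_j A_i^j x_j, one per row of A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


print(pooled_results(A, x))  # [0.0, 7.5, 7.5, 0.0]
```

With a single positive sample, the two positive tests point to it, since no other sample lies in exactly that pair of tests. With more positives, the under-determined system needs the sparsity ideas developed next.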

Because $T < N$, this is an under-determined system of equations, which means that we cannot generally expect to solve for the vector $\mathbf{x}$. However, we have some additional information: because we are assuming that the prevalence $p$ is low, the vector $\mathbf{x}$ will be sparse, which means that most of its entries are zero. This is the key observation on which all existing non-adaptive pooling methods rely.

It turns out that this problem has been extensively studied within the area of compressed sensing, a collection of techniques in signal processing that allow one to reconstruct a sparse signal from a small number of observations. Here is an outline of some important ideas.

We will need a couple of different measures of the size of a vector.

First, $\|\mathbf{z}\|_0$ is the number of nonzero entries in the vector $\mathbf{z}$. Because the prevalence of SARS-CoV-2 positive samples is expected to be small, we are looking for a solution to the equation $\mathbf{y}=A\mathbf{x}$ where $\|\mathbf{x}\|_0$ is small.

The 1-norm is $\|\mathbf{z}\|_1 = \sum_j |z_j|$, and the 2-norm is the usual Euclidean length: $\|\mathbf{z}\|_2 = \sqrt{\sum_j z_j^2}$.
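For concreteness, the three measures of size can be computed for a small example vector in Python. Despite the notation, $\|\mathbf{z}\|_0$ is not a true norm, since it does not scale when $\mathbf{z}$ is multiplied by a constant.

```python
import math


def norm0(z):
    """Number of nonzero entries."""
    return sum(1 for zj in z if zj != 0)


def norm1(z):
    """Sum of absolute values."""
    return sum(abs(zj) for zj in z)


def norm2(z):
    """Euclidean length."""
    return math.sqrt(sum(zj * zj for zj in z))


z = [0.0, 3.0, 0.0, -4.0, 0.0]
print(norm0(z), norm1(z), norm2(z))  # 2 7.0 5.0
```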

Remember that an isometry is a linear transformation that preserves the length of vectors. With the usual Euclidean length of a vector $\mathbf{z}$ written as $\|\mathbf{z}\|_2$, the matrix $M$ defines an isometry if $\|M\mathbf{z}\|_2 = \|\mathbf{z}\|_2$ for all vectors $\mathbf{z}$. The columns of such a matrix form an orthonormal set.

We will construct our pooling matrix $A$ so that it satisfies a restricted isometry property (RIP), which essentially means that small subsets of the columns of $A$ are almost orthonormal. More specifically, if $R$ is a subset of $\{1,2,\ldots, N\}$, we denote by $A^R$ the matrix formed by pulling out the columns of $A$ labelled by $R$; for instance, $A^{\{2,5\}}$ is the matrix formed from the second and fifth columns of $A$. For a positive integer $S$, we define a constant $\delta_S$ such that $(1-\delta_S)\|\mathbf{x}\|_2 \leq \|A^R\mathbf{x}\|_2 \leq (1+\delta_S)\|\mathbf{x}\|_2$ for any set $R$ whose cardinality is no more than $S$. If $\delta_S = 0$, then the matrices $A^R$ are isometries, which would imply that the columns of $A^R$ are orthonormal. More generally, the idea is that when $\delta_S$ is small, the columns of $A^R$ are close to being orthonormal.

Let's see how we can use these constants $\delta_S$.

Because we are assuming that the prevalence $p$ is low, we know that $\mathbf{x}$, the vector of viral loads, is sparse. We will show that a sufficiently sparse solution to $\mathbf{y} = A\mathbf{x}$ is unique.

For instance, suppose that $\delta_{2S} < 1$, that $\mathbf{x}_1$ and $\mathbf{x}_2$ are two solutions to the equation $\mathbf{y} = A\mathbf{x}$, and that $\|\mathbf{x}_1\|_0, \|\mathbf{x}_2\|_0 \leq S$. The last condition means that $\mathbf{x}_1$ and $\mathbf{x}_2$ are sparse in the sense that each has at most $S$ nonzero components.

Now it follows that $A\mathbf{x}_1 = A\mathbf{x}_2 = \mathbf{y}$, so that $A(\mathbf{x}_1-\mathbf{x}_2) = \mathbf{0}$. In fact, if $R$ consists of the indices for which the components of $\mathbf{x}_1-\mathbf{x}_2$ are nonzero, then $A^R(\mathbf{x}_1-\mathbf{x}_2) = \mathbf{0}$.

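The nonzero entries of $\mathbf{x}_1-\mathbf{x}_2$ lie in the union of the supports of $\mathbf{x}_1$ and $\mathbf{x}_2$. Therefore the cardinality of $R$ equals $\|\mathbf{x}_1-\mathbf{x}_2\|_0 \leq 2S$, which tells us that $0 = \|A^R(\mathbf{x}_1-\mathbf{x}_2)\|_2 \geq (1-\delta_{2S})\|\mathbf{x}_1-\mathbf{x}_2\|_2$. Because $\delta_{2S} < 1$, we know that $\mathbf{x}_1 - \mathbf{x}_2 = \mathbf{0}$, or $\mathbf{x}_1 = \mathbf{x}_2$.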

Therefore, if $\delta_{2S} < 1$, any solution to $\mathbf{y}=A\mathbf{x}$ with $\|\mathbf{x}\|_0 \leq S$ is unique; that is, any sufficiently sparse solution is unique.

Now that we have seen a condition that implies that sparse solutions are unique, we need to explain how we can find sparse solutions. Candès and Tao show, assuming $\delta_S + \delta_{2S} + \delta_{3S} < 1$, how we can find a sparse solution to $\mathbf{y} = A\mathbf{x}$ with $\|\mathbf{x}\|_0 \leq S$ by solving

$\min \|\mathbf{x}\|_1$ subject to $\mathbf{y} = A\mathbf{x}$.

This is a convex optimization problem, and there are standard techniques for finding the minimum.

Why is it reasonable to think that minimizing $\|\mathbf{x}\|_1$ will lead to a sparse solution? Let's think visually about the case where $\mathbf{x}$ is a 2-dimensional vector. The set of all $\mathbf{x}$ satisfying $\|\mathbf{x}\|_1 = |x_1| + |x_2| \leq C$ for some constant $C$ is the shaded set below:

Notice that the corners of this set fall on the coordinate axes, where some of the components are zero. If we now consider solutions to $\mathbf{y}=A\mathbf{x}$, seen as the line below, we see that the solutions where $\|\mathbf{x}\|_1$ is minimal fall on the coordinate axes. This forces some of the components of $\mathbf{x}$ to be zero and results in a sparse solution.
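The two-dimensional picture can be checked numerically. With a single test on two samples, $\mathbf{y} = A\mathbf{x}$ becomes one line $a_1 x_1 + a_2 x_2 = y$. Along that line, $|x_1| + |x_2|$ is convex and piecewise linear, with kinks only where $x_1 = 0$ or $x_2 = 0$, so its minimum sits on a coordinate axis. The sketch below uses hypothetical coefficients and assumes $a_1, a_2 \neq 0$. It compares the two axis intercepts and confirms the answer with a brute-force scan, rather than calling a general convex solver.

```python
def l1_min_on_line(a1, a2, y):
    """Minimize |x1| + |x2| subject to a1*x1 + a2*x2 = y, assuming a1 != 0 and a2 != 0.

    Along the line the objective is convex and piecewise linear, with kinks only where
    x1 = 0 or x2 = 0, and it grows without bound in both directions, so the minimum is
    attained at one of the two axis intercepts.
    """
    candidates = [(y / a1, 0.0), (0.0, y / a2)]
    return min(candidates, key=lambda x: abs(x[0]) + abs(x[1]))


def brute_force_min(a1, a2, y, lo=-10.0, hi=10.0, steps=20001):
    """Scan x1 over a grid, solve for x2 on the line, and return the smallest |x1| + |x2| seen."""
    best = float("inf")
    for s in range(steps):
        x1 = lo + (hi - lo) * s / (steps - 1)
        x2 = (y - a1 * x1) / a2
        best = min(best, abs(x1) + abs(x2))
    return best


print(l1_min_on_line(1.0, 2.0, 4.0))   # one test covering two samples (hypothetical numbers)
print(brute_force_min(1.0, 2.0, 4.0))
```

For $a_1 = 1$, $a_2 = 2$, $y = 4$, the minimizer is $(0, 2)$, a sparse vector, and the grid search finds the same minimum value of 2.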

This technique is related to one called the lasso (least absolute shrinkage and selection operator), which is well known in data science where it is used to eliminate unnecessary features from a data set.

All that remains is for us to find a pooling matrix $A$ that satisfies $\delta_S + \delta_{2S} + \delta_{3S} < 1$ for some $S$ large enough to find vectors $\mathbf{x}$ whose sparsity $\|\mathbf{x}\|_0$ is consistent with the expected prevalence. There are several ways to do this. Indeed, a matrix chosen at random will work with high probability, but the application to pooling SARS-CoV-2 samples that we have in mind leads us to ask that $A$ satisfy some additional properties.

The Tapestry method uses a pooling matrix $A$ formed from a Steiner triple system, an object studied in combinatorial design theory. For instance, one of Tapestry's pooling matrices is shown below, where red represents a 1 and white a 0.

This is a $16\times 40$ matrix, which means that we perform 16 tests on 40 individuals. Notice that each individual's sample appears in 3 tests. This is a relatively low number, which means that the viral load in a sample is not divided too much and that, in the laboratory, time spent pipetting the samples is minimized. Each test consists of samples from about eight patients, well below the limit of 32 for reliable RT-qPCR readings.

It is also important to note that two samples appear together in at most one test. Therefore, if $A^j$ is the $j^{\text{th}}$ column of $A$, it follows that the dot product $A^j\cdot A^k = 0$ or $1$ for $j \neq k$. Since each column contains three 1s, $\|A^j\|_2 = \sqrt{3}$, so two columns are either orthogonal or span an angle of $\arccos(1/3) \approx 70^\circ$. If we scale $A$ by $1/\sqrt{3}$, we obtain a matrix whose columns are almost orthonormal. From this we can derive the required condition, $\delta_S + \delta_{2S} + \delta_{3S} < 1$, for some sufficiently large value of $S$.
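These properties can be checked on a small Steiner triple system. For illustration, take the 12 lines of the affine plane over $\mathbb{Z}_3$ as samples and its 9 points as tests. This choice is an assumption: it is not one of Tapestry's matrices, but it has the same structure of three tests per sample and at most one shared test per pair. The sketch also computes $\delta_2$ for the scaled matrix. For two unit columns with inner product $c \geq 0$, the singular values of $A^R$ are $\sqrt{1-c}$ and $\sqrt{1+c}$.

```python
import itertools
import math


def affine_plane_pooling_matrix():
    """Illustrative 9 x 12 pooling matrix built from a Steiner triple system.

    Tests (rows) are the 9 points of the affine plane over Z_3; samples (columns) are its
    12 lines. Every line has 3 points and two lines share at most one point.
    """
    lines = [{3 * x + (m * x + c) % 3 for x in range(3)} for m in range(3) for c in range(3)]
    lines += [{3 * c + y for y in range(3)} for c in range(3)]  # the vertical lines x = c
    return [[1 if point in line else 0 for line in lines] for point in range(9)]


A = affine_plane_pooling_matrix()
columns = list(zip(*A))
weights = {sum(col) for col in columns}  # tests per sample
dots = {sum(a * b for a, b in zip(u, v)) for u, v in itertools.combinations(columns, 2)}
angle = math.degrees(math.acos(1 / 3))

# After scaling by 1/sqrt(3), the columns are unit vectors with pairwise inner products c in {0, 1/3}.
# With the (non-squared) definition of delta_S used in the text, delta_2 is the worst case over pairs.
c = max(dots) / 3
delta_2 = max(1 - math.sqrt(1 - c), math.sqrt(1 + c) - 1)

print(weights, dots, round(angle, 2), round(delta_2, 4))
```

Every sample lies in 3 tests, and pairs of columns have dot products 0 or 1. The angle is about 70.5°, and $\delta_2 = 1 - \sqrt{2/3} \approx 0.18$ for this small example.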

There is an additional simplification we can apply. If a sample $x_j$ is included in at least one test that produces a negative result $y_i=0$, then we can conclude that $x_j = 0$. This means that we can remove the component $x_j$ from the vector $\mathbf{x}$ and the column $A^j$ from the matrix $A$. Removing all these sure negatives often leads to a dramatic simplification of the convex optimization problem.
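The pruning step can be sketched in Python as follows. This assumes noise-free readings in which a negative test reads exactly 0; real RT-qPCR data would need a detection threshold, represented here by the hypothetical parameter `tol`. The 9 × 12 design (the lines of the affine plane AG(2,3)) and the two positive samples are illustrative choices, not one of Tapestry's matrices.

```python
# Hypothetical 9-test x 12-sample design: entry j lists the tests holding sample j.
TRIPLES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),
    (0, 3, 6), (1, 4, 7), (2, 5, 8),
    (0, 4, 8), (1, 5, 6), (2, 3, 7),
    (0, 5, 7), (1, 3, 8), (2, 4, 6),
]
N_TESTS = 9
A = [[1 if i in t else 0 for t in TRIPLES] for i in range(N_TESTS)]


def remove_sure_negatives(A, y, tol=0.0):
    """Drop every sample that appears in at least one negative test.

    A is a list of 0/1 rows (tests x samples) and y the test readings; a test
    counts as negative when its reading is <= tol. Returns the indices of the
    remaining candidate samples and the reduced system restricted to the
    positive tests and those candidates.
    """
    m, n = len(A), len(A[0])
    negative_tests = [i for i in range(m) if y[i] <= tol]
    sure_negative = {j for i in negative_tests for j in range(n) if A[i][j]}
    candidates = [j for j in range(n) if j not in sure_negative]
    # Negative tests become all-zero rows once sure negatives are removed,
    # so only the positive tests need to be kept.
    positive_tests = [i for i in range(m) if y[i] > tol]
    A_reduced = [[A[i][j] for j in candidates] for i in positive_tests]
    y_reduced = [y[i] for i in positive_tests]
    return candidates, A_reduced, y_reduced


# Two hypothetical positives: sample 0 (load 2.0) and sample 4 (load 5.0).
loads = {0: 2.0, 4: 5.0}
y = [sum(A[i][j] * v for j, v in loads.items()) for i in range(N_TESTS)]

candidates, A_reduced, y_reduced = remove_sure_negatives(A, y)
print("remaining samples:", candidates)
print("positive-test readings:", y_reduced)
```

Here four negative tests eliminate ten of the twelve samples, so the optimization only has to resolve two unknowns from five readings.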

Tapestry has created a variety of pooling matrices that can be deployed across a range of prevalences. For instance, a $45\times 105$ pooling matrix, which means we perform 45 tests on 105 individuals, is appropriate when the prevalence is roughly 10%, a relatively high prevalence.

However, there is also a $93\times 961$ pooling matrix that is appropriate for use when the prevalence is around 1%. Here we perform 93 tests on 961 patients in pools of size 32, which means we can test about 10 times the number of patients with a given number of tests. This is a dramatic improvement over performing single tests on individual samples.

If the prevalence turns out to be too high for the pooling matrix used, the Tapestry algorithm detects it and fails gracefully.

Along with non-adaptive methods comes an increase in the complexity of their implementation. This is especially concerning since reliability and speed are crucial. For this reason, the Tapestry team built an Android app that guides a laboratory technician through the pipetting process, receives the test results $y_i$, and solves for the resulting viral loads $x_j$, returning a list of positive samples.

Using both simulated and actual lab data, the authors of Tapestry studied the sensitivity and specificity of their algorithm and found that it performs well. They also compared the number of tests Tapestry performs with Dorfman's adaptive method and found that Tapestry requires many fewer tests, often several times fewer, in addition to finishing in a single round.
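For a rough sense of scale, Dorfman's two-stage method has a closed form for the expected number of tests per individual: with pools of size $n$ and prevalence $p$, each person shares one pooled test ($1/n$) and is retested whenever the pool is positive, which happens with probability $1-(1-p)^n$. The sketch below assumes a perfect assay and independent infections, finds the best Dorfman pool size, and sets it beside the tests-per-individual ratios implied by the $45\times 105$ and $93\times 961$ Tapestry designs. This is a back-of-the-envelope comparison, not the benchmark reported by the Tapestry authors; it ignores decoding errors and Dorfman's second round of turnaround time.

```python
def dorfman_tests_per_person(n, p):
    """Expected tests per individual for Dorfman's two-stage scheme.

    One pooled test is shared by n people (1/n each); if the pool is positive,
    which happens with probability 1 - (1 - p)**n, all n are retested.
    Assumes independent infections with prevalence p and a perfect assay.
    """
    return 1 / n + 1 - (1 - p) ** n


def best_dorfman(p, max_n=100):
    """Pool size in 2..max_n minimizing expected tests per individual."""
    n = min(range(2, max_n + 1), key=lambda k: dorfman_tests_per_person(k, p))
    return n, dorfman_tests_per_person(n, p)


# Tests per individual for the two Tapestry designs quoted in the text.
tapestry = {0.10: 45 / 105, 0.01: 93 / 961}

for p, tap in tapestry.items():
    n, d = best_dorfman(p)
    print(f"p={p:.0%}: Dorfman n={n}, {d:.3f} tests/person; "
          f"Tapestry {tap:.3f} tests/person")
```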


As we've seen here, non-adaptive pooling provides a significant opportunity to improve our testing capacity by increasing the number of samples we can test, decreasing the amount of time it takes to obtain results, and decreasing the costs of testing. These improvements can play an important role in a wider effort to test, trace, and isolate infected patients and hence control the spread of the coronavirus.

In addition, the FDA recently gave emergency use authorization for the use of these ideas. Not only is there a practical framework for deploying the Tapestry method, made possible by their Android app, it's now legal to do so.

Interestingly, the mathematics used here already existed before the COVID-19 pandemic. Dating back to Dorfman's original work of 1943, group pooling strategies have continued to evolve over the years. Indeed, the team of Shental et al. introduced P-BEST, their SARS-CoV-2 pooling strategy, as an extension of their earlier work to detect rare alleles associated with some diseases.


Mike Breen, recently of the AMS Public Awareness Office, oversaw the publication of the monthly Feature Column for many years. Mike retired in August 2020, and I'd like to thank him for his leadership, good judgment, and never-failing humor.


David Donoho. A Mathematical Data Scientist's perspective on Covid-19 Testing Scale-up. SIAM Mathematics of Data Science Distinguished Lecture Series. June 29, 2020.

Donoho's talk is a wonderful introduction to and overview of the problem, several approaches to it, and the workings of the scientific community.

This survey article provides a good overview of Dorfman's method and similar techniques.

Bilder was prolific in the area of group testing before (and after) the COVID-19 pandemic, and this page has many good resources, including this page of Shiny apps.

Milton Sobel and Phyllis A. Groll. Group testing to eliminate efficiently all defectives in a binomial sample. Bell System Technical Journal, Vol. 38, Issue 5, Sep 1959. Pages 1179-1252.

Peter Ungar. The cutoff point for group testing. Communications on Pure and Applied Mathematics, Vol. 13, Issue 1, Feb 1960. Pages 49-54.

This paper and the next outline the Tapestry method.

This article and the next describe the P-BEST technique.

Noam Shental, Amnon Amir, and Or Zuk. Identification of rare alleles and their carriers using compressed se(que)nsing. Nucleic Acids Research, 2010, Vol. 38, No. 19.

David L. Donoho. Compressed Sensing. IEEE Transactions on Information Theory. Vol. 52, No. 4, April 2006.

Emmanuel J. Candès. Compressive sampling. Proceedings of the International Congress of Mathematicians. Vol. 2, 2006. Pages 1433-1452.

Emmanuel Candès, Justin Romberg, and Terence Tao. Stable Signal Recovery from Incomplete and Inaccurate Measurements. Communications on Pure and Applied Mathematics. Vol. 59, Issue 8, August 2006. Pages 1207-1223.

Emmanuel Candès and Terence Tao. Decoding by Linear Programming. IEEE Transactions on Information Theory. Vol. 51, Issue 12, Dec. 2005. Pages 4203-4215.

Chun Lam Chan, Pak Hou Che and Sidharth Jaggi. Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms. 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton). Monticello, IL, 2011. Pages 1832-1839.

David Austin
Grand Valley State University



COVID-19 can present as symptomatic or asymptomatic infection. Earlier it was thought that viral loads in symptomatic patients are higher than in asymptomatic cases. However, a recent study documented that viral loads are similar in symptomatic and asymptomatic patients; thus, pool testing will produce similar results in all patients with COVID-19. 5 The present study is concordant with the findings of Abdalhamid et al, 6 who reported that pool testing is effective in saving resources in populations with a prevalence below 10%. Laboratories have begun to demonstrate that SARS-CoV-2 can be detected by RT-qPCR performed on pooled samples, despite potential dilution. One limitation of pooling noted by the authors is that reporting of positive samples is delayed by a couple of hours, the time taken to deconvolute and retest the specimens. In their assessment of pooled testing to conserve resources, they further concluded that when the incidence rate of SARS-CoV-2 infection is 10% or less, group testing will save reagents and personnel time and increase overall testing capability by at least 69%.

A recent study published in Lancet showed that over a range of pool sizes, from 4 to 30 samples per pool, Ct values of positive pools were between 22 and 29 for the envelope protein gene (E-gene) assay and between 21 and 29 for the spike protein gene (S-gene) assay. Ct values were lower in retested positive individual samples. The Ct values for both E-gene and S-gene assays in pools and individual positive samples were below 30 and easily categorized as positive. Ct value differences between pooled tests and individual positive samples (Ct pool − Ct positive sample) were up to five cycles. 7 In another study on specimen pooling, pooling did not affect the sensitivity of detecting SARS-CoV-2 when the PCR cycle threshold (Ct) of the original specimen was lower than 35. However, 13.3% of specimens with low viral load (Ct > 35) gave false-negative results. 8 This can be explained by the observation that the Ct value increased by 1.24 for each two-fold dilution. 9 This means that samples with a Ct value greater than 38, when diluted 1:5 or more, will show a Ct value above 40 and be reported as negative on RT-PCR. Thus, for pooled samples, amplification plots should be checked for a sigmoid curve even beyond Ct 40, and if any such curve appears, the RT-PCR should be repeated on the deconvoluted samples. Further unpublished data from our center suggest that less than 37, and by following the above-mentioned precautions we picked up the maximum possible number of positive cases. Similar findings have been reported by another recent study. 10
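The dilution argument above is simple arithmetic, sketched here in Python. It assumes that pooling $n$ equal-volume samples dilutes each one $1{:}n$, and that the empirical shift of 1.24 cycles per two-fold dilution quoted above scales with $\log_2$ of the dilution factor. Both the function names and the cutoff of 40 cycles are illustrative.

```python
import math

CT_PER_TWOFOLD = 1.24  # Ct increase per two-fold dilution, as quoted in the text
CUTOFF = 40.0          # runs with Ct above this are reported negative


def pooled_ct(ct_individual, pool_size, ct_per_twofold=CT_PER_TWOFOLD):
    """Estimated Ct of one positive sample diluted 1:pool_size in a pool."""
    return ct_individual + ct_per_twofold * math.log2(pool_size)


def max_detectable_ct(pool_size, ct_per_twofold=CT_PER_TWOFOLD, cutoff=CUTOFF):
    """Largest individual Ct that should still read at or below the cutoff once pooled."""
    return cutoff - ct_per_twofold * math.log2(pool_size)


for size in (5, 10):
    print(f"pool of {size}: Ct 38 -> ~{pooled_ct(38, size):.1f}; "
          f"individual Ct must be below ~{max_detectable_ct(size):.1f}")
```

For a pool of 5 the shift is about 2.9 cycles, so a sample at Ct 38 lands near 40.9, past the cutoff. Under the same assumptions, only individual samples below roughly Ct 37.1 stay detectable.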

To understand the advantages of a pooling approach, consider a laboratory receiving N = 100 samples at a prevalence of 5%, that is, 5 of the 100 samples are positive. If the 100 samples are divided into 10 pools of 10, then in the best-case scenario all five positives fall into the same pool, so one pool is positive and nine are negative; the 10 pooled tests plus 10 individual retests give a total of 20 PCR reactions for 100 samples. In the worst-case scenario, each positive falls into a different pool, so five pools are positive and the total is 10 + 50 = 60 reactions. Taking the midpoint of these two extremes, 40, we propose that pooled testing of 100 samples requires 40 test reactions, a saving of 60% of reagents. Each negative pool result, obtained by a single RT-qPCR reaction, shows that all 10 samples in that pool are negative without the need for individual testing.
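The reaction counts in this example can be reproduced with a short Python sketch of two-stage pooling: test every pool once, then retest every member of each positive pool, assuming a perfect assay. As an addition to the example, it also computes the expected count under the assumption that the positive samples are a uniformly random subset. That expectation comes out higher than the midpoint used above, because it is unlikely that all positives cluster in a single pool.

```python
from math import ceil, comb


def pooled_reactions(n_samples, pool_size, n_positive):
    """Best, worst and expected reaction counts for two-stage pooling.

    Each pool is tested once; every member of a positive pool is retested
    individually. Assumes a perfect assay and pool_size dividing n_samples.
    """
    n_pools = n_samples // pool_size
    # Best case: positives packed into as few pools as possible.
    best = n_pools + ceil(n_positive / pool_size) * pool_size
    # Worst case: every positive sits in a different pool.
    worst = n_pools + min(n_positive, n_pools) * pool_size
    # Expected case, if the positives are a uniformly random subset: a given
    # pool contains no positive with probability
    # comb(n_samples - pool_size, n_positive) / comb(n_samples, n_positive).
    p_clean = comb(n_samples - pool_size, n_positive) / comb(n_samples, n_positive)
    expected = n_pools + n_pools * (1 - p_clean) * pool_size
    return best, worst, expected


best, worst, expected = pooled_reactions(100, 10, 5)
print(f"best {best}, worst {worst}, midpoint {(best + worst) / 2:.0f}, "
      f"expected {expected:.1f}")
```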

Reaction- and sample-specific inhibition affect standardization of qPCR assays of soil bacterial communities

Quantitative PCR (qPCR) is a popular technique used to quantify target sequences from DNA isolated from soil, but PCR inhibition makes it difficult to estimate gene copy number. Here, we evaluated the extent to which inhibition associated with reaction conditions and sample-specific properties influence the linear range of amplification, and the efficiency and sensitivity of qPCR assays of three bacterial gene targets. We adopted a sample pool approach and exploited the mathematical basis of qPCR to correct for sample-specific effects on amplification. Results revealed that qPCR efficiency and sensitivity were dependent on all conditions tested. In addition, the effect of annealing temperature and SYBR green PCR kit was target-specific, suggesting that the sample pool approach is appropriate for evaluating the quality of new primers. Likewise, the efficiency and sensitivity of qPCR amplification were sample-specific, likely as a result of site- and date-specific co-extractants. When compared with calculations based on plasmid curves alone, reaction-specific and sample-specific inhibition influenced calculations of gene copy number. To account for these differences, we present a brief protocol for soil samples that will facilitate comparison of future datasets.
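In its textbook form, the "mathematical basis of qPCR" behind plasmid-based quantification is a log-linear standard curve. Ct is fit against $\log_{10}$ of the starting copy number, efficiency is $E = 10^{-1/\text{slope}} - 1$ (1.0 meaning perfect doubling), and unknowns are read off the inverted curve. The sketch below shows that generic method on idealized, hypothetical dilution data. It is not the authors' correction protocol, which is not reproduced here.

```python
def fit_standard_curve(log10_copies, cts):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(cts)
    mx = sum(log10_copies) / n
    my = sum(cts) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, cts))
    slope = sxy / sxx
    return slope, my - slope * mx


def efficiency(slope):
    """Amplification efficiency from the slope (1.0 = doubling every cycle)."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical plasmid dilution series (10^1 .. 10^6 copies), idealized data.
log_copies = [1, 2, 3, 4, 5, 6]
cts = [38.0 - 3.32 * x for x in log_copies]

slope, intercept = fit_standard_curve(log_copies, cts)
copies = copies_from_ct(28.04, slope, intercept)
print(f"slope={slope:.2f}, efficiency={efficiency(slope):.1%}")
print(f"Ct 28.04 -> ~{copies:.0f} copies")
```

A sample whose own amplification efficiency differs from the plasmid standard's violates the assumption behind this single curve, which is the kind of discrepancy the abstract describes.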


► We compare reaction- and sample-specific inhibition of qPCR amplification of soil DNA. ► Sample-specific properties affect quantification of soil bacterial communities. ► Reaction-specific amplification affects quantification of soil bacterial communities. ► Data on qPCR amplification of samples can be used to standardize these differences.


Although it is easy to produce data from qPCR reactions, only through the application of a rigorous, stepwise approach will the data and interpretations be reflective of the tested experimental conditions.


PCR output applications

PCR has become an indispensable tool in modern molecular biology and has completely transformed scientific research. The technique has also opened up the investigation of cellular and molecular processes to those outside the field of molecular biology and consequently is also used by scientists in many disciplines.

Whilst PCR is itself a powerful standalone technique, it has also been incorporated into wider techniques, such as cloning and sequencing, as one small but important part of these workflows.

Research applications of PCR include:

Gene transcription - PCR can examine variations in gene transcription among cell types, tissues and organisms at a specific time point. In this process, RNA is isolated from samples of interest, and reverse-transcribed into cDNA. The original levels of RNA for a specific gene can then be quantified from the amount of cDNA amplified in PCR.
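One widely used way to turn Ct values into relative transcript levels is the comparative Ct ($2^{-\Delta\Delta C_t}$) method of Livak and Schmittgen. It normalizes the target gene to a reference gene, then to a control condition. The sketch below uses made-up Ct values and assumes near-100% amplification efficiency for both assays, which is the method's key assumption.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-delta-delta-Ct) method.

    delta-Ct = Ct(target) - Ct(reference gene), computed for sample and control;
    delta-delta-Ct = delta-Ct(sample) - delta-Ct(control).
    Assumes both assays double the product every cycle (~100% efficiency).
    """
    ddct = (ct_target_sample - ct_ref_sample) - (ct_target_control - ct_ref_control)
    return 2 ** (-ddct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier in the
# sample while the reference gene is unchanged, i.e. ~4-fold higher expression.
print(fold_change_ddct(23.0, 20.0, 25.0, 20.0))
```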

Genotyping - PCR can detect sequence variations in alleles of specific cells or organisms. A common example is the genotyping of transgenic organisms, such as knock-out and knock-in mice. In this application, primers are designed to amplify either a transgene portion (in a transgenic animal) or the mutation (in a mutant animal).

Cloning and mutagenesis - PCR cloning is a widely used technique in which double-stranded DNA fragments amplified by PCR (e.g., from gDNA, cDNA or plasmid DNA templates) are inserted into vectors. This, for example, enables the creation of bacterial strains in which genetic material has been deleted or inserted. Site-directed mutagenesis can also be used to introduce point mutations via cloning. This often employs a technique known as recombinant PCR, in which overlapping primers are specifically designed to incorporate base substitutions (Figure 4). This technique can also be used to create novel gene fusions.
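The core of designing a pair of overlapping mutagenic primers can be sketched in Python. This assumes one simple convention, a fully complementary internal primer pair centered on the substitution, and a hypothetical 30-nt template. Real primer design also checks melting temperature, GC content and secondary structure, none of which is modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primer_pair(template, position, new_base, flank=12):
    """Complementary primer pair carrying a single-base substitution.

    The forward primer spans up to `flank` bases either side of `position`
    (0-based) with `new_base` swapped in; the reverse primer is its reverse
    complement, so the two primers overlap completely.
    """
    if template[position] == new_base:
        raise ValueError("new_base matches the template; nothing to mutate")
    start = max(0, position - flank)
    end = min(len(template), position + flank + 1)
    mutated = template[:position] + new_base + template[position + 1:]
    forward = mutated[start:end]
    return forward, reverse_complement(forward)


TEMPLATE = "ATGGCTAGCAAGGAGGAACTGTTCACCGGG"  # hypothetical sequence
fwd, rev = mutagenic_primer_pair(TEMPLATE, 15, "C")  # G -> C at position 15
print("forward:", fwd)
print("reverse:", rev)
```

In an overlap-extension workflow, each of these primers would be paired with an outer flanking primer in separate reactions, and the two overlapping products joined in a second PCR.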

Genetic research - PCR is used in most laboratories worldwide. One of the most common applications is gene transcription analysis 9, aimed at evaluating the presence or abundance of particular gene transcripts. It is also a powerful technique for manipulating the genetic sequence of organisms (animal, plant and microbe) through cloning. This enables genes or sections of genes to be inserted, deleted or mutated to engineer in genetic markers, alter phenotypes, elucidate gene functions and develop vaccines, to name but a few. In genotyping, PCR can be used to detect sequence variations in alleles in specific cells or organisms. Its use isn't restricted to humans either: genotyping plants in agriculture assists plant breeders in selecting, refining, and improving their breeding stock. PCR is also the first step to enrich sequencing samples, as discussed above. For example, most mapping techniques in the Human Genome Project (HGP) relied on PCR.

Medicine and biomedical research - PCR is used in a host of medical applications, from diagnostic testing for disease-associated genetic mutations, to the identification of infectious agents. Another great example of PCR use in the medical realm is prenatal genetic testing. Prenatal genetic testing through PCR can identify chromosome abnormalities and genetic mutations in the fetus, giving parents-to-be important information about whether their baby has certain genetic disorders. PCR can also be used as a preimplantation genetic diagnosis tool to screen embryos for in vitro fertilization (IVF) procedures.

Forensic science - Our unique genetic fingerprints mean that PCR can be instrumental in both paternity testing and forensic investigations to pinpoint samples' sources. Small DNA samples isolated from a crime scene can be compared with a DNA database or with suspects' DNA, for example. These procedures have really changed the way police investigations are carried out. Authenticity testing also makes use of PCR genetic markers, for example, to determine the species from which meat is derived. Molecular archaeology too utilizes PCR to amplify DNA from archaeological remains.

Environmental microbiology and food safety - Detection of pathogens by PCR, not only in patients' samples but also in matrices like food or water, can be vital in diagnosing and preventing infectious disease.

PCR is the benchmark technology for detecting nucleic acids in every area, from biomedical research to forensic applications. Kary Mullis's idea, written on the back of a receipt on the side of the road, turned out to be a revolutionary one.


1. Chien A, Edgar DB, Trela JM. Deoxyribonucleic acid polymerase from the extreme thermophile Thermus aquaticus. J Bacteriol 1976;127(3):1550-57. doi: 10.1128/JB.127.3.1550-1557.1976

2. Saiki RK, Scharf S, Faloona F, et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 1985;230(4732):1350. doi: 10.1126/science.2999980

3. Arya M, Shergill IS, Williamson M, Gommersall L, Arya N, Patel HRH. Basic principles of real-time quantitative PCR. Expert Review of Molecular Diagnostics 2005;5(2):209-19. doi: 10.1586/14737159.5.2.209

4. Bachman J. Chapter Two - Reverse-Transcription PCR (RT-PCR). In: Lorsch J, ed. Methods in Enzymology. Academic Press, 2013:67-74. doi: 10.1016/B978-0-12-420037-1.00002-6

5. Morley AA. Digital PCR: A brief history. Biomol Detect Quantif 2014;1(1):1-2. doi: 10.1016/j.bdq.2014.06.001

6. Taylor SC, Laperriere G, Germain H. Droplet Digital PCR versus qPCR for gene expression analysis with low abundant targets: from variable nonsense to publication quality data. Scientific Reports 2017;7(1):2409. doi: 10.1038/s41598-017-02217-x

7. Ahrberg CD, Manz A, Chung BG. Polymerase chain reaction in microfluidic devices. Lab on a Chip 2016;16(20):3866-84. doi: 10.1039/C6LC00984K

8. Garibyan L, Avashia N. Polymerase chain reaction. J Invest Dermatol 2013;133(3):1-4. doi: 10.1038/jid.2013.1

9. VanGuilder HD, Vrana KE, Freeman WM. Twenty-five years of quantitative PCR for gene expression analysis. BioTechniques 2008;44(5):619-26. doi: 10.2144/000112776