16S sequencing: pooled or individual samples?

I am planning to run 16S sequencing on a set of individuals to explore microbial diversity. The question I am facing is: should I sequence them individually, or pool them (for example, 5 individuals = 1 sample)? Here are the advantages and disadvantages of both methods as I see them:

1) Individual. I get the diversity of each individual, so I know the proportion of individuals that carry a particular species of bacteria. The costs are higher since I need more samples than with pooling, but the reads/sample needed should be lower (since this number increases with microbial diversity).

2) Pooled. I only get the overall diversity in the pool (I can't assess how many individuals carry a particular bacterium). The costs are lower because there are fewer samples, but reads/sample should be higher since more than one individual is pooled (?).

Can someone help me? Are my assumptions right, or am I missing something?

Ultimately, it depends on what insights you are trying to glean from your sequencing data, but from a purely scientific perspective, there's almost never a reason to pool samples in the way you described. Your pro-con list is not far off in terms of information lost or gained, but I think it overlooks some of the downstream consequences that losing that information has on your analysis. For one thing, pooling limits your ability to make biologically or statistically meaningful comparisons. For example, if your individual samples are pooled by country of origin prior to sequencing, you have no way to estimate variation within those pooled samples. Because of that, not much can be said about any observed difference between countries beyond simple descriptions of richness and taxonomy. If you treat the samples as individual communities, you can look for differing trends, make meaningful beta-diversity comparisons, and actually analyze them with statistics. If you have environmental data, you might even be able to do some meaningful ecological analyses.
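
To make that concrete, here is a minimal sketch of what individual sequencing lets you compute and pooling does not. The read counts are made up for illustration, and the pooled value assumes every individual contributes equally to the pool, which is rarely true in practice.

```python
from statistics import mean, stdev

# Hypothetical read counts for one bacterial taxon in five individuals,
# each sequenced to 10,000 reads (all values are made up).
taxon_reads = [0, 0, 1200, 0, 300]
depth = 10_000

rel_abund = [r / depth for r in taxon_reads]

# Individual sequencing: prevalence and spread are both observable.
prevalence = sum(r > 0 for r in taxon_reads) / len(taxon_reads)
print(f"prevalence: {prevalence:.0%}")
print(f"mean relative abundance: {mean(rel_abund):.3f}")
print(f"standard deviation: {stdev(rel_abund):.3f}")

# Pooled sequencing (assuming equal contribution): a single number.
pooled_abund = sum(taxon_reads) / (depth * len(taxon_reads))
print(f"pooled relative abundance: {pooled_abund:.3f}")
```

The pooled value matches the mean of the individual values, but the prevalence and the standard deviation, which any statistical comparison between groups needs, cannot be recovered from it.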

Second, as you mentioned, a pooled sample should theoretically have more diversity than the individual samples it was pooled from (at least it should have no fewer unique sequence variants than the richest of the pooled communities), but pooling could also mask community diversity. If one of the pooled samples has a particularly high-abundance community that is dominated by just one or two organisms, it could create the appearance that some organisms at moderate abundance in other, more diverse communities are actually very rare or non-existent. Even at a much greater depth of sequencing, diversity metrics that account for abundance lose value, since any outlier sample could skew the pooled results and create a false impression of the overall community profile.
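
A small sketch of that masking effect, using invented cell counts and the Shannon index as the abundance-aware metric. It assumes the pool is made without normalizing for biomass, so each individual contributes in proportion to how many bacteria it carries.

```python
import math

def shannon(counts):
    """Shannon diversity (natural log) of a vector of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical bacterial counts for taxa A-D in three individuals.
# Individual 3 carries far more bacteria and is dominated by taxon A.
individuals = [
    [250, 250, 250, 250],        # even community
    [400, 300, 200, 100],        # moderately even community
    [98_000, 1_000, 500, 500],   # high-biomass, dominated by taxon A
]

# Pooling without normalization: contributions scale with biomass.
pooled = [sum(col) for col in zip(*individuals)]

for i, counts in enumerate(individuals, start=1):
    print(f"individual {i}: H = {shannon(counts):.2f}")
print(f"pooled: H = {shannon(pooled):.2f}")

# Taxon D is a quarter of individual 1 but under 1% of the pool.
share_d_ind1 = individuals[0][3] / sum(individuals[0])
share_d_pool = pooled[3] / sum(pooled)
print(f"taxon D: {share_d_ind1:.1%} in individual 1, {share_d_pool:.1%} in the pool")
```

In this toy case the pooled profile looks much less diverse than two of the three communities it was built from, purely because of the one outlier.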

Lastly, all of the cost-savings assumptions in your post are going to depend on the sequencing center you use and how they bill samples, so there isn't really a straight answer to give. But I will say that the per-sample cost of Illumina 16S amplicon sequencing is pretty cheap (especially compared to the cost of collecting insects from 5 different countries). And because of how sequencing is performed, pooling samples might not even save you much money. Sequencing is often billed per sequencing kit or per lane of sequencing used, not per individual sample you send. So, unless you have someone to share your sequencing run with, it's possible that you'd pay roughly the same amount for 5 samples as you would for 95, with only minimal cost savings on library prep and extraction kits.
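
As a rough illustration of per-run billing, here is a sketch with placeholder prices. Both numbers are hypothetical and will differ between sequencing centers; the point is only the shape of the calculation.

```python
def total_cost(n_samples, run_fee, prep_per_sample):
    """Flat per-run fee plus per-sample extraction and library prep."""
    return run_fee + n_samples * prep_per_sample

# Hypothetical prices, for illustration only.
RUN_FEE = 1500.0         # flat fee for one sequencing run
PREP_PER_SAMPLE = 5.0    # extraction + library prep per sample

for n in (5, 95):
    cost = total_cost(n, RUN_FEE, PREP_PER_SAMPLE)
    print(f"{n:>3} samples: total {cost:8.2f}, per sample {cost / n:7.2f}")
```

Under these assumptions the flat run fee dominates, so going from 5 to 95 samples raises the total modestly while the per-sample cost drops sharply.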

To summarize, pooling samples reduces your effective sampling effort and limits your potential scope of inference. The only valid reason I can think of for pooling samples in such a way would be if you cannot isolate enough bacterial DNA for sequencing from any one individual (which could be the case, depending on what type of insects you're working with). Like I said at the start, though, it really depends on what you want to learn from your 16S data. If the microbiome comparison is crucial to the outcome of your study, I'd suggest sequencing as many individual communities as possible, or at least a subset of individuals from each country (depending on how many you collected). If you don't plan on doing much analysis beyond describing microbial richness and maybe some taxonomic classifications, pooled samples may be sufficient for your needs.

Next Generation 16S rRNA Sequencing

Many sites of the human body are colonized by complex communities of microbes (the "human microbiome") in both health and various disease states. Highly diverse, polymicrobial specimens are often difficult, or even impossible, to fully characterize by techniques in common clinical use:

Figure 1. Examples of conventional 16S rRNA gene sequencing results from a bacterial isolate and a polymicrobial specimen. For the bacterial isolate (top), Sanger sequence data produces a clean electropherogram that can be used to provide a species-level taxonomic classification. For the polymicrobial sample (bottom), Sanger sequencing generates a different electropherogram for each species present, resulting in mixed signal which is uninterpretable.
  • Culture-based identification relies upon the ability of organisms to grow and replicate in vitro. Therefore, detection of fastidious or slow-growing organisms, or those rendered inviable due to processing (such as in formalin-fixed paraffin embedded tissue specimens) or during storage (such as anaerobes which have been exposed to oxygen) is limited. Furthermore, only a limited number of species can be practically classified by this approach.

Figure 2. High-powered magnification of a next-generation sequencing run. Each fluorescent spot represents an individual DNA molecule undergoing sequencing. The color of the spot indicates the identity of the nucleotide being interrogated during the current sequencing cycle. Image from Shendure, Porreca et al. Science (2005).

In contrast to conventional approaches, next-generation DNA sequencing (alternatively termed "NGS", "high-throughput sequencing", "massively parallel sequencing", or "deep sequencing") provides independent sequence data from millions of individual DNA molecules (Figure 2), allowing each fragment to be classified independently.

This unique ability extends upon the advantages of current molecular methods by allowing us to catalog the organisms present within even very complex polymicrobial bacterial communities, directly from patient specimens.

Information on Available Assays

Our lab currently offers high-fidelity Illumina next-generation DNA sequencing of clinical specimens which contain multiple bacterial DNA templates. Methods are validated for the purpose of clinical molecular diagnosis and patient care. Research services are also available - please contact us for additional information.

This test is available as reflex testing for specimens which are expected to be polymicrobial based on broad range bacterial PCR.

Contact [email protected] for details or questions.

Clinical Reporting

Upon completion of testing, a report is issued describing the results of 16S next generation sequencing.

For additional information on how to submit a request and receive a report, please contact us!


Population genetics studies and epidemiological studies on the genetics of multifactorial diseases require sequencing a large number of genomes at high coverage. This is mandatory both to reach sufficient power for case-control analysis and to compare the patterns of genetic variation across populations. Despite a substantial reduction in the cost of NGS in recent years, sequencing a large number of individual genomes at high coverage is still economically challenging. An alternative cost-effective approach is to sequence DNA from pools of individuals (Pool-seq), which has other benefits such as needing less DNA from each individual and reducing the overall work and time of sequencing experiments. Pooling allows even small labs to carry out population genetics studies, which are otherwise impossible due to exorbitant costs. However, pooling of DNA creates new problems and complexity in data analysis. One of the most challenging problems of Pool-seq is to correctly identify rare variants (allele frequency, AF < 0.01), as sequencing errors are confounded with the alleles present at low frequencies in the pools. Rare variants are not only abundant in populations but also have potential functional roles 1,2 . Hundreds of Genome Wide Association Studies (GWAS) targeting common variants explain only a fraction of genetic heritability in complex diseases 3 . This implies that we need to look beyond the “common disease/common variant (CD/CV)” hypothesis, and that the genetic burden of many rare variants of small effect size with high penetrance might play a key role in explaining the missing heritability of complex diseases 4,5 . Thus, accurate determination of rare variants is extremely important in genetic disease research.

One of the key interests of population genetics studies is information about polymorphic sites and the corresponding AFs of variant alleles in the population. The power of many genetic analyses depends upon accurate determination of the AFs of variants. In principle, Pool-seq should give a more robust estimate of AF due to the larger sample size, which decreases the overall variance of the estimated AF 6 . This hypothesis is well supported by mathematical models under the assumption that there are no sequencing errors and each individual contributes an equal amount of DNA to the pools 7,8,9 . However, in reality sequencing errors are appreciable 10,11 and achieving an equimolar concentration of each individual’s DNA in the pools is also somewhat difficult, which makes it worthwhile to verify the accuracy of AFs in Pool-seq experiments.

In the present study, involving targeted re-sequencing of 996 individuals in 83 pools, we show that Pool-seq can be used to accurately estimate AFs of variant alleles. By comparing Pool-seq with several public variant databases and SNP-array data of the individuals constituting the pools, we show that the Pool-seq AFs are robust and reliable. We also provide general filtering guidelines for removing spurious variants due to sequencing errors. We individually sequenced and identified variants for all subjects of a single pool and compared them with the results of Pool-seq, showing that the proposed filters provide a low rate of false positive and false negative variants, thus demonstrating the utility and efficacy of the filters.
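
The relationship between pool size and the lowest allele frequency a pool can genuinely contain can be sketched as follows. This is an illustration, not the filtering procedure used in the study: it assumes diploid individuals, equimolar pooling, and equal pool sizes (996 individuals in 83 pools would then be 12 per pool), and the read counts are hypothetical.

```python
def pool_allele_frequency(alt_reads, depth):
    """Estimate pool AF as the fraction of reads carrying the variant."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth

def singleton_frequency(n_individuals, ploidy=2):
    """Expected pool AF of an allele present on one chromosome (equimolar pooling)."""
    return 1 / (n_individuals * ploidy)

# 996 individuals in 83 pools; equal pool sizes are an assumption here.
individuals_per_pool = 996 // 83
min_af = singleton_frequency(individuals_per_pool)

# Hypothetical site: 1,000x coverage, 38 reads support the variant.
af = pool_allele_frequency(alt_reads=38, depth=1000)
copies = af / min_af  # estimated number of variant chromosomes in the pool

print(f"individuals per pool: {individuals_per_pool}")
print(f"lowest true AF in a pool: {min_af:.4f}")
print(f"observed AF: {af:.3f} (about {copies:.1f} variant chromosomes)")
```

An observed frequency far below the singleton frequency cannot correspond to a whole chromosome in the pool, which is one reason low-frequency calls are hard to separate from sequencing errors.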

Comprehensive Molecular Characterization of Bacterial Communities in Feces of Pet Birds Using 16S Marker Sequencing

Birds and other animals live and evolve in close contact with millions of microorganisms (microbiota). While the avian microbiota has been well characterized in domestic poultry, the microbiota of other bird species has been less investigated. The aim of this study was to describe the fecal bacterial communities of pet birds. Pooled fecal samples from 22 flocks representing over 150 individual birds of three different species (Melopsittacus undulatus or budgerigars, Nymphicus hollandicus or cockatiels, and Serinus canaria or domestic canaries) were used for analysis using the 16S rRNA gene sequencing in the MiSeq platform (Illumina). Firmicutes was the most abundant phylum (median 88.4 % range 12.9–98.4 %) followed by other low-abundant phyla such as Proteobacteria (median 2.3 % 0.1–85.3 %) and Actinobacteria (median 1.7 % 0–18.3 %). Lactobacillaceae (mostly Lactobacillus spp.) was the most abundant family (median 78.1 % 1.4–97.5 %), especially in budgerigars and canaries, and it deserves attention because of the ascribed beneficial properties of lactic acid bacteria. Importantly, feces from birds contain intestinal, urinary, and reproductive-associated microbiota thus posing a serious problem to study one anatomical region at a time. Other groups of interest include the family Clostridiaceae that showed very low abundance (overall median <0.1 %) with the exception of two samples from cockatiels (14 and 45.9 %) and one sample from budgerigars (19.9 %). Analysis of UniFrac metrics showed that overall, the microbial communities from the 22 flocks tended to cluster together for each bird species, meaning each species shed distinctive bacterial communities in feces. This descriptive analysis provides insight into the fecal microbiota of pet birds.


Pooled samples bias fungal community descriptions

We tested the accuracy of molecular analyses for recovering the species richness and structure of pooled fungal communities of known composition. We constructed replicate pools of 2–20 species and analysed these pools by two separate pooling-DNA extraction procedures and three different molecular analyses (Automated Ribosomal Intergenic Spacer Analysis (ARISA), terminal restriction fragment length polymorphism (T-RFLP) and clone library-sequencing). None of the methods correctly described the known communities. Only clone library-sequencing with high sequencing per pool (∼100 clones) recovered reasonable estimates of richness. Frequency data were skewed with all procedures and analyses. These results indicate that the error introduced by pooling samples is significant and problematic for ecological studies of fungal communities.

Table S1 Pooling 1 species, GenBank Accession numbers, pool identities and detection by terminal restriction fragment length polymorphism (T-RFLP) and cloning-sequencing

Table S2 Pooling 2 species, GenBank Accession numbers, pool identities and detection by cloning-sequencing

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Filename                          Size      Description
MEN_2743_sm_Pooling1_Final.xls    53 KB     Supporting info item
MEN_2743_sm_Pooling2_Final.xls    17.5 KB   Supporting info item


A Method for Targeted 16S Sequencing of Human Milk Samples

A semi-automated workflow is presented for targeted sequencing of 16S rRNA from human milk and other low-biomass sample types.


Studies of microbial communities have become widespread with the development of relatively inexpensive, rapid, and high throughput sequencing. However, as with all these technologies, reproducible results depend on a laboratory workflow that incorporates appropriate precautions and controls. This is particularly important with low-biomass samples where contaminating bacterial DNA can generate misleading results. This article details a semi-automated workflow to identify microbes from human breast milk samples using targeted sequencing of the 16S ribosomal RNA (rRNA) V4 region on a low- to mid-throughput scale. The protocol describes sample preparation from whole milk including: sample lysis, nucleic acid extraction, amplification of the V4 region of the 16S rRNA gene, and library preparation with quality control measures. Importantly, the protocol and discussion consider issues that are salient to the preparation and analysis of low-biomass samples including appropriate positive and negative controls, PCR inhibitor removal, sample contamination by environmental, reagent, or experimental sources, and experimental best practices designed to ensure reproducibility. While the protocol as described is specific to human milk samples, it is adaptable to numerous low- and high-biomass sample types, including samples collected on swabs, frozen neat, or stabilized in a preservation buffer.


The microbial communities that colonize humans are believed to be critically important to human health and disease, influencing metabolism, immune development, susceptibility to disease, and responses to vaccination and drug therapy 1 , 2 . Efforts to understand the influence of the microbiota on human health currently emphasize the identification of microbes associated with defined anatomic compartments (e.g., skin, gut, oral), as well as localized sites within these compartments 3 , 4 . Underpinning these investigative efforts is the rapid emergence and increased accessibility of next-generation sequencing (NGS) technologies that provide a massively parallel platform for analysis of the microbial genetic content (microbiome) of a sample. For many physiological samples, the associated microbiome is both complex and abundant (e.g., stool), but, for some samples, the microbiome is represented by low microbial biomass (e.g., human milk, lower respiratory tract) where sensitivity, experimental artefacts, and possible contamination become major issues. The common challenges of microbiome studies and appropriate experimental design have been the subject of multiple review articles 5 , 6 , 7 , 8 .

Presented herein is a robust NGS experimental pipeline based on targeted sequencing of the rRNA 16S V4 region 9 to characterize the microbiome of human milk. Microbiome analysis of human milk is complicated not only by an inherently low microbial biomass 10 , but additionally by high levels of human DNA background 11 , 12 , 13 , 14 and potential carryover of PCR inhibitors 15 , 16 in extracted nucleic acid. This protocol relies on commercially available extraction kits and semi-automated platforms that can help minimize variability across sample preparation batches. It incorporates a well-defined bacterial mock community that is processed alongside samples as a quality control to validate each step in the protocol and provide an independent metric of pipeline robustness. Although the protocol as described is specific to the human milk samples, it is readily adaptable to other sample types including stool, rectal, vaginal, skin, areolar, and oral swabs 10 , 17 , and can serve as a starting point for researchers who wish to perform microbiome analyses.



For all protocol steps, proper personal protective equipment (PPE) must be worn, and stringent contamination prevention approaches need to be taken. Observe flow of work from pre-amplification work areas to post-amplification work areas to minimize contamination of samples. All supplies used are sterile, free of RNase, DNase, DNA, and pyrogen. All pipette tips are filtered. A flowchart of the protocol steps is provided (Figure 1).

NOTE: Sample lysis and nucleic acid extraction are performed using a DNA/RNA extraction kit in a clean-room environment where both engineering and procedural controls are in place to minimize the introduction of environmental bacteria to the samples.

  1. Work area preparation
    1. Clean the biosafety cabinet (BSC) work area with appropriate surface cleaner to eliminate any nucleic acid contamination.
    2. Turn on the temperature-controlled vortexer, and set it to 37 °C.
    1. Check the lysis buffer for precipitates. Re-dissolve precipitates by warming at 37 °C.
    2. Prepare 600 µL of lysis buffer with 6 µL β-mercaptoethanol (β-ME) for each sample. Consider an extra 20% volume per sample.
    1. If whole milk is frozen, thaw it on ice. Aliquot 5 mL of whole milk into a 15-mL or a 5-mL sterile tube in BSC and keep it on ice.
    2. Spin the 5-mL milk aliquot for 10 min at 5,000 x g at 4 °C to pellet cells.
    3. Remove the fat layer, now the top layer in the tube, with a plastic spatula or large bore pipette tip.
    4. Without disturbing the pellet, remove all the supernatant except for 100 µL.
    5. Wash the pellet by resuspending in 1 mL of sterile phosphate buffered saline (PBS).
    6. Prepare 1 negative control by adding 1 mL of sterile PBS to a 5-mL tube.
    7. Transfer the suspension to a clean 1.5-mL centrifuge tube, and spin in a microcentrifuge for 1 min (5,000 x g at room temperature (RT)).
    8. Use a 1,000 µL sterile filtered pipette tip to discard the entire supernatant/fatty layer.
    9. If not extracting the same day, snap freeze the cell pellet by putting it in an ethanol/dry ice slurry, and immediately transfer it to the -80 °C freezer.
    1. Add 600 µL of lysis buffer containing β-ME to the pellet, and transfer the suspension to a bead tube.
    2. Each extraction batch of 12 contains 10 samples, 1 negative control (prepared in step 3.7 above), and 1 positive control (prepared in the next step).
      1. Prepare 1 positive control with lysis buffer and 20 µL of the bacterial mock sample (the mock community used has a concentration, once extracted, of approximately 0.2 ng/µL of DNA).
      1. Clean all surfaces in use with a non-enzymatic decontamination solution, leave for 10 min, then spray with 70% ethanol and wipe down the surface.
      1. Work area preparation
        1. Turn on the heat block and set the temperature to 70 °C.
        2. Turn on the temperature-controlled vortexer and set the temperature to 37 °C.
        3. Warm up the elution buffer (EB) containing 10 mM Tris-Cl, pH 8.5, in a 50-mL tube to 70 °C.
        4. Warm up 350 µL of frozen sample lysates in 2 mL tubes at 37 °C until completely thawed without any precipitate (approximately 10 min).
        1. Vortex and centrifuge the 2-mL tubes with sample briefly (3000 x g for 10 s).
        2. Insert 2 mL tubes into the shaker following the automated DNA/RNA purification instrument's loading chart, per the manufacturer's instructions.
        3. Get the rotor adaptors and set them up on the tray based on the number of samples.
        4. Label each rotor adaptor based on the sample's identification (ID).
        5. Cut off the lids and smooth the edges of individual spin columns for DNA and RNA.
        6. Insert the DNA spin column without the collection tube into the rotor adaptor. Discard the collection tube.
        7. Label 1.5 mL collection tubes, and insert into rotor adaptors.
        8. Set rotor adaptors into the centrifuge following the automated DNA/RNA purification instrument's loading chart.
        9. Insert manufacturer's 1,000 µL filter-tips into tip racks.
        10. Add RNase-free water to a 2-mL manufacturer's microcentrifuge tube based on the number of samples (per specific machine protocol instructions).
        11. Insert the tube into tube slot "A" of microcentrifuge tube slots.
        12. Discard any reagent that is left over in reagent bottles and fill with a minimum volume of 10 mL.
        13. Insert reagent bottles into the reagent bottle rack (except EB bottle).
        14. Add warm EB from a 50-mL tube to the reagent bottle position 6.
        15. Check 1.5 mL tubes are placed tightly in the rotor adaptors.
        16. Close the instrument's lid and select: "RNA"→ "extraction kit" → "Animal, tissues and cells" → "kit's name 350 µL Part A Custom DNA" → Edit Elution Volume to 100 µL (default) or 50 µL for low biomass samples → "Start."
        17. When complete, remove rotor adaptors out of the centrifuge and place them on the tray.
        18. Discard the DNA spin column from rotor adaptor position 3.
        19. Do NOT discard rotor adaptors; RNA is in position 2.
        20. Remove 1.5 mL collection tubes containing eluted DNA at position 3, and store at -20 °C.
        21. Collect sample-containing tubes from the shaker and store at -20 °C if any sample is left over; otherwise discard.
        22. Continue with the protocol "kit's name 350 µL part B RNA" for further purification of RNA.
        23. RNA purification will be done from the approximately 350 µL flow-through that is in the middle position of the rotor adaptors.
        1. Insert the RNA spin column without its collection tube and lid into the rotor adaptor.
        2. Label new 1.5 mL collection tubes and insert them into rotor adaptor as indicated in the manual.
        3. Set the rotor adaptors into the centrifuge following the automated DNA/RNA purification instrument's loading chart.
        4. Close the instrument's lid and select: "RNA" → "Manufacturer's Kit" → "Animal, tissues and cells" → "Standard Part B RNA" → "Start."
        5. When completed, remove rotor adaptors out of the centrifuge and place them on the tray.
        6. Discard RNA spin column from position 3.
        7. Remove 1.5 mL collection tubes containing 30 µL eluted RNA at position 3, and store at -80 °C.
        8. Remove reagent bottles.
        9. Dispose of rotor adaptor contents through appropriate hazardous waste channels.
        1. Spray all the automated DNA/RNA purification instrument's accessories, such as reagent racks, tray, and any other surface in use with a non-enzymatic decontamination solution, leave for 10 min and rinse with deionized (DI) water, then let them dry.
        2. Spray the automated instrument with only manufacturer approved non-enzymatic decontamination solution, wipe the inside of the centrifuge along with all surfaces in use, leave for 10 min, and then wipe with 70% ethanol. Do not use other types of decontamination solutions as they can damage the instrument.
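
For the lysis buffer preparation described earlier in this protocol (600 µL of lysis buffer with 6 µL β-ME per sample, plus 20% extra), the batch volumes can be worked out as in the sketch below. It assumes that the 6 µL of β-ME is added on top of the 600 µL of buffer and that all 12 tubes of an extraction batch (10 samples and 2 controls) receive lysis buffer; the function name is hypothetical.

```python
def lysis_buffer_volumes(n_tubes, buffer_ul=600.0, bme_ul=6.0, overage=0.20):
    """Total lysis buffer and β-ME volumes (µL) for n_tubes, including overage."""
    scale = n_tubes * (1 + overage)
    return round(buffer_ul * scale, 1), round(bme_ul * scale, 1)

# One extraction batch: 10 samples + 1 negative + 1 positive control.
# Assumption: all 12 tubes receive lysis buffer.
buffer_total, bme_total = lysis_buffer_volumes(12)
print(f"lysis buffer: {buffer_total} µL, β-ME: {bme_total} µL")
```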

        NOTE: The set-up for the 16S PCR is carried out in a designated pre-amplification workspace located within the clean-room. The reagents and samples are prepared and then loaded onto a liquid handler to perform the PCR for each sample in triplicate (30 samples, which include true samples and extraction positive and negative controls, plus 2 PCR water controls in triplicate, for a total of 96 combined samples and controls). Once the PCR reactions are assembled and sealed, the sample plate is transferred to a thermal cycler located in a post-amplification area for cycling.
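
The plate arithmetic in this note (30 samples and extraction controls plus 2 PCR water controls, each in triplicate, filling 96 wells) can be checked with a short sketch. The row-by-row placement of triplicates in adjacent wells is an assumption made for illustration; the actual layout is defined by the plate map on the worksheet, and the sample names here are placeholders.

```python
from itertools import product

ROWS = "ABCDEFGH"
COLS = range(1, 13)
REPLICATES = 3

def plate_map(sample_ids, water_controls=2):
    """Assign each sample and PCR water control to triplicate wells, row by row."""
    entries = list(sample_ids) + [f"PCR_water_{i + 1}" for i in range(water_controls)]
    wells = [f"{r}{c}" for r, c in product(ROWS, COLS)]
    if len(entries) * REPLICATES > len(wells):
        raise ValueError("too many reactions for one 96-well plate")
    layout = {}
    for i, name in enumerate(entries):
        layout[name] = wells[i * REPLICATES:(i + 1) * REPLICATES]
    return layout

# 30 entries: true samples plus extraction positive and negative controls.
samples = [f"S{i:02d}" for i in range(1, 31)]
layout = plate_map(samples)

used = sum(len(w) for w in layout.values())
print(f"wells used: {used}")
print(f"S01 -> {layout['S01']}")
print(f"PCR_water_2 -> {layout['PCR_water_2']}")
```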

        1. Work area preparation
          1. Clean the PCR workstation. Spray all surfaces in use with an RNase, DNase, DNA decontaminant, followed by DI water two times, and finally 70% ethanol.
          2. Prepare the 16S PCR Worksheet (see Targeted 16S PCR Worksheet) with an accurate sample list, and assign different barcoded primers to each sample 9 . Print out the worksheet and the plate maps (see Plate 1).
          1. Work in the PCR workstation to prepare everything.
          2. Take out 50 - 100 µL of DNA samples from -20 °C, and all the reagents needed, and thaw them on ice. Vortex and briefly spin down.
          3. The primers are pre-diluted to the working concentration of 5 µM in a minimum volume of 20 µL.
          4. Prepare the PCR master mix in the specific 5 mL tube with only the forward primer according to the calculation on the worksheet.
          1. For samples, take out a 32-well instrument's sample adaptor, and load 50 - 100 µL of DNA samples following the 96-well plate map and the manufacturer's instructions.
            1. For each PCR plate, set up 2 negative controls by placing 30 µL of PCR water in a clean sample tube.
            2. Place all the samples on the 32-well instrument's sample adaptor with caps locked in open position.
            1. Remove the cap of each reverse primer with specific barcode #'s one at a time (change gloves in between to avoid cross-contamination).
            2. Place a maximum of 32 primers on the instrument's reagent adaptor.
            1. Place the reagent adaptor in position B1. In order to avoid an edge effect, carefully place one edge of the adaptor against the grip side, and slowly bring the other edge down. Make sure to push on all the corners of the adaptors.
            2. Place the sample adaptor in Position C1. Make sure to push on all the corners of the adaptors.
            3. Vortex the 5-mL master mix, open the cap, and place it in position A on the instrument's master mix and reagent block.
            4. Place the PCR plate on the 96-well instrument's adaptor that is intended to hold half skirted PCR plates.
            5. Start the run and save as a new file.
            6. Follow the prompts and confirm each one in turn: (1) tips are available, (2) waste box is available, and (3) start.

            4. Targeted 16S Post-PCR Quality Control Using Tape-based Platform for Gel Electrophoresis

            NOTE: Post-PCR quality control (QC) and all subsequent steps are carried out in a designated post-amplification area of the lab. The DNA is analyzed in an automated DNA/RNA fragment analyzer.

            1. Work area preparation
              1. Clean the workstation by spraying all surfaces in use with an RNase, DNase, DNA decontaminant, followed by DI water two times, and finally 70% ethanol.
              2. Gather all supplies and equipment needed.
              1. Place sample buffer and ladder in the temperature-controlled vortexer at 25 °C for a minimum of 30 min.

              5. Library Calculation, Pooling, Clean-up, and QC

              1. After determining amplicon size and molarity of all samples, pool the libraries to achieve the final desired volume and nM for the pooled library (see Sample Calculation).
              2. Clean-up and concentrate the pooled library using a silica-membrane-based purification kit for PCR products, according to the manufacturer's protocol (see Table of Materials).
                1. Elute the DNA with a final volume of 50 µL.
                1. Dilute 2 µL of the pooled and cleaned library with 198 µL of the dilution buffer plus dye (1:100 dilution).
                2. Record the measured value from the fluorometric device and convert it according to the dilution factor.
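
The calculations behind these steps can be sketched as follows. This is not the worksheet's Sample Calculation, which is not reproduced here; it is a generic illustration that assumes an average mass of about 660 g/mol per base pair of double-stranded DNA and uses hypothetical library values and function names.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, size_bp, da_per_bp=660.0):
    """Convert dsDNA concentration to molarity, assuming ~660 g/mol per bp."""
    return conc_ng_per_ul * 1e6 / (da_per_bp * size_bp)

def equimolar_volumes(molarities_nM, fmol_per_sample):
    """Volume (µL) of each library giving the same molar amount (1 nM = 1 fmol/µL)."""
    return {name: fmol_per_sample / nM for name, nM in molarities_nM.items()}

def undiluted_concentration(measured, dilution_factor=100):
    """Back-calculate the stock concentration from a reading on a diluted aliquot."""
    return measured * dilution_factor

# Hypothetical library molarities in nM.
libraries = {"S01": 8.0, "S02": 4.0, "S03": 2.0}
volumes = equimolar_volumes(libraries, fmol_per_sample=20.0)
for name, vol in volumes.items():
    print(f"{name}: pool {vol:.1f} µL")

# 2 µL library + 198 µL buffer = 200 µL total, i.e. a 1:100 dilution.
stock = undiluted_concentration(measured=0.05)
print(f"stock concentration: {stock:.2f} (same units as the reading)")
print(f"1 ng/µL at 396 bp is about {ng_per_ul_to_nM(1.0, 396):.2f} nM")
```

Libraries with lower molarity contribute proportionally larger volumes, so each sample ends up equally represented in the pool.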


                Representative Results

                The protocol presented here includes important quality control (QC) steps to ensure that the data generated meet benchmarks for protocol sensitivity, specificity, and contamination control. The protocol's first QC step follows PCR amplification of the 16S V4 region (Figure 2). One µL of PCR product from each sample was analyzed by electrophoresis to confirm that it was within the expected size range of 315 - 450 bp (Figure 2, red arrow). Some human milk samples generated lower amounts of specific product (Figure 2A, compare lanes 3 and 9 - 11 with lanes 4 - 8), suggesting either low levels of extracted microbial DNA in those samples, or carry-over of PCR inhibitors during extraction. For samples that produce less than 2.0 nM of product in the 315 - 450 bp range (Figure 2A, lane 7), PCR inhibitor cleanup is carried out using a single-step kit and the sample is re-amplified. The success rate for recovery of sample amplification after cleanup is approximately 40%. Quantitation of specific product for each sample (Figure 2B) is essential for determining the volume required for equimolar pooling of samples for sequencing. A pooled library for targeted sequencing is usually dominated by a specific PCR product (Figure 3). If there is a significant amount of non-specific product in the library, a gel-purification step should be added to the workflow.
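
The 2.0 nM threshold described above lends itself to a simple check. The sketch below uses hypothetical molarities for the product in the 315 - 450 bp window and a hypothetical function name.

```python
MIN_PRODUCT_NM = 2.0  # threshold for product in the 315 - 450 bp range

def needs_cleanup(product_nM, threshold=MIN_PRODUCT_NM):
    """True if a sample should go to PCR inhibitor cleanup and re-amplification."""
    return product_nM < threshold

# Hypothetical molarities (nM) of specific product per sample.
qc = {"sample_03": 12.4, "sample_07": 0.6, "sample_09": 3.1}

flagged = [name for name, nM in qc.items() if needs_cleanup(nM)]
print(f"send to cleanup and re-amplify: {flagged}")
```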

                In the example presented in Figure 2A, faint bands are observed for buffer controls (BC lanes 2 and 12) and the PCR water negative control (PC lane 1), indicating possible environmental or reagent contamination. Such bands are not uncommon and typically represent low amounts of PCR product (i.e., <1 nM) and produce few read counts during sequencing (<1,000). Representative sequencing results (Figure 4) confirm that these samples do indeed have very low sequencing read counts (Figure 4A, lanes 1 and 11; Figure 4B, Buffer and PCR Water lanes) and, importantly, the taxa composition for the control samples is distinct from the human milk samples (Figure 4A, compare lanes 1 and 11 with lanes 2 - 10). High read counts in the negative controls, together with significant overlap in taxa composition between controls and samples, suggest cross-contamination and the need for improved contamination control.

                Sequencing results (Figure 4) demonstrate high diversity in the taxa associated with the human milk microbiome and variability in the number of sequencing read counts for each sample (Figure 4A, lanes 2 - 10). In contrast, the sequencing results for the bacterial mock that was processed along with the human milk samples demonstrated taxa composition and read counts that were comparable to results obtained for the mock in previous workflow runs (compare Figure 4A, lane 12 with Figure 4B, mock lanes). The consistent results for the mock lanes suggest that the observed variability for the human milk samples is an authentic experimental result, and not a function of intrinsic workflow variability.

                Figure 1: Flow chart of the Targeted 16S Sequencing Pipeline.

                Figure 2: Quality control analysis of 16S V4 amplicons. (A) Gel image of 16S V4 amplicons resolved by electrophoresis using an automated DNA/RNA fragment analyzer. 16S V4 amplicons were generated according to Caporaso et al. 9 , and one µL of each PCR product was analyzed using high sensitivity DNA reagents according to the manufacturer's guidelines. Most human milk samples (lanes 3 - 6 and 8 - 11) and the bacterial mock (lane 13) produced a primary PCR product at the expected size of approximately 400 bp (red arrow). The human milk sample in lane 7 failed to produce a significant amount of specific product and was subject to cleanup and re-amplification. Minimal product was detected for the PCR negative control (PC, lane 1) and the lysis buffer negative controls (BC, lanes 2 and 12), indicating minimal contamination in the analyzed samples. MW, molecular weight markers: upper red and lower green bars identify the 1,500 bp and 25 bp size markers, respectively, in each lane. (B) Top: electropherogram of lane 3 from the gel in (A). The primary PCR product falls within the peak region defined by the red vertical bars and comprises fragments ranging in size from 299 - 497 bp, resulting in an average PCR product size of 396 bp. Gating is done on a slightly wider range than the anticipated amplicon size (in this case 315 - 450 bp) to be sure to include the entire sample peak. The lower and upper marker peaks correspond to fragment sizes of 25 bp and 1,500 bp, respectively. Bottom: chart summarizing the size parameters for the peak region, the concentration in ng/µL of the PCR product within the peak region, and the molarity in nM for the specific PCR product. This information is then used to calculate how much of each sample will be pooled in an equimolar library for sequencing (see Sample Calculation).

                Figure 3: Electropherogram of a pooled and concentrated sequencing library. Equal molar amounts of individual samples to be sequenced were combined into a pooled library. The library was then cleaned and concentrated to a total volume of 50 µL using a silica-membrane-based PCR clean up kit. Final preparation of the library for sequencing on the next generation sequencer was conducted according to the manufacturer's protocol. This library was successfully sequenced despite the presence of additional bands. If there is a concern about PCR products outside the expected size range, the manufacturer's protocol suggests the addition of a gel size selection step. This QC step is not usually performed.

                Figure 4: Evaluation of negative and positive controls. (A) Relative abundances of bacterial taxa of an extraction batch with controls and human milk samples. As a QC measure, compositions of each extraction batch as loaded on the automated DNA/RNA purification instrument are generated immediately following a sequencing run. Numbers under each sample bar indicate the number of filtered reads for the respective sample. The compositions of the buffer controls are distinct from that of the human milk samples. (B) Relative abundances of bacterial taxa in buffer, mock, and PCR controls. Number of reads and composition are evaluated for all negative (buffer and PCR water) and positive (bacterial mock) controls. The compositions of the buffer and water vary, but the mock community remains quite stable.


                Targeted next-generation sequencing of 16S rRNA is a widely used, rapid technique for microbiome characterization 18 . However, many factors, including batch effects, environmental contamination, sample cross-contamination, sensitivity, and reproducibility can adversely affect experimental results and confound their interpretation 7 , 19 , 20 . To best facilitate robust 16S analyses, microbiome workflows must incorporate good experimental design, the use of appropriate controls, spatial segregation of workflow steps, and application of best practices. The protocol described here incorporates each of these parameters and provides important experimental tools to address the challenges above and implement a 16S workflow for diverse samples.

                Good experimental design is critical for 16S microbiome analyses. This includes proper collection and storage of samples, as well as selection of 16S primers appropriate for the region of interest. For example, the V4 region (515F/806R) is selected for human milk because it has good amplification of Bifidobacterium, which plays an important role in development of the neonatal gut microbiome 21 . Other primer sets (e.g., 27F/338R, 515F/926R) may be more appropriate for studies of other microbial communities. An important note is that the annealing temperature for the targeted 16S PCR and the expected amplicon size may vary based on primer selection.

                Other places where the protocol may be modified are based on the results of the QC steps incorporated in the workflow. A few options exist for troubleshooting when little or no DNA is detected following the targeted 16S PCR amplification step: 1) the sample can be put through a PCR inhibitor removal step (amplification following PCR inhibitor removal using a single-step kit performed per the manufacturer's protocol is successful approximately forty percent of the time); 2) more extracted DNA can be added to the targeted 16S PCR; or 3) a new and potentially larger aliquot of milk can be processed if available. If there is a concern about PCR products outside the expected size range following the silica-membrane-based purification of the library, a gel purification step can be added. In addition, the QC steps are critical to determine if there is evidence of contamination, which is discussed in detail below. If significant contamination is detected, then depending on where the contamination is introduced, either the PCR can be repeated or, if necessary, the sample can be re-extracted. Fortunately, with good laboratory practices, these are rare events. Finally, while this protocol is written to highlight caveats in the amplification of low biomass samples and specifically human milk, the protocol can easily be modified for the amplification of oral, rectal, vaginal, and skin swabs or sponges as well as stool. If other sample types are chosen, consideration should be given to which extraction kits are optimal for the specific sample type.

                Batch effects due to kits, reagents, or sequencing runs are important sources of variability in microbiome studies. DNA extraction kits, along with other reagents, possess low levels of bacterial DNA, which may vary substantially by lot 20 , 22 , 23 , 24 . For a large project, using a single lot of kits and reagents and selecting kits designed to minimize kit contamination may simplify analysis 7 . Samples of both subjects and controls are processed side by side. It is best if all the samples for a single study, both subject and control, can be incorporated into a single sequencing run. If a large number of samples are to be processed in batches, samples that are representative of both subjects and controls are included in each batch and processed together. It is also important to organize batch processing to minimize contamination of low-biomass samples (e.g., human milk) by samples that are high biomass (e.g., stool). In such cases, process low biomass sample types first, and then high biomass samples for the same study.

                Low biomass samples pose unique challenges to microbiome studies, as contamination from the environment, reagents, instruments, and the researcher can make it difficult to distinguish between authentic community members present in low abundance 7 , 25 , 26 and those that are artificially introduced to the sample through the experimental process 19 . The workflow described here incorporates important experimental negative controls at both the sample preparation step (buffer-only lysis control), and the PCR step (PCR water control) (see Figure 4). These controls help identify contamination sources and facilitate effective corrective measures at the bench or in silico 27 , 28 , 29 , 30 . Negative controls are carefully evaluated and reported with the study results 7 .

                To minimize contamination, spatial segregation of experimental activities into a clean pre-PCR amplification area and post-amplification area is important. Optimally, the clean room has both an area for PCR master mix preparation and a sample preparation/addition of master mix area, and may incorporate a separate dead air box or a biological safety cabinet housing dedicated consumables and small equipment needed for the master mix preparation. The clean room design incorporates a positive airflow system with high efficiency particulate air (HEPA) filtration. Use of personal protective equipment is essential to maintaining a controlled, low-microbial environment, and includes hairnets, lab coats, gloves, and shoe covers. Kits/reagents and samples are ideally stored in separate dedicated refrigerators/freezers. PCR setup is also carried out in the clean room in designated workstations; clear separation of primer stocks and reagents from extracted DNA is maintained until samples are loaded on the automated pipetting platform. Once a PCR setup is complete, the plate is transferred to the post-amplification area and loaded onto a thermal cycler.

                It is important to restrict the flow of work activity so that it runs only from clean areas to post-amplification areas; there is no retrograde movement of reagents, instruments, or supplies from post-amplification areas to the clean area. Personnel that have entered any of the post-amplification areas are barred from entry to the clean area for 24 hours (until the next day). In addition to the above workflow considerations, cleaning protocols must be implemented in both clean and post-amplification areas to minimize nucleic acid contamination of work surfaces and instrumentation. If physical barriers or separate rooms are not possible, all efforts must be taken to set up the work in areas as far apart as possible.

                In addition to contamination, microbiome studies are challenged by sensitivity, variability, and reproducibility 31 . This protocol addresses these issues by incorporating a defined bacterial mock community that is extracted, amplified, and sequenced along with each batch of samples (see Figure 4B). This control provides a constant internal reference that evaluates the reproducibility of the experimental results generated, and can be used to troubleshoot problems that arise. For example, the quality of the extracted mock DNA can provide a metric for effective sample lysis and DNA extraction, which genomic DNA controls miss. Quality control of PCR amplicons for the mock sample can also indicate PCR efficiency and specificity. Furthermore, because the mock comprises multiple bacteria types, the relative sensitivity for a processed batch of samples can be inferred from the representation of taxa in the sequencing results for the mock sample. An ideal mock community will evaluate the ability to detect key bacterial species in the compartment being analyzed, and therefore the composition of the mock community may need to vary by study. As shown in Figure 4A, there is considerable variability among sample sequencing results, but the sequencing results for the bacterial mock community are highly reproducible (see Figure 4B).

                While the mock community in Figure 4 is a unique mixture of 33 strains from a combination of commercially available and local clinical isolates, a commercially available mock community has recently been developed 32 .

                Although the workflow described here is limited in its ability to broadly address reproducibility across different microbiome studies, it does provide an important experimental approach that allows researchers to incorporate appropriate experimental controls and monitor reproducibility within their own results.

                MiSeq Applications & Methods

                Get a detailed genome view of the smallest organisms. Small genome sequencing provides comprehensive analysis of microbial or viral genomes for public health, epidemiology, and disease studies. Sequence up to 24 small genomes per MiSeq run.

                Library Prep
                Nextera XT Library Prep Kit

                Prepare sequencing libraries for small genomes, PCR amplicons, and plasmids in less than 90 min, with a low DNA input requirement.

                Illumina DNA Prep

                A fast, integrated workflow for a wide range of applications, from human whole-genome sequencing to amplicons, plasmids, and microbial species.

                MiSeq Reagent Kit v3 600 cycles

                Optimized chemistry to increase cluster density and read length, and improve sequencing quality scores, compared to earlier kit versions.

                SPAdes Genome Assembler

                Open source tool for de novo sequencing, designed to assemble small genomes from MDA single-cell and standard bacterial data sets.

                Estimated Cost Per Sample: $98*

                *Small whole-genome sequencing on the MiSeq System estimated cost per sample calculated 2016, based on 5 Mb genome, 50-100X coverage, 2 x 300 bp read length, Nextera XT Library Prep Kit, MiSeq Reagent v3 600-cycle kit

                Targeted resequencing focuses time, expenses, and analysis on sequencing only a subset of genes or genome regions of research interest. Amplicon sequencing, the ultra-deep sequencing of PCR amplicons, enables cost-effective analysis of up to hundreds of target genomic regions in one assay. Sequence up to 96 samples and 1536 amplicons or more in a single MiSeq run.

                Assay Design & Library Prep
                AmpliSeq for Illumina Sequencing Solution

                A highly multiplexed polymerase chain reaction (PCR)-based workflow for use with targets ranging from a few to hundreds of genes in a single run.

                DesignStudio Software
                MiSeq Reagent Kit v2 (300 cycles)

                MiSeq sequencing reagents in pre-filled, ready-to-use cartridges. Micro and nano formats are available for low output applications.

                Data Analysis
                Local Run Manager

                An on-premises software solution for creating sequencing runs, monitoring run status, and analyzing data.

                BaseSpace Sequence Hub

                The Illumina genomics cloud computing environment for NGS data analysis and management.

                BaseSpace Variant Interpreter

                Enables researchers to rapidly identify biologically significant variants from human genomic data.

                Sequencing the 16S ribosomal RNA (rRNA) gene is a culture-free method to identify and compare bacteria from complex microbiomes or environments that are difficult to study. Our demonstrated protocol for 16S rRNA sequencing can help take the guesswork out of your experiments. Multiplexing lets you sequence up to 96 samples per MiSeq run.

                Library Prep
                Nextera XT Index Kit v2

                Nextera XT index kits allow for up to 384 uniquely indexed samples to be pooled and sequenced on a single sequencing run.

                MiSeq Reagent Kit v3 (600 cycles)

                Optimized chemistry to increase cluster density and read length, and improve sequencing quality scores, compared to earlier kit versions.

                16S Metagenomics BaseSpace App

                Performs taxonomic classification of 16S rRNA targeted amplicon reads using an Illumina-curated version of the GreenGenes taxonomic database.

                Estimated Cost Per Sample: $18*

                *16S rRNA sequencing on the MiSeq System estimated cost per sample calculated 2016, based on 96 samples, 2 x 300 bp read length, Nextera XT index primers, MiSeq Reagent v3 600-cycle kit

                Re-Analysis of 16S rRNA Gene Sequence Data Sets Uncovers Disparate Laboratory-Specific Microbiomes Associated with the Yellow Fever Mosquito (Aedes aegypti)

                Host-microbiome dynamics occurring in the yellow fever mosquito (Aedes aegypti) contribute to host life history traits, and particular bacterial taxa are proposed to comprise a “core” microbiota that influences host physiology. Laboratory-based studies are frequently performed to investigate these processes; however, experimental results are often presumed to be generalizable across laboratories, and few efforts have been made to independently reproduce and replicate significant findings. A recent study by Muturi et al. (FEMS Microbiol Ecol 95 (1):213, 2019) demonstrated that the food source imbibed by laboratory-reared adult female mosquitoes significantly impacted the host-associated microbiota, a foundational finding in the field of mosquito biology worthy of independent evaluation. Here, we coalesce these data with two additional mosquito-derived 16S rRNA gene sequence data sets using a unifying bioinformatics pipeline to reproduce the characterization of these microbiota, test for a significant food source effect when independent samples were added to the analyses, assess whether similarly fed mosquito microbiomes were comparable across laboratories, and identify conserved bacterial taxa. Our pipeline characterized similar microbiome composition and structure from the data published previously, and a significant food source effect was detected with the addition of independent samples, increasing the robustness of this previously discovered component of mosquito biology. However, distinct microbial communities were identified from similarly fed but independently reared mosquitoes, and surveys across all samples did not identify conserved bacterial taxa. These findings demonstrated that while the main effect of the food source was supported, laboratory-specific conditions may produce inherently differential microbiomes across independent laboratory environments.

                National Health and Medical Research Council Career Development Fellowship (LC, APP1130084 JP, APP1107599) National Health and Medical Research Council (JP, APP1143163 LC, APP1149029) Practitioner Fellowship (AWH) Senior Research Fellowship (AP), 1154389 Australian Research Council Future Fellowship (AP, FT140100047) Australian Research Council Discovery Project (JP, DP180101405) Stem Cells Australia – the Australian Research Council Special Research Initiative in Stem Cell Science (JP, AWH, AP, NP) Australian Research Council Development Early Career Researcher (QN, DE190100116)


                Genome Innovation Hub, The University of Queensland, 306 Carmody Road, St Lucia, Brisbane, QLD 4072, Australia

                Jun Xu, Stacey Andersen, Nathan J. Palpant, Grant W. Montgomery & Lachlan J.M Coin

                Institute for Molecular Bioscience, The University of Queensland, 306 Carmody Road, St Lucia, Brisbane, QLD 4072, Australia

                Caitlin Falconer, Quan Nguyen, Joanna Crawford, Brett D. McKinnon, Sally Mortlock, Stacey Andersen, Han Sheng Chiu, Longda Jiang, Nathan J. Palpant, Jian Yang, Grant W. Montgomery & Lachlan J.M Coin

                UNSW Cellular Genomics Futures Institute, School of Medical Sciences, University of New South Wales, Sydney, NSW 2052, Australia

                Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute, 384 Victoria St, Darlinghurst, Sydney, NSW 2010, Australia

                Anne Senabouth & Joseph E. Powell

                Department of Obstetrics and Gynaecology, Berne University Hospital, Bern, 3012, Switzerland

                Brett D. McKinnon & Michael D. Mueller

                Department of Anatomy and Neuroscience, The University of Melbourne, Parkville, 3010, Australia

                Department of Surgery, The University of Melbourne, Parkville, 3010, Australia

                Alex W. Hewitt & Alice Pébay

                Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, 3002, Australia

                Alex W. Hewitt & Alice Pébay

                School of Medicine, Menzies Institute for Medical Research, University of Tasmania, Hobart, 7005, Australia

                Institute for Advanced Research, Wenzhou Medical University, Wenzhou, 325027, Zhejiang, China

                Department of Microbiology and Immunology, The University of Melbourne, Parkville, 3010, Australia

                Department of Clinical Pathology, The University of Melbourne, Parkville, 3010, Australia

                Department of Infectious Disease, Imperial College London, London, W2 1NY, UK


                Authors’ contributions

                LC and CF initiated the project. LC and JX designed the algorithms. JX implemented the tools in Python and tested it on multiple datasets. MDM generated the endometrial stromal samples. AP and AH generated the fibroblast samples. QN, SM and AS preprocessed the sample datasets. JY and LJ provided the sibling data for simulation and helped in analysis. JP, QN, GWM, BM, SM, JC, and SA participated in important discussions and provided useful suggestions on multiple issues. JX drafted the manuscript, LC, JP, GWM, JY, QN, and JC reviewed and revised the manuscript. All authors read and approved the final manuscript.

                Authors’ information

                Jun Xu: Twitter(@xujun_jon) Lachlan J.M Coin: Twitter(@lachlancoin)

                Corresponding author

                Materials and methods

                Sample preparation and sequencing

                Sample collection and DNA isolation were performed as described in Costello et al. [6] and PCR, sequencing, and quality filtering of reads were performed as described in Caporaso et al. [14]. Samples were not collected on days 422 through 437.

                To facilitate massively parallel sequencing (1,967 samples), barcodes were reused across six lanes in a single Illumina GAIIx, with 374, 372, 364, 271, 265, and 323 samples in lanes 1 through 6, respectively (differing from Caporaso et al. [14], where samples were pooled and run over seven lanes). Sixteen samples were ultimately excluded from the analysis as fourteen samples were identified as potentially mislabeled (discussed below), and the barcodes for two samples were not found in the sequencing output, likely indicating a problem with amplification for those two samples.

                Data analysis

                To directly compare these M3/F4 time series samples with the samples presented in Costello et al. [6], which sequenced a different variable region (V2) using a different technology (454 FLX), a reference-based OTU picking protocol [18] was applied. After demultiplexing and quality filtering sequences, 97% OTUs were picked against the Greengenes database [19] (pre-filtered at 97% identity) using uclust [20]. Reads were assigned to OTUs based on their best match to a Greengenes sequence, and reads that did not match a Greengenes sequence at 97% or greater sequence identity were discarded. The Greengenes taxonomy associated with the best match in Greengenes was assigned to each OTU, and the Greengenes tree was used for phylogenetic diversity calculations. These steps and subsequent data analysis were performed using Quantitative Insights Into Microbial Ecology (QIIME) on AWS.

                Identifying mislabeled samples

                To identify potentially mislabeled samples, we used the random forests classifier [21]. A 2,000-tree forest was trained on the OTU × Sample Abundance matrix after evenly sampling to 500 sequences per sample and removing OTUs present in less than 1% of samples. The posterior probability that a given sample came from each of the body habitats (gut, oral cavity, skin) was estimated using only those trees in the forest that did not contain that sample in their training sets, to avoid overfitting. The classifier considers samples to be mislabeled when their alleged environment labels have a low posterior probability (<60%). Fourteen such samples were identified, and these samples were removed from all analyses.
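
The two preprocessing steps named above (even sampling to a fixed depth and removal of low-prevalence OTUs) can be sketched with the standard library alone. This is an illustrative sketch, not the QIIME implementation used in the study: the function names and the dictionary-based table layout are assumptions, and the classifier itself is not shown.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts, without replacement, to `depth` reads.

    counts maps OTU id -> read count. Returns None when the sample has
    fewer than `depth` reads, so that the caller can drop it.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def filter_rare_otus(table, min_fraction=0.01):
    """Drop OTUs observed in fewer than `min_fraction` of samples.

    table maps sample id -> {OTU id -> count}.
    """
    n_samples = len(table)
    prevalence = {}
    for counts in table.values():
        for otu, c in counts.items():
            if c > 0:
                prevalence[otu] = prevalence.get(otu, 0) + 1
    keep = {otu for otu, k in prevalence.items() if k / n_samples >= min_fraction}
    return {
        sample: {otu: c for otu, c in counts.items() if otu in keep}
        for sample, counts in table.items()
    }


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the subsample is reproducible
    sample = {"otu_1": 700, "otu_2": 250, "otu_3": 50}
    print(rarefy(sample, 500, rng))
```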

                Core microbiome calculation

                The temporal core microbiome across body sites and individuals (Figure 2) was computed by varying the minimum number of samples in which an OTU must be observed to be considered part of the core microbiome, and then determining the number and fraction of total OTUs observed in each site (or combination of sites) that are part of the core. To facilitate direct comparison across sample types that contained different numbers of observations (for example, M3 (all) versus M3 gut), we randomly subsampled to exactly 130 observations per sample type, corresponding to the sample type for which we had the fewest observations.
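
A minimal sketch of the core calculation described above: for every possible minimum number of samples, count the OTUs observed in at least that many samples and express the count as a fraction of all OTUs observed. The function name and input layout are assumptions for illustration, and the random subsampling to an equal number of observations per sample type is left out.

```python
from collections import Counter


def core_curve(samples):
    """Return (min_samples, n_core_otus, fraction_of_total) for each threshold.

    samples is a list of collections, each holding the OTU ids observed
    in one sample.
    """
    occupancy = Counter()
    for otus in samples:
        occupancy.update(set(otus))  # count each OTU once per sample
    total = len(occupancy)
    curve = []
    for k in range(1, len(samples) + 1):
        n_core = sum(1 for c in occupancy.values() if c >= k)
        curve.append((k, n_core, n_core / total if total else 0.0))
    return curve


if __name__ == "__main__":
    timepoints = [{"otu_a", "otu_b"}, {"otu_a"}, {"otu_a", "otu_c"}]
    for k, n_core, frac in core_curve(timepoints):
        print(f"in >= {k} samples: {n_core} OTUs ({frac:.0%} of total)")
```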

                Community membership calculations

                The number of consecutive timepoints containing an OTU (Figure 3) was calculated as the maximum number of consecutive timepoints where an OTU was observed, allowing a zero count at a single timepoint to be considered part of a continuous stretch of non-zero counts if both adjacent timepoints had a non-zero count. This controls for sampling error as, for example, a long contiguous stretch of non-zero counts for an OTU interrupted by a single zero count for that OTU would likely indicate a bad sample, rather than a biologically relevant fact about that OTU in relation to the community. Persistent taxa were defined as those observed in at least 60% of the timepoints, but with at least 90% of those observations being consecutive (that is, they appear and remain present). Transient taxa were defined as those observed in at least 60% of the samples, but with at most 75% of those observations being consecutive (that is, they appear and disappear from the community frequently).
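
The run-length rule above can be written as a short function. This is a sketch under one stated assumption: a bridged zero is counted as part of the run's length, which is one reading of "considered part of a continuous stretch". The function name is hypothetical.

```python
def longest_run(counts):
    """Longest stretch of consecutive timepoints in which an OTU is present.

    counts is the OTU's read count at each timepoint, in time order.
    A single zero is bridged only when both neighbouring timepoints are
    non-zero in the original data, so two zeros in a row still end a run.
    """
    present = [c > 0 for c in counts]
    bridged = list(present)
    for i in range(1, len(present) - 1):
        if not present[i] and present[i - 1] and present[i + 1]:
            bridged[i] = True
    best = run = 0
    for is_present in bridged:
        run = run + 1 if is_present else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    print(longest_run([3, 2, 0, 5, 0, 0, 1]))  # isolated zero at index 2 is bridged
```

Checking the flanking timepoints against the original presence vector, rather than the bridged one, keeps a gap of two or more zeros from being filled one position at a time.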

                Animated microbial community dynamics

                Animations were created in inVUE [22] based on the principal coordinate data presented in Figure 1a, b. inVUE files can be created in QIIME from the principal coordinate matrix and associated metadata file. After installing and opening inVUE, the user can run, pause, and stop the animations associated with different metadata categories.

                Data availability

                All sequence data and sample metadata are publicly available under the 'Moving Pictures of the Human Microbiome' project [MG-RAST:4457768.3-4459735.3].
