Validation of 515F and 806R primers for M. capitata V4 microbiome analyses
The workflow below provides step-by-step instructions for how the Bhattacharya Lab validated the 515F and 8069 (targeted the V4 region of 16S rDNA in bacteria) for microbiome analyses in Montipora capitata. These primers, while working for most other taxa, have proven troublesome in M. capitata, amplifying what seems to mostly be coral DNA. For the below samples, we performed a DNA extraction protocol that enriches for bacterial DNA, in hopes that this would alleviate the issues observed previously. For the validation, 3 Montipora capitata samples, 2 sediment samples, and 2 Galaxea fascicularis samples were tested. We find that while the primers do amplify the mitochondrial genome in M. capitata, enough 16S DNA was sequenced to perform the QIIME2 workflow for microbiome analysis. Qimme2 workflow was adapted from Dr. Emma Strand’s Open Lab Notebook Post, “Holobiont Integration 16S V4 QIIME2 Analysis Pipeline”.
Sample information
To fill in
Contamination analysis
To fill in
QIIME2 Workflow
Overview
- Installation of QIIME2
- Import data as a QIIME2 artifact
- Denoise and declutter
- Taxonomic classification
- Alpha and beta diversity analyses
1. Installation of QIIME2 and Silva database
We will install QIIME2 through Miniconda, as recommended. To do so, we must download the YAML file containing the list of conda libraries associated with QIIME2. In order to install QIIME2, you must have Miniconda or Anaconda installed first. For more information on conda, see https://docs.conda.io/.
wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml #download the YAML file
conda env create -n qiime2-2023.2 --file qiime2-2023.2-py38-osx-conda.yml #Install the libraries as a conda environment called "qiime2-2023.2"
rm qiime2-2023.2-py38-osx-conda.yml #delete the YAML file
Now, anytime we want to use QIIME2, we have to first activate the QIIME2 conda environment. This only has to be done anytime you start a new session in terminal. Before you close the session, deactivate the conda environment
conda activate qiime2-2023.2 #to activate
conda deactivate #to deactivate
Download Silva database (taxonomy identification trainer.
wget https://data.qiime2.org/2023.5/common/silva-138-99-515-806-nb-classifier.qza #added to the qimme2 sub-directory
2. Import data as a QIIME2 artifact
Sample data
We are working with demultiplexed paired-end data with quality information (e.g. FastQ files), so we will import our data in the Casava 1.8 format. To import our sample information, we need two files:
1) A sample manifest (csv format) providing the sample ID, filepath, and sequencing direction for each file
sample-id | absolute-filepath | direction |
---|---|---|
Microbio1 | /path_to/Microbio1_S103_R1_001.fastq.gz | forward |
Microbio1 | /path_to/Microbio1_S103_R2_001.fastq.gz | reverse |
Microbio3 | /path_to/Microbio3_S105_R1_001.fastq.gz | forward |
Microbio3 | /path_to/Microbio3_S105_R2_001.fastq.gz | reverse |
Microbio4 | /path_to/Microbio4_S106_R1_001.fastq.gz | forward |
Microbio4 | /path_to/Microbio4_S106_R2_001.fastq.gz | reverse |
Microbio5 | /path_to/Microbio5_S107_R1_001.fastq.gz | forward |
Microbio5 | /path_to/Microbio5_S107_R2_001.fastq.gz | reverse |
Microbio6 | /path_to/Microbio6_S108_R1_001.fastq.gz | forward |
Microbio6 | /path_to/Microbio6_S108_R2_001.fastq.gz | reverse |
Microbio8 | /path_to/Microbio8_S109_R1_001.fastq.gz | forward |
Microbio8 | /path_to/Microbio8_S109_R2_001.fastq.gz | reverse |
Microbio9 | /path_to/Microbio9_S110_R1_001.fastq.gz | forward |
Microbio9 | /path_to/Microbio9_S110_R2_001.fastq.gz | reverse |
2) A metadata file (tsv format) providing the sample ID and any relevant sample metadata, like species, extraction date, etc…
sample-id | extraction-id | source | collection-date | site-id | site-name | colony-id | microenvironment | rep-number | |
---|---|---|---|---|---|---|---|---|---|
#q2:types | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical | categorical |
Microbio1 | S5 | Reef_sediment | 20220511 | S2 | KBay_Reef11_West | C3 | Soil | R1 | |
Microbio3 | S11 | Reef_sediment | 20220511 | S1 | KBay_Reef12_West | C3 | Soil | R3 | |
Microbio4 | C61 | Montipora_capitata_tissue | 20220511 | S1 | KBay_Reef12_West | C1 | Middle | R2 | |
Microbio5 | C76 | Montipora_capitata_tissue | 20220511 | S2 | KBay_Reef11_West | C1 | Top | R2 | |
Microbio6 | C33 | Montipora_capitata_tissue | 20220511 | S2 | KBay_Reef11_West | C3 | Bottom | R2 | |
Microbio8 | G94 | Galaxea_fascicularis_tissue | 20221006 | T3 | Dbtank | G94 | Top | R1 | |
Microbio9 | G95 | Galaxea_fascicularis_tissue | 20221006 | T3 | DBtank | G95 | Top | R1 |
Code to import the sample manifest and metadata
MANIFEST="metadata/sample_manifest.csv"
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path $MANIFEST \
--input-format PairedEndFastqManifestPhred33 \
--output-path qimme2/sequences.qza
3. Denoise and declutter
First, we will remove adapter contamination. We know that our adapters content is about 0.1-0.3% because of the QC we did on the raw reads when assessing the amount of Mcap contamination. So, before we do anything else, we need to remove adapter contaminatio using the following code:
qiime cutadapt trim-paired --verbose \
--i-demultiplexed-sequences qimme2/sequences.qza \
--p-anywhere-f CTGTCTCTTATACACATCT \
--p-anywhere-r AGATGTGTATAAGAGACAG \
--o-trimmed-sequences qimme2/trimmed_sequences.qza
Next, we will denoise our data. In this part, we will trim off our primers. Our primers are 52 and 54 bp long, so we will truncate that part of the sequence. Because our sequences are 150 bp long and they don’t ever fall into a lower-quality threshold, we will use 150 bp as our other truncating length. What denoiseing does is dereplicate our sequences to reduce repetition and file size/memory requirements in downstream steps.
qiime dada2 denoise-paired --verbose \
--i-demultiplexed-seqs qimme2/trimmed_sequences.qza \
--p-trunc-len-r 150 --p-trunc-len-f 150 \
--p-trim-left-r 54 --p-trim-left-f 52 \
--o-table qimme2/table.qza \
--o-representative-sequences qimme2/rep-seqs.qza \
--o-denoising-stats qimme2/denoising-stats.qza \
--p-n-threads 20
Left off on the denoiseing step (code above). Last I checked 2AM 6/13/2023, it was still running on screen “qiime2”.
The last step of decluttering our data is clustering our sequences into ASVs, or a single representative sequence for sequences with 97% similarity to each other. These ASVs will be stored as a FeatureTable along with the total count of their abundances in each sample.
qiime metadata tabulate \
--m-input-file qimme2/denoising-stats.qza \
--o-visualization qimme2/denoising-stats.qzv
qiime feature-table summarize \
--i-table qimme2/table.qza \
--o-visualization qimme2/table.qzv \
--m-sample-metadata-file $METADATA
qiime feature-table tabulate-seqs \
--i-data qimme2/rep-seqs.qza \
--o-visualization qimme2/rep-seqs.qzv
Output files denoising-stats.qzv and table.qzv can be viewed in QIIME2 view.
Taxonomy classification based on Silva 515F-806R 16S database.
qiime feature-classifier classify-sklearn \
--i-classifier qimme2/silva-138-99-515-806-nb-classifier.qza \
--i-reads qimme2/rep-seqs.qza \
--o-classification qimme2/taxonomy.qza
TO BE CONTINUED…