Validation of 515F and 806R primers for M. capitata V4 microbiome analyses

The workflow below provides step-by-step instructions for how the Bhattacharya Lab validated the 515F and 806R primers (which target the V4 region of 16S rDNA in bacteria) for microbiome analyses in Montipora capitata. These primers work for most other taxa but have proven troublesome in M. capitata, where they appear to amplify mostly coral DNA. For the samples below, we used a DNA extraction protocol that enriches for bacterial DNA, in the hope that this would alleviate the issues observed previously. For the validation, we tested 3 Montipora capitata samples, 2 sediment samples, and 2 Galaxea fascicularis samples. We find that, while the primers do amplify the mitochondrial genome in M. capitata, enough 16S DNA was sequenced to perform the QIIME2 workflow for microbiome analysis. The QIIME2 workflow was adapted from Dr. Emma Strand’s Open Lab Notebook post, “Holobiont Integration 16S V4 QIIME2 Analysis Pipeline”.

Sample information

To fill in

Contamination analysis

To fill in

QIIME2 Workflow

Overview

Installation of QIIME2
Import data as a QIIME2 artifact
Denoise and declutter
Taxonomic classification
Alpha and beta diversity analyses

1. Installation of QIIME2 and Silva database

We will install QIIME2 through Miniconda, as recommended. You must have Miniconda or Anaconda installed first; for more information on conda, see https://docs.conda.io/. To install QIIME2, we download the YAML file containing the list of conda libraries associated with QIIME2 and create a conda environment from it. The file used below is the macOS (osx) build; on Linux, use the corresponding linux YAML file instead.

wget https://data.qiime2.org/distro/core/qiime2-2023.2-py38-osx-conda.yml # download the YAML file
conda env create -n qiime2-2023.2 --file qiime2-2023.2-py38-osx-conda.yml # install the libraries as a conda environment called "qiime2-2023.2"
rm qiime2-2023.2-py38-osx-conda.yml # delete the YAML file

Now, any time we want to use QIIME2, we first have to activate the QIIME2 conda environment. This only has to be done once each time you start a new terminal session. Before you close the session, deactivate the conda environment.

conda activate qiime2-2023.2  #to activate
conda deactivate #to deactivate  

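Download the Silva database (a pre-trained classifier used for taxonomy identification). Note that the URL below points to the 2023.5 release while the environment installed above is 2023.2; pre-trained classifiers depend on the scikit-learn version they were built with, so it may be safer to use the classifier from the matching release.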

wget https://data.qiime2.org/2023.5/common/silva-138-99-515-806-nb-classifier.qza #added to the qimme2 sub-directory  

2. Import data as a QIIME2 artifact

Sample data

We are working with demultiplexed paired-end data with quality information (i.e. FASTQ files). Because the file paths are listed in a manifest, we will import the data using the PairedEndFastqManifestPhred33 format. To import our samples and their metadata, we need two files:
1) A sample manifest (csv format) providing the sample ID, absolute filepath, and read direction for each file

sample-id	absolute-filepath	direction
Microbio1	/path_to/Microbio1_S103_R1_001.fastq.gz	forward
Microbio1	/path_to/Microbio1_S103_R2_001.fastq.gz	reverse
Microbio3	/path_to/Microbio3_S105_R1_001.fastq.gz	forward
Microbio3	/path_to/Microbio3_S105_R2_001.fastq.gz	reverse
Microbio4	/path_to/Microbio4_S106_R1_001.fastq.gz	forward
Microbio4	/path_to/Microbio4_S106_R2_001.fastq.gz	reverse
Microbio5	/path_to/Microbio5_S107_R1_001.fastq.gz	forward
Microbio5	/path_to/Microbio5_S107_R2_001.fastq.gz	reverse
Microbio6	/path_to/Microbio6_S108_R1_001.fastq.gz	forward
Microbio6	/path_to/Microbio6_S108_R2_001.fastq.gz	reverse
Microbio8	/path_to/Microbio8_S109_R1_001.fastq.gz	forward
Microbio8	/path_to/Microbio8_S109_R2_001.fastq.gz	reverse
Microbio9	/path_to/Microbio9_S110_R1_001.fastq.gz	forward
Microbio9	/path_to/Microbio9_S110_R2_001.fastq.gz	reverse

2) A metadata file (tsv format) providing the sample ID and any relevant sample metadata, such as species, extraction date, etc.

	sample-id	extraction-id	source	collection-date	site-id	site-name	colony-id	microenvironment	rep-number
#q2:types	categorical	categorical	categorical	categorical	categorical	categorical	categorical	categorical	categorical
	Microbio1	S5	Reef_sediment	20220511	S2	KBay_Reef11_West	C3	Soil	R1
	Microbio3	S11	Reef_sediment	20220511	S1	KBay_Reef12_West	C3	Soil	R3
	Microbio4	C61	Montipora_capitata_tissue	20220511	S1	KBay_Reef12_West	C1	Middle	R2
	Microbio5	C76	Montipora_capitata_tissue	20220511	S2	KBay_Reef11_West	C1	Top	R2
	Microbio6	C33	Montipora_capitata_tissue	20220511	S2	KBay_Reef11_West	C3	Bottom	R2
	Microbio8	G94	Galaxea_fascicularis_tissue	20221006	T3	Dbtank	G94	Top	R1
	Microbio9	G95	Galaxea_fascicularis_tissue	20221006	T3	DBtank	G95	Top	R1
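Categorical metadata values are compared as literal strings, so two values that differ only in letter case (for instance `Dbtank` and `DBtank` in the site-name column above) would be treated as separate groups downstream. The following is a minimal sketch of a check for this, assuming a tab-separated metadata file; the function name `find_case_variants` is hypothetical and not part of QIIME2.

```shell
# find_case_variants FILE: for a tab-separated table, print
# "column<TAB>first-seen-value<TAB>variant" for values in the same column
# that differ only by letter case. (Hypothetical helper, not a QIIME2 command.)
find_case_variants() {
  awk -F'\t' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }   # remember column names
    /^#q2:types/ { next }                                      # skip the type-annotation row
    {
      for (i = 1; i <= NF; i++) {
        key = i SUBSEP tolower($i)
        if (!(key in seen)) {
          seen[key] = $i                                       # first spelling seen for this value
        } else if (seen[key] != $i && !((key, $i) in reported)) {
          reported[key, $i] = 1                                # report each variant only once
          print name[i] "\t" seen[key] "\t" $i
        }
      }
    }
  ' "$1"
}
```

Calling `find_case_variants` on the metadata file prints nothing when every value is spelled consistently.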

Code to import the sequences listed in the sample manifest

MANIFEST="metadata/sample_manifest.csv" # path to the sample manifest described above
qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path "$MANIFEST" \
    --input-format PairedEndFastqManifestPhred33 \
    --output-path qimme2/sequences.qza
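Writing the manifest by hand is error-prone when there are many samples. As a rough sketch, it could instead be generated from a directory of reads. This assumes files are named `<sample>_S<n>_R1_001.fastq.gz` and `<sample>_S<n>_R2_001.fastq.gz`, as in the manifest above, and that the sample ID is everything before the first underscore; `make_manifest` is a hypothetical helper name.

```shell
# make_manifest DIR: print a comma-separated manifest (sample-id,
# absolute-filepath, direction) for every R1/R2 pair found in DIR.
# (Hypothetical helper; assumes the file naming described in the lead-in.)
make_manifest() {
  dir=$(cd "$1" && pwd) || return 1        # the manifest needs absolute paths
  echo "sample-id,absolute-filepath,direction"
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue               # no matches: the glob stays literal
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "missing reverse read for $r1" >&2
      return 1
    fi
    sample=$(basename "$r1")
    sample=${sample%%_*}                   # Microbio1_S103_R1_... -> Microbio1
    echo "$sample,$r1,forward"
    echo "$sample,$r2,reverse"
  done
}
```

For example, `make_manifest /path_to > metadata/sample_manifest.csv` would write the manifest used in the import command above.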

3. Denoise and declutter

First, we will remove adapter contamination. From the QC we ran on the raw reads when assessing the amount of M. capitata contamination, we know that adapter content is about 0.1-0.3%. So, before we do anything else, we remove adapter contamination using the following code:

qiime cutadapt trim-paired --verbose \
--i-demultiplexed-sequences qimme2/sequences.qza \
--p-anywhere-f CTGTCTCTTATACACATCT \
--p-anywhere-r AGATGTGTATAAGAGACAG \
--o-trimmed-sequences qimme2/trimmed_sequences.qza
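The two adapter sequences passed to `--p-anywhere-f` and `--p-anywhere-r` above are reverse complements of each other. A quick way to confirm this, sketched here with standard tools (`rev` and `tr`, assumed to be available on your system), is:

```shell
# revcomp SEQ: print the reverse complement of a DNA sequence (A, C, G, T only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp CTGTCTCTTATACACATCT   # prints AGATGTGTATAAGAGACAG
```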

Next, we will denoise our data with DADA2. In this step, we also trim off our primers. Our primers are 52 and 54 bp long (forward and reverse, respectively), so we trim that many bases from the start of the forward and reverse reads. Because our reads are 150 bp long and their quality never drops below a lower-quality threshold, we use 150 bp as the truncation length, i.e. no bases are removed from the 3' end. Denoising corrects sequencing errors, merges the paired reads, removes chimeras, and dereplicates the sequences, which reduces repetition and file size/memory requirements in downstream steps.

qiime dada2 denoise-paired --verbose \
  --i-demultiplexed-seqs qimme2/trimmed_sequences.qza \
  --p-trunc-len-r 150 --p-trunc-len-f 150 \
  --p-trim-left-r 54 --p-trim-left-f 52 \
  --o-table  qimme2/table.qza \
  --o-representative-sequences qimme2/rep-seqs.qza \
  --o-denoising-stats qimme2/denoising-stats.qza \
  --p-n-threads 20
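Since DADA2 merges the forward and reverse reads, it can be useful to work out how much of each read is left after trimming and truncation. The arithmetic below is a small sketch using the values from the command above; the minimum overlap of 12 bases is an assumption, so check `qiime dada2 denoise-paired --help` for the value used by your version. The function names are hypothetical.

```shell
# kept_length TRUNC_LEN TRIM_LEFT: length of a read after it is truncated to
# TRUNC_LEN and TRIM_LEFT bases are removed from its start.
kept_length() {
  echo $(( $1 - $2 ))
}

# max_merged_length FWD_KEPT REV_KEPT MIN_OVERLAP: longest amplicon the two
# reads can span while still overlapping by at least MIN_OVERLAP bases.
max_merged_length() {
  echo $(( $1 + $2 - $3 ))
}

fwd_kept=$(kept_length 150 52)   # 98 bases of each forward read remain
rev_kept=$(kept_length 150 54)   # 96 bases of each reverse read remain
# 12 is an assumed minimum overlap, not a value taken from this workflow.
max_merged_length "$fwd_kept" "$rev_kept" 12   # prints 182
```

An amplicon longer than the printed value would not be expected to merge under these settings, so it is worth comparing this number against the expected length of the 515F/806R product in your library and against the merged counts in the denoising statistics.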

Left off on the denoising step (code above). When last checked at 2 AM on 6/13/2023, it was still running in the screen session “qiime2”.

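The output of denoising is a set of amplicon sequence variants (ASVs). Unlike OTUs, which cluster sequences at a similarity threshold such as 97%, each ASV is an exact sequence that can differ from another ASV by as little as one nucleotide. The ASVs are stored in a FeatureTable along with their counts in each sample. The last step of decluttering our data is summarizing these outputs as visualizations: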

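METADATA="metadata/metadata.tsv" # assumed path to the metadata file from step 2; adjust to your file name
# Per-sample read counts at each denoising stage
qiime metadata tabulate \
  --m-input-file qimme2/denoising-stats.qza \
  --o-visualization qimme2/denoising-stats.qzv
# ASV counts per sample, annotated with the sample metadata
qiime feature-table summarize \
  --i-table qimme2/table.qza \
  --o-visualization qimme2/table.qzv \
  --m-sample-metadata-file "$METADATA"
# The representative sequence of each ASV
qiime feature-table tabulate-seqs \
  --i-data qimme2/rep-seqs.qza \
  --o-visualization qimme2/rep-seqs.qzv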

The output files denoising-stats.qzv, table.qzv, and rep-seqs.qzv can be viewed with QIIME2 View (https://view.qiime2.org).
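The denoising statistics can also be summarized on the command line. After exporting the artifact to a tab-separated table (for example with `qiime tools export --input-path qimme2/denoising-stats.qza --output-path <dir>`), a sketch like the one below reports what percentage of reads survive. The column names `input` and `non-chimeric` in the usage example are assumptions about the exported header, so check them against your file; `pct_retained` is a hypothetical helper.

```shell
# pct_retained FILE NUM_COL DEN_COL: for each row of a tab-separated table,
# print the first column and 100 * NUM_COL / DEN_COL, finding both columns
# by their header names. (Hypothetical helper, not a QIIME2 command.)
pct_retained() {
  awk -F'\t' -v num="$2" -v den="$3" '
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == num) n = i; if ($i == den) d = i }
      if (!n || !d) { print "column not found" > "/dev/stderr"; exit 1 }
      next
    }
    /^#q2:types/ { next }                     # skip the type-annotation row
    $d > 0 { printf "%s\t%.1f\n", $1, 100 * $n / $d }
  ' "$1"
}
```

For example, `pct_retained <dir>/stats.tsv non-chimeric input` would print one line per sample with the percentage of input reads remaining after chimera removal, assuming the export is named and laid out that way.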

4. Taxonomic classification

We classify the representative sequences using the pre-trained Silva 515F-806R 16S classifier downloaded in step 1.

qiime feature-classifier classify-sklearn \
  --i-classifier qimme2/silva-138-99-515-806-nb-classifier.qza \
  --i-reads qimme2/rep-seqs.qza \
  --o-classification qimme2/taxonomy.qza
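Because these primers are known to amplify the M. capitata mitochondrial genome, one early check on the classification is how many ASVs were assigned to mitochondria. The sketch below assumes the taxonomy has been exported to a tab-separated table (for example with `qiime tools export --input-path qimme2/taxonomy.qza --output-path <dir>`) whose second column holds the taxon string, and that mitochondrial assignments contain the word "Mitochondria"; both are assumptions to verify against your own output. `count_matching_taxa` is a hypothetical helper.

```shell
# count_matching_taxa FILE PATTERN: count data rows of a tab-separated
# taxonomy table whose second column contains PATTERN (case-insensitive).
# (Hypothetical helper, not a QIIME2 command.)
count_matching_taxa() {
  awk -F'\t' -v pat="$2" '
    NR > 1 { total++ }                                    # skip the header row
    NR > 1 && index(tolower($2), tolower(pat)) { hits++ }
    END { printf "%d of %d features match %s\n", hits, total, pat }
  ' "$1"
}
```

Note that this counts features (ASVs), not reads; the read-level share would require combining the taxonomy with the feature table.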

TO BE CONTINUED…

Written on May 24, 2023