Ulithi23 Symbiont Composition
Symbiont composition analysis
November 4, 2025
I started reviewing the methodology from my dissertation proposal today, and I plan to start this analysis soon. I can’t decide whether to do the coral popgen analysis or this one first. I probably can’t run both at the same time because of high server traffic.
In my proposal, I wrote that I wanted to use the workflow outlined in "Contaminant or goldmine? In silico assessment of Symbiodiniaceae community using coral hologenomes", which extracts non-coral reads from BAM files (tentatively here: /storage/timothy/Storage/projects/0050_Ulithi_Trip_2023/03_Analysis/2023-07-17/02_Genotype_Samples/results/mapping_merged/Montipora_capitata_KBHIv3/) and uses k-mer counting and frequency computation to compute distances between samples. The scripts to run this analysis are available on GitHub at hisatakeishida/Symb-SHIN: Computational workflow to identify Symbiodiniaceae diversity by Scrapping Hologenome data with IN-sillico methods.

Next steps: Confirm that the BAM files I found actually contain symbiont (non-coral) reads. If they don’t, figure out where the unmapped reads are stored or rerun that part of the analysis. Then I can test whether I can run this analysis at the same time as, or before, the PopGen analysis.
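A quick way to make the BAM check concrete: non-coral reads would show up as records with the "unmapped" bit (0x4) set in the SAM FLAG field. Below is a minimal sketch, assuming the BAM has been streamed to SAM text (for example with `samtools view`). The function name `count_unmapped` and the demo records are hypothetical and only there for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "this read is unmapped"

def count_unmapped(sam_lines):
    """Return (total_records, unmapped_records) for an iterable of SAM text lines."""
    total = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        total += 1
        if flag & UNMAPPED:
            unmapped += 1
    return total, unmapped

# Made-up records: one mapped read (flag 99) and two unmapped reads (flags 77, 141).
demo = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t141\t*\t0\t0\t*\t*\t0\t0\tTTGTACGTAC\tIIIIIIIIII",
]
total, unmapped = count_unmapped(demo)
print(f"{unmapped} of {total} records are unmapped")
```

If the unmapped count for a real file is zero (or close to it), the BAM was probably filtered to mapped reads and the analysis would need to start from the FASTQ files. On the server, `samtools view -c -f 4 file.bam` should give the same unmapped count directly without any Python.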
November 10, 2025
Last week, I developed my plan for analyzing the symbiont composition of the Ulithi Montipora samples. Essentially, my plan is to use the workflows set up by the Cooke lab, which are extremely well documented in the repository accompanying the paper by Zhang et al. 2022. Their repository includes both a markdown file explaining the workflow and well-organized scripts for running the analyses.
First, I want to use Kraken to do a genus-level analysis of the data. This is probably what would be published in the data paper, and I’d use more recent/complete genomes than the ones they have listed there. Then, I want to do an ITS2 profile like the one they describe to get finer detail about the symbiont identities. Lastly, I would run the Jellyfish k-mer counting and D2S statistics to get clusters to use as metadata for my drivers of proteome-wide variation analysis. It would also be cool to run the Jellyfish and D2S statistics on the reads that map to the microbiome (combined reads from the bacterial, archaeal, and fungal RefSeq databases).
Originally, I was only going to run the Jellyfish and D2S statistics, but I figured I should also do the other two analyses for due diligence and because reviewers are going to ask.
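To keep track of what the k-mer counting and D2S step computes, here is a toy sketch in plain Python. It is not the code from the Cooke lab repository or from Symb-SHIN; it assumes the commonly cited D2S form, in which each k-mer count is centered by its expected count under an i.i.d. background model estimated from the sample's own base frequencies, and it uses a small k on made-up reads. All function names are hypothetical. In the real analysis, Jellyfish would do the counting on the full read sets.

```python
from collections import Counter
from itertools import product
from math import sqrt

def kmer_counts(reads, k):
    """Count k-mers across reads, skipping any k-mer with non-ACGT characters."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def base_freqs(reads):
    """Estimate A/C/G/T frequencies (assumed i.i.d. background model)."""
    c = Counter()
    for read in reads:
        c.update(b for b in read.upper() if b in "ACGT")
    total = sum(c.values())
    return {b: c[b] / total for b in "ACGT"}

def centered_counts(reads, k):
    """Observed count minus expected count (n * p_w) for every possible k-mer."""
    counts = kmer_counts(reads, k)
    freqs = base_freqs(reads)
    n = sum(counts.values())
    centered = {}
    for kmer in map("".join, product("ACGT", repeat=k)):
        p = 1.0
        for b in kmer:
            p *= freqs[b]
        centered[kmer] = counts.get(kmer, 0) - n * p
    return centered

def d2s_distance(reads_x, reads_y, k=3):
    """D2S-based dissimilarity in [0, 1]; 0 means identical centered profiles."""
    x = centered_counts(reads_x, k)
    y = centered_counts(reads_y, k)
    num = sx = sy = 0.0
    for w in x:
        denom = sqrt(x[w] ** 2 + y[w] ** 2)
        if denom == 0:
            continue  # k-mer carries no signal in either sample
        num += x[w] * y[w] / denom
        sx += x[w] ** 2 / denom
        sy += y[w] ** 2 / denom
    if sx == 0 or sy == 0:
        raise ValueError("no informative k-mers in at least one sample")
    return 0.5 * (1 - num / (sqrt(sx) * sqrt(sy)))

# Made-up reads standing in for two samples' symbiont read sets.
sample_a = ["ACGTACGTAC", "GGGTTTACGA"]
sample_b = ["TTTTTTAAAA", "CCCCGGGGAT"]
print(d2s_distance(sample_a, sample_b))
```

Computing this for every pair of samples gives a distance matrix, which is what the clustering for the metadata groups would be run on.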
Also, an update on the .bam files: they probably only contain Montipora reads, but Tim could not give definitive confirmation. That’s OK, though, because I think we can start from the FASTQ files for most of this.
