Project name: USDA-NRSP-8-gigas-rDNA
Funding source: USDA-NRSP-8
Github repo: https://github.com/mattgeorgephd/USDA-NRSP-8-gigas-rDNA
Species: Crassostrea gigas
Variable: ploidy
DNA extraction and prep
Four oyster families (F05, F12, F13, & F14) were generated by Pacific Hybreed. Triploid oysters were made within each family by crossing a tetraploid with a diploid, generating half-sibling diploids and triploids within each family. Oyster spat were transferred to the Jamestown Pt. Whitney Shellfish Hatchery on 07-28-2022. Oysters were placed in heath trays and fed a mixed algae diet ad libitum. Whole-body oyster tissue samples were collected on 11-14-2022; 15 samples were collected for each ploidy within each family. Samples were stored in the FTR -80 at positions 5-1-3 & 5-1-4.
DNA was extracted from diploid and triploid juvenile Pacific oysters from two families (F05 and F14). We used the Omega Bio-Tek E.Z.N.A.® Mollusc DNA Kit; here is our protocol. We found that at least 100 mg of tissue was needed to get enough DNA to submit to Azenta for sequencing (assuming an elution volume of 50 µl: at least 20 ng/µl and at least 1,500 ng total).
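Both thresholds matter, since 20 ng/µl × 50 µl is only 1,000 ng. A quick sanity check of a sample against the submission requirements can be sketched as follows (the concentration value is a made-up example, not a real sample reading):

```shell
# Check a Qubit reading against the Azenta submission requirements
# (conc is a hypothetical example value, not a real sample)
conc=35          # measured DNA concentration, ng/ul
elution_vol=50   # elution volume, ul
total=$(( conc * elution_vol ))
if [ "$conc" -ge 20 ] && [ "$total" -ge 1500 ]; then
    echo "PASS: ${total} ng total at ${conc} ng/ul"
else
    echo "FAIL: ${total} ng total at ${conc} ng/ul"
fi
```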
Grace Leuchtenberger (BIO Grad Student) and Henry Berg (SAFS undergraduate) helped with extractions. Here is the DNA extraction datasheet with sample manifest.
DNA Sequencing
We submitted 32 samples from two families to GeneWiz (Azenta) for sequencing on February 28. Here is the breakdown:
| Family | Ploidy | sample_ID |
|---|---|---|
| F05 | diploid | F052n01 |
| F05 | diploid | F052n02 |
| F05 | diploid | F052n03 |
| F05 | diploid | F052n04 |
| F05 | diploid | F052n05 |
| F05 | diploid | F052n06 |
| F05 | diploid | F052n07 |
| F05 | diploid | F052n08 |
| F05 | triploid | F053n01 |
| F05 | triploid | F053n02 |
| F05 | triploid | F053n03 |
| F05 | triploid | F053n04 |
| F05 | triploid | F053n05 |
| F05 | triploid | F053n06 |
Here is the link to the Azenta Quote. We sequenced at 30x coverage. It was assigned project number 30-835022638.
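As a rough back-of-envelope, 30x coverage implies something like the following data volume per sample, assuming a C. gigas genome size of roughly 600 Mb (an approximation; published assemblies vary around this figure):

```shell
# Rough estimate of sequence volume per sample at 30x coverage
# (genome size is an approximation, not a project-specific value)
genome_mb=600   # approximate C. gigas genome size, Mb
coverage=30
total_gb=$(( genome_mb * coverage / 1000 ))
echo "~${total_gb} Gb of sequence per sample"
```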
We received the data back on 04-29-2023. The sequencing data were provided as zipped fastq files on the GeneWiz SFTP server. Here are the server details:
Host: sftp://sftp.genewiz.com
User: mngeorge_uw
Password: pxLSUtDDLhprLvLkweVf
Port: 22
The GeneWiz download guide is provided here. I downloaded the sequencing files directly to our lab sequencing repository “nightingales” on Owl. Here is the link to the C_gigas folder. Here is the code used to transfer:
# Login to owl
ssh mngeorge@owl.fish.washington.edu
# Set local working directory where files will be transferred to
cd /var/services/web/nightingales/C_gigas
# Login to sFTP server
sftp mngeorge_uw@sftp.genewiz.com
# change to the GeneWiz project folder containing the fastq files
cd 30-835022638/00_fastq
# confirm local dir
lpwd
# confirm remote working dir
pwd
# Transfer files from sFTP to Owl
mget *
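After a transfer like this it is worth confirming the files arrived intact. A minimal sketch using md5sum (the stand-in fastq file and checksum filename below are hypothetical, for illustration only; in practice the checksums would be recorded once and re-checked on each copy of the data):

```shell
# Record md5 checksums of downloaded fastq files so later copies
# (e.g., on Raven) can be verified against the originals on Owl.
# The fastq file here is a tiny stand-in created for the demo.
mkdir -p md5-demo && cd md5-demo
printf '@read1\nACGTACGT\n+\nIIIIIIII\n' > demo_R1_001.fastq
md5sum *.fastq > checksums.md5
md5sum -c checksums.md5   # reports OK for each file that matches
```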
Sequence QC, trimming
I downloaded the sequence files onto Raven into “data/raw” using the same SFTP method described above. The next step was to unzip the fastq files:
# unzip .fastq.gz files
cd data/raw/
gunzip *.fastq.gz
and then ran fastqc and multiqc on the raw data.
Run fastqc on untrimmed files
mkdir -p fastqc/untrimmed/
/home/shared/FastQC/fastqc \
data/raw/*.fastq \
--outdir fastqc/untrimmed/ \
--quiet
Run multiqc on untrimmed files
eval "$(/opt/anaconda/anaconda3/bin/conda shell.bash hook)"
conda activate
cd fastqc/untrimmed/
multiqc .
The multiqc report for the raw data is here
I then hard-trimmed the first 10 bp of each read to remove adapter sequence:
# trim adapter sequences
mkdir data/trimmed/
cd data/raw/
for F in *.fastq
do
# strip the directory structure from each file name, then insert the
# suffix _trim before the .fastq extension to create the output name
results_file="$(basename -a "$F" | sed 's/\.[^.]*$/_trim&/')"
# run cutadapt on each file, hard-trimming the first 10 bp
/home/shared/8TB_HDD_02/mattgeorgephd/.local/bin/cutadapt "$F" -u 10 -o \
/home/shared/8TB_HDD_02/mattgeorgephd/USDA-NRSP-8-gigas-rDNA/data/trimmed/$results_file
done
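Since cutadapt -u only removes bases (not whole reads), the read count per file should be unchanged by trimming. A quick count helper for spot-checking, sketched with a tiny stand-in file:

```shell
# Count reads in a fastq file (4 lines per record)
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}
# demo with a two-read stand-in file (not a real sample)
printf '@r1\nACGT\n+\nIIII\n@r2\nTGCA\n+\nIIII\n' > demo.fastq
n=$(count_reads demo.fastq)
echo "$n reads"
```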
and then concatenated the trimmed R1 and R2 fastq files by sequencing run:
mkdir data/trim-merge/
# Set the input and output directories
input_dir="/home/shared/8TB_HDD_02/mattgeorgephd/USDA-NRSP-8-gigas-rDNA/data/trimmed"
output_dir="/home/shared/8TB_HDD_02/mattgeorgephd/USDA-NRSP-8-gigas-rDNA/data/trim-merge/"
# Loop through all of the R1 fastq files in the input directory
for r1_file in "$input_dir"/*_R1_*.fastq
do
# Extract the sequencing run from the R1 file name
run=$(basename "$r1_file" | cut -d'_' -f1,2)
# Find the corresponding R2 file by swapping _R1_ for _R2_ in the name
r2_file="${r1_file/_R1_/_R2_}"
# Concatenate the R1 and R2 files and save the output to a new file in the output directory
cat "$r1_file" "$r2_file" > "$output_dir"/"${run}_trim-merge.fastq"
done
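Each merged file should hold the sum of its R1 and R2 reads; a sanity check along these lines can be sketched with tiny stand-in files (the real files live in data/trimmed and data/trim-merge):

```shell
# Build stand-in R1/R2 files, merge them, and compare read counts
# (file names and contents are hypothetical demo values)
printf '@a\nAAAA\n+\nIIII\n' > demo_L001_R1_trim.fastq
printf '@b\nCCCC\n+\nIIII\n' > demo_L001_R2_trim.fastq
cat demo_L001_R1_trim.fastq demo_L001_R2_trim.fastq > demo_L001_trim-merge.fastq
r1=$(( $(wc -l < demo_L001_R1_trim.fastq) / 4 ))
r2=$(( $(wc -l < demo_L001_R2_trim.fastq) / 4 ))
merged=$(( $(wc -l < demo_L001_trim-merge.fastq) / 4 ))
[ "$merged" -eq $(( r1 + r2 )) ] && echo "merge OK"
```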
and again run fastqc and multiqc
Run fastqc on trimmed & merged files
mkdir -p fastqc/trim-merge/
/home/shared/FastQC/fastqc \
data/trim-merge/*.fastq \
--outdir fastqc/trim-merge/ \
--quiet
Run multiqc on trimmed & merged files
eval "$(/opt/anaconda/anaconda3/bin/conda shell.bash hook)"
conda activate
cd fastqc/trim-merge/
multiqc .
The final multiqc report for the trimmed and merged files can be found here.
The files are on Raven in:
/home/shared/8TB_HDD_02/mattgeorgephd/USDA-NRSP-8-gigas-rDNA/data/trim-merge