Project name: USDA-NRSP-8-gigas-rDNA

Funding source: USDA-NRSP-8

Species: *Crassostrea gigas*

Variable: ploidy

**Power analysis**

Goal: determine the sample size needed for whole-genome sequencing, given publicly available data on single-copy gene variation within *C. gigas* across locations.

Mac was able to pull publicly available data; her analysis is here. Here are her results.

From this analysis, we can see the variation by type:

type | mean | SD |
---|---|---|
mito | 31 | 9 |
ribo | 335 | 105 |

Broken down by country, the variation wasn't much better:

type | country | mean | SD |
---|---|---|---|
mito | China | 26.7 | 7.6 |
mito | Japan | 38.7 | 8.4 |
mito | South Africa | 32.4 | 6.8 |
ribo | China | 323.9 | 129.7 |
ribo | Japan | 361.1 | 100.7 |
ribo | South Africa | 331 | 53.9 |

I used the `pwr` package to determine the number of samples we need to sequence given the observed variation, where `sigma` is the variation (SD) and `delta` is the difference we want to detect:

```
library(pwr)
# Set parameters
alpha <- 0.05 # significance level
power <- 0.80 # desired power
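sigma <- 8.4 # standard deviation of control group (mito SD, Japan)
delta <- 20 # difference between trt and control group
n <- NULL # sample size to be determined (left NULL so pwr solves for it)
# Perform power analysis: two-sample t-test with effect size d = delta / sigma
pwr.t.test(d = delta/sigma, sig.level = alpha, power = power, n = n, type = "two.sample")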
```
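The effect size passed to `pwr.t.test` is Cohen's d, i.e. `delta / sigma`. As a rough cross-check that needs no package, the usual normal approximation for a two-sided, two-sample comparison is n per group ≈ 2 × (z₁₋α/₂ + z_power)² / d². The sketch below uses only base R; `approx_n` is a hypothetical helper name, not part of `pwr`. The approximation runs slightly below the t-based answer and is least reliable when n is very small.

```r
# Normal-approximation sample size per group for a two-sided, two-sample test.
# approx_n is a hypothetical helper for cross-checking, not the pwr method.
approx_n <- function(delta, sigma, alpha = 0.05, power = 0.80) {
  d <- delta / sigma                       # Cohen's d (standardized difference)
  z_alpha <- qnorm(1 - alpha / 2)          # critical value for two-sided alpha
  z_power <- qnorm(power)                  # quantile matching the desired power
  2 * (z_alpha + z_power)^2 / d^2
}

# Small difference relative to the SD, where the approximation works best
approx_n(delta = 5, sigma = 8.4)
```

The exact t-based calculation from `pwr.t.test` comes out a little higher than this, because it accounts for estimating the SD from the sample.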

Here are the results (best-case scenario) for mito copy number:

delta | SD | n required (each group) |
---|---|---|
20 | 6.8 | 3.13 |
10 | 6.8 | 8.3 |
5 | 6.8 | 30.02 |
20 | 8.4 | 4.0 |
10 | 8.4 | 12.11 |
5 | 8.4 | 45.3 |
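The grid above can be regenerated in one pass instead of editing `delta` and `sigma` by hand. This is a minimal sketch, assuming the same settings as the original call (two-sample t-test, alpha = 0.05, power = 0.80); the `grid` data frame and its column names are illustrative. Because `n` is fractional, it is rounded up to get the number of animals to sequence per group.

```r
library(pwr)

# Scenarios to evaluate: each delta crossed with each SD from the table
grid <- expand.grid(delta = c(20, 10, 5), sd = c(6.8, 8.4))

# Solve for n per group in each scenario; n is left unset so pwr solves for it
grid$n <- mapply(function(delta, sd) {
  pwr.t.test(d = delta / sd, sig.level = 0.05, power = 0.80,
             type = "two.sample")$n
}, grid$delta, grid$sd)

# Round up: a fraction of a sample cannot be sequenced
grid$n_per_group <- ceiling(grid$n)

print(grid)
```

Swapping the ribo SDs into `sd` (with deltas on the ribo scale) would give the corresponding grid for ribosomal copy number.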