# Lab 12 Exercises: Genetic Association Studies

## Exercise 1: Case-control association testing

### Analysis

**Data**: `LHON.txt` contains data from a case-control study, with both phenotype and genotype data for a candidate gene.

1. Perform the logistic regression analysis for this data, using `CC` as the reference genotype.
2. Obtain the odds ratios and confidence intervals for the `CT` and `TT` genotypes.
3. Redo the logistic regression analysis, but with `TT` as the reference genotype. How do the results change? Explain.

## Exercise 2: Association test with quantitative traits

**Data**: `bpdata.csv` contains diastolic and systolic blood pressures for 1000 individuals, and genotype data for 11 SNPs in a candidate gene for blood pressure. Covariates such as gender (`sex`) and body mass index (`bmi`) are included as well.

1. Perform a linear regression of systolic blood pressure (`sbp`) on `SNP3` using `lm()`. Compare the estimates, confidence intervals, and $p$-values you get using the
    * additive model
    * dominant model
    * recessive model
    * 2-parameter model
2. Provide plots illustrating the relationship between `sbp` and the three genotypes at `SNP3` (using scatterplots and boxplots).
3. Redo the linear regression analysis of `sbp` for the additive model, adjusting for `sex` and `bmi`. Do the results change?
4. What proportion of the heritability of `sbp` is explained by all 11 SNPs together?

## Exercise 3: GWAS for Transferrin and Height

In this exercise, you will use `PLINK` and `GWASTools` (R package) for analysis.

1. Using R, investigate the `Transferrin.bim` and `Transferrin.fam` files. How many individuals are included in this study? How many SNPs are available?
2. Also investigate the adjusted transferrin phenotype values in the file `Tr.pheno` with R. How many individuals in the study have transferrin measurements? What is the distribution of the transferrin phenotype (plot a histogram)?
3. Similarly investigate the adjusted height phenotypes in the file `Ht.pheno`.
4. Using `PLINK`, perform a GWAS of transferrin using the phenotype file `Tr.pheno` and the `Transferrin.bed`, `Transferrin.bim`, and `Transferrin.fam` files. For the association study, use the following quality control thresholds: minor allele frequency $> 0.05$; at least a 99% genotyping call rate (less than 1% missing); HWE $p$-value greater than $0.001$.
5. Perform a GWAS of height using the phenotype file `Ht.pheno` with `PLINK`. Use the same QC criteria given above.
6. Read your `PLINK` association results for transferrin and height into R. Make Manhattan plots and Q-Q plots of the association results (`GWASTools::manhattanPlot()` and `qqman::qq()`).
7. Obtain the top 10 most significant SNPs for transferrin. Are the SNPs in LD? Create a text file containing the SNP names of the top 10 SNPs, then use `PLINK` to obtain the $r^2$ measures of LD for the SNPs.
8. Using `PLINK`, extract the SNPs from the transferrin data set that are within the `TF` gene and recode them in additive fashion:
    * Run the marginal (individual SNP) association tests and then take the minimum $p$-value across the SNPs in the `TF` gene. Correct for taking the minimum by applying a Bonferroni correction.
    * Run a collapsing analysis by taking a weighted average of the SNPs within the `TF` gene. We can take a straight average (sum) of the values, or we can use the top principal components.
9. Repeat the above analysis for the random segment from chromosome 12.
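
As a starting point for the two gene-level strategies in item 8 of Exercise 3, here is a minimal R sketch on simulated data. Everything in it is an assumption made for illustration: the genotype matrix `geno`, the trait `pheno`, the SNP names, and the sample sizes are simulated stand-ins rather than the `TF` data, and linear regression with `lm()` is used as the marginal test.

```r
# Simulated stand-in for additive-coded SNPs in one gene (hypothetical data;
# in the exercise these values would come from the PLINK-recoded file).
set.seed(1)
n <- 200   # number of individuals (assumed)
m <- 5     # number of SNPs in the gene (assumed)
geno <- matrix(rbinom(n * m, size = 2, prob = 0.3), nrow = n, ncol = m)
colnames(geno) <- paste0("snp", seq_len(m))
pheno <- 0.3 * geno[, 1] + rnorm(n)   # simulated quantitative trait

# Marginal tests: one linear regression per SNP, keeping the slope p-value.
p_marginal <- apply(geno, 2, function(g) {
  summary(lm(pheno ~ g))$coefficients["g", "Pr(>|t|)"]
})

# Gene-level p-value from the minimum, Bonferroni-corrected for m tests.
p_min <- min(p_marginal)
p_gene_bonf <- min(1, m * p_min)

# Collapsing, option 1: unweighted sum of allele counts across the gene.
burden <- rowSums(geno)
p_burden <- summary(lm(pheno ~ burden))$coefficients["burden", "Pr(>|t|)"]

# Collapsing, option 2: first principal component of the genotype matrix.
pc1 <- prcomp(geno, center = TRUE, scale. = FALSE)$x[, 1]
p_pc1 <- summary(lm(pheno ~ pc1))$coefficients["pc1", "Pr(>|t|)"]

print(c(min_p = p_min, bonferroni = p_gene_bonf,
        burden = p_burden, pc1 = p_pc1))
```

The Bonferroni step multiplies the smallest marginal $p$-value by the number of SNPs tested and caps the result at 1, which is the same value as `min(p.adjust(p_marginal, method = "bonferroni"))`. The two collapsing options each reduce the gene to a single predictor, so no multiple-testing correction across SNPs is needed for them.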