BI390: Bioinformatics Workshop (Fall, 2020)
Instructor: Wu, Maoying
Where: Bioinformatics Computer Lab, 4C-302
When: 9:00-17:00, September 09 - September 22
Credit: 3.5
Reference: Github
Reference Guidebook
This course will provide hands-on training in
applying computational, statistical and machine learning approaches
for omics data.
In this practical course we will cover the following topics:
- High-performance computing for bioinformatics (MPICH and CUDA Programming),
- Biological sequence analysis (assembly, alignment, variant detection),
- Transcriptome analysis (microarray, RNA-seq),
- Molecular phylogenetics (maximum-likelihood, Bayesian approach),
- Statistical Genetics of Diseases/Phenotypes (Linkage/Association/Meta-analysis/eQTL),
- Microbiome studies (microbiota, metagenomics),
- Machine Learning Techniques (clustering, classification),
- Network biology, and
- Other state-of-the-art algorithms.
Due
Submit your final report through Canvas before 2020-10-31.
Please package your report as a single tar.gz archive. You can use the
template provided by Oxford Bioinformatics, a leading journal in the
field of bioinformatics.
Schedule
Section I: High-performance Computing (HPC) for Bioinformatics
In this section, you will learn how to build a PC cluster with
MPICH2, Torque, and environment modules. You will then run
parallel computations using MPICH or the CUDA framework.
- Lab 1a: Build up PC-based Parallel Cluster
[HPC Tutorial][Practicals][Reference][Torque]
- Lab 1b: MPI Programming
[MPI Tutorial][Practicals][mpiBLAST]
- Lab 1c: CUDA Programming (Optional)
[CUDA Tutorial][Practicals][gpuBLAST]
MPI Programming
GPU and CUDA Programming
Section II: Biological Sequence Analysis
In this section, you will survey two core tasks in NGS data analysis:
assembly and alignment. You can then apply these procedures to variant
calling and genetic association studies.
- Lab 2: Comparative Studies of NGS Short-read Aligners
[Tutorial]
[Practicals]
Readings:
short_read_aligners.bib
- Lab 3: Practical Short-read Assemblies
[Tutorial][Practicals]
[Data][answer sheet]
Readings:assemblers.bib
- Lab 4: Comprehensive Whole-Genome Resequencing
[Tutorial][Exercise][Materials]
Readings:
Picard Pipeline
Picard Tools
SAM Format
VCF Format
Section III: Transcriptomics Analysis
In this section, you will learn the computational and statistical
techniques used in the two mainstream transcriptome technologies: microarray
and RNA-seq.
- Lab 5A: Fundamental Microarray Data Analysis
[S4-class Tutorial for R]
[How to Build R Packages]
[Slides]
[Practicals]
Readings:microarray.bib
- Lab 5B: RNA-Seq Data Analysis
[Slides][Tutorials][Exercises]
Readings: rnaseq.bib
Section IV: Phylogenetic analysis
In this section, you will study phylogenetic reconstruction
techniques: distance-based, likelihood-based, and Bayesian methods for
phylogenetic studies.
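As a small illustration of the distance-based approach, the sketch below computes a Jukes-Cantor corrected distance between two aligned DNA sequences. It is written in Python and is not taken from the course materials; the function names `p_distance` and `jukes_cantor` are hypothetical, and the choice of the Jukes-Cantor model is an assumption made only because it is the simplest substitution model.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    mismatches = sum(1 for x, y in pairs if x != y)
    return mismatches / len(pairs)


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once the sequences are saturated.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy sequences, made up for illustration only.
d = jukes_cantor("ACGTACGTAC", "ACGTACGTAT")
```

A matrix of such pairwise distances is the typical input to distance-based tree-building methods such as neighbor joining.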
Section V: Statistical Genetics/Genomics
- Lab 12A: Genome-wide Association Studies and Meta-analysis
[Lectures][GWAS][Meta-analysis][Plink Tutorial][PLINK Exercise][Data]
Practicals:
- Read the following papers and tutorials, and write a review of the fixed-effect model and
the random-effects model.
- Conduct a meta-analysis of a series of GWAS publications of your interest using the R package
meta.
- The meta-analysis should include the choice between fixed-effect and random-effects models, a forest plot,
heterogeneity analysis, sensitivity analysis, publication bias analysis, etc.
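To make the difference between the two models concrete, here is a minimal Python sketch of inverse-variance pooling. It is not the R package meta and does not reproduce its interface; the function names are hypothetical, the random-effects estimate assumes the DerSimonian-Laird estimator of the between-study variance, and the input numbers are made up for illustration.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate.

    Returns (pooled, standard error, tau^2, Cochran's Q, I^2).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(effects, variances)

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Each study's variance is inflated by the between-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, math.sqrt(1.0 / re_total), tau2, q, i2


# Made-up per-study effect sizes (e.g. log odds ratios) and variances.
effects = [0.1, 0.3, 0.5]
variances = [0.01, 0.01, 0.01]
fe_estimate, fe_se = fixed_effect(effects, variances)
re_estimate, re_se, tau2, q, i2 = random_effects_dl(effects, variances)
```

With these inputs both models give the same pooled estimate, but the random-effects standard error is larger because the between-study variance (about 0.03, with I² of about 0.75) is added to every study's variance.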
Readings: genetics.bib, meta-analysis.bib,
rare-variants.bib, PLINK tutorial
- A basic introduction to fixed-effect and random-effects models for meta-analysis
- Meta-analysis: Fixed-effect vs. Random-effects Model
- The meta-analysis of genome-wide association studies
- Meta-analysis in genome-wide association studies
- Meta-analysis of genome-wide association studies identifies eight new loci for type 2 diabetes in east Asians
- Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease
- Meta-analysis methods for genome-wide association studies and beyond
- Lab 12B: Statistical Methods for eQTL Analysis
[Dataset][Exercises]
Reading: eqtl.bib
Section VI: Metagenomics, Microbiome and Multivariate Statistics
- Lab 13: Amplicon Sequencing Analysis
[Slides][Tutorials][Exercises]
Readings:
Improved detection of changes in species richness in high-diversity microbial communities
Ecological studies of the microbial communities.
- Lab 14: Multivariate statistics for Microbiome Studies
[Lectures][Vegan Tutorials][Vegan Tutorial (Chinese)][Exercises]
Readings: microbiome.bib
Section VII: Scientific writing and Presentations
- Lab 15: Writing a scientific essay and presentation
[Tutorial][Other Materials]