Introduction to Computational Molecular Biology:
Genome and Protein Sequence Analysis
(Winter Quarter 2017)
Assignment 1, due Sunday Jan 15
Assignment 2, due Sunday Jan 22
Assignment 3, due Sunday Jan 29
Assignment 4, due Sunday Feb 5
Assignment 5, due Sunday Feb 12
Assignment 6, due Sunday Feb 19
Assignment 7, due Sunday Feb 26
Assignment 8, due Sunday Mar 5
Assignment 9, due Sunday Mar 12
SYLLABUS & LECTURE SLIDES:
Nature paper on Avida
Avida web site
Nature paper on human genome sequence
Nature paper on mouse genome sequence
Siepel et al. paper on PhyloHMMs & sequence conservation
Rabiner tutorial on HMMs
HMM scaling tutorial (Tobias Mann)
Supervised learning tutorial
- Biological Review : Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; Genbank and other sequence databases.
- Lecture 1: Course overview.
- Lecture 2: Finding exact matches in sequences using suffix arrays. Algorithmic complexity. Directed graphs. Reading: Durbin et al. section 2.1, 2.2, 2.3.
- Discussion Section 1: HW1 and general programming tips.
- Lecture 3: Depth structure of directed acyclic graphs (DAGs); trees and linked lists. Dynamic programming on weighted DAGs. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 4: Dynamic programming on weighted DAGs. Maximal-scoring sequence segments. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12, 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2
- Discussion Section 2: HW1, HW2, minimum path algorithms, and memoization.
- Lecture 5: Maximal-scoring sequence segments. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3
- Lecture 6: D-segments, relationship to 2-state HMMs. Sequence alignment.
- Discussion Section 3: HW1 comments, HW2 questions, D-segment algorithm, and coding guidelines.
- Lecture 7: Edit graphs & sequence alignment. Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global. Reading: Ewens & Grant 12.2, 12.3, 1.14, Appendix B.10; Durbin et al. chapter 3
- Lecture 8: Multiple sequence alignment. Linear space algorithms.
- Discussion Section 4: HW2 comments, HW3 questions, edit graph optimization, and data structures.
- Lecture 9: Better scoring models: General & affine gap penalties; profiles. Smith-Waterman special cases.
- Lecture 10: Word nucleation approaches/BLAST. Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Failure of equal frequency assumption for DNA.
- Discussion Section 5: HW4 questions, BLAST, and the Stack vs. the Heap.
- Lecture 11: Site models. Site model examples: 3' splice sites, 5' splice sites, protein motifs. Site probability models. Comparing alternative models. Neyman-Pearson lemma. Weight matrices for site models. Weight matrices for splice sites in C. elegans.
- Lecture 12: Score distributions. Limitations of site models (variable spacing, non-independence). Hidden Markov Models: introduction Reading: Siepel et al.
- Discussion Section 6: HW5 tips and questions, motif-finding algorithms, and valgrind.
- Lecture 13: Hidden Markov Models: formal definition. HMM examples: -- splice sites; 2-state models; 7-state prokaryote genome model.
- Lecture 14: Probabilities of sequences. Computing HMM probabilities via associated WDAG.HMM Parameter estimation: Viterbi training.
- Discussion Section 7: Viterbi training, HW6 tips, GENSCAN, and amortized analysis.
- Lecture 15: Forward-backward algorithm. Baum-Welch (EM) algorithm; techniques for finding global maxima in likelihood surface.
- Lecture 16: Detection of evolutionarily conserved regions using Phylo-HMMs.
- Discussion Section 8: Baum-Welch, alternative update calculations, and NP-complete proofs.
- Lecture 17: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 18: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd). Distribution of maximal segment scores in background sequence. Karlin-Altschul theory.
- Discussion Section 9: HW 8, Expectation-Maximization, and Markov Chain Monte Carlo (MCMC).
- Lecture 19(written portion notes): Brief introduction to microbiome research and a simple algorithm for designing minimal microbial communities with a specified metabolic capacity.
- Lecture 20: Karlin-Altschul theory (cont'd). Information theory.
- Discussion Section 10: HW 9, regular expressions, sed, AWK, and Git (solo and collaborative).
C/C++ PROGRAMMING GUIDES:
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES: