Friday, 17 April 2009

III. Genomics and Computational Biology

HST.508 Genomics and Computational Biology

Fall 2002


Study Materials

Perl Guides
Getting started with Perl (Win) (PDF)
Installing and Configuring MacPerl
Perl Guide
Other Tutorials
Other


Next-generation DNA sequencing technology from University of California, Berkeley. (Image courtesy of the U.S. Department of Energy Genomes to Life Program.)

Course Highlights

Audio of each lecture session is available for this course, in addition to downloadable problem sets and lecture notes.

Course Description

This course will assess the relationships among sequence, structure, and function in complex biological networks as well as progress in realistic modeling of quantitative, comprehensive, functional genomics analyses. Exercises will include algorithmic, statistical, database, and simulation approaches and practical applications to medicine, biotechnology, drug discovery, and genetic engineering. Future opportunities and current limitations will be critically addressed. In addition to the regular lecture sessions, supplementary sections are scheduled to address issues related to Perl, Mathematica and biology.

Technical Requirements

Microsoft® Excel software is recommended for viewing the .xls files found on this course site. Free Microsoft® Excel viewer software can also be used to view the .xls files.
Mathematica® software is required to run the .nb files found on this course site.
Media player software, such as QuickTime® Player, RealOne™ Player, or Windows Media® Player, is required to run the .mp3 files found on this course site.




RealOne™ is a trademark or a registered trademark of RealNetworks, Inc.
Microsoft® is a registered trademark or trademark of Microsoft Corporation in the U.S. and/or other countries.
Mathematica® is a registered trademark of Wolfram Research, Inc.
QuickTime® is a trademark of Apple Computer, Inc., registered in the U.S. and other countries.
Windows Media® is a registered trademark or trademark of Microsoft Corporation in the U.S. and/or other countries.

Syllabus

Prerequisites

Introductory courses in biology, computer science, and statistics. If you are unsure whether you have equivalent experience, attend the appropriate supplementary sections, which will focus on the catch-up topics in greatest demand.

Course Organization

In addition to the lectures, there will be section discussion meetings at the days and times above. Students will participate in at least one of those sections and form problem-solving and project teams consisting of at least one biology expert and one math/computer/engineering expert (or two to four people knowledgeable in both disciplines). Separate sections on the basics of programming and molecular biology will be available in the first few weeks to even out the expected wide variation in backgrounds. Grades will be based on six problem sets, one final project, and participation in discussion sections covering the problems and about one scientific article per week. Expect to spend about 4 hours per week in class and 6 to 12 additional hours per week, depending on your background.

Lecture Notes

In addition to downloadable lecture notes, audio files of each lecture are provided below. Media player software, such as QuickTime® Player, RealOne™ Player, or Windows Media® Player, is required to run the .mp3 files in this section.

These files are also available for download from iTunes®.

LEC #   TOPICS

1       Intro 1: Computational Side of Computational Biology. Statistics; Perl, Mathematica (PDF)
        (MP3-Part 1 - 14.1 MB) (MP3-Part 2 - 10.9 MB)

2       Intro 2: Biological Side of Computational Biology. Comparative Genomics, Models & Applications (PDF - 1.2 MB)
        (MP3-Part 1 - 13.6 MB) (MP3-Part 2 - 10.9 MB)

3       DNA 1: Genome Sequencing, Polymorphisms, Populations, Statistics, Pharmacogenomics; Databases (PDF)
        (MP3-Part 1 - 13.5 MB) (MP3-Part 2 - 10.6 MB)

4       DNA 2: Dynamic Programming, BLAST, Multi-alignment, Hidden Markov Models (PDF)
        (MP3-Part 1 - 12.6 MB) (MP3-Part 2 - 11.4 MB)

5       RNA 1: Microarrays, Library Sequencing and Quantitation Concepts (PDF)
        (MP3-Part 1 - 13.4 MB) (MP3-Part 2 - 11.7 MB)

6       RNA 2: Clustering by Gene or Condition and Other Regulon Data Sources; Nucleic Acid Motifs; The Nature of Biological "Proofs" (PDF)
        (MP3-Part 1 - 12.9 MB) (MP3-Part 2 - 10 MB)

7       Protein 1: 3D Structural Genomics, Homology, Catalytic and Regulatory Dynamics, Function & Drug Design (PDF - 1.0 MB)
        (MP3-Part 1 - 14.1 MB) (MP3-Part 2 - 9.7 MB)

8       Protein 2: Mass Spectrometry, Post-synthetic Modifications, Quantitation of Proteins, Metabolites, & Interactions (PDF)
        (MP3-Part 1 - 13.2 MB) (MP3-Part 2 - 11.7 MB)

9       Networks 1: Systems Biology, Metabolic Kinetic & Flux Balance Optimization Methods (PDF)
        (MP3 - 12.4 MB)

10      Networks 2: Molecular Computing, Self-assembly, Genetic Algorithms, Neural Networks (PDF)
        (MP3-Part 1 - 10.6 MB) (MP3-Part 2 - 12.3 MB)

11      Networks 3: The Future of Computational Biology: Cellular, Developmental, Social, Ecological & Commercial Models (PDF)
        (MP3-Part 1 - 14.1 MB) (MP3-Part 2 - 7.2 MB)
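
As a small illustration of the dynamic-programming topic in lecture 4, here is a sketch of a Needleman-Wunsch-style global alignment score in Python (the course itself used Perl and Mathematica). The scoring values (match +1, mismatch -1, linear gap -2) and the function name global_alignment_score are assumptions chosen for illustration, not taken from the lecture notes, and the sketch returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score with a linear gap penalty (illustrative scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning prefix a[:i] with prefix b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap     # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap   # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GCATGCU"))
```

Each cell depends only on its three neighbours above and to the left, so the table is filled in O(len(a) x len(b)) time; recovering the alignment itself would need a traceback over the same table.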


Projects

Project Background
The full written project, in the form of an article or grant proposal, and the figures, ready for presentation in Microsoft® PowerPoint® or Microsoft® Word, are due on the day of lecture 12. We recommend that you choose a topic and a team before the end of lecture 7.

The oral presentation will be limited to 6 minutes per person on each project team, leaving 2 minutes per person for questions at the end. The presentations will be loaded on the computer in the order of the schedule below, unless special requests are made.

Each team must have at least one computational "result". This can be as simple as checking a table in a published article or as complex as a new computational-biology algorithm and associated graphics.

There should be critical assessment of at least one previous relevant article.

Please cite and link PubMed or web references wherever possible.

The role that each member played in the team should be clearly stated in the written version. Each team member should present a substantial contribution orally, not merely introduce the final speaker(s).

The overall course grade will be 12% per problem set and 28% for the project.

The late penalty is 5% (of 100%) per day after the deadline, which is noon on the day of lecture 12. (If you are in the first group, email us your slides and confirm that they work on our computer by the end of lecture 12.)
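
The grading weights can be checked with a short calculation: six problem sets at 12% each contribute 72%, and the project contributes the remaining 28%. The Python sketch below assumes that every score is on a 0-100 scale and reads "5% (of 100%) off per day" as 5 points subtracted from the project score for each day late. It gives participation no separate weight, because the stated percentages already sum to 100%. The function names are hypothetical and not part of any course software.

```python
# Weights from the course policy: six problem sets at 12% each plus the project at 28%.
PROBLEM_SET_WEIGHT = 0.12
NUM_PROBLEM_SETS = 6
PROJECT_WEIGHT = 0.28
# Assumption: the late penalty is 5 points off a 0-100 project score per day late.
LATE_PENALTY_PER_DAY = 5.0


def late_project_score(raw_score: float, days_late: int) -> float:
    """Apply the per-day late penalty to the project score, never going below zero."""
    return max(0.0, raw_score - LATE_PENALTY_PER_DAY * max(0, days_late))


def course_grade(problem_sets: list[float], project: float, days_late: int = 0) -> float:
    """Weighted course grade on a 0-100 scale from 0-100 component scores."""
    if len(problem_sets) != NUM_PROBLEM_SETS:
        raise ValueError(f"expected {NUM_PROBLEM_SETS} problem-set scores")
    return (PROBLEM_SET_WEIGHT * sum(problem_sets)
            + PROJECT_WEIGHT * late_project_score(project, days_late))


# Sanity check: 6 x 12% + 28% covers the whole grade.
assert abs(NUM_PROBLEM_SETS * PROBLEM_SET_WEIGHT + PROJECT_WEIGHT - 1.0) < 1e-9

print(course_grade([90, 85, 100, 80, 95, 88], project=92, days_late=1))
```

Under these assumptions, a project with a raw score of 92 submitted one day late counts as 87 before weighting.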

Grading Rubric
This rubric is designed to be as explicit as possible to ensure that all students are graded consistently. Each component of the project will be graded on a scale from 1 to 5. The scale is explicitly defined for each component but is roughly as follows: 1 = poor, 2 = needs improvement, 3 = good, 4 = excellent, 5 = outstanding. The complete rubric is available for download (PDF).
2002 Project Topic Ideas
  1. Protein-Protein Interactions: Network Structures.

  2. Correlating microarray data with the promoter-site consensus sequence for a specific transcription factor.

  3. Genomic analysis of parasitic human pathogens, particularly Plasmodium falciparum and Leishmania major.

  4. Simulation of the recombination of antibody genes, using Perl to predict the amino acid sequences of the antibody variable region.

  5. Dynamic-programming analysis of the nucleotide and protein sequences of Th2 chemokine receptors and ligands.

  6. The Determination of a General Set of Fine-Grained Selection Criteria for the Discovery of siRNA in Humans.

  7. How do we manage cases in which conflicting, contradictory, or "speculative" functional predictions are contributed by the various information sources used to build a network model?

  8. Develop an engine (or program) to predict protein function from genomic context (a non-homology-based approach).

  9. TNF Receptor Biomining.

  10. Comparing Variable Selection Methods for Microarray Classification Models Based on Logistic Regression.

  11. Using the Index of Coincidence to identify Open Reading Frames.

  12. Transcriptional control mediated by cleansing of short sequences from gene regulatory regions.

  13. Software solution that provides a visual interface to nucleotide mutations.

  14. Identification of Potential Transcriptional Regulatory Elements by Comparison of Human and Pufferfish Genomic Sequences.

  15. Overlaying Clustering Results from PCA with Clustering Results from Self-Organizing Maps.
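
Topics 4 and 11 both start from reading DNA three bases at a time. As a hedged starting point, the Python sketch below translates DNA with the standard genetic code and lists ATG-to-stop open reading frames on the three forward frames only. It is the naive ORF scan, not the index-of-coincidence method of topic 11 and not any team's actual code. The names translate and find_orfs, the min_codons threshold, and the choice to ignore the reverse strand and alternative start codons are assumptions of this sketch.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, str]]:
    """Return (start index, protein) for ATG...stop ORFs in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                protein = translate(dna[start:i])  # stop codon itself is excluded
                if len(protein) >= min_codons:
                    orfs.append((start, protein))
                start = None                   # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGGCCTAAGG"))
```

ORFs that run off the end of the sequence without a stop codon are dropped, and an internal ATG inside an open ORF does not start a new one, so each reported ORF begins at the first ATG after the previous stop.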



Microsoft® and PowerPoint® are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.