A Short Course on Statistical Bioinformatics
presented by Wally Gilks (Centre for Statistical Bioinformatics, University of Leeds)
An organism's DNA, including its genes, holds almost all the
information required for its development and function. Human
understanding of this information is at an early stage, but is
accumulating rapidly due to new high-throughput forms of
experimentation. This has led to large and rapidly expanding
databases of DNA sequence, and related databases of the structure
and function of biomolecules such as proteins. Bioinformatics is
concerned with the development of these databases, and of tools for
deciphering and exploiting the information they contain.
The links from bioinformatic data to underlying biology are noisy and uncertain. For example, knowing an organism's DNA sequence doesn't clearly identify where the genes are located within it; knowing the sequence of a gene doesn't tell us the 3-dimensional structure of the protein encoded by it; and knowing a protein's structure may give us only weak clues as to its function. In each case, however, bioinformatic methods have been developed to exploit such information.
Statistics is a science for extracting meaning from noisy, uncertain information. It would seem natural, therefore, to approach bioinformatic questions from a statistical perspective. Indeed, there is now a considerable statistical literature on gene-expression (microarray) data analysis. However, many other areas of bioinformatics have so far had little exposure to statistical thinking. This is partly a consequence of the shortage of statisticians with an adequate background in molecular and cell biology.
This short course will provide a brief introduction to some biological fundamentals, some bioinformatic questions, and some statistical solutions. These will be presented in four lectures:
• DNA sequence analysis: discovering what lies lurking in the genome
• Biomolecular structure: predicting the shape of proteins and RNA molecules
• Gene-expression: finding out how genes control and are controlled
• Phylogenomics: uncovering evolutionary relationships between organisms or genes
The aim will be to convey some appreciation of the excitement and importance of this new scientific frontier, and the challenges and opportunities it presents to both methodological and applied statisticians.
